
50 Regression Analysis in Data Analysis MCQs

These MCQs cover fundamental concepts in regression analysis, including linear and multiple regression, model assumptions, diagnostics, and interpretation. They are ideal for students and professionals in data analysis who want to test their understanding of predictive modeling techniques.

1. What does linear regression model the relationship between?

a) Two categorical variables
b) A dependent variable and one or more independent variables
c) Only independent variables
d) Time series only
✅ Correct Answer: b) A dependent variable and one or more independent variables
📝 Explanation:
Linear regression predicts a continuous outcome (Y) as a linear function of predictors (X).
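To make this concrete, here is a minimal NumPy sketch of simple linear regression; the data values are made up for illustration:

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# np.polyfit with degree 1 fits y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # slope ≈ 2.0, intercept ≈ 1.0
```

With noisy data the fitted slope and intercept would only approximate the true values.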

2. The slope coefficient in simple linear regression represents

a) The intercept
b) Change in Y for a one-unit change in X
c) The correlation
d) The error term
✅ Correct Answer: b) Change in Y for a one-unit change in X
📝 Explanation:
β1 = ΔY / ΔX, holding other factors constant.

3. R-squared measures

a) Total variance
b) Proportion of variance in Y explained by X
c) Residual sum of squares
d) Standard error
✅ Correct Answer: b) Proportion of variance in Y explained by X
📝 Explanation:
Ranges from 0 to 1; higher values indicate better fit.
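The definition R² = 1 − SS_res / SS_tot can be computed directly; a small sketch with illustrative data:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot: share of Y's variance explained."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])
print(r_squared(y, y))                      # perfect fit -> 1.0
print(r_squared(y, np.full(4, y.mean())))   # mean-only model -> 0.0
```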

4. The assumption of linearity in regression means

a) Constant variance
b) Relationship between X and Y is linear
c) No multicollinearity
d) Independent errors
✅ Correct Answer: b) Relationship between X and Y is linear
📝 Explanation:
Verified via scatter plots or residual plots.

5. Homoscedasticity refers to

a) Constant variance of residuals
b) Normal distribution of errors
c) No autocorrelation
d) Linear trend
✅ Correct Answer: a) Constant variance of residuals
📝 Explanation:
Tested with Breusch-Pagan; violations suggest heteroscedasticity.

6. In multiple regression, multicollinearity is detected using

a) VIF (Variance Inflation Factor)
b) R-squared
c) F-test
d) t-test
✅ Correct Answer: a) VIF (Variance Inflation Factor)
📝 Explanation:
VIF > 5-10 indicates high collinearity among predictors.
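A VIF can be computed by regressing each predictor on the others; this illustrative sketch uses NumPy's least squares (the `vif` helper and data are made up for the example):

```python
import numpy as np

def vif(X, j):
    """VIF for column j: 1 / (1 - R^2) from regressing X[:, j] on the rest."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 / (ss_res / ss_tot)

# Orthogonal predictors: no collinearity, so VIF is exactly 1
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, -1.0, -1.0, 1.0])
X = np.column_stack([x1, x2])
print(vif(X, 0))  # ≈ 1.0
```

Highly correlated predictors would push the VIF far above 1, flagging collinearity.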

7. The intercept in regression is

a) Expected Y when all X=0
b) Slope
c) Error
d) Correlation
✅ Correct Answer: a) Expected Y when all X=0
📝 Explanation:
β0; it may lack a meaningful interpretation if X=0 lies outside the observed range of the data.

8. Residuals are

a) Predicted values
b) Observed minus predicted values
c) Independent variables
d) Coefficients
✅ Correct Answer: b) Observed minus predicted values
📝 Explanation:
Used for diagnostics; should be randomly distributed.

9. The F-test in regression tests

a) Overall model significance
b) Individual coefficients
c) Linearity
d) Normality
✅ Correct Answer: a) Overall model significance
📝 Explanation:
H0: all β=0; low p-value indicates model explains variance.

10. The t-test for a coefficient tests

a) H0: β=0
b) H0: R²=0
c) Intercept only
d) Variance equality
✅ Correct Answer: a) H0: β=0
📝 Explanation:
Significance of individual predictors.

11. Adjusted R-squared accounts for

a) Number of predictors
b) Sample size
c) Both a and b
d) Residuals
✅ Correct Answer: c) Both a and b
📝 Explanation:
Penalizes adding irrelevant variables; better for model comparison.

12. Outliers in regression can be detected using

a) Leverage and Cook's distance
b) R-squared
c) Slope
d) Intercept
✅ Correct Answer: a) Leverage and Cook's distance
📝 Explanation:
High values indicate influential points affecting fit.

13. The standard error of the estimate is

a) Root mean squared error
b) Average residual
c) Variance of Y
d) Correlation
✅ Correct Answer: a) Root mean squared error
📝 Explanation:
Measures prediction accuracy; √(SSE/(n-k-1)).
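The formula √(SSE/(n−k−1)) translates directly into code; a sketch with made-up residuals:

```python
import numpy as np

def se_of_estimate(y, y_hat, k):
    """Standard error of the estimate: sqrt(SSE / (n - k - 1)), k predictors."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)
    return np.sqrt(sse / (n - k - 1))

y = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
y_hat = y - np.array([1.0, -1.0, 1.0, -1.0, 0.0])  # residuals [1,-1,1,-1,0]
print(se_of_estimate(y, y_hat, k=1))  # sqrt(4/3) ≈ 1.155
```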

14. Autocorrelation in residuals is tested with

a) Durbin-Watson test
b) Breusch-Pagan
c) Shapiro-Wilk
d) VIF
✅ Correct Answer: a) Durbin-Watson test
📝 Explanation:
Values near 2 indicate no serial correlation; common in time series.
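The Durbin-Watson statistic is simple to compute from residuals; a sketch with two extreme cases:

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum(diff(e)^2) / sum(e^2); near 2 means no serial correlation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Perfectly positively correlated residuals -> DW = 0
print(durbin_watson(np.array([1.0, 1.0, 1.0, 1.0])))
# Alternating residuals (negative correlation) -> DW = 3
print(durbin_watson(np.array([1.0, -1.0, 1.0, -1.0])))
```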

15. Logistic regression is used for

a) Continuous outcomes
b) Binary or categorical outcomes
c) Time series
d) Clustering
✅ Correct Answer: b) Binary or categorical outcomes
📝 Explanation:
Models log-odds; uses sigmoid function.
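The sigmoid maps log-odds to probabilities, and the logit is its inverse; a minimal sketch:

```python
import numpy as np

def sigmoid(z):
    """Maps log-odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(p):
    """Inverse of sigmoid: logit(p) = log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

print(sigmoid(0.0))  # log-odds of 0 -> probability 0.5
```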

16. In ridge regression, the penalty is on

a) Sum of squared coefficients
b) Absolute coefficients
c) Interactions
d) Residuals
✅ Correct Answer: a) Sum of squared coefficients
📝 Explanation:
L2 regularization; reduces multicollinearity.
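Ridge has a closed form, (XᵀX + λI)⁻¹Xᵀy; a sketch that, for simplicity, omits the usual unpenalized intercept (data made up):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge: solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
print(ridge(X, y, 0.0))  # lam = 0 recovers OLS: [1, 2]
print(ridge(X, y, 1.0))  # lam > 0 shrinks the coefficients toward 0
```

In practice the intercept is left unpenalized and predictors are standardized before applying the penalty.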

17. Lasso regression uses

a) L1 penalty
b) L2 penalty
c) No penalty
d) Quadratic penalty
✅ Correct Answer: a) L1 penalty
📝 Explanation:
Sum of absolute coefficients; performs variable selection.

18. Polynomial regression extends linear by

a) Adding higher powers of X
b) Multiple X variables
c) Interactions
d) Log transformation
✅ Correct Answer: a) Adding higher powers of X
📝 Explanation:
Captures non-linear relationships; risks overfitting.

19. The coefficient of determination is

a) R-squared
b) Correlation coefficient
c) Standard error
d) F-statistic
✅ Correct Answer: a) R-squared
📝 Explanation:
1 - (SS_res / SS_tot).

20. Normality of residuals is tested with

a) Q-Q plot or Shapiro-Wilk
b) Scatter plot
c) Histogram of X
d) VIF plot
✅ Correct Answer: a) Q-Q plot or Shapiro-Wilk
📝 Explanation:
Assumption for inference; affects confidence intervals.

21. In OLS regression, the goal is to minimize

a) Sum of squared residuals
b) Absolute residuals
c) Maximum residual
d) Variance
✅ Correct Answer: a) Sum of squared residuals
📝 Explanation:
Least squares method; unbiased under assumptions.

22. Heteroscedasticity can be addressed by

a) Weighted least squares
b) Increasing sample size
c) Both a and b
d) Adding more variables
✅ Correct Answer: c) Both a and b
📝 Explanation:
Or transformations like log(Y).

23. The Durbin-Watson statistic ranges from

a) 0 to 2
b) 0 to 4
c) -2 to 2
d) 1 to 3
✅ Correct Answer: b) 0 to 4
📝 Explanation:
Around 2 indicates no autocorrelation; values below 2 suggest positive serial correlation and values above 2 suggest negative serial correlation.

24. Mallows' Cp selects models by

a) Bias + variance estimate
b) R-squared
c) AIC
d) BIC
✅ Correct Answer: a) Bias + variance estimate
📝 Explanation:
A well-specified model has Cp close to p, the number of parameters; used for subset selection.

25. In Cox proportional hazards, the assumption is

a) Constant hazard ratio over time
b) Linear time
c) Normal errors
d) Equal variances
✅ Correct Answer: a) Constant hazard ratio over time
📝 Explanation:
Tested with Schoenfeld residuals.

26. The RMSPE is

a) Root mean squared percentage error
b) Residual mean square
c) Percentage R-squared
d) Error variance
✅ Correct Answer: a) Root mean squared percentage error
📝 Explanation:
Scale-free accuracy measure.

27. Backward elimination starts with

a) All variables, removes insignificant
b) None, adds
c) Random
d) Principal components
✅ Correct Answer: a) All variables, removes insignificant
📝 Explanation:
Stepwise; based on p-values.

28. The studentized residual is

a) Residual divided by its SE
b) Leverage-adjusted
c) Both
d) Raw residual
✅ Correct Answer: a) Residual divided by its SE
📝 Explanation:
|e| > 3 flags outliers.

29. In probit regression, the link is

a) Cumulative normal
b) Logit
c) Log
d) Identity
✅ Correct Answer: a) Cumulative normal
📝 Explanation:
For binary outcomes; P(Y=1) = Φ(Xβ), where Φ is the standard normal CDF, so the link function is Φ⁻¹.

30. The Vuong test compares

a) Non-nested GLMs
b) Nested
c) Variances
d) Means
✅ Correct Answer: a) Non-nested GLMs
📝 Explanation:
Likelihood-based model selection.

31. Partial regression plots show

a) Marginal effect of X controlling others
b) Total effect
c) Interaction
d) Residual vs fitted
✅ Correct Answer: a) Marginal effect of X controlling others
📝 Explanation:
Residuals of Y on other X vs residuals of this X.

32. The mean squared prediction error is

a) Average (Y - Ŷ)^2 on new data
b) Training MSE
c) R-squared
d) Adjusted R²
✅ Correct Answer: a) Average (Y - Ŷ)^2 on new data
📝 Explanation:
For out-of-sample performance.

33. Regression with ARIMA errors models

a) Time-varying errors
b) Static
c) Spatial
d) Categorical
✅ Correct Answer: a) Time-varying errors
📝 Explanation:
Auto-regressive integrated moving average.

34. The concordance correlation coefficient measures

a) Agreement beyond correlation
b) Bias
c) Precision
d) Both b and c
✅ Correct Answer: a) Agreement beyond correlation
📝 Explanation:
Lin's ρ_c = 2ρσ_xσ_y / (σ_x² + σ_y² + (μ_x − μ_y)²); it multiplies Pearson's ρ (precision) by a bias-correction factor (accuracy).

35. Kernel regression is a

a) Non-parametric method
b) Parametric
c) Linear
d) Stepwise
✅ Correct Answer: a) Non-parametric method
📝 Explanation:
Nadaraya-Watson; local weighting.
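The Nadaraya-Watson estimator is just a kernel-weighted local average; a sketch with a Gaussian kernel and made-up data:

```python
import numpy as np

def nadaraya_watson(x_query, x, y, h):
    """Gaussian-kernel local average: nearby points get larger weights."""
    w = np.exp(-0.5 * ((x - x_query) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

x = np.array([-1.0, 1.0])
y = np.array([2.0, 4.0])
# Query point equidistant from both observations -> equal weights -> mean
print(nadaraya_watson(0.0, x, y, h=1.0))  # 3.0
```

The bandwidth h controls smoothness: small h tracks the data closely, large h averages broadly.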

36. The Ramsey RESET test detects

a) Functional form misspecification
b) Multicollinearity
c) Heteroscedasticity
d) Autocorrelation
✅ Correct Answer: a) Functional form misspecification
📝 Explanation:
Adds powers of fitted values.

37. In survival regression, the baseline hazard is

a) Non-parametric in Cox
b) Parametric
c) Linear
d) Constant
✅ Correct Answer: a) Non-parametric in Cox
📝 Explanation:
h(t|X=0); estimated via Breslow.

38. The MAPE is sensitive to

a) Zero values
b) Outliers
c) Scale
d) Normality
✅ Correct Answer: a) Zero values
📝 Explanation:
Mean absolute percentage error; undefined if Y=0.
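A sketch of MAPE that makes the zero-actual problem explicit (the guard is an illustrative design choice):

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error; undefined when any actual is 0."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    if np.any(y == 0):
        raise ValueError("MAPE is undefined for zero actuals")
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

print(mape([100.0, 200.0], [110.0, 180.0]))  # both 10% off -> 10.0
```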

39. Best subset selection evaluates

a) All possible subsets
b) Forward only
c) Backward
d) Stepwise
✅ Correct Answer: a) All possible subsets
📝 Explanation:
2^p models; computationally intensive.

40. The externally studentized residual excludes

a) The point itself in SE calculation
b) All points
c) Leverage
d) Intercept
✅ Correct Answer: a) The point itself in SE calculation
📝 Explanation:
For detecting outliers influencing fit.

41. Ordered logit is for

a) Ordinal outcomes
b) Nominal
c) Binary
d) Continuous
✅ Correct Answer: a) Ordinal outcomes
📝 Explanation:
Cumulative logits; proportional odds assumption.

42. The score test in GLM is

a) Efficient under H0
b) Wald
c) LR
d) All same
✅ Correct Answer: a) Efficient under H0
📝 Explanation:
Gradient of log-likelihood; no full fit needed.

43. In spatial regression, SAR models

a) Spatial autocorrelation in errors
b) Lag dependence
c) Both
d) No spatial
✅ Correct Answer: c) Both
📝 Explanation:
Spatial autoregressive; the lag model is y = ρWy + Xβ + ε, while spatial error models instead place the autocorrelation in ε.

44. The Theil's U statistic compares forecasts to

a) Naive method
b) OLS
c) Ridge
d) Lasso
✅ Correct Answer: a) Naive method
📝 Explanation:
U < 1 means the forecast beats the no-change (naive) forecast.

45. LOESS regression uses

a) Local polynomials
b) Global linear
c) Splines
d) Kernels only
✅ Correct Answer: a) Local polynomials
📝 Explanation:
Locally estimated scatterplot smoothing.

46. The Link test in regression checks

a) Omitted variables or specification
b) Multicollinearity
c) Heteroscedasticity
d) Normality
✅ Correct Answer: a) Omitted variables or specification
📝 Explanation:
Regress Y on the fitted values and their squares; a significant squared term indicates misspecification.

47. Accelerated failure time models assume

a) Log-linear effect on time
b) Proportional hazards
c) Weibull distribution
d) Exponential
✅ Correct Answer: a) Log-linear effect on time
📝 Explanation:
log(T) = βX + σW; parametric survival.

48. The SMAPE averages

a) Symmetric absolute percentage errors
b) Squared errors
c) Relative errors
d) Forecast errors
✅ Correct Answer: a) Symmetric absolute percentage errors
📝 Explanation:
Handles zero issues in MAPE.
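One common SMAPE variant (several definitions exist) divides by the average of the absolute actual and forecast; a sketch:

```python
import numpy as np

def smape(y, y_hat):
    """Symmetric MAPE: denominator (|y| + |y_hat|) / 2 avoids MAPE's zero-actual blowup."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return 100.0 * np.mean(np.abs(y - y_hat) / ((np.abs(y) + np.abs(y_hat)) / 2.0))

print(smape([100.0], [100.0]))  # perfect forecast -> 0.0
print(smape([0.0], [10.0]))     # zero actual is defined here -> 200.0
```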

49. All subset regression is exhaustive but

a) Computationally expensive
b) Fast
c) Approximate
d) Stepwise
✅ Correct Answer: a) Computationally expensive
📝 Explanation:
For p>20, use branch and bound.

50. The deleted residual is

a) Prediction error leaving out i
b) Internal
c) Studentized
d) Leverage
✅ Correct Answer: a) Prediction error leaving out i
📝 Explanation:
PRESS component.
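For OLS, deleted (PRESS) residuals can be computed without refitting, via e_i / (1 − h_ii) with h_ii the leverage; a sketch with made-up data:

```python
import numpy as np

def press_residuals(X, y):
    """Leave-one-out prediction errors via e_i / (1 - h_ii), no refitting needed."""
    A = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept
    H = A @ np.linalg.inv(A.T @ A) @ A.T        # hat matrix
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    e = y - A @ beta                            # ordinary residuals
    return e / (1.0 - np.diag(H))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 1.3, 3.75, 2.25])
print(press_residuals(x.reshape(-1, 1), y))
```

Each value equals the prediction error at point i when the model is refit without point i; their sum of squares is the PRESS statistic.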

51. Multinomial logit is used for

a) Nominal outcomes >2 categories
b) Ordinal
c) Binary
d) Continuous
✅ Correct Answer: a) Nominal outcomes >2 categories
📝 Explanation:
Models log-odds of each category against a baseline; relies on the independence of irrelevant alternatives (IIA) assumption.

52. The Wald test is

a) Asymptotic for large samples
b) Exact
c) Small sample
d) Non-parametric
✅ Correct Answer: a) Asymptotic for large samples
📝 Explanation:
(β-hat / SE)^2 ~ χ².

53. In SEM, path analysis is

a) Observed variables regression
b) Latent only
c) CFA
d) EFA
✅ Correct Answer: a) Observed variables regression
📝 Explanation:
Special case of SEM without latents.

54. The MASE normalizes errors by

a) In-sample naive forecast
b) Scale
c) Variance
d) SD
✅ Correct Answer: a) In-sample naive forecast
📝 Explanation:
Mean absolute scaled error; scale-free.
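MASE scales forecast errors by the in-sample MAE of the one-step naive forecast; a sketch with made-up series:

```python
import numpy as np

def mase(y_train, y_true, y_pred):
    """Scale out-of-sample MAE by the in-sample naive (no-change) forecast MAE."""
    y_train = np.asarray(y_train, dtype=float)
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.mean(errors) / naive_mae

# Training series steps by 1, so the naive MAE is 1
print(mase([1.0, 2.0, 3.0, 4.0], [5.0, 6.0], [5.0, 6.0]))  # perfect -> 0.0
print(mase([1.0, 2.0, 3.0, 4.0], [5.0, 6.0], [4.0, 5.0]))  # as good as naive -> 1.0
```

Values below 1 indicate the forecast beats the naive benchmark on the training scale.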

55. Smoothing splines minimize

a) RSS + smoothness penalty
b) Only RSS
c) Knots
d) Degrees
✅ Correct Answer: a) RSS + smoothness penalty
📝 Explanation:
λ tunes fit vs smoothness.

56. The CUSUM test detects

a) Structural breaks
b) Autocorrelation
c) Heteroscedasticity
d) Normality
✅ Correct Answer: a) Structural breaks
📝 Explanation:
Cumulative sum of residuals.

57. Frailty models account for

a) Unobserved heterogeneity
b) Observed covariates
c) Time
d) Censoring
✅ Correct Answer: a) Unobserved heterogeneity
📝 Explanation:
Random effects in survival.