50 Regression Analysis in Data Analysis MCQs


These MCQs cover fundamental concepts in regression analysis, including linear and multiple regression, model assumptions, diagnostics, and interpretation. They are ideal for students and professionals in data analysis who want to test their understanding of predictive modeling techniques.

1. What does linear regression model the relationship between?

a) Two categorical variables
b) A dependent variable and one or more independent variables
c) Only independent variables
d) Time series only
Correct Answer: b) A dependent variable and one or more independent variables
📝 Explanation:
Linear regression predicts a continuous outcome (Y) as a linear function of predictors (X).
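As a quick illustration, here is a minimal sketch of fitting a simple linear regression by least squares with NumPy; the toy data and coefficients are invented for the example:

```python
import numpy as np

# Toy data with an exact linear relationship y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = beta
print(b0, b1)  # intercept ≈ 1.0, slope ≈ 2.0
```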

2. The slope coefficient in simple linear regression represents

a) The intercept
b) Change in Y for a one-unit change in X
c) The correlation
d) The error term
Correct Answer: b) Change in Y for a one-unit change in X
📝 Explanation:
β1 = ΔY / ΔX; in multiple regression, the change is interpreted holding the other predictors constant.

3. R-squared measures

a) Total variance
b) Proportion of variance in Y explained by X
c) Residual sum of squares
d) Standard error
Correct Answer: b) Proportion of variance in Y explained by X
📝 Explanation:
Ranges from 0 to 1; higher values indicate better fit.

4. The assumption of linearity in regression means

a) Constant variance
b) Relationship between X and Y is linear
c) No multicollinearity
d) Independent errors
Correct Answer: b) Relationship between X and Y is linear
📝 Explanation:
Verified via scatter plots or residual plots.

5. Homoscedasticity refers to

a) Constant variance of residuals
b) Normal distribution of errors
c) No autocorrelation
d) Linear trend
Correct Answer: a) Constant variance of residuals
📝 Explanation:
Tested with Breusch-Pagan; violations suggest heteroscedasticity.
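A hand-rolled sketch of the Breusch-Pagan LM statistic (regress the squared residuals on the predictors; the statistic is n·R²); the function name and toy data are my own, not from a library:

```python
import numpy as np

def breusch_pagan_lm(resid, X):
    """Breusch-Pagan LM statistic: n * R^2 from regressing the squared
    residuals on the predictors (X must include an intercept column).
    Under H0 (homoscedasticity) it is approx. chi-square with k df."""
    u2 = resid ** 2
    beta, *_ = np.linalg.lstsq(X, u2, rcond=None)
    fitted = X @ beta
    ss_res = np.sum((u2 - fitted) ** 2)
    ss_tot = np.sum((u2 - u2.mean()) ** 2)
    return len(resid) * (1.0 - ss_res / ss_tot)
```

In practice you would compare the statistic to a chi-square critical value; a large value suggests heteroscedasticity.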

6. In multiple regression, multicollinearity is detected using

a) VIF (Variance Inflation Factor)
b) R-squared
c) F-test
d) t-test
Correct Answer: a) VIF (Variance Inflation Factor)
📝 Explanation:
VIF > 5-10 indicates high collinearity among predictors.
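The VIF for predictor j is 1 / (1 − R²_j), where R²_j comes from regressing that predictor on the others. A minimal sketch (helper name and toy data are invented for the example):

```python
import numpy as np

def vif(X, j):
    """VIF for column j of predictor matrix X (no intercept column):
    1 / (1 - R^2) from regressing X[:, j] on the other predictors."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])  # add intercept
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ beta
    r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1.0 / (1.0 - r2)
```

Two nearly collinear columns give each other a very large VIF, while an unrelated column stays close to 1.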

7. The intercept in regression is

a) Expected Y when all X=0
b) Slope
c) Error
d) Correlation
Correct Answer: a) Expected Y when all X=0
📝 Explanation:
β0; it may lack a meaningful interpretation if X=0 lies outside the observed data range.

8. Residuals are

a) Predicted values
b) Observed minus predicted values
c) Independent variables
d) Coefficients
Correct Answer: b) Observed minus predicted values
📝 Explanation:
Used for diagnostics; should be randomly distributed.

9. The F-test in regression tests

a) Overall model significance
b) Individual coefficients
c) Linearity
d) Normality
Correct Answer: a) Overall model significance
📝 Explanation:
H0: all β=0; low p-value indicates model explains variance.

10. t-test for coefficients tests

a) H0: β=0
b) H0: R²=0
c) Intercept only
d) Variance equality
Correct Answer: a) H0: β=0
📝 Explanation:
Significance of individual predictors.

11. Adjusted R-squared accounts for

a) Number of predictors
b) Sample size
c) Both a and b
d) Residuals
Correct Answer: c) Both a and b
📝 Explanation:
Penalizes adding irrelevant variables; better for model comparison.

12. Outliers in regression can be detected using

a) Leverage and Cook's distance
b) R-squared
c) Slope
d) Intercept
Correct Answer: a) Leverage and Cook's distance
📝 Explanation:
High values indicate influential points affecting fit.

13. The standard error of the estimate is

a) Root mean squared error
b) Average residual
c) Variance of Y
d) Correlation
Correct Answer: a) Root mean squared error
📝 Explanation:
Measures prediction accuracy; √(SSE/(n-k-1)).
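The formula above can be sketched directly; the function name and toy values are invented for the example:

```python
import numpy as np

def std_error_of_estimate(y, y_hat, k):
    """Standard error of the estimate: sqrt(SSE / (n - k - 1)),
    where k is the number of predictors."""
    sse = np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2)
    n = len(y)
    return np.sqrt(sse / (n - k - 1))
```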

14. Autocorrelation in residuals is tested with

a) Durbin-Watson test
b) Breusch-Pagan
c) Shapiro-Wilk
d) VIF
Correct Answer: a) Durbin-Watson test
📝 Explanation:
Values near 2 indicate no serial correlation; common in time series.
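The statistic is the ratio of the sum of squared successive residual differences to the sum of squared residuals; a minimal sketch (the example residual series is invented):

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
    Near 2: no first-order autocorrelation; below 2 suggests positive,
    above 2 negative serial correlation."""
    diff = np.diff(resid)
    return np.sum(diff ** 2) / np.sum(np.asarray(resid) ** 2)
```

An alternating sign pattern (strong negative autocorrelation) pushes the statistic toward 4, while a slowly drifting series pushes it toward 0.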

15. Logistic regression is used for

a) Continuous outcomes
b) Binary or categorical outcomes
c) Time series
d) Clustering
Correct Answer: b) Binary or categorical outcomes
📝 Explanation:
Models log-odds; uses sigmoid function.
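The sigmoid mapping from the linear predictor to a probability can be sketched as follows (the coefficients are invented for the example):

```python
import math

def predict_proba(x, b0, b1):
    """Logistic regression prediction: P(Y=1|x) = sigmoid(b0 + b1*x)."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))
```

At z = 0 the predicted probability is exactly 0.5; large positive z pushes it toward 1.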

16. In ridge regression, the penalty is on

a) Sum of squared coefficients
b) Absolute coefficients
c) Interactions
d) Residuals
Correct Answer: a) Sum of squared coefficients
📝 Explanation:
L2 regularization; reduces multicollinearity.
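Ridge has a closed-form solution, (XᵀX + λI)⁻¹Xᵀy; a minimal sketch (toy data invented, and note that in practice the intercept is usually left unpenalized):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate: solve (X'X + lam*I) beta = X'y.
    The L2 penalty shrinks coefficients toward zero as lam grows."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```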

17. Lasso regression uses

a) L1 penalty
b) L2 penalty
c) No penalty
d) Quadratic penalty
Correct Answer: a) L1 penalty
📝 Explanation:
Sum of absolute coefficients; performs variable selection.

18. Polynomial regression extends linear by

a) Adding higher powers of X
b) Multiple X variables
c) Interactions
d) Log transformation
Correct Answer: a) Adding higher powers of X
📝 Explanation:
Captures non-linear relationships; risks overfitting.

19. The coefficient of determination is

a) R-squared
b) Correlation coefficient
c) Standard error
d) F-statistic
Correct Answer: a) R-squared
📝 Explanation:
1 - (SS_res / SS_tot).
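The formula above translates directly to code (function name and toy values are invented for the example):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - np.asarray(y_hat)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

A perfect fit gives 1.0, and predicting the mean of Y for every point gives 0.0.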

20. Normality of residuals is tested with

a) Q-Q plot or Shapiro-Wilk
b) Scatter plot
c) Histogram of X
d) VIF plot
Correct Answer: a) Q-Q plot or Shapiro-Wilk
📝 Explanation:
Assumption for inference; affects confidence intervals.

21. In OLS regression, the goal is to minimize

a) Sum of squared residuals
b) Absolute residuals
c) Maximum residual
d) Variance
Correct Answer: a) Sum of squared residuals
📝 Explanation:
Least squares method; unbiased under assumptions.

22. Heteroscedasticity can be addressed by

a) Weighted least squares
b) Increasing sample size
c) Both a and b
d) Adding more variables
Correct Answer: c) Both a and b
📝 Explanation:
Or transformations like log(Y).

23. The Durbin-Watson statistic ranges from

a) 0 to 2
b) 0 to 4
c) -2 to 2
d) 1 to 3
Correct Answer: b) 0 to 4
📝 Explanation:
Around 2 indicates no autocorrelation; values below 2 suggest positive and above 2 negative serial correlation.

24. Mallows's Cp selects models by

a) Bias + variance estimate
b) R-squared
c) AIC
d) BIC
Correct Answer: a) Bias + variance estimate
📝 Explanation:
A good model has Cp close to p, the number of parameters; used for subset selection.

25. In Cox proportional hazards, the assumption is

a) Constant hazard ratio over time
b) Linear time
c) Normal errors
d) Equal variances
Correct Answer: a) Constant hazard ratio over time
📝 Explanation:
Tested with Schoenfeld residuals.

26. The RMSPE is

a) Root mean squared percentage error
b) Residual mean square
c) Percentage R-squared
d) Error variance
Correct Answer: a) Root mean squared percentage error
📝 Explanation:
Scale-free accuracy measure.

27. Backward elimination starts with

a) All variables, removes insignificant
b) None, adds
c) Random
d) Principal components
Correct Answer: a) All variables, removes insignificant
📝 Explanation:
Stepwise; based on p-values.

28. The studentized residual is

a) Residual divided by its SE
b) Leverage-adjusted
c) Both
d) Raw residual
Correct Answer: a) Residual divided by its SE
📝 Explanation:
Studentized residuals with absolute value above about 3 flag outliers.

29. In probit regression, the link is

a) Cumulative normal
b) Logit
c) Log
d) Identity
Correct Answer: a) Cumulative normal
📝 Explanation:
For binary outcomes; P(Y=1) = Φ(Xβ), where Φ is the standard normal CDF.

30. The Vuong test compares

a) Non-nested GLMs
b) Nested
c) Variances
d) Means
Correct Answer: a) Non-nested GLMs
📝 Explanation:
Likelihood-based model selection.

31. Partial regression plots show

a) Marginal effect of X controlling others
b) Total effect
c) Interaction
d) Residual vs fitted
Correct Answer: a) Marginal effect of X controlling others
📝 Explanation:
Plots residuals of Y regressed on the other predictors against residuals of this X regressed on the other predictors (added-variable plot).

32. The mean squared prediction error is

a) Average (Y - Ŷ)^2 on new data
b) Training MSE
c) R-squared
d) Adjusted R²
Correct Answer: a) Average (Y - Ŷ)^2 on new data
📝 Explanation:
For out-of-sample performance.

33. Regression with ARIMA errors models

a) Time-varying errors
b) Static
c) Spatial
d) Categorical
Correct Answer: a) Time-varying errors
📝 Explanation:
Auto-regressive integrated moving average.

34. The concordance correlation coefficient measures

a) Agreement beyond correlation
b) Bias
c) Precision
d) Both b and c
Correct Answer: a) Agreement beyond correlation
📝 Explanation:
Lin's ρ_c = ρ · C_b, where the bias-correction factor C_b ≤ 1 penalizes location and scale shifts.

35. Kernel regression is a

a) Non-parametric method
b) Parametric
c) Linear
d) Stepwise
Correct Answer: a) Non-parametric method
📝 Explanation:
Nadaraya-Watson; local weighting.

36. The Ramsey RESET test detects

a) Functional form misspecification
b) Multicollinearity
c) Heteroscedasticity
d) Autocorrelation
Correct Answer: a) Functional form misspecification
📝 Explanation:
Adds powers of fitted values.

37. In survival regression, the baseline hazard is

a) Non-parametric in Cox
b) Parametric
c) Linear
d) Constant
Correct Answer: a) Non-parametric in Cox
📝 Explanation:
h0(t) = h(t|X=0); estimated via the Breslow estimator.

38. The MAPE is sensitive to

a) Zero values
b) Outliers
c) Scale
d) Normality
Correct Answer: a) Zero values
📝 Explanation:
Mean absolute percentage error; undefined if Y=0.
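A minimal sketch that also makes the zero-value problem explicit (function name and toy values are invented for the example):

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error; undefined when any actual y is 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y == 0):
        raise ValueError("MAPE is undefined for zero actual values")
    return 100.0 * np.mean(np.abs((y - np.asarray(y_hat)) / y))
```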

39. Best subset selection evaluates

a) All possible subsets
b) Forward only
c) Backward
d) Stepwise
Correct Answer: a) All possible subsets
📝 Explanation:
2^p models; computationally intensive.

40. The externally studentized residual excludes

a) The point itself in SE calculation
b) All points
c) Leverage
d) Intercept
Correct Answer: a) The point itself in SE calculation
📝 Explanation:
For detecting outliers influencing fit.

41. Ordered logit is for

a) Ordinal outcomes
b) Nominal
c) Binary
d) Continuous
Correct Answer: a) Ordinal outcomes
📝 Explanation:
Cumulative logits; proportional odds assumption.

42. The score test in GLM is

a) Efficient under H0
b) Wald
c) LR
d) All same
Correct Answer: a) Efficient under H0
📝 Explanation:
Gradient of log-likelihood; no full fit needed.

43. In spatial regression, SAR models

a) Spatial autocorrelation in errors
b) Lag dependence
c) Both
d) No spatial
Correct Answer: c) Both
📝 Explanation:
Spatial autoregressive; y = ρWy + Xβ + ε captures lag dependence, and the broader SAR family also allows spatially correlated errors.

44. The Theil's U statistic compares forecasts to

a) Naive method
b) OLS
c) Ridge
d) Lasso
Correct Answer: a) Naive method
📝 Explanation:
U < 1 better than no-change forecast.

45. LOESS regression uses

a) Local polynomials
b) Global linear
c) Splines
d) Kernels only
Correct Answer: a) Local polynomials
📝 Explanation:
Locally estimated scatterplot smoothing.

46. The Link test in regression checks

a) Omitted variables or specification
b) Multicollinearity
c) Heteroscedasticity
d) Normality
Correct Answer: a) Omitted variables or specification
📝 Explanation:
Regress Y on the fitted and squared fitted values; a significant squared term indicates misspecification.

47. Accelerated failure time models assume

a) Log-linear effect on time
b) Proportional hazards
c) Weibull distribution
d) Exponential
Correct Answer: a) Log-linear effect on time
📝 Explanation:
log(T) = βX + σW; parametric survival.

48. The SMAPE averages

a) Symmetric absolute percentage errors
b) Squared errors
c) Relative errors
d) Forecast errors
Correct Answer: a) Symmetric absolute percentage errors
📝 Explanation:
Handles zero issues in MAPE.

49. All subset regression is exhaustive but

a) Computationally expensive
b) Fast
c) Approximate
d) Stepwise
Correct Answer: a) Computationally expensive
📝 Explanation:
For p>20, use branch and bound.

50. The deleted residual is

a) Prediction error leaving out i
b) Internal
c) Studentized
d) Leverage
Correct Answer: a) Prediction error leaving out i
📝 Explanation:
A component of the PRESS statistic.

51. Multinomial logit is used for

a) Nominal outcomes >2 categories
b) Ordinal
c) Binary
d) Continuous
Correct Answer: a) Nominal outcomes >2 categories
📝 Explanation:
Models log-odds relative to a baseline category; relies on the IIA (independence of irrelevant alternatives) assumption.

52. The Wald test is

a) Asymptotic for large samples
b) Exact
c) Small sample
d) Non-parametric
Correct Answer: a) Asymptotic for large samples
📝 Explanation:
(β-hat / SE)^2 ~ χ².
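The statistic above is easy to compute by hand; for one degree of freedom the chi-square tail probability equals the two-sided normal tail, erfc(|z|/√2), so only the standard library is needed (function name and toy values are invented for the example):

```python
import math

def wald_test(beta_hat, se):
    """Wald statistic (beta_hat / se)^2, chi-square(1) under H0: beta = 0.
    For 1 df the p-value equals the two-sided normal tail erfc(|z|/sqrt(2))."""
    z = beta_hat / se
    w = z ** 2
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w, p_value
```

For example, an estimate two standard errors from zero gives W = 4 and a p-value of about 0.046.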

53. In SEM, path analysis is

a) Observed variables regression
b) Latent only
c) CFA
d) EFA
Correct Answer: a) Observed variables regression
📝 Explanation:
A special case of SEM with observed variables only (no latent variables).

54. The MASE normalizes errors by

a) In-sample naive forecast
b) Scale
c) Variance
d) SD
Correct Answer: a) In-sample naive forecast
📝 Explanation:
Mean absolute scaled error; scale-free.

55. Smoothing splines minimize

a) RSS + smoothness penalty
b) Only RSS
c) Knots
d) Degrees
Correct Answer: a) RSS + smoothness penalty
📝 Explanation:
λ tunes fit vs smoothness.

56. The CUSUM test detects

a) Structural breaks
b) Autocorrelation
c) Heteroscedasticity
d) Normality
Correct Answer: a) Structural breaks
📝 Explanation:
Cumulative sum of residuals.

57. Frailty models account for

a) Unobserved heterogeneity
b) Observed covariates
c) Time
d) Censoring
Correct Answer: a) Unobserved heterogeneity
📝 Explanation:
Random effects in survival.
