50 Regression Analysis in Data Analysis MCQs


These MCQs cover fundamental concepts in regression analysis, including linear and multiple regression, model assumptions, diagnostics, and interpretation. They are ideal for students and professionals in data analysis who want to test their understanding of predictive modeling techniques.

1. What does linear regression model the relationship between?

a) Two categorical variables
b) A dependent variable and one or more independent variables
c) Only independent variables
d) Time series only
Correct Answer: b) A dependent variable and one or more independent variables
📝 Explanation:
Linear regression predicts a continuous outcome (Y) as a linear function of predictors (X).
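As a quick illustration, here is a minimal sketch of fitting a simple linear regression by least squares with NumPy; the toy data and coefficients are invented for the example:

```python
import numpy as np

# Toy data with an exact linear relationship y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = beta
print(b0, b1)  # intercept ≈ 1.0, slope ≈ 2.0
```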

2. The slope coefficient in simple linear regression represents

a) The intercept
b) Change in Y for a one-unit change in X
c) The correlation
d) The error term
Correct Answer: b) Change in Y for a one-unit change in X
📝 Explanation:
β1 = ΔY / ΔX; in multiple regression, the change is interpreted holding the other predictors constant.

3. R-squared measures

a) Total variance
b) Proportion of variance in Y explained by X
c) Residual sum of squares
d) Standard error
Correct Answer: b) Proportion of variance in Y explained by X
📝 Explanation:
Ranges from 0 to 1; higher values indicate better fit.

4. The assumption of linearity in regression means

a) Constant variance
b) Relationship between X and Y is linear
c) No multicollinearity
d) Independent errors
Correct Answer: b) Relationship between X and Y is linear
📝 Explanation:
Verified via scatter plots or residual plots.

5. Homoscedasticity refers to

a) Constant variance of residuals
b) Normal distribution of errors
c) No autocorrelation
d) Linear trend
Correct Answer: a) Constant variance of residuals
📝 Explanation:
Tested with Breusch-Pagan; violations suggest heteroscedasticity.
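A hand-rolled sketch of the Breusch-Pagan LM statistic (regress the squared residuals on the predictors; the statistic is n·R²); the function name and toy data are my own, not from a library:

```python
import numpy as np

def breusch_pagan_lm(resid, X):
    """Breusch-Pagan LM statistic: n * R^2 from regressing the squared
    residuals on the predictors (X must include an intercept column).
    Under H0 (homoscedasticity) it is approx. chi-square with k df."""
    u2 = resid ** 2
    beta, *_ = np.linalg.lstsq(X, u2, rcond=None)
    fitted = X @ beta
    ss_res = np.sum((u2 - fitted) ** 2)
    ss_tot = np.sum((u2 - u2.mean()) ** 2)
    return len(resid) * (1.0 - ss_res / ss_tot)
```

In practice you would compare the statistic to a chi-square critical value; a large value suggests heteroscedasticity.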

6. In multiple regression, multicollinearity is detected using

a) VIF (Variance Inflation Factor)
b) R-squared
c) F-test
d) t-test
Correct Answer: a) VIF (Variance Inflation Factor)
📝 Explanation:
VIF > 5-10 indicates high collinearity among predictors.
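The VIF for predictor j is 1 / (1 − R²_j), where R²_j comes from regressing that predictor on the others. A minimal sketch (helper name and toy data are invented for the example):

```python
import numpy as np

def vif(X, j):
    """VIF for column j of predictor matrix X (no intercept column):
    1 / (1 - R^2) from regressing X[:, j] on the other predictors."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])  # add intercept
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ beta
    r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1.0 / (1.0 - r2)
```

Two nearly collinear columns give each other a very large VIF, while an unrelated column stays close to 1.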

7. The intercept in regression is

a) Expected Y when all X=0
b) Slope
c) Error
d) Correlation
Correct Answer: a) Expected Y when all X=0
📝 Explanation:
β0; it may lack a meaningful interpretation if X=0 lies outside the observed data range.

8. Residuals are

a) Predicted values
b) Observed minus predicted values
c) Independent variables
d) Coefficients
Correct Answer: b) Observed minus predicted values
📝 Explanation:
Used for diagnostics; should be randomly distributed.

9. The F-test in regression tests

a) Overall model significance
b) Individual coefficients
c) Linearity
d) Normality
Correct Answer: a) Overall model significance
📝 Explanation:
H0: all β=0; low p-value indicates model explains variance.

10. t-test for coefficients tests

a) H0: β=0
b) H0: R²=0
c) Intercept only
d) Variance equality
Correct Answer: a) H0: β=0
📝 Explanation:
Significance of individual predictors.

11. Adjusted R-squared accounts for

a) Number of predictors
b) Sample size
c) Both a and b
d) Residuals
Correct Answer: c) Both a and b
📝 Explanation:
Penalizes adding irrelevant variables; better for model comparison.

12. Outliers in regression can be detected using

a) Leverage and Cook's distance
b) R-squared
c) Slope
d) Intercept
Correct Answer: a) Leverage and Cook's distance
📝 Explanation:
High values indicate influential points affecting fit.

13. The standard error of the estimate is

a) Root mean squared error
b) Average residual
c) Variance of Y
d) Correlation
Correct Answer: a) Root mean squared error
📝 Explanation:
Measures prediction accuracy; √(SSE/(n-k-1)).
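The formula above can be sketched directly; the function name and toy values are invented for the example:

```python
import numpy as np

def std_error_of_estimate(y, y_hat, k):
    """Standard error of the estimate: sqrt(SSE / (n - k - 1)),
    where k is the number of predictors."""
    sse = np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2)
    n = len(y)
    return np.sqrt(sse / (n - k - 1))
```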

14. Autocorrelation in residuals is tested with

a) Durbin-Watson test
b) Breusch-Pagan
c) Shapiro-Wilk
d) VIF
Correct Answer: a) Durbin-Watson test
📝 Explanation:
Values near 2 indicate no serial correlation; common in time series.
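The statistic is the ratio of the sum of squared successive residual differences to the sum of squared residuals; a minimal sketch (the example residual series is invented):

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
    Near 2: no first-order autocorrelation; below 2 suggests positive,
    above 2 negative serial correlation."""
    diff = np.diff(resid)
    return np.sum(diff ** 2) / np.sum(np.asarray(resid) ** 2)
```

An alternating sign pattern (strong negative autocorrelation) pushes the statistic toward 4, while a slowly drifting series pushes it toward 0.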

15. Logistic regression is used for

a) Continuous outcomes
b) Binary or categorical outcomes
c) Time series
d) Clustering
Correct Answer: b) Binary or categorical outcomes
📝 Explanation:
Models log-odds; uses sigmoid function.
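The sigmoid mapping from the linear predictor to a probability can be sketched as follows (the coefficients are invented for the example):

```python
import math

def predict_proba(x, b0, b1):
    """Logistic regression prediction: P(Y=1|x) = sigmoid(b0 + b1*x)."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))
```

At z = 0 the predicted probability is exactly 0.5; large positive z pushes it toward 1.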

16. In ridge regression, the penalty is on

a) Sum of squared coefficients
b) Absolute coefficients
c) Interactions
d) Residuals
Correct Answer: a) Sum of squared coefficients
📝 Explanation:
L2 regularization; reduces multicollinearity.
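Ridge has a closed-form solution, (XᵀX + λI)⁻¹Xᵀy; a minimal sketch (toy data invented, and note that in practice the intercept is usually left unpenalized):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate: solve (X'X + lam*I) beta = X'y.
    The L2 penalty shrinks coefficients toward zero as lam grows."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```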

17. Lasso regression uses

a) L1 penalty
b) L2 penalty
c) No penalty
d) Quadratic penalty
Correct Answer: a) L1 penalty
📝 Explanation:
Sum of absolute coefficients; performs variable selection.

18. Polynomial regression extends linear by

a) Adding higher powers of X
b) Multiple X variables
c) Interactions
d) Log transformation
Correct Answer: a) Adding higher powers of X
📝 Explanation:
Captures non-linear relationships; risks overfitting.

19. The coefficient of determination is

a) R-squared
b) Correlation coefficient
c) Standard error
d) F-statistic
Correct Answer: a) R-squared
📝 Explanation:
1 - (SS_res / SS_tot).
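The formula above translates directly to code (function name and toy values are invented for the example):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - np.asarray(y_hat)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

A perfect fit gives 1.0, and predicting the mean of Y for every point gives 0.0.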

20. Normality of residuals is tested with

a) Q-Q plot or Shapiro-Wilk
b) Scatter plot
c) Histogram of X
d) VIF plot
Correct Answer: a) Q-Q plot or Shapiro-Wilk
📝 Explanation:
Assumption for inference; affects confidence intervals.

21. In OLS regression, the goal is to minimize

a) Sum of squared residuals
b) Absolute residuals
c) Maximum residual
d) Variance
Correct Answer: a) Sum of squared residuals
📝 Explanation:
Least squares method; unbiased under assumptions.

22. Heteroscedasticity can be addressed by

a) Weighted least squares
b) Increasing sample size
c) Both a and b
d) Adding more variables
Correct Answer: c) Both a and b
📝 Explanation:
Or transformations like log(Y).

23. The Durbin-Watson statistic ranges from

a) 0 to 2
b) 0 to 4
c) -2 to 2
d) 1 to 3
Correct Answer: b) 0 to 4
📝 Explanation:
Around 2 indicates no autocorrelation; values below 2 suggest positive and above 2 negative serial correlation.

24. Mallows's Cp selects models by

a) Bias + variance estimate
b) R-squared
c) AIC
d) BIC
Correct Answer: a) Bias + variance estimate
📝 Explanation:
A good model has Cp close to p, the number of parameters; used for subset selection.

25. In Cox proportional hazards, the assumption is

a) Constant hazard ratio over time
b) Linear time
c) Normal errors
d) Equal variances
Correct Answer: a) Constant hazard ratio over time
📝 Explanation:
Tested with Schoenfeld residuals.

26. The RMSPE is

a) Root mean squared percentage error
b) Residual mean square
c) Percentage R-squared
d) Error variance
Correct Answer: a) Root mean squared percentage error
📝 Explanation:
Scale-free accuracy measure.

27. Backward elimination starts with

a) All variables, removes insignificant
b) None, adds
c) Random
d) Principal components
Correct Answer: a) All variables, removes insignificant
📝 Explanation:
Stepwise; based on p-values.

28. The studentized residual is

a) Residual divided by its SE
b) Leverage-adjusted
c) Both
d) Raw residual
Correct Answer: a) Residual divided by its SE
📝 Explanation:
Studentized residuals with absolute value above about 3 flag outliers.

29. In probit regression, the link is

a) Cumulative normal
b) Logit
c) Log
d) Identity
Correct Answer: a) Cumulative normal
📝 Explanation:
For binary outcomes; P(Y=1) = Φ(Xβ), where Φ is the standard normal CDF.

30. The Vuong test compares

a) Non-nested GLMs
b) Nested
c) Variances
d) Means
Correct Answer: a) Non-nested GLMs
📝 Explanation:
Likelihood-based model selection.

31. Partial regression plots show

a) Marginal effect of X controlling others
b) Total effect
c) Interaction
d) Residual vs fitted
Correct Answer: a) Marginal effect of X controlling others
📝 Explanation:
Plots residuals of Y regressed on the other predictors against residuals of this X regressed on the other predictors (added-variable plot).

32. The mean squared prediction error is

a) Average (Y - Ŷ)^2 on new data
b) Training MSE
c) R-squared
d) Adjusted R²
Correct Answer: a) Average (Y - Ŷ)^2 on new data
📝 Explanation:
For out-of-sample performance.

33. Regression with ARIMA errors models

a) Time-varying errors
b) Static
c) Spatial
d) Categorical
Correct Answer: a) Time-varying errors
📝 Explanation:
Auto-regressive integrated moving average.

34. The concordance correlation coefficient measures

a) Agreement beyond correlation
b) Bias
c) Precision
d) Both b and c
Correct Answer: a) Agreement beyond correlation
📝 Explanation:
Lin's ρ_c = ρ · C_b, where the bias-correction factor C_b ≤ 1 penalizes location and scale shifts.

35. Kernel regression is a

a) Non-parametric method
b) Parametric
c) Linear
d) Stepwise
Correct Answer: a) Non-parametric method
📝 Explanation:
Nadaraya-Watson; local weighting.

36. The Ramsey RESET test detects

a) Functional form misspecification
b) Multicollinearity
c) Heteroscedasticity
d) Autocorrelation
Correct Answer: a) Functional form misspecification
📝 Explanation:
Adds powers of fitted values.

37. In survival regression, the baseline hazard is

a) Non-parametric in Cox
b) Parametric
c) Linear
d) Constant
Correct Answer: a) Non-parametric in Cox
📝 Explanation:
h0(t) = h(t|X=0); estimated via the Breslow estimator.

38. The MAPE is sensitive to

a) Zero values
b) Outliers
c) Scale
d) Normality
Correct Answer: a) Zero values
📝 Explanation:
Mean absolute percentage error; undefined if Y=0.
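A minimal sketch that also makes the zero-value problem explicit (function name and toy values are invented for the example):

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error; undefined when any actual y is 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y == 0):
        raise ValueError("MAPE is undefined for zero actual values")
    return 100.0 * np.mean(np.abs((y - np.asarray(y_hat)) / y))
```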

39. Best subset selection evaluates

a) All possible subsets
b) Forward only
c) Backward
d) Stepwise
Correct Answer: a) All possible subsets
📝 Explanation:
2^p models; computationally intensive.

40. The externally studentized residual excludes

a) The point itself in SE calculation
b) All points
c) Leverage
d) Intercept
Correct Answer: a) The point itself in SE calculation
📝 Explanation:
For detecting outliers influencing fit.

41. Ordered logit is for

a) Ordinal outcomes
b) Nominal
c) Binary
d) Continuous
Correct Answer: a) Ordinal outcomes
📝 Explanation:
Cumulative logits; proportional odds assumption.

42. The score test in GLM is

a) Efficient under H0
b) Wald
c) LR
d) All same
Correct Answer: a) Efficient under H0
📝 Explanation:
Gradient of log-likelihood; no full fit needed.

43. In spatial regression, SAR models

a) Spatial autocorrelation in errors
b) Lag dependence
c) Both
d) No spatial
Correct Answer: c) Both
📝 Explanation:
Spatial autoregressive; y = ρWy + Xβ + ε captures lag dependence, and the broader SAR family also allows spatially correlated errors.

44. The Theil's U statistic compares forecasts to

a) Naive method
b) OLS
c) Ridge
d) Lasso
Correct Answer: a) Naive method
📝 Explanation:
U < 1 better than no-change forecast.

45. LOESS regression uses

a) Local polynomials
b) Global linear
c) Splines
d) Kernels only
Correct Answer: a) Local polynomials
📝 Explanation:
Locally estimated scatterplot smoothing.

46. The Link test in regression checks

a) Omitted variables or specification
b) Multicollinearity
c) Heteroscedasticity
d) Normality
Correct Answer: a) Omitted variables or specification
📝 Explanation:
Regress Y on the fitted and squared fitted values; a significant squared term indicates misspecification.

47. Accelerated failure time models assume

a) Log-linear effect on time
b) Proportional hazards
c) Weibull distribution
d) Exponential
Correct Answer: a) Log-linear effect on time
📝 Explanation:
log(T) = βX + σW; parametric survival.

48. The SMAPE averages

a) Symmetric absolute percentage errors
b) Squared errors
c) Relative errors
d) Forecast errors
Correct Answer: a) Symmetric absolute percentage errors
📝 Explanation:
Handles zero issues in MAPE.

49. All subset regression is exhaustive but

a) Computationally expensive
b) Fast
c) Approximate
d) Stepwise
Correct Answer: a) Computationally expensive
📝 Explanation:
For p>20, use branch and bound.

50. The deleted residual is

a) Prediction error leaving out i
b) Internal
c) Studentized
d) Leverage
Correct Answer: a) Prediction error leaving out i
📝 Explanation:
A component of the PRESS statistic.

51. Multinomial logit is used for

a) Nominal outcomes >2 categories
b) Ordinal
c) Binary
d) Continuous
Correct Answer: a) Nominal outcomes >2 categories
📝 Explanation:
Models log-odds relative to a baseline category; relies on the IIA (independence of irrelevant alternatives) assumption.

52. The Wald test is

a) Asymptotic for large samples
b) Exact
c) Small sample
d) Non-parametric
Correct Answer: a) Asymptotic for large samples
📝 Explanation:
(β-hat / SE)^2 ~ χ².
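The statistic above is easy to compute by hand; for one degree of freedom the chi-square tail probability equals the two-sided normal tail, erfc(|z|/√2), so only the standard library is needed (function name and toy values are invented for the example):

```python
import math

def wald_test(beta_hat, se):
    """Wald statistic (beta_hat / se)^2, chi-square(1) under H0: beta = 0.
    For 1 df the p-value equals the two-sided normal tail erfc(|z|/sqrt(2))."""
    z = beta_hat / se
    w = z ** 2
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w, p_value
```

For example, an estimate two standard errors from zero gives W = 4 and a p-value of about 0.046.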

53. In SEM, path analysis is

a) Observed variables regression
b) Latent only
c) CFA
d) EFA
Correct Answer: a) Observed variables regression
📝 Explanation:
A special case of SEM with observed variables only (no latent variables).

54. The MASE normalizes errors by

a) In-sample naive forecast
b) Scale
c) Variance
d) SD
Correct Answer: a) In-sample naive forecast
📝 Explanation:
Mean absolute scaled error; scale-free.

55. Smoothing splines minimize

a) RSS + smoothness penalty
b) Only RSS
c) Knots
d) Degrees
Correct Answer: a) RSS + smoothness penalty
📝 Explanation:
λ tunes fit vs smoothness.

56. The CUSUM test detects

a) Structural breaks
b) Autocorrelation
c) Heteroscedasticity
d) Normality
Correct Answer: a) Structural breaks
📝 Explanation:
Cumulative sum of residuals.

57. Frailty models account for

a) Unobserved heterogeneity
b) Observed covariates
c) Time
d) Censoring
Correct Answer: a) Unobserved heterogeneity
📝 Explanation:
Random effects in survival.
