100 Descriptive, Inferential, and Time Series Statistics in Data Analysis - MCQs

100 challenging multiple-choice questions on descriptive statistics, inferential methods, and time series analysis. Inspired by real data science and analytics interview questions from FAANG, consulting firms, and quant roles.

1. Which measure of central tendency is most affected by extreme outliers?

a) Median
b) Mode
c) Mean
d) Trimmed mean
Correct Answer: c) Mean
📝 Explanation:
The mean incorporates every value and is pulled toward extreme scores.

2. In a positively skewed distribution, the correct order of mean, median, and mode is:

a) Mean > Median > Mode
b) Mode > Median > Mean
c) Median > Mean > Mode
d) Mean > Mode > Median
Correct Answer: a) Mean > Median > Mode
📝 Explanation:
The tail on the right pulls the mean highest, followed by median, then mode.

3. The interquartile range (IQR) is calculated as:

a) Q3 + Q1
b) Q3 – Q1
c) Q2 – Q1
d) Q3 / Q1
Correct Answer: b) Q3 – Q1
📝 Explanation:
IQR measures spread between the 75th and 25th percentiles.

4. Which of the following is NOT a measure of dispersion?

a) Range
b) Variance
c) Standard deviation
d) Median
Correct Answer: d) Median
📝 Explanation:
Median describes location, not spread.

5. The empirical rule applies to data that are approximately:

a) Uniformly distributed
b) Normally distributed
c) Exponentially distributed
d) Binomially distributed
Correct Answer: b) Normally distributed
📝 Explanation:
About 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean.

6. Pearson's correlation coefficient is undefined when:

a) Variance of X or Y is zero
b) Sample size is less than 3
c) Data are ordinal
d) Relationship is nonlinear
Correct Answer: a) Variance of X or Y is zero
📝 Explanation:
Division by zero standard deviation makes r undefined.

7. Boxplot whiskers typically extend to:

a) Minimum and maximum values
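b) The most extreme data points within 1.5 × IQR of Q1 and Q3
c) Mean ± 3σ
d) Q1 – Q3
Correct Answer: b) The most extreme data points within 1.5 × IQR of Q1 and Q3
📝 Explanation:
In a standard (Tukey) boxplot, points below Q1 – 1.5 × IQR or above Q3 + 1.5 × IQR are flagged as potential outliers.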
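A minimal sketch of the 1.5 × IQR rule using only the Python standard library. Quartile conventions differ between tools; this assumes `statistics.quantiles` with its default "exclusive" method, so Q1 and Q3 may differ slightly from other software. The function name `tukey_whiskers` is illustrative.

```python
import statistics

def tukey_whiskers(data, k=1.5):
    """Return (lower_whisker, upper_whisker, outliers) using Tukey's k * IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    # Whiskers stop at the most extreme observations still inside the fences.
    return min(inside), max(inside), outliers

print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # (1, 8, [100])
```

Here Q1 = 2.5 and Q3 = 7.5, so the fences are –5 and 15 and only 100 is flagged.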

8. The coefficient of variation (CV) is useful for comparing:

a) Skewness across datasets
b) Relative variability when units or means differ
c) Absolute dispersion only
d) Correlation strength
Correct Answer: b) Relative variability when units or means differ
📝 Explanation:
CV = (σ / μ) × 100% standardizes dispersion.

9. Z-score of a value x is given by:

a) (x – μ) / σ
b) (x + μ) / σ
c) (μ – x) / σ
d) (x – median) / IQR
Correct Answer: a) (x – μ) / σ
📝 Explanation:
Measures deviations in standard deviation units.

10. Chebyshev’s theorem guarantees that at least what proportion of values lies within k standard deviations of the mean (k > 1)?

a) 1 – 1/k
b) 1 – 1/k²
c) 1 – 1/(2k)
d) 1 – 2/k²
Correct Answer: b) 1 – 1/k²
📝 Explanation:
Applies to any distribution with finite variance.
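To see how loose the distribution-free bound is, the sketch below compares 1 – 1/k² with the exact share inside ±k standard deviations for a normal distribution (the empirical rule from question 5). It uses only `statistics.NormalDist`; `chebyshev_bound` is an illustrative name.

```python
from statistics import NormalDist

def chebyshev_bound(k):
    """Minimum fraction within k SDs of the mean for ANY distribution with finite variance (k > 1)."""
    return 1 - 1 / k**2

std_normal = NormalDist()
for k in (2, 3):
    normal_share = std_normal.cdf(k) - std_normal.cdf(-k)  # exact share for a normal
    print(k, round(chebyshev_bound(k), 3), round(normal_share, 3))
# 2 0.75 0.954
# 3 0.889 0.997
```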

11. The sampling distribution of the sample mean becomes approximately normal when n ≥ 30 due to:

a) Law of large numbers
b) Central Limit Theorem
c) Bayes’ theorem
d) Chebyshev’s inequality
Correct Answer: b) Central Limit Theorem
📝 Explanation:
The CLT makes the sample mean approximately normal for large n regardless of population shape, provided the population variance is finite; n ≥ 30 is only a rule of thumb.

12. A Type I error occurs when we:

a) Fail to reject a false null
b) Reject a true null
c) Reject a false null
d) Fail to reject a true null
Correct Answer: b) Reject a true null
📝 Explanation:
False positive; probability equals α.

13. The p-value is the probability of obtaining a test statistic at least as extreme as observed, assuming:

a) H₁ is true
b) H₀ is true
c) Sample size is large
d) Data are normal
Correct Answer: b) H₀ is true
📝 Explanation:
Small p-value casts doubt on the null.

14. For a two-tailed z-test at α = 0.05, the critical values are:

a) ±1.645
b) ±1.96
c) ±2.33
d) ±2.58
Correct Answer: b) ±1.96
📝 Explanation:
α = 0.05 is split equally between the two tails (2.5% each), and the 97.5th percentile of the standard normal is 1.96.

15. The standard error of the mean is σ / √n; when σ is unknown we use:

a) Population variance
b) Sample standard deviation s
c) Median absolute deviation
d) Range / 4
Correct Answer: b) Sample standard deviation s
📝 Explanation:
Leads to t-distribution with n–1 df.

16. Confidence interval width is proportional to:

a) 1 / √n
b) 1 / n
c) √n
d) n
Correct Answer: a) 1 / √n
📝 Explanation:
Larger samples yield narrower intervals.

17. Power of a test = 1 – β, where β is:

a) Type I error rate
b) Type II error rate
c) Significance level
d) Confidence level
Correct Answer: b) Type II error rate
📝 Explanation:
Probability of failing to detect a true effect.

18. For proportions, the standard error is √[p(1–p)/n]; for confidence intervals we often use:

a) Sample proportion p̂
b) Population proportion p
c) 0.5 for maximum variability
d) 1/n
Correct Answer: a) Sample proportion p̂
📝 Explanation:
Wald interval: p̂ ± z√[p̂(1–p̂)/n].
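A short sketch of the Wald interval, assuming a normal critical value from `statistics.NormalDist`. The function name `wald_interval` is illustrative; note that the Wald interval is known to behave poorly for small n or p̂ near 0 or 1, where alternatives such as the Wilson interval are usually preferred.

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Wald CI for a proportion: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ≈ 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

print(wald_interval(40, 100))  # roughly (0.304, 0.496)
```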

19. The chi-square test for independence tests whether:

a) Row and column variables are associated
b) Means of two groups differ
c) Variance equals a constant
d) Data follow a normal curve
Correct Answer: a) Row and column variables are associated
📝 Explanation:
Compares observed vs expected frequencies.
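A minimal sketch of the computation: each expected count is (row total × column total) / grand total, and the statistic sums (observed – expected)² / expected. Turning it into a p-value needs a chi-square distribution with (r – 1)(c – 1) degrees of freedom (for example `scipy.stats.chi2_contingency`, which applies Yates' continuity correction to 2 × 2 tables by default). The function name is illustrative.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected

stat, dof, expected = chi_square_independence([[10, 20], [30, 40]])
print(expected)             # [[12.0, 18.0], [28.0, 42.0]]
print(round(stat, 4), dof)  # 0.7937 1
```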

20. ANOVA tests equality of:

a) Two population means
b) Three or more population means
c) Variances across groups
d) Proportions
Correct Answer: b) Three or more population means
📝 Explanation:
F-statistic = MSB / MSW.
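The F-statistic can be sketched directly from its definition, with MSB = SSB / (k – 1) and MSW = SSW / (n – k) for k groups and n observations in total. A p-value would additionally need the F distribution with (k – 1, n – k) degrees of freedom; `one_way_anova_f` is an illustrative name.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F = MSB / MSW for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)  # between-group sum of squares
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)          # within-group sum of squares
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return msb / msw

print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
```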

21. A time series is stationary if:

a) Mean, variance, and autocovariance are time-invariant
b) Trend is linear
c) Seasonality is present
d) Data are i.i.d.
Correct Answer: a) Mean, variance, and autocovariance are time-invariant
📝 Explanation:
This defines weak (covariance) stationarity; strict stationarity additionally requires the joint distribution to be unchanged by time shifts.

22. The ACF at lag k measures:

a) Correlation between y_t and y_{t–k}
b) Variance of y_t
c) Trend strength
d) Seasonal period
Correct Answer: a) Correlation between y_t and y_{t–k}
📝 Explanation:
The ACF of an MA(q) process cuts off after lag q, so it helps identify the MA order.

23. In an AR(1) model y_t = φ y_{t–1} + ε_t, |φ| < 1 ensures:

a) Stationarity
b) Explosive behavior
c) Unit root
d) Seasonality
Correct Answer: a) Stationarity
📝 Explanation:
The AR polynomial 1 – φz has its root z = 1/φ outside the unit circle exactly when |φ| < 1.

24. Differencing a series once removes:

a) Linear trend
b) Constant mean
c) Seasonality of period 1
d) White noise
Correct Answer: a) Linear trend
📝 Explanation:
Δy_t = y_t – y_{t–1} turns a linear trend a + bt into the constant b; each further difference lowers the degree of a polynomial trend by one.
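A small illustration with an illustrative helper `difference`: one difference turns a linear trend into a constant, while a quadratic trend needs two. The same helper with `lag=12` gives the seasonal difference y_t – y_{t–12} from question 48.

```python
def difference(series, lag=1):
    """Return y_t - y_{t-lag}; lag=1 is ordinary differencing, lag=12 a seasonal difference."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

trend = [3 + 2 * t for t in range(8)]      # y_t = 3 + 2t (linear trend)
print(difference(trend))                    # [2, 2, 2, 2, 2, 2, 2]
quadratic = [t ** 2 for t in range(6)]     # y_t = t^2
print(difference(difference(quadratic)))   # [2, 2, 2, 2]
```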

25. The PACF of an MA(1) process cuts off after:

a) Lag 1
b) Lag 2
c) Never cuts off
d) Lag 0
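Correct Answer: c) Never cuts off
📝 Explanation:
The PACF of an MA(1) process decays gradually (geometrically) instead of cutting off; it is the ACF of an MA(1) that cuts off after lag 1.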
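One way to check which function cuts off is to compute both from theory. The sketch below assumes the MA(1) form y_t = ε_t + θε_{t–1}, whose autocorrelations are ρ₁ = θ/(1 + θ²) and zero beyond lag 1, and converts them to partial autocorrelations with the Durbin-Levinson recursion. Function names are illustrative.

```python
def ma1_acf(theta, max_lag):
    """Theoretical ACF at lags 1..max_lag of the MA(1) process y_t = e_t + theta * e_{t-1}."""
    rho1 = theta / (1 + theta ** 2)
    return [rho1] + [0.0] * (max_lag - 1)

def pacf_from_acf(rho):
    """Durbin-Levinson recursion; rho[k - 1] holds the autocorrelation at lag k."""
    pacf, phi = [], []
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            phi = [phi_kk]
        else:
            num = rho[k - 1] - sum(phi[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1 - sum(phi[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            # Update the AR(k) coefficients from the AR(k - 1) ones.
            phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

acf = ma1_acf(0.5, 5)
print([round(r, 3) for r in acf])                 # [0.4, 0.0, 0.0, 0.0, 0.0]
print([round(p, 3) for p in pacf_from_acf(acf)])  # [0.4, -0.19, 0.094, -0.047, 0.023]
```

Feeding the same recursion the AR(1) autocorrelations 0.6^k gives a PACF of 0.6 at lag 1 and zero afterward, which is the mirror-image pattern.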

26. ADF test null hypothesis is:

a) Series has a unit root (non-stationary)
b) Series is stationary
c) Series has trend
d) Series is white noise
Correct Answer: a) Series has a unit root (non-stationary)
📝 Explanation:
Rejecting H0 is evidence against a unit root, i.e., in favor of stationarity.

27. In SARIMA(p,d,q)(P,D,Q)s, 's' denotes:

a) Seasonal period
b) Differencing order
c) AR order
d) MA order
Correct Answer: a) Seasonal period
📝 Explanation:
Common values: 12 (monthly), 4 (quarterly).

28. AIC penalizes models for:

a) Number of parameters
b) Residual variance only
c) Outliers
d) Forecast horizon
Correct Answer: a) Number of parameters
📝 Explanation:
Lower AIC indicates better balance of fit and complexity.

29. White noise has ACF values approximately:

a) Zero for all lags > 0
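b) One at every lag
c) Decaying exponentially
d) Significant at seasonal lags
Correct Answer: a) Zero for all lags > 0
📝 Explanation:
White noise is serially uncorrelated, so its autocorrelations at every nonzero lag are zero in theory and close to zero in samples (the lag-0 value is always 1).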

30. Holt-Winters additive model is suitable when seasonal fluctuations:

a) Are roughly constant in size
b) Increase with the level
c) Are multiplicative
d) Are absent
Correct Answer: a) Are roughly constant in size
📝 Explanation:
Multiplicative version for proportional seasonality.

31. The population variance of the dataset {2, 4, 6, 8, 10} is:

a) 6
b) 8
c) 10
d) 12
Correct Answer: b) 8
📝 Explanation:
Mean = 6; σ² = [(−4)² + (−2)² + 0² + 2² + 4²]/5 = 40/5 = 8. The sample variance, which divides by n – 1 = 4, would be 10.
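The answer depends on whether the divisor is n or n – 1. Python's standard library exposes both: `statistics.pvariance` (population) and `statistics.variance` (sample).

```python
import statistics

data = [2, 4, 6, 8, 10]
print(statistics.pvariance(data))  # 8   (divide by n = 5)
print(statistics.variance(data))   # 10  (divide by n - 1 = 4)
```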

32. For the data {1, 3, 3, 6}, the mode is:

a) 1
b) 3
c) 6
d) Bimodal
Correct Answer: b) 3
📝 Explanation:
3 appears twice; others once.

33. Skewness coefficient > 0 indicates:

a) Symmetric distribution
b) Left-skewed
c) Right-skewed
d) Platykurtic
Correct Answer: c) Right-skewed
📝 Explanation:
Longer tail on the positive side.

34. Percentile rank of the median is:

a) 25th
b) 50th
c) 75th
d) 100th
Correct Answer: b) 50th
📝 Explanation:
Half the data lie below the median.

35. Geometric mean is preferred for:

a) Averaging rates of change
b) Additive data
c) Nominal variables
d) Counts
Correct Answer: a) Averaging rates of change
📝 Explanation:
Handles compounding/multiplicative processes.
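A sketch of averaging growth rates: convert each rate to a growth factor 1 + r, take the geometric mean with `statistics.geometric_mean` (Python 3.8+), and subtract 1. The rates below are made-up illustration values.

```python
import statistics

growth_rates = [0.10, -0.10, 0.25]          # +10%, -10%, +25% in successive periods
factors = [1 + r for r in growth_rates]     # convert rates to growth factors
avg_factor = statistics.geometric_mean(factors)
print(round(avg_factor - 1, 4))             # ≈ 0.0736 per period
print(round(statistics.mean(growth_rates), 4))  # arithmetic mean ≈ 0.0833 overstates growth
```

Compounding the geometric average three times reproduces the actual total growth factor 1.1 × 0.9 × 1.25 = 1.2375, which the arithmetic mean does not.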

36. Covariance of a variable with itself equals its:

a) Variance
b) Standard deviation
c) Mean
d) Range
Correct Answer: a) Variance
📝 Explanation:
Cov(X,X) = Var(X).

37. Spearman’s rank correlation is based on:

a) Original values
b) Ranks of the data
c) Deviations from mean
d) Log-transformed values
Correct Answer: b) Ranks of the data
📝 Explanation:
Non-parametric measure of monotonic relationship.

38. The 95% CI for μ when n=25, x̄=50, s=10 (t-critical ≈ 2.064) is:

a) 50 ± 4.13
b) 50 ± 3.92
c) 50 ± 2.06
d) 50 ± 1.96
Correct Answer: a) 50 ± 4.13
📝 Explanation:
Margin = 2.064 × (10/√25) = 2.064 × 2 ≈ 4.13.

39. A test statistic z = 2.5, α = 0.01 two-tailed; decision is:

a) Reject H0
b) Fail to reject H0
c) p = 0.0124
d) Need df
Correct Answer: b) Fail to reject H0
📝 Explanation:
The two-tailed critical value is |z| = 2.576 > 2.5; equivalently, p ≈ 0.0124 > 0.01.
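Both the critical value and the two-tailed p-value can be computed with `statistics.NormalDist`; with α = 0.05 the same `inv_cdf` call returns the ±1.96 from question 14.

```python
from statistics import NormalDist

z, alpha = 2.5, 0.01
std_normal = NormalDist()
critical = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value ≈ 2.576
p_value = 2 * (1 - std_normal.cdf(abs(z)))    # two-tailed p-value ≈ 0.0124
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(critical, 3), round(p_value, 4), decision)  # 2.576 0.0124 fail to reject H0
```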

40. Minimum sample size for proportion CI with margin of error E=0.03, p*=0.5, z=1.96:

a) 1068
b) 752
c) 385
d) 267
Correct Answer: a) 1068
📝 Explanation:
n = (1.96² × 0.5 × 0.5) / 0.03² ≈ 1067.11 → 1068.
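The formula n = z²p(1 – p)/E², rounded up, as a small illustrative function; p = 0.5 is the conservative choice because it maximizes p(1 – p).

```python
import math

def sample_size_proportion(margin, p=0.5, z=1.96):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin; p = 0.5 is the worst case."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size_proportion(0.03))  # 1068
print(sample_size_proportion(0.05))  # 385
```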

41. For paired t-test, degrees of freedom =

a) n₁ + n₂ – 2
b) n – 1
c) 2n – 1
d) n
Correct Answer: b) n – 1
📝 Explanation:
Based on differences (n pairs).

42. F-test is used to compare:

a) Two variances
b) Two means
c) Proportions
d) Correlations
Correct Answer: a) Two variances
📝 Explanation:
H0: σ₁² = σ₂².

43. Mann-Whitney U tests difference in:

a) Medians of two independent samples
b) Means of paired data
c) Variances
d) Proportions
Correct Answer: a) Medians of two independent samples
📝 Explanation:
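Non-parametric alternative to the two-sample t-test. Strictly, it tests whether values from one group tend to be larger than values from the other; it compares medians only if both distributions have the same shape.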

44. Kruskal-Wallis is the non-parametric version of:

a) One-way ANOVA
b) Paired t-test
c) Chi-square goodness-of-fit
d) Linear regression
Correct Answer: a) One-way ANOVA
📝 Explanation:
Compares three or more independent groups.

45. The Durbin-Watson statistic near 2 indicates:

a) No autocorrelation
b) Positive autocorrelation
c) Negative autocorrelation
d) Heteroscedasticity
Correct Answer: a) No autocorrelation
📝 Explanation:
DW ranges from 0 to 4; a value near 2 indicates no first-order autocorrelation.
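A minimal sketch of the statistic DW = Σ(e_t – e_{t–1})² / Σe_t², applied to two toy residual series (illustrative values, not from a fitted model). It also shows the pattern in question 81: values near 0 signal positive autocorrelation, values near 4 negative.

```python
def durbin_watson(residuals):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2; ranges from 0 to 4."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1, -1, 1, -1, 1, -1]))    # 3.33... (negative autocorrelation)
print(durbin_watson([1, 1.1, 1.2, 1.1, 1.0]))  # near 0 (positive autocorrelation)
```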

46. In exponential smoothing, α close to 1 gives more weight to:

a) Recent observations
b) Older observations
c) Seasonal component
d) Trend
Correct Answer: a) Recent observations
📝 Explanation:
Higher α reacts faster to changes.
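A sketch of simple exponential smoothing, l_t = α y_t + (1 – α) l_{t–1}, on a series with a step change. It assumes the level is initialized to the first observation, which is only one of several common conventions; the function name is illustrative.

```python
def simple_exponential_smoothing(series, alpha, initial=None):
    """Level update l_t = alpha * y_t + (1 - alpha) * l_{t-1}; returns the smoothed levels."""
    level = series[0] if initial is None else initial  # assumed initialization
    levels = []
    for y in series:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

series = [10, 10, 10, 20, 20, 20]  # a step change at t = 3
print([round(v, 2) for v in simple_exponential_smoothing(series, 0.9)])  # jumps to ≈19 right after the step
print([round(v, 2) for v in simple_exponential_smoothing(series, 0.1)])  # moves to only ≈11
```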

47. For an AR(2) process, the PACF is typically significant at:

a) Lags 1 and 2
b) Lag 1 only
c) All lags
d) Lag 2 only
Correct Answer: a) Lags 1 and 2
📝 Explanation:
The PACF of an AR(p) process cuts off after lag p, here lag 2.

48. Seasonal differencing of period 12 is denoted as:

a) ∇₁₂ y_t
b) ∇ y_t
c) log y_t
d) y_t – y_{t–1}
Correct Answer: a) ∇₁₂ y_t
📝 Explanation:
∇₁₂ y_t = y_t – y_{t–12}.

49. Ljung-Box test checks for:

a) Lack of autocorrelation in residuals
b) Normality
c) Stationarity
d) Homoscedasticity
Correct Answer: a) Lack of autocorrelation in residuals
📝 Explanation:
High p-value supports white noise.

50. KPSS test null hypothesis is:

a) Stationarity
b) Unit root
c) Trend
d) Seasonality
Correct Answer: a) Stationarity
📝 Explanation:
Complements ADF: failing to reject H0 is consistent with stationarity, whereas rejecting it suggests a unit root.

51. In ARIMA, 'I' stands for:

a) Integrated
b) Invertible
c) Independent
d) Intercept
Correct Answer: a) Integrated
📝 Explanation:
Order of differencing to achieve stationarity.

52. Forecasts from a random walk model are:

a) Last observed value
b) Mean of series
c) Zero
d) Trend line
Correct Answer: a) Last observed value
📝 Explanation:
y_{t+1} = y_t + ε_{t+1} ⇒ best forecast = y_t.

53. Granger causality tests whether past values of X improve prediction of:

a) Y beyond its own past
b) X itself
c) Error term
d) Trend
Correct Answer: a) Y beyond its own past
📝 Explanation:
Rejects non-causality if X lags are significant in Y equation.

54. Variance inflation factor (VIF) > 10 suggests:

a) Severe multicollinearity
b) Heteroscedasticity
c) Autocorrelation
d) Non-normality
Correct Answer: a) Severe multicollinearity
📝 Explanation:
VIF_j = 1 / (1 – R_j²).

55. The coefficient of determination R² represents:

a) Proportion of variance explained
b) Correlation coefficient
c) Slope of regression
d) p-value
Correct Answer: a) Proportion of variance explained
📝 Explanation:
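R² = SSR / SST = 1 – SSE / SST, where SSR is the regression (explained) sum of squares and SSE the residual sum of squares.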

56. In simple linear regression, the least squares slope b1 =

a) Cov(X,Y) / Var(X)
b) Cov(X,Y) / Var(Y)
c) Mean(Y) / Mean(X)
d) SD(Y) / SD(X)
Correct Answer: a) Cov(X,Y) / Var(X)
📝 Explanation:
Minimizes sum of squared residuals.
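A small sketch of the least-squares formulas b1 = Cov(X, Y) / Var(X) and b0 = ȳ – b1x̄, on toy data lying exactly on y = 2x + 1. The function name is illustrative.

```python
from statistics import mean

def least_squares_line(x, y):
    """Return (b0, b1) with b1 = Cov(X, Y) / Var(X) and b0 = mean(y) - b1 * mean(x)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx  # the 1/(n - 1) factors in Cov and Var cancel
    b0 = y_bar - b1 * x_bar
    return b0, b1

print(least_squares_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # (1.0, 2.0)
```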

57. Residual standard error estimates:

a) σ, the error standard deviation
b) β0
c) R²
d) F-statistic
Correct Answer: a) σ, the error standard deviation
📝 Explanation:
√(SSE / (n – 2)) in simple linear regression; with p predictors plus an intercept, √(SSE / (n – p – 1)).

58. For logistic regression, the link function is:

a) Logit
b) Probit
c) Log
d) Identity
Correct Answer: a) Logit
📝 Explanation:
log(p/(1–p)) = β0 + β1x.

59. The odds ratio exp(β) = 1.5 means:

a) 50% increase in odds per unit increase in x
b) 1.5 times the probability
c) Log odds increase by 1.5
d) Probability increases by 0.5
Correct Answer: a) 50% increase in odds per unit increase in x
📝 Explanation:
Odds multiply by exp(β).

60. Poisson regression is suitable for:

a) Count data
b) Binary outcomes
c) Continuous positive data
d) Time to event
Correct Answer: a) Count data
📝 Explanation:
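Assumes the conditional mean equals the conditional variance (equidispersion); overdispersed counts are often modeled with negative binomial regression instead.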

61. In survival analysis, Kaplan-Meier estimates:

a) Survival function non-parametrically
b) Hazard ratio
c) Median survival time only
d) Proportional hazards
Correct Answer: a) Survival function non-parametrically
📝 Explanation:
Product-limit estimator handles censoring.
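A minimal sketch of the product-limit estimator S(t) = Π(1 – d_i / n_i) over event times t_i ≤ t, where d_i is the number of events at t_i and n_i the number still at risk. It assumes the usual convention that a subject censored at the same time as an event is still counted at risk. The data and function name are illustrative.

```python
def kaplan_meier(times, events):
    """Return (event_time, survival) steps; events[i] is 1 for an event, 0 for censoring."""
    survival, steps = 1.0, []
    for t in sorted(set(t for t, e in zip(times, events) if e == 1)):
        at_risk = sum(1 for ti in times if ti >= t)                            # n_i
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)  # d_i
        survival *= 1 - deaths / at_risk
        steps.append((t, survival))
    return steps

times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]  # 0 = censored
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
# 2 0.833
# 3 0.667
# 5 0.444
# 8 0.0
```

Censored subjects never cause a drop in S(t), but they shrink the risk set for later event times.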

62. Cox proportional hazards assumption is violated if:

a) Hazard ratios change over time
b) Log(-log(S(t))) curves are parallel
c) Schoenfeld residuals show no trend
d) p-value < 0.05
Correct Answer: a) Hazard ratios change over time
📝 Explanation:
Check with Schoenfeld residuals, log(–log S(t)) plots, or time-dependent covariates.

63. Bayesian inference updates beliefs using:

a) Prior × Likelihood → Posterior
b) Likelihood only
c) Posterior × Data
d) Prior / Likelihood
Correct Answer: a) Prior × Likelihood → Posterior
📝 Explanation:
Bayes’ theorem: P(θ|data) ∝ P(data|θ) P(θ).

64. Conjugate prior for normal mean (known variance) is:

a) Normal
b) Gamma
c) Beta
d) Inverse-gamma
Correct Answer: a) Normal
📝 Explanation:
Posterior remains normal.
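A sketch of the normal-normal update, assuming data y_i ~ N(μ, σ²) with known σ² and prior μ ~ N(μ₀, τ₀²): the posterior precision is 1/τ₀² + n/σ², and the posterior mean is the precision-weighted average of μ₀ and x̄. Names and numbers are illustrative.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior (mean, variance) of mu for y_i ~ N(mu, noise_var), prior mu ~ N(prior_mean, prior_var)."""
    n = len(data)
    xbar = sum(data) / n
    post_precision = 1 / prior_var + n / noise_var  # precisions add
    post_var = 1 / post_precision
    post_mean = post_var * (prior_mean / prior_var + n * xbar / noise_var)  # precision-weighted average
    return post_mean, post_var

print(normal_posterior(prior_mean=0.0, prior_var=1.0, data=[2.0, 2.0, 2.0, 2.0], noise_var=4.0))
# (1.0, 0.5)
```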

65. MCMC methods are used to:

a) Sample from complex posterior distributions
b) Compute exact integrals
c) Perform hypothesis tests
d) Calculate p-values
Correct Answer: a) Sample from complex posterior distributions
📝 Explanation:
Markov Chain Monte Carlo approximates posteriors.

66. Bootstrapping estimates sampling distribution by:

a) Resampling with replacement from the data
b) Using parametric assumptions
c) Increasing sample size
d) Theoretical formulas
Correct Answer: a) Resampling with replacement from the data
📝 Explanation:
Non-parametric; empirical distribution.

67. Percentile bootstrap CI uses:

a) 2.5th and 97.5th percentiles of bootstrap statistics
b) Mean ± 1.96 SE
c) t-distribution
d) Chi-square
Correct Answer: a) 2.5th and 97.5th percentiles of bootstrap statistics
📝 Explanation:
For a 95% CI from B = 1000 replicates, these are roughly the 25th and 975th ordered bootstrap values.
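A minimal sketch of the percentile bootstrap with the standard library: resample with replacement (`random.Random.choices`), recompute the statistic B times, and read off the empirical quantiles. The fixed seed, the toy sample, and the function name are assumptions for reproducibility and illustration.

```python
import random
import statistics

def percentile_bootstrap_ci(data, stat=statistics.mean, B=1000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample with replacement B times, take empirical quantiles."""
    rng = random.Random(seed)
    boot = sorted(stat(rng.choices(data, k=len(data))) for _ in range(B))
    lo_idx = round(B * (1 - level) / 2)      # 25 for B = 1000
    hi_idx = round(B * (1 + level) / 2) - 1  # 974 for B = 1000 (0-based)
    return boot[lo_idx], boot[hi_idx]

sample = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]
print(percentile_bootstrap_ci(sample))
```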

68. Cross-validation is primarily used to assess:

a) Model generalization error
b) p-values
c) Confidence intervals
d) Residual normality
Correct Answer: a) Model generalization error
📝 Explanation:
K-fold CV estimates out-of-sample performance.

69. The bias-variance tradeoff implies that overly complex models tend to:

a) Overfit (high variance, low bias)
b) Underfit (high bias, low variance)
c) Balance perfectly
d) Have zero error
Correct Answer: a) Overfit (high variance, low bias)
📝 Explanation:
Capture noise, not just signal.

70. Principal Component Analysis (PCA) maximizes:

a) Variance along projected directions
b) Correlation between variables
c) Mean squared error
d) Entropy
Correct Answer: a) Variance along projected directions
📝 Explanation:
Eigenvectors of covariance matrix.

71. The scree plot helps determine number of PCs by looking for:

a) Elbow in eigenvalue decline
b) Linear trend
c) Constant variance
d) Zero eigenvalues
Correct Answer: a) Elbow in eigenvalue decline
📝 Explanation:
Retain components before sharp drop.

72. In k-means clustering, the objective is to minimize:

a) Within-cluster sum of squares
b) Between-cluster sum of squares
c) Total sum of squares
d) Silhouette score
Correct Answer: a) Within-cluster sum of squares
📝 Explanation:
Iterative assignment and centroid update.

73. Silhouette coefficient ranges from:

a) –1 to +1
b) 0 to 1
c) –∞ to +∞
d) 0 to ∞
Correct Answer: a) –1 to +1
📝 Explanation:
Higher values indicate better cluster separation.

74. A contingency table with all expected frequencies ≥ 5 is required for:

a) Chi-square test validity
b) t-test
c) ANOVA
d) Regression
Correct Answer: a) Chi-square test validity
📝 Explanation:
It is a rule of thumb ensuring the chi-square distribution approximates the test statistic's sampling distribution well.

75. McNemar’s test is used for:

a) Paired binary data
b) Independent proportions
c) Continuous paired data
d) Multiple groups
Correct Answer: a) Paired binary data
📝 Explanation:
Tests marginal homogeneity.

76. The central limit theorem requires random samples that are:

a) Independent and identically distributed
b) Normally distributed
c) Paired
d) Stratified
Correct Answer: a) Independent and identically distributed
📝 Explanation:
For large n, the sample mean is approximately normal, provided the population variance is finite.

77. Degrees of freedom for two-sample t-test (unequal variances) is approximately:

a) Welch-Satterthwaite formula
b) n1 + n2 – 1
c) n1 + n2 – 2
d) min(n1, n2) – 1
Correct Answer: a) Welch-Satterthwaite formula
📝 Explanation:
Welch's approximation drops the equal-variance assumption; its df lies between min(n1, n2) – 1 and n1 + n2 – 2.
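The Welch-Satterthwaite formula, df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ – 1) + (s₂²/n₂)²/(n₂ – 1)], as a short illustrative function. With equal variances and equal group sizes it reduces to n₁ + n₂ – 2.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom for a two-sample t-test with unequal variances."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of each group mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(round(welch_df(s1=2.0, n1=10, s2=6.0, n2=15), 1))  # 18.3
print(round(welch_df(s1=1.0, n1=10, s2=1.0, n2=10), 1))  # 18.0 = n1 + n2 - 2
```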

78. Effect size Cohen’s d = (μ1 – μ2) / σ; d = 0.8 is considered:

a) Large
b) Medium
c) Small
d) Negligible
Correct Answer: a) Large
📝 Explanation:
Benchmarks: 0.2 small, 0.5 medium, 0.8 large.

79. Multiple R² in regression can be artificially inflated by:

a) Adding irrelevant predictors
b) Removing outliers
c) Transforming variables
d) Increasing sample size
Correct Answer: a) Adding irrelevant predictors
📝 Explanation:
Use adjusted R² to penalize extra terms.

80. Homoscedasticity means residuals have:

a) Constant variance across predicted values
b) Zero mean
c) Normal distribution
d) No autocorrelation
Correct Answer: a) Constant variance across predicted values
📝 Explanation:
Breusch-Pagan test checks this.

81. Durbin-Watson values < 1 typically indicate:

a) Strong positive autocorrelation
b) No autocorrelation
c) Negative autocorrelation
d) Heteroscedasticity
Correct Answer: a) Strong positive autocorrelation
📝 Explanation:
Values near 0 = positive; near 4 = negative.

82. In time series decomposition, the remainder after trend and seasonal removal is:

a) Irregular (random) component
b) Cyclical component
c) Trend
d) Seasonal index
Correct Answer: a) Irregular (random) component
📝 Explanation:
Should resemble white noise if model is good.

83. STL decomposition stands for:

a) Seasonal-Trend decomposition using LOESS
b) Simple Time Series Linear
c) Standard Trend Line
d) Smooth Time Lag
Correct Answer: a) Seasonal-Trend decomposition using LOESS
📝 Explanation:
Robust, flexible for varying seasonal patterns.

84. ACF of a seasonal series with period 12 shows spikes at:

a) Multiples of 12
b) Lag 1 only
c) All lags
d) Lag 0
Correct Answer: a) Multiples of 12
📝 Explanation:
Indicates need for seasonal AR/MA terms.

85. Box-Cox transformation is applied to stabilize:

a) Variance (heteroscedasticity)
b) Mean
c) Trend
d) Seasonality
Correct Answer: a) Variance (heteroscedasticity)
📝 Explanation:
y(λ) = (y^λ – 1)/λ for λ ≠ 0 and log(y) for λ = 0; requires y > 0.

86. Over-differencing a stationary series introduces:

a) MA(1) structure with negative coefficient
b) Unit root
c) Trend
d) Seasonality
Correct Answer: a) MA(1) structure with negative coefficient
📝 Explanation:
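Differencing white noise gives Δy_t = ε_t – ε_{t–1}, an MA(1) with θ = –1, so the ACF shows a spike of –0.5 at lag 1 and is near zero afterward.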
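A quick simulation, assuming Gaussian white noise from `random.Random.gauss` with a fixed seed: after differencing an already-stationary series, the sample ACF at lag 1 sits near –0.5 while later lags stay near zero. `sample_acf` is an illustrative helper.

```python
import random

def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n, m = len(x), sum(x) / len(x)
    c0 = sum((v - m) ** 2 for v in x) / n
    ck = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n)) / n
    return ck / c0

rng = random.Random(42)
white_noise = [rng.gauss(0, 1) for _ in range(20000)]  # already stationary
over_diff = [white_noise[t] - white_noise[t - 1] for t in range(1, len(white_noise))]
print([round(sample_acf(over_diff, k), 2) for k in (1, 2, 3)])  # lag 1 near -0.5, others near 0
```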

87. The optimal ARIMA model often has residuals with:

a) No significant ACF/PACF spikes (white noise)
b) Significant lag 1
c) Linear trend
d) Seasonal pattern
Correct Answer: a) No significant ACF/PACF spikes (white noise)
📝 Explanation:
Ljung-Box p > 0.05 supports adequacy.

88. In VAR models, each variable is modeled as a function of:

a) Its own lags and lags of all other variables
b) Only its own lags
c) Exogenous variables
d) Trend only
Correct Answer: a) Its own lags and lags of all other variables
📝 Explanation:
Vector autoregression for multivariate time series.

89. Impulse response function traces effect of a shock in one variable on:

a) Future values of all variables
b) Past values
c) Error term
d) Constant term
Correct Answer: a) Future values of all variables
📝 Explanation:
Shows dynamic responses in VAR.

90. Cointegration means two non-stationary series have:

a) A stationary linear combination
b) Identical trends
c) Zero correlation
d) Same variance
Correct Answer: a) A stationary linear combination
📝 Explanation:
Long-run equilibrium relationship.

91. Johansen test is used to detect:

a) Number of cointegrating relationships
b) Unit roots
c) Granger causality
d) ARCH effects
Correct Answer: a) Number of cointegrating relationships
📝 Explanation:
Trace and max-eigenvalue statistics.

92. ARCH models capture:

a) Time-varying volatility (heteroscedasticity)
b) Mean reversion
c) Trend
d) Seasonality
Correct Answer: a) Time-varying volatility (heteroscedasticity)
📝 Explanation:
Variance depends on past squared errors.

93. GARCH(1,1) models volatility as:

a) σ²_t = α₀ + α₁ ε²_{t–1} + β₁ σ²_{t–1}
b) σ_t = α₀ + α₁ |ε_{t–1}|
c) σ_t = constant
d) σ²_t = ε²_{t–1}
Correct Answer: a) σ²_t = α₀ + α₁ ε²_{t–1} + β₁ σ²_{t–1}
📝 Explanation:
Combines ARCH and persistence.
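A simulation sketch of the GARCH(1,1) recursion above, assuming standard normal innovations z_t so that ε_t = σ_t z_t. It requires α₁ + β₁ < 1 (covariance stationarity), in which case the unconditional variance is α₀/(1 – α₁ – β₁); the recursion starts from that value with ε₀ = 0. Parameter values and the function name are illustrative.

```python
import random

def simulate_garch11(alpha0, alpha1, beta1, n, seed=0):
    """Simulate eps_t = sigma_t * z_t with sigma2_t = alpha0 + alpha1 * eps_{t-1}^2 + beta1 * sigma2_{t-1}."""
    assert alpha0 > 0 and alpha1 >= 0 and beta1 >= 0 and alpha1 + beta1 < 1  # covariance stationarity
    rng = random.Random(seed)
    sigma2 = alpha0 / (1 - alpha1 - beta1)  # initial sigma2: the unconditional variance
    eps_prev = 0.0                          # initial shock eps_0 = 0
    eps, sig2 = [], []
    for _ in range(n):
        sigma2 = alpha0 + alpha1 * eps_prev ** 2 + beta1 * sigma2
        eps_prev = sigma2 ** 0.5 * rng.gauss(0, 1)
        eps.append(eps_prev)
        sig2.append(sigma2)
    return eps, sig2

eps, sig2 = simulate_garch11(alpha0=0.1, alpha1=0.1, beta1=0.8, n=50000, seed=1)
print(sum(e * e for e in eps) / len(eps))  # close to the unconditional variance 0.1 / (1 - 0.9) = 1.0
```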

94. The Jarque-Bera test assesses:

a) Normality of residuals (skewness + kurtosis)
b) Autocorrelation
c) Stationarity
d) Homoscedasticity
Correct Answer: a) Normality of residuals (skewness + kurtosis)
📝 Explanation:
Under the null of normality, the JB statistic is asymptotically χ²(2).
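A minimal sketch of JB = (n/6)[S² + (K – 3)²/4] using moment-based skewness S and kurtosis K. The result would be compared with χ²(2), whose 5% critical value is about 5.99; `jarque_bera` is an illustrative name, and the test is unreliable for very small samples like the toy one here.

```python
def jarque_bera(x):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), using moment-based skewness S and kurtosis K."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

print(round(jarque_bera([1, 2, 3, 4, 5]), 4))  # 0.3521
```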

95. Shapiro-Wilk test null hypothesis is:

a) Data come from a normal distribution
b) Data are skewed
c) Variance is constant
d) Mean equals zero
Correct Answer: a) Data come from a normal distribution
📝 Explanation:
Powerful for small samples.

96. Levene’s test checks for:

a) Equality of variances across groups
b) Equality of means
c) Normality
d) Independence
Correct Answer: a) Equality of variances across groups
📝 Explanation:
Robust to non-normality.

97. The Bonferroni correction adjusts α by:

a) Dividing by number of tests
b) Multiplying by number of tests
c) Using α/2
d) Square root
Correct Answer: a) Dividing by number of tests
📝 Explanation:
Controls family-wise error rate conservatively.

98. Holm’s method is a:

a) Step-down multiple comparison procedure
b) Parametric test
c) Non-parametric ANOVA
d) Bayesian test
Correct Answer: a) Step-down multiple comparison procedure
📝 Explanation:
Less conservative than Bonferroni.
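A sketch of Holm's step-down procedure: sort the m p-values, compare the i-th smallest (i = 0, 1, …) with α/(m – i), and stop at the first non-rejection. The p-values are made up to show a case where Holm rejects more hypotheses than Bonferroni's α/m cutoff.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the i-th smallest p-value (i = 0, 1, ...) with alpha / (m - i)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection; all larger p-values are retained
    return reject

p = [0.010, 0.013, 0.040, 0.005]
print(holm_reject(p))                     # [True, True, True, True]
print([pv <= 0.05 / len(p) for pv in p])  # Bonferroni: [True, False, False, True]
```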

99. Tukey’s HSD test is used after ANOVA to compare:

a) All pairwise means
b) Means vs control
c) Variances
d) Medians
Correct Answer: a) All pairwise means
📝 Explanation:
Honest Significant Difference; assumes equal variances.

100. Dunnett’s test compares:

a) Multiple treatments vs a single control
b) All pairs
c) Proportions
d) Correlations
Correct Answer: a) Multiple treatments vs a single control
📝 Explanation:
Fewer comparisons, higher power.

101. The Q-Q plot assesses normality by plotting:

a) Sample quantiles vs theoretical normal quantiles
b) Residuals vs fitted
c) ACF
d) Histogram
Correct Answer: a) Sample quantiles vs theoretical normal quantiles
📝 Explanation:
Points lying close to a straight line suggest normality.

102. A leverage point in regression has high:

a) Distance from mean of X (hat diagonal)
b) Residual
c) Cook’s distance
d) Standardized coefficient
Correct Answer: a) Distance from mean of X (hat diagonal)
📝 Explanation:
h_ii > 2p/n (p = number of estimated coefficients, including the intercept) flags potential high leverage.

103. Cook’s distance measures:

a) Influence of an observation on all fitted values
b) Residual size
c) Multicollinearity
d) Heteroscedasticity
Correct Answer: a) Influence of an observation on all fitted values
📝 Explanation:
Large values (> 4/n) indicate influential points.

104. The partial F-test in regression compares:

a) Nested models (reduced vs full)
b) Two independent samples
c) Variances
d) Proportions
Correct Answer: a) Nested models (reduced vs full)
📝 Explanation:
Tests significance of added predictors.

105. Ridge regression adds penalty:

a) λ Σ β_j² (L2)
b) λ Σ |β_j| (L1)
c) λ Σ (β_j – 1)²
d) No penalty
Correct Answer: a) λ Σ β_j² (L2)
📝 Explanation:
Shrinks coefficients, handles multicollinearity.

106. Lasso can perform variable selection because it:

a) Sets some coefficients exactly to zero
b) Shrinks all equally
c) Increases variance
d) Removes intercept
Correct Answer: a) Sets some coefficients exactly to zero
📝 Explanation:
L1 penalty promotes sparsity.

107. Elastic Net combines:

a) Ridge and Lasso penalties
b) Ridge and OLS
c) Lasso and PCR
d) PCR and PLS
Correct Answer: a) Ridge and Lasso penalties
📝 Explanation:
Useful when p > n or high correlation.
