106 challenging multiple-choice questions on descriptive statistics, inferential methods, and time series analysis. Inspired by real data science and analytics interview questions from FAANG companies, consulting firms, and quant roles.
1. Which measure of central tendency is most affected by extreme outliers?
2. In a positively skewed distribution, the correct order of mean, median, and mode is:
3. The interquartile range (IQR) is calculated as:
4. Which of the following is NOT a measure of dispersion?
5. The empirical rule applies to data that are approximately:
6. Pearson’s correlation coefficient is undefined when:
7. Boxplot whiskers typically extend to:
8. The coefficient of variation (CV) is useful for comparing:
9. Z-score of a value x is given by:
10. Chebyshev’s theorem guarantees at least what proportion within k standard deviations (k>1)?
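Several of the measures in questions 1-10 can be checked in a few lines. A minimal sketch with NumPy, using an invented dataset (the numbers are illustrative only):

```python
import numpy as np

# Invented toy data for illustration
data = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

mean = data.mean()                     # 5.0
std = data.std(ddof=0)                 # population SD = 2.0

# Q9: z-score of a value x is (x - mean) / std
z = (9 - mean) / std                   # 2.0

# Q3: IQR = Q3 - Q1
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Q8: coefficient of variation = std / mean (unitless, so it can
# compare dispersion across variables with different scales)
cv = std / mean                        # 0.4
```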
11. The sampling distribution of the sample mean becomes approximately normal when n ≥ 30 due to:
12. A Type I error occurs when we:
13. The p-value is the probability of obtaining a test statistic at least as extreme as observed, assuming:
14. For a two-tailed z-test at α = 0.05, the critical values are:
15. The standard error of the mean is σ / √n; when σ is unknown we use:
16. Confidence interval width is proportional to:
17. Power of a test = 1 – β, where β is:
18. For proportions, the standard error is √[p(1–p)/n]; for confidence intervals we often use:
19. The chi-square test for independence tests whether:
20. ANOVA tests equality of:
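The mechanics behind questions 13-16 (p-value, standard error, CI width) can be sketched as follows, assuming SciPy is available; the sample numbers are invented:

```python
import math
from scipy.stats import norm

# Invented example: test H0: mu = 50 with known sigma
n, xbar, sigma, mu0 = 100, 52.0, 10.0, 50.0

se = sigma / math.sqrt(n)              # Q15: standard error of the mean
z = (xbar - mu0) / se                  # test statistic

# Q13: two-tailed p-value, computed assuming H0 is true
p_value = 2 * (1 - norm.cdf(abs(z)))

# Q14/Q16: 95% CI uses z* ≈ 1.96; width scales with sigma / sqrt(n)
z_crit = norm.ppf(0.975)
ci = (xbar - z_crit * se, xbar + z_crit * se)
```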
21. A time series is stationary if:
22. The ACF at lag k measures:
23. In an AR(1) model y_t = φ y_{t–1} + ε_t, |φ| < 1 ensures:
24. Differencing a series once removes:
25. The PACF of an MA(1) process cuts off after:
26. ADF test null hypothesis is:
27. In SARIMA(p,d,q)(P,D,Q)s, 's' denotes:
28. AIC penalizes models for:
29. White noise has ACF values approximately:
30. Holt-Winters additive model is suitable when seasonal fluctuations:
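Questions 23-24 can be illustrated with a small simulation, no time series library required; the AR(1) coefficient and trend slope below are arbitrary choices:

```python
import numpy as np

# Q23: simulate a stationary AR(1) with |phi| < 1
rng = np.random.default_rng(42)
phi, n = 0.7, 2000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

# Sample autocorrelation at lag k; for AR(1), ACF(1) should be near phi
def acf(x, k):
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

rho1 = acf(y, 1)

# Q24: first differencing removes a (linear) trend
trend_series = y + 0.05 * np.arange(n)
diffed = np.diff(trend_series)   # the deterministic trend becomes a constant
```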
31. The variance of a dataset {2, 4, 6, 8, 10} is:
32. For the data {1, 3, 3, 6}, the mode is:
33. Skewness coefficient > 0 indicates:
34. Percentile rank of the median is:
35. Geometric mean is preferred for:
36. Covariance of a variable with itself equals its:
37. Spearman’s rank correlation is based on:
38. The 95% CI for μ when n=25, x̄=50, s=10 (t-critical ≈ 2.064) is:
39. A test statistic z = 2.5, α = 0.01 two-tailed; decision is:
40. Minimum sample size for proportion CI with margin of error E=0.03, p*=0.5, z=1.96:
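Questions 31, 38, and 40 are pure arithmetic, worked out below with the standard library only (note Q31 depends on whether the population or sample formula is intended):

```python
import math

# Q31: variance of {2, 4, 6, 8, 10}
data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)                   # 6.0
ss = sum((x - mean) ** 2 for x in data)        # 40.0
pop_var = ss / len(data)                       # 8.0  (divide by n)
samp_var = ss / (len(data) - 1)                # 10.0 (divide by n-1)

# Q38: 95% CI for mu with n=25, xbar=50, s=10, t* ≈ 2.064
margin = 2.064 * 10 / math.sqrt(25)            # 4.128
ci = (50 - margin, 50 + margin)                # (45.872, 54.128)

# Q40: minimum n for a proportion CI with E=0.03, p*=0.5, z=1.96
n_min = math.ceil((1.96 ** 2 * 0.5 * 0.5) / 0.03 ** 2)   # 1068
```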
41. For a paired t-test, the degrees of freedom are:
42. F-test is used to compare:
43. Mann-Whitney U tests difference in:
44. Kruskal-Wallis is the non-parametric version of:
45. The Durbin-Watson statistic near 2 indicates:
46. In exponential smoothing, α close to 1 gives more weight to:
47. An AR(2) model is indicated when the PACF shows significant spikes at:
48. Seasonal differencing of period 12 is denoted as:
49. Ljung-Box test checks for:
50. KPSS test null hypothesis is:
51. In ARIMA, 'I' stands for:
52. Forecasts from a random walk model are:
53. Granger causality tests whether past values of X improve prediction of:
54. Variance inflation factor (VIF) > 10 suggests:
55. The coefficient of determination R² represents:
56. In simple linear regression, the least squares slope b1 equals:
57. Residual standard error estimates:
58. For logistic regression, the link function is:
59. The odds ratio exp(β) = 1.5 means:
60. Poisson regression is suitable for:
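The slope formula and R² from questions 55-56 can be verified directly on toy data (the x/y values below are invented):

```python
import numpy as np

# Invented toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Q56: b1 = Sxy / Sxx = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Q55: R^2 = 1 - SSE/SST, the proportion of variance in y explained
resid = y - (b0 + b1 * x)
r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
```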
61. In survival analysis, Kaplan-Meier estimates:
62. Cox proportional hazards assumption is violated if:
63. Bayesian inference updates beliefs using:
64. Conjugate prior for normal mean (known variance) is:
65. MCMC methods are used to:
66. Bootstrapping estimates sampling distribution by:
67. Percentile bootstrap CI uses:
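Questions 66-67 can be demonstrated in a few lines; the sample and replication count below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = np.array([5.0, 7.0, 8.0, 9.0, 12.0, 14.0, 15.0, 20.0])

# Q66: resample with replacement many times, recording the statistic
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(5000)
])

# Q67: the percentile bootstrap CI reads off quantiles of that
# empirical sampling distribution
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```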
68. Cross-validation is primarily used to assess:
69. The bias-variance tradeoff implies that overly complex models tend to:
70. Principal Component Analysis (PCA) maximizes:
71. The scree plot helps determine number of PCs by looking for:
72. In k-means clustering, the objective is to minimize:
73. Silhouette coefficient ranges from:
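For questions 70-71, PCA reduces to an eigendecomposition of the covariance matrix. A sketch on simulated correlated data (the mixing matrix is an arbitrary choice):

```python
import numpy as np

# Build a correlated 2-D cloud from independent standard normals
rng = np.random.default_rng(7)
z = rng.normal(size=(200, 2))
X = z @ np.array([[3.0, 0.0], [1.0, 0.5]])

# Q70: PCA directions are eigenvectors of the covariance matrix;
# the leading eigenvector maximizes projected variance
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order

# Q71: a scree plot graphs these eigenvalues; the share of variance
# captured by the first PC is its eigenvalue over the total
explained = eigvals[-1] / eigvals.sum()
```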
74. A contingency table with all expected frequencies ≥ 5 is required for:
75. McNemar’s test is used for:
76. The central limit theorem requires random samples that are:
77. Degrees of freedom for two-sample t-test (unequal variances) is approximately:
78. Effect size Cohen’s d = (μ1 – μ2) / σ; d = 0.8 is considered:
79. Multiple R² in regression can be artificially inflated by:
80. Homoscedasticity means residuals have:
81. Durbin-Watson values < 1 typically indicate:
82. In time series decomposition, the remainder after trend and seasonal removal is:
83. STL decomposition stands for:
84. ACF of a seasonal series with period 12 shows spikes at:
85. Box-Cox transformation is applied to stabilize:
86. Over-differencing a stationary series introduces:
87. The optimal ARIMA model often has residuals with:
88. In VAR models, each variable is modeled as a function of:
89. Impulse response function traces effect of a shock in one variable on:
90. Cointegration means two non-stationary series have:
91. Johansen test is used to detect:
92. ARCH model tests for:
93. GARCH(1,1) models volatility as:
94. The Jarque-Bera test assesses:
95. Shapiro-Wilk test null hypothesis is:
96. Levene’s test checks for:
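The diagnostic tests in questions 94-96 are available in SciPy; a sketch on simulated data (distributions and sample sizes chosen only to make the contrasts visible):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Q94: Jarque-Bera combines sample skewness and kurtosis; H0 = normality.
# An exponential sample is strongly skewed, so its statistic is far larger.
jb_stat_normal, jb_p_normal = stats.jarque_bera(rng.normal(size=500))
jb_stat_skewed, jb_p_skewed = stats.jarque_bera(rng.exponential(size=500))

# Q96: Levene's test; H0 = equal variances across groups
lev_stat, lev_p = stats.levene(rng.normal(scale=1.0, size=200),
                               rng.normal(scale=3.0, size=200))
```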
97. The Bonferroni correction adjusts α by:
98. Holm’s method is a:
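Questions 97-98 differ only in the threshold each p-value is tested against; the p-values below are made up to show Holm rejecting where Bonferroni does not:

```python
alpha = 0.05
pvals = [0.001, 0.012, 0.021, 0.040]   # invented, already sorted
m = len(pvals)

# Q97: Bonferroni tests every p-value against alpha / m (here 0.0125)
bonferroni = [p < alpha / m for p in pvals]

# Q98: Holm steps down: the i-th smallest p-value is tested against
# alpha / (m - i); once a test fails, all later ones fail too
holm, failed = [], False
for i, p in enumerate(sorted(pvals)):
    failed = failed or not (p < alpha / (m - i))
    holm.append(not failed)
```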
99. Tukey’s HSD test is used after ANOVA to compare:
100. Dunnett’s test compares:
101. The Q-Q plot assesses normality by plotting:
102. A leverage point in regression has high:
103. Cook’s distance measures:
104. The partial F-test in regression compares:
105. Ridge regression adds penalty:
106. Lasso can perform variable selection because it:
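The shrinkage in questions 105-106 can be seen from ridge's closed form, beta = (X'X + lambda*I)^(-1) X'y. A sketch on simulated data (true coefficients and lambda are arbitrary; intercept omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
beta_true = np.array([2.0, 0.0, -1.0])
y = X @ beta_true + 0.1 * rng.normal(size=50)

# Q105: ridge adds an L2 penalty lambda * ||beta||^2
lam = 5.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge shrinks the coefficient vector toward zero but rarely makes any
# entry exactly zero; lasso's L1 penalty can, which is why lasso (Q106)
# performs variable selection.
```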