100 Descriptive, Inferential, and Time Series Statistics in Data Analysis - MCQs

Category: 1000 Data Analysis MCQ
Published: November 8, 2025
Posted by: MCQs Generator

100 challenging multiple-choice questions on descriptive statistics, inferential methods, and time series analysis. Inspired by real data science and analytics interview questions from FAANG, consulting firms, and quant roles.

1. Which measure of central tendency is most affected by extreme outliers?

a) Median

b) Mode

c) Mean

d) Midrange

✅ Correct Answer: c) Mean

📝 Explanation:

The mean incorporates every value and is pulled toward extreme scores.
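A quick check of this, as a minimal sketch using Python's standard library. The salary figures are made-up illustration values, not data from any real source.

```python
from statistics import mean, median

# Hypothetical salaries in thousands; the last list adds one extreme value.
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [1000]

mean_shift = mean(with_outlier) - mean(salaries)        # large jump
median_shift = median(with_outlier) - median(salaries)  # barely moves

print("mean shift:", mean_shift)
print("median shift:", median_shift)
```

The single outlier moves the mean by far more than it moves the median, which is why the median is preferred for heavily skewed data.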

2. In a positively skewed distribution, the correct order of mean, median, and mode is:

a) Mean > Median > Mode

b) Mode > Median > Mean

c) Median > Mean > Mode

d) Mean > Mode > Median

✅ Correct Answer: a) Mean > Median > Mode

📝 Explanation:

The tail on the right pulls the mean highest, followed by median, then mode.

3. The interquartile range (IQR) is calculated as:

a) Q3 + Q1

b) Q3 – Q1

c) Q2 – Q1

d) Q3 / Q1

✅ Correct Answer: b) Q3 – Q1

📝 Explanation:

IQR measures spread between the 75th and 25th percentiles.

4. Which of the following is NOT a measure of dispersion?

a) Range

b) Variance

c) Standard deviation

d) Median

✅ Correct Answer: d) Median

📝 Explanation:

Median describes location, not spread.

5. The empirical rule applies to data that are approximately:

a) Uniformly distributed

b) Normally distributed

c) Exponentially distributed

d) Binomially distributed

✅ Correct Answer: b) Normally distributed

📝 Explanation:

About 68%, 95%, and 99.7% fall within 1, 2, and 3 standard deviations.
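The three percentages can be reproduced from the standard normal CDF. This is a small sketch; `share_within` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```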

6. Pearson's correlation coefficient is undefined when:

a) Variance of X or Y is zero

b) Sample size is less than 3

c) Data are ordinal

d) Relationship is nonlinear

✅ Correct Answer: a) Variance of X or Y is zero

📝 Explanation:

Division by zero standard deviation makes r undefined.

7. Boxplot whiskers typically extend to:

a) Minimum and maximum values

b) 1.5 × IQR beyond Q1 and Q3

c) Mean ± 3σ

d) Q1 – Q3

✅ Correct Answer: b) 1.5 × IQR beyond Q1 and Q3

📝 Explanation:

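The whiskers reach the most extreme data points that still lie within 1.5 × IQR of Q1 and Q3; points beyond those fences are flagged as potential outliers.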
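The fence rule can be sketched as follows. The data are hypothetical, and the quartiles use the `inclusive` method of `statistics.quantiles`, which is one of several common conventions; other software may give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical sample with one extreme value.
data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("fences:", lower_fence, upper_fence)
print("whiskers:", whiskers, "outliers:", outliers)
```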

8. The coefficient of variation (CV) is useful for comparing:

a) Skewness across datasets

b) Relative variability when units or means differ

c) Absolute dispersion only

d) Correlation strength

✅ Correct Answer: b) Relative variability when units or means differ

📝 Explanation:

CV = (σ / μ) × 100% standardizes dispersion.

9. Z-score of a value x is given by:

a) (x – μ) / σ

b) (x + μ) / σ

c) (μ – x) / σ

d) (x – median) / IQR

✅ Correct Answer: a) (x – μ) / σ

📝 Explanation:

Measures deviations in standard deviation units.

10. Chebyshev’s theorem guarantees at least what proportion within k standard deviations (k>1)?

a) 1 – 1/k

b) 1 – 1/k²

c) 1 – 1/(2k)

d) 1 – 2/k²

✅ Correct Answer: b) 1 – 1/k²

📝 Explanation:

Applies to any distribution with finite variance.
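To see the bound at work, the sketch below compares it with the observed share of a deliberately skewed, made-up dataset. `chebyshev_bound` and `observed_share` are illustrative helper names.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Guaranteed minimum share of values within k standard deviations (k > 1)."""
    return 1 - 1 / k**2

# Hypothetical, strongly right-skewed data.
data = [1, 1, 1, 2, 2, 3, 4, 6, 9, 31]
mu, sigma = mean(data), pstdev(data)

def observed_share(k):
    """Share of the data actually lying within k population standard deviations."""
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)

for k in (1.5, 2, 3):
    print(k, round(chebyshev_bound(k), 3), observed_share(k))
```

The observed share is never below the bound, even though the data are far from normal.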

11. The sampling distribution of the sample mean becomes approximately normal when n ≥ 30 due to:

a) Law of large numbers

b) Central Limit Theorem

c) Bayes’ theorem

d) Chebyshev’s inequality

✅ Correct Answer: b) Central Limit Theorem

📝 Explanation:

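For large samples, the CLT justifies approximate normality of the sample mean for any population shape with finite variance; n ≥ 30 is a rule of thumb, and heavily skewed populations may need more.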

12. A Type I error occurs when we:

a) Fail to reject a false null

b) Reject a true null

c) Reject a false null

d) Fail to reject a true null

✅ Correct Answer: b) Reject a true null

📝 Explanation:

False positive; probability equals α.

13. The p-value is the probability of obtaining a test statistic at least as extreme as observed, assuming:

a) H₁ is true

b) H₀ is true

c) Sample size is large

d) Data are normal

✅ Correct Answer: b) H₀ is true

📝 Explanation:

Small p-value casts doubt on the null.

14. For a two-tailed z-test at α = 0.05, the critical values are:

a) ±1.645

b) ±1.96

c) ±2.33

d) ±2.58

✅ Correct Answer: b) ±1.96

📝 Explanation:

5% split equally in both tails.
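The critical values in the options come from the inverse normal CDF. A short sketch, where `two_tailed_critical_z` is a name chosen here for illustration:

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """Positive critical value leaving alpha/2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(two_tailed_critical_z(alpha), 3))
```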

15. The standard error of the mean is σ / √n; when σ is unknown we use:

a) Population variance

b) Sample standard deviation s

c) Median absolute deviation

d) Range / 4

✅ Correct Answer: b) Sample standard deviation s

📝 Explanation:

Leads to t-distribution with n–1 df.

16. Confidence interval width is proportional to:

a) 1 / √n

b) 1 / n

c) √n

d) n

✅ Correct Answer: a) 1 / √n

📝 Explanation:

Larger samples yield narrower intervals.

17. Power of a test = 1 – β, where β is:

a) Type I error rate

b) Type II error rate

c) Significance level

d) Confidence level

✅ Correct Answer: b) Type II error rate

📝 Explanation:

β is the probability of failing to detect a true effect, so power is the probability of detecting it.

18. For proportions, the standard error is √[p(1–p)/n]; for confidence intervals we often use:

a) Sample proportion p̂

b) Population proportion p

c) 0.5 for maximum variability

d) 1/n

✅ Correct Answer: a) Sample proportion p̂

📝 Explanation:

Wald interval: p̂ ± z√[p̂(1–p̂)/n].

19. The chi-square test for independence tests whether:

a) Row and column variables are associated

b) Means of two groups differ

c) Variance equals a constant

d) Data follow a normal curve

✅ Correct Answer: a) Row and column variables are associated

📝 Explanation:

Compares observed vs expected frequencies.

20. ANOVA tests equality of:

a) Two population means

b) Three or more population means

c) Variances across groups

d) Proportions

✅ Correct Answer: b) Three or more population means

📝 Explanation:

F-statistic = MSB / MSW.
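The ratio can be computed by hand for a small example. This is a sketch with hypothetical groups; it returns only the F-statistic and does not compute a p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F = MSB / MSW for a list of groups (each a list of numbers)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: values around their own group mean.
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical data
f_stat = one_way_anova_f(groups)
print(f_stat)
```

Here MSB = 26/2 = 13 and MSW = 6/6 = 1, so F is 13 up to floating-point rounding.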

21. A time series is stationary if:

a) Mean, variance, and autocovariance are time-invariant

b) Trend is linear

c) Seasonality is present

d) Data are i.i.d.

✅ Correct Answer: a) Mean, variance, and autocovariance are time-invariant

📝 Explanation:

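This is weak (covariance) stationarity: the autocovariance depends only on the lag, not on time. Strict stationarity is stronger and requires the entire joint distribution to be time-invariant.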

22. The ACF at lag k measures:

a) Correlation between y_t and y_{t–k}

b) Variance of y_t

c) Trend strength

d) Seasonal period

✅ Correct Answer: a) Correlation between y_t and y_{t–k}

📝 Explanation:

Helps identify MA order.

23. In an AR(1) model y_t = φ y_{t–1} + ε_t, |φ| < 1 ensures:

a) Stationarity

b) Explosive behavior

c) Unit root

d) Seasonality

✅ Correct Answer: a) Stationarity

📝 Explanation:

Root outside unit circle.

24. Differencing a series once removes:

a) Linear trend

b) Constant mean

c) Seasonality of period 1

d) White noise

✅ Correct Answer: a) Linear trend

📝 Explanation:

Δy_t = y_t – y_{t–1} eliminates polynomial trend of order 1.
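A small sketch of differencing on made-up series: one difference flattens a linear trend, while a quadratic trend needs two. The `lag` argument also covers seasonal differencing (for example, lag 12 for monthly data).

```python
def difference(series, lag=1):
    """Return y_t - y_{t-lag} for every t where both terms exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

linear = [3 + 2 * t for t in range(8)]   # hypothetical linear trend
quadratic = [t * t for t in range(8)]    # hypothetical quadratic trend

once = difference(linear)                # constant: trend removed
twice = difference(difference(quadratic))  # constant after two differences

print(once)
print(twice)
```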

25. The PACF of an MA(1) process cuts off after:

a) Lag 1

b) Lag 2

c) Never cuts off

d) Lag 0

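✅ Correct Answer: c) Never cuts off

📝 Explanation:

For an MA(1) process it is the ACF that cuts off after lag 1; the PACF tails off gradually and never cuts off. The PACF cuts off only for pure AR processes.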

26. ADF test null hypothesis is:

a) Series has a unit root (non-stationary)

b) Series is stationary

c) Series has trend

d) Series is white noise

✅ Correct Answer: a) Series has a unit root (non-stationary)

📝 Explanation:

Reject H0 ⇒ stationary.

27. In SARIMA(p,d,q)(P,D,Q)s, 's' denotes:

a) Seasonal period

b) Differencing order

c) AR order

d) MA order

✅ Correct Answer: a) Seasonal period

📝 Explanation:

Common values: 12 (monthly), 4 (quarterly).

28. AIC penalizes models for:

a) Number of parameters

b) Residual variance only

c) Outliers

d) Forecast horizon

✅ Correct Answer: a) Number of parameters

📝 Explanation:

Lower AIC indicates better balance of fit and complexity.

29. White noise has ACF values approximately:

a) Zero for all lags > 0

b) One at lag 0, zero elsewhere

c) Decaying exponentially

d) Significant at seasonal lags

✅ Correct Answer: a) Zero for all lags > 0

📝 Explanation:

Uncorrelated errors.

30. Holt-Winters additive model is suitable when seasonal fluctuations:

a) Are roughly constant in size

b) Increase with the level

c) Are multiplicative

d) Are absent

✅ Correct Answer: a) Are roughly constant in size

📝 Explanation:

Multiplicative version for proportional seasonality.

31. The population variance of the dataset {2, 4, 6, 8, 10} is:

a) 6

b) 8

c) 10

d) 12

✅ Correct Answer: b) 8

📝 Explanation:

Mean = 6; population variance = [(−4)² + (−2)² + 0² + 2² + 4²]/5 = 40/5 = 8. (The sample variance, dividing by n – 1 = 4, would be 10.)
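The choice of divisor matters here, since both 8 and 10 appear among the options. A quick check with the standard library, which exposes the two conventions as separate functions:

```python
from statistics import pvariance, variance

data = [2, 4, 6, 8, 10]

population_var = pvariance(data)  # divides the squared deviations by n
sample_var = variance(data)       # divides by n - 1

print(population_var, sample_var)
```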

32. For the data {1, 3, 3, 6}, the mode is:

a) 1

b) 3

c) 6

d) Bimodal

✅ Correct Answer: b) 3

📝 Explanation:

3 appears twice; others once.

33. Skewness coefficient > 0 indicates:

a) Symmetric distribution

b) Left-skewed

c) Right-skewed

d) Platykurtic

✅ Correct Answer: c) Right-skewed

📝 Explanation:

Longer tail on the positive side.

34. Percentile rank of the median is:

a) 25th

b) 50th

c) 75th

d) 100th

✅ Correct Answer: b) 50th

📝 Explanation:

Half the data lie below the median.

35. Geometric mean is preferred for:

a) Averaging rates of change

b) Additive data

c) Nominal variables

d) Counts

✅ Correct Answer: a) Averaging rates of change

📝 Explanation:

Handles compounding/multiplicative processes.

36. Covariance of a variable with itself equals its:

a) Variance

b) Standard deviation

c) Mean

d) Range

✅ Correct Answer: a) Variance

📝 Explanation:

Cov(X,X) = Var(X).

37. Spearman’s rank correlation is based on:

a) Original values

b) Ranks of the data

c) Deviations from mean

d) Log-transformed values

✅ Correct Answer: b) Ranks of the data

📝 Explanation:

Non-parametric measure of monotonic relationship.

38. The 95% CI for μ when n=25, x̄=50, s=10 (t-critical ≈ 2.064) is:

a) 50 ± 4.13

b) 50 ± 3.92

c) 50 ± 2.06

d) 50 ± 1.96

✅ Correct Answer: a) 50 ± 4.13

📝 Explanation:

Margin = 2.064 × (10/√25) = 2.064 × 2 ≈ 4.13.
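The same arithmetic as a small reusable sketch. The critical value is passed in rather than computed, because the standard library has no t-distribution; `t_margin_of_error` is an illustrative name.

```python
import math

def t_margin_of_error(t_critical, s, n):
    """Half-width of a t-based confidence interval for the mean."""
    return t_critical * s / math.sqrt(n)

x_bar = 50
margin = t_margin_of_error(t_critical=2.064, s=10, n=25)
interval = (x_bar - margin, x_bar + margin)
print(round(margin, 2), interval)
```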

39. A test statistic z = 2.5, α = 0.01 two-tailed; decision is:

a) Reject H0

b) Fail to reject H0

c) p = 0.0124

d) Need df

✅ Correct Answer: b) Fail to reject H0

📝 Explanation:

Critical |z| = 2.576 > 2.5.

40. Minimum sample size for proportion CI with margin of error E=0.03, p*=0.5, z=1.96:

a) 1068

b) 752

c) 385

d) 267

✅ Correct Answer: a) 1068

📝 Explanation:

n = (1.96² × 0.5 × 0.5) / 0.03² ≈ 1067.11 → 1068.
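The calculation, including the round-up step, can be sketched as below. `sample_size_for_proportion` is a hypothetical helper name.

```python
import math

def sample_size_for_proportion(margin, p=0.5, z=1.96):
    """Smallest n whose margin of error for a proportion is at most `margin`."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

n_required = sample_size_for_proportion(0.03)
print(n_required)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.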

41. For paired t-test, degrees of freedom =

a) n₁ + n₂ – 2

b) n – 1

c) 2n – 1

d) n

✅ Correct Answer: b) n – 1

📝 Explanation:

Based on differences (n pairs).

42. F-test is used to compare:

a) Two variances

b) Two means

c) Proportions

d) Correlations

✅ Correct Answer: a) Two variances

📝 Explanation:

H0: σ₁² = σ₂².

43. Mann-Whitney U tests difference in:

a) Medians of two independent samples

b) Means of paired data

c) Variances

d) Proportions

✅ Correct Answer: a) Medians of two independent samples

📝 Explanation:

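Non-parametric alternative to the two-sample t-test. Strictly, it tests whether one distribution tends to yield larger values than the other; it compares medians only when the two distributions have the same shape.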

44. Kruskal-Wallis is the non-parametric version of:

a) One-way ANOVA

b) Paired t-test

c) Chi-square goodness-of-fit

d) Linear regression

✅ Correct Answer: a) One-way ANOVA

📝 Explanation:

Compares three or more independent groups.

45. The Durbin-Watson statistic near 2 indicates:

a) No autocorrelation

b) Positive autocorrelation

c) Negative autocorrelation

d) Heteroscedasticity

✅ Correct Answer: a) No autocorrelation

📝 Explanation:

Range 0–4; 2 is ideal.
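The statistic itself is simple to compute from residuals. The sketch below uses two made-up residual sequences to show the extremes: persistent residuals push the value toward 0, alternating residuals push it toward 4.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e**2 for e in residuals)
    return numerator / denominator

persistent = [1, 1, 1, -1, -1, -1]    # hypothetical, positively autocorrelated
alternating = [1, -1, 1, -1, 1, -1]   # hypothetical, negatively autocorrelated

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
print(dw_persistent, dw_alternating)
```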

46. In exponential smoothing, α close to 1 gives more weight to:

a) Recent observations

b) Older observations

c) Seasonal component

d) Trend

✅ Correct Answer: a) Recent observations

📝 Explanation:

Higher α reacts faster to changes.
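A minimal sketch of simple exponential smoothing on a made-up series. Initialising the level with the first observation is an assumption of this sketch; other initialisations are common.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation."""
    level = series[0]  # assumed initialisation: start at the first value
    smoothed = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

series = [10, 12, 11, 15, 30]  # hypothetical data with a late jump
slow = simple_exponential_smoothing(series, alpha=0.2)
fast = simple_exponential_smoothing(series, alpha=1.0)
print(slow[-1], fast[-1])
```

With α = 1 the smoothed series simply reproduces the data, while α = 0.2 reacts much more slowly to the final jump.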

47. An AR(2) model requires PACF significant at:

a) Lags 1 and 2

b) Lag 1 only

c) All lags

d) Lag 2 only

✅ Correct Answer: a) Lags 1 and 2

📝 Explanation:

PACF cuts off after p.

48. Seasonal differencing of period 12 is denoted as:

a) ∇₁₂ y_t

b) ∇ y_t

c) log y_t

d) y_t – y_{t–1}

✅ Correct Answer: a) ∇₁₂ y_t

📝 Explanation:

∇₁₂ y_t = y_t – y_{t–12}.

49. Ljung-Box test checks for:

a) Lack of autocorrelation in residuals

b) Normality

c) Stationarity

d) Homoscedasticity

✅ Correct Answer: a) Lack of autocorrelation in residuals

📝 Explanation:

High p-value supports white noise.

50. KPSS test null hypothesis is:

a) Stationarity

b) Unit root

c) Trend

d) Seasonality

✅ Correct Answer: a) Stationarity

📝 Explanation:

Complements ADF, whose null is the opposite; failing to reject is consistent with stationarity but does not prove it.

51. In ARIMA, 'I' stands for:

a) Integrated

b) Invertible

c) Independent

d) Intercept

✅ Correct Answer: a) Integrated

📝 Explanation:

Order of differencing to achieve stationarity.

52. Forecasts from a random walk model are:

a) Last observed value

b) Mean of series

c) Zero

d) Trend line

✅ Correct Answer: a) Last observed value

📝 Explanation:

y_{t+1} = y_t + ε_{t+1} ⇒ best forecast = y_t.

53. Granger causality tests whether past values of X improve prediction of:

a) Y beyond its own past

b) X itself

c) Error term

d) Trend

✅ Correct Answer: a) Y beyond its own past

📝 Explanation:

Rejects non-causality if X lags are significant in Y equation.

54. Variance inflation factor (VIF) > 10 suggests:

a) Severe multicollinearity

b) Heteroscedasticity

c) Autocorrelation

d) Non-normality

✅ Correct Answer: a) Severe multicollinearity

📝 Explanation:

VIF_j = 1 / (1 – R_j²).

55. The coefficient of determination R² represents:

a) Proportion of variance explained

b) Correlation coefficient

c) Slope of regression

d) p-value

✅ Correct Answer: a) Proportion of variance explained

📝 Explanation:

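R² = SSR / SST = 1 – SSE / SST, where SSR is the regression (explained) sum of squares and SSE is the residual sum of squares.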

56. In simple linear regression, the least squares slope b1 =

a) Cov(X,Y) / Var(X)

b) Cov(X,Y) / Var(Y)

c) Mean(Y) / Mean(X)

d) SD(Y) / SD(X)

✅ Correct Answer: a) Cov(X,Y) / Var(X)

📝 Explanation:

Minimizes sum of squared residuals.
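The slope formula can be checked on data that lie exactly on a known line. A sketch, with `least_squares_line` as an illustrative name:

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the simple linear regression of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx  # same ratio as Cov(X, Y) / Var(X)
    intercept = y_bar - slope * x_bar
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # hypothetical points on y = 2x + 1
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)
```

The common divisor (n or n – 1) cancels in Cov(X, Y) / Var(X), so raw sums of products are enough.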

57. Residual standard error estimates:

a) σ, the error standard deviation

b) β0

c) R²

d) F-statistic

✅ Correct Answer: a) σ, the error standard deviation

📝 Explanation:

√(SSE / (n–2)).

58. For logistic regression, the link function is:

a) Logit

b) Probit

c) Log

d) Identity

✅ Correct Answer: a) Logit

📝 Explanation:

log(p/(1–p)) = β0 + β1x.

59. The odds ratio exp(β) = 1.5 means:

a) 50% increase in odds per unit increase in x

b) 1.5 times the probability

c) Log odds increase by 1.5

d) Probability increases by 0.5

✅ Correct Answer: a) 50% increase in odds per unit increase in x

📝 Explanation:

Odds multiply by exp(β).

60. Poisson regression is suitable for:

a) Count data

b) Binary outcomes

c) Continuous positive data

d) Time to event

✅ Correct Answer: a) Count data

📝 Explanation:

The Poisson model assumes the conditional mean equals the variance (equidispersion).

61. In survival analysis, Kaplan-Meier estimates:

a) Survival function non-parametrically

b) Hazard ratio

c) Median survival time only

d) Proportional hazards

✅ Correct Answer: a) Survival function non-parametrically

📝 Explanation:

Product-limit estimator handles censoring.

62. Cox proportional hazards assumption is violated if:

a) Hazard ratios change over time

b) Log(-log(S(t))) curves are parallel

c) Schoenfeld residuals show no trend

d) p-value < 0.05

✅ Correct Answer: a) Hazard ratios change over time

📝 Explanation:

Check via time-dependent covariates or plots.

63. Bayesian inference updates beliefs using:

a) Prior × Likelihood → Posterior

b) Likelihood only

c) Posterior × Data

d) Prior / Likelihood

✅ Correct Answer: a) Prior × Likelihood → Posterior

📝 Explanation:

Bayes’ theorem: P(θ|data) ∝ P(data|θ) P(θ).

64. Conjugate prior for normal mean (known variance) is:

a) Normal

b) Gamma

c) Beta

d) Inverse-gamma

✅ Correct Answer: a) Normal

📝 Explanation:

Posterior remains normal.
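The update has a closed form: precisions (inverse variances) add, and the posterior mean is a precision-weighted average of the prior mean and the sample mean. A sketch with hypothetical inputs:

```python
def normal_posterior(prior_mean, prior_var, sample_mean, n, known_var):
    """Posterior (mean, variance) for a normal mean with known data variance."""
    prior_precision = 1 / prior_var
    data_precision = n / known_var
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * sample_mean
    )
    return post_mean, post_var

# Hypothetical example: N(0, 1) prior, one observation of 2.0 with variance 1.
post_mean, post_var = normal_posterior(0.0, 1.0, 2.0, n=1, known_var=1.0)
print(post_mean, post_var)
```

With equal prior and data precision the posterior mean lands halfway between the prior mean and the observation.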

65. MCMC methods are used to:

a) Sample from complex posterior distributions

b) Compute exact integrals

c) Perform hypothesis tests

d) Calculate p-values

✅ Correct Answer: a) Sample from complex posterior distributions

📝 Explanation:

Markov Chain Monte Carlo approximates posteriors.

66. Bootstrapping estimates sampling distribution by:

a) Resampling with replacement from the data

b) Using parametric assumptions

c) Increasing sample size

d) Theoretical formulas

✅ Correct Answer: a) Resampling with replacement from the data

📝 Explanation:

Non-parametric; empirical distribution.

67. Percentile bootstrap CI uses:

a) 2.5th and 97.5th percentiles of bootstrap statistics

b) Mean ± 1.96 SE

c) t-distribution

d) Chi-square

✅ Correct Answer: a) 2.5th and 97.5th percentiles of bootstrap statistics

📝 Explanation:

For 95% CI from B=1000 replicates.
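A minimal sketch of the percentile method for the mean, on made-up data. The seed, the sample, and the simple index-based percentile lookup are all assumptions of this sketch, not a reference implementation.

```python
import random
from statistics import mean

def percentile_bootstrap_ci(data, stat=mean, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for `stat`, resampling with replacement."""
    rng = random.Random(seed)
    reps = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = reps[round(tail * n_boot)]            # about the 2.5th percentile
    upper = reps[round((1 - tail) * n_boot) - 1]  # about the 97.5th percentile
    return lower, upper

data = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]  # hypothetical sample
lower, upper = percentile_bootstrap_ci(data)
print(lower, upper)
```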

68. Cross-validation is primarily used to assess:

a) Model generalization error

b) p-values

c) Confidence intervals

d) Residual normality

✅ Correct Answer: a) Model generalization error

📝 Explanation:

K-fold CV estimates out-of-sample performance.

69. The bias-variance tradeoff implies that overly complex models tend to:

a) Overfit (high variance, low bias)

b) Underfit (high bias, low variance)

c) Balance perfectly

d) Have zero error

✅ Correct Answer: a) Overfit (high variance, low bias)

📝 Explanation:

Capture noise, not just signal.

70. Principal Component Analysis (PCA) maximizes:

a) Variance along projected directions

b) Correlation between variables

c) Mean squared error

d) Entropy

✅ Correct Answer: a) Variance along projected directions

📝 Explanation:

Eigenvectors of covariance matrix.

71. The scree plot helps determine number of PCs by looking for:

a) Elbow in eigenvalue decline

b) Linear trend

c) Constant variance

d) Zero eigenvalues

✅ Correct Answer: a) Elbow in eigenvalue decline

📝 Explanation:

Retain the components before the elbow, the point where the eigenvalues level off.

72. In k-means clustering, the objective is to minimize:

a) Within-cluster sum of squares

b) Between-cluster sum of squares

c) Total sum of squares

d) Silhouette score

✅ Correct Answer: a) Within-cluster sum of squares

📝 Explanation:

Iterative assignment and centroid update.

73. Silhouette coefficient ranges from:

a) –1 to +1

b) 0 to 1

c) –∞ to +∞

d) 0 to ∞

✅ Correct Answer: a) –1 to +1

📝 Explanation:

Higher values indicate better cluster separation.

74. A contingency table with all expected frequencies ≥ 5 is required for:

a) Chi-square test validity

b) t-test

c) ANOVA

d) Regression

✅ Correct Answer: a) Chi-square test validity

📝 Explanation:

Approximation to chi-square distribution.

75. McNemar’s test is used for:

a) Paired binary data

b) Independent proportions

c) Continuous paired data

d) Multiple groups

✅ Correct Answer: a) Paired binary data

📝 Explanation:

Tests marginal homogeneity.

76. The central limit theorem requires random samples that are:

a) Independent and identically distributed

b) Normally distributed

c) Paired

d) Stratified

✅ Correct Answer: a) Independent and identically distributed

📝 Explanation:

For large n, sample mean ≈ normal.

77. Degrees of freedom for two-sample t-test (unequal variances) is approximately:

a) Welch-Satterthwaite formula

b) n1 + n2 – 1

c) n1 + n2 – 2

d) min(n1, n2) – 1

✅ Correct Answer: a) Welch-Satterthwaite formula

📝 Explanation:

Conservative; avoids equal variance assumption.

78. Effect size Cohen’s d = (μ1 – μ2) / σ; d = 0.8 is considered:

a) Large

b) Medium

c) Small

d) Negligible

✅ Correct Answer: a) Large

📝 Explanation:

Benchmarks: 0.2 small, 0.5 medium, 0.8 large.

79. Multiple R² in regression can be artificially inflated by:

a) Adding irrelevant predictors

b) Removing outliers

c) Transforming variables

d) Increasing sample size

✅ Correct Answer: a) Adding irrelevant predictors

📝 Explanation:

Use adjusted R² to penalize extra terms.

80. Homoscedasticity means residuals have:

a) Constant variance across predicted values

b) Zero mean

c) Normal distribution

d) No autocorrelation

✅ Correct Answer: a) Constant variance across predicted values

📝 Explanation:

Breusch-Pagan test checks this.

81. Durbin-Watson values < 1 typically indicate:

a) Strong positive autocorrelation

b) No autocorrelation

c) Negative autocorrelation

d) Heteroscedasticity

✅ Correct Answer: a) Strong positive autocorrelation

📝 Explanation:

Values near 0 = positive; near 4 = negative.

82. In time series decomposition, the remainder after trend and seasonal removal is:

a) Irregular (random) component

b) Cyclical component

c) Trend

d) Seasonal index

✅ Correct Answer: a) Irregular (random) component

📝 Explanation:

Should resemble white noise if model is good.

83. STL decomposition stands for:

a) Seasonal-Trend decomposition using LOESS

b) Simple Time Series Linear

c) Standard Trend Line

d) Smooth Time Lag

✅ Correct Answer: a) Seasonal-Trend decomposition using LOESS

📝 Explanation:

Robust, flexible for varying seasonal patterns.

84. ACF of a seasonal series with period 12 shows spikes at:

a) Multiples of 12

b) Lag 1 only

c) All lags

d) Lag 0

✅ Correct Answer: a) Multiples of 12

📝 Explanation:

Indicates need for seasonal AR/MA terms.

85. Box-Cox transformation is applied to stabilize:

a) Variance (heteroscedasticity)

b) Mean

c) Trend

d) Seasonality

✅ Correct Answer: a) Variance (heteroscedasticity)

📝 Explanation:

y(λ) = (y^λ – 1)/λ or log(y) for λ=0.
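The two branches of the formula as a short sketch. It handles a single positive value and a λ supplied by the caller; choosing λ from data is outside its scope.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y**lam - 1) / lam

print(box_cox(4, 0.5), box_cox(5, 1), box_cox(math.e, 0))
```

λ = 1 only shifts the data, λ = 0.5 behaves like a square root, and λ = 0 is the log transform.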

86. Over-differencing a stationary series introduces:

a) MA(1) structure with negative coefficient

b) Unit root

c) Trend

d) Seasonality

✅ Correct Answer: a) MA(1) structure with negative coefficient

📝 Explanation:

ACF shows spike at lag 1, then near zero.

87. The optimal ARIMA model often has residuals with:

a) No significant ACF/PACF spikes (white noise)

b) Significant lag 1

c) Linear trend

d) Seasonal pattern

✅ Correct Answer: a) No significant ACF/PACF spikes (white noise)

📝 Explanation:

Ljung-Box p > 0.05 supports adequacy.

88. In VAR models, each variable is modeled as a function of:

a) Its own lags and lags of all other variables

b) Only its own lags

c) Exogenous variables

d) Trend only

✅ Correct Answer: a) Its own lags and lags of all other variables

📝 Explanation:

Vector autoregression for multivariate time series.

89. Impulse response function traces effect of a shock in one variable on:

a) Future values of all variables

b) Past values

c) Error term

d) Constant term

✅ Correct Answer: a) Future values of all variables

📝 Explanation:

Shows dynamic responses in VAR.

90. Cointegration means two non-stationary series have:

a) A stationary linear combination

b) Identical trends

c) Zero correlation

d) Same variance

✅ Correct Answer: a) A stationary linear combination

📝 Explanation:

Long-run equilibrium relationship.

91. Johansen test is used to detect:

a) Number of cointegrating relationships

b) Unit roots

c) Granger causality

d) ARCH effects

✅ Correct Answer: a) Number of cointegrating relationships

📝 Explanation:

Trace and max-eigenvalue statistics.

92. An ARCH model captures:

a) Time-varying volatility (heteroscedasticity)

b) Mean reversion

c) Trend

d) Seasonality

✅ Correct Answer: a) Time-varying volatility (heteroscedasticity)

📝 Explanation:

Variance depends on past squared errors.

93. GARCH(1,1) models volatility as:

a) σ²_t = α₀ + α₁ ε²_{t–1} + β₁ σ²_{t–1}

b) σ_t = α₀ + α₁ |ε_{t–1}|

c) σ_t = constant

d) σ²_t = ε²_{t–1}

✅ Correct Answer: a) σ²_t = α₀ + α₁ ε²_{t–1} + β₁ σ²_{t–1}

📝 Explanation:

Combines ARCH and persistence.
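The recursion can be traced by hand for a short sequence. In this sketch the parameter values, the residuals, and the starting variance are all hypothetical; in practice the parameters are estimated from data.

```python
def garch11_variance(residuals, alpha0, alpha1, beta1, initial_variance):
    """Conditional variances aligned with `residuals`, via the GARCH(1,1) recursion."""
    variances = [initial_variance]
    for eps in residuals[:-1]:
        # Next variance = constant + ARCH term + persistence term.
        variances.append(alpha0 + alpha1 * eps**2 + beta1 * variances[-1])
    return variances

residuals = [0.0, 2.0, 0.0, 0.0]  # one shock at t = 1
variances = garch11_variance(
    residuals, alpha0=0.1, alpha1=0.2, beta1=0.7, initial_variance=1.0
)
print(variances)
```

The shock at t = 1 raises the variance at t = 2, and the β₁ term makes that increase fade gradually rather than vanish at once.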

94. The Jarque-Bera test assesses:

a) Normality of residuals (skewness + kurtosis)

b) Autocorrelation

c) Stationarity

d) Homoscedasticity

✅ Correct Answer: a) Normality of residuals (skewness + kurtosis)

📝 Explanation:

JB statistic ~ χ²(2).

95. Shapiro-Wilk test null hypothesis is:

a) Data come from a normal distribution

b) Data are skewed

c) Variance is constant

d) Mean equals zero

✅ Correct Answer: a) Data come from a normal distribution

📝 Explanation:

Powerful for small samples.

96. Levene’s test checks for:

a) Equality of variances across groups

b) Equality of means

c) Normality

d) Independence

✅ Correct Answer: a) Equality of variances across groups

📝 Explanation:

Robust to non-normality.

97. The Bonferroni correction adjusts α by:

a) Dividing by number of tests

b) Multiplying by number of tests

c) Using α/2

d) Square root

✅ Correct Answer: a) Dividing by number of tests

📝 Explanation:

Controls family-wise error rate conservatively.

98. Holm’s method is a:

a) Step-down multiple comparison procedure

b) Parametric test

c) Non-parametric ANOVA

d) Bayesian test

✅ Correct Answer: a) Step-down multiple comparison procedure

📝 Explanation:

Less conservative than Bonferroni.
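The difference from Bonferroni shows up on a small set of made-up p-values. A sketch of both procedures, with illustrative function names:

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm_reject(p_values, alpha=0.05):
    """Holm step-down: test sorted p-values against alpha/m, alpha/(m-1), ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject

p_values = [0.01, 0.02, 0.03, 0.005]  # hypothetical p-values
print(bonferroni_reject(p_values))
print(holm_reject(p_values))
```

Holm rejects everything Bonferroni rejects and possibly more, because its thresholds relax after each rejection.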

99. Tukey’s HSD test is used after ANOVA to compare:

a) All pairwise means

b) Means vs control

c) Variances

d) Medians

✅ Correct Answer: a) All pairwise means

📝 Explanation:

Honest Significant Difference; assumes equal variances.

100. Dunnett’s test compares:

a) Multiple treatments vs a single control

b) All pairs

c) Proportions

d) Correlations

✅ Correct Answer: a) Multiple treatments vs a single control

📝 Explanation:

Fewer comparisons, higher power.

101. The Q-Q plot assesses normality by plotting:

a) Sample quantiles vs theoretical normal quantiles

b) Residuals vs fitted

c) ACF

d) Histogram

✅ Correct Answer: a) Sample quantiles vs theoretical normal quantiles

📝 Explanation:

Straight line indicates normality.

102. A leverage point in regression has high:

a) Distance from mean of X (hat diagonal)

b) Residual

c) Cook’s distance

d) Standardized coefficient

✅ Correct Answer: a) Distance from mean of X (hat diagonal)

📝 Explanation:

h_ii > 2p/n flags potential leverage.

103. Cook’s distance measures:

a) Influence of an observation on all fitted values

b) Residual size

c) Multicollinearity

d) Heteroscedasticity

✅ Correct Answer: a) Influence of an observation on all fitted values

📝 Explanation:

Large values (> 4/n) indicate influential points.

104. The partial F-test in regression compares:

a) Nested models (reduced vs full)

b) Two independent samples

c) Variances

d) Proportions

✅ Correct Answer: a) Nested models (reduced vs full)

📝 Explanation:

Tests significance of added predictors.

105. Ridge regression adds penalty:

a) λ Σ β_j² (L2)

b) λ Σ |β_j| (L1)

c) λ Σ (β_j – 1)²

d) No penalty

✅ Correct Answer: a) λ Σ β_j² (L2)

📝 Explanation:

Shrinks coefficients, handles multicollinearity.

106. Lasso can perform variable selection because it:

a) Sets some coefficients exactly to zero

b) Shrinks all equally

c) Increases variance

d) Removes intercept

✅ Correct Answer: a) Sets some coefficients exactly to zero

📝 Explanation:

L1 penalty promotes sparsity.

107. Elastic Net combines:

a) Ridge and Lasso penalties

b) Ridge and OLS

c) Lasso and PCR

d) PCR and PLS

✅ Correct Answer: a) Ridge and Lasso penalties

📝 Explanation:

Useful when p > n or high correlation.
