106 challenging multiple-choice questions on descriptive statistics, inferential methods, and time series analysis. Inspired by real data science and analytics interview questions from FAANG companies, consulting firms, and quant roles.
1. Which measure of central tendency is most affected by extreme outliers?
2. In a positively skewed distribution, the correct order of mean, median, and mode is:
3. The interquartile range (IQR) is calculated as:
4. Which of the following is NOT a measure of dispersion?
5. The empirical rule applies to data that are approximately:
6. Pearson’s correlation coefficient is undefined when:
7. Boxplot whiskers typically extend to:
8. The coefficient of variation (CV) is useful for comparing:
9. Z-score of a value x is given by:
10. Chebyshev’s theorem guarantees at least what proportion within k standard deviations (k>1)?
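Several of the measures in questions 1-10 can be checked in a few lines. A minimal sketch with NumPy, using an invented dataset (the numbers are illustrative only):

```python
import numpy as np

# Invented toy data for illustration
data = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

mean = data.mean()                     # 5.0
std = data.std(ddof=0)                 # population SD = 2.0

# Q9: z-score of a value x is (x - mean) / std
z = (9 - mean) / std                   # 2.0

# Q3: IQR = Q3 - Q1
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Q8: coefficient of variation = std / mean (unitless, so it can
# compare dispersion across variables with different scales)
cv = std / mean                        # 0.4
```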
11. The sampling distribution of the sample mean becomes approximately normal when n ≥ 30 due to:
12. A Type I error occurs when we:
13. The p-value is the probability of obtaining a test statistic at least as extreme as observed, assuming:
14. For a two-tailed z-test at α = 0.05, the critical values are:
15. The standard error of the mean is σ / √n; when σ is unknown we use:
16. Confidence interval width is proportional to:
17. Power of a test = 1 – β, where β is:
18. For proportions, the standard error is √[p(1–p)/n]; for confidence intervals we often use:
19. The chi-square test for independence tests whether:
20. ANOVA tests equality of:
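The mechanics behind questions 13-16 (p-value, standard error, CI width) can be sketched as follows, assuming SciPy is available; the sample numbers are invented:

```python
import math
from scipy.stats import norm

# Invented example: test H0: mu = 50 with known sigma
n, xbar, sigma, mu0 = 100, 52.0, 10.0, 50.0

se = sigma / math.sqrt(n)              # Q15: standard error of the mean
z = (xbar - mu0) / se                  # test statistic

# Q13: two-tailed p-value, computed assuming H0 is true
p_value = 2 * (1 - norm.cdf(abs(z)))

# Q14/Q16: 95% CI uses z* ≈ 1.96; width scales with sigma / sqrt(n)
z_crit = norm.ppf(0.975)
ci = (xbar - z_crit * se, xbar + z_crit * se)
```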
21. A time series is stationary if:
22. The ACF at lag k measures:
23. In an AR(1) model y_t = φ y_{t–1} + ε_t, |φ| < 1 ensures:
24. Differencing a series once removes:
25. The PACF of an MA(1) process cuts off after:
26. ADF test null hypothesis is:
27. In SARIMA(p,d,q)(P,D,Q)s, 's' denotes:
28. AIC penalizes models for:
29. White noise has ACF values approximately:
30. Holt-Winters additive model is suitable when seasonal fluctuations:
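Questions 23-24 can be illustrated with a small simulation, no time series library required; the AR(1) coefficient and trend slope below are arbitrary choices:

```python
import numpy as np

# Q23: simulate a stationary AR(1) with |phi| < 1
rng = np.random.default_rng(42)
phi, n = 0.7, 2000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

# Sample autocorrelation at lag k; for AR(1), ACF(1) should be near phi
def acf(x, k):
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

rho1 = acf(y, 1)

# Q24: first differencing removes a (linear) trend
trend_series = y + 0.05 * np.arange(n)
diffed = np.diff(trend_series)   # the deterministic trend becomes a constant
```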
31. The variance of a dataset {2, 4, 6, 8, 10} is:
32. For the data {1, 3, 3, 6}, the mode is:
33. Skewness coefficient > 0 indicates:
34. Percentile rank of the median is:
35. Geometric mean is preferred for:
36. Covariance of a variable with itself equals its:
37. Spearman’s rank correlation is based on:
38. The 95% CI for μ when n=25, x̄=50, s=10 (t-critical ≈ 2.064) is:
39. A test statistic z = 2.5, α = 0.01 two-tailed; decision is:
40. Minimum sample size for proportion CI with margin of error E=0.03, p*=0.5, z=1.96:
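Questions 31, 38, and 40 are pure arithmetic, worked out below with the standard library only (note Q31 depends on whether the population or sample formula is intended):

```python
import math

# Q31: variance of {2, 4, 6, 8, 10}
data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)                   # 6.0
ss = sum((x - mean) ** 2 for x in data)        # 40.0
pop_var = ss / len(data)                       # 8.0  (divide by n)
samp_var = ss / (len(data) - 1)                # 10.0 (divide by n-1)

# Q38: 95% CI for mu with n=25, xbar=50, s=10, t* ≈ 2.064
margin = 2.064 * 10 / math.sqrt(25)            # 4.128
ci = (50 - margin, 50 + margin)                # (45.872, 54.128)

# Q40: minimum n for a proportion CI with E=0.03, p*=0.5, z=1.96
n_min = math.ceil((1.96 ** 2 * 0.5 * 0.5) / 0.03 ** 2)   # 1068
```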
41. For a paired t-test, the degrees of freedom are:
42. F-test is used to compare:
43. Mann-Whitney U tests difference in:
44. Kruskal-Wallis is the non-parametric version of:
45. The Durbin-Watson statistic near 2 indicates:
46. In exponential smoothing, α close to 1 gives more weight to:
47. An AR(2) model is indicated when the PACF shows significant spikes at:
48. Seasonal differencing of period 12 is denoted as:
49. Ljung-Box test checks for:
50. KPSS test null hypothesis is:
51. In ARIMA, 'I' stands for:
52. Forecasts from a random walk model are:
53. Granger causality tests whether past values of X improve prediction of:
54. Variance inflation factor (VIF) > 10 suggests:
55. The coefficient of determination R² represents:
56. In simple linear regression, the least squares slope b1 equals:
57. Residual standard error estimates:
58. For logistic regression, the link function is:
59. The odds ratio exp(β) = 1.5 means:
60. Poisson regression is suitable for:
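The slope formula and R² from questions 55-56 can be verified directly on toy data (the x/y values below are invented):

```python
import numpy as np

# Invented toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Q56: b1 = Sxy / Sxx = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Q55: R^2 = 1 - SSE/SST, the proportion of variance in y explained
resid = y - (b0 + b1 * x)
r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
```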
61. In survival analysis, Kaplan-Meier estimates:
62. Cox proportional hazards assumption is violated if:
63. Bayesian inference updates beliefs using:
64. Conjugate prior for normal mean (known variance) is:
65. MCMC methods are used to:
66. Bootstrapping estimates sampling distribution by:
67. Percentile bootstrap CI uses:
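Questions 66-67 can be demonstrated in a few lines; the sample and replication count below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = np.array([5.0, 7.0, 8.0, 9.0, 12.0, 14.0, 15.0, 20.0])

# Q66: resample with replacement many times, recording the statistic
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(5000)
])

# Q67: the percentile bootstrap CI reads off quantiles of that
# empirical sampling distribution
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```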
68. Cross-validation is primarily used to assess:
69. The bias-variance tradeoff implies that overly complex models tend to:
70. Principal Component Analysis (PCA) maximizes:
71. The scree plot helps determine number of PCs by looking for:
72. In k-means clustering, the objective is to minimize:
73. Silhouette coefficient ranges from:
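For questions 70-71, PCA reduces to an eigendecomposition of the covariance matrix. A sketch on simulated correlated data (the mixing matrix is an arbitrary choice):

```python
import numpy as np

# Build a correlated 2-D cloud from independent standard normals
rng = np.random.default_rng(7)
z = rng.normal(size=(200, 2))
X = z @ np.array([[3.0, 0.0], [1.0, 0.5]])

# Q70: PCA directions are eigenvectors of the covariance matrix;
# the leading eigenvector maximizes projected variance
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order

# Q71: a scree plot graphs these eigenvalues; the share of variance
# captured by the first PC is its eigenvalue over the total
explained = eigvals[-1] / eigvals.sum()
```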
74. A contingency table with all expected frequencies ≥ 5 is required for:
75. McNemar’s test is used for:
76. The central limit theorem requires random samples that are:
77. Degrees of freedom for two-sample t-test (unequal variances) is approximately:
78. Effect size Cohen’s d = (μ1 – μ2) / σ; d = 0.8 is considered:
79. Multiple R² in regression can be artificially inflated by:
80. Homoscedasticity means residuals have:
81. Durbin-Watson values < 1 typically indicate:
82. In time series decomposition, the remainder after trend and seasonal removal is:
83. STL decomposition stands for:
84. ACF of a seasonal series with period 12 shows spikes at:
85. Box-Cox transformation is applied to stabilize:
86. Over-differencing a stationary series introduces:
87. The optimal ARIMA model often has residuals with:
88. In VAR models, each variable is modeled as a function of:
89. Impulse response function traces effect of a shock in one variable on:
90. Cointegration means two non-stationary series have:
91. Johansen test is used to detect:
92. ARCH model tests for:
93. GARCH(1,1) models volatility as:
94. The Jarque-Bera test assesses:
95. Shapiro-Wilk test null hypothesis is:
96. Levene’s test checks for:
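The diagnostic tests in questions 94-96 are available in SciPy; a sketch on simulated data (distributions and sample sizes chosen only to make the contrasts visible):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Q94: Jarque-Bera combines sample skewness and kurtosis; H0 = normality.
# An exponential sample is strongly skewed, so its statistic is far larger.
jb_stat_normal, jb_p_normal = stats.jarque_bera(rng.normal(size=500))
jb_stat_skewed, jb_p_skewed = stats.jarque_bera(rng.exponential(size=500))

# Q96: Levene's test; H0 = equal variances across groups
lev_stat, lev_p = stats.levene(rng.normal(scale=1.0, size=200),
                               rng.normal(scale=3.0, size=200))
```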
97. The Bonferroni correction adjusts α by:
98. Holm’s method is a:
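Questions 97-98 differ only in the threshold each p-value is tested against; the p-values below are made up to show Holm rejecting where Bonferroni does not:

```python
alpha = 0.05
pvals = [0.001, 0.012, 0.021, 0.040]   # invented, already sorted
m = len(pvals)

# Q97: Bonferroni tests every p-value against alpha / m (here 0.0125)
bonferroni = [p < alpha / m for p in pvals]

# Q98: Holm steps down: the i-th smallest p-value is tested against
# alpha / (m - i); once a test fails, all later ones fail too
holm, failed = [], False
for i, p in enumerate(sorted(pvals)):
    failed = failed or not (p < alpha / (m - i))
    holm.append(not failed)
```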
99. Tukey’s HSD test is used after ANOVA to compare:
100. Dunnett’s test compares:
101. The Q-Q plot assesses normality by plotting:
102. A leverage point in regression has high:
103. Cook’s distance measures:
104. The partial F-test in regression compares:
105. Ridge regression adds penalty:
106. Lasso can perform variable selection because it:
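The shrinkage in questions 105-106 can be seen from ridge's closed form, beta = (X'X + lambda*I)^(-1) X'y. A sketch on simulated data (true coefficients and lambda are arbitrary; intercept omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
beta_true = np.array([2.0, 0.0, -1.0])
y = X @ beta_true + 0.1 * rng.normal(size=50)

# Q105: ridge adds an L2 penalty lambda * ||beta||^2
lam = 5.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge shrinks the coefficient vector toward zero but rarely makes any
# entry exactly zero; lasso's L1 penalty can, which is why lasso (Q106)
# performs variable selection.
```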