138 Exploratory Data Analysis (EDA) MCQs

1. What is the primary goal of Exploratory Data Analysis (EDA)?

a) To build predictive models

b) To understand the data and uncover patterns

c) To deploy machine learning algorithms

d) To clean data for production

Correct Answer: b) To understand the data and uncover patterns

Explanation:

EDA aims to summarize main characteristics, often using visual methods, to reveal insights and detect anomalies before modeling.

2. Which statistic measures the central tendency of a dataset?

a) Variance

b) Mean

c) Skewness

d) Kurtosis

Correct Answer: b) Mean

Explanation:

The mean is the average value, representing the center of the data distribution.

3. A histogram is primarily used for visualizing

a) Relationships between two variables

b) Distribution of a single numerical variable

c) Categorical data frequencies

d) Time series trends

Correct Answer: b) Distribution of a single numerical variable

Explanation:

Histograms divide continuous data into bins to show frequency distribution.

4. What does the interquartile range (IQR) represent?

a) The difference between max and min

b) The middle 50% of the data

c) The standard deviation

d) The mode

Correct Answer: b) The middle 50% of the data

Explanation:

IQR = Q3 - Q1, measuring spread of the central half of the data.
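
As a minimal sketch of the formula above, assuming NumPy is available and using made-up sample values, the IQR can be computed from the 25th and 75th percentiles:

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Q1 and Q3 are the 25th and 75th percentiles
# (NumPy's default linear interpolation between data points).
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

print(q1, q3, iqr)
```

Other quantile conventions can give slightly different quartiles on small samples, so the exact IQR may vary between tools.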

5. In EDA, missing values are often identified using

a) Correlation matrices

b) Summary functions like isnull()

c) Scatter plots

d) Box plots

Correct Answer: b) Summary functions like isnull()

Explanation:

Functions like df.isnull().sum() count missing entries per column.
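
A short sketch of that call, assuming pandas and a small hypothetical DataFrame (the column names and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with a few missing entries.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "city": ["Oslo", "Lima", None, None],
    "income": [52000, 61000, 58000, 49000],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```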

6. Which plot is best for detecting outliers in numerical data?

a) Bar chart

b) Box plot

c) Line plot

d) Pie chart

Correct Answer: b) Box plot

Explanation:

Box plots display quartiles and whiskers, flagging points that lie more than 1.5 * IQR beyond the quartiles as potential outliers.
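
The 1.5 * IQR rule that box plots use can be sketched numerically. This is an illustrative example with invented data, assuming NumPy; it is not tied to any particular plotting library's implementation:

```python
import numpy as np

# Hypothetical measurements with one suspicious value.
values = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 102], dtype=float)

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Fences: anything outside [lower, upper] is flagged as a potential outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(lower, upper, outliers)
```

Here only the value 102 falls outside the fences; whether it is an error or a genuine observation still needs investigation.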

7. The correlation coefficient ranges from

a) 0 to 1

b) -1 to 1

c) -∞ to ∞

d) 0 to ∞

Correct Answer: b) -1 to 1

Explanation:

Pearson's r measures linear relationship strength and direction between -1 and 1.
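
To see the two ends of that range, here is a small sketch assuming NumPy, with made-up data that is perfectly linear in each direction:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2 * x + 1      # perfectly increasing linear relationship
y_down = -3 * x + 10  # perfectly decreasing linear relationship

# np.corrcoef returns the correlation matrix; the off-diagonal entry is r.
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print(r_up, r_down)
```

Up to floating-point rounding, the two coefficients are 1 and -1.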

8. Skewness measures the

a) Central tendency

b) Asymmetry of the distribution

c) Peakedness

d) Spread

Correct Answer: b) Asymmetry of the distribution

Explanation:

Positive skew indicates a longer right tail, negative skew a longer left tail; a symmetric distribution has zero skewness.

9. A scatter plot visualizes

a) Single variable distribution

b) Relationship between two variables

c) Categorical comparisons

d) Temporal sequences

Correct Answer: b) Relationship between two variables

Explanation:

Scatter plots plot points for two continuous variables to show correlation or trends.

10. What is the purpose of a pair plot in EDA?

a) To show univariate distributions only

b) To visualize pairwise relationships in multivariate data

c) To compute summary statistics

d) To handle missing values

Correct Answer: b) To visualize pairwise relationships in multivariate data

Explanation:

Pair plots create a matrix of scatter plots and histograms for all variable pairs.

11. Kurtosis describes the

a) Tail heaviness relative to normal distribution

b) Central tendency

c) Symmetry

d) Range

Correct Answer: a) Tail heaviness relative to normal distribution

Explanation:

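High kurtosis indicates heavier tails (more extreme values) than a normal distribution; low kurtosis indicates lighter tails. It is driven by the tails rather than by how peaked the center is.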

12. For categorical data, which visualization is appropriate?

a) Histogram

b) Bar chart

c) Scatter plot

d) Line plot

Correct Answer: b) Bar chart

Explanation:

Bar charts display frequencies or counts for discrete categories.

13. The standard deviation measures

a) Average value

b) Dispersion around the mean

c) Median spread

d) Mode frequency

Correct Answer: b) Dispersion around the mean

Explanation:

SD quantifies variability in the data's own units; it is the square root of the variance.

14. In EDA, data scaling is often checked for

a) Multicollinearity

b) Feature ranges and distributions

c) Missing values

d) Categorical encoding

Correct Answer: b) Feature ranges and distributions

Explanation:

Checking feature ranges (min-max) or z-scores shows whether features are on comparable scales, so that scale-sensitive methods do not favor large-valued features.

15. A heatmap is used to visualize

a) Single variable trends

b) Correlation matrices

c) Outlier positions

d) Missing data patterns

Correct Answer: b) Correlation matrices

Explanation:

Heatmaps color-code values in a matrix, ideal for correlations.

16. What indicates a normal distribution in EDA?

a) Skewness = 0, Kurtosis = 3

b) Skewness = 1, Kurtosis = 4

c) Mean > Median

d) High variance

Correct Answer: a) Skewness = 0, Kurtosis = 3

Explanation:

Normal distribution is symmetric (skew=0) with kurtosis=3 (mesokurtic).
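
As a rough numerical check of those two values, the sketch below computes skewness and kurtosis from their moment definitions on a simulated normal sample. It assumes NumPy; the helper names, seed, and sample size are arbitrary choices for illustration:

```python
import numpy as np

def skewness(x):
    """Third standardized moment (population form)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def kurtosis(x):
    """Fourth standardized moment (Pearson's definition, normal = 3)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4))

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
sample = rng.normal(loc=0.0, scale=1.0, size=200_000)

sample_skew = skewness(sample)
sample_kurt = kurtosis(sample)
print(sample_skew, sample_kurt)
```

The results should land close to 0 and 3. Note that SciPy and pandas report excess kurtosis (kurtosis minus 3) by default, so normal data shows a value near 0 there.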

17. Violin plots combine

a) Box plot and histogram

b) Box plot and density plot

c) Scatter and line

d) Bar and pie

Correct Answer: b) Box plot and density plot

Explanation:

Violin plots show distribution shape via kernel density and quartiles via box.

18. Multicollinearity is detected using

a) VIF (Variance Inflation Factor)

b) IQR

c) Mean

d) Mode

Correct Answer: a) VIF (Variance Inflation Factor)

Explanation:

A VIF above roughly 5 to 10 is commonly taken to indicate high multicollinearity among features.
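
One way to compute VIF by hand is to regress each feature on the others and take 1 / (1 - R^2). The sketch below does this with NumPy least squares on simulated data; the `vif` helper and the feature construction are assumptions for illustration, not a library API:

```python
import numpy as np

def vif(X):
    """Return the VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    n, p = X.shape
    result = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r_squared = 1 - residuals.var() / target.var()
        result.append(1 / (1 - r_squared))
    return np.array(result)

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)  # nearly a copy of x1
x3 = rng.normal(size=500)                  # unrelated feature

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

The two nearly duplicated features get very large VIFs, while the unrelated one stays close to 1.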

19. For time series EDA, which plot shows trends over time?

a) Scatter plot

b) Line plot

c) Bar chart

d) Histogram

Correct Answer: b) Line plot

Explanation:

Line plots connect data points chronologically to reveal patterns.

20. The mode is the value that

a) Sums to total

b) Occurs most frequently

c) Divides data in half

d) Measures spread

Correct Answer: b) Occurs most frequently

Explanation:

The mode is the most common category or value; a dataset can have more than one mode (multimodal).

21. In EDA, pivot tables are used for

a) Aggregating data by categories

b) Plotting distributions

c) Detecting outliers

d) Scaling features

Correct Answer: a) Aggregating data by categories

Explanation:

Pivot tables summarize data with rows, columns, and values for cross-tabulation.

22. A Q-Q plot assesses

a) Normality of distribution

b) Correlation strength

c) Missing values

d) Categorical balance

Correct Answer: a) Normality of distribution

Explanation:

Q-Q plots compare quantiles to theoretical normal; straight line indicates normality.

23. Handling outliers in EDA often involves

a) Ignoring them always

b) Capping or removing after investigation

c) Increasing dataset size

d) Changing data type

Correct Answer: b) Capping or removing after investigation

Explanation:

Outliers may be errors or insights; decisions based on domain knowledge.

24. Categorical variables are encoded in EDA using

a) One-hot encoding

b) Z-score normalization

c) Log transformation

d) Binning

Correct Answer: a) One-hot encoding

Explanation:

One-hot creates binary columns for categories to avoid ordinal assumptions.
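
A minimal sketch of one-hot encoding with pandas `get_dummies`, using an invented `color` column:

```python
import pandas as pd

# Hypothetical nominal feature with three categories.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Each category becomes its own indicator column.
encoded = pd.get_dummies(df, columns=["color"])
print(encoded)
```

Each row has exactly one active indicator, so no ordering between the colors is implied.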

25. The coefficient of variation (CV) is

a) SD / Mean * 100%

b) Mean / SD

c) Variance / Mean

d) IQR / Median

Correct Answer: a) SD / Mean * 100%

Explanation:

CV measures relative variability, useful for comparing dispersion across datasets.
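
The sketch below illustrates why CV helps when comparing variables on different scales. It uses only the standard library; the helper name and the two small datasets are made up, and the sample standard deviation is an assumed choice:

```python
import statistics

def coefficient_of_variation(values):
    """Return SD / mean * 100, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements in different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(round(cv_heights, 2), round(cv_weights, 2))
```

Both lists have the same standard deviation, but the weights vary more relative to their mean, so their CV is larger.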

26. Seaborn library in Python is popular for EDA because it

a) Handles missing data

b) Provides statistical visualizations

c) Trains models

d) Cleans text

Correct Answer: b) Provides statistical visualizations

Explanation:

Seaborn builds on Matplotlib for attractive, informative plots like heatmaps.

27. In bivariate analysis, a low correlation implies

a) Strong linear relationship

b) Weak or no linear relationship

c) Causation

d) Identical distributions

Correct Answer: b) Weak or no linear relationship

Explanation:

Correlation near 0 suggests little linear association between variables.

28. Data profiling in EDA includes

a) Only visualizations

b) Summary statistics and data quality checks

c) Model evaluation

d) Feature selection

Correct Answer: b) Summary statistics and data quality checks

Explanation:

Profiling overviews structure, types, missingness, and stats.

29. A kernel density estimate (KDE) plot shows

a) Discrete counts

b) Smooth probability density

c) Categorical proportions

d) Error bars

Correct Answer: b) Smooth probability density

Explanation:

KDE approximates continuous distribution using a kernel function.

30. For imbalanced classes in EDA, check

a) Class distribution via bar plots

b) Only numerical features

c) Correlation only

d) Outliers exclusively

Correct Answer: a) Class distribution via bar plots

Explanation:

Visualize target variable frequencies to identify imbalance.

31. The five-number summary includes

a) Min, Q1, Median, Q3, Max

b) Mean, Median, Mode, SD, Variance

c) Skewness, Kurtosis, Mean, Median, Mode

d) Range, IQR, Mean, SD, CV

Correct Answer: a) Min, Q1, Median, Q3, Max

Explanation:

Used in box plots to describe data spread without assuming distribution.

32. In EDA, feature engineering starts with

a) Creating new variables from existing

b) Training models

c) Hyperparameter tuning

d) Cross-validation

Correct Answer: a) Creating new variables from existing

Explanation:

Derive interactions, polynomials, or bins to capture patterns.

33. A lag plot in time series EDA detects

a) Seasonality

b) Autocorrelation

c) Trend only

d) Stationarity

Correct Answer: b) Autocorrelation

Explanation:

Plots value against its lagged version to show serial dependence.

34. Z-score for outlier detection is

a) (x - mean) / SD

b) x / mean

c) mean - x

d) SD / x

Correct Answer: a) (x - mean) / SD

Explanation:

|Z| > 3 often flags outliers assuming normality.
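
A small sketch of the |Z| > 3 rule, assuming NumPy and invented data with one extreme value:

```python
import numpy as np

# Hypothetical data: twenty ordinary readings and one extreme reading.
values = np.array([10] * 10 + [11] * 10 + [100], dtype=float)

# Standardize with the population mean and SD of the same data.
z_scores = (values - values.mean()) / values.std()
outliers = values[np.abs(z_scores) > 3]

print(outliers)
```

Because the extreme value itself inflates the mean and SD, very small samples may never produce |Z| > 3, so the threshold works best with a reasonable number of observations.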

35. Count plots are used for

a) Numerical distributions

b) Categorical frequencies

c) Bivariate relations

d) Multivariate analysis

Correct Answer: b) Categorical frequencies

Explanation:

Similar to bar charts but for single categorical variable counts.

36. In EDA, dimensionality reduction preview uses

a) PCA scree plot

b) Histogram

c) Box plot

d) Scatter plot

Correct Answer: a) PCA scree plot

Explanation:

Shows explained variance by components to decide retention.

37. Median is robust to

a) Outliers

b) Symmetry

c) Normality

d) Skewness

Correct Answer: a) Outliers

Explanation:

Unlike mean, median resists extreme values.

38. Joint plots in Seaborn combine

a) Histogram and bar

b) Scatter, marginal histograms, and correlation

c) Box and violin

d) Line and area

Correct Answer: b) Scatter, marginal histograms, and correlation

Explanation:

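Shows a bivariate plot (a scatter by default) with each variable's univariate distribution in the margins; recent Seaborn versions do not annotate the correlation automatically.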

39. Variance is the average of

a) Squared deviations from mean

b) Absolute deviations from median

c) Differences between max and min

d) Frequencies

Correct Answer: a) Squared deviations from mean

Explanation:

Variance = Σ(x_i - μ)^2 / n, measuring spread.
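
The formula can be checked directly with the standard library. The data below is made up; the sketch also shows the sample variance, which divides by n - 1 instead of n:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: average squared deviation from the mean.
mu = statistics.mean(data)
population_variance = sum((x - mu) ** 2 for x in data) / len(data)

# Sample variance (n - 1 in the denominator), for comparison.
sample_variance = statistics.variance(data)

print(population_variance, sample_variance)
```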

40. For text data in EDA, start with

a) Word clouds or n-gram frequencies

b) Numerical scaling

c) Correlation matrix

d) Time series decomposition

Correct Answer: a) Word clouds or n-gram frequencies

Explanation:

Visualize common terms and phrases in unstructured text.

41. Autocorrelation function (ACF) plot identifies

a) Cross-variable correlations

b) Serial correlations in time series

c) Outlier impacts

d) Missing patterns

Correct Answer: b) Serial correlations in time series

Explanation:

ACF shows correlation of series with its lags.
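
To make the idea concrete, the sketch below computes the sample autocorrelation at two lags for a synthetic seasonal series. It assumes NumPy; the `autocorr` helper is a hand-written illustration for lags of at least 1, not a library function:

```python
import numpy as np

def autocorr(series, lag):
    """Sample autocorrelation at a positive lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[lag:] * x[:-lag]) / np.sum(x ** 2))

# Synthetic series with a period of 20 time steps.
t = np.arange(200)
seasonal = np.sin(2 * np.pi * t / 20)

acf_lag20 = autocorr(seasonal, 20)  # one full period apart
acf_lag10 = autocorr(seasonal, 10)  # half a period apart
print(acf_lag20, acf_lag10)
```

The series is strongly positively correlated with itself one full period back and strongly negatively correlated half a period back, which is the pattern an ACF plot would show as alternating peaks.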

42. In EDA, data types include

a) Only numerical

b) Numerical, categorical, ordinal, datetime

c) Only categorical

d) Binary only

Correct Answer: b) Numerical, categorical, ordinal, datetime

Explanation:

Understanding types guides appropriate analysis and visualization.

43. A facet grid in EDA allows

a) Subplotting by categories

b) 3D plotting

c) Animation

d) Interactive zooming

Correct Answer: a) Subplotting by categories

Explanation:

Splits plots into a grid conditioned on variables for comparisons.

44. Pearson correlation assumes

a) Linear relationship and normality

b) Non-linear only

c) Categorical data

d) No assumptions

Correct Answer: a) Linear relationship and normality

Explanation:

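Pearson's r applies to continuous variables and captures only linear association; normality matters mainly for significance tests. Use Spearman's rank correlation as a non-parametric alternative.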

45. Box plot whiskers typically extend to

a) 1.5 * IQR from quartiles

b) Full range

c) Mean ± SD

d) Median only

Correct Answer: a) 1.5 * IQR from quartiles

Explanation:

Whiskers end at the most extreme data points within that distance; points beyond them are potential outliers.

46. In EDA, resampling checks

a) Stability of statistics

b) Model accuracy

c) Feature importance

d) Hyperparameters

Correct Answer: a) Stability of statistics

Explanation:

Bootstrap or jackknife assesses variability in estimates.

47. Cramér's V measures

a) Association between categorical variables

b) Linear correlation

c) Outlier distance

d) Distribution shape

Correct Answer: a) Association between categorical variables

Explanation:

Ranges 0-1; chi-square based for nominal data.
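
A hand-rolled sketch of Cramér's V from a contingency table, assuming NumPy. The `cramers_v` helper is illustrative, applies no continuity or bias correction, and assumes every row and column total is nonzero:

```python
import numpy as np

def cramers_v(table):
    """Cramér's V from a contingency table of counts (no correction applied)."""
    observed = np.asarray(table, dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    r, c = observed.shape
    return float(np.sqrt(chi2 / (n * min(r - 1, c - 1))))

v_perfect = cramers_v([[10, 0], [0, 10]])  # categories fully determine each other
v_none = cramers_v([[5, 5], [5, 5]])       # no association at all
print(v_perfect, v_none)
```

The two made-up tables sit at the ends of the 0 to 1 range.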

48. Log transformation in EDA is used to

a) Handle skewed data

b) Encode categories

c) Fill missings

d) Detect trends

Correct Answer: a) Handle skewed data

Explanation:

Reduces right skew, stabilizing variance.

49. A swarm plot displays

a) Points without overlap for categorical

b) Dense lines

c) Heat intensities

d) 3D surfaces

Correct Answer: a) Points without overlap for categorical

Explanation:

Shows individual data points spread to avoid stacking.

50. In multivariate EDA, parallel coordinates plot

a) High-dimensional data as lines

b) Pairwise scatters

c) Time sequences

d) Density contours

Correct Answer: a) High-dimensional data as lines

Explanation:

Each line represents an observation across normalized axes.

51. The range is

a) Max - Min

b) Q3 - Q1

c) Mean - Median

d) SD * 2

Correct Answer: a) Max - Min

Explanation:

Simplest spread measure, sensitive to outliers.

52. For geospatial data in EDA, use

a) Choropleth maps

b) Bar charts

c) Histograms

d) Line plots

Correct Answer: a) Choropleth maps

Explanation:

Color regions by data values for spatial patterns.

53. Chi-square test in EDA checks

a) Independence between categorical variables

b) Normality

c) Linearity

d) Homoscedasticity

Correct Answer: a) Independence between categorical variables

Explanation:

Null hypothesis: no association in contingency tables.

54. Strip plots show

a) Jittered points for categorical

b) Smooth densities

c) Box summaries

d) Violin shapes

Correct Answer: a) Jittered points for categorical

Explanation:

Adds random noise to positions to reveal overplotting.

55. In EDA, cross-validation previews

a) Model performance

b) Data leakage

c) Feature stability

d) All of the above

Correct Answer: d) All of the above

Explanation:

Early CV checks generalization before full modeling.

56. Quantile-quantile (Q-Q) plot compares

a) Sample to theoretical distribution

b) Two samples

c) Time lags

d) Spatial points

Correct Answer: a) Sample to theoretical distribution

Explanation:

Deviations from line indicate non-conformity.

57. For ordinal data, use

a) Ordinal encoding

b) One-hot

c) Frequency encoding

d) Target encoding

Correct Answer: a) Ordinal encoding

Explanation:

Assigns integers preserving order, unlike nominal.

58. Partial dependence plots explain

a) Feature effects in models

b) Data distributions

c) Correlations

d) Outliers

Correct Answer: a) Feature effects in models

Explanation:

Shows marginal effect of a feature on prediction.

59. In EDA, binning continuous data creates

a) Categorical intervals

b) Scaled values

c) Polynomials

d) Interactions

Correct Answer: a) Categorical intervals

Explanation:

Reduces noise and reveals patterns in discretized form.

60. Spearman's rank correlation is

a) Non-parametric for monotonic relations

b) Only for linear

c) For categorical

d) Ignores ranks

Correct Answer: a) Non-parametric for monotonic relations

Explanation:

Based on ranks, robust to non-normality.
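
Since Spearman's coefficient is Pearson's r applied to ranks, it can be sketched with pandas by ranking first. The data is invented: a monotonic but non-linear relationship.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])
y = x ** 3  # monotonic but clearly non-linear

# Pearson on the raw values, then Pearson on the ranks (= Spearman).
pearson_r = x.corr(y)
spearman_rho = x.rank().corr(y.rank())

print(pearson_r, spearman_rho)
```

Spearman's coefficient reaches 1 because the ordering is perfectly preserved, while Pearson's r stays below 1 because the relationship is not a straight line.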

61. A ridgeline plot visualizes

a) Overlapping density distributions

b) Scatter clusters

c) Bar comparisons

d) Line trends

Correct Answer: a) Overlapping density distributions

Explanation:

Stacks shifted KDEs for category comparisons.

62. EDA documents findings via

a) Notebooks like Jupyter

b) Only plots

c) Models

d) Databases

Correct Answer: a) Notebooks like Jupyter

Explanation:

Combines code, visuals, and narrative for reproducibility.

63. Theil's U measures

a) Asymmetric association for categorical

b) Symmetry in distributions

c) Linear strength

d) Outlier probability

Correct Answer: a) Asymmetric association for categorical

Explanation:

Uncertainty coefficient based on entropy.

64. In time series, decomposition separates

a) Trend, seasonality, residual

b) Mean, variance, skew

c) Lags, leads, current

d) Frequencies, amplitudes

Correct Answer: a) Trend, seasonality, residual

Explanation:

Additive or multiplicative models isolate components.

65. For high-cardinality categoricals in EDA, use

a) Frequency or target encoding

b) One-hot for all

c) Ignore

d) Bin to low

Correct Answer: a) Frequency or target encoding

Explanation:

Reduces dimensions while retaining information.

66. A contour plot shows

a) Density levels in 2D

b) 3D surfaces

c) Bar heights

d) Point clusters

Correct Answer: a) Density levels in 2D

Explanation:

Lines or colors indicate constant value regions.

67. EDA hypothesis generation leads to

a) Confirmatory analysis

b) Data collection

c) Model deployment

d) Reporting

Correct Answer: a) Confirmatory analysis

Explanation:

Patterns suggest testable hypotheses for further stats.

68. Mahalanobis distance detects

a) Multivariate outliers

b) Univariate means

c) Correlations

d) Skew

Correct Answer: a) Multivariate outliers

Explanation:

Accounts for covariance, unlike Euclidean.
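
The effect of accounting for covariance can be sketched with NumPy on simulated, strongly correlated data. The two test points and the helper name are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + rng.normal(scale=0.1, size=500)  # y closely follows x
data = np.column_stack([x, y])

mean = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))

def mahalanobis(point):
    """Distance from the sample mean, scaled by the inverse covariance."""
    diff = np.asarray(point, dtype=float) - mean
    return float(np.sqrt(diff @ cov_inv @ diff))

d_along = mahalanobis([2.0, 2.0])     # follows the correlation pattern
d_against = mahalanobis([2.0, -2.0])  # breaks the correlation pattern
print(d_along, d_against)
```

Both points are roughly equally far from the center in Euclidean terms, but the one that contradicts the x-y relationship gets a much larger Mahalanobis distance.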

69. In EDA, groupby operations compute

a) Aggregates by categories

b) Global stats

c) Plots

d) Encodings

Correct Answer: a) Aggregates by categories

Explanation:

Like mean by group in pandas for subgroup analysis.
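
A minimal pandas sketch of a per-group mean, using an invented DataFrame:

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column.
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "dog", "cat"],
    "weight_kg": [4.0, 20.0, 5.0, 30.0, 6.0],
})

# Split by category, then aggregate the numeric column within each group.
mean_weight = df.groupby("species")["weight_kg"].mean()
print(mean_weight)
```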

70. A hexbin plot is a

a) 2D histogram with hexagons

b) Line with error bands

c) Categorical strip

d) Density violin

Correct Answer: a) 2D histogram with hexagons

Explanation:

Efficient for dense scatter data to show counts.

71. EDA for regression checks

a) Linearity, homoscedasticity via residuals

b) Class balance

c) Clusters

d) Survival rates

Correct Answer: a) Linearity, homoscedasticity via residuals

Explanation:

A scatter of residuals vs. fitted values checks these assumptions: no visible pattern and a constant spread support linearity and homoscedasticity.

72. Polychoric correlation for

a) Ordinal variables

b) Continuous only

c) Binary

d) Spatial

Correct Answer: a) Ordinal variables

Explanation:

Assumes underlying continuous latent variables.

73. In EDA, melting data changes

a) Wide to long format

b) Long to wide

c) Categorical to numerical

d) Time to static

Correct Answer: a) Wide to long format

Explanation:

Facilitates plotting multiple series.
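
A short sketch of melting with pandas, assuming a made-up wide table with one column per year:

```python
import pandas as pd

# Wide format: one row per city, one column per year.
wide = pd.DataFrame({
    "city": ["Oslo", "Lima"],
    "2022": [10, 20],
    "2023": [12, 24],
})

# Long format: one row per (city, year) pair.
long = wide.melt(id_vars="city", var_name="year", value_name="sales")
print(long)
```

In the long layout, `year` becomes an ordinary column that plotting libraries can map to an axis or a color.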

74. SHAP values in EDA preview

a) Feature contributions post-model

b) Data cleaning

c) Visualization

d) Encoding

Correct Answer: a) Feature contributions post-model

Explanation:

Explains individual predictions for interpretability.

75. For survival data EDA, Kaplan-Meier estimates

a) Survival function

b) Hazard rates

c) Cure fractions

d) All of the above

Correct Answer: a) Survival function

Explanation:

Non-parametric curve for time-to-event analysis.

76. A sunburst plot visualizes

a) Hierarchical data

b) Network graphs

c) Time flows

d) Geospatial

Correct Answer: a) Hierarchical data

Explanation:

Nested rings show proportions in categories.

77. In EDA, Levene's test checks

a) Equal variances

b) Normality

c) Independence

d) Linearity

Correct Answer: a) Equal variances

Explanation:

Robust to non-normality for ANOVA assumptions.

78. Treemap displays

a) Proportional areas for categories

b) Lines for trends

c) Points for relations

d) Densities for shapes

Correct Answer: a) Proportional areas for categories

Explanation:

Rectangles sized by value in hierarchical layout.

79. EDA for clustering previews with

a) Elbow method on k-means

b) ROC curves

c) Confusion matrices

d) Lift charts

Correct Answer: a) Elbow method on k-means

Explanation:

Plots inertia vs k to suggest optimal clusters.

80. Biserial correlation for

a) Continuous and binary variables

b) Two continuous

c) Two binary

d) Ordinal-binary

Correct Answer: a) Continuous and binary variables

Explanation:

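Use the point-biserial coefficient when the binary variable is a true dichotomy; the biserial coefficient assumes a dichotomized underlying continuous variable.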

81. In EDA, pivot_longer in R or melt in Python

a) Reshapes wide to long

b) Aggregates

c) Filters

d) Joins

Correct Answer: a) Reshapes wide to long

Explanation:

Prepares tidy data for analysis.

82. LIME explains

a) Local model interpretations

b) Global correlations

c) Data summaries

d) Outlier reasons

Correct Answer: a) Local model interpretations

Explanation:

Approximates black-box models locally with interpretable ones.

83. For network data EDA, use

a) Graph visualizations like node-link

b) Bar charts

c) Histograms

d) Scatter only

Correct Answer: a) Graph visualizations like node-link

Explanation:

Shows nodes and edges for connectivity patterns.

84. Anderson-Darling test assesses

a) Goodness-of-fit to distribution

b) Equal means

c) Variances

d) Independence

Correct Answer: a) Goodness-of-fit to distribution

Explanation:

Sensitive to tail deviations from normal.

85. Sankey diagram illustrates

a) Flow between categories

b) Proportions

c) Trends

d) Relations

Correct Answer: a) Flow between categories

Explanation:

Width proportional to magnitude in multi-stage processes.

86. EDA for classification includes

a) Class separation via LDA plot

b) Survival curves

c) Hazard functions

d) Time decompositions

Correct Answer: a) Class separation via LDA plot

Explanation:

Linear Discriminant Analysis projects for discriminability.

87. Tetrachoric correlation assumes

a) Binary data from underlying continuous

b) Ordinal only

c) Continuous

d) Categorical nominal

Correct Answer: a) Binary data from underlying continuous

Explanation:

For dichotomous variables implying latent scale.

88. In EDA, dcast or pivot in tools

a) Long to wide reshaping

b) Wide to long

c) Sorting

d) Filtering

Correct Answer: a) Long to wide reshaping

Explanation:

Spreads variables for certain summaries.

89. Counterfactual explanations in EDA show

a) What-if scenarios for predictions

b) Data distributions

c) Correlations

d) Outliers

Correct Answer: a) What-if scenarios for predictions

Explanation:

Alters features to see outcome changes.

90. For audio data EDA, spectrograms visualize

a) Frequency over time

b) Waveforms

c) Amplitudes only

d) Durations

Correct Answer: a) Frequency over time

Explanation:

2D representation of signal spectrum.

91. Kolmogorov-Smirnov test compares

a) Distributions

b) Means

c) Variances

d) Correlations

Correct Answer: a) Distributions

Explanation:

Maximum difference in empirical CDFs.

92. Alluvial plot shows

a) Changes in category proportions over stages

b) Static hierarchies

c) Networks

d) Maps

Correct Answer: a) Changes in category proportions over stages

Explanation:

Like Sankey but for categorical shifts.

93. In EDA for anomaly detection, isolation forest previews

a) Anomaly scores

b) Class probabilities

c) Regressions

d) Clusters

Correct Answer: a) Anomaly scores

Explanation:

Path length in trees indicates isolation.

94. Phi coefficient for

a) 2x2 contingency association

b) Multiple categories

c) Continuous

d) Ordinal

Correct Answer: a) 2x2 contingency association

Explanation:

Pearson's r for binary variables.

95. Gather function in tidyverse

a) Wide to long

b) Long to wide

c) Summarize

d) Mutate

Correct Answer: a) Wide to long

Explanation:

Collapses several columns into key-value rows, following tidy data principles; tidyr now recommends pivot_longer() for this.

96. Anchored explanations focus on

a) Comparison to baseline prediction

b) Global averages

c) Local densities

d) Outlier distances

Correct Answer: a) Comparison to baseline prediction

Explanation:

Highlights feature impacts relative to reference.

97. For image data EDA, use

a) Pixel histograms or t-SNE embeddings

b) Bar charts

c) Line plots

d) Scatter only

Correct Answer: a) Pixel histograms or t-SNE embeddings

Explanation:

Visualize color distributions or latent spaces.

98. Shapiro-Wilk test for

a) Normality

b) Equal variances

c) Independence

d) Homoscedasticity

Correct Answer: a) Normality

Explanation:

Powerful for small samples.

99. Chord diagram visualizes

a) Interconnections between categories

b) Flows

c) Hierarchies

d) Trends

Correct Answer: a) Interconnections between categories

Explanation:

Arc segments with linking ribbons.

100. EDA for recommendation systems includes

a) User-item matrix sparsity

b) Survival analysis

c) Time series only

d) Geospatial clustering

Correct Answer: a) User-item matrix sparsity

Explanation:

Percentage of missing interactions.
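
Sparsity can be sketched as the share of empty cells in the user-item matrix. The example assumes NumPy and a tiny invented ratings matrix in which 0 means "no interaction":

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 = no interaction.
ratings = np.array([
    [5, 0, 0, 1],
    [0, 0, 3, 0],
    [4, 0, 0, 0],
])

# Fraction of user-item pairs with no recorded interaction.
sparsity = 1 - np.count_nonzero(ratings) / ratings.size
print(f"{sparsity:.1%}")
```

Real recommendation datasets are usually far sparser than this toy matrix.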

101. Contingency coefficient for

a) Categorical association beyond 2x2

b) Binary only

c) Continuous

d) Ordinal

Correct Answer: a) Categorical association beyond 2x2

Explanation:

Chi-square based and symmetric; its maximum is below 1 and depends on the table size.

102. Spread function in R

a) Long to wide

b) Wide to long

c) Group by

d) Filter

Correct Answer: a) Long to wide

Explanation:

Pivots values into columns; tidyr now recommends pivot_wider() for this.

103. Prototype-based explanations use

a) Nearest neighbors as exemplars

b) Rule sets

c) Trees

d) Linear models

Correct Answer: a) Nearest neighbors as exemplars

Explanation:

Shows similar cases for context.

104. For video data EDA, frame sampling and

a) Optical flow analysis

b) Static images

c) Audio only

d) Text overlays

Correct Answer: a) Optical flow analysis

Explanation:

Motion vectors between frames.

105. Jarque-Bera test combines

a) Skewness and kurtosis for normality

b) Means and variances

c) Correlations

d) Outliers

Correct Answer: a) Skewness and kurtosis for normality

Explanation:

Omnibus test against normal.

106. Parallel sets plot for

a) Multi-category flows like alluvial

b) Single variables

c) Continuous

d) Spatial

Correct Answer: a) Multi-category flows like alluvial

Explanation:

Ribbon bands for proportions.

107. In reinforcement learning EDA, check

a) State-action distributions

b) Class labels

c) Targets

d) Features only

Correct Answer: a) State-action distributions

Explanation:

Exploration coverage in environments.

108. Lambda coefficient for

a) Asymmetric nominal association

b) Symmetric

c) Ordinal

d) Continuous

Correct Answer: a) Asymmetric nominal association

Explanation:

Proportional reduction in prediction error when one variable is used to predict the other.

109. Unpivot in data tools

a) Wide to long

b) Long to wide

c) Aggregate

d) Join

Correct Answer: a) Wide to long

Explanation:

Standardizes column structure.

110. Subgroup explanations in EDA target

a) Specific data slices

b) Global model

c) Individual points

d) Random samples

Correct Answer: a) Specific data slices

Explanation:

Tailored insights for segments.

111. For graph data EDA beyond basics, centrality measures like

a) Degree, betweenness

b) Means

c) Variances

d) Skews

Correct Answer: a) Degree, betweenness

Explanation:

Node importance in networks.

112. D'Agostino's K-squared test for

a) Normality via skew and kurtosis

b) Equal variances

c) Independence

d) Trends

Correct Answer: a) Normality via skew and kurtosis

Explanation:

Asymptotic chi-square.

113. Mosaic plot for

a) Categorical contingency visualization

b) Continuous densities

c) Time series

d) Spatial

Correct Answer: a) Categorical contingency visualization

Explanation:

Tiled bars for associations.

114. EDA in causal inference previews with

a) DAGs for confounding

b) ROC

c) MSE

d) R2

Correct Answer: a) DAGs for confounding

Explanation:

Directed Acyclic Graphs map relationships.

115. Uncertainty coefficient for

a) Nominal predictive association

b) Ordinal

c) Binary

d) Continuous

Correct Answer: a) Nominal predictive association

Explanation:

Entropy-based, asymmetric.

116. Pivot_table in pandas for

a) Multi-index aggregation

b) Simple sums

c) Plots

d) Encodes

Correct Answer: a) Multi-index aggregation

Explanation:

Flexible crosstabs with functions.

117. Contrastive explanations compare

a) Prediction to counterfactual

b) Features globally

c) Data points

d) Models

Correct Answer: a) Prediction to counterfactual

Explanation:

Minimal changes for outcome flip.

118. For tabular data EDA, automated tools like

a) Pandas Profiling

b) Manual plots only

c) Models first

d) Cleaning only

Correct Answer: a) Pandas Profiling

Explanation:

Generates comprehensive HTML reports; the package has since been renamed ydata-profiling.

119. Lilliefors test is a

a) KS variant without specified distribution

b) T-test

c) Chi-square

d) F-test

Correct Answer: a) KS variant without specified distribution

Explanation:

For normality, parameters estimated from data.

120. Dot plot alternative to

a) Bar for small categories

b) Line for time

c) Scatter for relations

d) Histogram for continuous

Correct Answer: a) Bar for small categories

Explanation:

Points on axis for frequencies.

121. In federated learning EDA, focus on

a) Local dataset summaries without sharing

b) Centralized data

c) Full merges

d) Global models only

Correct Answer: a) Local dataset summaries without sharing

Explanation:

Preserves privacy in distributed settings.

122. Goodman-Kruskal gamma for

a) Ordinal association

b) Nominal

c) Binary

d) Continuous

Correct Answer: a) Ordinal association

Explanation:

Based on concordant and discordant pairs; tied pairs are ignored (Kendall's tau-b is the variant that adjusts for ties).

123. Reshape in Python for

a) Data format changes

b) Calculations

c) Visuals

d) Storage

Correct Answer: a) Data format changes

Explanation:

Stack, melt, pivot for tidy data.

124. Input gradient explanations use

a) Model derivatives w.r.t. input

b) Averages

c) Neighbors

d) Rules

Correct Answer: a) Model derivatives w.r.t. input

Explanation:

Sensitivity of prediction to features.

125. For sensor data EDA, time-frequency analysis like

a) Wavelet transforms

b) Simple averages

c) Static plots

d) Categorical bars

Correct Answer: a) Wavelet transforms

Explanation:

Localizes events in time and scale.

126. Cramér-von Mises test for

a) Goodness-of-fit

b) Means

c) Variances

d) Correlations

Correct Answer: a) Goodness-of-fit

Explanation:

Integral of squared CDF differences.

127. Streamgraph for

a) Stacked area over time

b) Static proportions

c) Networks

d) Maps

Correct Answer: a) Stacked area over time

Explanation:

Centered for flowing appearance.

128. EDA in A/B testing checks

a) Group balance and power

b) Model fits

c) Clusters

d) Outliers only

Correct Answer: a) Group balance and power

Explanation:

Pre-test for valid comparisons.

129. Kappa coefficient for

a) Inter-rater agreement beyond chance

b) Association strength

c) Prediction error

d) Distribution fit

Correct Answer: a) Inter-rater agreement beyond chance

Explanation:

For categorical ratings.
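
As an illustration, the sketch below computes Cohen's kappa for two raters in plain Python. Treating "kappa" as Cohen's two-rater version is an assumption here, and the ratings are invented:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per label.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
rater_b = ["yes", "yes", "no", "yes", "yes", "no", "no", "no", "yes", "no"]

kappa = cohens_kappa(rater_a, rater_b)
print(kappa)
```

The raters agree on 8 of 10 items, but half of that agreement would be expected by chance, which gives a kappa of about 0.6.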

130. Tidy data principles in EDA ensure

a) One variable per column, observation per row

b) Wide formats

c) Mixed types

d) Duplicates

Correct Answer: a) One variable per column, observation per row

Explanation:

Facilitates manipulation and analysis.

131. Guided backpropagation for

a) Salient feature visualization in images

b) Tabular data

c) Text

d) Audio

Correct Answer: a) Salient feature visualization in images

Explanation:

Modifies gradients for positive relevance.

132. For financial time series EDA, candlestick charts show

a) OHLC prices

b) Volumes only

c) Returns

d) Volatilities

Correct Answer: a) OHLC prices

Explanation:

Each candle shows the Open, High, Low, and Close for one period, revealing the price range and direction.

133. Anderson-Rubin test in

a) Instrumental variable strength

b) Normality

c) Equal means

d) Variances

Correct Answer: a) Instrumental variable strength

Explanation:

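Used in instrumental-variable regression; its inference on the structural coefficient remains valid even when instruments are weak.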

134. Waffle chart for

a) Proportional parts like pie alternative

b) Flows

c) Hierarchies

d) Relations

Correct Answer: a) Proportional parts like pie alternative

Explanation:

Grid squares for percentages.

135. EDA for NLP includes

a) Token length distributions, vocab size

b) Numerical correlations

c) Image histograms

d) Audio spectra

Correct Answer: a) Token length distributions, vocab size

Explanation:

Text-specific summaries.

136. Yule's Q for

a) 2x2 table association

b) Multiple categories

c) Ordinal

d) Continuous

Correct Answer: a) 2x2 table association

Explanation:

Dichotomous measure from odds ratio.
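
A minimal sketch of the calculation for a 2x2 table [[a, b], [c, d]], with invented counts. The function name is illustrative, and it assumes ad + bc is nonzero:

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table [[a, b], [c, d]]: (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical counts: odds ratio = (30 * 30) / (10 * 10) = 9.
q = yules_q(30, 10, 10, 30)
print(q)
```

With an odds ratio of 9, Q equals (9 - 1) / (9 + 1) = 0.8; a table with no association gives Q = 0.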

137. Stack in pandas for

a) MultiIndex to columns

b) Columns to rows

c) Rows to columns

d) Filtering

Correct Answer: b) Columns to rows

Explanation:

Longer format from wide.
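
A short pandas sketch of `stack`, using a made-up table of scores:

```python
import pandas as pd

# Wide format: one row per student, one column per subject.
wide = pd.DataFrame(
    {"math": [90, 70], "art": [80, 85]},
    index=["ann", "bob"],
)

# stack() moves the column labels into an inner index level,
# giving one row per (student, subject) pair.
stacked = wide.stack()
print(stacked)
```

The result is a Series with a two-level index, and `unstack()` reverses the operation.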

138. Layer-wise relevance propagation for

a) Deep net heatmaps

b) Shallow models

c) Trees

d) Rules

Correct Answer: a) Deep net heatmaps

Explanation:

Backpropagates relevance scores.