130 Exploratory Data Analysis (EDA) MCQs

These MCQs cover the fundamentals of Exploratory Data Analysis, including data summarization, visualization techniques, handling anomalies, and inferring patterns from datasets. They are ideal for data analysts and scientists who want to reinforce EDA practices using statistical and graphical methods.

1. What is the primary goal of Exploratory Data Analysis (EDA)?

a) To build predictive models
b) To understand the data and uncover patterns
c) To deploy machine learning algorithms
d) To clean data for production
Correct Answer: b) To understand the data and uncover patterns
📝 Explanation:
EDA aims to summarize main characteristics, often using visual methods, to reveal insights and detect anomalies before modeling.

2. Which statistic measures the central tendency of a dataset?

a) Variance
b) Mean
c) Skewness
d) Kurtosis
Correct Answer: b) Mean
📝 Explanation:
The mean is the average value, representing the center of the data distribution.

3. A histogram is primarily used for visualizing

a) Relationships between two variables
b) Distribution of a single numerical variable
c) Categorical data frequencies
d) Time series trends
Correct Answer: b) Distribution of a single numerical variable
📝 Explanation:
Histograms divide continuous data into bins to show frequency distribution.

4. What does the interquartile range (IQR) represent?

a) The difference between max and min
b) The middle 50% of the data
c) The standard deviation
d) The mode
Correct Answer: b) The middle 50% of the data
📝 Explanation:
IQR = Q3 - Q1, measuring spread of the central half of the data.

5. In EDA, missing values are often identified using

a) Correlation matrices
b) Summary functions like isnull()
c) Scatter plots
d) Box plots
Correct Answer: b) Summary functions like isnull()
📝 Explanation:
Functions like df.isnull().sum() count missing entries per column.
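
A minimal pandas sketch of this check. The DataFrame and its column names are made up for illustration; `isnull().mean()` is added to show the share of missing values alongside the counts.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; None / NaN mark missing entries
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Paris", "Lyon", None, "Nice"],
    "income": [52000, 61000, np.nan, np.nan],
})

missing_counts = df.isnull().sum()         # number of missing entries per column
missing_share = df.isnull().mean() * 100   # percentage missing per column
print(missing_counts)
print(missing_share.round(1))
```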

6. Which plot is best for detecting outliers in numerical data?

a) Bar chart
b) Box plot
c) Line plot
d) Pie chart
Correct Answer: b) Box plot
📝 Explanation:
Box plots display the quartiles and whiskers, and draw points lying more than 1.5 * IQR beyond the quartiles individually as potential outliers.
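
The 1.5 * IQR rule behind box-plot outliers can be sketched in NumPy as below. The data are made up, and quartile conventions differ slightly between libraries (NumPy's default linear interpolation is assumed here), so plotting tools may flag marginally different points.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return values outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return x[(x < lower) | (x > upper)]

data = [12, 14, 14, 15, 16, 17, 18, 19, 45]   # hypothetical sample
print(iqr_outliers(data))   # Q1 = 14, Q3 = 18, fences at 8 and 24
```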

7. The correlation coefficient ranges from

a) 0 to 1
b) -1 to 1
c) -∞ to ∞
d) 0 to ∞
Correct Answer: b) -1 to 1
📝 Explanation:
Pearson's r measures linear relationship strength and direction between -1 and 1.
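
A quick NumPy illustration of the two extremes and the midpoint of the range, using made-up data: an exact increasing line gives r = 1, an exact decreasing line gives r = -1, and a pattern with no linear trend gives r = 0.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)

r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]          # exact increasing line
r_neg = np.corrcoef(x, -3 * x)[0, 1]             # exact decreasing line
r_zero = np.corrcoef(x, [2, 1, 3, 1, 2])[0, 1]   # no linear trend
print(round(r_pos, 3), round(r_neg, 3), round(r_zero, 3))
```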

8. Skewness measures the

a) Central tendency
b) Asymmetry of the distribution
c) Peakedness
d) Spread
Correct Answer: b) Asymmetry of the distribution
📝 Explanation:
Positive skew indicates a longer right tail, negative skew a longer left tail, and zero skew a symmetric distribution.
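
The sign convention can be checked with `scipy.stats.skew` on small made-up samples: a symmetric sample, one with a long right tail, and its mirror image.

```python
from scipy.stats import skew

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 3, 4, 20]   # one long right tail
left_tailed = [-v for v in right_tailed]   # mirror image: long left tail

print(skew(symmetric), skew(right_tailed), skew(left_tailed))
```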

9. A scatter plot visualizes

a) Single variable distribution
b) Relationship between two variables
c) Categorical comparisons
d) Temporal sequences
Correct Answer: b) Relationship between two variables
📝 Explanation:
Scatter plots show each observation as a point defined by two continuous variables, revealing correlation or trends.

10. What is the purpose of a pair plot in EDA?

a) To show univariate distributions only
b) To visualize pairwise relationships in multivariate data
c) To compute summary statistics
d) To handle missing values
Correct Answer: b) To visualize pairwise relationships in multivariate data
📝 Explanation:
Pair plots create a matrix of scatter plots for every pair of variables, usually with a histogram or density plot of each variable on the diagonal.

11. Kurtosis describes the

a) Tail heaviness relative to normal distribution
b) Central tendency
c) Symmetry
d) Range
Correct Answer: a) Tail heaviness relative to normal distribution
📝 Explanation:
High kurtosis indicates heavy tails (more extreme values than a normal distribution); low kurtosis indicates light tails.

12. For categorical data, which visualization is appropriate?

a) Histogram
b) Bar chart
c) Scatter plot
d) Line plot
Correct Answer: b) Bar chart
📝 Explanation:
Bar charts display frequencies or counts for discrete categories.

13. The standard deviation measures

a) Average value
b) Dispersion around the mean
c) Median spread
d) Mode frequency
Correct Answer: b) Dispersion around the mean
📝 Explanation:
The SD quantifies variability around the mean and is the square root of the variance.

14. In EDA, data scaling is often checked for

a) Multicollinearity
b) Feature ranges and distributions
c) Missing values
d) Categorical encoding
Correct Answer: b) Feature ranges and distributions
📝 Explanation:
Comparing feature ranges and distributions shows whether scaling, such as min-max normalization or z-score standardization, is needed so that features contribute comparably.
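
A small pandas sketch, with made-up columns, of comparing feature ranges and then applying the two rescalings mentioned above. Note that pandas' `std()` uses the sample SD (ddof=1) by default.

```python
import pandas as pd

# Hypothetical features on very different scales
df = pd.DataFrame({"age": [22, 35, 47, 58],
                   "income": [28_000, 52_000, 61_000, 90_000]})
print(df.agg(["min", "max", "mean", "std"]))   # compare ranges side by side

min_max = (df - df.min()) / (df.max() - df.min())   # each column rescaled to [0, 1]
z_scores = (df - df.mean()) / df.std()              # mean 0, sample SD 1 per column
print(min_max.round(2))
print(z_scores.round(2))
```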

15. A heatmap is used to visualize

a) Single variable trends
b) Correlation matrices
c) Outlier positions
d) Missing data patterns
Correct Answer: b) Correlation matrices
📝 Explanation:
Heatmaps color-code values in a matrix, ideal for correlations.

16. What indicates a normal distribution in EDA?

a) Skewness = 0, Kurtosis = 3
b) Skewness = 1, Kurtosis = 4
c) Mean > Median
d) High variance
Correct Answer: a) Skewness = 0, Kurtosis = 3
📝 Explanation:
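A normal distribution is symmetric (skewness = 0) with kurtosis = 3 (mesokurtic), i.e. excess kurtosis = 0. Many libraries (e.g. SciPy and pandas) report excess kurtosis by default.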
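
A simulated check of these reference values, assuming a large synthetic normal sample. `scipy.stats.kurtosis` returns excess kurtosis unless `fisher=False` is passed, which is a common source of confusion when comparing against "3".

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0, scale=1.0, size=100_000)   # synthetic normal data

s = skew(sample)
k_pearson = kurtosis(sample, fisher=False)   # "plain" kurtosis, close to 3 for normal data
k_excess = kurtosis(sample)                  # default fisher=True: excess kurtosis, close to 0
print(round(s, 3), round(k_pearson, 3), round(k_excess, 3))
```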

17. Violin plots combine

a) Box plot and histogram
b) Box plot and density plot
c) Scatter and line
d) Bar and pie
Correct Answer: b) Box plot and density plot
📝 Explanation:
Violin plots show distribution shape via kernel density and quartiles via box.

18. Multicollinearity is detected using

a) VIF (Variance Inflation Factor)
b) IQR
c) Mean
d) Mode
Correct Answer: a) VIF (Variance Inflation Factor)
📝 Explanation:
A VIF above roughly 5 to 10 is commonly taken to indicate high multicollinearity among features.
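
VIF for feature j is 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the other features (with an intercept). The NumPy sketch below computes this directly on synthetic data; statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence` if a library call is preferred.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of a 2-D array X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + remaining features
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)   # nearly a copy of x1
x3 = rng.normal(size=500)                   # independent of both
v = vif(np.column_stack([x1, x2, x3]))
print(v.round(1))   # x1 and x2 get large VIFs, x3 stays near 1
```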

19. For time series EDA, which plot shows trends over time?

a) Scatter plot
b) Line plot
c) Bar chart
d) Histogram
Correct Answer: b) Line plot
📝 Explanation:
Line plots connect data points chronologically to reveal patterns.

20. The mode is the value that

a) Sums to total
b) Occurs most frequently
c) Divides data in half
d) Measures spread
Correct Answer: b) Occurs most frequently
📝 Explanation:
The mode is the most common value or category; a dataset can have more than one mode (bimodal or multimodal data).

21. In EDA, pivot tables are used for

a) Aggregating data by categories
b) Plotting distributions
c) Detecting outliers
d) Scaling features
Correct Answer: a) Aggregating data by categories
📝 Explanation:
Pivot tables summarize data with rows, columns, and values for cross-tabulation.
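
In pandas this is `pd.pivot_table`. The sketch below uses made-up sales records to cross-tabulate total revenue by region and product.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100, 150, 200, 50, 300],
})

table = pd.pivot_table(sales, index="region", columns="product",
                       values="revenue", aggfunc="sum", fill_value=0)
print(table)
```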

22. A Q-Q plot assesses

a) Normality of distribution
b) Correlation strength
c) Missing values
d) Categorical balance
Correct Answer: a) Normality of distribution
📝 Explanation:
Q-Q plots compare sample quantiles with those of a theoretical normal distribution; points falling close to a straight line indicate normality.

23. Handling outliers in EDA often involves

a) Ignoring them always
b) Capping or removing after investigation
c) Increasing dataset size
d) Changing data type
Correct Answer: b) Capping or removing after investigation
📝 Explanation:
Outliers may be errors or genuine insights, so decisions about them should be based on domain knowledge.

24. Categorical variables are encoded in EDA using

a) One-hot encoding
b) Z-score normalization
c) Log transformation
d) Binning
Correct Answer: a) One-hot encoding
📝 Explanation:
One-hot creates binary columns for categories to avoid ordinal assumptions.
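
A minimal one-hot encoding sketch with `pd.get_dummies` on a made-up column; `dtype=int` requests 0/1 columns instead of booleans.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", "green"], "size": [1, 2, 3, 4]})
encoded = pd.get_dummies(df, columns=["color"], dtype=int)   # one binary column per category
print(encoded)
```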

25. The coefficient of variation (CV) is

a) SD / Mean * 100%
b) Mean / SD
c) Variance / Mean
d) IQR / Median
Correct Answer: a) SD / Mean * 100%
📝 Explanation:
CV measures relative variability, useful for comparing dispersion across datasets.
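
A short sketch comparing two made-up series that have the same SD but very different means. The sample SD (ddof=1) is assumed here; CV is only meaningful for ratio-scale data with a positive mean.

```python
import numpy as np

def coefficient_of_variation(values, ddof=1):
    """CV in percent: standard deviation divided by the mean, times 100."""
    x = np.asarray(values, dtype=float)
    return 100 * x.std(ddof=ddof) / x.mean()

cv_a = coefficient_of_variation([10, 12, 8])         # SD 2, mean 10
cv_b = coefficient_of_variation([1000, 1002, 998])   # SD 2, mean 1000
print(cv_a, cv_b)   # same absolute spread, very different relative spread
```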

26. Seaborn library in Python is popular for EDA because it

a) Handles missing data
b) Provides statistical visualizations
c) Trains models
d) Cleans text
Correct Answer: b) Provides statistical visualizations
📝 Explanation:
Seaborn builds on Matplotlib for attractive, informative plots like heatmaps.

27. In bivariate analysis, a low correlation implies

a) Strong linear relationship
b) Weak or no linear relationship
c) Causation
d) Identical distributions
Correct Answer: b) Weak or no linear relationship
📝 Explanation:
Correlation near 0 suggests little linear association between variables.

28. Data profiling in EDA includes

a) Only visualizations
b) Summary statistics and data quality checks
c) Model evaluation
d) Feature selection
Correct Answer: b) Summary statistics and data quality checks
📝 Explanation:
Profiling gives an overview of the dataset's structure, data types, missingness, and summary statistics.

29. A kernel density estimate (KDE) plot shows

a) Discrete counts
b) Smooth probability density
c) Categorical proportions
d) Error bars
Correct Answer: b) Smooth probability density
📝 Explanation:
KDE approximates continuous distribution using a kernel function.

30. For imbalanced classes in EDA, check

a) Class distribution via bar plots
b) Only numerical features
c) Correlation only
d) Outliers exclusively
Correct Answer: a) Class distribution via bar plots
📝 Explanation:
Visualize target variable frequencies to identify imbalance.

31. The five-number summary includes

a) Min, Q1, Median, Q3, Max
b) Mean, Median, Mode, SD, Variance
c) Skewness, Kurtosis, Mean, Median, Mode
d) Range, IQR, Mean, SD, CV
Correct Answer: a) Min, Q1, Median, Q3, Max
📝 Explanation:
The five-number summary is the basis of the box plot and describes spread without assuming a particular distribution.
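
The five numbers can be read off with `np.percentile`. The data below are made up, and NumPy's default linear interpolation is assumed for the quartiles.

```python
import numpy as np

data = np.array([7, 1, 3, 9, 5, 11, 13, 15, 17])   # hypothetical sample
labels = ["min", "Q1", "median", "Q3", "max"]
five_num = dict(zip(labels, np.percentile(data, [0, 25, 50, 75, 100])))
iqr = five_num["Q3"] - five_num["Q1"]
print(five_num, "IQR =", iqr)
```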

32. In EDA, feature engineering starts with

a) Creating new variables from existing ones
b) Training models
c) Hyperparameter tuning
d) Cross-validation
Correct Answer: a) Creating new variables from existing ones
📝 Explanation:
Derive interactions, polynomials, or bins to capture patterns.

33. A lag plot in time series EDA detects

a) Seasonality
b) Autocorrelation
c) Trend only
d) Stationarity
Correct Answer: b) Autocorrelation
📝 Explanation:
Plots each value against its lagged value to reveal serial dependence.

34. Z-score for outlier detection is

a) (x - mean) / SD
b) x / mean
c) mean - x
d) SD / x
Correct Answer: a) (x - mean) / SD
📝 Explanation:
A common rule of thumb flags |Z| > 3 as an outlier, which assumes the data are roughly normal.
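
A NumPy sketch of the rule on made-up data. Two caveats apply: the outlier itself inflates the mean and SD (masking), and with 10 or fewer observations no point can reach |z| > 3 at all, so median- or IQR-based rules are often preferred for small samples.

```python
import numpy as np

values = np.array([10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
                   11, 9, 10, 12, 8, 10, 11, 9, 10, 10, 50], dtype=float)

z = (values - values.mean()) / values.std()   # population SD (NumPy default ddof=0)
flagged = values[np.abs(z) > 3]
print(flagged)
```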

35. Count plots are used for

a) Numerical distributions
b) Categorical frequencies
c) Bivariate relations
d) Multivariate analysis
Correct Answer: b) Categorical frequencies
📝 Explanation:
A bar chart of the number of observations in each category of a single categorical variable.

36. In EDA, a quick preview of dimensionality reduction uses

a) PCA scree plot
b) Histogram
c) Box plot
d) Scatter plot
Correct Answer: a) PCA scree plot
📝 Explanation:
A scree plot shows the variance explained by each principal component, helping decide how many components to retain.

37. Median is robust to

a) Outliers
b) Symmetry
c) Normality
d) Skewness
Correct Answer: a) Outliers
📝 Explanation:
Unlike mean, median resists extreme values.

38. Joint plots in Seaborn combine

a) Histogram and bar
b) Scatter, marginal histograms, and correlation
c) Box and violin
d) Line and area
Correct Answer: b) Scatter, marginal histograms, and correlation
📝 Explanation:
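jointplot draws a bivariate plot (a scatter plot by default) with the univariate distribution of each variable on the margins; older Seaborn versions also annotated a correlation statistic.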

39. Variance is the average of

a) Squared deviations from mean
b) Absolute deviations from median
c) Differences between max and min
d) Frequencies
Correct Answer: a) Squared deviations from mean
📝 Explanation:
Population variance = Σ(x_i - μ)^2 / N, measuring spread; the sample variance uses the sample mean and divides by n - 1 instead.
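
Both versions computed by hand on a small made-up sample and compared with NumPy's `ddof` argument. NumPy defaults to ddof=0 (divide by n) while pandas' `.var()` defaults to ddof=1, so the two libraries can disagree on the same data.

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)
squared_dev = (x - x.mean()) ** 2

pop_var = squared_dev.sum() / len(x)            # divide by n
sample_var = squared_dev.sum() / (len(x) - 1)   # divide by n - 1 (Bessel's correction)

print(pop_var, np.sqrt(pop_var))   # 4.0 2.0
print(np.var(x), np.var(x, ddof=1))
```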

40. For text data in EDA, start with

a) Word clouds or n-gram frequencies
b) Numerical scaling
c) Correlation matrix
d) Time series decomposition
Correct Answer: a) Word clouds or n-gram frequencies
📝 Explanation:
Visualize common terms and phrases in unstructured text.

41. Autocorrelation function (ACF) plot identifies

a) Cross-variable correlations
b) Serial correlations in time series
c) Outlier impacts
d) Missing patterns
Correct Answer: b) Serial correlations in time series
📝 Explanation:
ACF shows the correlation of a series with lagged copies of itself.
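
A hand-rolled sample ACF (the standard estimator that divides by the lag-0 sum of squares), applied to a synthetic alternating series whose autocorrelation flips sign at each lag. `statsmodels.tsa.stattools.acf` is a library alternative.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[: len(d) - k] * d[k:]) / denom
                     for k in range(max_lag + 1)])

alternating = np.tile([1.0, -1.0], 50)   # 1, -1, 1, -1, ... (100 points)
print(acf(alternating, 3).round(2))
```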

42. In EDA, data types include

a) Only numerical
b) Numerical, categorical, ordinal, datetime
c) Only categorical
d) Binary only
Correct Answer: b) Numerical, categorical, ordinal, datetime
📝 Explanation:
Understanding types guides appropriate analysis and visualization.

43. A facet grid in EDA allows

a) Subplotting by categories
b) 3D plotting
c) Animation
d) Interactive zooming
Correct Answer: a) Subplotting by categories
📝 Explanation:
Splits plots into a grid conditioned on variables for comparisons.

44. Pearson correlation assumes

a) Linear relationship and normality
b) Non-linear only
c) Categorical data
d) No assumptions
Correct Answer: a) Linear relationship and normality
📝 Explanation:
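Pearson's r captures only linear association between continuous variables, and its significance test assumes approximate bivariate normality; use Spearman's rank correlation as a non-parametric alternative.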

45. Box plot whiskers typically extend to

a) 1.5 * IQR from quartiles
b) Full range
c) Mean ± SD
d) Median only
Correct Answer: a) 1.5 * IQR from quartiles
📝 Explanation:
Whiskers end at the most extreme data points within 1.5 * IQR of Q1 and Q3; points beyond them are potential outliers.

46. In EDA, resampling checks

a) Stability of statistics
b) Model accuracy
c) Feature importance
d) Hyperparameters
Correct Answer: a) Stability of statistics
📝 Explanation:
Bootstrap or jackknife resampling assesses the variability of estimates.
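
A minimal bootstrap sketch for the stability of a sample mean, on synthetic skewed data: resample with replacement, recompute the mean each time, and look at the spread of the results.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=200)   # synthetic skewed sample

# Recompute the statistic on many resamples drawn with replacement
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(2000)
])
se_boot = boot_means.std(ddof=1)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])   # percentile interval
print(f"mean={sample.mean():.3f}  bootstrap SE={se_boot:.3f}  "
      f"95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

For the mean the bootstrap SE should land close to the textbook s / sqrt(n); the value of the bootstrap is that the same loop works for medians, ratios, or other statistics without a closed-form SE.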

47. Cramér's V measures

a) Association between categorical variables
b) Linear correlation
c) Outlier distance
d) Distribution shape
Correct Answer: a) Association between categorical variables
📝 Explanation:
Cramér's V ranges from 0 to 1 and is derived from the chi-square statistic for nominal variables.
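
V = sqrt(chi2 / (n * (min(r, c) - 1))). The sketch below computes it from `scipy.stats.chi2_contingency` (continuity correction disabled) on two made-up tables; recent SciPy versions also offer `scipy.stats.contingency.association` for this.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts."""
    table = np.asarray(table, dtype=float)
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))

v_perfect = cramers_v([[10, 0], [0, 10]])    # perfect association
v_none = cramers_v([[10, 10], [10, 10]])     # independence
print(v_perfect, v_none)
```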

48. Log transformation in EDA is used to

a) Handle skewed data
b) Encode categories
c) Fill missings
d) Detect trends
Correct Answer: a) Handle skewed data
📝 Explanation:
Taking logs compresses large values, reducing right skew and often stabilizing variance; log(1 + x) is used when zeros are present.
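
A before/after skewness comparison on synthetic log-normal "incomes" (a made-up example of right-skewed data), using `np.log1p` so that zeros would also be handled.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
incomes = rng.lognormal(mean=10, sigma=1.0, size=5_000)   # synthetic right-skewed data

logged = np.log1p(incomes)   # log(1 + x)
print(round(skew(incomes), 2), round(skew(logged), 2))
```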

49. A swarm plot displays

a) Points without overlap for categorical
b) Dense lines
c) Heat intensities
d) 3D surfaces
Correct Answer: a) Points without overlap for categorical
📝 Explanation:
Shows individual data points spread to avoid stacking.

50. In multivariate EDA, parallel coordinates plot

a) High-dimensional data as lines
b) Pairwise scatters
c) Time sequences
d) Density contours
Correct Answer: a) High-dimensional data as lines
📝 Explanation:
Each line represents an observation across normalized axes.

51. The range is

a) Max - Min
b) Q3 - Q1
c) Mean - Median
d) SD * 2
Correct Answer: a) Max - Min
📝 Explanation:
Simplest spread measure, sensitive to outliers.

52. For geospatial data in EDA, use

a) Choropleth maps
b) Bar charts
c) Histograms
d) Line plots
Correct Answer: a) Choropleth maps
📝 Explanation:
Color regions by data values for spatial patterns.

53. Chi-square test in EDA checks

a) Independence between categorical variables
b) Normality
c) Linearity
d) Homoscedasticity
Correct Answer: a) Independence between categorical variables
📝 Explanation:
The null hypothesis is that the variables are independent (no association) in the contingency table.

54. Strip plots show

a) Jittered points for categorical
b) Smooth densities
c) Box summaries
d) Violin shapes
Correct Answer: a) Jittered points for categorical
📝 Explanation:
Adds small random noise (jitter) to point positions to reduce overplotting.

55. In EDA, cross-validation previews

a) Model performance
b) Data leakage
c) Feature stability
d) All of the above
Correct Answer: d) All of the above
📝 Explanation:
Early CV checks generalization before full modeling.

56. Quantile-quantile (Q-Q) plot compares

a) Sample to theoretical distribution
b) Two samples
c) Time lags
d) Spatial points
Correct Answer: a) Sample to theoretical distribution
📝 Explanation:
Deviations from the reference line indicate that the sample does not follow the theoretical distribution.

57. For ordinal data, use

a) Ordinal encoding
b) One-hot
c) Frequency encoding
d) Target encoding
Correct Answer: a) Ordinal encoding
📝 Explanation:
Assigns integers that preserve the category order, unlike encodings for nominal data.
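
One way to do this in pandas is an ordered `Categorical`, whose integer codes follow the order you supply. The size labels below are made up; the order itself must come from domain knowledge.

```python
import pandas as pd

sizes = pd.Series(["medium", "small", "large", "small"])
order = ["small", "medium", "large"]   # domain-defined order

codes = pd.Categorical(sizes, categories=order, ordered=True).codes
print(codes.tolist())   # [1, 0, 2, 0]
```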

58. Partial dependence plots explain

a) Feature effects in models
b) Data distributions
c) Correlations
d) Outliers
Correct Answer: a) Feature effects in models
📝 Explanation:
Shows the average (marginal) effect of a feature on the model's predictions.

59. In EDA, binning continuous data creates

a) Categorical intervals
b) Scaled values
c) Polynomials
d) Interactions
Correct Answer: a) Categorical intervals
📝 Explanation:
Reduces noise and reveals patterns in discretized form.
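
A binning sketch with `pd.cut`, using made-up ages and illustrative cut points. Intervals are right-closed by default, e.g. (12, 19].

```python
import pandas as pd

ages = pd.Series([3, 17, 25, 42, 67, 80])
groups = pd.cut(ages, bins=[0, 12, 19, 64, 120],
                labels=["child", "teen", "adult", "senior"])
print(groups.value_counts().sort_index())
```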

60. Spearman's rank correlation is

a) Non-parametric for monotonic relations
b) Only for linear
c) For categorical
d) Ignores ranks
Correct Answer: a) Non-parametric for monotonic relations
📝 Explanation:
Based on ranks, so it captures monotonic (not just linear) relationships and is robust to non-normality and outliers.
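
A made-up monotonic but non-linear relationship (y = x^3) shows the difference: Spearman's rho is exactly 1 because the ranks agree perfectly, while Pearson's r is noticeably below 1.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1, 11, dtype=float)
y = x ** 3   # monotonic but not linear

r_pearson, _ = pearsonr(x, y)
rho_spearman, _ = spearmanr(x, y)
print(round(r_pearson, 3), round(rho_spearman, 3))
```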

61. A ridgeline plot visualizes

a) Overlapping density distributions
b) Scatter clusters
c) Bar comparisons
d) Line trends
Correct Answer: a) Overlapping density distributions
📝 Explanation:
Stacks vertically offset KDE curves to compare distributions across categories.

62. EDA documents findings via

a) Notebooks like Jupyter
b) Only plots
c) Models
d) Databases
Correct Answer: a) Notebooks like Jupyter
📝 Explanation:
Combines code, visuals, and narrative for reproducibility.

63. Theil's U measures

a) Asymmetric association for categorical
b) Symmetry in distributions
c) Linear strength
d) Outlier probability
Correct Answer: a) Asymmetric association for categorical
📝 Explanation:
Also called the uncertainty coefficient; it is based on entropy.

64. In time series, decomposition separates

a) Trend, seasonality, residual
b) Mean, variance, skew
c) Lags, leads, current
d) Frequencies, amplitudes
Correct Answer: a) Trend, seasonality, residual
📝 Explanation:
Additive or multiplicative models isolate components.

65. For high-cardinality categoricals in EDA, use

a) Frequency or target encoding
b) One-hot for all
c) Ignore
d) Bin to low
Correct Answer: a) Frequency or target encoding
📝 Explanation:
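Encoding each category by its frequency or mean target value keeps a single column instead of many dummy columns; target encoding should be fitted on training data only to avoid leakage.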
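
Frequency encoding in pandas, on made-up city names: each category is replaced by its share of rows, so the result stays a single numeric column regardless of cardinality.

```python
import pandas as pd

cities = pd.Series(["Paris", "Lyon", "Paris", "Nice", "Paris", "Lyon"])
freq = cities.value_counts(normalize=True)   # share of rows per category
encoded = cities.map(freq)
print(encoded.round(3).tolist())
```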

66. A contour plot shows

a) Density levels in 2D
b) 3D surfaces
c) Bar heights
d) Point clusters
Correct Answer: a) Density levels in 2D
📝 Explanation:
Lines or colors indicate constant value regions.

67. EDA hypothesis generation leads to

a) Confirmatory analysis
b) Data collection
c) Model deployment
d) Reporting
Correct Answer: a) Confirmatory analysis
📝 Explanation:
Patterns found during EDA suggest hypotheses that are then tested formally in confirmatory analysis.

68. Mahalanobis distance detects

a) Multivariate outliers
b) Univariate means
c) Correlations
d) Skew
Correct Answer: a) Multivariate outliers
📝 Explanation:
It accounts for the covariance between variables, unlike Euclidean distance.
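
A NumPy sketch with an assumed (not estimated) covariance matrix: two points at the same Euclidean distance from the mean get very different Mahalanobis distances depending on whether they follow or contradict the correlation. `scipy.spatial.distance.mahalanobis(u, v, VI)` is a library equivalent that takes the inverse covariance.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Distance of point x from mean, scaled by the covariance matrix."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

mean = np.zeros(2)
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])   # strongly positively correlated features

a = [1.0, 1.0]    # follows the correlation
b = [1.0, -1.0]   # goes against it (same Euclidean distance as a)
d_a, d_b = mahalanobis(a, mean, cov), mahalanobis(b, mean, cov)
print(round(d_a, 3), round(d_b, 3))
```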

69. In EDA, groupby operations compute

a) Aggregates by categories
b) Global stats
c) Plots
d) Encodings
Correct Answer: a) Aggregates by categories
📝 Explanation:
For example, computing the mean per group with pandas groupby for subgroup analysis.
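
A minimal groupby sketch on made-up team scores, computing several aggregates per group in one call.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [80, 90, 70, 75, 95],
})
summary = df.groupby("team")["score"].agg(["mean", "count", "max"])
print(summary)
```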

70. A hexbin plot is a

a) 2D histogram with hexagons
b) Line with error bands
c) Categorical strip
d) Density violin
Correct Answer: a) 2D histogram with hexagons
📝 Explanation:
Efficient for dense scatter data, showing the count of points in each hexagonal bin.

71. EDA for regression checks

a) Linearity, homoscedasticity via residuals
b) Class balance
c) Clusters
d) Survival rates
Correct Answer: a) Linearity, homoscedasticity via residuals
📝 Explanation:
A plot of residuals versus fitted values reveals violations of linearity and constant variance (homoscedasticity).

72. Polychoric correlation for

a) Ordinal variables
b) Continuous only
c) Binary
d) Spatial
Correct Answer: a) Ordinal variables
📝 Explanation:
Assumes underlying continuous latent variables.

73. In EDA, melting data changes

a) Wide to long format
b) Long to wide
c) Categorical to numerical
d) Time to static
Correct Answer: a) Wide to long format
📝 Explanation:
Facilitates plotting multiple series.
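
A wide-to-long round trip in pandas on made-up yearly sales: `melt` produces the long format, and `pivot` (long to wide) restores the original layout.

```python
import pandas as pd

wide = pd.DataFrame({
    "city": ["Paris", "Lyon"],
    "2022": [100, 80],
    "2023": [110, 85],
})
long = wide.melt(id_vars="city", var_name="year", value_name="sales")
print(long)

back = long.pivot(index="city", columns="year", values="sales")   # long to wide again
print(back)
```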

74. SHAP values in EDA preview

a) Feature contributions post-model
b) Data cleaning
c) Visualization
d) Encoding
Correct Answer: a) Feature contributions post-model
📝 Explanation:
Explains individual predictions for interpretability.

75. For survival data EDA, Kaplan-Meier estimates

a) Survival function
b) Hazard rates
c) Cure fractions
d) All of the above
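Correct Answer: a) Survival function
📝 Explanation:
The Kaplan-Meier estimator is a non-parametric estimate of the survival function S(t) for time-to-event data; the cumulative hazard is usually estimated with the Nelson-Aalen estimator instead.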

76. A sunburst plot visualizes

a) Hierarchical data
b) Network graphs
c) Time flows
d) Geospatial
Correct Answer: a) Hierarchical data
📝 Explanation:
Nested rings show proportions in categories.

77. In EDA, Levene's test checks

a) Equal variances
b) Normality
c) Independence
d) Linearity
Correct Answer: a) Equal variances
📝 Explanation:
It is relatively robust to non-normality and is used to check the equal-variance assumption of ANOVA.

78. Treemap displays

a) Proportional areas for categories
b) Lines for trends
c) Points for relations
d) Densities for shapes
Correct Answer: a) Proportional areas for categories
📝 Explanation:
Rectangles sized by value in hierarchical layout.

79. EDA for clustering previews with

a) Elbow method on k-means
b) ROC curves
c) Confusion matrices
d) Lift charts
Correct Answer: a) Elbow method on k-means
📝 Explanation:
Plots inertia vs k to suggest optimal clusters.

80. Biserial correlation for

a) Continuous and binary variables
b) Two continuous
c) Two binary
d) Ordinal-binary
Correct Answer: a) Continuous and binary variables
📝 Explanation:
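Biserial correlation assumes the binary variable splits an underlying continuous variable; use point-biserial correlation when the binary variable is a true dichotomy.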

81. In EDA, pivot_longer in R or melt in Python

a) Reshapes wide to long
b) Aggregates
c) Filters
d) Joins
Correct Answer: a) Reshapes wide to long
📝 Explanation:
Prepares tidy data for analysis.

82. LIME explains

a) Local model interpretations
b) Global correlations
c) Data summaries
d) Outlier reasons
Correct Answer: a) Local model interpretations
📝 Explanation:
Approximates black-box models locally with interpretable ones.

83. For network data EDA, use

a) Graph visualizations like node-link
b) Bar charts
c) Histograms
d) Scatter only
Correct Answer: a) Graph visualizations like node-link
📝 Explanation:
Shows nodes and edges for connectivity patterns.

84. Anderson-Darling test assesses

a) Goodness-of-fit to distribution
b) Equal means
c) Variances
d) Independence
Correct Answer: a) Goodness-of-fit to distribution
📝 Explanation:
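It weights the tails more heavily than the Kolmogorov-Smirnov test, so it is sensitive to tail deviations from the hypothesized distribution (often the normal).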

85. Sankey diagram illustrates

a) Flow between categories
b) Proportions
c) Trends
d) Relations
Correct Answer: a) Flow between categories
📝 Explanation:
Width proportional to magnitude in multi-stage processes.

86. EDA for classification includes

a) Class separation via LDA plot
b) Survival curves
c) Hazard functions
d) Time decompositions
Correct Answer: a) Class separation via LDA plot
📝 Explanation:
Linear Discriminant Analysis projects the data onto axes that maximize the separation between classes.

87. Tetrachoric correlation assumes

a) Binary data from underlying continuous
b) Ordinal only
c) Continuous
d) Categorical nominal
Correct Answer: a) Binary data from underlying continuous
📝 Explanation:
For dichotomous variables implying latent scale.

88. In EDA, dcast or pivot in tools

a) Long to wide reshaping
b) Wide to long
c) Sorting
d) Filtering
Correct Answer: a) Long to wide reshaping
📝 Explanation:
Spreads variables for certain summaries.

89. Counterfactual explanations in EDA show

a) What-if scenarios for predictions
b) Data distributions
c) Correlations
d) Outliers
Correct Answer: a) What-if scenarios for predictions
📝 Explanation:
Alters features to see outcome changes.

90. For audio data EDA, spectrograms visualize

a) Frequency over time
b) Waveforms
c) Amplitudes only
d) Durations
Correct Answer: a) Frequency over time
📝 Explanation:
A 2D representation of how the signal's frequency content changes over time.

91. Kolmogorov-Smirnov test compares

a) Distributions
b) Means
c) Variances
d) Correlations
Correct Answer: a) Distributions
📝 Explanation:
The KS statistic is the maximum absolute difference between two empirical CDFs (or between an empirical and a theoretical CDF).
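
A two-sample KS sketch with `scipy.stats.ks_2samp` on synthetic data: two samples from the same normal distribution, and one shifted by one SD.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
a = rng.normal(0.0, 1.0, size=1_000)
b = rng.normal(0.0, 1.0, size=1_000)   # same distribution as a
c = rng.normal(1.0, 1.0, size=1_000)   # shifted by one SD

stat_ab, p_ab = ks_2samp(a, b)
stat_ac, p_ac = ks_2samp(a, c)
print(f"same: D={stat_ab:.3f}, p={p_ab:.3f}   shifted: D={stat_ac:.3f}, p={p_ac:.2e}")
```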

92. Alluvial plot shows

a) Changes in category proportions over stages
b) Static hierarchies
c) Networks
d) Maps
Correct Answer: a) Changes in category proportions over stages
📝 Explanation:
Like Sankey but for categorical shifts.

93. In EDA for anomaly detection, isolation forest previews

a) Anomaly scores
b) Class probabilities
c) Regressions
d) Clusters
Correct Answer: a) Anomaly scores
📝 Explanation:
Anomalies are isolated with fewer random splits, so shorter average path lengths in the trees yield higher anomaly scores.

94. Phi coefficient for

a) 2x2 contingency association
b) Multiple categories
c) Continuous
d) Ordinal
Correct Answer: a) 2x2 contingency association
📝 Explanation:
Pearson's r for binary variables.
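
The equivalence can be checked numerically. For a 2x2 table with cells n11, n10, n01, n00 (made-up counts below), phi = (n11*n00 - n10*n01) / sqrt(row and column totals multiplied together), which matches Pearson's r on the underlying 0/1 vectors.

```python
import numpy as np

# Hypothetical 2x2 counts: n11 = (x=1, y=1), n10 = (x=1, y=0), n01 = (x=0, y=1), n00 = (x=0, y=0)
n11, n10, n01, n00 = 20, 10, 5, 15

phi = (n11 * n00 - n10 * n01) / np.sqrt(
    (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

# Rebuild the raw binary vectors and compute Pearson's r on them
x = np.repeat([1, 1, 0, 0], [n11, n10, n01, n00])
y = np.repeat([1, 0, 1, 0], [n11, n10, n01, n00])
r = np.corrcoef(x, y)[0, 1]
print(round(phi, 4), round(r, 4))
```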

95. Gather function in tidyverse

a) Wide to long
b) Long to wide
c) Summarize
d) Mutate
Correct Answer: a) Wide to long
📝 Explanation:
Converts data to the tidy long format; in current tidyr, gather() is superseded by pivot_longer().

96. Anchored explanations focus on

a) Comparison to baseline prediction
b) Global averages
c) Local densities
d) Outlier distances
Correct Answer: a) Comparison to baseline prediction
📝 Explanation:
Highlights feature impacts relative to reference.

97. For image data EDA, use

a) Pixel histograms or t-SNE embeddings
b) Bar charts
c) Line plots
d) Scatter only
Correct Answer: a) Pixel histograms or t-SNE embeddings
📝 Explanation:
Visualize color distributions or latent spaces.

98. Shapiro-Wilk test for

a) Normality
b) Equal variances
c) Independence
d) Homoscedasticity
Correct Answer: a) Normality
📝 Explanation:
One of the most powerful normality tests, especially for small to moderate sample sizes.
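
A `scipy.stats.shapiro` sketch on synthetic normal and exponential samples. With very large samples even trivial departures become significant, so the test is best read alongside a Q-Q plot.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(3)
normal_sample = rng.normal(size=200)
skewed_sample = rng.exponential(size=200)

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    stat, p = shapiro(sample)
    print(f"{name:12s} W={stat:.3f} p={p:.3g}")
```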

99. Chord diagram visualizes

a) Interconnections between categories
b) Flows
c) Hierarchies
d) Trends
Correct Answer: a) Interconnections between categories
📝 Explanation:
Arc segments with linking ribbons.

100. EDA for recommendation systems includes

a) User-item matrix sparsity
b) Survival analysis
c) Time series only
d) Geospatial clustering
Correct Answer: a) User-item matrix sparsity
📝 Explanation:
Sparsity is the percentage of user-item pairs with no recorded interaction.
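
Sparsity = 1 - (observed pairs / (users * items)). The sketch computes it from a made-up interaction log without materializing the full matrix.

```python
import pandas as pd

# Hypothetical (user, item) interaction log
interactions = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "item": ["i1", "i2", "i1", "i2", "i3", "i4"],
})
n_users = interactions["user"].nunique()
n_items = interactions["item"].nunique()
n_observed = len(interactions.drop_duplicates())   # distinct user-item pairs

sparsity = 100 * (1 - n_observed / (n_users * n_items))
print(f"{n_users} users x {n_items} items, sparsity = {sparsity:.1f}%")
```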

101. Contingency coefficient for

a) Categorical association beyond 2x2
b) Binary only
c) Continuous
d) Ordinal
Correct Answer: a) Categorical association beyond 2x2
📝 Explanation:
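Pearson's contingency coefficient C = sqrt(χ² / (χ² + n)) is chi-square based and symmetric; its maximum is below 1 and depends on the table size.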

102. Spread function in R

a) Long to wide
b) Wide to long
c) Group by
d) Filter
Correct Answer: a) Long to wide
📝 Explanation:
Pivots values into columns; in current tidyr, spread() is superseded by pivot_wider().

103. Prototype-based explanations use

a) Nearest neighbors as exemplars
b) Rule sets
c) Trees
d) Linear models
Correct Answer: a) Nearest neighbors as exemplars
📝 Explanation:
Shows similar cases for context.

104. For video data EDA, frame sampling and

a) Optical flow analysis
b) Static images
c) Audio only
d) Text overlays
Correct Answer: a) Optical flow analysis
📝 Explanation:
Motion vectors between frames.

105. Jarque-Bera test combines

a) Skewness and kurtosis for normality
b) Means and variances
c) Correlations
d) Outliers
Correct Answer: a) Skewness and kurtosis for normality
📝 Explanation:
An omnibus normality test whose statistic combines sample skewness and excess kurtosis.
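
JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and K the kurtosis. The sketch computes it by hand (using SciPy's default biased moments, where `kurtosis` already returns K - 3) on synthetic exponential data and compares it with `scipy.stats.jarque_bera`.

```python
import numpy as np
from scipy.stats import jarque_bera, kurtosis, skew

rng = np.random.default_rng(5)
x = rng.exponential(size=500)   # synthetic, clearly non-normal data

n = x.size
S = skew(x)              # sample skewness
K_excess = kurtosis(x)   # excess kurtosis, i.e. K - 3
jb_manual = n / 6 * (S**2 + K_excess**2 / 4)

jb_scipy, p_value = jarque_bera(x)
print(round(jb_manual, 2), round(jb_scipy, 2), p_value)
```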

106. Parallel sets plot for

a) Multi-category flows like alluvial
b) Single variables
c) Continuous
d) Spatial
Correct Answer: a) Multi-category flows like alluvial
📝 Explanation:
Ribbon bands for proportions.

107. In reinforcement learning EDA, check

a) State-action distributions
b) Class labels
c) Targets
d) Features only
Correct Answer: a) State-action distributions
📝 Explanation:
Exploration coverage in environments.

108. Lambda coefficient for

a) Asymmetric nominal association
b) Symmetric
c) Ordinal
d) Continuous
Correct Answer: a) Asymmetric nominal association
📝 Explanation:
Goodman and Kruskal's lambda measures the proportional reduction in prediction error for one variable when the other is known.

109. Unpivot in data tools

a) Wide to long
b) Long to wide
c) Aggregate
d) Join
Correct Answer: a) Wide to long
📝 Explanation:
Standardizes column structure.

110. Subgroup explanations in EDA target

a) Specific data slices
b) Global model
c) Individual points
d) Random samples
Correct Answer: a) Specific data slices
📝 Explanation:
Tailored insights for segments.

111. For graph data EDA beyond basics, centrality measures like

a) Degree, betweenness
b) Means
c) Variances
d) Skews
Correct Answer: a) Degree, betweenness
📝 Explanation:
Node importance in networks.

112. D'Agostino's K-squared test for

a) Normality via skew and kurtosis
b) Equal variances
c) Independence
d) Trends
Correct Answer: a) Normality via skew and kurtosis
📝 Explanation:
An omnibus test whose statistic is asymptotically chi-square distributed with 2 degrees of freedom.

113. Mosaic plot for

a) Categorical contingency visualization
b) Continuous densities
c) Time series
d) Spatial
Correct Answer: a) Categorical contingency visualization
📝 Explanation:
Tiled bars for associations.

114. EDA in causal inference previews with

a) DAGs for confounding
b) ROC
c) MSE
d) R2
Correct Answer: a) DAGs for confounding
📝 Explanation:
Directed Acyclic Graphs map relationships.

115. Uncertainty coefficient for

a) Nominal predictive association
b) Ordinal
c) Binary
d) Continuous
Correct Answer: a) Nominal predictive association
📝 Explanation:
Entropy-based, asymmetric.

116. Pivot_table in pandas for

a) Multi-index aggregation
b) Simple sums
c) Plots
d) Encodes
Correct Answer: a) Multi-index aggregation
📝 Explanation:
Flexible crosstabs with functions.

117. Contrastive explanations compare

a) Prediction to counterfactual
b) Features globally
c) Data points
d) Models
Correct Answer: a) Prediction to counterfactual
📝 Explanation:
Minimal changes for outcome flip.

118. For tabular data EDA, automated tools like

a) Pandas Profiling
b) Manual plots only
c) Models first
d) Cleaning only
Correct Answer: a) Pandas Profiling
📝 Explanation:
Generates a comprehensive HTML report; the project is now published as ydata-profiling.

119. Lilliefors test is a

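a) KS normality test with parameters estimated from the data
b) T-test
c) Chi-square
d) F-test
Correct Answer: a) KS normality test with parameters estimated from the data
📝 Explanation:
It tests normality when the mean and variance are estimated from the sample, using adjusted critical values instead of the standard KS ones.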

120. Dot plot alternative to

a) Bar for small categories
b) Line for time
c) Scatter for relations
d) Histogram for continuous
Correct Answer: a) Bar for small categories
📝 Explanation:
Points on axis for frequencies.

121. In federated learning EDA, focus on

a) Local dataset summaries without sharing
b) Centralized data
c) Full merges
d) Global models only
Correct Answer: a) Local dataset summaries without sharing
📝 Explanation:
Preserves privacy in distributed settings.

122. Goodman-Kruskal gamma for

a) Ordinal association
b) Nominal
c) Binary
d) Continuous
Correct Answer: a) Ordinal association
📝 Explanation:
Based on concordant (C) and discordant (D) pairs: gamma = (C - D) / (C + D); tied pairs are excluded.
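
A direct pair-counting sketch on made-up ordinal ratings, showing that tied pairs are simply skipped. It is O(n^2), which is fine for EDA-sized samples; gamma is undefined if every pair is tied.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """(C - D) / (C + D) over all pairs; pairs tied on x or y are skipped."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0: tie on x or y, pair excluded
    return (concordant - discordant) / (concordant + discordant)

satisfaction = [1, 2, 3, 4, 5]   # hypothetical ordinal ratings
loyalty = [1, 2, 2, 3, 5]        # contains one tie
print(goodman_kruskal_gamma(satisfaction, loyalty))   # 1.0
```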

123. Reshape in Python for

a) Data format changes
b) Calculations
c) Visuals
d) Storage
Correct Answer: a) Data format changes
📝 Explanation:
Stack, melt, pivot for tidy data.

124. Input gradient explanations use

a) Model derivatives w.r.t. input
b) Averages
c) Neighbors
d) Rules
Correct Answer: a) Model derivatives w.r.t. input
📝 Explanation:
Sensitivity of prediction to features.

125. For sensor data EDA, time-frequency analysis like

a) Wavelet transforms
b) Simple averages
c) Static plots
d) Categorical bars
Correct Answer: a) Wavelet transforms
📝 Explanation:
Localizes events in time and scale.

126. Cramér-von Mises test for

a) Goodness-of-fit
b) Means
c) Variances
d) Correlations
Correct Answer: a) Goodness-of-fit
📝 Explanation:
Based on the integrated squared difference between the empirical and hypothesized CDFs.

127. Streamgraph for

a) Stacked area over time
b) Static proportions
c) Networks
d) Maps
Correct Answer: a) Stacked area over time
📝 Explanation:
Stacked layers are arranged around a central baseline, giving a flowing appearance.

128. EDA in A/B testing checks

a) Group balance and power
b) Model fits
c) Clusters
d) Outliers only
Correct Answer: a) Group balance and power
📝 Explanation:
Pre-test for valid comparisons.

129. Kappa coefficient for

a) Inter-rater agreement beyond chance
b) Association strength
c) Prediction error
d) Distribution fit
Correct Answer: a) Inter-rater agreement beyond chance
📝 Explanation:
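Cohen's kappa corrects the observed agreement between two raters on categorical ratings for the agreement expected by chance.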
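
kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected from each rater's label frequencies. A plain-Python sketch on made-up ratings; `sklearn.metrics.cohen_kappa_score` is a library equivalent.

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(r1)
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

rater_a = ["yes", "yes", "no", "no", "yes", "no"]
rater_b = ["yes", "no", "no", "no", "yes", "yes"]
print(round(cohens_kappa(rater_a, rater_b), 3))   # 0.333
```

Here the raters agree on 4 of 6 items (p_o = 0.667), but chance alone would give p_e = 0.5, so kappa is only about 0.33.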

130. Tidy data principles in EDA ensure

a) One variable per column, observation per row
b) Wide formats
c) Mixed types
d) Duplicates
Correct Answer: a) One variable per column, observation per row
📝 Explanation:
Facilitates manipulation and analysis.

131. Guided backpropagation for

a) Salient feature visualization in images
b) Tabular data
c) Text
d) Audio
Correct Answer: a) Salient feature visualization in images
📝 Explanation:
Modifies gradients for positive relevance.

132. For financial time series EDA, candlestick charts show

a) OHLC prices
b) Volumes only
c) Returns
d) Volatilities
Correct Answer: a) OHLC prices
📝 Explanation:
Each candle shows a period's open, high, low, and close prices, revealing price movement and volatility.

133. Anderson-Rubin test in

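a) Instrumental variable (IV) analysis
b) Normality
c) Equal means
d) Variances
Correct Answer: a) Instrumental variable (IV) analysis
📝 Explanation:
It is a weak-instrument-robust test of the structural coefficient in IV regression; instrument strength itself is usually checked with first-stage F-statistics.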

134. Waffle chart for

a) Proportional parts like pie alternative
b) Flows
c) Hierarchies
d) Relations
Correct Answer: a) Proportional parts like pie alternative
📝 Explanation:
Grid squares for percentages.

135. EDA for NLP includes

a) Token length distributions, vocab size
b) Numerical correlations
c) Image histograms
d) Audio spectra
Correct Answer: a) Token length distributions, vocab size
📝 Explanation:
Text-specific summaries.

136. Yule's Q for

a) 2x2 table association
b) Multiple categories
c) Ordinal
d) Continuous
Correct Answer: a) 2x2 table association
📝 Explanation:
For a 2x2 table with cells a, b, c, d, Q = (ad - bc) / (ad + bc) = (OR - 1) / (OR + 1), ranging from -1 to 1.
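
A quick arithmetic check with made-up counts, confirming that both forms of Yule's Q agree.

```python
# 2x2 table [[a, b], [c, d]] with hypothetical counts
a, b, c, d = 20, 10, 5, 15

odds_ratio = (a * d) / (b * c)                 # 300 / 50 = 6.0
yules_q = (a * d - b * c) / (a * d + b * c)    # 250 / 350
assert abs(yules_q - (odds_ratio - 1) / (odds_ratio + 1)) < 1e-12
print(odds_ratio, round(yules_q, 4))   # 6.0 0.7143
```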

137. Stack in pandas for

a) MultiIndex to columns
b) Columns to rows
c) Rows to columns
d) Filtering
Correct Answer: b) Columns to rows
📝 Explanation:
stack() moves column labels into the innermost level of the row index, producing a longer format.
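
A small pandas sketch on made-up scores: `stack()` turns the columns into an inner index level (one row per student-subject pair), and `unstack()` reverses it.

```python
import pandas as pd

wide = pd.DataFrame({"math": [90, 75], "art": [60, 85]}, index=["ann", "bob"])
long = wide.stack()     # Series indexed by (student, subject)
print(long)

back = long.unstack()   # inner index level back to columns
print(back)
```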

138. Layer-wise relevance propagation for

a) Deep net heatmaps
b) Shallow models
c) Trees
d) Rules
Correct Answer: a) Deep net heatmaps
📝 Explanation:
Backpropagates relevance scores.
