130 Exploratory Data Analysis (EDA) MCQs

These MCQs cover the fundamentals of Exploratory Data Analysis, including data summarization, visualization techniques, handling anomalies, and inferring patterns from datasets. They are intended for data analysts and scientists who want to reinforce EDA practices using statistical and graphical methods.

1. What is the primary goal of Exploratory Data Analysis (EDA)?

a) To build predictive models
b) To understand the data and uncover patterns
c) To deploy machine learning algorithms
d) To clean data for production
✅ Correct Answer: b) To understand the data and uncover patterns
📝 Explanation:
EDA aims to summarize main characteristics, often using visual methods, to reveal insights and detect anomalies before modeling.

2. Which statistic measures the central tendency of a dataset?

a) Variance
b) Mean
c) Skewness
d) Kurtosis
✅ Correct Answer: b) Mean
📝 Explanation:
The mean is the average value, representing the center of the data distribution.

3. A histogram is primarily used for visualizing

a) Relationships between two variables
b) Distribution of a single numerical variable
c) Categorical data frequencies
d) Time series trends
✅ Correct Answer: b) Distribution of a single numerical variable
📝 Explanation:
Histograms divide continuous data into bins to show frequency distribution.

4. What does the interquartile range (IQR) represent?

a) The difference between max and min
b) The middle 50% of the data
c) The standard deviation
d) The mode
✅ Correct Answer: b) The middle 50% of the data
📝 Explanation:
IQR = Q3 - Q1, measuring spread of the central half of the data.

5. In EDA, missing values are often identified using

a) Correlation matrices
b) Summary functions like isnull()
c) Scatter plots
d) Box plots
✅ Correct Answer: b) Summary functions like isnull()
📝 Explanation:
Functions like df.isnull().sum() count missing entries per column.
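As a rough illustration of what a per-column missing-value count does, here is a minimal sketch using only the Python standard library. The helper name `missing_counts` and the sample rows are hypothetical; in pandas the equivalent is the `df.isnull().sum()` call mentioned above.

```python
import math

def missing_counts(rows):
    """Count missing entries (None or NaN) per key across a list of dict rows."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            is_missing = value is None or (
                isinstance(value, float) and math.isnan(value)
            )
            counts[key] = counts.get(key, 0) + int(is_missing)
    return counts

# Hypothetical sample data with two kinds of missing markers.
rows = [
    {"age": 34, "city": "Oslo"},
    {"age": None, "city": "Lima"},
    {"age": float("nan"), "city": None},
]
counts = missing_counts(rows)
print(counts)
```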

6. Which plot is best for detecting outliers in numerical data?

a) Bar chart
b) Box plot
c) Line plot
d) Pie chart
✅ Correct Answer: b) Box plot
📝 Explanation:
Box plots display quartiles and whiskers, highlighting points beyond 1.5 * IQR.
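The 1.5 * IQR rule behind box-plot outliers can be sketched as follows. This is a minimal example, assuming the "inclusive" quartile method of the standard library; plotting libraries may compute quartiles slightly differently. The function name `tukey_outliers` is illustrative.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
lower, upper, outliers = tukey_outliers(data)
print(lower, upper, outliers)
```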

7. The correlation coefficient ranges from

a) 0 to 1
b) -1 to 1
c) -∞ to ∞
d) 0 to ∞
✅ Correct Answer: b) -1 to 1
📝 Explanation:
Pearson's r measures linear relationship strength and direction between -1 and 1.

8. Skewness measures the

a) Central tendency
b) Asymmetry of the distribution
c) Peakedness
d) Spread
✅ Correct Answer: b) Asymmetry of the distribution
📝 Explanation:
Positive skew indicates a longer right tail, negative skew a longer left tail; symmetric distributions have zero skew.

9. A scatter plot visualizes

a) Single variable distribution
b) Relationship between two variables
c) Categorical comparisons
d) Temporal sequences
✅ Correct Answer: b) Relationship between two variables
📝 Explanation:
Scatter plots plot points for two continuous variables to show correlation or trends.

10. What is the purpose of a pair plot in EDA?

a) To show univariate distributions only
b) To visualize pairwise relationships in multivariate data
c) To compute summary statistics
d) To handle missing values
✅ Correct Answer: b) To visualize pairwise relationships in multivariate data
📝 Explanation:
Pair plots create a matrix of scatter plots for every pair of variables, with each variable's histogram or density on the diagonal.

11. Kurtosis describes the

a) Tail heaviness relative to normal distribution
b) Central tendency
c) Symmetry
d) Range
✅ Correct Answer: a) Tail heaviness relative to normal distribution
📝 Explanation:
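High kurtosis indicates heavy tails (more extreme values than a normal distribution); low kurtosis indicates light tails. It describes tail weight rather than how peaked the center is.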

12. For categorical data, which visualization is appropriate?

a) Histogram
b) Bar chart
c) Scatter plot
d) Line plot
✅ Correct Answer: b) Bar chart
📝 Explanation:
Bar charts display frequencies or counts for discrete categories.

13. The standard deviation measures

a) Average value
b) Dispersion around the mean
c) Median spread
d) Mode frequency
✅ Correct Answer: b) Dispersion around the mean
📝 Explanation:
SD quantifies data variability; sqrt of variance.

14. In EDA, data scaling is often checked for

a) Multicollinearity
b) Feature ranges and distributions
c) Missing values
d) Categorical encoding
✅ Correct Answer: b) Feature ranges and distributions
📝 Explanation:
Scaling ensures features contribute equally, checked via min-max or z-scores.

15. A heatmap is used to visualize

a) Single variable trends
b) Correlation matrices
c) Outlier positions
d) Missing data patterns
✅ Correct Answer: b) Correlation matrices
📝 Explanation:
Heatmaps color-code values in a matrix, ideal for correlations.

16. What indicates a normal distribution in EDA?

a) Skewness = 0, Kurtosis = 3
b) Skewness = 1, Kurtosis = 4
c) Mean > Median
d) High variance
✅ Correct Answer: a) Skewness = 0, Kurtosis = 3
📝 Explanation:
A normal distribution is symmetric (skew=0) with kurtosis=3 (mesokurtic), i.e. excess kurtosis of 0. These values are necessary but not sufficient for normality.
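To make the two shape statistics concrete, the sketch below computes moment-based skewness and (non-excess) kurtosis by hand. It assumes the simple population-moment definitions; statistical packages often apply bias corrections or report excess kurtosis (kurtosis minus 3), so their numbers can differ.

```python
def shape_statistics(values):
    """Return (skewness, kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

skew_sym, kurt_sym = shape_statistics([1, 2, 3, 4, 5])   # symmetric sample
skew_right, _ = shape_statistics([1, 1, 1, 2, 10])       # long right tail
print(skew_sym, kurt_sym, skew_right)
```

The symmetric sample has zero skewness, while the sample with one large value has positive skewness.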

17. Violin plots combine

a) Box plot and histogram
b) Box plot and density plot
c) Scatter and line
d) Bar and pie
✅ Correct Answer: b) Box plot and density plot
📝 Explanation:
Violin plots show distribution shape via kernel density and quartiles via box.

18. Multicollinearity is detected using

a) VIF (Variance Inflation Factor)
b) IQR
c) Mean
d) Mode
✅ Correct Answer: a) VIF (Variance Inflation Factor)
📝 Explanation:
VIF > 5-10 indicates high multicollinearity among features.

19. For time series EDA, which plot shows trends over time?

a) Scatter plot
b) Line plot
c) Bar chart
d) Histogram
✅ Correct Answer: b) Line plot
📝 Explanation:
Line plots connect data points chronologically to reveal patterns.

20. The mode is the value that

a) Sums to total
b) Occurs most frequently
c) Divides data in half
d) Measures spread
✅ Correct Answer: b) Occurs most frequently
📝 Explanation:
Mode identifies the most common category or value; a dataset can have more than one mode.

21. In EDA, pivot tables are used for

a) Aggregating data by categories
b) Plotting distributions
c) Detecting outliers
d) Scaling features
✅ Correct Answer: a) Aggregating data by categories
📝 Explanation:
Pivot tables summarize data with rows, columns, and values for cross-tabulation.

22. A Q-Q plot assesses

a) Normality of distribution
b) Correlation strength
c) Missing values
d) Categorical balance
✅ Correct Answer: a) Normality of distribution
📝 Explanation:
Q-Q plots compare quantiles to theoretical normal; straight line indicates normality.

23. Handling outliers in EDA often involves

a) Ignoring them always
b) Capping or removing after investigation
c) Increasing dataset size
d) Changing data type
✅ Correct Answer: b) Capping or removing after investigation
📝 Explanation:
Outliers may be errors or insights; decisions based on domain knowledge.

24. Categorical variables are encoded in EDA using

a) One-hot encoding
b) Z-score normalization
c) Log transformation
d) Binning
✅ Correct Answer: a) One-hot encoding
📝 Explanation:
One-hot creates binary columns for categories to avoid ordinal assumptions.

25. The coefficient of variation (CV) is

a) SD / Mean * 100%
b) Mean / SD
c) Variance / Mean
d) IQR / Median
✅ Correct Answer: a) SD / Mean * 100%
📝 Explanation:
CV measures relative variability, useful for comparing dispersion across datasets.
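A short sketch of the CV formula, assuming the sample standard deviation (`statistics.stdev`) is used; the two sample lists are made up to show that equal absolute spread can mean very different relative spread.

```python
import statistics

def coefficient_of_variation(values):
    """Return SD / mean as a percentage (sample SD assumed)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv_small_scale = coefficient_of_variation([10, 20, 30])        # SD 10, mean 20
cv_large_scale = coefficient_of_variation([1010, 1020, 1030])  # SD 10, mean 1020
print(cv_small_scale, cv_large_scale)
```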

26. Seaborn library in Python is popular for EDA because it

a) Handles missing data
b) Provides statistical visualizations
c) Trains models
d) Cleans text
✅ Correct Answer: b) Provides statistical visualizations
📝 Explanation:
Seaborn builds on Matplotlib for attractive, informative plots like heatmaps.

27. In bivariate analysis, a low correlation implies

a) Strong linear relationship
b) Weak or no linear relationship
c) Causation
d) Identical distributions
✅ Correct Answer: b) Weak or no linear relationship
📝 Explanation:
Correlation near 0 suggests little linear association between variables.
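The sketch below computes Pearson's r by hand to show why a value near 0 only rules out a linear association. The data are made up: a perfectly linear pair gives r = 1, while a perfectly quadratic (but symmetric) pair gives r = 0.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

x = [-2, -1, 0, 1, 2]
r_linear = pearson_r(x, [2 * v + 1 for v in x])   # exact linear relation
r_quadratic = pearson_r(x, [v ** 2 for v in x])   # exact but non-linear relation
print(r_linear, r_quadratic)
```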

28. Data profiling in EDA includes

a) Only visualizations
b) Summary statistics and data quality checks
c) Model evaluation
d) Feature selection
✅ Correct Answer: b) Summary statistics and data quality checks
📝 Explanation:
Profiling overviews structure, types, missingness, and stats.

29. A kernel density estimate (KDE) plot shows

a) Discrete counts
b) Smooth probability density
c) Categorical proportions
d) Error bars
✅ Correct Answer: b) Smooth probability density
📝 Explanation:
KDE approximates continuous distribution using a kernel function.

30. For imbalanced classes in EDA, check

a) Class distribution via bar plots
b) Only numerical features
c) Correlation only
d) Outliers exclusively
✅ Correct Answer: a) Class distribution via bar plots
📝 Explanation:
Visualize target variable frequencies to identify imbalance.

31. The five-number summary includes

a) Min, Q1, Median, Q3, Max
b) Mean, Median, Mode, SD, Variance
c) Skewness, Kurtosis, Mean, Median, Mode
d) Range, IQR, Mean, SD, CV
✅ Correct Answer: a) Min, Q1, Median, Q3, Max
📝 Explanation:
Used in box plots to describe data spread without assuming distribution.

32. In EDA, feature engineering starts with

a) Creating new variables from existing
b) Training models
c) Hyperparameter tuning
d) Cross-validation
✅ Correct Answer: a) Creating new variables from existing
📝 Explanation:
Derive interactions, polynomials, or bins to capture patterns.

33. A lag plot in time series EDA detects

a) Seasonality
b) Autocorrelation
c) Trend only
d) Stationarity
✅ Correct Answer: b) Autocorrelation
📝 Explanation:
Plots value against its lagged version to show serial dependence.

34. Z-score for outlier detection is

a) (x - mean) / SD
b) x / mean
c) mean - x
d) SD / x
✅ Correct Answer: a) (x - mean) / SD
📝 Explanation:
|Z| > 3 often flags outliers assuming normality.
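A minimal sketch of z-score flagging, assuming the population standard deviation and a threshold of 3. Note that in very small samples no point can reach |Z| > 3, so the made-up data below uses 21 values.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption of this sketch)
    return [v for v in values if abs((v - mean) / sd) > threshold]

data = [9, 11] * 10 + [40]
flagged = zscore_outliers(data)
print(flagged)
```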

35. Count plots are used for

a) Numerical distributions
b) Categorical frequencies
c) Bivariate relations
d) Multivariate analysis
✅ Correct Answer: b) Categorical frequencies
📝 Explanation:
Similar to bar charts but for single categorical variable counts.

36. In EDA, dimensionality reduction preview uses

a) PCA scree plot
b) Histogram
c) Box plot
d) Scatter plot
✅ Correct Answer: a) PCA scree plot
📝 Explanation:
Shows explained variance by components to decide retention.

37. Median is robust to

a) Outliers
b) Symmetry
c) Normality
d) Skewness
✅ Correct Answer: a) Outliers
📝 Explanation:
Unlike mean, median resists extreme values.

38. Joint plots in Seaborn combine

a) Histogram and bar
b) Scatter, marginal histograms, and correlation
c) Box and violin
d) Line and area
✅ Correct Answer: b) Scatter, marginal histograms, and correlation
📝 Explanation:
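Shows the joint relationship of two variables (a scatter plot by default) with each variable's distribution on the margins; a correlation annotation appeared only in older Seaborn versions.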

39. Variance is the average of

a) Squared deviations from mean
b) Absolute deviations from median
c) Differences between max and min
d) Frequencies
✅ Correct Answer: a) Squared deviations from mean
📝 Explanation:
Variance = Σ(x_i - μ)^2 / n, measuring spread. This is the population form; the sample variance divides by n - 1.
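The difference between dividing by n and by n - 1 can be checked with the standard library, as in this small sketch (the data values are arbitrary):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)

# Population variance written out as the average squared deviation.
manual_population = sum((x - mean) ** 2 for x in data) / len(data)

population_var = statistics.pvariance(data)  # divides by n
sample_var = statistics.variance(data)       # divides by n - 1
print(manual_population, population_var, sample_var)
```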

40. For text data in EDA, start with

a) Word clouds or n-gram frequencies
b) Numerical scaling
c) Correlation matrix
d) Time series decomposition
✅ Correct Answer: a) Word clouds or n-gram frequencies
📝 Explanation:
Visualize common terms and phrases in unstructured text.

41. Autocorrelation function (ACF) plot identifies

a) Cross-variable correlations
b) Serial correlations in time series
c) Outlier impacts
d) Missing patterns
✅ Correct Answer: b) Serial correlations in time series
📝 Explanation:
ACF shows correlation of series with its lags.
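One ACF value can be computed directly, as in the sketch below. It assumes the common estimator that divides the lagged covariance sum by the total sum of squares; the alternating series is made up to produce a strong negative lag-1 and positive lag-2 correlation.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return num / denom

series = [1, -1] * 10  # alternates every step
acf = [autocorrelation(series, k) for k in range(3)]
print(acf)
```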

42. In EDA, data types include

a) Only numerical
b) Numerical, categorical, ordinal, datetime
c) Only categorical
d) Binary only
✅ Correct Answer: b) Numerical, categorical, ordinal, datetime
📝 Explanation:
Understanding types guides appropriate analysis and visualization.

43. A facet grid in EDA allows

a) Subplotting by categories
b) 3D plotting
c) Animation
d) Interactive zooming
✅ Correct Answer: a) Subplotting by categories
📝 Explanation:
Splits plots into a grid conditioned on variables for comparisons.

44. Pearson correlation assumes

a) Linear relationship and normality
b) Non-linear only
c) Categorical data
d) No assumptions
✅ Correct Answer: a) Linear relationship and normality
📝 Explanation:
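Pearson's r measures linear association between continuous variables; normality matters mainly for significance tests on r. Use Spearman for a rank-based, non-parametric alternative.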

45. Box plot whiskers typically extend to

a) 1.5 * IQR from quartiles
b) Full range
c) Mean ± SD
d) Median only
✅ Correct Answer: a) 1.5 * IQR from quartiles
📝 Explanation:
Whiskers end at the most extreme data points within 1.5 * IQR of Q1 and Q3; points beyond them are potential outliers.

46. In EDA, resampling checks

a) Stability of statistics
b) Model accuracy
c) Feature importance
d) Hyperparameters
✅ Correct Answer: a) Stability of statistics
📝 Explanation:
Bootstrap or jackknife assesses variability in estimates.

47. Cramér's V measures

a) Association between categorical variables
b) Linear correlation
c) Outlier distance
d) Distribution shape
✅ Correct Answer: a) Association between categorical variables
📝 Explanation:
Ranges 0-1; chi-square based for nominal data.
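A sketch of the chi-square based calculation, assuming the uncorrected form V = sqrt(chi2 / (n * (k - 1))) with k the smaller of the row and column counts, and assuming no row or column total is zero. The two tables are made up to hit the extremes.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

v_perfect = cramers_v([[10, 0], [0, 10]])     # categories determine each other
v_independent = cramers_v([[5, 5], [5, 5]])   # no association
print(v_perfect, v_independent)
```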

48. Log transformation in EDA is used to

a) Handle skewed data
b) Encode categories
c) Fill missings
d) Detect trends
✅ Correct Answer: a) Handle skewed data
📝 Explanation:
Reduces right skew, stabilizing variance.

49. A swarm plot displays

a) Points without overlap for categorical
b) Dense lines
c) Heat intensities
d) 3D surfaces
✅ Correct Answer: a) Points without overlap for categorical
📝 Explanation:
Shows individual data points spread to avoid stacking.

50. In multivariate EDA, parallel coordinates plot

a) High-dimensional data as lines
b) Pairwise scatters
c) Time sequences
d) Density contours
✅ Correct Answer: a) High-dimensional data as lines
📝 Explanation:
Each line represents an observation across normalized axes.

51. The range is

a) Max - Min
b) Q3 - Q1
c) Mean - Median
d) SD * 2
✅ Correct Answer: a) Max - Min
📝 Explanation:
Simplest spread measure, sensitive to outliers.

52. For geospatial data in EDA, use

a) Choropleth maps
b) Bar charts
c) Histograms
d) Line plots
✅ Correct Answer: a) Choropleth maps
📝 Explanation:
Color regions by data values for spatial patterns.

53. Chi-square test in EDA checks

a) Independence between categorical variables
b) Normality
c) Linearity
d) Homoscedasticity
✅ Correct Answer: a) Independence between categorical variables
📝 Explanation:
Null hypothesis: no association in contingency tables.

54. Strip plots show

a) Jittered points for categorical
b) Smooth densities
c) Box summaries
d) Violin shapes
✅ Correct Answer: a) Jittered points for categorical
📝 Explanation:
Adds random noise (jitter) to point positions to reduce overplotting.

55. In EDA, cross-validation previews

a) Model performance
b) Data leakage
c) Feature stability
d) All of the above
✅ Correct Answer: d) All of the above
📝 Explanation:
Early CV checks generalization before full modeling.

56. Quantile-quantile (Q-Q) plot compares

a) Sample to theoretical distribution
b) Two samples
c) Time lags
d) Spatial points
✅ Correct Answer: a) Sample to theoretical distribution
📝 Explanation:
Deviations from line indicate non-conformity.

57. For ordinal data, use

a) Ordinal encoding
b) One-hot
c) Frequency encoding
d) Target encoding
✅ Correct Answer: a) Ordinal encoding
📝 Explanation:
Assigns integers preserving order, unlike nominal.

58. Partial dependence plots explain

a) Feature effects in models
b) Data distributions
c) Correlations
d) Outliers
✅ Correct Answer: a) Feature effects in models
📝 Explanation:
Shows marginal effect of a feature on prediction.

59. In EDA, binning continuous data creates

a) Categorical intervals
b) Scaled values
c) Polynomials
d) Interactions
✅ Correct Answer: a) Categorical intervals
📝 Explanation:
Reduces noise and reveals patterns in discretized form.

60. Spearman's rank correlation is

a) Non-parametric for monotonic relations
b) Only for linear
c) For categorical
d) Ignores ranks
✅ Correct Answer: a) Non-parametric for monotonic relations
📝 Explanation:
Based on ranks, robust to non-normality.
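Spearman's coefficient can be sketched as Pearson's r applied to ranks. The helper names below are illustrative, and tied values receive their average rank. The made-up example uses a cubic relation, which is monotonic but not linear.

```python
import math

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho, r = spearman_rho(x, y), pearson_r(x, y)
print(rho, r)
```

The rank-based coefficient reaches 1 for the monotonic relation, while Pearson's r stays below 1.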

61. A ridgeline plot visualizes

a) Overlapping density distributions
b) Scatter clusters
c) Bar comparisons
d) Line trends
✅ Correct Answer: a) Overlapping density distributions
📝 Explanation:
Stacks shifted KDEs for category comparisons.

62. EDA documents findings via

a) Notebooks like Jupyter
b) Only plots
c) Models
d) Databases
✅ Correct Answer: a) Notebooks like Jupyter
📝 Explanation:
Combines code, visuals, and narrative for reproducibility.

63. Theil's U measures

a) Asymmetric association for categorical
b) Symmetry in distributions
c) Linear strength
d) Outlier probability
✅ Correct Answer: a) Asymmetric association for categorical
📝 Explanation:
Uncertainty coefficient based on entropy.

64. In time series, decomposition separates

a) Trend, seasonality, residual
b) Mean, variance, skew
c) Lags, leads, current
d) Frequencies, amplitudes
✅ Correct Answer: a) Trend, seasonality, residual
📝 Explanation:
Additive or multiplicative models isolate components.

65. For high-cardinality categoricals in EDA, use

a) Frequency or target encoding
b) One-hot for all
c) Ignore
d) Bin to low
✅ Correct Answer: a) Frequency or target encoding
📝 Explanation:
Reduces dimensions while retaining information.

66. A contour plot shows

a) Density levels in 2D
b) 3D surfaces
c) Bar heights
d) Point clusters
✅ Correct Answer: a) Density levels in 2D
📝 Explanation:
Lines or colors indicate constant value regions.

67. EDA hypothesis generation leads to

a) Confirmatory analysis
b) Data collection
c) Model deployment
d) Reporting
✅ Correct Answer: a) Confirmatory analysis
📝 Explanation:
Patterns suggest testable hypotheses for further stats.

68. Mahalanobis distance detects

a) Multivariate outliers
b) Univariate means
c) Correlations
d) Skew
✅ Correct Answer: a) Multivariate outliers
📝 Explanation:
Accounts for covariance, unlike Euclidean.

69. In EDA, groupby operations compute

a) Aggregates by categories
b) Global stats
c) Plots
d) Encodings
✅ Correct Answer: a) Aggregates by categories
📝 Explanation:
Like mean by group in pandas for subgroup analysis.

70. A hexbin plot is a

a) 2D histogram with hexagons
b) Line with error bands
c) Categorical strip
d) Density violin
✅ Correct Answer: a) 2D histogram with hexagons
📝 Explanation:
Efficient for dense scatter data to show counts.

71. EDA for regression checks

a) Linearity, homoscedasticity via residuals
b) Class balance
c) Clusters
d) Survival rates
✅ Correct Answer: a) Linearity, homoscedasticity via residuals
📝 Explanation:
A scatter plot of residuals vs. fitted values checks these assumptions.

72. Polychoric correlation for

a) Ordinal variables
b) Continuous only
c) Binary
d) Spatial
✅ Correct Answer: a) Ordinal variables
📝 Explanation:
Assumes underlying continuous latent variables.

73. In EDA, melting data changes

a) Wide to long format
b) Long to wide
c) Categorical to numerical
d) Time to static
✅ Correct Answer: a) Wide to long format
📝 Explanation:
Facilitates plotting multiple series.
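The wide-to-long reshape can be illustrated without any library. The `melt` function below is a hypothetical stand-in written for this example, not the pandas implementation; the city and year data are made up.

```python
def melt(rows, id_key, value_keys, var_name="variable", value_name="value"):
    """Turn one wide row per id into one long row per (id, variable) pair."""
    long_rows = []
    for row in rows:
        for key in value_keys:
            long_rows.append(
                {id_key: row[id_key], var_name: key, value_name: row[key]}
            )
    return long_rows

wide = [
    {"city": "A", "2022": 10, "2023": 12},
    {"city": "B", "2022": 7, "2023": 9},
]
long_rows = melt(wide, "city", ["2022", "2023"], var_name="year")
for row in long_rows:
    print(row)
```

Each year column becomes a value in a single "year" column, which is the shape most plotting functions expect for multiple series.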

74. SHAP values in EDA preview

a) Feature contributions post-model
b) Data cleaning
c) Visualization
d) Encoding
✅ Correct Answer: a) Feature contributions post-model
📝 Explanation:
Explains individual predictions for interpretability.

75. For survival data EDA, Kaplan-Meier estimates

a) Survival function
b) Hazard rates
c) Cure fractions
d) All of the above
✅ Correct Answer: a) Survival function
📝 Explanation:
Kaplan-Meier is a non-parametric estimator of the survival function for time-to-event data; hazard rates and cure fractions require other methods.
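A minimal sketch of the Kaplan-Meier product-limit estimate of the survival function, assuming an event flag of 1 for an observed event and 0 for a censored observation. The durations are made up, and the function returns only the time points where the curve drops.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each observed event time."""
    survival = 1.0
    at_risk = len(durations)
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)  # events plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

durations = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]  # 0 marks a censored subject
curve = kaplan_meier(durations, events)
print(curve)
```

Censored subjects leave the risk set without lowering the curve, which is what distinguishes this estimate from a simple proportion of survivors.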

76. A sunburst plot visualizes

a) Hierarchical data
b) Network graphs
c) Time flows
d) Geospatial
✅ Correct Answer: a) Hierarchical data
📝 Explanation:
Nested rings show proportions in categories.

77. In EDA, Levene's test checks

a) Equal variances
b) Normality
c) Independence
d) Linearity
✅ Correct Answer: a) Equal variances
📝 Explanation:
Robust to non-normality for ANOVA assumptions.

78. Treemap displays

a) Proportional areas for categories
b) Lines for trends
c) Points for relations
d) Densities for shapes
✅ Correct Answer: a) Proportional areas for categories
📝 Explanation:
Rectangles sized by value in hierarchical layout.

79. EDA for clustering previews with

a) Elbow method on k-means
b) ROC curves
c) Confusion matrices
d) Lift charts
✅ Correct Answer: a) Elbow method on k-means
📝 Explanation:
Plots inertia vs k to suggest optimal clusters.

80. Biserial correlation for

a) Continuous and binary variables
b) Two continuous
c) Two binary
d) Ordinal-binary
✅ Correct Answer: a) Continuous and binary variables
📝 Explanation:
Biserial assumes the binary variable is a dichotomized continuous one; point-biserial applies when it is a true dichotomy.

81. In EDA, pivot_longer in R or melt in Python

a) Reshapes wide to long
b) Aggregates
c) Filters
d) Joins
✅ Correct Answer: a) Reshapes wide to long
📝 Explanation:
Prepares tidy data for analysis.

82. LIME explains

a) Local model interpretations
b) Global correlations
c) Data summaries
d) Outlier reasons
✅ Correct Answer: a) Local model interpretations
📝 Explanation:
Approximates black-box models locally with interpretable ones.

83. For network data EDA, use

a) Graph visualizations like node-link
b) Bar charts
c) Histograms
d) Scatter only
✅ Correct Answer: a) Graph visualizations like node-link
📝 Explanation:
Shows nodes and edges for connectivity patterns.

84. Anderson-Darling test assesses

a) Goodness-of-fit to distribution
b) Equal means
c) Variances
d) Independence
✅ Correct Answer: a) Goodness-of-fit to distribution
📝 Explanation:
Sensitive to tail deviations from normal.

85. Sankey diagram illustrates

a) Flow between categories
b) Proportions
c) Trends
d) Relations
✅ Correct Answer: a) Flow between categories
📝 Explanation:
Width proportional to magnitude in multi-stage processes.

86. EDA for classification includes

a) Class separation via LDA plot
b) Survival curves
c) Hazard functions
d) Time decompositions
✅ Correct Answer: a) Class separation via LDA plot
📝 Explanation:
Linear Discriminant Analysis projects for discriminability.

87. Tetrachoric correlation assumes

a) Binary data from underlying continuous
b) Ordinal only
c) Continuous
d) Categorical nominal
✅ Correct Answer: a) Binary data from underlying continuous
📝 Explanation:
For dichotomous variables implying latent scale.

88. In EDA, dcast or pivot in tools

a) Long to wide reshaping
b) Wide to long
c) Sorting
d) Filtering
✅ Correct Answer: a) Long to wide reshaping
📝 Explanation:
Spreads variables for certain summaries.

89. Counterfactual explanations in EDA show

a) What-if scenarios for predictions
b) Data distributions
c) Correlations
d) Outliers
✅ Correct Answer: a) What-if scenarios for predictions
📝 Explanation:
Alters features to see outcome changes.

90. For audio data EDA, spectrograms visualize

a) Frequency over time
b) Waveforms
c) Amplitudes only
d) Durations
✅ Correct Answer: a) Frequency over time
📝 Explanation:
2D representation of signal spectrum.

91. Kolmogorov-Smirnov test compares

a) Distributions
b) Means
c) Variances
d) Correlations
✅ Correct Answer: a) Distributions
📝 Explanation:
Maximum difference in empirical CDFs.
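The two-sample KS statistic can be sketched as the largest gap between two empirical CDFs. This example computes only the statistic D, not a p-value, and the samples are made up.

```python
def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    # The gap can only change at observed values, so checking those suffices.
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)

d_shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
d_same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
print(d_shifted, d_same)
```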

92. Alluvial plot shows

a) Changes in category proportions over stages
b) Static hierarchies
c) Networks
d) Maps
✅ Correct Answer: a) Changes in category proportions over stages
📝 Explanation:
Like Sankey but for categorical shifts.

93. In EDA for anomaly detection, isolation forest previews

a) Anomaly scores
b) Class probabilities
c) Regressions
d) Clusters
✅ Correct Answer: a) Anomaly scores
📝 Explanation:
Path length in trees indicates isolation.

94. Phi coefficient for

a) 2x2 contingency association
b) Multiple categories
c) Continuous
d) Ordinal
✅ Correct Answer: a) 2x2 contingency association
📝 Explanation:
Pearson's r for binary variables.

95. Gather function in tidyverse

a) Wide to long
b) Long to wide
c) Summarize
d) Mutate
✅ Correct Answer: a) Wide to long
📝 Explanation:
Supports the tidy data principle; superseded by pivot_longer() in current tidyr.

96. Anchored explanations focus on

a) Comparison to baseline prediction
b) Global averages
c) Local densities
d) Outlier distances
✅ Correct Answer: a) Comparison to baseline prediction
📝 Explanation:
Highlights feature impacts relative to reference.

97. For image data EDA, use

a) Pixel histograms or t-SNE embeddings
b) Bar charts
c) Line plots
d) Scatter only
✅ Correct Answer: a) Pixel histograms or t-SNE embeddings
📝 Explanation:
Visualize color distributions or latent spaces.

98. Shapiro-Wilk test for

a) Normality
b) Equal variances
c) Independence
d) Homoscedasticity
✅ Correct Answer: a) Normality
📝 Explanation:
Powerful for small samples.

99. Chord diagram visualizes

a) Interconnections between categories
b) Flows
c) Hierarchies
d) Trends
✅ Correct Answer: a) Interconnections between categories
📝 Explanation:
Arc segments with linking ribbons.

100. EDA for recommendation systems includes

a) User-item matrix sparsity
b) Survival analysis
c) Time series only
d) Geospatial clustering
✅ Correct Answer: a) User-item matrix sparsity
📝 Explanation:
Percentage of missing interactions.
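Sparsity can be computed as the share of empty cells in the user-item matrix. The sketch below assumes missing interactions are stored as `None` in a small made-up ratings matrix.

```python
def sparsity(matrix):
    """Fraction of cells in a user-item matrix with no recorded interaction."""
    total = sum(len(row) for row in matrix)
    missing = sum(1 for row in matrix for value in row if value is None)
    return missing / total

# Rows are users, columns are items; None means no rating was given.
ratings = [
    [5, None, None, 3],
    [None, None, 4, None],
    [None, 2, None, None],
]
ratio = sparsity(ratings)
print(f"{ratio:.1%} of user-item pairs have no interaction")
```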

101. Contingency coefficient for

a) Categorical association beyond 2x2
b) Binary only
c) Continuous
d) Ordinal
✅ Correct Answer: a) Categorical association beyond 2x2
📝 Explanation:
Chi-square based and symmetric; its maximum is below 1 and depends on table size.

102. Spread function in R

a) Long to wide
b) Wide to long
c) Group by
d) Filter
✅ Correct Answer: a) Long to wide
📝 Explanation:
Pivots values into columns; superseded by pivot_wider() in current tidyr.

103. Prototype-based explanations use

a) Nearest neighbors as exemplars
b) Rule sets
c) Trees
d) Linear models
✅ Correct Answer: a) Nearest neighbors as exemplars
📝 Explanation:
Shows similar cases for context.

104. For video data EDA, frame sampling and

a) Optical flow analysis
b) Static images
c) Audio only
d) Text overlays
✅ Correct Answer: a) Optical flow analysis
📝 Explanation:
Motion vectors between frames.

105. Jarque-Bera test combines

a) Skewness and kurtosis for normality
b) Means and variances
c) Correlations
d) Outliers
✅ Correct Answer: a) Skewness and kurtosis for normality
📝 Explanation:
Omnibus test against normal.

106. Parallel sets plot for

a) Multi-category flows like alluvial
b) Single variables
c) Continuous
d) Spatial
✅ Correct Answer: a) Multi-category flows like alluvial
📝 Explanation:
Ribbon bands for proportions.

107. In reinforcement learning EDA, check

a) State-action distributions
b) Class labels
c) Targets
d) Features only
✅ Correct Answer: a) State-action distributions
📝 Explanation:
Exploration coverage in environments.

108. Lambda coefficient for

a) Asymmetric nominal association
b) Symmetric
c) Ordinal
d) Continuous
✅ Correct Answer: a) Asymmetric nominal association
📝 Explanation:
Predictive reduction in error.

109. Unpivot in data tools

a) Wide to long
b) Long to wide
c) Aggregate
d) Join
✅ Correct Answer: a) Wide to long
📝 Explanation:
Standardizes column structure.

110. Subgroup explanations in EDA target

a) Specific data slices
b) Global model
c) Individual points
d) Random samples
✅ Correct Answer: a) Specific data slices
📝 Explanation:
Tailored insights for segments.

111. For graph data EDA beyond basics, centrality measures like

a) Degree, betweenness
b) Means
c) Variances
d) Skews
✅ Correct Answer: a) Degree, betweenness
📝 Explanation:
Node importance in networks.

112. D'Agostino's K-squared test for

a) Normality via skew and kurtosis
b) Equal variances
c) Independence
d) Trends
✅ Correct Answer: a) Normality via skew and kurtosis
📝 Explanation:
Asymptotic chi-square.

113. Mosaic plot for

a) Categorical contingency visualization
b) Continuous densities
c) Time series
d) Spatial
✅ Correct Answer: a) Categorical contingency visualization
📝 Explanation:
Tiled bars for associations.

114. EDA in causal inference previews with

a) DAGs for confounding
b) ROC
c) MSE
d) R2
✅ Correct Answer: a) DAGs for confounding
📝 Explanation:
Directed Acyclic Graphs map relationships.

115. Uncertainty coefficient for

a) Nominal predictive association
b) Ordinal
c) Binary
d) Continuous
✅ Correct Answer: a) Nominal predictive association
📝 Explanation:
Entropy-based, asymmetric.

116. Pivot_table in pandas for

a) Multi-index aggregation
b) Simple sums
c) Plots
d) Encodes
✅ Correct Answer: a) Multi-index aggregation
📝 Explanation:
Flexible crosstabs with functions.

117. Contrastive explanations compare

a) Prediction to counterfactual
b) Features globally
c) Data points
d) Models
✅ Correct Answer: a) Prediction to counterfactual
📝 Explanation:
Minimal changes for outcome flip.

118. For tabular data EDA, automated tools like

a) Pandas Profiling
b) Manual plots only
c) Models first
d) Cleaning only
✅ Correct Answer: a) Pandas Profiling
📝 Explanation:
Generates comprehensive HTML reports; the package is now published as ydata-profiling.

119. Lilliefors test is a

a) KS variant without specified distribution
b) T-test
c) Chi-square
d) F-test
✅ Correct Answer: a) KS variant without specified distribution
📝 Explanation:
For normality, parameters estimated from data.

120. Dot plot alternative to

a) Bar for small categories
b) Line for time
c) Scatter for relations
d) Histogram for continuous
✅ Correct Answer: a) Bar for small categories
📝 Explanation:
Points on axis for frequencies.

121. In federated learning EDA, focus on

a) Local dataset summaries without sharing
b) Centralized data
c) Full merges
d) Global models only
✅ Correct Answer: a) Local dataset summaries without sharing
📝 Explanation:
Preserves privacy in distributed settings.

122. Goodman-Kruskal gamma for

a) Ordinal association
b) Nominal
c) Binary
d) Continuous
✅ Correct Answer: a) Ordinal association
📝 Explanation:
Based on concordant and discordant pairs; tied pairs are ignored.
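Gamma is computed from concordant and discordant pairs only, leaving pairs tied on either variable out of the calculation. A minimal sketch with made-up ordinal scores:

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D) over all pairs of observations."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie on x or y; such pairs are skipped
    return (concordant - discordant) / (concordant + discordant)

gamma_with_ties = goodman_kruskal_gamma([1, 1, 2, 2, 3, 3], [1, 2, 2, 3, 3, 3])
gamma_mixed = goodman_kruskal_gamma([1, 2, 3], [1, 3, 2])
print(gamma_with_ties, gamma_mixed)
```

Because ties are dropped, the first example reaches 1 even though several observations share the same category.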

123. Reshape in Python for

a) Data format changes
b) Calculations
c) Visuals
d) Storage
✅ Correct Answer: a) Data format changes
📝 Explanation:
Stack, melt, pivot for tidy data.

124. Input gradient explanations use

a) Model derivatives w.r.t. input
b) Averages
c) Neighbors
d) Rules
✅ Correct Answer: a) Model derivatives w.r.t. input
📝 Explanation:
Sensitivity of prediction to features.

125. For sensor data EDA, time-frequency analysis like

a) Wavelet transforms
b) Simple averages
c) Static plots
d) Categorical bars
✅ Correct Answer: a) Wavelet transforms
📝 Explanation:
Localizes events in time and scale.

126. Cramér-von Mises test for

a) Goodness-of-fit
b) Means
c) Variances
d) Correlations
✅ Correct Answer: a) Goodness-of-fit
📝 Explanation:
Integral of squared CDF differences.

127. Streamgraph for

a) Stacked area over time
b) Static proportions
c) Networks
d) Maps
✅ Correct Answer: a) Stacked area over time
📝 Explanation:
Centered for flowing appearance.

128. EDA in A/B testing checks

a) Group balance and power
b) Model fits
c) Clusters
d) Outliers only
✅ Correct Answer: a) Group balance and power
📝 Explanation:
Pre-test for valid comparisons.

129. Kappa coefficient for

a) Inter-rater agreement beyond chance
b) Association strength
c) Prediction error
d) Distribution fit
✅ Correct Answer: a) Inter-rater agreement beyond chance
📝 Explanation:
For categorical ratings.
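A sketch of the chance-corrected agreement calculation, assuming the two-rater case (Cohen's kappa): observed agreement is compared with the agreement expected if both raters labeled independently at their own category rates. The labels are made up.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal rate per category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

rater_a = ["y", "y", "y", "y", "y", "n", "n", "n", "n", "n"]
rater_b = ["y", "y", "y", "y", "n", "n", "n", "n", "n", "y"]
kappa = cohens_kappa(rater_a, rater_b)
print(kappa)
```

Here the raters agree on 80% of items, but half of that agreement is expected by chance, so kappa is about 0.6.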

130. Tidy data principles in EDA ensure

a) One variable per column, observation per row
b) Wide formats
c) Mixed types
d) Duplicates
✅ Correct Answer: a) One variable per column, observation per row
📝 Explanation:
Facilitates manipulation and analysis.

131. Guided backpropagation for

a) Salient feature visualization in images
b) Tabular data
c) Text
d) Audio
✅ Correct Answer: a) Salient feature visualization in images
📝 Explanation:
Modifies gradients for positive relevance.

132. For financial time series EDA, candlestick charts show

a) OHLC prices
b) Volumes only
c) Returns
d) Volatilities
✅ Correct Answer: a) OHLC prices
📝 Explanation:
Open, High, Low, and Close prices per period, showing each period's price range and direction.

133. Anderson-Rubin test in

a) Instrumental variable strength
b) Normality
c) Equal means
d) Variances
✅ Correct Answer: a) Instrumental variable strength
📝 Explanation:
Gives inference on the structural coefficient that remains valid when instruments are weak.

134. Waffle chart for

a) Proportional parts like pie alternative
b) Flows
c) Hierarchies
d) Relations
✅ Correct Answer: a) Proportional parts like pie alternative
📝 Explanation:
Grid squares for percentages.

135. EDA for NLP includes

a) Token length distributions, vocab size
b) Numerical correlations
c) Image histograms
d) Audio spectra
✅ Correct Answer: a) Token length distributions, vocab size
📝 Explanation:
Text-specific summaries.

136. Yule's Q for

a) 2x2 table association
b) Multiple categories
c) Ordinal
d) Continuous
✅ Correct Answer: a) 2x2 table association
📝 Explanation:
Dichotomous measure from odds ratio.
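The link between Yule's Q and the odds ratio can be checked numerically. In this sketch the 2x2 cells are named a, b (first row) and c, d (second row), and the counts are made up.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

a, b, c, d = 20, 10, 5, 15
q = yules_q(a, b, c, d)
ratio = odds_ratio(a, b, c, d)
q_from_odds = (ratio - 1) / (ratio + 1)  # same value expressed via the odds ratio
print(q, ratio, q_from_odds)
```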

137. Stack in pandas for

a) MultiIndex to columns
b) Columns to rows
c) Rows to columns
d) Filtering
✅ Correct Answer: b) Columns to rows
📝 Explanation:
Longer format from wide.

138. Layer-wise relevance propagation for

a) Deep net heatmaps
b) Shallow models
c) Trees
d) Rules
✅ Correct Answer: a) Deep net heatmaps
📝 Explanation:
Backpropagates relevance scores.