These MCQs cover the fundamentals of Exploratory Data Analysis (EDA): data summarization, visualization techniques, handling anomalies, and inferring patterns from datasets. They are ideal for data analysts and scientists who want to reinforce EDA practices using statistical and graphical methods.
130 Exploratory Data Analysis (EDA) MCQs
Correct Answer: b) To understand the data and uncover patterns
Explanation:
EDA aims to summarize main characteristics, often using visual methods, to reveal insights and detect anomalies before modeling.
Correct Answer: b) Mean
Explanation:
The mean is the average value, representing the center of the data distribution.
Correct Answer: b) Distribution of a single numerical variable
Explanation:
Histograms divide continuous data into bins to show frequency distribution.
Correct Answer: b) The middle 50% of the data
Explanation:
IQR = Q3 - Q1, measuring spread of the central half of the data.
Correct Answer: b) Summary functions like isnull()
Explanation:
Functions like df.isnull().sum() count missing entries per column.
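As a minimal sketch of what a per-column missing-value count does, the plain-Python version below mirrors the idea behind `df.isnull().sum()`. The `rows` records and the `count_missing` helper are made up for illustration and are not part of pandas.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing entry.
rows = [
    {"age": 34, "city": "Oslo", "income": 52000},
    {"age": None, "city": "Lima", "income": None},
    {"age": 29, "city": None, "income": 48000},
    {"age": 41, "city": "Pune", "income": None},
]

def count_missing(records):
    """Return the number of None values found in each column."""
    missing = Counter()
    for record in records:
        for column, value in record.items():
            # True counts as 1 and False as 0, so every column gets an entry.
            missing[column] += value is None
    return dict(missing)

missing_per_column = count_missing(rows)
print(missing_per_column)
```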
Correct Answer: b) Box plot
Explanation:
Box plots display quartiles and whiskers, highlighting points beyond 1.5 * IQR.
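The 1.5 * IQR rule can be sketched with the standard library. This assumes the "inclusive" quantile method of `statistics.quantiles`; other quartile conventions give slightly different fences. The `values` sample and the `iqr_fences` helper are illustrative.

```python
import statistics

# Hypothetical sample with one suspiciously large value.
values = [2, 4, 4, 5, 6, 7, 8, 9, 30]

def iqr_fences(data, k=1.5):
    """Return (lower fence, upper fence, IQR) using inclusive quartiles."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr, iqr

lower, upper, iqr = iqr_fences(values)
# Points outside the fences are the ones a box plot draws individually.
outliers = [v for v in values if v < lower or v > upper]
print(iqr, (lower, upper), outliers)
```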
Correct Answer: b) -1 to 1
Explanation:
Pearson's r measures linear relationship strength and direction between -1 and 1.
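To make the -1 to 1 range concrete, here is a small sketch that computes Pearson's r from its definition (covariance divided by the product of the spreads). The `pearson_r` helper and the sample lists are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of co-deviations over the root of the product of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sum((x - mean_x) ** 2 for x in xs)
    dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(dev_x * dev_y)

hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]     # rises perfectly with hours
score_down = [10, 8, 6, 4, 2]   # falls perfectly with hours

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)
print(r_up, r_down)
```

A perfectly increasing linear relationship sits at the upper bound and a perfectly decreasing one at the lower bound.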
Correct Answer: b) Asymmetry of the distribution
Explanation:
Positive skew indicates a longer right tail, negative skew a longer left tail; skewness is zero for a symmetric distribution.
Correct Answer: b) Relationship between two variables
Explanation:
Scatter plots plot points for two continuous variables to show correlation or trends.
Correct Answer: b) To visualize pairwise relationships in multivariate data
Explanation:
Pair plots create a matrix of scatter plots for every pair of variables, with each variable's own histogram (or density) on the diagonal.
Correct Answer: a) Tail heaviness relative to normal distribution
Explanation:
High kurtosis indicates heavy tails (more extreme values than a normal distribution); low kurtosis indicates light tails.
Correct Answer: b) Bar chart
Explanation:
Bar charts display frequencies or counts for discrete categories.
Correct Answer: b) Dispersion around the mean
Explanation:
SD quantifies data variability; it is the square root of the variance.
Correct Answer: b) Feature ranges and distributions
Explanation:
Scaling ensures features contribute equally, checked via min-max or z-scores.
Correct Answer: b) Correlation matrices
Explanation:
Heatmaps color-code values in a matrix, ideal for correlations.
Correct Answer: a) Skewness = 0, Kurtosis = 3
Explanation:
Normal distribution is symmetric (skew=0) with kurtosis=3 (mesokurtic).
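Skewness and kurtosis can be sketched as ratios of central moments. The version below uses the simple population (biased) moment formulas and reports non-excess kurtosis, so a normal distribution would give 3. Library functions often apply bias corrections or report excess kurtosis instead. The `shape_stats` helper and both samples are illustrative.

```python
def shape_stats(data):
    """Return (skewness, kurtosis) from population central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess: subtract 3 for excess kurtosis
    return skewness, kurtosis

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

sym_skew, sym_kurt = shape_stats(symmetric)
right_skew, right_kurt = shape_stats(right_skewed)
print(sym_skew, sym_kurt)
print(right_skew, right_kurt)
```

The symmetric sample has zero skewness and, being short and flat, a kurtosis well below 3; the sample with one large value has positive skewness.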
Correct Answer: b) Box plot and density plot
Explanation:
Violin plots show distribution shape via kernel density and quartiles via box.
Correct Answer: a) VIF (Variance Inflation Factor)
Explanation:
A VIF above roughly 5 to 10 (the threshold varies by convention) indicates high multicollinearity among features.
Correct Answer: b) Line plot
Explanation:
Line plots connect data points chronologically to reveal patterns.
Correct Answer: b) Occurs most frequently
Explanation:
Mode identifies the most common category or value in unimodal data.
Correct Answer: a) Aggregating data by categories
Explanation:
Pivot tables summarize data with rows, columns, and values for cross-tabulation.
Correct Answer: a) Normality of distribution
Explanation:
Q-Q plots compare quantiles to theoretical normal; straight line indicates normality.
Correct Answer: b) Capping or removing after investigation
Explanation:
Outliers may be errors or insights; decisions based on domain knowledge.
Correct Answer: a) One-hot encoding
Explanation:
One-hot creates binary columns for categories to avoid ordinal assumptions.
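A minimal sketch of one-hot encoding in plain Python follows. The `one_hot` helper, the `is_` column prefix, and the `colors` list are illustrative choices, not a library convention.

```python
def one_hot(values):
    """Turn a list of category labels into one dict of 0/1 flags per row."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

colors = ["red", "green", "red", "blue"]
encoded = one_hot(colors)
for label, row in zip(colors, encoded):
    print(label, row)
```

Each row has exactly one flag set, so no ordering between categories is implied.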
Correct Answer: a) SD / Mean * 100%
Explanation:
CV measures relative variability, useful for comparing dispersion across datasets.
Correct Answer: b) Provides statistical visualizations
Explanation:
Seaborn builds on Matplotlib for attractive, informative plots like heatmaps.
Correct Answer: b) Weak or no linear relationship
Explanation:
Correlation near 0 suggests little linear association between variables.
Correct Answer: b) Summary statistics and data quality checks
Explanation:
Profiling overviews structure, types, missingness, and stats.
Correct Answer: b) Smooth probability density
Explanation:
KDE approximates continuous distribution using a kernel function.
Correct Answer: a) Class distribution via bar plots
Explanation:
Visualize target variable frequencies to identify imbalance.
Correct Answer: a) Min, Q1, Median, Q3, Max
Explanation:
Used in box plots to describe data spread without assuming distribution.
Correct Answer: a) Creating new variables from existing
Explanation:
Derive interactions, polynomials, or bins to capture patterns.
Correct Answer: b) Autocorrelation
Explanation:
Plots value against its lagged version to show serial dependence.
Correct Answer: a) (x - mean) / SD
Explanation:
|Z| > 3 often flags outliers assuming normality.
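The z-score rule can be sketched as follows, assuming the population standard deviation and the common |z| > 3 cutoff. The `readings` data is made up; note that very small samples can never reach |z| > 3, so the rule needs a reasonable sample size.

```python
import statistics

# Hypothetical sensor readings: twenty ordinary values and one spike.
readings = [10] * 20 + [100]

def z_scores(data):
    """Standardize each value as (x - mean) / SD."""
    mu = statistics.mean(data)
    sd = statistics.pstdev(data)
    return [(x - mu) / sd for x in data]

flagged = [x for x, z in zip(readings, z_scores(readings)) if abs(z) > 3]
print(flagged)
```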
Correct Answer: b) Categorical frequencies
Explanation:
Similar to bar charts but for single categorical variable counts.
Correct Answer: a) PCA scree plot
Explanation:
Shows explained variance by components to decide retention.
Correct Answer: a) Outliers
Explanation:
Unlike mean, median resists extreme values.
Correct Answer: b) Scatter, marginal histograms, and correlation
Explanation:
For bivariate EDA with univariate margins.
Correct Answer: a) Squared deviations from mean
Explanation:
Variance = Σ(x_i - μ)^2 / n, measuring spread.
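The formula above divides by n, which is the population variance. A short sketch, with illustrative helper names and data, shows it next to the sample variance, which divides by n - 1 instead.

```python
import math

def population_variance(data):
    """Mean of squared deviations from the mean (divide by n)."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """Same sum of squared deviations, divided by n - 1."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / (len(data) - 1)

scores = [2, 4, 4, 4, 5, 5, 7, 9]
pop_var = population_variance(scores)
pop_sd = math.sqrt(pop_var)
samp_var = sample_variance(scores)
print(pop_var, pop_sd, samp_var)
```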
Correct Answer: a) Word clouds or n-gram frequencies
Explanation:
Visualize common terms and phrases in unstructured text.
Correct Answer: b) Serial correlations in time series
Explanation:
ACF shows correlation of series with its lags.
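As a rough sketch, the autocorrelation at lag k can be computed as the sum of lagged co-deviations divided by the total sum of squared deviations (one common estimator; others normalize differently). The `autocorrelation` helper and the alternating series are illustrative.

```python
def autocorrelation(series, lag):
    """Autocorrelation at the given lag, normalized by the full-series sum of squares."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numer / denom

# A series that flips sign every step: strongly negative at odd lags,
# strongly positive at even lags.
alternating = [1, -1] * 10
acf = [autocorrelation(alternating, k) for k in range(4)]
print(acf)
```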
Correct Answer: b) Numerical, categorical, ordinal, datetime
Explanation:
Understanding types guides appropriate analysis and visualization.
Correct Answer: a) Subplotting by categories
Explanation:
Splits plots into a grid conditioned on variables for comparisons.
Correct Answer: a) Linear relationship and normality
Explanation:
Pearson correlation applies to continuous variables; use Spearman's rank correlation as a non-parametric alternative.
Correct Answer: a) 1.5 * IQR from quartiles
Explanation:
Beyond whiskers are potential outliers.
Correct Answer: a) Stability of statistics
Explanation:
Bootstrap or jackknife assesses variability in estimates.
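A percentile bootstrap can be sketched with the standard library. The `bootstrap_ci` helper, its defaults (2000 resamples, a 95% interval, a fixed seed), and the `sample` data are assumptions chosen for illustration.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a statistic of the data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimates = sorted(
        stat(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]
ci_low, ci_high = bootstrap_ci(sample)
print(statistics.mean(sample), (ci_low, ci_high))
```

A narrow interval suggests the estimate is stable under resampling; a wide one suggests it depends heavily on a few observations.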
Correct Answer: a) Association between categorical variables
Explanation:
Ranges 0-1; chi-square based for nominal data.
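A minimal sketch of Cramér's V, built from the chi-square statistic of a contingency table, is shown below. It uses the plain formula without bias correction and assumes every expected count is non-zero. The helper names and both tables are illustrative.

```python
import math

def chi_square(table):
    """Return (chi-square statistic, total count) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    stat, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(stat / (n * (k - 1)))

perfect = [[10, 0], [0, 10]]     # categories determine each other
independent = [[5, 5], [5, 5]]   # no association at all

v_perfect = cramers_v(perfect)
v_independent = cramers_v(independent)
print(v_perfect, v_independent)
```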
Correct Answer: a) Handle skewed data
Explanation:
Reduces right skew, stabilizing variance.
Correct Answer: a) Points without overlap for categorical
Explanation:
Shows individual data points spread to avoid stacking.
Correct Answer: a) High-dimensional data as lines
Explanation:
Each line represents an observation across normalized axes.
Correct Answer: a) Max - Min
Explanation:
Simplest spread measure, sensitive to outliers.
Correct Answer: a) Choropleth maps
Explanation:
Color regions by data values for spatial patterns.
Correct Answer: a) Independence between categorical variables
Explanation:
Null hypothesis: no association in contingency tables.
Correct Answer: a) Jittered points for categorical
Explanation:
Adds random noise to point positions to reduce overplotting and reveal overlapping points.
Correct Answer: d) All of the above
Explanation:
Early CV checks generalization before full modeling.
Correct Answer: a) Sample to theoretical distribution
Explanation:
Deviations from line indicate non-conformity.
Correct Answer: a) Ordinal encoding
Explanation:
Assigns integers preserving order, unlike nominal.
Correct Answer: a) Feature effects in models
Explanation:
Shows marginal effect of a feature on prediction.
Correct Answer: a) Categorical intervals
Explanation:
Reduces noise and reveals patterns in discretized form.
Correct Answer: a) Non-parametric for monotonic relations
Explanation:
Based on ranks, robust to non-normality.
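To illustrate the rank-based idea, the sketch below computes Spearman's rho as the Pearson correlation of average ranks and compares it with Pearson's r on a monotonic but non-linear relationship. The helper names and data are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    co_dev = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return co_dev / math.sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic, but not linear

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

Because the ranks of `x` and `y` agree exactly, rho reaches 1 even though the linear correlation is lower.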
Correct Answer: a) Overlapping density distributions
Explanation:
Stacks shifted KDEs for category comparisons.
Correct Answer: a) Notebooks like Jupyter
Explanation:
Combines code, visuals, and narrative for reproducibility.
Correct Answer: a) Asymmetric association for categorical
Explanation:
Uncertainty coefficient based on entropy.
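The asymmetry of the uncertainty coefficient can be sketched directly from entropies, using U(X|Y) = (H(X) - H(X|Y)) / H(X). The helper names and toy columns are illustrative; returning 1.0 when X is constant is a convention assumed here.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(x, y):
    """H(X | Y): entropy of x within each group of y, weighted by group size."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum(len(g) / len(x) * entropy(g) for g in groups.values())

def theils_u(x, y):
    """Fraction of the uncertainty in x that is explained by knowing y."""
    h_x = entropy(x)
    if h_x == 0:
        return 1.0  # assumed convention: a constant x is fully "explained"
    return (h_x - conditional_entropy(x, y)) / h_x

city = ["a", "b", "c", "d"]   # four distinct cities
region = [0, 0, 1, 1]         # each city belongs to one region

u_region_given_city = theils_u(region, city)
u_city_given_region = theils_u(city, region)
print(u_region_given_city, u_city_given_region)
```

Knowing the city fixes the region, but knowing the region only narrows the city down to two options, so the two directions give different values.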
Correct Answer: a) Trend, seasonality, residual
Explanation:
Additive or multiplicative models isolate components.
Correct Answer: a) Frequency or target encoding
Explanation:
Reduces dimensions while retaining information.
Correct Answer: a) Density levels in 2D
Explanation:
Lines or colors indicate constant value regions.
Correct Answer: a) Confirmatory analysis
Explanation:
Patterns suggest testable hypotheses for further stats.
Correct Answer: a) Multivariate outliers
Explanation:
Accounts for covariance, unlike Euclidean.
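A two-variable sketch shows how the covariance changes the distance. It inverts the 2x2 sample covariance matrix by hand, so it assumes exactly two features that are not perfectly collinear. The `mahalanobis_2d` helper and the data are illustrative.

```python
import math

def mahalanobis_2d(point, xs, ys):
    """Mahalanobis distance of a 2D point from the centre of (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^T * inverse(covariance) * d, written out for the 2x2 case
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Positively correlated toy data centred at (3, 3).
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

along_trend = mahalanobis_2d((5, 5), xs, ys)    # follows the correlation
against_trend = mahalanobis_2d((5, 1), xs, ys)  # breaks the correlation
print(along_trend, against_trend)
```

Both points are the same Euclidean distance from the centre, yet the one that goes against the correlation is three times as far in Mahalanobis terms.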
Correct Answer: a) Aggregates by categories
Explanation:
Like mean by group in pandas for subgroup analysis.
Correct Answer: a) 2D histogram with hexagons
Explanation:
Efficient for dense scatter data to show counts.
Correct Answer: a) Linearity, homoscedasticity via residuals
Explanation:
A scatter plot of residuals against fitted values checks these assumptions.
Correct Answer: a) Ordinal variables
Explanation:
Assumes underlying continuous latent variables.
Correct Answer: a) Wide to long format
Explanation:
Facilitates plotting multiple series.
Correct Answer: a) Feature contributions post-model
Explanation:
Explains individual predictions for interpretability.
Correct Answer: d) All of the above
Explanation:
Non-parametric curve for time-to-event analysis.
Correct Answer: a) Hierarchical data
Explanation:
Nested rings show proportions in categories.
Correct Answer: a) Equal variances
Explanation:
Robust to non-normality for ANOVA assumptions.
Correct Answer: a) Proportional areas for categories
Explanation:
Rectangles sized by value in hierarchical layout.
Correct Answer: a) Elbow method on k-means
Explanation:
Plots inertia vs k to suggest optimal clusters.
Correct Answer: a) Continuous and binary variables
Explanation:
Use the point-biserial correlation when the binary variable is a true dichotomy.
Correct Answer: a) Reshapes wide to long
Explanation:
Prepares tidy data for analysis.
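Wide-to-long reshaping can be sketched in plain Python. The `melt` helper below is a hypothetical stand-in that imitates the idea, not the pandas function itself, and the `wide` records are made up.

```python
def melt(rows, id_key, value_keys, var_name="variable", value_name="value"):
    """Turn one row per id with many value columns into one row per (id, variable)."""
    long_rows = []
    for row in rows:
        for key in value_keys:
            long_rows.append(
                {id_key: row[id_key], var_name: key, value_name: row[key]}
            )
    return long_rows

# Hypothetical wide table: one column per quarter.
wide = [
    {"id": 1, "q1": 10, "q2": 20},
    {"id": 2, "q1": 30, "q2": 40},
]

long_rows = melt(wide, id_key="id", value_keys=["q1", "q2"])
for row in long_rows:
    print(row)
```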
Correct Answer: a) Local model interpretations
Explanation:
Approximates black-box models locally with interpretable ones.
Correct Answer: a) Graph visualizations like node-link
Explanation:
Shows nodes and edges for connectivity patterns.
Correct Answer: a) Goodness-of-fit to distribution
Explanation:
Sensitive to tail deviations from normal.
Correct Answer: a) Flow between categories
Explanation:
Width proportional to magnitude in multi-stage processes.
Correct Answer: a) Class separation via LDA plot
Explanation:
Linear Discriminant Analysis projects for discriminability.
Correct Answer: a) Binary data from underlying continuous
Explanation:
For dichotomous variables implying latent scale.
Correct Answer: a) Long to wide reshaping
Explanation:
Spreads variables for certain summaries.
Correct Answer: a) What-if scenarios for predictions
Explanation:
Alters features to see outcome changes.
Correct Answer: a) Frequency over time
Explanation:
2D representation of signal spectrum.
Correct Answer: a) Distributions
Explanation:
Maximum difference in empirical CDFs.
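The two-sample statistic can be sketched as the largest gap between two empirical CDFs, evaluated at every observed value. This computes only the statistic, not a p-value; the `ks_statistic` helper and the samples are illustrative.

```python
def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    def ecdf(sample, x):
        # Fraction of the sample that is less than or equal to x.
        return sum(v <= x for v in sample) / len(sample)

    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)

same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
print(same, shifted, disjoint)
```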
Correct Answer: a) Changes in category proportions over stages
Explanation:
Like Sankey but for categorical shifts.
Correct Answer: a) Anomaly scores
Explanation:
Path length in trees indicates isolation.
Correct Answer: a) 2x2 contingency association
Explanation:
Pearson's r for binary variables.
Correct Answer: a) Wide to long
Explanation:
Tidy data principle for analysis.
Correct Answer: a) Comparison to baseline prediction
Explanation:
Highlights feature impacts relative to reference.
Correct Answer: a) Pixel histograms or t-SNE embeddings
Explanation:
Visualize color distributions or latent spaces.
Correct Answer: a) Normality
Explanation:
Powerful for small samples.
Correct Answer: a) Interconnections between categories
Explanation:
Arc segments with linking ribbons.
Correct Answer: a) User-item matrix sparsity
Explanation:
Percentage of missing interactions.
Correct Answer: a) Categorical association beyond 2x2
Explanation:
Chi-square based, symmetric.
Correct Answer: a) Long to wide
Explanation:
Pivots values into columns.
Correct Answer: a) Nearest neighbors as exemplars
Explanation:
Shows similar cases for context.
Correct Answer: a) Optical flow analysis
Explanation:
Motion vectors between frames.
Correct Answer: a) Skewness and kurtosis for normality
Explanation:
Omnibus test against normal.
Correct Answer: a) Multi-category flows like alluvial
Explanation:
Ribbon bands for proportions.
Correct Answer: a) State-action distributions
Explanation:
Exploration coverage in environments.
Correct Answer: a) Asymmetric nominal association
Explanation:
Proportional reduction in prediction error.
Correct Answer: a) Wide to long
Explanation:
Standardizes column structure.
Correct Answer: a) Specific data slices
Explanation:
Tailored insights for segments.
Correct Answer: a) Degree, betweenness
Explanation:
Node importance in networks.
Correct Answer: a) Normality via skew and kurtosis
Explanation:
Asymptotic chi-square.
Correct Answer: a) Categorical contingency visualization
Explanation:
Tiled bars for associations.
Correct Answer: a) DAGs for confounding
Explanation:
Directed Acyclic Graphs map relationships.
Correct Answer: a) Nominal predictive association
Explanation:
Entropy-based, asymmetric.
Correct Answer: a) Multi-index aggregation
Explanation:
Flexible crosstabs with functions.
Correct Answer: a) Prediction to counterfactual
Explanation:
Minimal changes for outcome flip.
Correct Answer: a) Pandas Profiling
Explanation:
Generates comprehensive HTML reports.
Correct Answer: a) KS variant without specified distribution
Explanation:
For normality, parameters estimated from data.
Correct Answer: a) Bar for small categories
Explanation:
Points on axis for frequencies.
Correct Answer: a) Local dataset summaries without sharing
Explanation:
Preserves privacy in distributed settings.
Correct Answer: a) Ordinal association
Explanation:
Accounts for tied pairs in ranks.
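A pairwise sketch of Kendall's tau-b shows how ties enter the denominator. It loops over all pairs, which is fine for small samples only. The `kendall_tau_b` helper and the ratings are illustrative.

```python
import math
from itertools import combinations

def kendall_tau_b(xs, ys):
    """Tau-b: (concordant - discordant) over a tie-adjusted number of pairs."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue            # tied on both variables: counted nowhere
        elif dx == 0:
            ties_x += 1         # tied on x only
        elif dy == 0:
            ties_y += 1         # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    return (concordant - discordant) / denom

no_ties = kendall_tau_b([1, 2, 3, 4], [1, 2, 3, 4])
with_ties = kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3])
print(no_ties, with_ties)
```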
Correct Answer: a) Data format changes
Explanation:
Stack, melt, pivot for tidy data.
Correct Answer: a) Model derivatives w.r.t. input
Explanation:
Sensitivity of prediction to features.
Correct Answer: a) Wavelet transforms
Explanation:
Localizes events in time and scale.
Correct Answer: a) Goodness-of-fit
Explanation:
Integral of squared CDF differences.
Correct Answer: a) Stacked area over time
Explanation:
Centered for flowing appearance.
Correct Answer: a) Group balance and power
Explanation:
Pre-test for valid comparisons.
Correct Answer: a) Inter-rater agreement beyond chance
Explanation:
For categorical ratings.
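Agreement beyond chance can be sketched as Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from each rater's label frequencies. The helper name and the ratings are illustrative.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

rater_1 = ["spam", "spam", "ham", "ham"]
rater_2 = ["spam", "ham", "ham", "ham"]

kappa = cohens_kappa(rater_1, rater_2)
print(kappa)
```

The raters agree on three of four items, but half of that agreement is expected by chance, so kappa comes out at 0.5.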
Correct Answer: a) One variable per column, observation per row
Explanation:
Facilitates manipulation and analysis.
Correct Answer: a) Salient feature visualization in images
Explanation:
Modifies gradients for positive relevance.
Correct Answer: a) OHLC prices
Explanation:
Open, High, Low, Close for volatility.
Correct Answer: a) Instrumental variable strength
Explanation:
Weak instrument detection.
Correct Answer: a) Proportional parts like pie alternative
Explanation:
Grid squares for percentages.
Correct Answer: a) Token length distributions, vocab size
Explanation:
Text-specific summaries.
Correct Answer: a) 2x2 table association
Explanation:
Dichotomous measure from odds ratio.
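The link to the odds ratio can be sketched for a 2x2 table with cells a, b (first row) and c, d (second row): Q = (ad - bc) / (ad + bc), which equals (OR - 1) / (OR + 1). The helper name and the counts are illustrative, and the sketch assumes b and c are non-zero so the odds ratio is defined.

```python
def yules_q(a, b, c, d):
    """Yule's Q for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical counts: exposed/unexposed by outcome yes/no.
a, b, c, d = 30, 10, 10, 30

odds_ratio = (a * d) / (b * c)
q = yules_q(a, b, c, d)
q_from_or = (odds_ratio - 1) / (odds_ratio + 1)
print(odds_ratio, q, q_from_or)
```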
Correct Answer: b) Columns to rows
Explanation:
Longer format from wide.
Correct Answer: a) Deep net heatmaps
Explanation:
Backpropagates relevance scores.