These MCQs cover the fundamentals of Exploratory Data Analysis, including data summarization, visualization techniques, handling anomalies, and inferring patterns from datasets. They are ideal for data analysts and data scientists who want to reinforce EDA practices using statistical and graphical methods.
130 Exploratory Data Analysis (EDA) MCQs
✅ Correct Answer: b) To understand the data and uncover patterns
📝 Explanation:
EDA aims to summarize main characteristics, often using visual methods, to reveal insights and detect anomalies before modeling.
✅ Correct Answer: b) Mean
📝 Explanation:
The mean is the average value, representing the center of the data distribution.
✅ Correct Answer: b) Distribution of a single numerical variable
📝 Explanation:
Histograms divide continuous data into bins to show frequency distribution.
✅ Correct Answer: b) The middle 50% of the data
📝 Explanation:
IQR = Q3 - Q1, measuring spread of the central half of the data.
✅ Correct Answer: b) Summary functions like isnull()
📝 Explanation:
Functions like df.isnull().sum() count missing entries per column.
✅ Correct Answer: b) Box plot
📝 Explanation:
Box plots display quartiles and whiskers, highlighting points beyond 1.5 * IQR.
✅ Correct Answer: b) -1 to 1
📝 Explanation:
Pearson's r measures linear relationship strength and direction between -1 and 1.
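A minimal pure-Python sketch of Pearson's r follows: covariance divided by the product of the standard deviations. The function name `pearson_r` is illustrative and does not come from any library. The sketch assumes two equal-length numeric sequences, neither of which is constant. Python 3.10+ also ships `statistics.correlation` for the same purpose.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: sum of co-deviations over the product of root sums of squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2 values)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / (sx * sy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive linear relation, about 1
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative linear relation, about -1
print(pearson_r([1, 2, 3, 4], [1, 4, 4, 1]))  # 0.0: no linear trend
```

The third pair has a clear pattern but no linear trend, so r is 0. A value near 0 rules out a linear association, not every association.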
✅ Correct Answer: b) Asymmetry of the distribution
📝 Explanation:
Positive skew indicates a longer right tail, negative skew a longer left tail; a symmetric distribution has zero skew.
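This sketch computes the moment-based (population) skewness g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The name `skewness` is illustrative. Library functions may apply a small-sample bias correction, so their values can differ slightly from this one.

```python
def skewness(data):
    """Moment-based (population) skewness g1 = m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3, 4, 5]))         # 0.0: symmetric
print(skewness([1, 1, 1, 2, 10]))        # positive: long right tail
print(skewness([-10, -2, -1, -1, -1]))   # negative: long left tail
```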
✅ Correct Answer: b) Relationship between two variables
📝 Explanation:
Scatter plots plot points for two continuous variables to show correlation or trends.
✅ Correct Answer: b) To visualize pairwise relationships in multivariate data
📝 Explanation:
Pair plots create a matrix of scatter plots and histograms for all variable pairs.
✅ Correct Answer: a) Tail heaviness relative to normal distribution
📝 Explanation:
High kurtosis indicates heavier tails (more extreme values) than a normal distribution; low kurtosis indicates lighter tails and a flatter shape.
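Here is a minimal sketch of moment-based kurtosis, m4 / m2², with an option to subtract 3. That gives "excess" kurtosis, for which a normal distribution scores 0. The function name and the `excess` flag are illustrative, not taken from a specific library.

```python
def kurtosis(data, excess=True):
    """Moment-based kurtosis m4 / m2**2; subtract 3 for excess kurtosis."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    k = m4 / m2 ** 2
    return k - 3 if excess else k


flat = [1, 2, 3, 4, 5]              # evenly spread, light tails
heavy = [0] * 20 + [-10, 10]        # mostly zeros plus two extreme values
print(kurtosis(flat, excess=False))  # 1.7, below the normal reference of 3
print(kurtosis(heavy))               # about 8: heavy tails
```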
✅ Correct Answer: b) Bar chart
📝 Explanation:
Bar charts display frequencies or counts for discrete categories.
✅ Correct Answer: b) Dispersion around the mean
📝 Explanation:
The standard deviation (SD) quantifies variability around the mean; it is the square root of the variance.
✅ Correct Answer: b) Feature ranges and distributions
📝 Explanation:
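Checking feature ranges and distributions shows whether scaling is needed so that features contribute comparably; common options are min-max normalization and z-score standardization.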
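The two scaling options can be sketched with the standard library alone. The helper names and the example feature values below are made up for illustration. The z-score version uses the sample SD (`statistics.stdev`) and assumes the feature is not constant.

```python
import statistics


def min_max_scale(values):
    """Rescale to [0, 1]; a constant feature maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


income = [20_000, 35_000, 50_000, 120_000]
age = [22, 35, 47, 60]
# Raw ranges differ by orders of magnitude; after scaling both are comparable.
print(min_max_scale(income))
print(min_max_scale(age))
print(z_score_scale(age))
```

Min-max scaling is sensitive to a single extreme value, since it sets the range. For that reason, z-scores or robust (median/IQR) scaling are often preferred when outliers are present.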
✅ Correct Answer: b) Correlation matrices
📝 Explanation:
Heatmaps color-code values in a matrix, ideal for correlations.
✅ Correct Answer: a) Skewness = 0, Kurtosis = 3
📝 Explanation:
A normal distribution is symmetric (skewness = 0) with kurtosis = 3 (mesokurtic), i.e., excess kurtosis = 0.
✅ Correct Answer: b) Box plot and density plot
📝 Explanation:
Violin plots show distribution shape via kernel density and quartiles via box.
✅ Correct Answer: a) VIF (Variance Inflation Factor)
📝 Explanation:
A VIF above roughly 5 to 10 indicates high multicollinearity among features.
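In general, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. The sketch below covers only the simplest case, a model with exactly two predictors. There, R_j² equals the squared Pearson correlation between them. The helper names are hypothetical. With more predictors, a full regression per feature would be needed.

```python
import math


def correlation(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two predictors.

    With a single other predictor, R_j^2 in VIF_j = 1 / (1 - R_j^2)
    is simply the squared Pearson correlation.
    """
    r2 = correlation(x1, x2) ** 2
    if r2 >= 1.0:
        return math.inf  # perfect collinearity
    return 1.0 / (1.0 - r2)


x1 = [1, 2, 3, 4, 5]
print(vif_two_predictors(x1, [2, 4, 5, 4, 5]))            # r^2 = 0.6, so VIF about 2.5
print(vif_two_predictors(x1, [1.1, 2.0, 2.9, 4.2, 5.0]))  # nearly collinear: VIF far above 10
```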
✅ Correct Answer: b) Line plot
📝 Explanation:
Line plots connect data points chronologically to reveal patterns.
✅ Correct Answer: b) Occurs most frequently
📝 Explanation:
The mode is the most frequent value or category; a distribution can have more than one mode (bimodal or multimodal).
✅ Correct Answer: a) Aggregating data by categories
📝 Explanation:
Pivot tables summarize data with rows, columns, and values for cross-tabulation.
✅ Correct Answer: a) Normality of distribution
📝 Explanation:
Q-Q plots compare sample quantiles with those of a theoretical normal distribution; points lying close to a straight line indicate normality.
✅ Correct Answer: b) Capping or removing after investigation
📝 Explanation:
Outliers may be errors or genuine insights, so any decision to cap or remove them should follow investigation and domain knowledge.
✅ Correct Answer: a) One-hot encoding
📝 Explanation:
One-hot creates binary columns for categories to avoid ordinal assumptions.
✅ Correct Answer: a) SD / Mean * 100%
📝 Explanation:
CV measures relative variability, useful for comparing dispersion across datasets.
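As a small worked example, the sketch computes the CV as sample SD / mean × 100. It compares two hypothetical measurements recorded in different units. The function name is illustrative. The sketch assumes ratio-scale data with a positive mean, since the CV is misleading otherwise.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample SD / mean * 100 (for ratio-scale data with a positive mean)."""
    mean = statistics.mean(values)
    if mean <= 0:
        raise ValueError("CV is only meaningful here for a positive mean")
    return statistics.stdev(values) / mean * 100


heights_cm = [160, 170, 180]
weights_kg = [50, 70, 90]
# The units differ, so the raw SDs (10 cm vs 20 kg) are not comparable; the CVs are.
print(round(coefficient_of_variation(heights_cm), 2))  # 5.88
print(round(coefficient_of_variation(weights_kg), 2))  # 28.57
```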
✅ Correct Answer: b) Provides statistical visualizations
📝 Explanation:
Seaborn builds on Matplotlib for attractive, informative plots like heatmaps.
✅ Correct Answer: b) Weak or no linear relationship
📝 Explanation:
Correlation near 0 suggests little linear association between variables.
✅ Correct Answer: b) Summary statistics and data quality checks
📝 Explanation:
Profiling gives an overview of structure, data types, missingness, and summary statistics.
✅ Correct Answer: b) Smooth probability density
📝 Explanation:
KDE approximates continuous distribution using a kernel function.
✅ Correct Answer: a) Class distribution via bar plots
📝 Explanation:
Visualize target variable frequencies to identify imbalance.
✅ Correct Answer: a) Min, Q1, Median, Q3, Max
📝 Explanation:
Used in box plots to describe data spread without assuming a particular distribution.
✅ Correct Answer: a) Creating new variables from existing
📝 Explanation:
Derive interactions, polynomials, or bins to capture patterns.
✅ Correct Answer: b) Autocorrelation
📝 Explanation:
Plots value against its lagged version to show serial dependence.
✅ Correct Answer: a) (x - mean) / SD
📝 Explanation:
|Z| > 3 often flags outliers assuming normality.
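The sketch applies the z-score rule using the population SD. The helper names are illustrative. One caveat is worth knowing: by Samuelson's inequality, no |z| can exceed √(n − 1) with the population SD. With 10 or fewer points, the |z| > 3 rule therefore cannot flag anything, however extreme the value.

```python
import statistics


def z_scores(values):
    """z = (x - mean) / SD, using the population SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def z_outliers(values, threshold=3.0):
    """Values whose absolute z-score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


readings = [10] * 19 + [50]
print(z_outliers(readings))   # [50]

small = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(z_outliers(small))      # []: with n = 10, |z| cannot exceed 3
```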
✅ Correct Answer: b) Categorical frequencies
📝 Explanation:
Similar to bar charts but for single categorical variable counts.
✅ Correct Answer: a) PCA scree plot
📝 Explanation:
Shows explained variance by components to decide retention.
✅ Correct Answer: a) Outliers
📝 Explanation:
Unlike mean, median resists extreme values.
✅ Correct Answer: b) Scatter, marginal histograms, and correlation
📝 Explanation:
For bivariate EDA with univariate margins.
✅ Correct Answer: a) Squared deviations from mean
📝 Explanation:
Population variance = Σ(x_i - μ)^2 / N, measuring spread; the sample variance divides by n - 1 instead.
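The following worked example computes both variants on a small made-up dataset. It also takes the square root to obtain the SD. Python's `statistics.pvariance` and `statistics.variance` implement the population (divide by N) and sample (divide by n − 1) versions.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mu = sum(data) / n                              # 5.0
ss = sum((x - mu) ** 2 for x in data)           # sum of squared deviations: 32.0

population_variance = ss / n                    # 4.0
sample_variance = ss / (n - 1)                  # 32 / 7, about 4.571
population_sd = math.sqrt(population_variance)  # 2.0: SD is the square root of variance

print(population_variance, sample_variance, population_sd)
print(statistics.pvariance(data), statistics.variance(data))  # same values from the stdlib
```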
✅ Correct Answer: a) Word clouds or n-gram frequencies
📝 Explanation:
Visualize common terms and phrases in unstructured text.
✅ Correct Answer: b) Serial correlations in time series
📝 Explanation:
ACF shows correlation of series with its lags.
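This sketch uses the common sample ACF estimator, which is also what an ACF plot displays bar by bar. The lag-k co-deviations are summed and divided by the total sum of squares. The function name and the repeating example series are illustrative.

```python
def autocorrelation(series, lag):
    """Sample ACF: sum_{t} (x_t - mean)(x_{t+lag} - mean) / sum_t (x_t - mean)^2."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be in [0, len(series))")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


# A series that repeats every 4 steps
seasonal = [1, 5, 3, 0] * 6
for k in range(6):
    print(k, round(autocorrelation(seasonal, k), 3))  # strong positive value at lag 4
```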
✅ Correct Answer: b) Numerical, categorical, ordinal, datetime
📝 Explanation:
Understanding types guides appropriate analysis and visualization.
✅ Correct Answer: a) Subplotting by categories
📝 Explanation:
Splits plots into a grid conditioned on variables for comparisons.
✅ Correct Answer: a) Linear relationship and normality
📝 Explanation:
Pearson's r assumes continuous variables; use Spearman's rank correlation as a non-parametric alternative.
✅ Correct Answer: a) 1.5 * IQR from quartiles
📝 Explanation:
Points beyond the whiskers are flagged as potential outliers.
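A minimal sketch of the 1.5 × IQR (Tukey) fences follows. It assumes quartiles computed with `statistics.quantiles(..., method="inclusive")`, which uses the same linear interpolation as the default in NumPy and pandas. Other quartile conventions can shift the fences slightly. The helper names are illustrative.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Values lying outside the Tukey fences."""
    lo, hi = tukey_fences(values, k)
    return [v for v in values if v < lo or v > hi]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_fences(data))  # (-3.5, 14.5): Q1 = 3.25, Q3 = 7.75, IQR = 4.5
print(iqr_outliers(data))  # [100]
```

Unlike the z-score rule, these fences rely on quartiles. An extreme value therefore barely moves them.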
✅ Correct Answer: a) Stability of statistics
📝 Explanation:
Bootstrap or jackknife assesses variability in estimates.
✅ Correct Answer: a) Association between categorical variables
📝 Explanation:
Ranges 0-1; chi-square based for nominal data.
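Here is a sketch of the (uncorrected) Cramér's V, V = sqrt(χ² / (n · (min(r, c) − 1))). The χ² statistic is built from observed versus expected counts under independence. The table layout (a list of rows of counts) and the function name are assumptions for illustration. The sketch assumes no row or column total is zero.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c table of counts (list of rows), without bias correction."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v([[10, 20], [20, 40]]))         # 0.0: proportional rows, no association
print(cramers_v([[50, 0], [0, 50]]))           # 1.0: perfect association
print(cramers_v([[30, 10, 5], [10, 25, 20]]))  # between 0 and 1
```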
✅ Correct Answer: a) Handle skewed data
📝 Explanation:
Reduces right skew, stabilizing variance.
✅ Correct Answer: a) Points without overlap for categorical
📝 Explanation:
Shows individual data points spread to avoid stacking.
✅ Correct Answer: a) High-dimensional data as lines
📝 Explanation:
Each line represents an observation across normalized axes.
✅ Correct Answer: a) Max - Min
📝 Explanation:
Simplest spread measure, sensitive to outliers.
✅ Correct Answer: a) Choropleth maps
📝 Explanation:
Color regions by data values for spatial patterns.
✅ Correct Answer: a) Independence between categorical variables
📝 Explanation:
Null hypothesis: no association in contingency tables.
✅ Correct Answer: a) Jittered points for categorical
📝 Explanation:
Adds small random noise to point positions so that overlapping points become visible, reducing overplotting.
✅ Correct Answer: d) All of the above
📝 Explanation:
Early CV checks generalization before full modeling.
✅ Correct Answer: a) Sample to theoretical distribution
📝 Explanation:
Deviations from line indicate non-conformity.
✅ Correct Answer: a) Ordinal encoding
📝 Explanation:
Assigns integers that preserve category order, unlike nominal encodings.
✅ Correct Answer: a) Feature effects in models
📝 Explanation:
Shows marginal effect of a feature on prediction.
✅ Correct Answer: a) Categorical intervals
📝 Explanation:
Reduces noise and reveals patterns in discretized form.
✅ Correct Answer: a) Non-parametric for monotonic relations
📝 Explanation:
Based on ranks, robust to non-normality.
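Spearman's rho is Pearson's r applied to ranks. The sketch below uses average ranks for ties and the helper names are illustrative. The cubic example shows rho detecting a perfectly monotonic but non-linear relation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return cov / math.sqrt(sum((u - mx) ** 2 for u in x) * sum((v - my) ** 2 for v in y))


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]    # monotonic but not linear
print(spearman_rho(x, y))  # 1.0: perfectly monotonic
print(pearson_r(x, y))     # below 1: the relation is not linear
```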
✅ Correct Answer: a) Overlapping density distributions
📝 Explanation:
Stacks shifted KDEs for category comparisons.
✅ Correct Answer: a) Notebooks like Jupyter
📝 Explanation:
Combines code, visuals, and narrative for reproducibility.
✅ Correct Answer: a) Asymmetric association for categorical
📝 Explanation:
Uncertainty coefficient based on entropy.
✅ Correct Answer: a) Trend, seasonality, residual
📝 Explanation:
Additive or multiplicative models isolate components.
✅ Correct Answer: a) Frequency or target encoding
📝 Explanation:
Reduces dimensions while retaining information.
✅ Correct Answer: a) Density levels in 2D
📝 Explanation:
Lines or colors indicate constant value regions.
✅ Correct Answer: a) Confirmatory analysis
📝 Explanation:
Patterns suggest testable hypotheses for further stats.
✅ Correct Answer: a) Multivariate outliers
📝 Explanation:
Accounts for covariance between variables, unlike Euclidean distance.
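To show why covariance matters, this sketch computes the Mahalanobis distance sqrt((x − μ)ᵀ S⁻¹ (x − μ)) for 2D data. The 2×2 sample-covariance inverse is written out by hand. The function name and data are illustrative. Higher dimensions would need a general matrix inverse, for example from NumPy.

```python
import math


def mahalanobis_2d(point, data):
    """Mahalanobis distance of a 2D point from the centroid of data (sample covariance)."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form d^T S^-1 d, using S^-1 = [[syy, -sxy], [-sxy, sxx]] / det
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


data = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 3.9), (5, 5.1)]
cx = sum(p[0] for p in data) / len(data)
cy = sum(p[1] for p in data) / len(data)
along = (cx + 1, cy + 1)    # follows the positive trend
against = (cx + 1, cy - 1)  # same Euclidean distance from the centroid, against the trend
print(mahalanobis_2d(along, data), mahalanobis_2d(against, data))
```

Both points are √2 from the centroid in Euclidean terms. Only the one that breaks the correlation pattern gets a large Mahalanobis distance, which is what makes it a multivariate outlier.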
✅ Correct Answer: a) Aggregates by categories
📝 Explanation:
For example, computing the mean per group with pandas groupby for subgroup analysis.
✅ Correct Answer: a) 2D histogram with hexagons
📝 Explanation:
Efficient for dense scatter data to show counts.
✅ Correct Answer: a) Linearity, homoscedasticity via residuals
📝 Explanation:
A scatter plot of residuals vs. fitted values is used to check these assumptions.
✅ Correct Answer: a) Ordinal variables
📝 Explanation:
Assumes underlying continuous latent variables.
✅ Correct Answer: a) Wide to long format
📝 Explanation:
Facilitates plotting multiple series.
✅ Correct Answer: a) Feature contributions post-model
📝 Explanation:
Explains individual predictions for interpretability.
✅ Correct Answer: d) All of the above
📝 Explanation:
Non-parametric curve for time-to-event analysis.
✅ Correct Answer: a) Hierarchical data
📝 Explanation:
Nested rings show proportions in categories.
✅ Correct Answer: a) Equal variances
📝 Explanation:
Checks the equal-variance assumption of ANOVA and is relatively robust to non-normality.
✅ Correct Answer: a) Proportional areas for categories
📝 Explanation:
Rectangles sized by value in hierarchical layout.
✅ Correct Answer: a) Elbow method on k-means
📝 Explanation:
Plots inertia vs k to suggest optimal clusters.
✅ Correct Answer: a) Continuous and binary variables
📝 Explanation:
Point-biserial if binary is true dichotomy.
✅ Correct Answer: a) Reshapes wide to long
📝 Explanation:
Prepares tidy data for analysis.
✅ Correct Answer: a) Local model interpretations
📝 Explanation:
Approximates black-box models locally with interpretable ones.
✅ Correct Answer: a) Graph visualizations like node-link
📝 Explanation:
Shows nodes and edges for connectivity patterns.
✅ Correct Answer: a) Goodness-of-fit to distribution
📝 Explanation:
Sensitive to tail deviations from normal.
✅ Correct Answer: a) Flow between categories
📝 Explanation:
Width proportional to magnitude in multi-stage processes.
✅ Correct Answer: a) Class separation via LDA plot
📝 Explanation:
Linear Discriminant Analysis projects for discriminability.
✅ Correct Answer: a) Binary data from underlying continuous
📝 Explanation:
For dichotomous variables implying latent scale.
✅ Correct Answer: a) Long to wide reshaping
📝 Explanation:
Spreads variables for certain summaries.
✅ Correct Answer: a) What-if scenarios for predictions
📝 Explanation:
Alters features to see outcome changes.
✅ Correct Answer: a) Frequency over time
📝 Explanation:
2D representation of signal spectrum.
✅ Correct Answer: a) Distributions
📝 Explanation:
Maximum difference in empirical CDFs.
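For two samples, the statistic is D = max over x of |F1(x) − F2(x)|, where F1 and F2 are the empirical CDFs. The sketch covers only the statistic, not its p-value, and the function name is illustrative. Both ECDFs are right-continuous step functions that jump only at observed values. Evaluating them at every pooled observation is therefore enough to find the maximum.

```python
import bisect


def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample1), sorted(sample2)
    d = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)  # share of sample1 <= x
        fb = bisect.bisect_right(b, x) / len(b)  # share of sample2 <= x
        d = max(d, abs(fa - fb))
    return d


print(ks_statistic([1, 2, 3], [1, 2, 3]))        # 0.0: identical samples
print(ks_statistic([1, 2, 3], [4, 5, 6]))        # 1.0: no overlap
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```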
✅ Correct Answer: a) Changes in category proportions over stages
📝 Explanation:
Like Sankey but for categorical shifts.
✅ Correct Answer: a) Anomaly scores
📝 Explanation:
Points isolated with shorter average path lengths in random trees receive higher anomaly scores.
✅ Correct Answer: a) 2x2 contingency association
📝 Explanation:
Equivalent to Pearson's r computed on two binary variables.
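This sketch checks the equivalence numerically. It computes phi = (ad − bc) / sqrt((a+b)(c+d)(a+c)(b+d)) from a 2x2 table and compares it with Pearson's r on the 0/1 observations the table summarizes. The counts and helper names are illustrative.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]] (rows: X=1, X=0; columns: Y=1, Y=0)."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return cov / math.sqrt(sum((u - mx) ** 2 for u in x) * sum((v - my) ** 2 for v in y))


# Expand the table back into paired 0/1 observations
a, b, c, d = 10, 5, 3, 12
x = [1] * (a + b) + [0] * (c + d)
y = [1] * a + [0] * b + [1] * c + [0] * d
print(phi_coefficient(a, b, c, d), pearson_r(x, y))  # the two values agree
```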
✅ Correct Answer: a) Wide to long
📝 Explanation:
Tidy data principle for analysis.
✅ Correct Answer: a) Comparison to baseline prediction
📝 Explanation:
Highlights feature impacts relative to reference.
✅ Correct Answer: a) Pixel histograms or t-SNE embeddings
📝 Explanation:
Visualize color distributions or latent spaces.
✅ Correct Answer: a) Normality
📝 Explanation:
Powerful for small samples.
✅ Correct Answer: a) Interconnections between categories
📝 Explanation:
Arc segments with linking ribbons.
✅ Correct Answer: a) User-item matrix sparsity
📝 Explanation:
Percentage of missing interactions.
✅ Correct Answer: a) Categorical association beyond 2x2
📝 Explanation:
Chi-square based, asymmetric.
✅ Correct Answer: a) Long to wide
📝 Explanation:
Pivots values into columns.
✅ Correct Answer: a) Nearest neighbors as exemplars
📝 Explanation:
Shows similar cases for context.
✅ Correct Answer: a) Optical flow analysis
📝 Explanation:
Motion vectors between frames.
✅ Correct Answer: a) Skewness and kurtosis for normality
📝 Explanation:
Omnibus test against normal.
✅ Correct Answer: a) Multi-category flows like alluvial
📝 Explanation:
Ribbon bands for proportions.
✅ Correct Answer: a) State-action distributions
📝 Explanation:
Exploration coverage in environments.
✅ Correct Answer: a) Asymmetric nominal association
📝 Explanation:
Measures the proportional reduction in prediction error.
✅ Correct Answer: a) Wide to long
📝 Explanation:
Standardizes column structure.
✅ Correct Answer: a) Specific data slices
📝 Explanation:
Tailored insights for segments.
✅ Correct Answer: a) Degree, betweenness
📝 Explanation:
Node importance in networks.
✅ Correct Answer: a) Normality via skew and kurtosis
📝 Explanation:
The test statistic is asymptotically chi-square distributed with 2 degrees of freedom.
✅ Correct Answer: a) Categorical contingency visualization
📝 Explanation:
Tiled bars for associations.
✅ Correct Answer: a) DAGs for confounding
📝 Explanation:
Directed Acyclic Graphs map relationships.
✅ Correct Answer: a) Nominal predictive association
📝 Explanation:
Entropy-based, asymmetric.
✅ Correct Answer: a) Multi-index aggregation
📝 Explanation:
Flexible crosstabs with functions.
✅ Correct Answer: a) Prediction to counterfactual
📝 Explanation:
Minimal changes for outcome flip.
✅ Correct Answer: a) Pandas Profiling
📝 Explanation:
Generates comprehensive HTML reports.
✅ Correct Answer: a) KS variant without specified distribution
📝 Explanation:
For normality, parameters estimated from data.
✅ Correct Answer: a) Bar for small categories
📝 Explanation:
Points on axis for frequencies.
✅ Correct Answer: a) Local dataset summaries without sharing
📝 Explanation:
Preserves privacy in distributed settings.
✅ Correct Answer: a) Ordinal association
📝 Explanation:
Accounts for tied pairs in ranks.
✅ Correct Answer: a) Data format changes
📝 Explanation:
Stack, melt, pivot for tidy data.
✅ Correct Answer: a) Model derivatives w.r.t. input
📝 Explanation:
Sensitivity of prediction to features.
✅ Correct Answer: a) Wavelet transforms
📝 Explanation:
Localizes events in time and scale.
✅ Correct Answer: a) Goodness-of-fit
📝 Explanation:
Integral of squared CDF differences.
✅ Correct Answer: a) Stacked area over time
📝 Explanation:
Centered for flowing appearance.
✅ Correct Answer: a) Group balance and power
📝 Explanation:
Pre-test for valid comparisons.
✅ Correct Answer: a) Inter-rater agreement beyond chance
📝 Explanation:
For categorical ratings.
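A minimal sketch of Cohen's kappa = (p_o − p_e) / (1 − p_e) follows, for two raters labelling the same items. Here p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The function name and the example ratings are illustrative.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater1)
    p_o = sum(r1 == r2 for r1, r2 in zip(rater1, rater2)) / n  # observed agreement
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal label frequencies
    p_e = sum(c1[label] * c2[label] for label in c1.keys() | c2.keys()) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


r1 = ["yes", "yes", "yes", "no", "no", "no"]
r2 = ["yes", "yes", "no", "no", "no", "yes"]
print(cohens_kappa(r1, r2))  # about 0.333: 67% raw agreement, but 50% expected by chance
```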
✅ Correct Answer: a) One variable per column, observation per row
📝 Explanation:
Facilitates manipulation and analysis.
✅ Correct Answer: a) Salient feature visualization in images
📝 Explanation:
Modifies gradients for positive relevance.
✅ Correct Answer: a) OHLC prices
📝 Explanation:
Open, High, Low, Close for volatility.
✅ Correct Answer: a) Instrumental variable strength
📝 Explanation:
Weak instrument detection.
✅ Correct Answer: a) Proportional parts like pie alternative
📝 Explanation:
Grid squares for percentages.
✅ Correct Answer: a) Token length distributions, vocab size
📝 Explanation:
Text-specific summaries.
✅ Correct Answer: a) 2x2 table association
📝 Explanation:
Dichotomous measure from odds ratio.
✅ Correct Answer: b) Columns to rows
📝 Explanation:
Longer format from wide.
✅ Correct Answer: a) Deep net heatmaps
📝 Explanation:
Backpropagates relevance scores.