80 Big Data: MapReduce, HDFS, and YARN - MCQs

Q: 1. What does HDFS stand for in the context of Big Data?

See the full post for the detailed answer.

Q: 2. In HDFS, what is the default block size for files?

See the full post for the detailed answer.

Q: 3. Which component in HDFS is responsible for managing the namespace and regulating access to files?

See the full post for the detailed answer.

Q: 4. In HDFS, DataNodes store and retrieve data in response to instructions from which component?

See the full post for the detailed answer.

Q: 5. What is the role of the Secondary NameNode in HDFS?

See the full post for the detailed answer.

Q: 6. HDFS is designed to run on which type of hardware?

See the full post for the detailed answer.

Q: 7. By default, how many replicas of each block are maintained in HDFS?

See the full post for the detailed answer.

Q: 8. What is the purpose of the Rack Awareness feature in HDFS?

See the full post for the detailed answer.

Q: 9. Which HDFS command is used to create a directory?

See the full post for the detailed answer.

Q: 10. In HDFS, what happens if the NameNode fails without a proper checkpoint?

See the full post for the detailed answer.

1. What does HDFS stand for in the context of Big Data?

a) Hadoop Distributed File System

b) High Density File Storage

c) Hierarchical Data File Structure

d) Hadoop Data File System

✅ Correct Answer: a) Hadoop Distributed File System

📝 Explanation:

HDFS is the primary storage system used by Hadoop for storing large datasets across multiple machines in a distributed manner.

2. In HDFS, what is the default block size for files?

a) 64 MB

b) 128 MB

c) 256 MB

d) 512 MB

✅ Correct Answer: b) 128 MB

📝 Explanation:

The default block size in HDFS is 128 MB, which allows for efficient storage and processing of large files by dividing them into manageable chunks.

3. Which component in HDFS is responsible for managing the namespace and regulating access to files?

a) DataNode

b) NameNode

c) Secondary NameNode

d) Backup Node

✅ Correct Answer: b) NameNode

📝 Explanation:

The NameNode maintains the file system namespace and metadata, controlling access to files and directories in HDFS.

4. In HDFS, DataNodes store and retrieve data in response to instructions from which component?

a) NameNode

b) YARN

c) JobTracker

d) TaskTracker

✅ Correct Answer: a) NameNode

📝 Explanation:

DataNodes perform read/write operations on blocks as directed by the NameNode and report their status periodically.

5. What is the role of the Secondary NameNode in HDFS?

a) It acts as a backup for the NameNode

b) It periodically checkpoints the fsimage and edits log files

c) It stores actual data blocks

d) It manages resource allocation

✅ Correct Answer: b) It periodically checkpoints the fsimage and edits log files

📝 Explanation:

The Secondary NameNode merges the fsimage and edit logs to create a new checkpoint, reducing recovery time for the NameNode.

6. HDFS is designed to run on which type of hardware?

a) Low-cost commodity hardware

b) High-end enterprise servers

c) Cloud-only virtual machines

d) Specialized GPU clusters

✅ Correct Answer: a) Low-cost commodity hardware

📝 Explanation:

HDFS is built to tolerate frequent hardware failures and operate on inexpensive, commodity hardware for scalability.

7. By default, how many replicas of each block are maintained in HDFS?

a) 1

b) 2

c) 3

d) 4

✅ Correct Answer: c) 3

📝 Explanation:

HDFS maintains three replicas of each block by default to ensure high availability and fault tolerance.

8. What is the purpose of the Rack Awareness feature in HDFS?

a) To optimize data locality and fault tolerance

b) To encrypt data blocks

c) To compress files automatically

d) To manage user permissions

✅ Correct Answer: a) To optimize data locality and fault tolerance

📝 Explanation:

Rack Awareness places replicas across different racks to minimize network traffic and improve resilience against rack failures.

9. Which HDFS command is used to create a directory?

a) hdfs dfs -mkdir

b) hdfs dfs -ls

c) hdfs dfs -rm

d) hdfs dfs -put

✅ Correct Answer: a) hdfs dfs -mkdir

📝 Explanation:

The 'hdfs dfs -mkdir' command creates directories in the HDFS namespace, similar to the Unix mkdir command.

10. In HDFS, what happens if the NameNode fails without a proper checkpoint?

a) The cluster shuts down permanently

b) DataNodes take over namespace management

c) Manual recovery from edit logs is required, which can be time-consuming

d) YARN automatically restarts it

✅ Correct Answer: c) Manual recovery from edit logs is required, which can be time-consuming

📝 Explanation:

Without checkpoints, the NameNode recovery involves replaying the entire edit log, which can delay cluster availability.

11. HDFS supports which type of streaming access for data?

a) High-throughput, low-latency access

b) Random write access

c) Small file access optimization

d) Real-time querying

✅ Correct Answer: a) High-throughput, low-latency access

📝 Explanation:

HDFS is optimized for batch processing with high-throughput streaming access to large files, not low-latency random access.

12. What is the maximum file size supported by HDFS?

a) 1 TB

b) Unlimited (limited by available storage)

c) 100 GB

d) 10 TB

✅ Correct Answer: b) Unlimited (limited by available storage)

📝 Explanation:

HDFS can theoretically handle petabyte-scale files, constrained only by the total cluster storage capacity.

13. Which permission model does HDFS use for access control?

a) POSIX-like permissions

b) ACL-based only

c) Role-based access control

d) No permissions

✅ Correct Answer: a) POSIX-like permissions

📝 Explanation:

HDFS implements a permission model similar to POSIX, with owner, group, and others categories for read/write/execute.

14. In HDFS, blocks are stored as files on the local file system of which node?

a) NameNode

b) Secondary NameNode

c) DataNode

d) Client Node

✅ Correct Answer: c) DataNode

📝 Explanation:

DataNodes store HDFS blocks as regular files on their local file systems and handle I/O operations for them.

15. What is the fsimage file in HDFS?

a) A persistent checkpoint of the file system metadata

b) A log of data block locations

c) A compressed data file

d) A temporary edit log

✅ Correct Answer: a) A persistent checkpoint of the file system metadata

📝 Explanation:

The fsimage is a serialized representation of the NameNode's in-memory namespace and block data.

16. HDFS Federation allows multiple NameNodes to manage which aspect?

a) Separate namespaces

b) Data block replication

c) Resource allocation

d) Client connections only

✅ Correct Answer: a) Separate namespaces

📝 Explanation:

HDFS Federation enables horizontal scaling by allowing multiple independent NameNodes, each managing its own namespace.

17. What command lists the contents of a directory in HDFS?

a) hdfs dfs -ls

b) hdfs dfs -cat

c) hdfs dfs -get

d) hdfs dfs -touchz

✅ Correct Answer: a) hdfs dfs -ls

📝 Explanation:

'hdfs dfs -ls' displays the list of files and directories in the specified HDFS path.

18. In HDFS, what is the edit log used for?

a) Recording namespace modifications between checkpoints

b) Storing actual data blocks

c) Managing user authentication

d) Compressing files

✅ Correct Answer: a) Recording namespace modifications between checkpoints

📝 Explanation:

The edit log captures every change to the file system metadata since the last fsimage snapshot.

19. HDFS is best suited for which workload?

a) Write-once, read-many-times

b) Frequent random writes

c) Small file transactions

d) Real-time streaming

✅ Correct Answer: a) Write-once, read-many-times

📝 Explanation:

HDFS appends data to files and supports multiple reads, making it ideal for batch processing workloads.

20. What is the purpose of Balancer in HDFS?

a) To distribute data evenly across DataNodes

b) To encrypt data

c) To backup metadata

d) To monitor CPU usage

✅ Correct Answer: a) To distribute data evenly across DataNodes

📝 Explanation:

The HDFS Balancer tool rebalances data blocks to ensure even distribution and prevent hotspots.

21. Which HDFS feature provides fault tolerance through data replication?

a) Block replication

b) Erasure coding

c) Compression

d) Federation

✅ Correct Answer: a) Block replication

📝 Explanation:

Replication ensures multiple copies of data blocks, allowing the system to recover from node failures seamlessly.

22. What is the default port for the NameNode web UI in HDFS?

a) 50070

b) 50075

c) 8020

d) 9000

✅ Correct Answer: a) 50070

📝 Explanation:

The NameNode's web interface runs on port 50070 for monitoring cluster status.

23. In HDFS, can files smaller than the block size be stored?

a) Yes, they occupy the full block space

b) No, they are rejected

c) Yes, without wasting space

d) Only if compressed

✅ Correct Answer: a) Yes, they occupy the full block space

📝 Explanation:

Small files in HDFS still reserve a full block, which can lead to inefficiency in name space usage.

24. What is Erasure Coding in HDFS used for?

a) To reduce storage overhead compared to replication

b) To increase block size

c) To manage metadata

d) To handle small files

✅ Correct Answer: a) To reduce storage overhead compared to replication

📝 Explanation:

Erasure Coding encodes data into parity blocks, providing fault tolerance with less storage than triple replication.

25. Which command copies a file from local to HDFS?

a) hdfs dfs -put

b) hdfs dfs -get

c) hdfs dfs -cp

d) hdfs dfs -mv

✅ Correct Answer: a) hdfs dfs -put

📝 Explanation:

'hdfs dfs -put' uploads files from the local file system to HDFS.

26. HDFS supports symmetric or asymmetric replication policies?

a) Symmetric

b) Asymmetric

c) Both

d) Neither

✅ Correct Answer: c) Both

📝 Explanation:

HDFS allows configuration of both symmetric (same replicas everywhere) and asymmetric (different counts) replication.

27. What is the heartbeat interval for DataNodes to NameNode?

a) 3 seconds

b) 10 seconds

c) 30 seconds

d) 60 seconds

✅ Correct Answer: a) 3 seconds

📝 Explanation:

DataNodes send heartbeats every 3 seconds to indicate liveness and report block reports every 6 seconds.

28. In HDFS, what is a 'block scan'?

a) Periodic verification of block integrity

b) Scanning for free space

c) User file search

d) Metadata dump

✅ Correct Answer: a) Periodic verification of block integrity

📝 Explanation:

Block scanning checks the checksums of stored blocks to detect corruption.

29. Which HDFS configuration parameter sets the replication factor?

a) dfs.replication

b) dfs.blocksize

c) dfs.namenode.port

d) dfs.datanode.handler.count

✅ Correct Answer: a) dfs.replication

📝 Explanation:

The 'dfs.replication' property in hdfs-site.xml defines the default number of replicas for data blocks.

30. HDFS High Availability (HA) uses which components for failover?

a) Active and Standby NameNodes with JournalNodes

b) Multiple DataNodes

c) YARN ResourceManager

d) Zookeeper only

✅ Correct Answer: a) Active and Standby NameNodes with JournalNodes

📝 Explanation:

HA setup includes shared storage via JournalNodes for seamless failover between NameNodes.

31. What is the purpose of the 'hdfs fsck' command?

a) To check file system health and find corrupt blocks

b) To format the NameNode

c) To start DataNodes

d) To compress directories

✅ Correct Answer: a) To check file system health and find corrupt blocks

📝 Explanation:

'hdfs fsck' performs a file system check, reporting under-replicated, missing, or corrupt blocks.

32. In HDFS, data transfer between client and DataNode uses which protocol?

a) TCP/IP

b) HTTP/HTTPS

c) UDP

d) RPC only

✅ Correct Answer: a) TCP/IP

📝 Explanation:

HDFS uses TCP sockets for reliable data streaming between clients and DataNodes.

33. What does MapReduce stand for in Big Data processing?

a) A programming model for parallel processing

b) A file compression technique

c) A storage optimization method

d) A resource scheduling framework

✅ Correct Answer: a) A programming model for parallel processing

📝 Explanation:

MapReduce is a framework that allows distributed processing of large data sets on clusters using Map and Reduce functions.

34. In MapReduce, what is the role of the Map function?

a) To process input data and produce key-value pairs

b) To aggregate intermediate results

c) To shuffle data across nodes

d) To store output in HDFS

✅ Correct Answer: a) To process input data and produce key-value pairs

📝 Explanation:

The Map function takes input data, processes it in parallel, and emits intermediate key-value pairs.

35. What is the output of the Reduce function in MapReduce?

a) Final aggregated results

b) Intermediate key-value pairs

c) Input splits

d) Job configuration files

✅ Correct Answer: a) Final aggregated results

📝 Explanation:

The Reduce function receives grouped key-value pairs and produces the final output for the job.

36. In MapReduce, what is an InputSplit?

a) A logical division of input data for parallel processing

b) A physical block in HDFS

c) A Reduce task output

d) A mapper configuration

✅ Correct Answer: a) A logical division of input data for parallel processing

📝 Explanation:

InputSplits define how input data is divided among Map tasks for distributed execution.

37. Which component in MapReduce v1 is responsible for job scheduling and task management?

a) JobTracker

b) TaskTracker

c) NameNode

d) DataNode

✅ Correct Answer: a) JobTracker

📝 Explanation:

The JobTracker oversees the entire MapReduce job lifecycle, assigning tasks to TaskTrackers.

38. What is the purpose of the Combiner in MapReduce?

a) To perform local aggregation before shuffle and sort

b) To split input data

c) To manage job resources

d) To write final output

✅ Correct Answer: a) To perform local aggregation before shuffle and sort

📝 Explanation:

Combiners reduce the amount of data transferred during the shuffle phase by aggregating locally on mapper nodes.

39. In MapReduce, the shuffle and sort phase occurs between which stages?

a) Map and Reduce

b) Input and Map

c) Reduce and Output

d) Job submission and execution

✅ Correct Answer: a) Map and Reduce

📝 Explanation:

Shuffle and sort groups and sorts the intermediate outputs from Mappers before sending them to Reducers.

40. What is the default InputFormat in MapReduce?

a) TextInputFormat

b) SequenceFileInputFormat

c) KeyValueTextInputFormat

d) DBInputFormat

✅ Correct Answer: a) TextInputFormat

📝 Explanation:

TextInputFormat treats each line of input as a key-value pair, with offset as key and line as value.

41. MapReduce jobs are fault-tolerant due to which mechanism?

a) Task retry and speculative execution

b) Data replication only

c) Manual checkpointing

d) Single point of failure

✅ Correct Answer: a) Task retry and speculative execution

📝 Explanation:

Failed tasks are retried, and speculative execution runs duplicates of slow tasks to ensure timely completion.

42. What is a Partitioner's role in MapReduce?

a) To decide which Reducer receives which key

b) To split input files

c) To combine outputs

d) To format data

✅ Correct Answer: a) To decide which Reducer receives which key

📝 Explanation:

The Partitioner determines the mapping of intermediate keys to Reducers based on a hash function.

43. In MapReduce, what does speculative execution address?

a) Straggler tasks that slow down the job

b) Data locality issues

c) Memory overflows

d) Network bottlenecks

✅ Correct Answer: a) Straggler tasks that slow down the job

📝 Explanation:

Speculative execution launches duplicate tasks for slow-running ones, using the first to complete.

44. Which class is used to define a custom Mapper in MapReduce?

a) Mapper

b) Reducer

c) Combiner

d) Partitioner

✅ Correct Answer: a) Mapper

📝 Explanation:

Developers extend the Mapper class and override the map() method to implement custom logic.

45. What is the purpose of OutputFormat in MapReduce?

a) To control the output of Reduce tasks

b) To read input data

c) To shuffle data

d) To schedule jobs

✅ Correct Answer: a) To control the output of Reduce tasks

📝 Explanation:

OutputFormat defines how and where the final output from Reducers is written, e.g., to HDFS.

46. MapReduce processes data in which manner?

a) In parallel across a cluster

b) Sequentially on a single machine

c) In real-time streams

d) Using SQL queries

✅ Correct Answer: a) In parallel across a cluster

📝 Explanation:

MapReduce enables parallel processing by distributing tasks across multiple nodes in a cluster.

47. What is the Counter in MapReduce used for?

a) To track job progress and custom metrics

b) To count words only

c) To partition data

d) To compress output

✅ Correct Answer: a) To track job progress and custom metrics

📝 Explanation:

Counters collect statistics during job execution, such as bytes processed or custom application metrics.

48. In MapReduce v1, TaskTrackers run on which nodes?

a) Slave nodes

b) Master node only

c) Client machines

d) NameNode

✅ Correct Answer: a) Slave nodes

📝 Explanation:

TaskTrackers execute Map and Reduce tasks on worker (slave) nodes under JobTracker supervision.

49. What is the join operation in MapReduce typically implemented using?

a) Custom Mappers and Reducers

b) Built-in SQL functions

c) HDFS commands

d) YARN applications

✅ Correct Answer: a) Custom Mappers and Reducers

📝 Explanation:

Joins are achieved by emitting join keys in Map and aggregating matching records in Reduce.

50. MapReduce supports user-defined functions in which languages?

a) Java, Python, C++ via Hadoop Streaming

b) Java only

c) SQL only

d) R only

✅ Correct Answer: a) Java, Python, C++ via Hadoop Streaming

📝 Explanation:

Hadoop Streaming allows MapReduce jobs in non-Java languages using standard input/output.

51. What is the purpose of the DistributedCache in MapReduce?

a) To cache small files on all nodes for efficient access

b) To cache large datasets

c) To manage memory

d) To handle failures

✅ Correct Answer: a) To cache small files on all nodes for efficient access

📝 Explanation:

DistributedCache distributes read-only files like lookup tables to all nodes before job start.

52. In MapReduce, data locality refers to?

a) Processing data on the node where it is stored

b) Transferring data to a central server

c) Encrypting data in transit

d) Compressing data blocks

✅ Correct Answer: a) Processing data on the node where it is stored

📝 Explanation:

Data locality minimizes network I/O by scheduling tasks on nodes holding the data.

53. What is the default number of Reduce tasks in a MapReduce job?

a) 1

b) 0

c) Number of mappers

d) Cluster size

✅ Correct Answer: a) 1

📝 Explanation:

By default, MapReduce sets one Reducer unless specified otherwise via job configuration.

54. Which MapReduce phase can run without the Reduce phase?

a) Map-only job

b) Shuffle phase

c) Output commit

d) Job submission

✅ Correct Answer: a) Map-only job

📝 Explanation:

Jobs can be configured with zero reducers to perform only mapping and write intermediate output.

55. What is YARN in the Hadoop ecosystem?

a) Yet Another Resource Negotiator

b) A file storage system

c) A data processing engine

d) A compression library

✅ Correct Answer: a) Yet Another Resource Negotiator

📝 Explanation:

YARN is Hadoop's resource management framework that decouples resource allocation from job execution.

56. In YARN, what is the role of the ResourceManager?

a) Global resource allocation and job scheduling

b) Local task execution

c) Data storage

d) Metadata management

✅ Correct Answer: a) Global resource allocation and job scheduling

📝 Explanation:

The ResourceManager arbitrates resources across the cluster and schedules applications.

57. What are NodeManagers in YARN?

a) Per-node agents that manage containers

b) Global schedulers

c) Input split handlers

d) Block replicators

✅ Correct Answer: a) Per-node agents that manage containers

📝 Explanation:

NodeManagers monitor resources on their host and launch containers as directed by the ResourceManager.

58. In YARN, what is an ApplicationMaster?

a) Per-application manager for negotiating resources and coordinating tasks

b) Cluster-wide resource allocator

c) Node health monitor

d) Job queue manager

✅ Correct Answer: a) Per-application manager for negotiating resources and coordinating tasks

📝 Explanation:

Each application gets its own ApplicationMaster to handle resource requests and task execution.

59. What is a Container in YARN?

a) An abstract unit of allocation including CPU, memory, etc.

b) A physical storage block

c) A MapReduce task

d) A network packet

✅ Correct Answer: a) An abstract unit of allocation including CPU, memory, etc.

📝 Explanation:

Containers represent allocated resources (CPU, memory, disk) for running application components.

60. YARN improves upon MapReduce v1 by?

a) Separating resource management from job-specific logic

b) Increasing block size

c) Adding encryption

d) Reducing replication

✅ Correct Answer: a) Separating resource management from job-specific logic

📝 Explanation:

YARN generalizes the architecture to support multiple processing engines beyond MapReduce.

61. Which scheduler in YARN provides capacity guarantees?

a) Capacity Scheduler

b) FIFO Scheduler

c) Fair Scheduler

d) All of the above

✅ Correct Answer: a) Capacity Scheduler

📝 Explanation:

The Capacity Scheduler allows queues with guaranteed shares and supports hierarchical queues.

62. In YARN, what is the ResourceRequest used for?

a) ApplicationMaster requesting specific resources from ResourceManager

b) Storing data blocks

c) Running Map tasks

d) Compressing logs

✅ Correct Answer: a) ApplicationMaster requesting specific resources from ResourceManager

📝 Explanation:

ResourceRequest specifies locality preferences, priority, and resource requirements for allocation.

63. What is the default scheduler in YARN?

a) Capacity Scheduler

b) Fair Scheduler

c) FIFO Scheduler

d) Round-Robin Scheduler

✅ Correct Answer: a) Capacity Scheduler

📝 Explanation:

YARN defaults to the Capacity Scheduler for multi-tenancy support in production environments.

64. YARN supports which programming models besides MapReduce?

a) Spark, Tez, and others

b) SQL only

c) Graph processing only

d) Stream processing only

✅ Correct Answer: a) Spark, Tez, and others

📝 Explanation:

YARN's generic interface allows diverse frameworks like Spark and Tez to run on Hadoop clusters.

65. In YARN, NodeManager heartbeats to whom?

a) ResourceManager

b) ApplicationMaster

c) NameNode

d) DataNode

✅ Correct Answer: a) ResourceManager

📝 Explanation:

NodeManagers send periodic heartbeats to the ResourceManager to report available resources.

66. What is the purpose of the Timeline Server in YARN?

a) To store and retrieve application history information

b) To schedule jobs

c) To manage containers

d) To replicate data

✅ Correct Answer: a) To store and retrieve application history information

📝 Explanation:

The Timeline Server collects generic application history for monitoring and debugging.

67. YARN Federation allows?

a) Multiple ResourceManagers for scaling

b) Single ResourceManager only

c) Data federation

d) Job federation

✅ Correct Answer: a) Multiple ResourceManagers for scaling

📝 Explanation:

YARN Federation enables sub-clusters with separate ResourceManagers for large-scale deployments.

68. In YARN, what happens if an ApplicationMaster fails?

a) The ResourceManager restarts it

b) The job is killed

c) NodeManager takes over

d) Manual intervention required

✅ Correct Answer: a) The ResourceManager restarts it

📝 Explanation:

YARN supports ApplicationMaster recovery by restarting it with the same state.

69. Which YARN component handles security via tokens?

a) ResourceManager

b) All components

c) NodeManager only

d) ApplicationMaster only

✅ Correct Answer: b) All components

📝 Explanation:

YARN uses container launch tokens and other security tokens for authentication across components.

70. What is Opportunistic Containers in YARN?

a) Low-priority containers that use idle resources

b) Guaranteed high-priority allocations

c) Storage containers

d) Network containers

✅ Correct Answer: a) Low-priority containers that use idle resources

📝 Explanation:

Opportunistic mode allows flexible resource usage for bursty workloads without guarantees.

71. The Fair Scheduler in YARN supports?

a) Hierarchical queues with fair share allocation

b) Capacity-based only

c) FIFO only

d) No queues

✅ Correct Answer: a) Hierarchical queues with fair share allocation

📝 Explanation:

It dynamically allocates resources fairly among queues and jobs within them.

72. In YARN, the 'yarn' command is used for?

a) Submitting applications and managing cluster

b) Reading HDFS files

c) Compiling code

d) Formatting disks

✅ Correct Answer: a) Submitting applications and managing cluster

📝 Explanation:

The 'yarn' CLI submits jobs, lists applications, and kills running ones.

73. What is the default memory allocation for a container in YARN?

a) Configurable, default 1 GB

b) 256 MB

c) 4 GB

d) Unlimited

✅ Correct Answer: a) Configurable, default 1 GB

📝 Explanation:

yarn.scheduler.minimum-allocation-mb defaults to 1024 MB for container memory.

74. YARN's ResourceModel includes which resources?

a) CPU vcores and memory MB

b) Disk space only

c) Network bandwidth

d) GPU only

✅ Correct Answer: a) CPU vcores and memory MB

📝 Explanation:

YARN allocates resources in terms of virtual CPU cores and memory in megabytes.

75. What is the role of the Scheduler in YARN?

a) To allocate resources to applications based on policies

b) To execute tasks

c) To store history

d) To monitor nodes

✅ Correct Answer: a) To allocate resources to applications based on policies

📝 Explanation:

The Scheduler component of ResourceManager decides resource grants to ApplicationMasters.

76. In YARN, Application Timeline Service v2 provides?

a) Flow-level aggregation of application events

b) Real-time execution

c) Data storage

d) Block management

✅ Correct Answer: a) Flow-level aggregation of application events

📝 Explanation:

ATS v2 enables entity-level history storage for better querying and visualization.

77. YARN supports rolling upgrades for?

a) Zero-downtime version updates

b) Data migration

c) Job restarts

d) Queue reconfiguration

✅ Correct Answer: a) Zero-downtime version updates

📝 Explanation:

Rolling upgrades allow updating nodes incrementally without stopping the cluster.

78. What is the default port for YARN ResourceManager web UI?

a) 8088

b) 50070

c) 8020

d) 9870

✅ Correct Answer: a) 8088

📝 Explanation:

The ResourceManager's web interface is accessible on port 8088 for cluster monitoring.

79. In YARN, locality preferences for ResourceRequest include?

a) Node, rack, and any

b) Process only

c) Queue only

d) User only

✅ Correct Answer: a) Node, rack, and any

📝 Explanation:

Applications can request resources with preferences for specific nodes, racks, or anywhere.

80. What is the purpose of Labels in YARN?

a) To tag nodes for access control and scheduling

b) To label data blocks

c) To compress logs

d) To format output

✅ Correct Answer: a) To tag nodes for access control and scheduling

📝 Explanation:

Node labels allow partitioning the cluster logically for different workloads or users.