1. IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming.
2. Point out the correct statement.
3. What license is Hadoop distributed under?
4. Sun also has the Hadoop Live CD ________ project, which allows running a fully functional Hadoop cluster using a live CD.
5. Which of the following categories of software does Hadoop provide?
6. What was Hadoop written in?
7. Which of the following platforms does Hadoop run on?
8. Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ________ storage on hosts.
9. Above the file systems comes the ________ engine, which consists of one JobTracker, to which client applications submit MapReduce jobs.
10. The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations.
11. Which of the following is a characteristic of HDFS?
12. Point out the correct statement.
13. Which of the following is a feature of HDFS?
14. Which of the following is a benefit of HDFS?
15. Point out the wrong statement.
16. Which of these is not a feature of HDFS?
17. Which of these is a characteristic of HDFS NameNode?
18. What is the default block size in HDFS?
19. Which command is used to copy files from the local file system to HDFS?
20. What is the purpose of the Secondary NameNode in HDFS?
21. Which of these is not a Hadoop file format?
22. Which of these is not a Hadoop file format?
23. Which of these is not a Hadoop file format?
24. What is Hadoop primarily used for?
25. Which core component of Hadoop is responsible for data storage?
26. What type of architecture does Hadoop use to process large data sets?
27. Hadoop can process data that is:
28. Which feature of Hadoop makes it suitable for processing large volumes of data?
29. What mechanism does Hadoop use to ensure data is not lost in case of a node failure?
30. Which programming model is primarily used by Hadoop to process large data sets?
31. Which command is used to view the contents of a directory in HDFS?
32. Which component in Hadoop's architecture is responsible for processing data?
33. What role does the NameNode play in Hadoop Architecture?
34. In Hadoop, what is the function of a DataNode?
35. Which type of file system does Hadoop use?
36. How does the Hadoop framework handle hardware failures?
37. What mechanism allows Hadoop to scale processing capacity?
38. How do you list all nodes in a Hadoop cluster using the command line?
39. Which command can you use to check the health of the Hadoop file system?
40. What is the purpose of the hadoop balancer command?
41. What should you check first if the NameNode is not starting?
42. When a DataNode is reported as down, what is the first action to take?
43. What is a fundamental characteristic of HDFS?
44. Which of these is a feature of MapReduce?
45. Which of these is a key component of MapReduce?
46. Which of the following is the primary function of the Map phase in MapReduce?
47. Which of these is NOT a phase in MapReduce?
48. Which of the following best describes the purpose of the Reduce phase?
49. Which of these classes is used to write the output of a MapReduce job?
50. Which of these classes is used to read the input for a MapReduce job?
51. Which of these is a generic API for MapReduce in Hadoop?
52. Which of these classes is used to specify the mapper class in a MapReduce job?
53. Which of these classes is used to specify the reducer class in a MapReduce job?
54. Which of these classes is used to specify the input format class in a MapReduce job?
55. Which of these classes is used to specify the output format class in a MapReduce job?
56. Which of these methods is used to set the number of reduce tasks in a MapReduce job?
57. Which of these methods is used to set the number of map tasks in a MapReduce job?
58. What action should you take if you notice that the HDFS capacity is unexpectedly decreasing?
59. Which operation is NOT a typical function of the Reduce phase in MapReduce?
60. How does the MapReduce framework typically divide the processing of data?
61. What is the role of the Combiner function in a MapReduce job?
62. In which scenario would you configure multiple reducers in a MapReduce job?
63. What determines the number of mappers to be run in a MapReduce job?
64. What happens if a mapper fails during the execution of a MapReduce job?
65. Which MapReduce method is called once at the end of the task?
66. How do you specify the number of reduce tasks for a Hadoop job?
67. What is the purpose of the Partitioner class in MapReduce?
68. What does the WritableComparable interface in Hadoop define?
69. What common issue should be checked first when a MapReduce job is running slower than expected?
70. What is an effective way to resolve data skew during the reduce phase of a MapReduce job?
71. What is the primary function of the ResourceManager in YARN?
72. How does YARN improve the scalability of Hadoop?
73. What role does the NodeManager play in a YARN cluster?
74. Which YARN component is responsible for monitoring the health of the cluster nodes?
75. In YARN, what does the ApplicationMaster do?
76. How does YARN handle the failure of an ApplicationMaster?
77. Which command is used to list all running applications in YARN?
78. How can you kill an application in YARN using the command line?
79. What command would you use to check the logs for a specific YARN application?
80. What should be your first step if a YARN application fails to start?
81. If you notice that applications in YARN are frequently being killed due to insufficient memory, what should you adjust?
82. What is Hive primarily used for in the Hadoop ecosystem?
83. Which tool in the Hadoop ecosystem is best suited for real-time data processing?
84. How does Pig differ from SQL in terms of data processing?
85. What is the primary function of Apache Flume?
86. In the Hadoop ecosystem, what is the role of Oozie?
87. How does HBase provide fast access to large datasets?
88. Which command in HBase is used to scan all records from a specific table?
89. How do you create a new table in Hive?
90. What is the primary command to view the status of a job in Oozie?
91. What functionality does the sqoop merge command provide?
92. What should you verify first if a Sqoop import fails?
93. If a Hive query runs significantly slower than expected, what should be checked first?
94. What is Hive mainly used for in the Hadoop ecosystem?
95. How does Hive handle data storage?
96. What type of data models does Hive support?
97. Which Hive component is responsible for converting SQL queries into MapReduce jobs?
98. How does partitioning in Hive improve query performance?
99. What is the correct HiveQL command to list all tables in the database?
100. How do you add a new column to an existing Hive table?
101. In Hive, which command would you use to change the data type of a column in a table?
102. How can you optimize a Hive query to limit the number of MapReduce jobs it generates?
103. What is a common fix if a Hive query returns incorrect results?
104. What should you check if a Hive job is running longer than expected without errors?
105. What is Pig primarily used for in the Hadoop ecosystem?
106. What makes Pig different from traditional SQL in processing data?
107. In Pig, what is the difference between 'STORE' and 'DUMP'?
108. How does Pig handle schema-less data?
109. How can Pig scripts be optimized to handle large datasets more efficiently?
110. What Pig command is used to load data from a file?
111. How do you group data by a specific column in Pig?
112. What Pig function aggregates data to find the total?
113. How do you filter rows in Pig that match a specific condition?
114. What is the first thing you should check if a Pig script fails due to an out-of-memory error?
115. If a Pig script is unexpectedly slow, what should be checked first to improve performance?
116. What is the primary storage model used by HBase?
117. How does HBase handle scalability?
118. Which of the following is true about Hadoop's design?
119. What is the default replication factor in HDFS?
120. In MapReduce, what is the purpose of the shuffle phase?
121. Which Hadoop ecosystem tool is used for data serialization?
122. What is YARN?
123. In Hive, what is a SerDe?
124. What is the main goal of Hadoop's data locality?
125. Which file format in Hadoop is optimized for OLAP workloads?
126. What is the role of ZooKeeper in Hadoop?
127. In Pig, what does the FOREACH operator do?
128. What is a RegionServer in HBase?
129. Which Sqoop option is used for incremental imports?
130. What is the default port for the NameNode web UI?
131. In MapReduce, what is speculative execution?
132. What is the purpose of the /tmp directory in HDFS?
133. Which tool is used for monitoring Hadoop clusters?
134. What is a Bloom filter in HBase?
135. In Hive, what is bucketing?
136. What is the maximum number of characters in a Hadoop block name?
137. Which of these is not a valid Hadoop daemon?
138. What does DFS stand for in HDFS?
139. In YARN, what is a container?
140. What is the purpose of the InputFormat in MapReduce?
141. Which compression codec is splittable in Hadoop?
142. What is the default sort order in Hadoop?
143. In HBase, what is a column family?
144. What is Tez in Hadoop?
145. Which of these is a NoSQL database in the Hadoop ecosystem?
146. What is the command to start the Hadoop DFS daemon?
147. What is rack awareness in Hadoop?
148. Which language is used to write Hive queries?
149. What is the purpose of the Fair Scheduler in Hadoop?
150. In Pig, what is a bag?
151. What is the maximum number of map tasks per job in Hadoop?
152. Which tool is used for machine learning in the Hadoop ecosystem?
153. What is the block report interval in HDFS?
154. In MapReduce, what is a counter?
155. What is the default input format in MapReduce?
156. Which tool is used for log aggregation in Hadoop?
157. What is a split in MapReduce?
158. In HBase, what is the master node called?
159. What is the purpose of the --direct option in Sqoop?
160. Which of these is a graph processing framework in the Hadoop ecosystem?
161. What is the heartbeat interval for DataNodes?
162. In Hive, what is dynamic partitioning?
163. What is the role of the OutputCommitter in MapReduce?
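
Several questions above (44 to 48, 60, and 120) ask about the map, shuffle, and reduce phases. As a study aid, here is a minimal single-process sketch of that programming model as a word count. It runs in plain Python and does not use Hadoop or any Hadoop API. The function names (map_phase, shuffle_phase, reduce_phase, run_job) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (word, 1) pair for every word in an input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share a key, with keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map over every record, then shuffle, then reduce each key group."""
    intermediate = []
    for record in records:
        intermediate.extend(map_phase(record))
    return [reduce_phase(key, values) for key, values in shuffle_phase(intermediate)]


if __name__ == "__main__":
    lines = [
        "Hadoop stores data in HDFS",
        "Hadoop processes data with MapReduce",
    ]
    for word, count in run_job(lines):
        print(word, count)
```

In a real cluster the map and reduce calls would run as separate tasks on different nodes, and the framework would perform the shuffle over the network. The sketch only mirrors the order of the phases and the key/value flow between them.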


