1. What do the three 'V's in Big Data's 3Vs stand for?
2. Which architecture pattern processes both batch data and real-time data streams?
3. In Hadoop, what is the primary role of the NameNode?
4. What is the default block size in HDFS?
5. Which component in Hadoop 2.x manages cluster resources and job scheduling?
6. In MapReduce, what does the Map phase do?
7. What is the purpose of the Reduce phase in MapReduce?
8. Which file format is optimized for Hive and supports schema evolution?
9. What is a Data Lake in Big Data architecture?
10. In Spark, what is the role of the Driver Program?
11. Which Spark component manages data sharing and caching?
12. What is the main advantage of using Apache Kafka in Big Data pipelines?
13. Which type of NoSQL database is best suited for hierarchical data in Big Data systems?
14. What does the CAP theorem state about distributed Big Data systems?
15. Which tool is used for ETL processes in the Hadoop ecosystem?
16. What is the primary fault-tolerance mechanism in HDFS?
17. In Lambda Architecture, what is the 'batch layer' responsible for?
18. Which protocol does HDFS use for data transfer?
19. What is Spark's Directed Acyclic Graph (DAG) used for?
20. In Big Data, what is a 'sharded' architecture?
21. Which Apache project provides SQL-like querying on Hadoop?
22. What is the role of ZooKeeper in Big Data architectures?
23. In Kappa Architecture, what replaces the batch layer?
24. What is HBase in the Hadoop ecosystem?
25. Which feature of Spark enables in-memory computation?
26. What is the purpose of Apache Flume?
27. In Big Data, what is 'schema-on-read'?
28. Which YARN component negotiates resources from the ResourceManager?
29. What is Apache Tez?
30. In Spark Streaming, what is a DStream?
31. What is the default replication factor in HDFS?
32. Which tool is used for transferring bulk data between Hadoop and relational databases?
33. What is a 'hot spot' in Big Data partitioning?
34. In Cassandra, what ensures data consistency?
35. What is the serving layer in Lambda Architecture?
36. Which Spark library is used for structured data processing?
37. What is Apache Mahout used for in Big Data?
38. In HDFS, what is a 'rack'?
39. What is the primary storage in a Data Warehouse?
40. Which protocol is used for secure communication in Hadoop?
41. What is Apache Storm used for?
42. In MapReduce, what handles spilling intermediate data to disk?
43. What is a 'partition' in Spark?
44. Which Big Data tool is used for workflow scheduling?
45. What is eventual consistency in Big Data systems?
46. In the Parquet format, what is columnar storage beneficial for?
47. What is the ResourceManager's role in YARN?
48. Which architecture uses micro-batches for streaming?
49. What is Apache Flink's key feature?
50. In HBase, what is a 'RegionServer'?
51. What is a data partitioning strategy for load balancing in Big Data systems?
52. Which tool visualizes Big Data workflows?
53. What is the 'speed layer' in Lambda Architecture?
54. In Spark, what is lazy evaluation?
55. What is Apache NiFi used for?
56. Which consistency model does DynamoDB use?
57. What is a 'checkpoint' in stream processing?
58. In Hadoop, what is 'federation'?
59. What is ORC file format optimized for?
60. Which component in YARN monitors node health?
61. What is 'backpressure' in stream processing architectures?
62. In GraphX, what is a Property Graph?
63. What is Apache Phoenix?
64. In Big Data, what is 'data lineage'?
65. Which type of scalability involves adding more nodes?
66. What is Apache Drill used for?
67. In Kafka, what is a 'topic'?
68. What is 'idempotency' in Big Data processing?
69. Which tool manages Hadoop cluster deployment?


