Introduction: This is a collection of questions and answers, some gathered from the internet and some that I wrote from my study materials.
Q 1: What is Hadoop?
Ans: Hadoop is one of the most popular platforms for big data analysis. It is a free, open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. The Hadoop ecosystem is large and includes many supporting frameworks and tools for running and managing it effectively. Hadoop is an Apache project sponsored by the Apache Software Foundation.
Q 2: What is HDFS?
Ans: HDFS (Hadoop Distributed File System) is Hadoop's distributed storage layer. Its design was based on a paper Google published about the Google File System.
It runs on top of the existing file systems on each node in a Hadoop cluster.
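To make "runs on top of the existing file systems" concrete, here is a minimal sketch, not HDFS code. It assumes a toy model in which each node is just a local directory and each block is an ordinary file inside it. The names `store_as_blocks`, `read_back`, and the `blk_<n>` file naming are hypothetical, and the 4-byte block size is deliberately tiny (real HDFS blocks are typically 64 MB or 128 MB).

```python
import os
from typing import List, Tuple

BLOCK_SIZE = 4  # bytes; tiny on purpose for illustration only


def store_as_blocks(data: bytes, node_dirs: List[str],
                    block_size: int = BLOCK_SIZE) -> List[Tuple[int, str]]:
    """Split data into fixed-size blocks and write each block as an
    ordinary file in one of the node directories (round-robin).

    Returns a list of (block_index, path) pairs, which plays the role
    of the block location information a master node would track.
    """
    placements = []
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        node_dir = node_dirs[index % len(node_dirs)]
        path = os.path.join(node_dir, "blk_%d" % index)
        # The block is a plain file on the node's local file system.
        with open(path, "wb") as f:
            f.write(data[offset:offset + block_size])
        placements.append((index, path))
    return placements


def read_back(placements: List[Tuple[int, str]]) -> bytes:
    """Reassemble the original data by reading blocks in index order."""
    parts = []
    for _, path in sorted(placements):
        with open(path, "rb") as f:
            parts.append(f.read())
    return b"".join(parts)
```

Calling `store_as_blocks(b"hello hdfs", ["/tmp/node1", "/tmp/node2"])` (with both directories already created) would leave three small files spread across the two directories, and `read_back` on the returned placements would return the original bytes.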
Q 3: What is MapReduce?
Q 4: What is a JobTracker in Hadoop? How many instances of JobTracker run on a Hadoop Cluster?
Q 5: What is a TaskTracker in Hadoop? How many instances of TaskTracker run on a Hadoop cluster?
Q 6: What is Hive?
Q 7: What is Pig?
Ans: Pig is a platform for analyzing large data sets; its language is called Pig Latin.
It helps analysts concentrate on analytic work by hiding the complexity of MapReduce programming.
Pig Latin is a high-level language, and Pig converts its operators into MapReduce code.
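As a rough illustration of what "converting operators into MapReduce code" means, the sketch below expresses a group-and-count operation as a map phase, a shuffle, and a reduce phase in plain Python. This is not the code Pig generates. The function names are hypothetical, and the Pig Latin in the comment is only an illustrative script of the kind that would lead to such a job.

```python
from collections import defaultdict

# Illustrative Pig Latin that this sketch mirrors:
#   A = LOAD 'words' AS (word:chararray);
#   B = GROUP A BY word;
#   C = FOREACH B GENERATE group, COUNT(A);


def map_phase(records):
    """Map: emit a (key, 1) pair for every input record."""
    for word in records:
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values for each key to get its count."""
    return {key: sum(values) for key, values in groups.items()}


def group_and_count(records):
    """Run the three phases in sequence on an in-memory list."""
    return reduce_phase(shuffle(map_phase(records)))
```

For example, `group_and_count(["pig", "hive", "pig"])` returns `{"pig": 2, "hive": 1}`. The point is that the analyst writes three lines of Pig Latin, while the map, shuffle, and reduce steps are produced for them.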
Q 8: What is HBase?
Q 9: What is the replication factor in HDFS?
Q 10: What is the Master-Worker pattern?
Q 11: In HDFS, why does the system reconstruct block location information every time on startup?
Q 12: What is POSIX (Portable Operating System Interface)?
Q 13: What is data locality optimization?
Q 14: What is meant by a streaming data access pattern?