Hadoop interview Questions

1. What is Hadoop?
Hadoop is a distributed computing platform written in Java. It incorporates features similar to those of the Google File System (GFS) and of MapReduce.
2. What platforms and Java versions does Hadoop run on?
Java 1.6.x or higher, preferably from Sun; see the HadoopJavaVersions page on the Apache website for more details. Linux and Windows are the supported operating systems, but BSD, Mac OS/X, and OpenSolaris are known to work. (Windows requires the installation of Cygwin.)
3. How well does Hadoop scale?
Hadoop has been demonstrated on clusters of up to 4000 nodes. Sort performance on 900 nodes is good (sorting 9TB of data on 900 nodes takes around 1.8 hours) and improving, using these non-default configuration values:

dfs.block.size = 134217728
dfs.namenode.handler.count = 40
mapred.reduce.parallel.copies = 20
mapred.child.java.opts = -Xmx512m
fs.inmemory.size.mb = 200
io.sort.factor = 100
io.sort.mb = 200
io.file.buffer.size = 131072

Sort performance on 1400 nodes and 2000 nodes is pretty good too: sorting 14TB of data on a 1400-node cluster takes 2.2 hours; sorting 20TB on a 2000-node cluster takes 2.5 hours. The updates to the above configuration being:

mapred.job.tracker.handler.count = 60
mapred.reduce.parallel.copies = 50
tasktracker.http.threads = 50
mapred.child.java.opts = -Xmx1024m
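These are the classic Hadoop 1.x (pre-YARN) property names, and they are set in the cluster's XML configuration files rather than on the command line. As a sketch, assuming the standard Hadoop 1.x layout, the HDFS-side values above would go in conf/hdfs-site.xml and the MapReduce-side values in conf/mapred-site.xml:

```xml
<!-- conf/hdfs-site.xml: HDFS tuning values from the 900-node sort above -->
<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value> <!-- 128 MB blocks -->
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
  </property>
</configuration>
```

```xml
<!-- conf/mapred-site.xml: MapReduce tuning values from the same run -->
<configuration>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
</configuration>
```

The remaining values (io.sort.factor, io.file.buffer.size, and so on) follow the same property/name/value pattern; which file each one lives in depends on whether it is a core, HDFS, or MapReduce setting. Daemons must be restarted for most of these changes to take effect.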
4. What kind of hardware scales best for Hadoop?
The short answer is dual-processor/dual-core machines with 4-8GB of ECC RAM, depending upon workflow needs. Machines should be moderately high-end commodity machines to be most cost-effective; they typically cost 1/2 to 2/3 the price of normal production application servers, but are not desktop-class machines. This cost tends to be $2-5K per machine.
5. I have a new node I want to add to a running Hadoop cluster; how do I start services on just one node?
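The answer to this question is missing from the source. As a sketch of the classic Hadoop 1.x approach, assuming Hadoop is installed on the new node under the same path as the rest of the cluster and its configuration files already point at the NameNode and JobTracker, you start the worker daemons directly on that node:

```shell
# Run these on the new node itself, from the Hadoop install directory.
# hadoop-daemon.sh starts a single daemon on the local machine only.
bin/hadoop-daemon.sh start datanode      # joins the HDFS cluster
bin/hadoop-daemon.sh start tasktracker   # joins the MapReduce cluster
```

The daemons contact the masters named in the configuration and register themselves, so no cluster-wide restart is needed. It is also worth adding the node's hostname to conf/slaves on the master so that the cluster-wide start-all.sh/stop-all.sh scripts include it in the future.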



