Hadoop self-test questions and reference answers (continuously updated -- 2015.6.14)

Source: Internet
Author: User

Single Choice
1. Which one of the following is unlike the others?
A. Mesos
B. MongoDB
C. Corona
D. Borg
E. YARN

Note: The other options are unified resource management or resource scheduling systems, while MongoDB is a non-relational database.

2. [Java Foundation] Which of the following is not a thread-safe data structure?
A. HashMap
B. Hashtable
C. CopyOnWriteArrayList
D. ConcurrentHashMap

3. Which build tool does Hadoop 2.x use to build its source code?
A. Ant
B. Ivy
C. Maven
D. Makefile

4. Which company was the first developer of Apache Tez?
A. Cloudera
B. MapR
C. Hortonworks
D. Intel

5. The return type of DistributedFileSystem's create() method is
A. FSDataOutputStream
B. DataOutputStream
C. DFSOutputStream
D. FSDataInputStream

6. Which of the following is not a way Hadoop handles small files?
A. SequenceFile
B. CombineFileInputFormat
C. Archive
D. MapFile
E. ByteBuffer

7. The tool for migrating data between a relational database and HDFS is
A. distcp
B. fsck
C. fastcopy
D. Sqoop

8. The role of the SecondaryNameNode is
A. Monitoring the NameNode
B. Managing DataNodes
C. Merging the fsimage and edit logs
D. Supporting NameNode HA

9. [Linux Foundation] The file that stores the mapping between host names and IP addresses is
A./etc/host.conf
B./etc/hostname
C./etc/hosts
D./etc/resolv.conf

10. Which of the following is a role of Oozie?
A. Job Monitoring
B. Log Collection
C. Workflow scheduling
D. Cluster Management

11. Which layer does Hadoop occupy in the three-layer cloud model?
A. PaaS
B. SaaS
C. IaaS
D. Between IaaS and PaaS

12. Which of the following reads a file fastest in Java?
A. RandomAccessFile
B. FileChannel
C. BufferedInputStream
D. FileInputStream

FileChannel: a channel for reading, writing, mapping, and manipulating a file. File channels are safe for use by multiple concurrent threads.
RandomAccessFile: instances of this class support reading and writing a random-access file. Such a file behaves like a large byte array stored in the file system; a cursor into this implied array, called the file pointer, marks where input operations read bytes, and the pointer advances as bytes are read.
BufferedInputStream: adds functionality to another input stream, namely buffered input and support for the mark and reset methods. When a BufferedInputStream is created, an internal buffer array is allocated.
FileInputStream: reads raw bytes from a file in the file system.
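
For comparison, a minimal sketch that reads the same hypothetical file data.bin with each of the four classes:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ReadComparison {
    public static void main(String[] args) throws IOException {
        String path = "data.bin"; // hypothetical input file

        // RandomAccessFile: seek to an arbitrary offset, then read from the file pointer.
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(16);
            int first = raf.read();
        }

        // FileChannel: NIO channel; safe for use by multiple concurrent threads.
        try (FileChannel channel = FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(4096);
            channel.read(buffer);
        }

        // BufferedInputStream: wraps another stream, buffers reads, supports mark/reset.
        try (BufferedInputStream buffered = new BufferedInputStream(new FileInputStream(path))) {
            buffered.mark(1024);
            buffered.read();
            buffered.reset();
        }

        // FileInputStream: plain, unbuffered byte stream from a file.
        try (FileInputStream plain = new FileInputStream(path)) {
            plain.read();
        }
    }
}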

13. The default NameNode web management port is
A. 50070
B. 8020
C. 50030
D. 22

14. The RPC communication protocol between the client and the NameNode is
A. ClientNamenodeProtocol
B. NamenodeProtocol
C. DatanodeProtocol
D. ClientProtocol

15. Which interface does FSDataOutputStream implement?
A. DataOutputStream
B. FilterOutputStream
C. OutputStream
D. Syncable

public class FSDataOutputStream extends DataOutputStream implements Syncable, CanSetDropBehind {}
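
A minimal sketch of the Syncable methods exposed through FSDataOutputStream (Hadoop 2.x API), assuming a reachable HDFS and a hypothetical path /tmp/demo.txt:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncableWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // DistributedFileSystem.create() returns an FSDataOutputStream,
        // which wraps the underlying DFSOutputStream.
        try (FSDataOutputStream out = fs.create(new Path("/tmp/demo.txt"))) {
            out.writeUTF("hello hdfs");
            out.hflush(); // Syncable: push buffered data to the datanodes
            out.hsync();  // Syncable: additionally request a durable sync
        }
    }
}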

16. Which statement about DirectByteBuffer and ByteBuffer is wrong?
A. ByteBuffer allocates memory on the heap
B. DirectByteBuffer byte access is faster than ByteBuffer
C. ByteBuffer wraps a byte array via the wrap() method
D. DirectByteBuffer is garbage-collected by the JVM

ByteBuffer wraps a byte array via the wrap() method; ByteBuffer allocates memory on the heap; DirectByteBuffer byte access is faster than ByteBuffer.
A ByteBuffer is garbage-collected by the JVM, while the native memory backing a DirectByteBuffer lives outside the heap and is not reclaimed by the normal GC cycle.
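
A minimal sketch contrasting the two buffer kinds (plain java.nio, nothing Hadoop-specific assumed):

import java.nio.ByteBuffer;

public class BufferKinds {
    public static void main(String[] args) {
        // Heap buffer: backed by a byte[] on the Java heap, collected by the JVM GC.
        ByteBuffer heap = ByteBuffer.allocate(1024);

        // Wrapping an existing byte array also yields a heap buffer.
        ByteBuffer wrapped = ByteBuffer.wrap(new byte[]{1, 2, 3, 4});

        // Direct buffer: the concrete class is DirectByteBuffer; its backing
        // memory lives outside the heap and is not reclaimed by the normal GC cycle.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);

        System.out.println(heap.isDirect());    // false
        System.out.println(wrapped.isDirect()); // false
        System.out.println(direct.isDirect());  // true
    }
}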

17. Which of the following distributed file systems does not provide FUSE support?
A. Lustre
B. GlusterFS
C. FastDFS
D. MooseFS

MogileFS: a key-value style file system; does not support FUSE, so applications must access it through its API. Mainly used on the web to serve large numbers of small images, where it is much more efficient than MooseFS.
FastDFS: a key-value style file system developed in China as an improvement on MogileFS; it also does not support FUSE and offers better performance than MogileFS.
MooseFS: supports FUSE; relatively lightweight, but the master server is a single point of failure. Written in Perl, with relatively poor performance; widely used in China.
GlusterFS: supports FUSE; a larger-scale system than MooseFS.
Ceph: supports FUSE; its client has been merged into the Linux 2.6.34 kernel, which means Ceph can be chosen as a file system just like ext3/ReiserFS. Fully distributed with no single point of failure, written in C, with good performance. It was built on the then-immature btrfs and was itself still immature.
Lustre: an Oracle enterprise-class product; a very large system that depends deeply on the kernel and ext3.
NFS: the old Network File System.

18. Which of the following class declarations is correct?
A. abstract final class A {}
B. abstract private B() {}
C. protected private C;
D. public abstract class D {}

19. The FileSystem class is a(n)
A. Interface
B. Abstract class
C. Ordinary class
D. Inner class

public abstract class FileSystem extends Configured implements Closeable {}
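
Since FileSystem is abstract, client code obtains a concrete subclass via its factory methods. A minimal sketch, assuming a Hadoop Configuration is available on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // get() returns a concrete subclass such as DistributedFileSystem or
        // LocalFileSystem, depending on the configured fs.defaultFS.
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}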

20. Which option of the javap command shows internal type signatures?
A. -p
B. -l
C. -s
D. -c

21. Which native library is required to use snappy-java?
A. libhadoop.so
B. libsnappyjava.so
C. libsnappy.so
D. libjavasnappy.so

22. Which of the following compression formats is splittable and can therefore be used as MapReduce input?
A. Deflate
B. gzip
C. bzip2
D. Snappy

23. Which property can be set to disable checksumming on the local file system?
A. fs.file.impl
B. fs.hdfs.impl
C. fs.local.impl
D. fs.raw.impl

Set the value of fs.file.impl to org.apache.hadoop.fs.RawLocalFileSystem.
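
A minimal sketch of setting this property programmatically, assuming the Hadoop 1.x/early 2.x property name cited above:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DisableLocalChecksums {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Map the file:// scheme to RawLocalFileSystem instead of the
        // checksummed LocalFileSystem, skipping the .crc side files.
        conf.set("fs.file.impl", "org.apache.hadoop.fs.RawLocalFileSystem");

        FileSystem local = FileSystem.get(URI.create("file:///"), conf);
        System.out.println(local.getClass().getName());
    }
}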

24. By default on Linux, the maximum number of files a process can open is
A.
B. 128
C. 512
D. 1024

25. With the Linux tar archive command, which of the following lists the archive contents?
A. tar -cv
B. tar -r
C. tar -cvf
D. tar -t

-t lists the contents of the archive file, showing which files have been backed up.

26. Which byte order is the same as network byte order?
A. Big Endian
B. Little Endian
C. Biglittle
D. Misc

Different CPUs use different byte orders. Byte order here refers to the order in which an integer's bytes are stored in memory, known as the host byte order. The two most common kinds are:
1. Little endian: the low-order byte is stored at the starting address.
2. Big endian: the high-order byte is stored at the starting address.
Network byte order is the data representation defined by TCP/IP. It is independent of CPU type, operating system, and so on, so that data can be interpreted correctly when transferred between different hosts. Network byte order uses big endian.
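
A minimal Java sketch of the two orders using java.nio.ByteBuffer, whose default order matches network byte order:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    public static void main(String[] args) {
        int value = 0x12345678;

        // ByteBuffer defaults to BIG_ENDIAN, the same as network byte order.
        ByteBuffer big = ByteBuffer.allocate(4);
        big.putInt(value); // stored as 12 34 56 78

        ByteBuffer little = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        little.putInt(value); // stored as 78 56 34 12

        System.out.println(ByteOrder.nativeOrder()); // host byte order of this CPU
    }
}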

27. Which command sent from the NameNode to a DataNode tells the DataNode to delete bad blocks?
A. DNA_TRANSFER
B. DNA_FINALIZE
C. DNA_INVALIDATE
D. DNA_RECOVERBLOCK

28. Which background thread does a DataNode run to detect data corruption?
A. DataXceiver
B. ReplicationManager
C. BlockPoolManager
D. DataBlockScanner

29. Which of the following statements is correct?
A. new InputStreamReader(new FileReader("data"))
B. new InputStreamReader(new BufferedReader("data"))
C. new InputStreamReader("data")
D. new InputStreamReader(System.in)

30. Which of the following Set implementations is sorted by default?
A. HashSet
B. TreeSet
C. AbstractSet
D. LinkedHashSet

31. Which of the following statements is correct?
A. Comparable is in the java.util package
B. Comparator is used to compare elements within a collection
C. WritableComparable inherits from WritableComparator
D. A class that implements the Comparable interface must implement the compareTo method

32. Which of the following statements is correct?
A. NullWritable can be used to represent an empty Writable object
B. The fix() method is a static member method of MapFile
C. The value of a Text object cannot be modified
D. WritableComparator inherits from WritableComparable

This method attempts to fix a corrupt MapFile by re-creating its index.
public static long fix(/* parameters omitted */) {}
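
A minimal sketch of invoking it, assuming the commonly documented signature fix(FileSystem, Path, keyClass, valueClass, dryrun, Configuration) and a hypothetical MapFile directory keyed by Text with IntWritable values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class FixMapFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path dir = new Path("/data/my-mapfile"); // hypothetical MapFile directory

        // Static member method: rebuilds the index from the data file.
        long entries = MapFile.fix(fs, dir, Text.class, IntWritable.class, false, conf);
        System.out.println("Entries re-indexed: " + entries);
    }
}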

33. Which ordering of the seven OSI network layers is correct?
A. Physical layer, data link layer, transport layer, network layer, session layer, presentation layer, application layer
B. Physical layer, data link layer, session layer, network layer, transport layer, presentation layer, application layer
C. Physical layer, data link layer, network layer, transport layer, session layer, presentation layer, application layer
D. Network layer, transport layer, physical layer, data link layer, session layer, presentation layer, application layer

34. Which of the following statements is wrong?
A. The key of a MapFile is of Writable type
B. If record compression is used in a SequenceFile, the key is not compressed
C. The SequenceFile.Writer class supports the append method
D. If block compression is used in a SequenceFile, the key is also compressed

35. Which of the following statements is wrong?
A. Using Writable serialization does not meet scalability requirements
B. BytesWritable has an immutable length
C. VIntWritable is variable-length
D. Protocol Buffers can be used to define custom message types

36. Which design pattern focuses on separating an abstraction (interface) from its concrete implementation?
A. Bridge pattern
B. Facade pattern
C. Proxy pattern
D. Adapter pattern

37. Which of the following statements is correct?
A. LinkedHashMap.keySet() is in descending order by default
B. Hashtable.keySet() is in ascending order by default
C. HashMap.keySet() is in random order by default
D. TreeMap.keySet() is ordered by default

TreeMap.keySet() is in ascending order by default.
LinkedHashMap.keySet() follows insertion order by default.
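
A minimal sketch demonstrating the three orderings:

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class KeySetOrdering {
    public static void main(String[] args) {
        String[] keys = {"banana", "apple", "cherry"};

        Map<String, Integer> hash = new HashMap<>();
        Map<String, Integer> linked = new LinkedHashMap<>();
        Map<String, Integer> tree = new TreeMap<>();
        for (String k : keys) {
            hash.put(k, 1);
            linked.put(k, 1);
            tree.put(k, 1);
        }

        System.out.println(hash.keySet());   // no guaranteed order
        System.out.println(linked.keySet()); // insertion order: [banana, apple, cherry]
        System.out.println(tree.keySet());   // ascending order: [apple, banana, cherry]
    }
}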

38. The DAO pattern generally uses which of the following design patterns?
A. Proxy pattern
B. Factory pattern
C. Prototype pattern
D. Observer pattern

-2015.6.8

39. How is the number of map tasks for a job determined?
A. By the mapred.map.tasks property
B. Calculated by the JobTracker
C. By the number of InputSplits
D. By the number of maps configured on the TaskTracker

40. The communication protocol between the TaskTracker and the JobTracker is
A. JobSubmissionProtocol
B. ClientProtocol
C. TaskUmbilicalProtocol
D. InterTrackerProtocol

In Hadoop, communication between the JobTracker (JT) and the TaskTracker (TT) is done through a heartbeat mechanism. The JT implements the InterTrackerProtocol, which defines the heartbeat-based communication between JT and TT. The heartbeat is actually an RPC request: the JT acts as the server and the TT as the client. The TT invokes the JT's heartbeat method via RPC, sending some of its own state information to the JT, and the JT returns instructions to the TT through the return value.

41. The default MapReduce input format is
A. TextInputFormat
B. KeyValueTextInputFormat
C. NLineInputFormat
D. SequenceFileInputFormat

42. Which of the following statements is wrong?
A. A SequenceFile can be used as a container to consolidate small files
B. The key of TextInputFormat is of LongWritable type
C. CombineFileInputFormat is an abstract class
D. The TextInputFormat key is the line number of the record in the file

43. Which statement about the new and old MapReduce APIs is wrong?
A. The new API is in the org.apache.hadoop.mapreduce package, while the old API is in org.apache.hadoop.mapred
B. The new API tends to use interfaces, whereas the old API tends to use abstract classes
C. The new API uses Configuration, while the old API uses JobConf to pass configuration information
D. The new API can use a Job object to submit jobs

44. The size of the map-side ring buffer is determined by which of the following properties?
A. io.sort.spill.percent
B. io.sort.factor
C. io.sort.mb
D. mapred.reduce.parallel.copies

Each map task has a ring buffer whose default size is 100 MB; the size can be changed with the io.sort.mb property.
Once the in-memory buffer reaches the spill threshold (io.sort.spill.percent), a new spill file is created.
io.sort.factor controls how many spill segments can be merged at a time.
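
A minimal sketch of setting these MR1-era property names on a Configuration (the values shown are only illustrative):

import org.apache.hadoop.conf.Configuration;

public class SortBufferSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Size of the per-map in-memory ring (spill) buffer, in MB (default 100).
        conf.setInt("io.sort.mb", 200);

        // Fraction of the buffer that may fill before spilling to disk begins.
        conf.setFloat("io.sort.spill.percent", 0.80f);

        // Maximum number of spill segments merged in one pass.
        conf.setInt("io.sort.factor", 10);
    }
}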

45. Which of the following statements is wrong?
A. An InputSplit is actually a reference to the data
B. MultipleInputs can specify multiple data sources and their corresponding input formats
C. File splitting can be avoided by overriding the isSplitable() method
D. A reduce task must wait until all map outputs have been copied before the merge completes

46. The direct communication protocol between a Task and the TaskTracker is
A. JobSubmissionProtocol
B. ClientProtocol
C. TaskUmbilicalProtocol
D. InterTrackerProtocol

InterDatanodeProtocol: the interface for internal interaction between DataNodes, used to update block metadata;
InterTrackerProtocol: the interface between the TaskTracker and the JobTracker, similar in function to DatanodeProtocol;
JobSubmissionProtocol: the interface between the JobClient and the JobTracker, used to submit jobs and perform other job-related operations;
TaskUmbilicalProtocol: the interface for interaction between a child Task process and its parent process. The child process runs the map or reduce work, the parent process is the TaskTracker, and the interface reports the running state of the child process back to the parent (vocabulary note: umbilical, as in umbilical cord, i.e. closely connected).

-2015.6.11

47. Which of the following components specifies the policy for distributing keys to reducers?
A. RecordReader
B. Combiner
C. Partitioner
D. FileInputFormat

48. Which of the following statements is correct?
A. As long as Job.setCombinerClass() is set, the combiner function is guaranteed to run
B. The internal member variable end of the LineRecordReader class refers to the end position of the input split
C. The number of map tasks in an M/R cluster can be set
D. The number of reduce tasks in an M/R cluster can be set

49. In NameNode HA solutions, which of the following cannot be used to store metadata?
A. QJM
B. BookKeeper
C. NFS
D. ZooKeeper

BookKeeper is a sub-project of the Apache ZooKeeper project. It is a system for reliably recording streaming data, used mainly to store write-ahead logs (WAL). Hadoop's NameNode stores the metadata of an HDFS cluster: there is an edit log file for recording writes and an fsimage image held in memory. Whenever a client changes data in the HDFS cluster, the change is recorded in the NameNode's edit log and then synchronized into the in-memory fsimage.
In BookKeeper, a service node (of which there are several) is called a bookie, a log stream is called a ledger, and each log unit (such as one record) is called a ledger entry. The group of bookie service nodes is mainly responsible for storing ledgers. Because any single bookie node may fail, ledgers are stored across multiple bookies; as long as enough correct, available bookies remain, the whole system continues to serve normally. BookKeeper's metadata is stored in ZooKeeper (only the metadata; the actual log stream data lives on the bookies).

50. In an M/R system, consider the following situation: HDFS uses the default block size (64 MB); the InputFormat in use is FileInputFormat; there are three files of 64 KB, 65 MB, and 127 MB. How many map tasks will be generated?
A. 3
B. 4
C. 5
D. 6
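
A possible worked count, assuming FileInputFormat splits each file at the 64 MB block boundary and does not combine files: the 64 KB file yields 1 split, the 65 MB file yields 2 splits (64 MB + 1 MB), and the 127 MB file yields 2 splits (64 MB + 63 MB), for 5 map tasks in total.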

-2015.6.14
