sudo apt-get install eclipse
Opening Eclipse after installation pops up an error dialog:
An error has occurred. See the log file
/home/pengeorge/.eclipse/org.eclipse.platform_3.7.0_155965261/configuration/1342406790169.log.
Review the error log and then resolve the problem.
Opening the log file shows the following error:
!SESSION 2012-07-16 10:46:29.992 -----------------------------------------------
eclipse.buildId=I20110613-1736
java.version=1.7.0_05
java.vendor=Oracle Corporation
BootLoader constants: OS=linux, ARCH=x86, WS=gtk, NL=zh_CN
Command-line arguments: -os linux -ws gtk -arch x86
!ENTRY org.eclipse.osgi 4 0 2012-07-16 10:46:31.885
!MESSAGE Application error
!STACK 1
java.lang.UnsatisfiedLinkError: Could not load SWT library. Reasons:
    no swt-gtk-3740 in java.library.path
    no swt-gtk in java.library.path
    Can't load library: /home/pengeorge/.swt/lib/linux/x86_64/libswt-gtk-3740.so
    Can't load library: /home/pengeorge/.swt/lib/linux/x86/libswt-gtk.so
How to solve it: copy the relevant SWT libraries into ~/.swt/lib/linux/x86_64 (use ~/.swt/lib/linux/x86 on a 32-bit system):
cp /usr/lib/jni/libswt-*3740.so ~/.swt/lib/linux/x86_64
Then restart Eclipse. Eclipse itself is installed under /usr/lib/eclipse.
For a plug-in that lets Eclipse view .class files, see http://www.blogjava.net/hongjunli/archive/2007/08/15/137054.html.
A typical Hadoop workflow generates data files (such as log files) elsewhere and then copies them into HDFS, where MapReduce processes them. HDFS files are usually not read directly; instead, the MapReduce framework reads them and parses them into individual records (key/value pairs). Unless you need to import or export data yourself, you almost never write code that reads or writes HDFS files.
The hadoop fs command can interact with HDFS, with the local file system, and with the Amazon S3 file system.
hadoop fs -mkdir /user/chuck creates a directory; hadoop fs -ls / lists a directory; hadoop fs -lsr / lists it recursively, including subdirectories.
hadoop fs -put example.txt . adds a file; the trailing dot stands for the user's HDFS home directory, so for user chuck it is equivalent to hadoop fs -put example.txt /user/chuck.
If the destination directory does not exist, HDFS does not create it; it treats the destination as the target file name, effectively renaming the upload.
Note that example.txt ends up under the /user/chuck directory in HDFS; for a user named student, for example, the local file /home/student/example.txt would be put into HDFS the same way.
Once data is in HDFS and a Hadoop job has processed it, the job writes a new set of HDFS output files. To view one: hadoop fs -cat /user/chuck/pg20417.txt
To retrieve a file: hadoop fs -get /user/chuck/pg20417.txt . reads the file into the current Linux directory (the dot stands for the current directory).
Unix pipes can be used with Hadoop: hadoop fs -cat /user/chuck/pg20417.txt | head. To view the last kilobyte of a file: hadoop fs -tail /user/chuck/pg20417.txt
To view a file as text: hadoop fs -text /user/chuck/pg20417.txt
To delete a file: hadoop fs -rm /user/chuck/pg20417.txt
To see help for a Hadoop command, for example ls: hadoop fs -help ls
The Hadoop command line also has getmerge, which merges a set of HDFS files into a single file on the local machine. The main Hadoop classes for programmatic file operations live in the org.apache.hadoop.fs package, as sketched below.
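A rough sketch of the same kind of operations done programmatically with the FileSystem API from org.apache.hadoop.fs (the paths reuse the examples above; the class name HdfsOps is made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml etc.
        FileSystem fs = FileSystem.get(conf);        // handle to HDFS (or the local FS)

        fs.mkdirs(new Path("/user/chuck"));           // like: hadoop fs -mkdir /user/chuck

        // like: hadoop fs -put example.txt /user/chuck
        fs.copyFromLocalFile(new Path("example.txt"), new Path("/user/chuck/example.txt"));

        // like: hadoop fs -ls /user/chuck
        for (FileStatus status : fs.listStatus(new Path("/user/chuck"))) {
            System.out.println(status.getPath() + "  " + status.getLen());
        }

        // like: hadoop fs -rm /user/chuck/example.txt (false = not recursive)
        fs.delete(new Path("/user/chuck/example.txt"), false);

        fs.close();
    }
}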
After the input data has been distributed to different nodes, the only time nodes communicate with one another is the "shuffle" stage, when intermediate data is exchanged between them. This constraint on communication is a big help for scalability.
MapReduce serializes key/value pairs, so only classes that support this serialization can act as keys or values in the framework. A class that implements the Writable interface can be a value; a class that implements the WritableComparable<T> interface can be either a key or a value, because keys additionally need to be comparable. A number of predefined classes already implement WritableComparable.
Implementing these interfaces means defining how the data is written, how it is read back, and how two instances are compared, as in the sketch below.
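A minimal sketch of such a class, using a hypothetical Edge key that holds a departure and an arrival city (this class is illustrative, not something from the text); it shows the three responsibilities: write, readFields, and compareTo:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// A hypothetical key type: a flight route from one city to another.
public class Edge implements WritableComparable<Edge> {
    private String departure;
    private String arrival;

    public String getDeparture() { return departure; }
    public String getArrival()   { return arrival; }

    // How to write the data: serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(departure);
        out.writeUTF(arrival);
    }

    // How to read the data: deserialize the fields in the same order.
    public void readFields(DataInput in) throws IOException {
        departure = in.readUTF();
        arrival = in.readUTF();
    }

    // How to compare the data: required because this type can be used as a key.
    public int compareTo(Edge other) {
        int cmp = departure.compareTo(other.departure);
        return cmp != 0 ? cmp : arrival.compareTo(other.arrival);
    }
}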
To act as the mapper of the first phase, a class must extend the MapReduceBase base class and implement the Mapper interface.
The constructor-like method void configure(JobConf job) extracts parameters from the XML configuration files or from the application's main class; it is called before any data processing.
The destructor-like method void close() runs after the mapper finishes and wraps up any remaining work, such as closing database connections or open files.
Mapper has a single processing method, map(), which handles one key/value pair at a time, as in the sketch below.
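A minimal mapper sketch against the old org.apache.hadoop.mapred API; the word-count logic is just a placeholder to show where configure(), map(), and close() fit:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    // "Constructor"-like hook: called once before any map() call,
    // useful for pulling parameters out of the job configuration.
    @Override
    public void configure(JobConf job) {
        super.configure(job);
    }

    // Called once per input record (key = byte offset, value = one line of text).
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            output.collect(word, ONE);   // emit (word, 1)
        }
    }

    // "Destructor"-like hook: called after the last map() call,
    // e.g. to close a database connection or an open file.
    @Override
    public void close() throws IOException {
        super.close();
    }
}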
The reduce() function iterates over the values associated with a given key and produces a (possibly empty) list of output pairs; a matching sketch follows.
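A matching reducer sketch in the same old API (summing word counts is again only a placeholder):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    // Called once per key, with an iterator over all values for that key.
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));  // it may also emit nothing at all
    }
}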
Between map and reduce there is an extremely important step: routing the mappers' output to the different reducers. This is the partitioner's job.
Multiple reducers give parallel computation. The default strategy is to hash the key to determine the reducer, and Hadoop enforces this policy with HashPartitioner, but sometimes the default does the wrong thing for your data.
Take the routes (Shanghai, Beijing) and (Shanghai, Guangzhou): if the whole pair is the key, these two records can be sent to different reducers even though both depart from Shanghai, so the departure city Shanghai is processed twice, once in each reducer, and that duplicated work is redundant.
In that case we should tailor the partitioner: hash only the departure city, so that all routes with the same departure go to the same reducer.
A custom partitioner needs to implement configure() (which applies the Hadoop job configuration to the partitioner) and getPartition() (which returns an integer between 0 and the number of reduce tasks, indicating which reducer the key/value pair is sent to).
In other words, the partitioner decides where each key is placed, i.e. which reducer receives it. A sketch follows.
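A sketch of such a partitioner, reusing the hypothetical Edge key from the earlier sketch and hashing only the departure city:

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class DeparturePartitioner implements Partitioner<Edge, Writable> {

    // Receives the job configuration; nothing needs to be configured in this sketch.
    public void configure(JobConf job) { }

    // Returns a reducer index in [0, numPartitions) based only on the departure city,
    // so every route leaving the same city goes to the same reducer.
    public int getPartition(Edge key, Writable value, int numPartitions) {
        return (key.getDeparture().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}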
HDFS works better when many small files are combined into one large file before processing (it is more efficient), and that still fits MapReduce, because one of MapReduce's principles is to split the input data into chunks that can be processed in parallel on several machines. In Hadoop terminology these chunks are called input splits; splits should be small enough to give fine-grained parallelism, but not so small that the overhead dominates.
FSDataInputStream extends java.io.DataInputStream to support random access reads. MapReduce needs this because a machine may be assigned a split that starts in the middle of an input file; without random access it would have to read the file from the beginning all the way to the split's starting position. A sketch follows.
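A minimal sketch of that random access, assuming a made-up file path and split offset: FSDataInputStream.seek() jumps straight to the split's first byte instead of scanning from the start of the file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeekToSplit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        long splitStart = 64L * 1024 * 1024;   // pretend the split begins at the 64 MB mark
        // hypothetical large input file; any HDFS path longer than splitStart would do
        FSDataInputStream in = fs.open(new Path("/user/chuck/bigfile.txt"));
        try {
            in.seek(splitStart);               // jump directly to the split, no sequential scan
            byte[] buf = new byte[4096];
            int read = in.read(buf);           // read the first few KB of the split
            System.out.println("read " + read + " bytes starting at offset " + splitStart);
        } finally {
            in.close();
        }
    }
}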
HDFS is designed to store data in exactly the form MapReduce splits and processes: files are stored as blocks distributed across multiple machines, and each block roughly corresponds to one split. Since each split/block can be processed by a machine that already holds it, parallelism comes naturally, and because multiple nodes hold replicas of each block, reliability is achieved as well: MapReduce is free to pick any node that holds a replica of a given split/block.
An input split is a logical division of the input data, while an HDFS block is a physical division. Processing is most efficient when they coincide, but in practice they never line up perfectly: records may cross block boundaries, so the compute node processing a particular split may have to fetch the fragment of a record that lives in another block.