First, the command:
hadoop jar /usr/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar -mapper mapper.py -file mapper.py -reducer reduce.py -file reduce.py -file params.txt -file params2.txt -input /data/* -output /output
Note that the /output directory must not already exist; Hadoop creates it itself and fails if it is already there.
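Each -file option in the command ships that file into the working directory of every map and reduce task, which is why mapper.py can open params.txt by bare name on any node. As a minimal sketch (the key=value format of params.txt is an assumption, not something the original specifies):

```python
# Hypothetical helper for mapper.py: files shipped with "-file" land in the
# task's current working directory, so they can be opened by bare name.
def load_params(path="params.txt"):
    """Parse simple key=value lines; skip blanks and '#' comments.

    The key=value layout is assumed for illustration only.
    """
    params = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, value = line.split("=", 1)
                    params[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # fall back to defaults if the file was not shipped
    return params
```

The mapper would call load_params() once at startup, before reading stdin.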
The output of mapper.py is sorted by key by the framework and then fed to reduce.py.
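The contract between the two scripts is plain text on stdin/stdout: the mapper emits tab-separated key/value lines, Hadoop sorts them by key, and the reducer aggregates runs of equal keys. A word-count sketch of a hypothetical mapper.py/reduce.py pair (the original does not show their contents):

```python
from itertools import groupby

def mapper(lines):
    """Emit (word, 1) for every whitespace-separated token, like mapper.py
    would write 'word\t1' lines to stdout."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Sum counts per key; the input must already be sorted by key,
    which is exactly what Hadoop's shuffle guarantees."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Simulate the streaming pipeline on sample input; in the real scripts,
    # mapper() would read sys.stdin and Hadoop would do the sort.
    sample = ["hello world", "hello hadoop"]
    for word, total in reducer(sorted(mapper(sample))):
        print(f"{word}\t{total}")
```

In the real mapper.py and reduce.py, each half reads sys.stdin and prints its lines; the sort step between them is performed by Hadoop, not by the scripts.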
For example, suppose the Hadoop cluster has several files in the /data/ directory:
[root@... program]# hadoop fs -ls /data/
Found 10 items
-rw-r--r--   3 root supergroup   35596 ... /data/cars-...
-rw-r--r--   3 root supergroup   35592 ... /data/cars-...
-rw-r--r--   3 root supergroup   35588 ... /data/cars-...
-rw-r--r--   3 root supergroup   35584 ... /data/cars-...
-rw-r--r--   3 root supergroup   35584 ... /data/cars-...
-rw-r--r--   3 root supergroup   35596 ... /data/cars-...
-rw-r--r--   3 root supergroup   35588 ... /data/cars-...
-rw-r--r--   3 root supergroup   35586 ... /data/cars-...
-rw-r--r--   3 root supergroup   35584 ... /data/cars-...
-rw-r--r--   3 root supergroup   35574 ... /data/cars-...
For any one of these files, you can first test the scripts locally:
cat cars-... | ./mapper.py | ./reduce.py
That, in outline, is how Hadoop Streaming runs a Python program.