Python Big Data App Introduction
Introduction: At present, the mainstream storage and analysis platform in industry is the open-source ecosystem built around Hadoop. MapReduce is Hadoop's model for parallel computation over data sets; besides MapReduce tasks written in Java, Hadoop also supports the Streaming interface, which lets you write MapReduce tasks in any scripting language, making development simple and flexible.
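For illustration, a Streaming job written in a scripting language is submitted roughly like this (a minimal sketch, not part of the original steps; mapper.py, reducer.py and the HDFS input/output paths are hypothetical, and the streaming jar location assumes the Hadoop 1.2.1 layout used later in this article):
./bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar \
    -input /user/hadoop/input \
    -output /user/hadoop/output \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py -file reducer.py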
Hadoop environment deployment
1. Deploying Hadoop requires that the master can log in to every slave host without a password, i.e., public key authentication must be configured for the account (see the sketch below).
2. Install the JDK environment on the master host.
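A minimal sketch of steps 1 and 2 on a CentOS-style master (the hadoop account name and the yum package names are assumptions; the OpenJDK version matches the JAVA_HOME path used in step 3.2):
# Step 1: public key authentication from the master to every slave (run on the master)
ssh-keygen -t rsa                      # accept the defaults, empty passphrase
ssh-copy-id hadoop@192.168.1.2         # repeat for every host listed in the slaves file
ssh-copy-id hadoop@192.168.1.3
# Step 2: JDK environment on the master
yum install -y java-1.6.0-openjdk java-1.6.0-openjdk-devel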
3. Install Hadoop on the master host
3.1. Download Hadoop and extract it into the /usr/local directory (example commands are sketched after 3.2).
3.2. Modify the Java environment variable in hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64
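For step 3.1 above, a typical download-and-extract sequence might look like the following (the mirror URL is an assumption, and the 1.2.1 version is inferred from the example jar used in step 6.2):
cd /usr/local
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
tar -xzf hadoop-1.2.1.tar.gz
cd hadoop-1.2.1      # installation directory; the configuration files below live in its conf/ subdirectory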
3.3. Modify core-site.xml (the Hadoop core configuration file)
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.1:9000</value>
  </property>
</configuration>
3.4. Modify hdfs-site.xml (configuration entries for Hadoop's HDFS component)
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/tmp/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hdfs/data</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
3.5. Modify mapred-site.xml (properties of the MapReduce component, including the JobTracker and TaskTrackers)
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.1:9001</value>
  </property>
</configuration>
3.6. Modify the masters and slaves configuration files
masters file:
192.168.1.1
slaves file:
192.168.1.1
192.168.1.2
192.168.1.3
4. Slave host configuration
4.1. Configure the same JDK environment as on the master host, keeping the same installation path.
4.2. Copy the Hadoop environment configured on the master host to each slave host (see the sketch below).
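Step 4.2 can be done by copying the configured installation directory from the master, for example (a sketch; the hadoop account and the hadoop-1.2.1 path under /usr/local are assumptions):
# run on the master, once per slave host
scp -r /usr/local/hadoop-1.2.1 hadoop@192.168.1.2:/usr/local/
scp -r /usr/local/hadoop-1.2.1 hadoop@192.168.1.3:/usr/local/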
5. Configure the firewall
Master host:
iptables -I INPUT -s 192.168.1.0/24 -p tcp --dport 50030 -j ACCEPT
iptables -I INPUT -s 192.168.1.0/24 -p tcp --dport 50070 -j ACCEPT
iptables -I INPUT -s 192.168.1.0/24 -p tcp --dport 9000 -j ACCEPT
iptables -I INPUT -s 192.168.1.0/24 -p tcp --dport 9001 -j ACCEPT
Slave hosts:
iptables -I INPUT -s 192.168.1.0/24 -p tcp --dport 50075 -j ACCEPT
iptables -I INPUT -s 192.168.1.0/24 -p tcp --dport 50060 -j ACCEPT
iptables -I INPUT -s 192.168.1.1 -p tcp --dport 50010 -j ACCEPT
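One step that is easy to miss on a fresh cluster (it is not in the original list): before the first start, HDFS normally has to be formatted once on the master, run from the installation directory:
./bin/hadoop namenode -format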
6. Test the results
6.1. Execute the start command on the master host (from the installation directory)
./bin/start-all.sh
If the output shows each daemon starting (NameNode, SecondaryNameNode and JobTracker on the master; DataNode and TaskTracker on the slaves), the start was successful.
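A quick way to double-check is the JDK's jps tool, which lists the running Java processes; on the master it should show NameNode, SecondaryNameNode and JobTracker, and on the slaves DataNode and TaskTracker:
jps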
6.2. Test the MapReduce sample on the master host
./bin/hadoop jar hadoop-examples-1.2.1.jar pi 10 100
If the job completes and prints an estimated value of Pi, the configuration is working.
7. Addendum: visit the management pages provided by Hadoop
MapReduce management address: http://192.168.1.1:50030
HDFS management address: http://192.168.1.1:50070
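If the firewall rules from step 5 are in place, both pages can be reached from any host in the 192.168.1.0/24 network, e.g.:
curl -I http://192.168.1.1:50030
curl -I http://192.168.1.1:50070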