HA (High Availability) Configuration - JSON - Remote Debugging - Cluster Distance

Join
----
Map-side join: big table + small table; only the map phase is needed.
Reduce-side join: big table + big table; both map and reduce are needed. Design a composite key plus a table flag, and use a grouping comparator so that matching records from the two tables reach the same reduce call. (A minimal map-side join mapper is sketched after the JSON notes below.)

JSON: fastjson
--------------
JSON.parseObject(str)     // turns a string into a JSONObject
jo.get(key)               // gets the value, in string form, for the key specified in the JSON string
jo.getJSONArray(key)      // gets the value for the key specified in the JSON string, in array form
(These calls are shown end to end in the second sketch below.)
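A minimal sketch of the map-side join described above, not a definitive implementation: the small table is loaded into each mapper's memory in setup(), and every big-table row is joined in map(). The file name small.txt, the tab-separated record layout, and the class name are assumptions made for illustration; in a real job the small file would be shipped to the nodes, e.g. with job.addCacheFile(...) in the driver.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapJoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    // join key -> small-table record, built once per mapper in setup()
    private final Map<String, String> smallTable = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        // small.txt is assumed to have been shipped to each node beforehand
        try (BufferedReader reader = new BufferedReader(new FileReader("small.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t", 2);   // key \t rest-of-record
                smallTable.put(fields[0], fields[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t", 2);
        String joined = smallTable.get(fields[0]);       // look up the small table
        if (joined != null) {                            // inner join: drop misses
            context.write(new Text(value + "\t" + joined), NullWritable.get());
        }
    }
}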
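The fastjson calls above, shown end to end; the JSON layout (a "name" field and a "scores" array) is made up for illustration.

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

public class FastjsonDemo {
    public static void main(String[] args) {
        String str = "{\"name\":\"tom\",\"scores\":[90,85,77]}";

        JSONObject jo = JSON.parseObject(str);        // string -> JSONObject
        String name = jo.getString("name");           // value for a key, string form
        JSONArray scores = jo.getJSONArray("scores"); // value for a key, array form

        System.out.println(name + " has " + scores.size()
                + " scores, first = " + scores.getInteger(0));
    }
}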
MR1 vs MR2 resource scheduling
------------------------------
1. In MR1 the nodes are called JobTracker (master) and TaskTracker (slaves); in MR2 they are called ResourceManager (master) and NodeManager (slaves).
2. The JobTracker performs all resource allocation, scheduling, and task assignment. The ResourceManager is only responsible for allocating resources; it then starts an MRAppMaster process on a slave node, and that ApplicationMaster handles the execution and monitoring of the whole job.

Remote debugging
----------------
Reason: the local client cannot submit an MR job directly to the cluster, so debug it remotely.
1. Server side, Linux (listening):
   java -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=192.168.23.101:8888 -cp myhadoop-1.0-SNAPSHOT.jar HelloWorld
2. Client side, IDEA (connecting): Edit Configurations ==> + Remote, set the host and port, then debug.

To remote-debug a Hadoop program, add the following line to ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh:
   export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=192.168.23.101:8888"

Job submission calls go from ResourceMgrDelegate to YarnClientImpl.

RPC and IPC
-----------
RPC: remote procedure call; it relies on serialization technology. Hadoop's RPC is built on IPC (inter-process communication), which also uses serialization.

HA: high availability
---------------------
Use two NameNodes to avoid a single point of failure.
1. Availability problem: both NameNodes run at the same time, one node in the active state and the other in the standby state.
2. Data-synchronization problem: JournalNodes are responsible for synchronizing the edit data between the two NameNodes; they are typically configured on the DataNodes.

HA configuration
================
hdfs-site.xml
-------------
<!-- logical name for the namespace -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<!-- point to the two NameNodes -->
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<!-- configure the RPC addresses of the NameNodes -->
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>s101:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>s105:8020</value>
</property>
<!-- configure the HTTP ports of the NameNodes -->
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>s101:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>s105:50070</value>
</property>
<!-- configure the JournalNode addresses, placed on the DataNodes -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://s102:8485;s103:8485;s104:8485/mycluster</value>
</property>
<!-- configure the failover proxy provider (the default) -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- configure fencing with a shell command that always succeeds -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>

core-site.xml
-------------
<!-- use the logical name as the file system -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<!-- JournalNode working directory -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/centos/hadoop/full/journal</value>
</property>

1. Distribute the files after the configuration is complete.
2. Configure SSH login for s105: copy the private key from s101 directly, or run ssh-copy-id [email protected] once for each of the five hosts.
3. Distribute all the NameNode working directories to s105:
   rsync -lr ~/hadoop [email protected]:~
4. Start the JournalNodes from s101:
   hadoop-daemons.sh start journalnode
5. Initialize the shared edits directory:
   hdfs namenode -initializeSharedEdits
6. Start the processes:
   start-dfs.sh
7. Transition a NameNode to the active state:
   hdfs haadmin -transitionToActive nn1

Clustering
----------
Clustering turns a seemingly unrelated collection into related groups (clusters). Example: the process of seeing a collection of stars as a constellation is clustering. Clustering algorithms are used for recommendation or for grouping data into clusters, e.g. K-means.

K-means computes the distance between data points (Euclidean distance):
1. Read the center points.
2. In map, compare every row with the center points, compute the distances, and output the group (cluster) number together with the group (cluster) data.
3. On the reduce side, re-aggregate the data and update the center points.
4. Iterate the MR job until the center data no longer changes; that is how the end of the MR job is determined. (The assignment step is sketched in code at the end of this article, after the distance sketch.)

Distance
--------
1. Euclidean distance: √((x1-x2)^2 + (y1-y2)^2); e.g. A(1,0) and B(0,1) give √2.
2. Manhattan distance: |x1-x2| + |y1-y2|.
3. Chebyshev distance: max{|x1-x2|, |y1-y2|}; applied in probabilistic scenarios.
When the dimension exceeds three, Euclidean distance loses its physical meaning and can only be used as a measure of distance. (All three measures are sketched in code just below.)
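A small sketch of the three distance measures above, for 2-D points; the class and method names are arbitrary.

public class Distances {
    // Euclidean: sqrt((x1-x2)^2 + (y1-y2)^2); e.g. (1,0) vs (0,1) -> sqrt(2)
    static double euclidean(double x1, double y1, double x2, double y2) {
        return Math.sqrt((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2));
    }

    // Manhattan: |x1-x2| + |y1-y2|
    static double manhattan(double x1, double y1, double x2, double y2) {
        return Math.abs(x1 - x2) + Math.abs(y1 - y2);
    }

    // Chebyshev: max(|x1-x2|, |y1-y2|)
    static double chebyshev(double x1, double y1, double x2, double y2) {
        return Math.max(Math.abs(x1 - x2), Math.abs(y1 - y2));
    }

    public static void main(String[] args) {
        System.out.println(euclidean(1, 0, 0, 1));  // 1.4142... = sqrt(2)
        System.out.println(manhattan(1, 0, 0, 1));  // 2.0
        System.out.println(chebyshev(1, 0, 0, 1));  // 1.0
    }
}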
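And a minimal, non-MapReduce sketch of the K-means assignment step (step 2 of the iteration above): each point is compared with every center and assigned to the nearest one by Euclidean distance. The sample data and centers are made up; in the MR version, the printed pair would be the mapper's (key, value) output.

import java.util.Arrays;

public class KMeansAssignStep {
    // Euclidean distance for points of any dimension
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] centers = {{0, 0}, {5, 5}};          // step 1: read the centers
        double[][] points  = {{1, 1}, {4, 6}, {0, 2}};

        for (double[] p : points) {                      // step 2: assign each point
            int best = 0;
            for (int c = 1; c < centers.length; c++) {
                if (euclidean(p, centers[c]) < euclidean(p, centers[best])) {
                    best = c;
                }
            }
            // cluster number + cluster data, as the mapper would emit them
            System.out.println("cluster " + best + "\t" + Arrays.toString(p));
        }
    }
}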
