HA (High Availability) Configuration - JSON - Remote Debugging - Cluster Distance

Join
----
Map-side join: big table + small table; only the map phase is needed.
Reduce-side join: big table + big table; both map and reduce are needed. Design a composite key plus a table flag, and use a grouping comparator so that matching records from the two tables reach the same reduce call. (A minimal map-side join mapper is sketched after the JSON notes below.)

JSON: fastjson
--------------
JSON.parseObject(str)     // turns a string into a JSONObject
jo.get(key)               // gets the value, in string form, for the key specified in the JSON string
jo.getJSONArray(key)      // gets the value for the key specified in the JSON string, in array form
(These calls are shown end to end in the second sketch below.)
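A minimal sketch of the map-side join described above, not a definitive implementation: the small table is loaded into each mapper's memory in setup(), and every big-table row is joined in map(). The file name small.txt, the tab-separated record layout, and the class name are assumptions made for illustration; in a real job the small file would be shipped to the nodes, e.g. with job.addCacheFile(...) in the driver.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapJoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    // join key -> small-table record, built once per mapper in setup()
    private final Map<String, String> smallTable = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        // small.txt is assumed to have been shipped to each node beforehand
        try (BufferedReader reader = new BufferedReader(new FileReader("small.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t", 2);   // key \t rest-of-record
                smallTable.put(fields[0], fields[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t", 2);
        String joined = smallTable.get(fields[0]);       // look up the small table
        if (joined != null) {                            // inner join: drop misses
            context.write(new Text(value + "\t" + joined), NullWritable.get());
        }
    }
}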
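The fastjson calls above, shown end to end; the JSON layout (a "name" field and a "scores" array) is made up for illustration.

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

public class FastjsonDemo {
    public static void main(String[] args) {
        String str = "{\"name\":\"tom\",\"scores\":[90,85,77]}";

        JSONObject jo = JSON.parseObject(str);        // string -> JSONObject
        String name = jo.getString("name");           // value for a key, string form
        JSONArray scores = jo.getJSONArray("scores"); // value for a key, array form

        System.out.println(name + " has " + scores.size()
                + " scores, first = " + scores.getInteger(0));
    }
}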
MR1 vs MR2 resource scheduling
------------------------------
1. In MR1 the nodes are called JobTracker (master) and TaskTracker (slaves); in MR2 they are called ResourceManager (master) and NodeManager (slaves).
2. The JobTracker performs all resource allocation, scheduling, and task assignment. The ResourceManager is only responsible for allocating resources; it then starts an MRAppMaster process on a slave node, and that ApplicationMaster handles the execution and monitoring of the whole job.

Remote debugging
----------------
Reason: the local client cannot submit an MR job directly to the cluster, so debug it remotely.
1. Server side, Linux (listening):
   java -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=192.168.23.101:8888 -cp myhadoop-1.0-SNAPSHOT.jar HelloWorld
2. Client side, IDEA (connecting): Edit Configurations ==> + Remote, set the host and port, then debug.

To remote-debug a Hadoop program, add the following line to ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh:
   export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=192.168.23.101:8888"

Job submission calls go from ResourceMgrDelegate to YarnClientImpl.

RPC and IPC
-----------
RPC: remote procedure call; it relies on serialization technology. Hadoop's RPC is built on IPC (inter-process communication), which also uses serialization.

HA: high availability
---------------------
Use two NameNodes to avoid a single point of failure.
1. Availability problem: both NameNodes run at the same time, one node in the active state and the other in the standby state.
2. Data-synchronization problem: JournalNodes are responsible for synchronizing the edit data between the two NameNodes; they are typically configured on the DataNodes.

HA configuration
================
hdfs-site.xml
-------------
<!-- logical name for the namespace -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<!-- point to the two NameNodes -->
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<!-- configure the RPC addresses of the NameNodes -->
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>s101:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>s105:8020</value>
</property>
<!-- configure the HTTP ports of the NameNodes -->
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>s101:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>s105:50070</value>
</property>
<!-- configure the JournalNode addresses, placed on the DataNodes -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://s102:8485;s103:8485;s104:8485/mycluster</value>
</property>
<!-- configure the failover proxy provider (the default) -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- configure fencing with a shell command that always succeeds -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>

core-site.xml
-------------
<!-- use the logical name as the file system -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<!-- JournalNode working directory -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/centos/hadoop/full/journal</value>
</property>

1. Distribute the files after the configuration is complete.
2. Configure SSH login for s105: copy the private key from s101 directly, or run ssh-copy-id [email protected] once for each of the five hosts.
3. Distribute all the NameNode working directories to s105:
   rsync -lr ~/hadoop [email protected]:~
4. Start the JournalNodes from s101:
   hadoop-daemons.sh start journalnode
5. Initialize the shared edits directory:
   hdfs namenode -initializeSharedEdits
6. Start the processes:
   start-dfs.sh
7. Transition a NameNode to the active state:
   hdfs haadmin -transitionToActive nn1

Clustering
----------
Clustering turns a seemingly unrelated collection into related groups (clusters). Example: the process of seeing a collection of stars as a constellation is clustering. Clustering algorithms are used for recommendation or for grouping data into clusters, e.g. K-means.

K-means computes the distance between data points (Euclidean distance):
1. Read the center points.
2. In map, compare every row with the center points, compute the distances, and output the group (cluster) number together with the group (cluster) data.
3. On the reduce side, re-aggregate the data and update the center points.
4. Iterate the MR job until the center data no longer changes; that is how the end of the MR job is determined. (The assignment step is sketched in code at the end of this article, after the distance sketch.)

Distance
--------
1. Euclidean distance: √((x1-x2)^2 + (y1-y2)^2); e.g. A(1,0) and B(0,1) give √2.
2. Manhattan distance: |x1-x2| + |y1-y2|.
3. Chebyshev distance: max{|x1-x2|, |y1-y2|}; applied in probabilistic scenarios.
When the dimension exceeds three, Euclidean distance loses its physical meaning and can only be used as a measure of distance. (All three measures are sketched in code just below.)
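A small sketch of the three distance measures above, for 2-D points; the class and method names are arbitrary.

public class Distances {
    // Euclidean: sqrt((x1-x2)^2 + (y1-y2)^2); e.g. (1,0) vs (0,1) -> sqrt(2)
    static double euclidean(double x1, double y1, double x2, double y2) {
        return Math.sqrt((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2));
    }

    // Manhattan: |x1-x2| + |y1-y2|
    static double manhattan(double x1, double y1, double x2, double y2) {
        return Math.abs(x1 - x2) + Math.abs(y1 - y2);
    }

    // Chebyshev: max(|x1-x2|, |y1-y2|)
    static double chebyshev(double x1, double y1, double x2, double y2) {
        return Math.max(Math.abs(x1 - x2), Math.abs(y1 - y2));
    }

    public static void main(String[] args) {
        System.out.println(euclidean(1, 0, 0, 1));  // 1.4142... = sqrt(2)
        System.out.println(manhattan(1, 0, 0, 1));  // 2.0
        System.out.println(chebyshev(1, 0, 0, 1));  // 1.0
    }
}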
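And a minimal, non-MapReduce sketch of the K-means assignment step (step 2 of the iteration above): each point is compared with every center and assigned to the nearest one by Euclidean distance. The sample data and centers are made up; in the MR version, the printed pair would be the mapper's (key, value) output.

import java.util.Arrays;

public class KMeansAssignStep {
    // Euclidean distance for points of any dimension
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] centers = {{0, 0}, {5, 5}};          // step 1: read the centers
        double[][] points  = {{1, 1}, {4, 6}, {0, 2}};

        for (double[] p : points) {                      // step 2: assign each point
            int best = 0;
            for (int c = 1; c < centers.length; c++) {
                if (euclidean(p, centers[c]) < euclidean(p, centers[best])) {
                    best = c;
                }
            }
            // cluster number + cluster data, as the mapper would emit them
            System.out.println("cluster " + best + "\t" + Arrays.toString(p));
        }
    }
}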
