Implementing a two-degree friend relationship with MapReduce


I. Definition of the problem

I found some material online about implementing two-degree network algorithms. Most of it uses a breadth-first search whose depth is explicitly limited to 2. The algorithm is very simple: first find the people you follow, then find the people they follow, and finally take the highest-frequency people from the second step's results (I have not implemented the frequency-counting part), and you are done.
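As a point of reference, here is a minimal in-memory sketch of that naive approach, assuming the whole follow graph fits in a Map<String, Set<String>> (the class and method names are mine; this code is not from the original article):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class NaiveDeg2 {
    // For one user, count how often each two-degree candidate appears
    // among the people followed by the people the user follows.
    public static Map<String, Integer> candidates(Map<String, Set<String>> follows, String me) {
        Map<String, Integer> freq = new HashMap<String, Integer>();
        Set<String> direct = follows.getOrDefault(me, Collections.<String>emptySet());
        for (String friend : direct) {
            for (String candidate : follows.getOrDefault(friend, Collections.<String>emptySet())) {
                // Skip the user themselves and people they already follow.
                if (!candidate.equals(me) && !direct.contains(candidate)) {
                    freq.merge(candidate, 1, Integer::sum);
                }
            }
        }
        return freq; // the highest-frequency entries are the best suggestions
    }
}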

But with tens of millions of users, that approach has to hold all of the follow relationships in memory and scan them at query time. To be honest, I have not measured how it compares, so I cannot say whether a Hadoop implementation would do better; I simply wanted to implement it with Hadoop myself. I have only been studying Hadoop for a few days, so please point out any shortcomings.

The task is to find the potential friends in each person's two-degree network. For example:

Suppose I know C, G, and H, but C does not know G; then C-G is a pair of potential friends. G and H, however, have long known each other, so they do not count as potential friends.

So one key question is how to encode the input.

First, look at the example graph (a small undirected graph over the nodes A through I): it has 13 edges in total, so 13 input lines are enough. For example, once A,B has been entered, B,A is not entered again when node B's turn comes, and likewise for the other nodes, because duplicates are removed later anyway.
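Concretely, the input is one comma-separated pair per line, each edge appearing exactly once (the names here only illustrate the format):

A,B
Tom,Lucy
Tom,Jack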

II. Principle analysis

First we run one MapReduce job. In the map phase, each input line produces a pair of reciprocal relationships that are written to the context: the input line Tom,Lucy yields both Tom→Lucy and Lucy→Tom. MapReduce then automatically merges identical keys in the context. For example, if Tom,Lucy and Tom,Jack both exist, the shuffle obviously produces tom:{lucy,jack}, which is the key-value pair the reduce phase starts from. The key is a person, and the value list is the people that person knows. Potential friends must arise within {lucy,jack}, the set of people Tom knows, so we take the Cartesian product of that set with itself, forming the relationships {<lucy,lucy>, <jack,jack>, <lucy,jack>, <jack,lucy>}. Meaningless pairs like <lucy,lucy> are culled, <lucy,jack> and <jack,lucy> are recognized as relationships, and the remaining pairs are written to the output.
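Traced on the Tom/Lucy/Jack example above, the data flow is:

map input:   Tom,Lucy   Tom,Jack
map output:  (Tom, Lucy) (Lucy, Tom) (Tom, Jack) (Jack, Tom)
shuffle:     Tom -> {Lucy, Jack}   Lucy -> {Tom}   Jack -> {Tom}
reduce out:  (Lucy, Jack) (Jack, Lucy)   -- from Tom's group; the single-value groups produce nothing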

But taking the Cartesian product of a set with itself, like a double for loop over the same array, computes every pair twice. How can that be halved? My program uses a HashSet, and the difficulty is making the second iterator start from the first iterator's current position (see 4.3 below for an idea).

III. Code

3.1 Mapper
package friends;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Deg2FriendMapper extends Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // A StringTokenizer over "," (or "\t" for tab-separated input) would
        // also work, but the while loop it needs is more than required here:
        // StringTokenizer st = new StringTokenizer(line, ",");
        // while (st.hasMoreTokens()) { ... }
        String[] ss = line.split(",");
        // Write the relationship in both directions, so that every person
        // becomes a key whose values are all the people they know.
        context.write(new Text(ss[0]), new Text(ss[1]));
        context.write(new Text(ss[1]), new Text(ss[0]));
    }
}

3.2 Reducer
package friends;

import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Deg2Reducer extends Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterable<Text> value, Context context)
            throws IOException, InterruptedException {
        // The values merged under one key are everyone this person knows.
        // A HashSet removes duplicate relationships before the Cartesian product:
        // the map phase writes both directions, so if the input contained both
        // A,B and B,A, key A would receive B twice. The loop is needed because
        // a person may have many friends.
        Set<String> set = new HashSet<String>();
        for (Text t : value) {
            set.add(t.toString());
        }
        if (set.size() >= 2) {
            // With only one friend there is no two-degree relationship.
            // Cartesian product of the set with itself:
            Iterator<String> iter = set.iterator();
            while (iter.hasNext()) {
                String name = iter.next();
                // The inner iterator is written as a for loop whose third clause
                // is empty, because iter2.next() advances it inside the body.
                for (Iterator<String> iter2 = set.iterator(); iter2.hasNext();) {
                    String name2 = iter2.next();
                    if (!name2.equals(name)) {
                        // Two identical elements do not form a relationship.
                        context.write(new Text(name), new Text(name2));
                    }
                }
            }
        }
    }
}

3.3 Main
package friends;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Deg2Main {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // corresponds to mapred-site.xml
        Job job = new Job(conf, "deg2mr");
        job.setJarByClass(Deg2Main.class);
        job.setMapperClass(Deg2FriendMapper.class);
        job.setReducerClass(Deg2Reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(1);
        // A bare path like "/in" fails with "file does not exist" because it is
        // resolved against the local file system (file:/), so a full HDFS URI is used.
        FileInputFormat.addInputPath(job,
                new Path("hdfs://192.168.58.180:8020/mltest/deg2mr/deg2mr.txt"));
        // The output directory must not already exist.
        FileOutputFormat.setOutputPath(job,
                new Path("hdfs://192.168.58.180:8020/mltest/deg2mr/deg2out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
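Once packaged into a jar, the job is submitted in the usual way, for example (the jar file name here is only illustrative):

hadoop jar deg2mr.jar friends.Deg2Main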

3.4 Logs
... from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)
INFO  - Job job_local1127799899_0001 completed successfully
DEBUG - PrivilegedAction as:hxsyl (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.getCounters(Job.java:765)
INFO  - Counters: 38
    File System Counters
        FILE: Number of bytes read=740
        FILE: Number of bytes written=509736
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=132
        HDFS: Number of bytes written=206
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=13
        Map output records=26
        Map output bytes=106
        Map output materialized bytes=164
        Input split bytes=116
        Combine input records=0
        Combine output records=0
        Reduce input groups=10
        Reduce shuffle bytes=164
        Reduce input records=26
        Reduce output records=50
        Spilled Records=52
        Shuffled Maps=1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=3
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=456130560
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=66
    File Output Format Counters
        Bytes Written=206
DEBUG - PrivilegedAction as:hxsyl (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)
DEBUG - stopping client from cache: [email protected]
DEBUG - removing client from cache: [email protected]
DEBUG - stopping actual client because no more references remain: [email protected]
DEBUG - Stopping client
DEBUG - IPC Client (521081105) connection to /192.168.58.180:8020 from hxsyl: closed
DEBUG - IPC Client (521081105) connection to /192.168.58.180:8020 from hxsyl: stopped, remaining connections 0

3.5 Output

B	H
H	B
A	C
C	A
B	D
B	F
B	I
D	B
D	F
D	I
F	B
F	D
F	I
I	B
I	D
I	F
C	E
C	F
E	C
E	F
F	C
F	E
D	F
F	D
C	D
C	E
C	G
D	C
D	E
D	G
E	C
E	D
E	G
G	C
G	D
G	E
F	H
F	I
H	F
H	I
I	F
I	H
A	G
A	I
G	A
G	I
I	A
I	G
G	H
H	G

IV. Thinking

4.1 Unidirectional relationships

This is like deriving grandfather-grandson pairs from father-son pairs, or handling a follow relationship that only points one way: in those cases the mapper stage must not write the reciprocal pair, as sketched below.
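A minimal sketch of one way to do this for a directed follow graph, assuming input lines of the form follower,followee and tagging each side so the reducer can join the two directions (the class names and tags are mine, not from the article):

package friends;

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// (Each class would go in its own file in practice.)
class DirectedDeg2Mapper extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] ss = value.toString().split(",");
        // ss[0] follows ss[1]; the reciprocal pair is NOT written.
        // Tag each side so the reducer can tell followers from followees.
        context.write(new Text(ss[1]), new Text("in:" + ss[0]));
        context.write(new Text(ss[0]), new Text("out:" + ss[1]));
    }
}

class DirectedDeg2Reducer extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Set<String> followers = new HashSet<String>();
        Set<String> followees = new HashSet<String>();
        for (Text t : values) {
            String s = t.toString();
            if (s.startsWith("in:")) {
                followers.add(s.substring(3));
            } else {
                followees.add(s.substring(4));
            }
        }
        // Everyone who follows the key person is exactly two steps away
        // from everyone the key person follows.
        for (String a : followers) {
            for (String b : followees) {
                if (!a.equals(b)) {
                    context.write(new Text(a), new Text(b));
                }
            }
        }
    }
}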

4.2 Your most popular two-degree connections

Put simply: among the people you follow, N of them all follow some person XXX; the larger N is, the stronger the recommendation.
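One way to get there is a second MapReduce job over the first job's output, counting how many times each pair occurs: a pair produced via many distinct middle people is a popular two-degree connection. A minimal sketch, assuming the first job's tab-separated pairs as input (the class names are mine):

package friends;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// (Each class would go in its own file in practice.)
class PairCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The whole line, e.g. "A\tG", is one two-degree pair; emit it with count 1.
        context.write(new Text(value.toString()), ONE);
    }
}

class PairCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get(); // how many middle people produced this pair
        }
        context.write(key, new IntWritable(sum));
    }
}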

4.3 Set traversal

When double-iterating over the HashSet, how can the second iterator be made to start just past the first iterator's current position? That would cut the work in half. Copying the set into an array should make it possible, as sketched below.

Then again, the duplicated form has its advantages: if A is in B's two-degree network, then B also shows up in A's two-degree network.
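A minimal sketch of that array idea, as a drop-in replacement for the double iterator inside the reduce method above, if each unordered pair is only needed once (my illustration):

// Copy the set into an array so the inner loop can start just past the
// outer index, visiting each unordered pair exactly once.
String[] names = set.toArray(new String[0]);
for (int i = 0; i < names.length; i++) {
    for (int j = i + 1; j < names.length; j++) {
        context.write(new Text(names[i]), new Text(names[j]));
    }
}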

4.4 In addition

At first I wrote the reducer wrong: set.add(toString.toString()). Surprisingly this raised no error, even though no variable named toString exists. The log then showed that the reducer stage performed no writes at all.

V. References

http://blog.csdn.net/yongh701/article/details/50630498

http://blog.csdn.net/u013926113/article/details/51539306

https://my.oschina.net/BreathL/blog/75112
