I. Background of the problem
Consider a real business case from a mobile carrier. A user from Henan goes online while in Beijing, so the record of that session is stored at a Beijing base station. If we later query the Internet logs for the Beijing area, the result by default also contains users from other regions who went online there. The only alternative is to scan the whole log to pick out the Beijing users, which is very slow. So partitioning is very much needed.
II. Data set analysis
1363157985066 13726230503 00-fd-07-a4-72-b8:cmcc 120.196.100.82 i02.c.aliimg.com 24 27 2481 24681 200
1363157995052 13826544101 5c-0e-8b-c7-f1-e0:cmcc 120.197.40.4 4 0 264 0 200
1363157991076 13926435656 20-10-7a-28-cc-0a:CMCC 120.196.100.99 2 4 132 1512 200
1363154400022 13926251106 5c-0e-8b-8b-b1-50:cmcc 120.197.40.4 4 0 240 0 200
1363157993044 18211575961 94-71-ac-cd-e6-18:cmcc-easy 120.196.100.99 iface.qiyi.com Video Site 15 12 1527 2106 200
1363157995074 84138413 5c-0e-8b-8c-e8-20:7daysinn 120.197.40.4 122.72.52.12 20 16 4116 1432 200
1363157993055 13560439658 c4-17-fe-ba-de-d9:cmcc 120.196.100.99 18 15 1116 954 200
1363157995033 15920133257 5c-0e-8b-c7-ba-20:cmcc 120.197.40.4 sug.so.360.cn Information Security 20 20 3156 2936 200
1363157983019 13719199419 68-a1-b7-03-07-b1:cmcc-easy 120.196.100.82 4 0 240 0 200
1363157984041 13660577991 5c-0e-8b-92-5c-20:cmcc-easy 120.197.40.4 s19.cnzz.com Site Statistics 24 9 6960 690 200
1363157973098 15013685858 5c-0e-8b-c7-f7-90:Cmcc 120.197.40.4 rank.ie.sogou.com search engine 28 27 3659 3538 200
1363157986029 15989002119 e8-99-c4-4e-93-e0:Cmcc-easy 120.196.100.99 www.umeng.com Site Statistics 3 3 1938 180 200
1363157992093 13560439658 c4-17-fe-ba-de-d9:cmcc 120.196.100.99 15 9 918 4938 200
1363157986041 13480253104 5c-0e-8b-c7-fc-80:cmcc-easy 120.197.40.4 3 3 180 180 200
1363157984040 13602846565 5c-0e-8b-8b-b6-00:cmcc 120.197.40.4 2052.flash2-http.qq.com Integrated Portal 15 12 1938 2910 200
1363157995093 13922314466 00-fd-07-a2-ec-ba:cmcc 120.196.100.82 img.qfc.cn 12 12 3008 3720 200
1363157982040 13502468823 5C-0A-5B-6A-0B-D4:cmcc-easy 120.196.100.99 y0.ifengimg.com Integrated Portal 57 102 7335 110349 200
1363157986072 18320173382 84-25-db-4f-10-1a:Cmcc-easy 120.196.100.99 input.shouji.sogou.com search engine 21 18 9531 2412 200
1363157990043 13925057413 00-1f-64-e1-e6-9a:Cmcc 120.196.100.55 t3.baidu.com search engine 69 63 11058 48243 200
1363157988072 13760778710 00-fd-07-a4-7b-08:CMCC 120.196.100.82 2 2 120 120 200
1363157985066 13726238888 00-fd-07-a4-72-b8:cmcc 120.196.100.82 i02.c.aliimg.com 24 27 2481 24681 200
1363157993055 13560436666 C4-17-FE-BA-DE-D9:CMCC 120.196.100.99 18 15 1116 954 200
Look at the phone number column. The first three digits divide the numbers into China Mobile, China Unicom and China Telecom, but there is also a number beginning with 84, which we agree to treat as overseas. So we need 4 reducers in total, and the partitioner has to divide the numbers into four categories.
Each reducer produces one result file.
The job can no longer be run locally: in that case there is only ever one map and one reducer, regardless of the settings.
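As a rough illustration of reading the phone number column, the sketch below counts numbers by their first three digits. The names PrefixCountDemo and countByPrefix are hypothetical, and the input is only a handful of numbers copied from the sample records, not the full log.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixCountDemo {
    // Count how many phone numbers share each leading three-digit prefix.
    static Map<String, Integer> countByPrefix(List<String> phoneNumbers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String number : phoneNumbers) {
            // Numbers shorter than three characters are kept whole (an assumption of this sketch).
            String prefix = number.length() >= 3 ? number.substring(0, 3) : number;
            counts.merge(prefix, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few phone numbers copied from the sample records.
        List<String> sample = List.of(
                "13726230503", "13826544101", "13926435656",
                "18211575961", "84138413", "13560439658", "15920133257");
        System.out.println(countByPrefix(sample));
    }
}
```

Listing the prefixes this way makes it easy to spot the odd ones, such as 841, before deciding how many categories the partitioner should handle.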
III. Theory preparation
3.1 Abstract classes and interfaces
We all know that in the object-oriented world everything is an object, and every object is described by a class, but not every class is meant to describe objects. If a class does not carry enough information to describe a concrete object and needs other concrete classes to complete it, it is called an abstract class. Take new Animal(): we know this is an Animal object, but we have no idea what this animal actually looks like, because Animal is not a concrete kind of animal. So it is an abstract class, and it takes a concrete animal such as a dog or a cat to describe it specifically before we know what it looks like.
The difference between an abstract class and a normal class is that an abstract class forces its subclasses to override the parent class's abstract methods.
public abstract class Animal {
    public abstract void cry();
}

public class Cat extends Animal {
    @Override
    public void cry() {
        System.out.println("Cat: Meow ...");
    }
}

public class Dog extends Animal {
    @Override
    public void cry() {
        System.out.println("Dog: Bark ...");
    }
}

public class Test {
    public static void main(String[] args) {
        Animal a1 = new Cat();
        Animal a2 = new Dog();
        a1.cry();
        a2.cry();
    }
}
--------------------------------------------------------------------
Output:
Cat: Meow ...
Dog: Bark ...
In fact, an abstract class is a specification. For example, every printer must have a print function, but whether it prints in color or in black and white is left to the specific printer to implement. In the same way, Animal forces every subclass to implement the cry method, whereas an ordinary class imposes no such requirement. This is my own understanding and may be wrong.
The level of abstraction differs. An abstract class is an abstraction of a class, and an interface is an abstraction of behavior. An abstract class abstracts the class as a whole, including its properties and behaviors, whereas an interface abstracts only part of a class (its behavior).
An abstract class spans classes that have similar characteristics, whereas an interface can span classes from unrelated domains. An abstract class comes from discovering what the subclasses have in common and generalizing it into a parent class, which the subclasses then inherit. An interface is different: the classes that implement it need not be related or have anything else in common. For example, cats and dogs can be abstracted into an abstract Animal class with a cry method. Birds and airplanes can both implement a Fly interface, because both can fly, but we can hardly give birds and airplanes a common parent class! So an abstract class embodies an inheritance relationship, and for that relationship to be reasonable there must be an "is-a" relationship between the parent class and the derived class, that is, the two should be the same kind of thing in nature. An interface makes no such demand: the implementer does not have to be conceptually the same thing as the interface, it only has to fulfil the contract the interface defines.
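The bird and airplane example can be sketched as follows. All names here (Flyable, Bird, Airplane, FlyDemo) are hypothetical and chosen only for illustration: Bird is an Animal and can also fly, while Airplane shares nothing with Animal except the Flyable contract.

```java
class FlyDemo {
    public static void main(String[] args) {
        // Unrelated classes can be handled through the one behavior they share.
        Flyable[] flyers = { new Bird(), new Airplane() };
        for (Flyable flyer : flyers) {
            System.out.println(flyer.fly());
        }
    }
}

// The interface abstracts a single behavior.
interface Flyable {
    String fly();
}

// The abstract class abstracts a whole family of related classes.
abstract class Animal {
    abstract String cry();
}

// A bird "is an" Animal, and it additionally fulfils the Flyable contract.
class Bird extends Animal implements Flyable {
    @Override
    String cry() {
        return "Bird: Tweet ...";
    }

    @Override
    public String fly() {
        return "Bird flies by flapping its wings";
    }
}

// An airplane is not an Animal; it only implements the behavior.
class Airplane implements Flyable {
    @Override
    public String fly() {
        return "Airplane flies with engines";
    }
}
```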
Java itself does not support multiple inheritance of classes; implementing multiple interfaces serves a similar purpose.
3.2 Static blocks and singletons
A static block executes when the class is loaded, before any instance is initialized, so it can do some initialization before any method is called.
A singleton is a way of obtaining an object that guarantees the class has only one instance.
In real development you almost never write a singleton by hand, because Spring provides a singleton implementation. Static blocks can be used when testing, and loading system configuration files can also be done in a static block.
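A minimal sketch of both ideas together, assuming a made-up AppConfig class whose settings stand in for a configuration file (the keys and values are invented for illustration). Static initializers run in textual order, so the map exists before the static block fills it, and the single instance is created last.

```java
import java.util.HashMap;
import java.util.Map;

class StaticBlockDemo {
    public static void main(String[] args) {
        AppConfig first = AppConfig.getInstance();
        AppConfig second = AppConfig.getInstance();
        System.out.println("same instance: " + (first == second));
        System.out.println("reducers = " + first.get("reducers"));
    }
}

class AppConfig {
    // Initialized first, because static initializers run top to bottom.
    private static final Map<String, String> SETTINGS = new HashMap<>();

    static {
        // Stand-in for reading a configuration file; the values are made up.
        SETTINGS.put("reducers", "4");
        SETTINGS.put("mode", "cluster");
    }

    // Created after SETTINGS has been filled, again because of textual order.
    private static final AppConfig INSTANCE = new AppConfig();

    // A private constructor prevents any other instance from being created.
    private AppConfig() {
    }

    static AppConfig getInstance() {
        return INSTANCE;
    }

    String get(String key) {
        return SETTINGS.get(key);
    }
}
```

This is the eager style of singleton, which is one of several ways to write it by hand.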
IV. Code implementation
The partitioner runs on the result of map, before reduce has been executed, so its type parameters are the output types of map.

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class DataCountPartitioner extends Partitioner<Text, DataBean> {
    // Reading the database on every call would be bad. A cache or a singleton
    // would also work, but for simplicity the table is built in a static block.
    private static Map<String, Integer> dataCountMap = new HashMap<String, Integer>();

    static {
        // Static initializers execute from top to bottom, so dataCountMap above
        // is created first; otherwise this block could not put anything into it.
        dataCountMap.put("135", 1);
        dataCountMap.put("136", 1);
        dataCountMap.put("137", 1);
        dataCountMap.put("138", 1);
        dataCountMap.put("139", 1);
        // The prefix of this entry is unreadable in the source; "150" is inferred
        // from the results in section 5.2.
        dataCountMap.put("150", 2);
        dataCountMap.put("159", 2);
        dataCountMap.put("182", 2);
        dataCountMap.put("183", 2);
    }

    // The returned int is the area code, that is, the partition number.
    // numPartitions: equal to the number of reducers.
    @Override
    public int getPartition(Text key, DataBean value, int numPartitions) {
        String telNo = key.toString();
        // Take the first 3 characters, starting from index 0.
        String subTelNo = telNo.substring(0, 3);
        Integer code = dataCountMap.get(subTelNo);
        // Prefixes not in the table (186, 843, etc.) default to overseas.
        if (null == code) {
            code = 0;
        }
        return code;
    }
}
V. Results analysis
5.1 _SUCCESS
This file holds no data; MapReduce writes it automatically. But if your program chains more than one MapReduce job, there will be intermediate results, and _SUCCESS can then be used to decide whether a step still needs to run. When reprocessing data, if a step's output already contains _SUCCESS, that step's MapReduce job does not need to run again and the program can go straight to the following steps.
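The check described above might look like the sketch below. It assumes the step's output directory is on the local filesystem so that plain java.nio can be used; for output on HDFS the same check would go through Hadoop's FileSystem API instead. The names SuccessMarkerDemo and alreadySucceeded are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SuccessMarkerDemo {
    // True when the output directory of a step contains the _SUCCESS marker file.
    static boolean alreadySucceeded(Path outputDir) {
        return Files.isRegularFile(outputDir.resolve("_SUCCESS"));
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the output directory of step 1.
        Path stepOneOutput = Files.createTempDirectory("step1-output");
        System.out.println("skip step 1? " + alreadySucceeded(stepOneOutput));

        // Simulate a finished job by creating the marker file.
        Files.createFile(stepOneOutput.resolve("_SUCCESS"));
        System.out.println("skip step 1? " + alreadySucceeded(stepOneOutput));
    }
}
```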
5.2 Results
Looking at the results, file 0 holds the numbers starting with 134 and 841, as expected; files 1 and 2 hold the numbers whose prefixes are mapped to codes 1 and 2; file 3 is empty. Why? Because the partitioner only defines 3 classes while the number of reducers is 4, so one reducer has no data sent to it and its file is empty.
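The empty file can be reproduced without a cluster. The sketch below is plain Java, not Hadoop code: it applies a prefix table like the partitioner's to a few numbers from the sample and spreads them over 4 buckets standing in for the 4 reducers. The class and method names are hypothetical, and the "150" entry is an assumption inferred from file 0 containing only 134 and 841.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionSimulation {
    private static final Map<String, Integer> CODES = new HashMap<>();

    static {
        for (String prefix : new String[] {"135", "136", "137", "138", "139"}) {
            CODES.put(prefix, 1);
        }
        // "150" is an assumption of this sketch; the other prefixes come from the partitioner.
        for (String prefix : new String[] {"150", "159", "182", "183"}) {
            CODES.put(prefix, 2);
        }
    }

    // Unknown prefixes fall back to partition 0, as in the partitioner.
    static int partitionFor(String phoneNumber) {
        String prefix = phoneNumber.length() >= 3 ? phoneNumber.substring(0, 3) : phoneNumber;
        return CODES.getOrDefault(prefix, 0);
    }

    // One bucket per reducer; each number goes to the bucket its code points at.
    static List<List<String>> distribute(List<String> phoneNumbers, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String number : phoneNumbers) {
            buckets.get(partitionFor(number)).add(number);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "13726230503", "13480253104", "84138413", "15920133257", "18320173382");
        List<List<String>> buckets = distribute(sample, 4);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer " + i + ": " + buckets.get(i));
        }
    }
}
```

Since no code ever maps to 3, the last bucket stays empty however many records are fed in. With fewer than three buckets this sketch would throw an IndexOutOfBoundsException, which says nothing about how Hadoop itself behaves in that case.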
If the number of reducers is smaller than the number of partitions, I found that the job still produced an output directory and reported no error, but the folder was empty.
MapReduce for mobile Internet log analysis (partitioning)