PV (page views): count(session_id)
Thread-safety issues under multithreading
I. PV statistics
Scenario Analysis
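Is the following approach feasible?
1. Define a static long pv and guard the increment with synchronized.
No: synchronized and Lock only coordinate threads inside a single JVM. A Storm topology spreads its tasks over several worker processes, each a separate JVM with its own copy of the static field, so this approach breaks down as soon as the topology runs on more than one JVM.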
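The single-JVM half of that answer can be seen in a small sketch. The class below (SyncPvCounter and countWithThreads are hypothetical names, not part of the topology) guards a static counter with synchronized and runs several threads against it; inside one JVM the total always equals what a single thread would count. Nothing here reaches another worker process, since every JVM loads its own copy of the static field.

```java
import java.util.ArrayList;
import java.util.List;

// Single-JVM illustration only (hypothetical class): a static counter guarded by synchronized.
class SyncPvCounter {
    // One copy per loaded class, i.e. per JVM; a worker in another JVM keeps its own copy.
    private static long pv = 0;

    static synchronized void increment() {
        pv++;
    }

    static synchronized long get() {
        return pv;
    }

    // Starts `threads` threads that each record `perThread` page views,
    // waits for all of them, and returns the final count.
    static long countWithThreads(int threads, int perThread) {
        synchronized (SyncPvCounter.class) {
            pv = 0; // reset under the same lock the static synchronized methods use
        }
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    increment();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return get();
    }

    public static void main(String[] args) {
        // Same as the single-threaded result: 4 threads * 10,000 views = 40,000.
        System.out.println(countWithThreads(4, 10_000));
    }
}
```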
Two workable approaches:
1. shuffleGrouping: multiply the single-task PV by the executor parallelism
2. BOLT1 computes local partial sums in parallel; BOLT2 runs in a single thread and computes the global sum
Thread safety here means that the multithreaded result is the same as the result of a single-threaded run.
Comparison of the two approaches:
1. shuffleGrouping: PV (single-task result) * executor parallelism
By default each executor runs one task. If you give an executor more than one task, the formula becomes:
PV (single-task result) * task count
Tasks that run in the same executor share its thread, so they have the same thread ID but different task IDs.
Advantages: simple, very little computation
Disadvantages: a slight error, because shuffle grouping spreads tuples only approximately evenly over the tasks; most scenarios can accept it
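To get a feel for the size of that error, here is a hypothetical simulation without Storm. It assigns tuples to tasks uniformly at random as a stand-in for shuffleGrouping (Storm's real grouping may balance more evenly), then applies the formula above to each task's local count. The class and method names are illustrative assumptions.

```java
import java.util.Random;

// Hypothetical model of approach 1: tuples spread over N tasks, one task's count scaled by N.
class ShufflePvEstimate {
    // Estimated global PV = local (single-task) PV * task count.
    static long estimate(long localPv, int taskCount) {
        return localPv * taskCount;
    }

    // Assigns `total` tuples to `taskCount` tasks uniformly at random
    // (a stand-in for shuffle grouping) and returns the per-task counts.
    static long[] distribute(int total, int taskCount, long seed) {
        Random random = new Random(seed);
        long[] perTask = new long[taskCount];
        for (int i = 0; i < total; i++) {
            perTask[random.nextInt(taskCount)]++;
        }
        return perTask;
    }

    public static void main(String[] args) {
        int total = 10_000;
        int tasks = 4;
        long[] perTask = distribute(total, tasks, 42L);
        for (int t = 0; t < tasks; t++) {
            long est = estimate(perTask[t], tasks);
            // Relative error of each task's extrapolated value against the true total
            System.out.printf("task %d: local=%d estimate=%d error=%.2f%%%n",
                    t, perTask[t], est, 100.0 * Math.abs(est - total) / total);
        }
    }
}
```

Whichever task happens to report, its estimate lands within a few percent of the true total in runs of this size, which is the "slight error" the approach trades for simplicity.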
Optimization:
In this scheme every PvBolt task outputs a summary value, but only a single task actually needs to.
Use a ZooKeeper lock so that only one task outputs the summary value, once every 5 s.
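The decision logic of that lock can be sketched in plain Java. In this sketch an in-memory ConcurrentMap stands in for ZooKeeper: putIfAbsent plays the role of creating the ephemeral node, which only the first caller succeeds at, and a task reports only if the stored value equals its own host:taskId string. SingleReporter, LOCK_PATH and INTERVAL_MS are hypothetical names, and this is not the ZooKeeper API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for the ZooKeeper lock; models only the decision logic.
class SingleReporter {
    static final String LOCK_PATH = "/lock/storm/pv";
    static final long INTERVAL_MS = 5_000;

    private final ConcurrentMap<String, String> fakeZk; // path -> node data
    private final String lockData;                      // "host:taskId", as in the bolt
    private long lastReport;

    SingleReporter(ConcurrentMap<String, String> fakeZk, String lockData, long now) {
        this.fakeZk = fakeZk;
        this.lockData = lockData;
        this.lastReport = now;
        // "Create" the lock node; if it already exists the first owner is kept.
        fakeZk.putIfAbsent(LOCK_PATH, lockData);
    }

    boolean ownsLock() {
        return lockData.equals(fakeZk.get(LOCK_PATH));
    }

    // Every task restarts its 5 s window, but only the lock owner reports.
    boolean shouldReport(long now) {
        if (now - lastReport < INTERVAL_MS) {
            return false;
        }
        lastReport = now;
        return ownsLock();
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> zk = new ConcurrentHashMap<>();
        SingleReporter first = new SingleReporter(zk, "10.0.0.1:3", 0L);
        SingleReporter second = new SingleReporter(zk, "10.0.0.2:4", 0L);
        System.out.println(first.shouldReport(5_000L) + " " + second.shouldReport(5_000L));
    }
}
```

As in the bolt, tasks try to take the lock only once, at startup. If the owning worker dies, ZooKeeper deletes its ephemeral node but no other task claims it, so nobody reports; re-electing would need a watch on the path.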
2. BOLT1 computes local partial sums in parallel; BOLT2 runs in a single thread and computes the global sum
Advantages: (1) the result is exact; (2) with fieldsGrouping you can also obtain intermediate values, such as the PV of a single user (access depth, itself a useful metric)
Disadvantages: slightly more computation, and one extra bolt
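A minimal model of the second approach, again without Storm and with hypothetical names, assuming each BOLT1 task periodically sends its running local count: the single BOLT2 instance keeps only the latest value per task ID and sums them, so re-sent partials are not double-counted and the total is exact. The static helper counts PV per session_id, which one task can do exactly once fieldsGrouping on session_id routes all hits of a session to the same task.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of approach 2: BOLT1 partial sums, single-threaded BOLT2 global sum.
class TwoStagePv {
    // BOLT2 state: latest running count reported by each BOLT1 task.
    private final Map<Integer, Long> partialByTask = new HashMap<>();

    // Called when a BOLT1 task reports its running local count; replaces the old value.
    void onPartial(int taskId, long localPv) {
        partialByTask.put(taskId, localPv);
    }

    // Exact global PV: sum of the latest partial from every task.
    long globalPv() {
        long sum = 0;
        for (long v : partialByTask.values()) {
            sum += v;
        }
        return sum;
    }

    // Per-session PV (access depth) for log lines "host\tsession_id\ttime".
    static Map<String, Long> depthBySession(List<String> logLines) {
        Map<String, Long> depth = new HashMap<>();
        for (String line : logLines) {
            String sessionId = line.split("\t")[1];
            depth.merge(sessionId, 1L, Long::sum);
        }
        return depth;
    }
}
```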
Spout:
package base;

import java.util.Map;
import java.util.Queue;
import java.util.Random;
import java.util.concurrent.ConcurrentLinkedQueue;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichSpout;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

/**
 * Data source spout: emits fake access-log lines "host\tsession_id\ttime".
 */
public class SourceSpout implements IRichSpout {

    private static final long serialVersionUID = 1L;

    private final Queue<String> queue = new ConcurrentLinkedQueue<String>();
    private SpoutOutputCollector collector = null;

    @Override
    public void nextTuple() {
        // poll() returns null once the queue is drained; emit only real log lines
        String line = queue.poll();
        if (line != null) {
            collector.emit(new Values(line));
        }
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        Random random = new Random();
        String[] hosts = { "www.taobao.com" };
        String[] session_id = { "Abyh6y4v4scvxtg6dpb4vh9u123", "xxyh6ycgfjyertt834r52fdxv9u34",
                "Bbyh61456fghhj7jl89rg5vv9uyu7", "cyyh6y2345ghi899ofg4v9u567",
                "vvvyh6y4v4sfxz56jipdpb4v678" };
        String[] time = { "2014-01-07 08:40:50", "2014-01-07 08:40:51", "2014-01-07 08:40:52",
                "2014-01-07 08:40:53", "2014-01-07 09:40:49", "2014-01-07 10:40:49",
                "2014-01-07 11:40:49", "2014-01-07 12:40:49" };
        // Pre-fill the queue with 100 log lines using a random session ID and timestamp
        for (int i = 0; i < 100; i++) {
            queue.add(hosts[0] + "\t" + session_id[random.nextInt(5)] + "\t" + time[random.nextInt(8)]);
        }
    }

    @Override
    public void close() {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("log"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }

    // ack/fail are only invoked for tuples emitted with a message ID; this spout emits none
    @Override
    public void ack(Object msgId) {
        System.out.println("spout ack: " + msgId.toString());
    }

    @Override
    public void activate() {
    }

    @Override
    public void deactivate() {
    }

    @Override
    public void fail(Object msgId) {
        System.out.println("spout fail: " + msgId.toString());
    }
}
Bolt:
package com.storm.visits;

import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Tuple;

/**
 * shuffleGrouping: global PV is estimated as PV (single-task result) * task count.
 * An executor runs one task by default; tasks in the same executor share its
 * thread ID but have different task IDs. A ZooKeeper lock ensures that only one
 * task outputs the summary value, once every 5 s.
 */
public class PvBolt implements IRichBolt {

    private static final long serialVersionUID = 1L;
    // The parent nodes (/lock/storm) must already exist: create() does not make them
    public static final String ZK_PATH = "/lock/storm/pv";

    private OutputCollector collector;
    private ZooKeeper keeper = null;
    private String lockData = null;
    private int taskCount = 1;

    private long pv = 0;
    private long beginTime = System.currentTimeMillis();

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Number of tasks of this bolt, used instead of a hard-coded multiplier
        taskCount = context.getComponentTasks(context.getThisComponentId()).size();
        try {
            keeper = new ZooKeeper("hadoop:2181", 3000, new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    System.err.println("event: " + event.getType());
                }
            });
            // Wait until the ZooKeeper session is connected
            while (keeper.getState() != ZooKeeper.States.CONNECTED) {
                Thread.sleep(1000);
            }
            InetAddress address = InetAddress.getLocalHost();
            lockData = address.getHostAddress() + ":" + context.getThisTaskId();
            try {
                // The first task to create the ephemeral node holds the lock
                keeper.create(ZK_PATH, lockData.getBytes(StandardCharsets.UTF_8),
                        Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            } catch (KeeperException.NodeExistsException e) {
                // Another task created the node first; this task will not report
            }
        } catch (Exception e) {
            e.printStackTrace();
            closeKeeper();
        }
    }

    @Override
    public void execute(Tuple input) {
        try {
            String logString = input.getString(0);
            long endTime = System.currentTimeMillis();
            if (logString != null) {
                // Log format: host \t session_id \t time
                String[] fields = logString.split("\t");
                if (fields.length > 1 && !fields[1].isEmpty()) {
                    pv++;
                }
            }
            // Every 5 s, only the task whose lockData is stored in the node prints the summary
            if (endTime - beginTime >= 5 * 1000) {
                byte[] owner = keeper.getData(ZK_PATH, false, null);
                // getData returns bytes: decode them before comparing with the String lockData
                if (lockData.equals(new String(owner, StandardCharsets.UTF_8))) {
                    // shuffleGrouping: global PV is estimated as local PV * task count
                    System.err.println(Thread.currentThread().getName() + " pv = " + pv * taskCount);
                }
                beginTime = System.currentTimeMillis();
            }
            collector.ack(input);
        } catch (Exception e) {
            collector.fail(input);
            e.printStackTrace();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt only prints; it emits no tuples downstream
    }

    @Override
    public void cleanup() {
        closeKeeper();
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }

    private void closeKeeper() {
        if (keeper != null) {
            try {
                keeper.close();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
Topology:
package com.storm.visits;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import base.SourceSpout;

/**
 * Builds and submits the PV topology.
 */
public class PvTopo {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // A single spout task, so each queued log line is emitted only once
        builder.setSpout("spout", new SourceSpout(), 1);
        // Four PvBolt executors fed by shuffle grouping
        builder.setBolt("bolt", new PvBolt(), 4).shuffleGrouping("spout");

        // Topology configuration
        Config conf = new Config();
        if (args.length > 0) {
            // Distributed mode: submit to the cluster under the name given on the command line
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // Local mode
            LocalCluster localCluster = new LocalCluster();
            localCluster.submitTopology("mytopology", conf, builder.createTopology());
        }
    }
}
Kill the topology:
storm kill PvTopo
Submit the topology:
storm jar ./starter.jar com.storm.visits.PvTopo PvTopo
Storm website PV statistics: using a ZooKeeper lock to control thread operations