Website PV statistics with Storm: using a ZooKeeper lock to control multi-threaded operations

Source: Internet
Author: User
Tags: ack, stub, zookeeper

PV (page views): count(session_id)

Thread safety issues under multi-threading

First, PV statistics

Scenario Analysis

Is the following approach feasible?

1, Define a static long pv and guard the increment with synchronized.

No. synchronized and Lock only work inside a single JVM. A Storm topology usually runs in several worker JVMs, and neither mechanism can protect a counter across JVMs.
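To make the single-JVM limitation concrete, here is a minimal sketch of the synchronized static counter described above. The class name StaticPvCounter and its methods are illustrative only and are not part of the article's topology. Inside one JVM the count is exact; a second worker JVM would load its own copy of the class and keep a separate pv.

```java
import java.util.ArrayList;
import java.util.List;

class StaticPvCounter {
    // Shared by every thread in THIS JVM only; each worker JVM has its own copy.
    private static long pv = 0;

    static synchronized void increment() { pv++; }
    static synchronized long get() { return pv; }
    static synchronized void reset() { pv = 0; }

    // Starts `threads` threads that each record `perThread` page views, then returns the total.
    static long countWithThreads(int threads, int perThread) {
        reset();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    increment();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return get();
    }

    public static void main(String[] args) {
        System.out.println("PV = " + countWithThreads(4, 1000));
    }
}
```

With 4 threads and 1000 increments each, the total is 4000 because every increment runs under the same class-level monitor. That monitor does not exist outside the JVM, which is why the approach fails once the bolt's executors are spread over several workers.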

Two workable approaches:

1, shuffleGrouping: multiply the PV counted by one task by the executor parallelism.

2, Bolt1 computes local summaries with multiple executors; Bolt2 computes the global summary in a single thread.

Thread safety: the multi-threaded result must be the same as the result a single thread would produce.

The two approaches in detail:

1, shuffleGrouping: PV (single-threaded result) * executor parallelism

An executor runs one task by default. If you configure more than one task per executor, the formula becomes:

PV (single-threaded result) * number of tasks

Tasks under the same executor share the same thread ID but have different task IDs.

Advantages: simple, low computational cost.

Disadvantages: a slight error, because the estimate assumes tuples are spread evenly across tasks; most scenarios can accept it.
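The formula and its slight error can be illustrated with a small sketch. It assumes tuples are dealt to tasks round-robin, which is only a deterministic stand-in for shuffle grouping (Storm distributes tuples randomly but evenly). The class PvEstimate and its methods are hypothetical names used for illustration.

```java
class PvEstimate {
    // Number of tuples that land on task `taskIndex` when `total` tuples
    // are dealt round-robin over `taskCount` tasks.
    static long localCount(long total, int taskCount, int taskIndex) {
        long base = total / taskCount;
        return taskIndex < total % taskCount ? base + 1 : base;
    }

    // The article's formula: PV (single-task result) * number of tasks.
    static long estimate(long localPv, int taskCount) {
        return localPv * taskCount;
    }

    public static void main(String[] args) {
        int tasks = 4;
        for (long total : new long[] { 100, 101 }) {
            for (int task = 0; task < tasks; task++) {
                long local = localCount(total, tasks, task);
                System.out.println("true PV=" + total + " task=" + task
                        + " local=" + local + " estimate=" + estimate(local, tasks));
            }
        }
    }
}
```

When the true PV is 100, every task sees 25 tuples and the estimate is exact. When it is 101, the task that received the extra tuple estimates 104 while the others estimate 100, which is the kind of small deviation the approach accepts.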

Optimization:

In this approach every PvBolt task outputs a summary value, although only one task actually needs to output it.

Use a ZooKeeper lock so that only one task outputs the summary value, once every 5 s.
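The gating rule just described (report every 5 s, and only from the task that holds the lock) can be isolated from Storm and ZooKeeper in a small sketch. The class SummaryGate is hypothetical; the lockHolder argument stands in for the data read from the ZooKeeper lock node, and timestamps are passed in explicitly so the logic can be exercised without waiting.

```java
class SummaryGate {
    private final String myId;          // e.g. "host-ip:taskId" of this task
    private final long intervalMillis;  // reporting interval
    private long windowStart;           // start of the current window

    SummaryGate(String myId, long intervalMillis, long startMillis) {
        this.myId = myId;
        this.intervalMillis = intervalMillis;
        this.windowStart = startMillis;
    }

    // Returns true only when the interval has elapsed AND this task is the lock holder.
    // The window restarts whenever the interval has elapsed, for holders and non-holders alike.
    boolean shouldReport(long nowMillis, String lockHolder) {
        if (nowMillis - windowStart < intervalMillis) {
            return false;
        }
        windowStart = nowMillis;
        return myId.equals(lockHolder);
    }

    public static void main(String[] args) {
        String holder = "10.0.0.1:3"; // made-up identity stored in the lock node
        SummaryGate owner = new SummaryGate("10.0.0.1:3", 5000, 0);
        SummaryGate other = new SummaryGate("10.0.0.1:4", 5000, 0);
        System.out.println(owner.shouldReport(4999, holder)); // false: interval not elapsed
        System.out.println(owner.shouldReport(5000, holder)); // true: elapsed and lock holder
        System.out.println(other.shouldReport(5000, holder)); // false: elapsed but not the holder
    }
}
```

Keeping the time check before the lock check means most calls return without consulting the lock at all, so a real bolt would read ZooKeeper at most once per interval rather than once per tuple.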

2, Bolt1 computes local summaries with multiple executors; Bolt2 computes the global summary in a single thread.

Advantages: 1, the result is exact; 2, with fieldsGrouping you also get intermediate values, such as the PV of a single user (visit depth, which is itself a useful indicator).

Disadvantages: slightly more computation, and one more bolt.
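The two-stage idea can be sketched without Storm. In the sketch below, routing by the hash of session_id is an assumed stand-in for fieldsGrouping, each map plays the role of one Bolt1 task, and globalPv plays the role of the single-threaded Bolt2. The class TwoStagePv and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoStagePv {
    // Stage 1 (Bolt1): each session_id is always routed to the same local counter,
    // so every local map holds complete per-session PV values.
    static List<Map<String, Long>> localSummaries(List<String> sessionIds, int parallelism) {
        List<Map<String, Long>> locals = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            locals.add(new HashMap<>());
        }
        for (String id : sessionIds) {
            int slot = Math.floorMod(id.hashCode(), parallelism);
            locals.get(slot).merge(id, 1L, Long::sum);
        }
        return locals;
    }

    // Stage 2 (Bolt2): one thread adds up all local summaries into the global PV.
    static long globalPv(List<Map<String, Long>> locals) {
        long total = 0;
        for (Map<String, Long> local : locals) {
            for (long count : local.values()) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> sessions = Arrays.asList("a", "b", "a", "c", "a");
        List<Map<String, Long>> locals = localSummaries(sessions, 4);
        System.out.println("per-session PV by task: " + locals);
        System.out.println("global PV = " + globalPv(locals));
    }
}
```

Because the global total is a plain sum of exact local counts, it does not depend on how evenly tuples are distributed, and the per-session values in the local maps are the intermediate results mentioned above.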

Spout:

    package base;

    import java.util.Map;
    import java.util.Queue;
    import java.util.Random;
    import java.util.concurrent.ConcurrentLinkedQueue;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.IRichSpout;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    /**
     * Data source spout: emits 100 generated log lines of the form
     * host \t session_id \t time.
     */
    public class SourceSpout implements IRichSpout {

        private static final long serialVersionUID = 1L;

        Queue<String> queue = new ConcurrentLinkedQueue<String>();
        SpoutOutputCollector collector = null;

        @Override
        public void nextTuple() {
            // poll() returns null on an empty queue, so emit only when a line is available.
            String line = queue.poll();
            if (line != null) {
                collector.emit(new Values(line));
            }
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            try {
                this.collector = collector;
                Random random = new Random();
                String[] hosts = { "www.taobao.com" };
                String[] session_id = { "Abyh6y4v4scvxtg6dpb4vh9u123", "xxyh6ycgfjyertt834r52fdxv9u34",
                        "Bbyh61456fghhj7jl89rg5vv9uyu7", "cyyh6y2345ghi899ofg4v9u567",
                        "vvvyh6y4v4sfxz56jipdpb4v678" };
                String[] time = { "2014-01-07 08:40:50", "2014-01-07 08:40:51", "2014-01-07 08:40:52",
                        "2014-01-07 08:40:53", "2014-01-07 09:40:49", "2014-01-07 10:40:49",
                        "2014-01-07 11:40:49", "2014-01-07 12:40:49" };
                // Pre-fill the queue with tab-separated log lines.
                for (int i = 0; i < 100; i++) {
                    queue.add(hosts[0] + "\t" + session_id[random.nextInt(5)] + "\t" + time[random.nextInt(8)]);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        @Override
        public void close() {
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("log"));
        }

        @Override
        public Map<String, Object> getComponentConfiguration() {
            return null;
        }

        // ack/fail are only called for tuples emitted with a message ID;
        // nextTuple() above emits without one.
        @Override
        public void ack(Object msgId) {
            System.out.println("spout ack:" + msgId.toString());
        }

        @Override
        public void activate() {
        }

        @Override
        public void deactivate() {
        }

        @Override
        public void fail(Object msgId) {
            System.out.println("spout fail:" + msgId.toString());
        }
    }

Bolt:

    package com.storm.visits;

    import java.net.InetAddress;
    import java.util.Map;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.ZooKeeper;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.IRichBolt;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.tuple.Tuple;

    /**
     * shuffleGrouping: PV (single-threaded result) * executor parallelism.
     * An executor runs one task by default; with more tasks per executor the
     * formula is PV (single-threaded result) * number of tasks. Tasks under the
     * same executor share a thread ID but have different task IDs.
     * A ZooKeeper lock ensures that only one task outputs the summary value,
     * once every 5 s.
     */
    public class PvBolt implements IRichBolt {

        private static final long serialVersionUID = 1L;
        private OutputCollector collector;

        public static final String ZK_PATH = "/lock/storm/pv";
        ZooKeeper keeper = null;
        String lockData = null;

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            try {
                keeper = new ZooKeeper("hadoop:2181", 3000, new Watcher() {
                    @Override
                    public void process(WatchedEvent event) {
                        System.err.println("event:" + event.getType());
                    }
                });
                // Wait until the ZooKeeper session is connected.
                while (keeper.getState() != ZooKeeper.States.CONNECTED) {
                    Thread.sleep(1000);
                }
                InetAddress address = InetAddress.getLocalHost();
                lockData = address.getHostAddress() + ":" + context.getThisTaskId();
                // The first task to create the ephemeral node owns the lock.
                // The parent path /lock/storm must already exist.
                if (keeper.exists(ZK_PATH, false) == null) {
                    try {
                        keeper.create(ZK_PATH, lockData.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                    } catch (KeeperException.NodeExistsException e) {
                        // Another task created the node between exists() and create(); it owns the lock.
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
                try {
                    if (keeper != null) {
                        keeper.close();
                    }
                } catch (InterruptedException e1) {
                    e1.printStackTrace();
                }
            }
        }

        String logString = null;
        String session_id = null;
        long pv = 0;
        long beginTime = System.currentTimeMillis();
        long endTime = 0;

        @Override
        public void execute(Tuple input) {
            try {
                logString = input.getString(0);
                endTime = System.currentTimeMillis();
                if (logString != null) {
                    session_id = logString.split("\t")[1];
                    if (session_id != null) {
                        pv++;
                    }
                }
                // Every 5 s, only the task that holds the ZooKeeper lock prints the summary.
                if (endTime - beginTime >= 5 * 1000) {
                    // getData() returns byte[], so convert it before comparing with lockData.
                    if (lockData.equals(new String(keeper.getData(ZK_PATH, false, null)))) {
                        // shuffleGrouping: PV * executor parallelism (4, as set in the topology)
                        System.err.println(Thread.currentThread().getName() + " pv = " + pv * 4);
                    }
                    beginTime = System.currentTimeMillis();
                }
                collector.ack(input);
            } catch (Exception e) {
                collector.fail(input);
                e.printStackTrace();
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // This bolt only prints the summary and emits no tuples.
        }

        @Override
        public void cleanup() {
        }

        @Override
        public Map<String, Object> getComponentConfiguration() {
            return null;
        }
    }

Topology:

    package com.storm.visits;

    import java.util.HashMap;
    import java.util.Map;

    import backtype.storm.LocalCluster;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    import base.SourceSpout;

    /**
     * Builds and submits the PV topology.
     */
    public class PvTopo {

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // A single spout executor, so tuples from the message queue are not emitted twice.
            builder.setSpout("spout", new SourceSpout(), 1);
            builder.setBolt("bolt", new PvBolt(), 4).shuffleGrouping("spout");

            // Topology configuration
            Map conf = new HashMap();

            if (args.length > 0) {
                // Distributed mode: submit to the cluster under the given name.
                StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
            } else {
                // Local mode
                LocalCluster localCluster = new LocalCluster();
                localCluster.submitTopology("mytopology", conf, builder.createTopology());
            }
        }
    }

  

Kill the job:

storm kill PvTopo

Submit the topology:

storm jar ./starter.jar com.storm.visits.PvTopo PvTopo
