eBay Open-Sources Pulsar: A Real-Time Big Data Analytics Platform

Wangchinglang Wang Ming Wang

As a leader in global commerce and payments, eBay holds a huge amount of user behavior data. Its existing Hadoop-based big data processing could no longer meet the real-time needs of the business. Drawing on its past experience with big data processing and on the latest technology, eBay built a platform for real-time collection, processing, distribution, and analysis of massive data streams, and open-sourced that platform, Pulsar, at the end of February 2015.

Pulsar is a complex event processing platform that is fast, accurate, and flexible. It provides end-to-end low latency and high reliability, which meets eBay's need for real-time data analysis at second-level latency. It can also process millions of events per second, which helps give customers a better personalized experience, lets them monitor business information and tailor marketing strategies in real time, and supports timely detection of online fraud and reduction of bot interference. Pulsar is standards-based and is deployed as a distributed cloud architecture that spans multiple data centers, so the cluster stays up during system upgrades and topology updates.

The Pulsar platform provides a complete solution for real-time big data analytics:

The platform collects event streams in real time, enriches the events in real time, and pushes them to different real-time applications, while also performing statistics and analysis in real time to give the business key insights.
Inside the Pulsar platform, an event stream is treated as a kind of database table, and processing on it is defined with a declarative 4GL. The new big data stream processing framework that supports this was open-sourced at the same time.
pulsar.stream is a new, general-purpose processing framework for big data streams. It implements an open, auto-discovered topology: applications can be distributed across different data centers, they automatically discover each other and establish connections over the network, and data is actively pushed from producer to subscriber. The pipeline logic is written in a 4GL, EPL. The topology is open and dynamically extensible, and the corresponding EPL can likewise be updated dynamically without service interruption.
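
The push model described above, in which producers push events to whichever subscribers are currently connected, can be illustrated with a minimal in-process sketch. This is not Pulsar's implementation: the `Broker` class and its `subscribe` and `publish` methods are hypothetical names that stand in for the network discovery layer, and only the topic name TOPIC1 is borrowed from the EPL sample.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class Broker:
    """Hypothetical in-process stand-in for the network discovery layer."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A subscriber announces itself; the producer needs no prior knowledge of it.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> int:
        # Push the event to every current subscriber and report how many received it.
        handlers = list(self._subscribers[topic])
        for handler in handlers:
            handler(event)
        return len(handlers)


broker = Broker()
received_a: List[Event] = []
received_b: List[Event] = []

broker.subscribe("TOPIC1", received_a.append)
broker.publish("TOPIC1", {"D1": 2045573})

# A second subscriber joins later; the producer is neither restarted nor reconfigured.
broker.subscribe("TOPIC1", received_b.append)
broker.publish("TOPIC1", {"D1": 42})
```

The second subscriber starts receiving events as soon as it registers, which is the property the article calls a dynamically extensible topology.
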
[Figure: a typical deployment structure]
EPL Sample:

Event Filtering and routing

insert into Substream select D1, D2, D3, D4
from rawstream where D1 = 2045573 or D2 = 2047936 or D3 = 2051457 or D4 = 2053742; // filtering
@PublishOn(topics = "TOPIC1") // publish sub stream at TOPIC1
@OutputTo("Outboundmessagechannel")
@ClusterAffinityTag(column = D1) // partition key based on column D1
select * from Substream;
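
To make the semantics of this sample concrete, here is a rough Python sketch of the same two steps: filtering the raw stream on the four dimension values, then choosing a partition from D1. The partition rule (a CRC32 hash of D1 modulo the number of consumers) and the names `filter_events` and `partition_for` are assumptions made for illustration. The article does not say how Pulsar maps an affinity key to a consumer.

```python
import zlib
from typing import Dict, Iterable, Iterator, Optional

Event = Dict[str, Optional[int]]

# Values copied from the where clause of the EPL sample.
WANTED = {"D1": 2045573, "D2": 2047936, "D3": 2051457, "D4": 2053742}


def filter_events(raw: Iterable[Event]) -> Iterator[Event]:
    """Keep events where any dimension matches its wanted value; project D1..D4 only."""
    for event in raw:
        if any(event.get(key) == value for key, value in WANTED.items()):
            yield {key: event.get(key) for key in WANTED}


def partition_for(event: Event, num_consumers: int) -> int:
    """Hypothetical affinity rule: a stable hash of D1 modulo the consumer count."""
    key = str(event["D1"]).encode("utf-8")
    return zlib.crc32(key) % num_consumers


raw_stream = [
    {"D1": 2045573, "D2": 1, "D3": 1, "D4": 1, "extra": 9},
    {"D1": 7, "D2": 7, "D3": 7, "D4": 7},
    {"D1": 2045573, "D2": 2047936, "D3": 0, "D4": 0},
]
sub_stream = list(filter_events(raw_stream))
partitions = [partition_for(event, 4) for event in sub_stream]
```

Because both surviving events share the same D1, they map to the same partition, which is the point of an affinity key: related events reach the same consumer.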

Aggregate computation

// create 10-second time window context
create context Mccontext start @now end pattern [timer:interval(10)];
// aggregate event count along dimensions D1 and D2 within specified time window
context Mccontext insert into AGGREGATE select count(*) as METRIC1, D1, D2 from Rawstream group by D1, D2 output snapshot when terminated;
select * from AGGREGATE;
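
The effect of this statement, one snapshot of per-group counts each time a 10-second window ends, can be approximated in plain Python. The sketch below makes assumptions the EPL does not: each event carries an explicit timestamp in seconds, events arrive in timestamp order, and the first window starts at the first event rather than when the context is created. The name `windowed_counts` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Each event is (timestamp_seconds, D1, D2); the timestamp field is an assumption.
TimedEvent = Tuple[float, int, int]
Snapshot = Dict[Tuple[int, int], int]


def windowed_counts(events: Iterable[TimedEvent], window: float = 10.0) -> List[Snapshot]:
    """Count events per (D1, D2) in consecutive fixed windows, one snapshot per window."""
    snapshots: List[Snapshot] = []
    counts: Counter = Counter()
    window_end: Optional[float] = None
    for ts, d1, d2 in events:  # assumed to be sorted by timestamp
        if window_end is None:
            window_end = ts + window
        while ts >= window_end:
            # The window has terminated: emit its snapshot and start a new one.
            snapshots.append(dict(counts))
            counts = Counter()
            window_end += window
        counts[(d1, d2)] += 1
    if window_end is not None:
        snapshots.append(dict(counts))  # flush the final, still-open window
    return snapshots


events = [(0.0, 1, 1), (3.0, 1, 1), (9.5, 1, 2), (12.0, 1, 1)]
result = windowed_counts(events)
```

Here the first three events fall in the first window and the fourth in the second, so `result` holds two snapshots.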

TopN computation

// create 60-second time window context
create context Mccontext start @now end pattern [timer:interval(60)];
// sort to find top ten event counts along dimensions D1, D2, and D3
// within specified time window
context Mccontext insert into Topitems select count(*) as TotalCount, D1, D2, D3 from Raweventstream group by D1, D2, D3 order by count(*) desc limit 10;
select * from Topitems;
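
For comparison, the grouping, ordering, and limit steps of a TopN query can be sketched in Python for the events of a single window. This illustrates the computation only, not how Pulsar evaluates it. The function name `top_n` and the sample data are made up for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Dims = Tuple[int, int, int]  # (D1, D2, D3)


def top_n(events: Iterable[Dims], n: int = 10) -> List[Tuple[Dims, int]]:
    """Return the n most frequent (D1, D2, D3) combinations, highest count first."""
    return Counter(events).most_common(n)


# Events observed within one window (made-up data).
window_events = [(1, 1, 1), (2, 2, 2), (1, 1, 1), (3, 3, 3), (2, 2, 2), (1, 1, 1)]
top_two = top_n(window_events, n=2)
```

`Counter.most_common` handles both the descending sort and the limit, so `top_two` contains the two most frequent combinations together with their counts.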

For more information, see

http://www.ebaytechblog.com/2015/02/23/announcing-pulsar-real-time-analytics-at-scale

Related events:

1. Pulsar at QCon Shanghai 2014:

http://www.infoq.com/cn/presentations/ebay-user-behavior-data-stream-processing-system

2. http://www.milibo.com/talent/events.aspx?id=34
