A case study of ZooKeeper-based distributed queue system integration



The Hadoop family series of articles mainly introduces the Hadoop family of products. Commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, ZooKeeper, Avro, Ambari, and Chukwa; newer additions include YARN, HCatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others.

Since 2011, China has entered a surging era of big data, and the family of software represented by Hadoop has taken over a vast territory of big data processing. In the open source community and among vendors, virtually all data software has moved closer to Hadoop. Hadoop has also grown from a niche, elite field into the standard for big data development. On top of Hadoop's original technology, the Hadoop family of products has emerged and keeps innovating around the concept of "big data".

As IT developers, we have to keep up with the pace, seize the opportunity, and rise together with Hadoop.

About the author: Zhang Dan (Conan), programmer (Java, R, PHP, JavaScript). Weibo: @Conan_Z. Blog: http://blog.fens.me. Email: bsspirit@gmail.com

When reprinting, please cite the source:
http://blog.fens.me/hadoop-zookeeper-case/

Preface

Software system integration has always been a hard problem for the industry: integrating legacy systems that are 10 years old, integrating multiple systems after an acquisition, integrating systems worldwide, and so on. Although SOA-based software architectures can in theory address these integration problems, some integration projects are so complex that they fail during implementation.

As technology has developed, better open source software has become available for integrating distributed cluster applications; ZooKeeper, for example, is a good platform for distributed coordination. This article presents a case study that shows what ZooKeeper can do.

Contents
- Project background: distributed message middleware
- Requirements analysis: business system upgrade plan
- Architecture design: building a ZooKeeper distributed coordination platform
- Program development: programming based on ZooKeeper
- Program run

1. Project background: Distributed message middleware

With the popularity of Hadoop, more and more companies are building their own Hadoop systems. Sometimes different departments or teams within one company each have their own Hadoop cluster. This multi-cluster approach lets each team customize its Hadoop setup and keeps any single cluster from becoming too complex. When the data volume is not especially large, small clusters are adequate for many situations.

Of course, multiple small clusters also have a drawback: resources may be wasted. Each team's Hadoop cluster needs its own servers and operations staff. Strong teams can build Hadoop clusters that truly meet their specific requirements, while clusters built by less capable teams tend to perform poorly.

There are also times when several teams must work together to complete a task. For example, team A computes a result on its Hadoop cluster and hands it to team B; B finishes its own part and hands the result to team C, and so on. This resembles the workflow of a business system, where work passes from stage to stage until the last stage is finished.

In business systems, we often use an SOA architecture to solve this kind of problem: each team deploys its own services on an ESB server, and task scheduling is done through message middleware. The same architecture can be used for collaboration among multiple Hadoop clusters, as long as the message middleware engine is replaced by one that supports distributed messaging.

ZooKeeper can serve as distributed message middleware to meet the requirements above. ZooKeeper is a high-performance distributed coordination product in the Hadoop family: an open source, distributed coordination service designed for distributed applications. It is mainly used to solve data management problems that distributed applications frequently run into, reducing the difficulty of coordinating and managing them while providing high-performance distributed services. For installing and using ZooKeeper, please refer to the article "ZooKeeper pseudo-distributed cluster installation and use".

ZooKeeper provides distributed coordination services and does not depend on a Hadoop environment.

2. Requirements analysis: Business system upgrade plan

Below I use a case study to explain how to design a distributed coordination platform.

2.1 Case Introduction

A large software company works in the field of supply chain management. Its main business includes procurement management, accounts payable management, accounts receivable management, supplier management, returns management, sales management, inventory management, e-commerce, system integration, and so on.

The logic of each business area is very complex, and each area is developed and maintained by a separate department. The departments' systems had no need to communicate directly: each department simply implemented its own functions, and data was ultimately shared through the database to exchange data between functions.

As the business grew, customers demanded ever faster responses, and sharing data through the database could no longer meet the requirements for information exchange. The system was upgraded for the first time: an Enterprise Service Bus (ESB) was introduced to manage all of the company's business in a unified way. Services are published as web services, and business functions are scheduled through a message queue.

The company continued to expand and acquired a number of companies in other countries. The business system changed from a centralized deployment in one data center to a distributed deployment across multiple data centers worldwide. At this point, the message queue could no longer meet the functional requirements of a cross-region, multi-data-center business system, so a distributed message middleware solution was needed to replace the original message middleware service.

The system was upgraded a second time, with ZooKeeper used as the scheduling engine of the distributed middleware.

From the description above, we can see how a company grows from small to large and from domestic business to global business. To keep pace with the business, the IT system also becomes more and more complex: from the earliest master-slave database design, to the ESB enterprise service bus, and then to a distributed ESB with a distributed message system. Each upgrade needs the support of software technology.

2.2 Functional Requirements

Global procurement and global sales give the company a competitive edge in the market. However, the procurement and sales software is developed and maintained by different departments, and the business is carried out in different countries and regions, so the workload of month-end settlement is especially heavy.

For example, calculating the profit statement (please do not dwell on the accuracy of the formula):

Monthly profit = monthly sales amount - monthly purchase amount - other monthly expenses

This is a very simple formula, but it is not easy to compute across countries and departments.

From the system's point of view, the procurement department aggregates the purchase data (massive data), the sales department aggregates the sales data (massive data), other departments aggregate the other expenses (a small amount of data), and finally the system calculates the profit for the month.
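The formula can be written down as a one-line helper. This is only a sketch of the arithmetic: the class name, the method name, and the three totals in `main` are hypothetical and are not taken from the experimental data.

```java
final class MonthlyProfit {

    /** Monthly profit = sales - purchases - other expenses (same currency unit). */
    static long profit(long sales, long purchases, long otherExpenses) {
        return sales - purchases - otherExpenses;
    }

    public static void main(String[] args) {
        // Hypothetical totals, chosen only to show the arithmetic.
        long sales = 1000000L;
        long purchases = 600000L;
        long other = 50000L;
        System.out.println(profit(sales, purchases, other)); // prints 350000
    }
}
```

The calculation itself is trivial; the difficulty lies in the fact that each of the three inputs is produced by a different system.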

The point here is that the procurement system is a standalone system, the sales system is a standalone system, and there are dozens of other systems large and small. How can these systems work together to perform this calculation?

3. Architecture design: Building a ZooKeeper distributed coordination platform

Next, we build a ZooKeeper-based distributed queue application to address the functional requirements above. The following leaves out the ESB part and keeps only ZooKeeper for the implementation.
- Purchase data: massive data, stored and analyzed on Hadoop.
- Sales data: massive data, stored and analyzed on Hadoop.
- Other expenses: a small amount of data, stored and analyzed in files or a database.

We design a synchronization queue with 3 condition nodes, corresponding to the 3 parts purchases (purchase), sales (sell), and other expenses (other). When all 3 nodes have been created, the program automatically triggers the profit calculation and creates the profit node (profit). The 3 nodes can be created in any order, and each node can be created only once.
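The rule described above can be modelled without a ZooKeeper cluster. The following is a minimal in-memory sketch, not a ZooKeeper client: `SyncQueueModel`, `create`, and `exists` are hypothetical names, a plain `Set` stands in for the children of /queue, and a `Runnable` stands in for the notification that would start the profit calculation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** In-memory stand-in for the /queue directory; it does not talk to ZooKeeper. */
final class SyncQueueModel {

    private static final Set<String> CONDITIONS =
            new HashSet<String>(Arrays.asList("purchase", "sell", "other"));

    private final Set<String> children = new HashSet<String>();
    private final Runnable onProfit;

    SyncQueueModel(Runnable onProfit) {
        this.onProfit = onProfit;
    }

    /**
     * Models creating /queue/<name>. Returns false if the node already exists,
     * because each condition node can be created only once.
     */
    synchronized boolean create(String name) {
        if (!CONDITIONS.contains(name)) {
            throw new IllegalArgumentException("unknown node: " + name);
        }
        if (!children.add(name)) {
            return false; // duplicate submission is rejected
        }
        if (children.containsAll(CONDITIONS)) {
            children.add("profit"); // all 3 conditions met: create the profit node
            onProfit.run();         // notify whoever computes the profit
        }
        return true;
    }

    synchronized boolean exists(String name) {
        return children.contains(name);
    }

    public static void main(String[] args) {
        SyncQueueModel queue =
                new SyncQueueModel(() -> System.out.println("queue complete"));
        queue.create("sell");   // creation order does not matter
        queue.create("other");
        System.out.println(queue.exists("profit")); // false, purchase is missing
        queue.create("purchase");
        System.out.println(queue.exists("profit")); // true
    }
}
```

In a real deployment the set would be replaced by znodes and the callback by a watcher event, so that applications connected to different servers see the same state.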

System environment:
- 2 separate Hadoop clusters
- 2 separate Java applications
- 3 ZooKeeper cluster nodes

Diagram legend:
- Hadoop App1 and Hadoop App2 are 2 separate Hadoop cluster applications.
- Java App3 and Java App4 are 2 separate Java applications.
- zk1, zk2, and zk3 are the 3 connection points of the ZooKeeper cluster.
- /queue is the znode queue directory; the queue length is assumed to be 3.
- /queue/purchase is entry No. 1 in the znode queue, submitted by Hadoop App1 and used to total the purchase amount.
- /queue/sell is entry No. 2 in the znode queue, submitted by Hadoop App2 and used to total the sales amount.
- /queue/other is entry No. 3 in the znode queue, submitted by Java App3 and used to total the other expenses.
- /queue/profit: when the znode queue is full, creation of the profit node is triggered.
- Once /queue/profit has been created, App4 is started, all ZooKeeper connections notify the synchronization program (red line) that the queue is complete, and all programs end.

Additional notes:
- /queue/purchase, /queue/sell, and /queue/other can be created in any order. After a program submits, the corresponding child node is created under /queue.
- App1 can submit through zk2, and App2 can also submit through zk3. In principle, each application submits through the ZooKeeper node that is closest in routing terms.
- No application can submit twice, and the profit calculation task runs only after all 3 tasks have been submitted.
- After /queue/profit is created, the ZooKeeper application receives the event and notifies the applications that the queue is complete.

For a more detailed discussion of the design of the synchronization queue used here, please refer to the article "ZooKeeper implementation of a distributed queue (Queue)".

4. Program development: Programming based on ZooKeeper

Final functional requirement: calculate the profit for January 2013.

4.1 Experimental environment

In real enterprise development, the experimental environment should match what the requirements describe. My hardware is limited, however, so I use a simplified environment:
- The 3 ZooKeeper server nodes, which would normally be fully distributed, all run on a single server.
- The 2 independent Hadoop clusters are replaced by 2 separate MapReduce jobs on one cluster.

Development environment:
- Win7 64bit
- JDK 1.6
- Maven3
- Juno Service Release 2
- IP: 192.168.1.10

ZooKeeper server environment:
- Linux Ubuntu 12.04 LTS 64bit
- Java 1.6.0_29
- ZooKeeper: 3.4.5
- IP: 192.168.1.201
- 3 cluster nodes

Hadoop server environment:
- Linux Ubuntu 12.04 LTS 64bit
- Java 1.6.0_29
- Hadoop: 1.0.3
- IP: 192.168.1.210

4.2 Experimental data

3 sets of experimental data:
- Purchase data: purchase.csv
- Sales data: sell.csv
- Other expense data: other.csv

4.2.1 Purchase Data Set

4 columns in total: product ID, product quantity, product unit price, purchase date.

1,26,1168,2013-01-08
2,49,779,2013-02-12
3,80,850,2013-02-05
4,69,1585,2013-01-26
5,88,1052,2013-01-13
6,84,2363,2013-01-19
7,64,1410,2013-01-12
8,53,910,2013-01-11
9,21,1661,2013-01-19
10,53,2426,2013-02-18
11,64,2022,2013-01-07
12,36,2941,2013-01-28
13,99,3819,2013-01-19
14,64,2563,2013-02-16
15,91,752,2013-02-05
16,65,750,2013-02-04
17,19,2426,2013-02-23
18,19,724,2013-02-05
19,87,137,2013-01-25
20,86,2939,2013-01-14
21,92,159,2013-01-23
22,81,2331,2013-03-01
23,88,998,2013-01-20
24,38,102,2013-02-22
25,32,4813,2013-01-13
26,36,1671,2013-01-19

//omit partial data
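To make the column layout concrete, here is a plain-Java sketch (no Hadoop involved) that totals quantity times unit price for the rows of one month. The class and method names are hypothetical; the 4 sample rows are the first 4 rows of the data set above.

```java
import java.util.regex.Pattern;

final class PurchaseTotal {

    // Columns are separated by a comma or a tab.
    private static final Pattern DELIMITER = Pattern.compile("[\t,]");

    /** Sums quantity * unit price over rows whose date starts with the given month, e.g. "2013-01". */
    static long totalForMonth(String[] rows, String month) {
        long total = 0;
        for (String row : rows) {
            String[] tokens = DELIMITER.split(row);
            // tokens: [0] product ID, [1] quantity, [2] unit price, [3] date
            if (tokens.length == 4 && tokens[3].startsWith(month)) {
                total += Long.parseLong(tokens[1]) * Long.parseLong(tokens[2]);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String[] rows = {
            "1,26,1168,2013-01-08",
            "2,49,779,2013-02-12",
            "3,80,850,2013-02-05",
            "4,69,1585,2013-01-26"
        };
        // January rows are 1 and 4: 26*1168 + 69*1585 = 30368 + 109365
        System.out.println(totalForMonth(rows, "2013-01")); // prints 139733
    }
}
```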

4.2.2 Sales Data Set

4 columns in total: product ID, sales quantity, sales unit price, sales date.

1,14,1236,2013-01-14
2,19,808,2013-03-06
3,26,886,2013-02-23
4,23,1793,2013-02-09
5,27,1206,2013-01-21
6,27,2648,2013-01-30
7,22,1502,2013-01-19
8,20,1050,2013-01-18
9,13,1778,2013-01-30
10,20,2718,2013-03-14
11,22,2175,2013-01-12
12,16,3284,2013-02-12
13,30,4152,2013-01-30
14,22,2770,2013-03-11
15,28,778,2013-02-23
16,22,874,2013-02-22
17,12,2718,2013-03-22
18,12,747,2013-02-23
19,27,172,2013-02-07
20,27,3282,2013-01-22
21,28,224,2013-02-05
22,26,2613,2013-03-30
23,27,1147,2013-01-31
24,16,141,2013-03-20
25,15,5343,2013-01-21
26,16,1887,2013-01-30
27,12,2535,2013-01-12
28,16,469,2013-01-07
29,29,2395,2013-03-30
30,17,1549,2013-01-30
31,25,4173,2013-03-17

//omit partial data

4.2.3 Other expense data sets

2 columns in total: date of occurrence, amount.

2013-01-02,552
2013-01-03,1092
2013-01-04,1794
2013-01-05,435
2013-01-06,960
2013-01-07,1066
2013-01-08,1354
2013-01-09,880
2013-01-10,1992
2013-01-11,931
2013-01-12,1209
2013-01-13,1491
2013-01-14,804
2013-01-15,480
2013-01-16,1891
2013-01-17,156
2013-01-18,1439
2013-01-19,1018
2013-01-20,1506
2013-01-21,1216
2013-01-22,2045
2013-01-23,400
2013-01-24,1795
2013-01-25,1977
2013-01-26,1002
2013-01-27,226
2013-01-28,1239
2013-01-29,702
2013-01-30,1396

//omit partial data
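Because this data set is small, it can be totalled in an ordinary Java program. The sketch below is one possible way to do that, with hypothetical class and method names; the first 3 lines are taken from the data above, and the February line is a made-up row added only to show the month filter.

```java
import java.util.Arrays;
import java.util.List;

final class OtherExpenses {

    /** Sums the amount column over lines whose date starts with the given month. */
    static long totalForMonth(List<String> lines, String month) {
        long total = 0;
        for (String line : lines) {
            String[] tokens = line.trim().split(",");
            // tokens: [0] date of occurrence, [1] amount
            if (tokens.length == 2 && tokens[0].startsWith(month)) {
                total += Long.parseLong(tokens[1].trim());
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2013-01-02,552",
                "2013-01-03,1092",
                "2013-01-04,1794",
                "2013-02-01,100"); // hypothetical row, not in the data set
        System.out.println(totalForMonth(lines, "2013-01")); // prints 3438
    }
}
```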

4.3 Programming

We need to write 5 files:
- Calculate the purchase amount: Purchase.java
- Calculate the sales amount: Sell.java
- Calculate the other expenses: Other.java
- Calculate the profit: Profit.java
- ZooKeeper scheduling: ZookeeperJob.java

4.3.1 Calculate purchase amount

The purchase amount is computed with a Hadoop-based MapReduce job.

import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class Purchase {

    public static final String HDFS = "hdfs://192.168.1.210:9000";
    public static final Pattern DELIMITER = Pattern.compile("[\t,]");

    public static class PurchaseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private String month = "2013-01";
        private Text k = new Text(month);
        private IntWritable v = new IntWritable();
        private int money = 0;

        public void map(LongWritable key, Text values, Context context) throws IOException, InterruptedException {
            System.out.println(values.toString());
            String[] tokens = DELIMITER.split(values.toString());
            if (tokens[3].startsWith(month)) { // January data
                money = Integer.parseInt(tokens[1]) * Integer.parseInt(tokens[2]); // quantity * unit price
                v.set(money);
                context.write(k, v);
            }
        }
    }

    public static class PurchaseReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable v = new IntWritable();
        private int money = 0;
        // ...
