Wang Zhenping: Architecture and Challenges of a Hadoop-Based Transaction Log Platform
Wang Zhenping, a senior engineer at Shanghai Bao Xin who works in the financial industry, gave an in-depth talk on a Hadoop-based transaction log platform. He covered five aspects: background, requirements and objectives, problems, system architecture, and Hadoop-related knowledge.
Background
Use scenario: investigating delayed credit card payments, failed transactions and the causes and types of those failures, and finding the institutions and merchants that submit non-standard transactions, along with the reasons those transactions arise.
Data characteristics: nearly 300 million transactions per day. As for the state of the data, only the fitted (consolidated) transaction records are currently stored, and the original transaction logs are not available.
Requirements and objectives: second-level queries on transaction logs (results returned within seconds), analysis of failed transactions, analysis of non-standard transactions, self-service analysis for users, and, combined with other data, identification of the causes of failed transactions and production of analysis reports.
Challenges: how to collect the logs with minimal impact on the production system, how to quickly transfer the 300 million+ daily transaction logs into the Hadoop cluster, how to manage a large number of jobs, and how to implement second-level queries.
System construction and architecture
Building a system is a process of finding and solving problems. Starting from the requirements and background above, Wang Zhenping shared how he approached each challenge:
1. Minimizing the impact of data collection: in general, this comes down to choosing the right time and method for the business. In this case, collection runs between 1:00 and 5:00 every morning. The data is stored as binary files on the local disks of multiple machines, so to obtain it quickly, collection clients are mapped one-to-one to business data sources, and each client can be configured to pull data from different business systems at the same time.
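To get a feel for the load, the figures above can be turned into a rough throughput estimate. This is only a back-of-the-envelope sketch: it assumes that one full day of records must be moved entirely within the 1:00 to 5:00 window, which the talk does not state explicitly, and the client count of 10 is a hypothetical number chosen for illustration.

```python
def required_throughput(records_per_day: int, window_start_hour: int, window_end_hour: int) -> float:
    """Average records per second needed to move one day's records within the window."""
    window_seconds = (window_end_hour - window_start_hour) * 3600
    if window_seconds <= 0:
        raise ValueError("window must end after it starts on the same day")
    return records_per_day / window_seconds


def per_client_throughput(total_rate: float, client_count: int) -> float:
    """Average rate each client must sustain if the load is spread evenly."""
    if client_count <= 0:
        raise ValueError("client_count must be positive")
    return total_rate / client_count


# 300 million records, collected between 1:00 and 5:00 (a 4-hour window).
total = required_throughput(300_000_000, 1, 5)
print(f"{total:.0f} records/s overall")

# Hypothetical: 10 collection clients sharing the load evenly.
print(f"{per_client_throughput(total, 10):.0f} records/s per client")
```

Under these assumptions the platform has to sustain roughly 20,000 records per second on average, which helps explain why collection is spread across several clients running in parallel.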
2. Quickly transferring and storing 300 million+ transaction logs in the Hadoop cluster
Here Wang Zhenping abandoned MapReduce in favor of an in-house solution, mainly because HDFS splits files into blocks for distribution while the log files are stored in binary form. Block splitting, the boundaries between packets, and incomplete messages make the availability of log data during parsing hard to control, and the log parsing specification is itself complex.
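The boundary problem described above can be illustrated with a small parser. The talk does not describe the actual log format, so this sketch assumes a hypothetical framing in which every message is a 4-byte big-endian length prefix followed by the payload. It shows why a binary stream cut at arbitrary byte offsets cannot be parsed chunk by chunk in isolation: a message that straddles a cut has to be carried over to the next chunk.

```python
import struct
from typing import Iterable, Iterator

# Hypothetical framing: 4-byte big-endian length prefix, then the payload.
HEADER = struct.Struct(">I")


def frame(payload: bytes) -> bytes:
    """Encode one message in the assumed length-prefixed format."""
    return HEADER.pack(len(payload)) + payload


def parse_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield complete payloads from chunks that may be cut mid-message."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        offset = 0
        while len(buffer) - offset >= HEADER.size:
            (length,) = HEADER.unpack_from(buffer, offset)
            end = offset + HEADER.size + length
            if end > len(buffer):
                break  # incomplete message: wait for the next chunk
            yield buffer[offset + HEADER.size:end]
            offset = end
        buffer = buffer[offset:]  # keep the unfinished tail for the next chunk
    if buffer:
        raise ValueError(f"{len(buffer)} trailing bytes form an incomplete message")


data = b"".join(frame(m) for m in [b"txn-1", b"txn-22", b"txn-333"])

# Cut the stream at arbitrary offsets, as a block boundary might.
chunks = [data[:7], data[7:12], data[12:]]
messages = list(parse_stream(chunks))
print(messages)
```

A parser that sees only one chunk has no way to tell whether its first bytes are a header or the middle of a payload, which is the kind of uncontrollable parsing behavior the talk attributes to file splitting.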
3. Managing a large number of jobs
The job management structure used within the company (shown as a diagram in the original talk) mainly involves four components: the job orchestrator, which arranges jobs; the job manager, which is responsible for job scheduling; the job status manager, which audits job status and helps identify problems; and the job trigger, which triggers jobs, dependent jobs, or other jobs.
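The four components above can be sketched as a toy in-process scheduler. This is not the company's implementation, which the talk does not detail; all class, method, and job names here are hypothetical, and the sketch only illustrates how orchestration, scheduling, status tracking, and dependency triggering might fit together.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence


class JobStatusManager:
    """Records each job's outcome so failures can be audited later."""

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}

    def record(self, name: str, state: str) -> None:
        self.status[name] = state


class JobOrchestrator:
    """Holds job definitions and their upstream dependencies."""

    def __init__(self) -> None:
        self.actions: Dict[str, Callable[[], None]] = {}
        self.upstream: Dict[str, List[str]] = {}

    def add(self, name: str, action: Callable[[], None], depends_on: Sequence[str] = ()) -> None:
        self.actions[name] = action
        self.upstream[name] = list(depends_on)


class JobManager:
    """Runs ready jobs and releases dependents through the trigger step."""

    def __init__(self, orchestrator: JobOrchestrator, statuses: JobStatusManager) -> None:
        self.orchestrator = orchestrator
        self.statuses = statuses

    def _trigger(self, finished: str) -> List[str]:
        """Job trigger: return jobs whose upstream jobs have all succeeded."""
        released = []
        for name, deps in self.orchestrator.upstream.items():
            if (finished in deps and name not in self.statuses.status
                    and all(self.statuses.status.get(d) == "SUCCEEDED" for d in deps)):
                released.append(name)
        return released

    def run_all(self) -> List[str]:
        order: List[str] = []
        ready = deque(n for n, deps in self.orchestrator.upstream.items() if not deps)
        while ready:
            name = ready.popleft()
            try:
                self.orchestrator.actions[name]()
            except Exception:
                self.statuses.record(name, "FAILED")
                continue  # dependents of a failed job are never triggered
            self.statuses.record(name, "SUCCEEDED")
            order.append(name)
            ready.extend(self._trigger(name))
        return order


log: List[str] = []
orch = JobOrchestrator()
orch.add("collect", lambda: log.append("collect"))
orch.add("parse", lambda: log.append("parse"), depends_on=["collect"])
orch.add("load", lambda: log.append("load"), depends_on=["parse"])
orch.add("report", lambda: log.append("report"), depends_on=["parse", "load"])

statuses = JobStatusManager()
order = JobManager(orch, statuses).run_all()
print(order)
```

Keeping status in its own component means a failed job leaves an auditable record, and its dependents simply never become ready.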
4. Second-level queries: Wang Zhenping achieved queries that return within seconds through a combination of HBase storage, secondary indexes, parallel region queries, support for range queries over the data, a wrapper around the HBase access API to improve development efficiency, and tuning of the cluster.
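The idea behind secondary indexes combined with parallel region queries can be illustrated without a real cluster. The sketch below does not use the HBase API: the `Region` class is an in-memory stand-in for a sorted key range, and the index key layout `merchant|time|txn_id`, the sample rows, and all names are hypothetical, since the talk does not describe the actual row key design.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (index key, main-table row key)


class Region:
    """A sorted slice of the key space, standing in for one HBase region."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = sorted(rows)
        self.keys = [key for key, _ in self.rows]

    def scan(self, start: str, stop: str) -> List[Row]:
        """Return rows with start <= key < stop using binary search."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return self.rows[lo:hi]


def parallel_scan(regions: List[Region], start: str, stop: str) -> List[Row]:
    """Scan every region concurrently and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        parts = list(pool.map(lambda region: region.scan(start, stop), regions))
    return sorted(row for part in parts for row in part)


# Main table: row key is the transaction id.
main_table: Dict[str, Dict[str, str]] = {
    "t1": {"merchant": "m1", "time": "0105", "status": "FAILED"},
    "t2": {"merchant": "m2", "time": "0110", "status": "OK"},
    "t3": {"merchant": "m1", "time": "0230", "status": "OK"},
    "t4": {"merchant": "m1", "time": "0420", "status": "FAILED"},
}

# Secondary index: key is merchant|time|txn_id, value points back to the main row.
index_rows = sorted(
    (f"{rec['merchant']}|{rec['time']}|{tid}", tid) for tid, rec in main_table.items()
)
index_regions = [Region(index_rows[:2]), Region(index_rows[2:])]


def query_ids(merchant: str, time_from: str, time_to: str) -> List[str]:
    """Range query: transaction ids for a merchant with time_from <= time < time_to."""
    hits = parallel_scan(index_regions, f"{merchant}|{time_from}", f"{merchant}|{time_to}")
    return [tid for _, tid in hits]


ids = query_ids("m1", "0100", "0300")
print(ids, [main_table[tid]["status"] for tid in ids])
```

Because the index keys sort by merchant and then by time, a query on a non-row-key field becomes a short range scan per region instead of a full table scan, and the regions can be scanned concurrently.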
Finally, Wang Zhenping shared the status of Shanghai Bao Xin's cluster, Hadoop-related knowledge, and his experience in using and learning Hadoop. On usage, he believes the early stage should be spent planning the cluster scale, network, server hardware configuration, and environment. During operation, attention should go to cluster monitoring, the collection and analysis of run logs, and common operating system tuning, with an emergency response process being an indispensable part. On learning, he believes it is necessary to read the source code and understand how the system works, but the source does not need to be modified early on.