Bringing historical data online with Hadoop

Source: Internet
Author: User
Keywords: computer room, data, cable, bank, Everbright Bank

At a time when no domestic bank had yet adopted Hadoop, China Everbright Bank's first pilot application based on Hadoop, the historical data query project, went into production at the end of October 2013. This is an important milestone for the application of Hadoop technology in the banking system.

From Silicon Valley to Beijing, from Zhongguancun to Jinrongjie (Beijing's Financial Street), big data is an increasingly popular topic, and exploration of big data technology is becoming ever more extensive. China Everbright Bank, which aims to be the most innovative bank, closely follows business and technology trends. It has researched big data technology in depth and has tried applying Hadoop, a technology from the big data field, to the construction of bank IT systems. The bank's first pilot application based on Hadoop, the historical data query project, went into production at the end of October 2013, an important milestone for the application of Hadoop in the banking system. This article examines Everbright Bank's work to bring historical data online and its application of Hadoop technology.

First, the system requirements for bringing historical data online

China Everbright Bank's historical customer transaction data was originally stored in tape libraries and optical disc libraries, where queries were slow and labor-intensive. Moving this historical data into an online service system and providing real-time, efficient query services is an urgent requirement for improving customer service and an important basis for activating data assets and realizing the value of the data.

From the business perspective, the system must provide historical transaction data import and query services, and its workload is write-once, read-many. From the functional perspective, the system must be able to store more than 10 years of Everbright Bank's detailed historical transaction data and, building on that accumulated data, support high-performance queries over long time spans, achieving the goal of bringing offline data online. From the operational perspective, the system must be able to grow sustainably: it must absorb the continuous accumulation of future incremental transaction data and the resulting growth in data volume, so it needs good scalability. As can be seen, although the business functions of the system are not complex, its technical requirements are very demanding.

Second, choosing Hadoop technology

During the project's technology selection, Everbright Bank first considered using traditional technology. Its advantage is that there are plenty of mature cases, so the risk is relatively small. However, traditional storage technology places high demands on hardware and is expensive. In addition, its response efficiency decreases as the volume of stored and queried data grows.

Given these limitations of traditional technology, and in an era of rapid technological development, is there a new technology better suited to the characteristics of this project? The answer is yes.

Hadoop has been widely used in the Internet industry in the big data era. It relies on a distributed architecture to store and process big data, and it offers low hardware cost, high performance, high availability and high scalability. It is especially suitable for write-once, read-many data. These characteristics match the requirements for bringing historical data online. However, no domestic bank had yet built a case on this technology, so whether it suited a bank's production systems, and whether it could meet the banking system's stability and security requirements, remained to be verified.

With the vision of building "the most innovative bank", Everbright Bank has continuously tracked new business and technology trends, and its technology team has thoroughly studied Hadoop, which is widely applied in the Internet field. In the historical data query project, the Hadoop solution stood out among the candidate solutions for its low cost, scalability and high reliability.

Third, applying Hadoop technology in the project

The Hadoop ecosystem is composed of members such as HDFS, MapReduce, HBase, Pig, Hive and ZooKeeper, each with its own distinctive features. Based on the characteristics of the historical data, this project mainly uses HDFS, MapReduce, HBase and ZooKeeper.
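The article does not describe how the transaction records are laid out in HBase. As an illustration only, the sketch below shows one common way a store with sorted keys can support queries over a long time span: the row key starts with the account and a fixed-width timestamp, so all transactions of one account in a time window form one contiguous key range. The key layout, the function names `make_row_key` and `scan_bounds`, and the sample account number are assumptions, not the bank's actual design.

```python
from datetime import datetime
from typing import Tuple

STAMP_FORMAT = "%Y%m%d%H%M%S"  # fixed width, so byte order equals time order


def make_row_key(account_id: str, ts: datetime, seq: int) -> bytes:
    """Build a hypothetical row key: account, timestamp, sequence number."""
    return f"{account_id}|{ts.strftime(STAMP_FORMAT)}|{seq:06d}".encode("ascii")


def scan_bounds(account_id: str, start: datetime, end: datetime) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering one account in the window [start, end)."""
    start_row = f"{account_id}|{start.strftime(STAMP_FORMAT)}".encode("ascii")
    stop_row = f"{account_id}|{end.strftime(STAMP_FORMAT)}".encode("ascii")
    return start_row, stop_row


if __name__ == "__main__":
    key = make_row_key("6226000001", datetime(2004, 3, 1, 9, 30, 0), 1)
    start_row, stop_row = scan_bounds(
        "6226000001", datetime(2004, 1, 1), datetime(2010, 1, 1)
    )
    # A range scan from start_row (inclusive) to stop_row (exclusive) would
    # return this key, because start_row <= key < stop_row.
    print(key, start_row <= key < stop_row)
```

With a layout like this, a query spanning several years becomes a single ordered range scan rather than a search across the whole table. The trade-off is that the design assumes fixed-width account identifiers and queries that always name an account.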

The features of the Hadoop deployment are described below.

1. Single cluster vs. dual cluster. To meet the system's disaster recovery requirements, the cluster must be deployed in both the production computer room and the disaster recovery computer room, so that when the production room fails, the cluster in the disaster recovery room can provide service independently.

Hadoop is designed to work efficiently by relying on the power of the cluster and, thanks to its cluster architecture, a single large cluster can in theory be deployed across multiple computer rooms. Deploying a single cluster across two computer rooms makes full use of the Hadoop design, but it also raises some problems. First, each room must hold a complete copy of the data, so the replicas have to be distributed across the rooms in a way that lets either room serve requests on its own. Second, when a non-system emergency occurs and causes management confusion within the cluster, it is difficult to guarantee continuous and stable service. In addition, data transfer between DataNodes is part of normal operation and would consume the computer rooms' network bandwidth. Solving these problems would require carefully re-architecting Hadoop and modifying it on a large scale.

After comprehensive analysis, and taking the project background into account, the original idea was improved and a dual-cluster solution was adopted. One cluster is deployed in each of the two computer rooms, and the services and data of the two clusters are independent. When a single system in one room becomes abnormal, service can be switched quickly to ensure data security and business continuity. Each cluster transfers, integrates and loads its data independently, which saves as much of the computer rooms' network resources as possible. The dual-cluster solution uses an active-active design, that is, both clusters provide query services at the same time.
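The active-active idea can be sketched in a few lines. The example below is a minimal illustration, not the bank's implementation: the class `DualClusterRouter`, the exception `ClusterUnavailable`, the cluster names and the query functions are all hypothetical. It alternates queries between two independent clusters and falls back to the other cluster when the chosen one cannot answer.

```python
import itertools
from typing import Callable, Dict, List, Tuple

QueryFn = Callable[[str], List[dict]]


class ClusterUnavailable(Exception):
    """Raised by a (hypothetical) cluster client that cannot serve a query."""


class DualClusterRouter:
    """Spread queries over independent clusters and fail over on error."""

    def __init__(self, clusters: Dict[str, QueryFn]):
        self._clusters = clusters
        self._names = list(clusters)
        self._turn = itertools.cycle(range(len(self._names)))

    def query(self, account_id: str) -> Tuple[str, List[dict]]:
        # Start with the cluster whose turn it is, then try the others.
        first = next(self._turn)
        order = self._names[first:] + self._names[:first]
        last_error = None
        for name in order:
            try:
                return name, self._clusters[name](account_id)
            except ClusterUnavailable as exc:
                last_error = exc
        raise ClusterUnavailable("no cluster could serve the query") from last_error


def production_room(account_id: str) -> List[dict]:
    # Stand-in for a real query against the production-room cluster.
    return [{"account": account_id, "amount": 100}]


def disaster_recovery_room(account_id: str) -> List[dict]:
    # Stand-in for a real query against the disaster-recovery-room cluster.
    return [{"account": account_id, "amount": 100}]


if __name__ == "__main__":
    router = DualClusterRouter(
        {"production": production_room, "disaster_recovery": disaster_recovery_room}
    )
    for _ in range(4):
        served_by, rows = router.query("6226000001")
        print(served_by, rows)
```

Because each cluster holds its own complete copy of the data, the router never needs to combine results from both rooms. That independence is what makes the fallback a simple retry.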

2. High availability. To prevent single points of failure and keep the system running, two NameNodes are configured in HDFS and two HMasters in HBase. Both pairs are managed by ZooKeeper, which switches over automatically on failure.

Hadoop technology is used to ensure that there is no single point of failure. When a data node fails, the data remains intact and services remain available, and the data is re-replicated automatically to maintain the configured number of copies.
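The behavior described above, where data stays readable after a node failure and the copy count is then restored, can be illustrated with a toy model. This is a simplified sketch, not HDFS code: the replication target of 3, the node and block names, and the rule of choosing new holders in alphabetical order are assumptions made to keep the example deterministic.

```python
from typing import Dict, Set

REPLICATION = 3  # assumed target number of copies per block


def readable_blocks(placement: Dict[str, Set[str]], live: Set[str]) -> Set[str]:
    """Blocks that still have at least one copy on a live node."""
    return {block for block, nodes in placement.items() if nodes & live}


def re_replicate(
    placement: Dict[str, Set[str]], live: Set[str], replication: int = REPLICATION
) -> Dict[str, Set[str]]:
    """Drop copies on failed nodes, then copy blocks to live nodes up to the target."""
    repaired = {}
    for block, nodes in placement.items():
        holders = nodes & live
        if holders:  # a block with no surviving copy cannot be repaired
            for candidate in sorted(live - holders):
                if len(holders) >= replication:
                    break
                holders = holders | {candidate}
        repaired[block] = holders
    return repaired


if __name__ == "__main__":
    placement = {
        "blk_1": {"dn1", "dn2", "dn3"},
        "blk_2": {"dn2", "dn3", "dn4"},
    }
    live = {"dn1", "dn3", "dn4", "dn5"}  # dn2 has failed
    print(sorted(readable_blocks(placement, live)))
    print(re_replicate(placement, live))
```

In this model both blocks stay readable after `dn2` fails, because each still has two surviving copies, and the repair step brings each block back to three copies on live nodes.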

3. Scalability. The Hadoop architecture gives the system dynamic scalability. When the system is expanded, new nodes are added to the platform and data is automatically balanced across all nodes. Balancing is started automatically in the background according to how busy the system is, uses only a small amount of system resources, and needs no human intervention to achieve a balanced distribution of data (as shown in the figure).
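The balancing step after a new node joins can be sketched as a simple greedy loop. This is a toy model, not the algorithm used by HDFS or HBase: the function name `rebalance`, the per-node block counts and the threshold are assumptions. It repeatedly moves one block from the fullest node to the emptiest node until the difference between them is within the threshold.

```python
from typing import Dict, List, Tuple


def rebalance(
    block_counts: Dict[str, int], threshold: int = 1
) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Return the (source, target) moves and the resulting block count per node."""
    if threshold < 1:
        # With a threshold of 0 an uneven total could never settle.
        raise ValueError("threshold must be at least 1")
    counts = dict(block_counts)
    moves: List[Tuple[str, str]] = []
    while True:
        busiest = max(counts, key=counts.get)
        idlest = min(counts, key=counts.get)
        if counts[busiest] - counts[idlest] <= threshold:
            break
        counts[busiest] -= 1
        counts[idlest] += 1
        moves.append((busiest, idlest))
    return moves, counts


if __name__ == "__main__":
    # dn3 is a newly added, empty node.
    moves, counts = rebalance({"dn1": 9, "dn2": 9, "dn3": 0})
    print(len(moves), counts)
```

Moving one block at a time mirrors the idea of a background task that uses few resources. A real balancer would also throttle bandwidth and take block size and placement rules into account, which this sketch ignores.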

Fourth, innovation in the bank's data support platform

Through the historical data query project, Everbright Bank has brought its historical data online, improved the efficiency of business management and improved customer service quality. The project put the bank's research on Hadoop into practice and deeply integrated Hadoop with the bank's production systems; it is a bold innovation in the bank's data support platform. With its three characteristics of low cost, high availability and easy expansion, the technology effectively solves the problem of storing massive data, breaks through the bottleneck in computing capacity and saves substantial investment, thereby optimizing investment efficiency and improving the input-output ratio. In addition, the project was a useful exploration of applying Hadoop to big data storage, big data query and big data processing, and it gave the bank valuable experience for the "big data" era.