Laxcus Big Data Management System 2.0 (14): Postscript

Source: Internet
Author: User

Postscript

Laxcus grew out of a failed search engine project. The project ended, but some of its technology was preserved, including the FIXP protocol, the Diffuse/converge algorithm, and a number of new data processing concepts, and this became the basis for the later development of Laxcus. Over several subsequent large-scale data processing projects, as the times and the industry changed, using a relational database for the underlying data access became less and less able to meet expanding business needs. This created a need for software that could support massive data processing and then be integrated further into real applications.

After the project was completed, promotion and use ran into many obstacles. Apart from some problems in the product itself, most of the difficulty came from the users. Once users are familiar with relational databases and accustomed to expressing data in SQL, asking them to adapt to a new data product and a new way of processing is very hard. At the same time, users generally want to spend less and get more done: they hope to gain more and stronger data processing capacity while leaving the hardware infrastructure unchanged and adding little or no cost.

These conditions ultimately led to the development of Laxcus and were taken into account in its design from the beginning. During later development it gradually incorporated a number of new technologies and design concepts, such as multi-domain cluster parallelism, load adaptation, mixed data storage, a distributed description language, distributed task components, transaction management, various kinds of fault tolerance, and security management. Several versions have been released over the past few years, and it has gradually evolved into today's more complete and general-purpose big data management system.

Laxcus targets today's large-scale data processing and looks ahead to future ultra-large-scale data processing environments. To achieve ease of use, one of the most important design requirements is to simplify data handling. This includes lower-cost hardware, fast deployment, easy maintenance, and simple development and operation. The goal is to let users carry out big data processing without strain, and for the product to feel more like a database in use than like some new kind of data product, which reduces the learning burden and improves efficiency. Another very important consideration is that "relationships" exist in the real world. Data is in essence a reflection of "things" and the "relationships" among them, so understanding, organizing, and processing data from the perspective of "relationships" fits people's habits of thought.

So, unlike many of today's big data products, Laxcus focused from the outset on the next generation of massive data processing. It requires that large-scale functionality be integrated across the whole system in a single product that provides ultra-large storage and computing capacity, lightweight management, and ease of operation. All of this has given Laxcus many characteristics of its own.

For example, Laxcus uses a real-time image system to manage meta-information, and relies on the dynamic real-time image of that meta-information for data interaction among cluster nodes. Meta-information is generated inside the system, passed across the network, and held in memory. It is not written to disk, and it is refreshed whenever its state is uncertain, so it is always kept up to date. Because the volume of meta-information is small, it does not burden the running environment, which makes real-time data tracking and processing possible.
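The idea of memory-resident meta-information that is refreshed when in doubt can be illustrated with a minimal sketch. This is not Laxcus code: the class `MetaImage`, the `max_age` staleness limit, and the `fetch` callback are all hypothetical names chosen for illustration, and treating "uncertain" as "older than a fixed age" is an assumption.

```python
import time


class MetaImage:
    """In-memory view of per-node meta-information (hypothetical structure)."""

    def __init__(self, max_age=5.0):
        self.max_age = max_age   # assumed staleness limit, in seconds
        self._entries = {}       # node -> (info, time stored); never written to disk

    def lookup(self, node, fetch, now=None):
        # Return the cached info for a node. When the entry is missing or
        # older than max_age (our stand-in for "uncertain"), call fetch(node)
        # to refresh it first.
        now = time.monotonic() if now is None else now
        entry = self._entries.get(node)
        if entry is None or now - entry[1] > self.max_age:
            info = fetch(node)
            self._entries[node] = (info, now)
            return info
        return entry[0]
```

A caller passes a function that asks the node for its current state; repeated lookups within `max_age` are served from memory, and anything older triggers a refresh.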

The Diffuse/converge network computing algorithm occupies an important position in the Laxcus system and is the key to large-scale parallel computing in a distributed environment. It has been abstracted and modularized, so users only need to call the API to obtain distributed, large-scale data processing capability. This reduces the developer's work and also reduces the chance of run-time errors. There is no need to understand the internals unless you are interested in the mechanism of the algorithm itself, in which case you can go directly to the source code.

The problem of distributing data evenly during distributed computation has also been handled properly. The effect is that, once data is evenly distributed, processing times are basically consistent. Letting each user's work leave the computing environment quickly and free computing resources for subsequent work is critical to efficient cluster processing. In addition, using "pull" rather than "push" is a very important guideline for keeping data balanced.
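Why "pull" helps balance can be shown with a small simulation. This is a generic sketch of the two strategies, not the Laxcus scheduler: the function names, the chunk sizes, and the worker speeds are assumptions made for illustration.

```python
import heapq


def push_finish_time(chunks, speeds):
    # Push: chunks are assigned round-robin up front, regardless of speed.
    loads = [0.0] * len(speeds)
    for i, size in enumerate(chunks):
        worker = i % len(speeds)
        loads[worker] += size / speeds[worker]
    return max(loads)


def pull_finish_time(chunks, speeds):
    # Pull: each worker takes the next chunk as soon as it becomes idle,
    # so faster workers naturally end up processing more chunks.
    idle = [(0.0, worker) for worker in range(len(speeds))]  # (time free, worker)
    heapq.heapify(idle)
    finish = 0.0
    for size in chunks:
        free_at, worker = heapq.heappop(idle)
        done = free_at + size / speeds[worker]
        finish = max(finish, done)
        heapq.heappush(idle, (done, worker))
    return finish
```

With twelve equal chunks of size 4 and hypothetical worker speeds of 1, 2 and 4, the push variant finishes at time 16 because the slowest worker holds everyone up, while the pull variant finishes at time 8.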

At present, a variety of distributed computing services have been built on the Diffuse/converge algorithm interface, including nested retrieval (SUB SELECT) and join (JOIN) services.

In the Laxcus system, the concept of an index is retained and given a new meaning. Part of it is integrated into the metadata, which enables fast data location in a cluster environment, and the other part is used in the data storage model.

Because "relationship" is such an important consideration, Laxcus also adopts two storage models, row and column. Row storage is essentially a scheme that extends the relational database. Column storage has been greatly improved: in effect, the index is eliminated as an intermediate step in data retrieval, which reduces the amount of data and improves retrieval efficiency. During computation, row and column storage perform compound retrieval over multiple logical relations at the storage level, as the instructions require, and data can be freely divided and recombined in units to minimize the output of redundant data. In addition, Laxcus raises storage and computing capacity through cooperative parallel work across multiple clusters, uses binary data formats throughout to improve computational efficiency, carries over the database way of organizing data, and processes data across the whole network in real time. All of these matter a great deal in practical applications.
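The general difference between the two layouts can be sketched in a few lines. This illustrates row and column storage in general, not the Laxcus on-disk format; the sample records and function names are invented for the example.

```python
# Row layout: each record is kept together (hypothetical sample data).
rows = [
    {"id": 1, "city": "Beijing", "amount": 30},
    {"id": 2, "city": "Shanghai", "amount": 45},
    {"id": 3, "city": "Beijing", "amount": 25},
]


def to_columns(rows):
    # Column layout: the values of each field are kept together.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, name):
    # Every whole record is visited just to read one field.
    return sum(row[name] for row in rows)


def sum_column_store(columns, name):
    # Only the requested column is touched; other fields are never read.
    return sum(columns[name])
```

Both functions return the same total, but the column version reads only the data the query needs, which is the sense in which a column layout reduces the amount of data handled during retrieval.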
