Tencent Distributed Data Warehouse (hereinafter TDW) is a large-scale data processing platform developed by the data platform department of Tencent's Technology and Engineering Group. It is built on the open source projects Hadoop, Hive, and PostgreSQL, with extensive customization and optimization on top of them. TDW is currently the largest distributed system inside Tencent. It consolidates data from Tencent's products and provides them with large-scale data storage and analysis services, including data mining, product reporting, and business analysis.
TDW is among the first batch of software that Tencent has open-sourced, and its code is hosted on the CSDN code platform.
II. Mutual access between TDW and PostgreSQL: taking TDW's capabilities to a higher level
As an offline data analysis system, TDW uses parallel computing to good effect and performs well on massive data sets. But expecting one all-inclusive system to solve every problem is generally unrealistic, and TDW has its own weaknesses: poor performance on small data sets, poor UPDATE/DELETE performance, and a limited set of interfaces.
We therefore brought in PostgreSQL, a powerful open source database, and extended it so that it can access data in TDW. We also developed a new storage engine in TDW, called the Pgdata storage engine, which enables TDW to read and write data in PostgreSQL.
Mutual access between TDW and PostgreSQL is a strong complement to TDW, in the following three respects:
1. Making up for TDW's limited interfaces
TDW lacks standard JDBC/ODBC support and offers few programming interfaces, whereas PostgreSQL has a strong community that provides interfaces for many languages and tools, such as JDBC/ODBC, Shell, C++, C#, Python, and Perl. Users can access data in TDW through these interfaces together with Tdwlink, the PostgreSQL-to-TDW bridging tool we developed.
2. Making up for TDW's low efficiency in small-data analysis
On massive data sets, TDW can exploit the advantages of parallel execution. For small-data analysis, however, its performance is inferior to that of a traditional database. For analysis of data within 10 GB, PostgreSQL delivers better performance and response times, typically returning results within seconds. Compared with TDW's minute-level responses, TPG (our extended PostgreSQL) has a clear advantage in this scenario.
3. Serving as TDW's Pgdata storage engine, making up for TDW's low UPDATE/DELETE efficiency
As a data warehouse system, TDW does not support record-level UPDATE/DELETE well. A record-level UPDATE/DELETE in TDW overwrites the entire table, so deleting even a single record can cause the whole table to be rewritten, consuming a lot of system resources. TPG, as a traditional database, handles record-level updates and deletes very efficiently.
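The cost difference described in point 3 can be illustrated with a small toy model. This is only a sketch under stated assumptions, not TDW or PostgreSQL code: the class names `RewriteOnlyTable` and `RowStoreTable` and the function `compare_delete_cost` are hypothetical, and the model simply counts how many rows each kind of table has to write or touch to delete one record.

```python
"""Toy model of record-level delete cost. Not TDW or PostgreSQL code."""


class RewriteOnlyTable:
    """Hypothetical stand-in for a warehouse-style table whose
    stored data cannot be changed in place."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.rows_written = 0  # rows physically written by deletes

    def delete(self, key):
        # No in-place change is possible: every surviving row is
        # copied into a fresh table, which then replaces the old one.
        survivors = [r for r in self.rows if r["id"] != key]
        self.rows_written += len(survivors)
        self.rows = survivors


class RowStoreTable:
    """Hypothetical stand-in for a traditional database table with a
    primary-key index, where one record can be found directly."""

    def __init__(self, rows):
        self.by_id = {r["id"]: r for r in rows}
        self.rows_touched = 0  # rows touched by deletes

    def delete(self, key):
        if self.by_id.pop(key, None) is not None:
            self.rows_touched += 1


def compare_delete_cost(n_rows, key):
    """Delete one record from both models and return
    (rows written by the rewrite-only table, rows touched by the row store)."""
    rows = [{"id": i, "value": i * 10} for i in range(n_rows)]
    warehouse = RewriteOnlyTable(rows)
    row_store = RowStoreTable(rows)
    warehouse.delete(key)
    row_store.delete(key)
    return warehouse.rows_written, row_store.rows_touched


if __name__ == "__main__":
    rewritten, touched = compare_delete_cost(1_000_000, key=42)
    print(f"rewrite-only table wrote {rewritten} rows")
    print(f"row store touched {touched} row(s)")
```

In this model the work done by the rewrite-only table grows with the size of the table, while the row store's work depends only on the number of records deleted. A real database does more bookkeeping than a dictionary lookup, so the numbers indicate the shape of the difference rather than measured performance.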
The following diagram shows where the PostgreSQL system sits in the TDW ecosystem; TPG is our name for the extended PostgreSQL:
Below, we introduce mutual access between TDW and PostgreSQL in two parts: the Pgdata storage engine and Tdwlink.