Data is the most important asset of an enterprise, and mining the value of data has always been a source of innovation in enterprise applications, technology, architecture, and services. After a decade of technical development, enterprise core data processing has settled into two modules: the relational database (RDBMS), used mainly to handle transactional workloads, and the analytical data warehouse, used mainly for integrated data analysis. When several terabytes or more than 10 TB of data must be analyzed, most enterprises turn to an MPP database architecture. This is appropriate for traditional applications. In recent years, however, with the rapid growth of the Internet, and of the mobile Internet and the Internet of Things in particular, enterprises generate data faster than ever before. Faced with analyzing tens or hundreds of terabytes, or even petabytes, of data, the traditional architecture is nearly exhausted and can barely cope. Hadoop has therefore become the focus of the enterprise market and is increasingly regarded as the best, and even the only, choice in this new situation.
In the enterprise data center, to handle data of different volumes, different performance requirements, and different computing models, a hybrid architecture has to be adopted: relational database + in-memory database + MPP database + Hadoop platform.
However, customers using such a hybrid architecture often face the following problems:
1. Large volumes of data must be migrated frequently from one platform to another, or even to several other platforms, incurring huge network overhead.
2. When the MPP architecture is upgraded or expanded, the services it provides to the outside are affected.
3. The costs of bringing a hybrid architecture online, operating it, and upgrading it remain high.
4. Multiple platforms require multiple operations teams, each of which must master a different set of platform skills.
5. Fault tolerance, backup, disaster recovery, and similar plans must be designed and implemented separately for each platform.
Faced with these problems, experienced Hadoop customers have begun to think boldly and have gradually implemented an innovative architecture: letting Hadoop unify the hybrid data analysis platform. Only a year ago this idea was considered unrealistic. It was generally accepted that Hadoop had an advantage in handling hundreds of terabytes or petabytes of data, but that using MapReduce on Hadoop to process gigabytes or a few terabytes of data was cumbersome.
Star Ring Information Technology (Shanghai) Co., Ltd. (hereinafter Star Ring Technology), with its keen judgment in the big data field and its strong execution and R&D capabilities, went beyond Hadoop MapReduce and, by introducing Spark, fully addressed Hadoop's original shortcomings. Star Ring Technology released a one-stop big data platform, Transwarp Data Hub (hereinafter TDH), which not only provides powerful analysis capabilities for massive data but, when analyzing small and medium-sized data sets, also outperforms MPP architectures and is even comparable to dedicated in-memory database platforms. TDH thus enables the enterprise to unify its hybrid architecture into a complete data analysis platform.
Inceptor, the in-memory analysis engine in the TDH product family, uses Spark as its core computing engine, making up for the shortcomings of the MapReduce computing engine. Spark's in-memory technology is significantly ahead of MapReduce: it models jobs as directed acyclic graphs (DAGs), abstracts distributed data into Resilient Distributed Datasets (RDDs), keeps intermediate results in memory, and reduces the disk I/O of the shuffle phase. Drawing on the experience accumulated over many successful deployments, Star Ring Technology has built Inceptor into a mature, stable, high-performance analysis platform that resolves the instability of open-source Spark, such as jobs that die after running for 24 hours, SQL queries whose performance fluctuates and is sometimes slower than MapReduce, and large in-memory computations that frequently stop responding.
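To make the contrast concrete, the following is a minimal, generic Spark sketch in Scala of the DAG/RDD model described above. It illustrates open-source Spark usage rather than Inceptor internals, and the HDFS path and field layout are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddCacheDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-cache-demo"))

    // Each transformation below only adds a node to the job's DAG;
    // nothing executes until an action is called.
    val purchases = sc.textFile("hdfs:///data/events")   // hypothetical path
      .map(_.split(","))
      .filter(fields => fields.length > 2 && fields(2) == "purchase")

    // Keep the filtered intermediate result in memory so later jobs
    // reuse it instead of re-reading the source and re-running the stages.
    purchases.cache()

    // Two actions share the cached RDD; only the first triggers the scan.
    val total  = purchases.count()
    val topTen = purchases.map(f => (f(0), 1)).reduceByKey(_ + _).take(10)

    println(s"purchases = $total, sample per-user counts = ${topTen.mkString(", ")}")
    sc.stop()
  }
}
```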
Inceptor improves, innovates on, and optimizes Spark for the characteristics of complex data warehouse analysis. The independently developed columnar hybrid storage layer, Holodesk, allows data to be loaded into both memory and SSD, meeting larger cache requirements and greatly extending the memory limits of terabyte-scale data analysis. Inceptor also adds many performance optimizations to in-memory computing: for example, a cost-based optimizer automatically selects the optimal execution plan, and data partitioning, bucketing, query filtering, and condition (predicate) pushdown reduce scan I/O, greatly improving query speed. When querying multiple large tables with one billion records each, Inceptor delivers a 2- to 10-fold performance improvement over MPP databases.
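The I/O-reduction techniques mentioned above can be illustrated with plain Spark SQL. The sketch below is a generic example under assumed names (Holodesk and Inceptor use Transwarp-specific syntax not shown here): a made-up `orders` table is partitioned by day and bucketed by `user_id`, so a filter on the partition column prunes whole partitions from the scan.

```scala
import org.apache.spark.sql.SparkSession

object PartitionPruningDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pruning-demo").getOrCreate()

    // Hypothetical table: partitioned by day, bucketed by user_id.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS orders (
        order_id BIGINT, user_id BIGINT, amount DOUBLE, dt STRING)
      USING parquet
      PARTITIONED BY (dt)
      CLUSTERED BY (user_id) INTO 32 BUCKETS
    """)

    // The filter on dt lets the planner scan only one partition;
    // explain() shows the partition filter pushed into the file scan.
    val daily = spark.sql("""
      SELECT user_id, SUM(amount) AS total
      FROM orders
      WHERE dt = '2023-01-01'
      GROUP BY user_id
    """)
    daily.explain()

    spark.stop()
  }
}
```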
Another capability that allows Inceptor to enter the data warehouse field and unify the enterprise data analysis platform is its complete support for the ANSI SQL-1999 standard. Inceptor supports the commonly used data types, all kinds of table joins, subqueries, operators, and window aggregation functions, and even DML operations on individual records. As a result, the statistical analysis and business reports implemented in SQL that currently run on in-memory database or MPP database platforms can be migrated smoothly to the TDH platform with almost no modification. TDH has successfully run more than 300 reports in one operator's system with very few changes, and it ran 210,000 lines of SQL code at a power grid company with only a dozen or so modifications. In other cases, SQL running on MPP database platforms such as Teradata has been migrated directly to the TDH platform with greatly improved performance.
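As a rough illustration of the SQL-1999 features listed above (window aggregates and subqueries), the sketch below runs a standard SQL statement through Spark from Scala. The `customer_revenue` table and its columns are invented for the example and are not taken from any of the migrations described.

```scala
import org.apache.spark.sql.SparkSession

object Sql99Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sql99-demo").getOrCreate()

    // Rank each region's customers by revenue (window aggregate) and keep
    // only those above the overall average (scalar subquery in WHERE).
    spark.sql("""
      SELECT region,
             customer_id,
             revenue,
             RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS rk
      FROM customer_revenue
      WHERE revenue > (SELECT AVG(revenue) FROM customer_revenue)
    """).show()

    spark.stop()
  }
}
```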
Inceptor also solves the problem for which MapReduce has long been criticized: its inability to provide fast response times for complex data analysis, ad hoc queries, self-service analysis, iterative analysis, and machine learning. This makes it possible for large numbers of front-line business staff to explore and analyze data interactively and flexibly through big data visualization tools. Inceptor integrates R-language statistical analysis, data mining, and machine learning algorithms, so data analysts can quickly analyze terabytes or even petabytes of data on the TDH platform with parallelized R algorithms. Such analysis capabilities go far beyond what existing MPP platforms can offer.
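The article describes parallelized R algorithms; as a stand-in illustration of the same idea, distributed model training over cluster-resident data, the sketch below uses generic Spark MLlib in Scala rather than Inceptor's R integration. Its dataset path and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler

object ClusterDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kmeans-demo").getOrCreate()

    // Load a (hypothetical) table of user metrics and assemble a feature vector.
    val df = spark.read.parquet("hdfs:///warehouse/user_metrics")
    val features = new VectorAssembler()
      .setInputCols(Array("visits", "spend", "recency"))
      .setOutputCol("features")
      .transform(df)

    // Train a distributed k-means model; the work is spread across the cluster.
    val model = new KMeans().setK(8).setSeed(42L).fit(features)
    model.clusterCenters.foreach(println)

    spark.stop()
  }
}
```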
As a result, the much-discussed scheme of integrating Hadoop with an MPP database is no longer strictly necessary; the TDH platform can completely replace the MPP platform. First, TDH's complete support for the ANSI SQL-1999 standard already meets the enterprise's need for complex analysis on a large-scale data warehouse. Second, with integrated Spark in-memory technology, TDH provides better performance than traditional MPP platforms across the data volumes a warehouse must support (GB, TB, and PB). Third, TDH offers better scalability than MPP platforms and stronger computing and analysis capabilities for structured, semi-structured, and unstructured data. Finally, TDH gives customers a unified data platform on which computation and analysis over data of different scales can be served quickly, and it provides unified fault tolerance, backup, and disaster recovery, offering enterprises more convenient measures and stronger guarantees.
For more information, please visit http://www.transwarp.io/