Having just announced R-language analytics capabilities for Aster that ease computing and memory limitations, Teradata promptly followed with the news that Teradata Labs is acquiring Revelytix and Hadapt. Revelytix focuses primarily on data management on Hadoop, while Hadapt is a dedicated SQL-on-Hadoop company. Clearly, Teradata is accelerating its efforts to build a unified data architecture.
Build a unified data architecture
In fact, structured and unstructured data are generating new value through both traditional SQL analytics and newer analytic algorithms (time series, paths, graphs, and text). To achieve the highest efficiency and the best storage, analytics and application costs, the big data technology stack is separating into layers. According to Kong Yuhua, director of Teradata's Big Data Business Unit in Greater China, the unified data architecture can be divided into three layers: the Teradata integrated data warehouse, the Aster-based exploration and analysis platform, and the Hadoop-based data platform.
Kong Yuhua, Director, Big Data Business Unit, Teradata Greater China
Hadoop is naturally suited to fast data loading and retrieval, data filtering and preprocessing, and online archiving. Aster handles data discovery, rapid hypothesis testing and trial and error, pattern monitoring, path mapping, and time series analysis. The Teradata data warehouse delivers strategic intelligence, predictive analytics and operational intelligence.
This is also the layering model most widely recognized in the industry. Integrating the three platforms makes it possible to meet the in-depth data analysis needs of more industries.
Take the medical industry as an example. To analyze a patient's hospitalization, you first review the inpatient data on the data platform. You then use Aster's time series path function, aggregate functions and sigma value function to identify every medical procedure a patient undergoes from admission to discharge, along with the physicians who provide those services. Finally, you generate visual analyses of time, geography, crossover, medical outcomes and more in Teradata. By analyzing pneumonia-related patients in this way and using the results to guide business improvements, one U.S. hospital "reduced hospital stays by 10% and saved $50 million," Kong Yuhua said.
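The path step in the hospital example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Aster's path function: the event records, the field layout, and the helper names `build_paths` and `stay_summary` are all hypothetical, and the "sigma value" is assumed here to mean a standard deviation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical inpatient events: (patient_id, timestamp, procedure, physician).
# They are deliberately out of order, as raw data on a data platform might be.
EVENTS = [
    ("p1", "2014-03-04 09:00", "discharge", "Dr. A"),
    ("p2", "2014-03-02 08:00", "admission", "Dr. C"),
    ("p1", "2014-03-01 09:00", "admission", "Dr. A"),
    ("p2", "2014-03-03 14:00", "antibiotics", "Dr. C"),
    ("p1", "2014-03-01 11:30", "chest x-ray", "Dr. B"),
    ("p2", "2014-03-09 08:00", "discharge", "Dr. C"),
    ("p2", "2014-03-02 09:15", "chest x-ray", "Dr. B"),
]


def build_paths(events):
    """Order each patient's events by time and keep admission..discharge."""
    by_patient = defaultdict(list)
    for patient, stamp, procedure, physician in events:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        by_patient[patient].append((when, procedure, physician))

    paths = {}
    for patient, rows in by_patient.items():
        rows.sort(key=lambda row: row[0])
        procedures = [row[1] for row in rows]
        if "admission" not in procedures or "discharge" not in procedures:
            continue  # incomplete stay, skip it
        start = procedures.index("admission")
        end = procedures.index("discharge")
        if end < start:
            continue  # inconsistent record, skip it
        span = rows[start:end + 1]
        paths[patient] = {
            "path": [row[1] for row in span],
            "physicians": sorted({row[2] for row in span}),
            "stay_days": (span[-1][0] - span[0][0]).total_seconds() / 86400,
        }
    return paths


def stay_summary(paths):
    """Aggregate step: mean and population standard deviation of stay length."""
    stays = [info["stay_days"] for info in paths.values()]
    return mean(stays), pstdev(stays)


if __name__ == "__main__":
    paths = build_paths(EVENTS)
    for patient in sorted(paths):
        print(patient, paths[patient])
    print(stay_summary(paths))
```

The per-patient paths and the aggregated stay lengths are the kind of intermediate result that would then be passed to the warehouse layer for reporting.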
Similar cases can be cited for telecom operators, banks, retail, e-commerce and high-tech manufacturing. On the technology side, the integrated data warehouse is the more mature layer when it comes to sharing relevant, consistent and integrated data, rapidly deploying new applications, and forming business views. Correspondingly, the greatest technical challenges lie in the data platform and the exploration platform.
Aster advantages based on Hadoop
Aster and Hadoop overlap in their technical development, but they also have different emphases. For Teradata, the question is how to make the most of Hadoop and how to mine and analyze its data with Aster.
In Kong Yuhua's view, Aster and Hadoop are both MPP architectures, but they differ considerably in storage, computing engines and interfaces, and these differences are what set the two apart (see figure).
The difference between Aster and Hadoop
Innovations built on Hadoop are not uncommon, but few offerings in the enterprise market provide this many engines. This is where Aster's strength lies. Taking the Aster SQL-Graph engine as an example, its advantages over Hadoop Giraph or related Google products are:
Graph-parallel architecture: a general-purpose BSP framework; no memory binding; high scalability
Easy-to-use development API: a vertex programming API; an SDK and IDE for building user-defined graph functions
Predefined graph functions: out-of-the-box functions for graph-parallel execution
Integration with existing platforms: the ability to work with Aster relational storage, file storage, and data from external data sources; integration with other analysis engines (SQL, SQL-MR)
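To make the BSP (bulk synchronous parallel) framework in the list above more concrete, here is a minimal single-process sketch of the vertex-centric style it implies. It is not the Aster SQL-Graph or Giraph API. The names `bsp_run`, `min_label` and `GRAPH` are hypothetical, and the example algorithm (connected components by minimum-label propagation) was chosen only because it is short.

```python
def bsp_run(graph, init, compute, max_supersteps=50):
    """Run a vertex program in supersteps separated by a barrier.

    graph:   dict mapping each vertex to a list of neighbours
    init:    function vertex -> initial value
    compute: function (vertex, value, messages, superstep)
             -> (new_value, message_to_neighbours or None)
    """
    values = {vertex: init(vertex) for vertex in graph}
    inbox = {vertex: [] for vertex in graph}
    active = set(graph)  # every vertex takes part in superstep 0

    for superstep in range(max_supersteps):
        if not active:
            break  # no vertex received a message, so the job halts
        outbox = {vertex: [] for vertex in graph}
        for vertex in active:
            new_value, message = compute(
                vertex, values[vertex], inbox[vertex], superstep
            )
            values[vertex] = new_value
            if message is not None:
                for neighbour in graph[vertex]:
                    outbox[neighbour].append(message)
        # Barrier: messages sent now become visible only in the next superstep.
        inbox = outbox
        active = {vertex for vertex, msgs in inbox.items() if msgs}
    return values


def min_label(vertex, value, messages, superstep):
    """Vertex program: adopt the smallest label seen and forward it on change."""
    best = min([value] + messages)
    if superstep == 0 or best < value:
        return best, best
    return value, None


# Hypothetical undirected graph with two components: {a, b, c} and {d, e}.
GRAPH = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": ["e"],
    "e": ["d"],
}

if __name__ == "__main__":
    print(bsp_run(GRAPH, init=lambda v: v, compute=min_label))
```

Each vertex only reads its own value and its inbox, which is what lets a real engine spread the vertices of one superstep across many workers.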
Enterprise-grade R: breaking through the limits of the open-source R language
Beyond that, Aster's R support has reached enterprise grade, which is in line with the trend. A survey by Rexer Analytics, a consulting firm, shows that 70% of respondents use the R language, and the data show that the number of R users increased dramatically from 2010 to 2013.
But R also faces challenges. For example, open-source R runs scattered across nodes or servers, with each node or server working independently. This suits analyses that process rows independently, such as model scoring, but it does not help with functions that need to see all of the data, such as model building. Breaking through the limitations of open-source R and integrating Aster with R for enterprise-wide analytics requires further technical optimization:
Run the open-source R language on Aster's MPP architecture for efficient parallel analytics.
Ease memory and data-processing constraints through massive concurrency.
Use the Aster Discovery Portfolio to enhance R analytics, combining its more than 100 analytic functions with over 5,000 R packages.
Kong Yuhua said: "Teradata Aster R delivers massively concurrent open-source R in the form of software packages, which is a greater advantage for data analysts."
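The distinction drawn above between row-independent work (model scoring) and whole-data work (model building) can be illustrated with a small Python sketch. It is not R code and not the Aster R API. The partitions, the helper names, and the choice of simple least-squares regression are assumptions made for illustration, and merging per-partition summary statistics is one common way to build a model over partitioned data, not necessarily what Teradata does.

```python
from functools import reduce

# Hypothetical (x, y) rows stored on two independent nodes.
PARTITION_1 = [(1, 2), (2, 4)]
PARTITION_2 = [(3, 5), (4, 4)]
ALL_ROWS = PARTITION_1 + PARTITION_2


def sufficient_stats(rows):
    """Per-partition summary: n, sum(x), sum(y), sum(x*x), sum(x*y)."""
    return (
        len(rows),
        sum(x for x, _ in rows),
        sum(y for _, y in rows),
        sum(x * x for x, _ in rows),
        sum(x * y for x, y in rows),
    )


def combine(left, right):
    """Merge two summaries; addition is all a coordinator has to do."""
    return tuple(a + b for a, b in zip(left, right))


def fit(stats):
    """Least-squares line y = intercept + slope * x from the summary."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def score(model, rows):
    """Scoring needs only the model and one row at a time."""
    intercept, slope = model
    return [intercept + slope * x for x, _ in rows]


if __name__ == "__main__":
    # Model building: each node fitting alone sees only part of the picture.
    print("node 1 alone:", fit(sufficient_stats(PARTITION_1)))
    print("node 2 alone:", fit(sufficient_stats(PARTITION_2)))

    # Merging the summaries reproduces the fit over all rows.
    merged = reduce(combine, map(sufficient_stats, [PARTITION_1, PARTITION_2]))
    model = fit(merged)
    print("merged:", model)

    # Scoring: per-node results simply concatenate.
    print(score(model, PARTITION_1) + score(model, PARTITION_2))
```

Scoring parallelizes trivially because no row depends on another, whereas model building requires some form of coordination across nodes.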
Reading data from Hadoop and intelligently leveraging the multiple heterogeneous processing engines in the Teradata data warehouse or the Teradata Aster database makes it possible to generate visual reports that drive business insight and innovation. This technology architecture already runs very smoothly. For Teradata, the more important challenge is how to move into more industries as quickly as possible and drive change in data analytics.