eBay, the online auction site, has open-sourced its database technology, called Kylin, which it says can quickly query petabytes of data stored in Hadoop. eBay is not a big data user on the scale of companies like Google and Facebook, but it does run technologies such as Hadoop at a sizeable scale, and the Kylin project appears to be a good example of the innovation it has built on top of them.
In a post on the 20th of last month, eBay gave details about the Kylin project. Its most compelling features include a REST API, ANSI SQL compatibility, integration with analytics tools such as Tableau and Excel, and sub-second query capabilities. However, Kylin's most distinctive ability is its strong performance at scale. According to eBay, Kylin can query tens of billions of rows of data, equivalent to a dataset larger than 14 TB, and outperform the traditional Apache Hive tooling.
Broadly, Kylin works by taking data from Hive, using MapReduce to precompute the results of large-scale queries, and saving those results as key-value "cuboids" in HBase. When a user runs a Kylin query over a specific set of variables, the values corresponding to those variables can be returned directly, without repeating the computation. Although this does not differ in essence from the cubes that analytic databases have used for many years, Kylin's cuboids are designed with HBase's data structures in mind.
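The precompute-then-look-up idea can be illustrated with a small Python sketch. This is not Kylin's implementation: the sample rows, the dimension names, and the in-memory dictionary standing in for HBase are all assumptions made for illustration, and a plain loop takes the place of the MapReduce job.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows, standing in for a Hive table.
ROWS = [
    {"site": "US", "category": "toys", "year": 2014, "sales": 10.0},
    {"site": "US", "category": "books", "year": 2014, "sales": 5.0},
    {"site": "DE", "category": "toys", "year": 2014, "sales": 7.0},
    {"site": "US", "category": "toys", "year": 2013, "sales": 3.0},
]
DIMENSIONS = ("site", "category", "year")


def build_cuboids(rows, dimensions, measure):
    """Pre-aggregate the measure for every subset of the dimensions."""
    store = defaultdict(float)  # stands in for a key-value store such as HBase
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                # Key = (which dimensions, their values in this row).
                key = (dims, tuple(row[d] for d in dims))
                store[key] += row[measure]
    return dict(store)


def query(store, **filters):
    """Answer SUM(measure) for equality filters with a single key lookup."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    key = (dims, tuple(filters[d] for d in dims))
    return store.get(key, 0.0)


STORE = build_cuboids(ROWS, DIMENSIONS, "sales")
print(query(STORE, site="US"))                   # 18.0
print(query(STORE, site="US", category="toys"))  # 13.0
print(query(STORE))                              # 25.0 (grand total)
```

The cost is paid up front: with n dimensions there are 2^n cuboids to build, which is why the build step suits a batch job, while each query afterwards is a single key lookup.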
Here is how eBay describes Kylin's actual performance in its internal business systems:
While contributing Kylin to the open source community, we have also put it into production in many of eBay's business units. One of the largest use cases analyzes more than 14 TB of cube data generated from more than 12 billion source records. 90% of queries complete within 5 seconds. Our use cases now target analysts and business users, who can access the analysis and easily get results through Tableau dashboards, without needing complex mechanisms such as Hive queries or shell commands.
It will be interesting to see how Kylin fits alongside the next generation of Hive, Spark SQL, and other SQL analytics projects in the Hadoop environment, because the YARN resource manager, first introduced in the latest version of Apache Hadoop, will inevitably lead to a wave of upgrades in related projects. My personal guess is that Kylin will be slightly slower than in-memory options or other approaches that do not depend on MapReduce, but will cope better with large data volumes. For this reason, Kylin is a stable and reliable option for Hadoop users who are still running an earlier version of the software, and they make up a relatively high percentage of users.
- This article is from: Hobby Linux Technology Network
- This article link: http://www.ahlinux.com/open/9337.html
eBay has open-sourced its large-scale, high-speed SQL-on-Hadoop database