eBay open-sources Kylin, a new database technology supporting TB- to PB-scale data volumes
Source: Internet
Author: User
Keywords: big data, open source, Hadoop
[Editor's note] eBay has open-sourced a database technology called Kylin and shared many of its details in a blog post on Wednesday. Kylin provides SQL and OLAP interfaces on top of Hadoop, supports terabytes to petabytes of data, and is designed to reduce Hadoop query latency on datasets of more than 1 billion rows. All of this shows that eBay has made good progress in using Hadoop technology.
The following is the translation:
Online auction site eBay has open-sourced a database technology called Kylin, which the company claims can support fast queries on Hadoop over petabyte-scale data stores. eBay is not usually counted among big data companies such as Google and Facebook, but its use of technologies such as Hadoop has reached a fairly large scale, and Kylin is a good example, suggesting that the company is among the innovators in this field.
eBay shared the details of Kylin in a blog post on Wednesday, including its REST API, ANSI SQL compatibility, connectivity with analysis tools such as Tableau and Excel, and sub-second latency on some queries. However, Kylin's most distinctive feature is how it handles scale. eBay says it can query billions of rows of data faster than traditional Apache Hive queries on a 14 TB dataset.
At a very high level, Kylin works as follows: it takes data from Hive, uses MapReduce to precompute the results of large queries, and then stores those results as key-value cuboids in HBase. When a user runs a Kylin query with a specific set of variable values, the results are already available and do not need to be computed again, which is quite different from the analytic databases that have been in use for many years.
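The precompute-then-look-up idea described above can be illustrated with a minimal sketch. This is not Kylin's code or API: the sample rows, the dimension names, and the functions build_cuboids and query are all hypothetical, and a plain Python dict stands in for HBase. The sketch aggregates one measure for every combination of dimensions ahead of time, so that a later query becomes a single key lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows; in the article's description these would come from Hive.
ROWS = [
    {"country": "US", "category": "Toys", "year": 2014, "sales": 120},
    {"country": "US", "category": "Books", "year": 2014, "sales": 80},
    {"country": "DE", "category": "Toys", "year": 2014, "sales": 50},
    {"country": "DE", "category": "Toys", "year": 2013, "sales": 30},
]
DIMENSIONS = ("country", "category", "year")  # assumed dimension columns
MEASURE = "sales"                             # assumed measure column


def build_cuboids(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of the dimensions.

    Each subset of dimensions is one cuboid. The result is a flat dict that
    stands in for a key-value store: key = (cuboid dimensions, their values).
    """
    store = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                store[key] += row[measure]
    return dict(store)


def query(store, **filters):
    """Answer SUM(measure) for equality filters with one key lookup."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    key = (dims, tuple(filters[d] for d in dims))
    return store.get(key, 0)


STORE = build_cuboids(ROWS, DIMENSIONS, MEASURE)

print(query(STORE, country="US"))                # 200
print(query(STORE, category="Toys", year=2014))  # 170
print(query(STORE))                              # 280 (grand total)
```

The trade-off in this toy version is the same one the article hints at: the number of cuboids grows as 2^n with the number of dimensions, so the build step is expensive, but queries no longer scan the source rows.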
Here is how eBay describes Kylin's use within the company:
At the time of open-sourcing Kylin, we already had several eBay business units using it in production. Our largest use case is a TB-scale cube generated from 120+ billion source records. Its 90th-percentile query latency is less than 5 seconds. Our use cases now target analysts and business users, who can easily run analyses and get results through Tableau, with no more Hive queries, shell commands, and so on.
It will be interesting to see how Kylin fares against the next version of Hive, Spark SQL, and other options for SQL analysis on Hadoop that run on YARN, the resource manager available in the latest versions of Apache Hadoop. My guess is that Kylin is a bit slower but more scalable than in-memory options or those that do not require MapReduce processing, and it might be a reliable option for those still running earlier versions of the software.
Original article: eBay open sources a big, fast SQL-on-Hadoop database (translated by Wei; reviewed by Zhonghao)