What are the differences and advantages of Kylin compared to Spark SQL?

Source: Internet
Author: User

Spark SQL is essentially an MPP-style engine built on a DAG execution model, whereas the core of Kylin is the cube (a multidimensional cube). The difference between MPP and cube precomputation is restated below:

> The basic idea of MPP [1] is to add machines that compute in parallel, thereby increasing query speed. For example, if scanning 800 million records takes a single machine 1 hour, processing them in parallel on 100 machines takes less than a minute. With columnar storage and some indexes, queries can return even faster. Note that the amount of online computation is not reduced: all 800 million records are still scanned once. The query is faster only because more machines take part.
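
The arithmetic behind that example can be sketched as follows. This is a minimal illustration that assumes ideal linear scaling with no coordination or network overhead; the function name `parallel_scan_seconds` is hypothetical and not part of Spark or Kylin.

```python
def parallel_scan_seconds(single_machine_seconds, machines):
    """Idealized scan time when the same work is split evenly across machines.

    Assumption: perfect linear speedup. Real clusters pay extra for
    scheduling, shuffles, and stragglers, so this is a lower bound.
    """
    return single_machine_seconds / machines


one_hour = 60 * 60  # the single-machine scan time from the example above
elapsed = parallel_scan_seconds(one_hour, 100)
print(elapsed)  # 36.0 seconds, i.e. under a minute

# The total work is unchanged: 100 machines x 36 s is still 3600 machine-seconds.
total_machine_seconds = elapsed * 100
print(total_machine_seconds)  # 3600.0
```

The second figure is the point of the quoted note: the total work stays the same and is only spread across more machines.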

> MOLAP cube [2][3] is a precomputation technique. The basic idea is to build a multidimensional index over the data in advance, so that a query scans only the index and never touches the raw data. A 3-dimensional index over 800 million records may contain only tens of thousands of records. The scale shrinks greatly, so the online computation shrinks with it, and queries can be very fast. The index tables can also use columnar storage, parallel scanning, and other techniques common in MPP. However, the multidimensional index has to precompute the various combinations of dimensions, so building it offline takes a large amount of computation and time, and the final index also occupies more disk space.
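
The precomputation idea can be sketched in a few lines of plain Python. This is a toy illustration, not Kylin's actual build process or storage format: the sample rows, the dimension names, and the helpers `build_cube` and `query` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and raw fact rows, standing in for the original data.
DIMENSIONS = ("country", "product", "year")

RAW_ROWS = [
    {"country": "CN", "product": "phone", "year": 2015, "sales": 10},
    {"country": "CN", "product": "phone", "year": 2015, "sales": 5},
    {"country": "CN", "product": "laptop", "year": 2016, "sales": 7},
    {"country": "US", "product": "phone", "year": 2016, "sales": 3},
    {"country": "US", "product": "phone", "year": 2016, "sales": 4},
]


def build_cube(rows, dimensions, measure):
    """Offline step: pre-aggregate the measure for every combination of dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


def query(cube, group_by):
    """Online step: look up the pre-aggregated cuboid; raw rows are never scanned."""
    dims = tuple(d for d in DIMENSIONS if d in group_by)
    return cube[dims]


cube = build_cube(RAW_ROWS, DIMENSIONS, "sales")
print(len(cube))                 # 8 cuboids for 3 dimensions (2 ** 3)
print(query(cube, {"country"}))  # {('CN',): 22, ('US',): 7}
```

With n dimensions the offline step materializes up to 2^n cuboids, which is the build-time and disk-space cost the quoted passage describes. The online step becomes a lookup over far fewer records.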

Besides the presence or absence of precomputation, Spark SQL and Kylin also favor different dataset sizes. If the data largely fits in memory, Spark's in-memory cache gives Spark SQL good performance. For very large datasets, however, Spark cannot avoid frequent disk reads and writes, and performance can drop sharply. Kylin's cube precomputation, by contrast, greatly reduces the amount of data handled online, which makes it better suited to very large datasets.

http://wenda.chinahadoop.cn/question/867
