In recent days, the database start-up Citus Data announced fast SQL queries on Hadoop. That in itself is not the big news, because the company's bigger goal still lies ahead. Citus Data has moved beyond Postgres by extending its high-speed analytical database, CitusDB, to Hadoop, and it is expected to expand next to MongoDB and the other database products you can already think of. Gigaom journalist Derrick Harris argues that CitusDB could be the only analytical database anyone needs, one that queries data in whatever storage environment it lives in: relational databases, Hadoop, MongoDB, Amazon S3 or elsewhere.
Big data has broadened what enterprises can do with data analysis and given them more choice in where to store their data. Combining the two, however, often means learning new languages, using multiple tools, and possibly sacrificing some performance on the analysis platform.
Citus Data's flagship product is CitusDB, which is built on PostgreSQL. Its first version was designed to give a relational database the scale and speed of systems such as Google Dremel. Because one of PostgreSQL's features is "foreign data wrappers", CitusDB can run SQL on a variety of data types, such as CSV, log, and JSON files, that are not in native Postgres format. CitusDB now officially supports the Hadoop Distributed File System (HDFS) in addition to Postgres, and it is not limited to those two.
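The idea behind a foreign data wrapper, one query interface over differently formatted sources, can be sketched in a few lines of Python. This is only an illustration of the concept, not PostgreSQL's or CitusDB's actual wrapper API; the helper names (`csv_rows`, `json_rows`, `select`) and the sample data are hypothetical.

```python
import csv
import io
import json

def csv_rows(text):
    """Expose CSV text as an iterator of dict rows (hypothetical wrapper)."""
    return csv.DictReader(io.StringIO(text))

def json_rows(text):
    """Expose newline-delimited JSON text as an iterator of dict rows."""
    return (json.loads(line) for line in text.splitlines() if line.strip())

def select(rows, columns, where):
    """A tiny SELECT: keep rows matching `where`, then project `columns`."""
    return [{c: row[c] for c in columns} for row in rows if where(row)]

# Made-up sample data in two different file formats.
CSV_DATA = "user,status\nalice,200\nbob,500\n"
JSON_DATA = '{"user": "carol", "status": 500}\n{"user": "dave", "status": 200}\n'

def is_error(row):
    # CSV values arrive as strings, JSON values as ints; normalize both.
    return int(row["status"]) >= 500

# The same "query" runs unchanged against both sources.
csv_errors = select(csv_rows(CSV_DATA), ["user"], is_error)
json_errors = select(json_rows(JSON_DATA), ["user"], is_error)
print(csv_errors, json_errors)
```

The point of the sketch is that the query logic never needs to know which format the rows came from; each wrapper only has to present its source as rows.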
Matt Ocko, who heads Data Collective and is one of the early investors in Citus Data, believes the database should technically be able to support any data source that has an ODBC driver, and even to query log files directly where they are stored. In fact, Citus is working on support for MongoDB, a capability that is now in beta. Ocko emphasizes CitusDB's ability to connect to a variety of data sources without requiring users to run separate queries and then join the results by hand. He cited the example of using CitusDB to run join queries across Postgres and Hadoop.
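What a single join across two stores looks like from the user's side can be sketched with Python's built-in `sqlite3` module. This is a stand-in only: two separate in-memory SQLite databases play the roles of the Postgres side and the Hadoop side, and the table names, schema name (`events_store`) and rows are invented for the example. It does not use CitusDB or its syntax.

```python
import sqlite3

# The main connection stands in for the Postgres side (assumption).
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for the Hadoop side.
conn.execute("ATTACH DATABASE ':memory:' AS events_store")

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE events_store.page_views (customer_id INTEGER, url TEXT)"
)

# Made-up sample rows for each store.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "alice"), (2, "bob")]
)
conn.executemany(
    "INSERT INTO events_store.page_views VALUES (?, ?)",
    [(1, "/home"), (1, "/pricing"), (2, "/home")],
)

# One SQL statement joins the two stores; nothing is stitched by hand.
rows = conn.execute(
    """
    SELECT c.name, COUNT(*) AS views
    FROM customers AS c
    JOIN events_store.page_views AS v ON v.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)
```

The alternative Ocko describes, querying each source separately and merging the results manually, would need two queries plus application code to match the rows up.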
Another point is that CitusDB is not only flexible but also very fast. Ocko says CitusDB has beaten the TPC benchmark results that Oracle touts for its Exadata machine (with data stored directly on hard disk). The Postgres-Hadoop join query mentioned above completed in just a few seconds on the Amazon EC2 cloud.
Citus co-founder Umur Cubukcu told Harris that CitusDB is so fast because of its architecture: instead of moving data across the network, it focuses on running computation where the data resides, and it has strong load balancing across resources. For example, if a task needs resources held by a very slow node, CitusDB does not wait blindly for that node but turns to other nodes that hold the same resources.
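The scheduling behavior described above can be illustrated with a small Python sketch. It is a simplified, hypothetical model rather than CitusDB's actual scheduler: it assumes each shard of data is replicated on several nodes, that a per-node load figure is known, and that a node above `slow_threshold` counts as slow. All names and numbers are invented for the example.

```python
def assign_tasks(shard_replicas, node_load, slow_threshold):
    """Assign each shard's task to a node that already holds that shard.

    Slow nodes are skipped whenever another replica holder is available,
    and among the remaining candidates the least-loaded node wins.
    """
    load = dict(node_load)  # work on a copy so the caller's dict is unchanged
    assignment = {}
    for shard, nodes in shard_replicas.items():
        healthy = [n for n in nodes if load[n] < slow_threshold]
        candidates = healthy or nodes  # fall back if every holder is slow
        chosen = min(candidates, key=lambda n: load[n])
        assignment[shard] = chosen
        load[chosen] += 1  # account for the task just placed
    return assignment

# Hypothetical cluster: node_a is overloaded, the others are mostly idle.
shard_replicas = {
    "shard_1": ["node_a", "node_b"],
    "shard_2": ["node_a", "node_c"],
    "shard_3": ["node_b", "node_c"],
}
node_load = {"node_a": 9, "node_b": 0, "node_c": 1}

plan = assign_tasks(shard_replicas, node_load, slow_threshold=5)
print(plan)
```

Because every task goes to a node that already stores the shard, no data has to cross the network, and the overloaded `node_a` receives no work as long as another replica holder exists.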
In the case of Hadoop, MapReduce also takes the computation to the data, but each job needs to scan the entire dataset. This is why Hive, the earlier SQL query tool on Hadoop, is still slow. Carl Steinbach, a Citus software engineer who previously worked at Cloudera, says CitusDB is 3 to 20 times faster than Hive, depending on the query and data type. In a typical interactive environment, the speedup for short queries can be even greater. But he also points out that such workloads are not what Hive was really designed for.
However, CitusDB's real competitors are the SQL-on-Hadoop projects, many of them from start-ups. Next month's "Structure: Data" conference has a series of sessions on the topic, where Aster Data, Platfora, Cloudera (Impala), Apache Drill, Drawn to Scale and Hadapt will show what they can do.
These are impressive technologies (at least in theory, since they are still in development), and Citus cannot simply ignore them. But beyond querying multiple data sources, Citus still has something unique that other companies do not. "When you're talking about an enterprise-class database," says Steinbach, "the conversation is about more than just a query execution engine."
Executive Editor: Xiao Yun
This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership or
reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or
complaint, to email@example.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.