Cassandra joins forces with Spark: what changes will this bring to big data analytics?
Source: Internet
Author: User
At the 2014 Spark Summit in San Francisco, database platform provider DataStax announced that, in collaboration with Spark vendor Databricks, its flagship product DataStax Enterprise 4.5 (DSE) combines the Cassandra NoSQL database with the open-source Apache Spark engine, giving users real-time analytics based on in-memory processing.
Databricks is a company founded by the creators of Apache Spark. Speaking of the partnership, DataStax Vice President John Glendenning said: "The integration of Spark and Cassandra is the first collaboration of its kind in the database industry."
Cassandra is a distributed, highly scalable database that allows users to create online applications that process large amounts of data in real time.
Apache Spark is a processing engine that can run on Hadoop clusters. It is reported to run workloads up to 100 times faster than Hadoop MapReduce when working in memory and up to 10 times faster when running on disk. Spark also provides SQL, stream processing, machine learning, and graph computation.
The combination of Cassandra and Spark makes it easier to implement end-to-end analytics workflows. In addition, analytics performance on transactional databases improves greatly, so enterprises can respond to customer demand more quickly.
The combination is good news for companies that need to provide customers with real-time recommendations and personalized online experiences.
A precedent for Cassandra/Spark: a video analytics company
The Cassandra + Spark architecture already has precedents, and Ooyala is one of them. Ooyala is a video analytics provider that handles 2 billion video events per day, with about 28 TB of data processed on approximately 220 nodes. Ooyala's technical team leader, Harry Robertson, can say with confidence: "We don't just tell our customers that their video was played 100 times over the past few days; we give them more detail, such as 80 plays from Beijing and 20 from yahoo.com." It is the Cassandra cluster that underpins all of this.
However, being able to handle large volumes of data is not enough on its own: Ooyala needs to turn "mountains" of raw events into small, actionable ones. The company had previously considered Hadoop, but Hadoop offers scalability at the expense of real-time responsiveness. It also considered real-time stream processing frameworks such as Storm, but these are best suited to fixed processing flows and are weak at flexible queries. In the end, Ooyala chose Spark, an in-memory distributed computing framework.
Ooyala now runs a Spark/Cassandra architecture.
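As a rough illustration of the kind of rollup described above (not Ooyala's actual pipeline), the plain-Python sketch below reduces raw play events to per-origin counts. The event fields `video_id` and `origin` and the function `plays_by_origin` are hypothetical names chosen for this example; in a Spark/Cassandra deployment the same grouping would run as a distributed job over data stored in Cassandra.

```python
from collections import Counter

# Hypothetical raw play events; the field names are assumptions for illustration.
events = (
    [{"video_id": "v1", "origin": "Beijing"}] * 80
    + [{"video_id": "v1", "origin": "yahoo.com"}] * 20
    + [{"video_id": "v2", "origin": "Beijing"}] * 5
)


def plays_by_origin(events, video_id):
    """Roll up the raw play events of one video into per-origin play counts."""
    return Counter(e["origin"] for e in events if e["video_id"] == video_id)


summary = plays_by_origin(events, "v1")
total_plays = sum(summary.values())
print(total_plays, dict(summary))
```

The point of the sketch is the shape of the result: many raw events collapse into a handful of counts that can be shown to a customer directly.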