Another Look at Cassandra, the NoSQL Database

Source: Internet
Author: User
Tags: cassandra hadoop ecosystem

Cassandra may not be the most fashionable NoSQL database, but it excels at certain jobs, as Netflix and Instagram can attest.

Over the years, NoSQL players such as MongoDB have drawn a lot of attention, while the halo of Apache Cassandra has faded. Facebook, which created Cassandra, gave up on it, and the Cassandra community seemed about to slide into irrelevance. But Cassandra's fortunes are turning around. Netflix recently decided to move from Oracle in its own data center to Cassandra on Amazon's cloud. Soon after, Facebook started using Cassandra again, and Twitter and WebEx use Cassandra to varying degrees.

One of the big problems with Cassandra is that it is too idiosyncratic to classify easily. Like HBase, it is a column-family database, but it stands largely on its own. It differs from document databases such as MongoDB or Couchbase and from key-value stores such as DynamoDB, Redis, and Riak; it is a category of its own.

What is a column-family database?

A column-family database stores data by row key. The result resembles a table and is easily confused with the tables of a relational database, especially since Google's famous column-family store is called Bigtable. It is important to note, however, that this "table" is not a relational table. First, a column family basically has no fixed schema, so you can add whatever columns you want to any row (though you still need to consider storage optimization, data distribution, and similar issues). Second, in Cassandra the row key is not only a way to find data; it also lets the system partition the data effectively across the cluster. The standard column family structure below looks very similar to JSON.
ColumnFamilyName {
    Rowkey1 = {Column1: "Value1", Column2: "Value2"},
    Rowkey2 = {Column1: "value2.1", Column2: "value2.2", Column3: "value2.3"},
    Rowkey3 = {Column2: "value3.2", Column3: "value3.3"},
    Rowkey4 = {Column1: "value4.1", Column3: "value4.3"}
}
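As a rough illustration of the two points above (sparse, schema-free rows and row keys that decide placement), here is a minimal Python sketch. The names `put`, `get`, and `node_for` are hypothetical helpers invented for this example, and the hash-modulo placement is a deliberate simplification, not Cassandra's actual partitioner.

```python
import hashlib

# Hypothetical in-memory model of one column family: row key -> {column: value}.
column_family = {
    "Rowkey1": {"Column1": "Value1", "Column2": "Value2"},
    "Rowkey2": {"Column1": "value2.1", "Column2": "value2.2", "Column3": "value2.3"},
    "Rowkey3": {"Column2": "value3.2", "Column3": "value3.3"},
    "Rowkey4": {"Column1": "value4.1", "Column3": "value4.3"},
}


def put(cf, row_key, column, value):
    """Add a column to a row; no schema change is needed first."""
    cf.setdefault(row_key, {})[column] = value


def get(cf, row_key, column, default=None):
    """Rows are sparse, so a missing column simply yields the default."""
    return cf.get(row_key, {}).get(column, default)


def node_for(row_key, node_count):
    """Pick a node index by hashing the row key (simplified placement)."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


put(column_family, "Rowkey1", "Column4", "Value4")
print(get(column_family, "Rowkey1", "Column4"))  # Value4
print(get(column_family, "Rowkey3", "Column1"))  # None
print({key: node_for(key, 3) for key in column_family})
```

Because placement depends only on the row key, every column of a row lands on the same node in this model, which is why the choice of row key matters for data distribution.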
The standard column family storage structure may seem somewhat limited. As a supplement, there is also the concept of a super column family, with a structure similar to the following:
SuperColumnFamily {
    ColumnFamily1 {
        Cf1rowkey1 = {Column1: "Value1", Column2: "Value2"},
        Cf1rowkey2 = {Column1: "value2.1", Column2: "value2.2", Column3: "value2.3"},
        Cf1rowkey3 = {Column2: "value3.2", Column3: "value3.3"},
        Cf1rowkey4 = {Column1: "value4.1", Column3: "value4.3"}
    },
    ColumnFamily2 {
        Cf2rowkey1 = {Column1: "Value1", Column2: "Value2"},
        Cf2rowkey2 = {Column1: "value2.1", Column2: "value2.2", Column3: "value2.3"},
        Cf2rowkey3 = {Column2: "value3.2", Column3: "value3.3"},
        Cf2rowkey4 = {Column1: "value4.1", Column3: "value4.3"}
    }
}
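The extra level of nesting means a value is addressed by three keys instead of two. A small Python sketch, assuming a plain nested-dictionary model; the `lookup` helper is hypothetical and only a subset of the rows is reproduced:

```python
# Hypothetical in-memory model: family name -> row key -> {column: value}.
super_column_family = {
    "ColumnFamily1": {
        "Cf1rowkey1": {"Column1": "Value1", "Column2": "Value2"},
        "Cf1rowkey3": {"Column2": "value3.2", "Column3": "value3.3"},
    },
    "ColumnFamily2": {
        "Cf2rowkey1": {"Column1": "Value1", "Column2": "Value2"},
        "Cf2rowkey2": {"Column1": "value2.1", "Column2": "value2.2", "Column3": "value2.3"},
    },
}


def lookup(scf, family, row_key, column, default=None):
    """Resolve a value by (family, row key, column); any missing level yields the default."""
    return scf.get(family, {}).get(row_key, {}).get(column, default)


print(lookup(super_column_family, "ColumnFamily2", "Cf2rowkey2", "Column3"))  # value2.3
print(lookup(super_column_family, "ColumnFamily1", "Cf1rowkey3", "Column1"))  # None
```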
In addition, there are several other structures, such as composite columns and other column types, including static and dynamic columns. There are also special column types, including counter columns and timestamp columns.

What is Cassandra used for?

1. Time-series data. This includes temperature sensor readings, logs, and stock prices, as well as blog data, movie clip data, and so on: kinds of data for which a document database such as MongoDB has proven a poor fit.
2. Product catalogs.
3. Recommendation systems (a subclass of information filtering systems).
4. Fraud and spam detection systems.
5. Back-end big data storage systems.

To be fair, Cassandra is not used all that often as a back-end storage system. Where it is, the main draws are its caching ability and its replication of data across a global network. If you need an off-site, active-active copy of your database, Cassandra is worth considering.

Cassandra and HBase

Cassandra and HBase each have their own advantages and disadvantages, and each has its own use cases; both belong to the Hadoop ecosystem. Cassandra suits high volumes of reads and writes where millisecond-level consistency matters less, while HBase can cover the cases where it does. Broadly speaking, Cassandra suits operational business systems, while HBase is used more for data warehousing and for workloads that need stricter consistency.

On transactions

Developers often wonder when atomic consistency is really needed. If you started out with an RDBMS, the question can be confusing, because a relational database often has to assemble one logical record from several places. One person's information, for example, may be split across multiple tables because that person has several phone numbers, several addresses, and so on. That raises the question: what does a good database design actually look like?
Under any current system, you have to compromise on data consistency. No matter what type of database you use, it cannot provide long-running transactions that are lightweight enough for Internet-scale applications: nobody can run in a persistent, stateful mode with one connection per user, and transactions often interrupt what the user is doing and hurt the user experience. That is the cost of atomic consistency.

However, in many cases a millisecond, or even 500 milliseconds, makes little difference. If I modify a row, Cassandra will eventually become consistent; if I read in the meantime, that read may be a stale read. The point is that you will hardly notice a short-lived error. Suppose you are watching a movie and find that some clips are missing because the data has not loaded; a little later you click the movie again and everything is back to normal. You do not care much about that. What you would care about is having your session strangely interrupted every time the movie catalog is updated. So at Internet scale you have to trade off performance and system size against data consistency, and sometimes consistency matters less than performance and scale.
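The stale read described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Replica` class and `sync` function are invented for this example, conflicts are resolved by a simple last-write-wins timestamp, and none of it reflects Cassandra's real replication protocol.

```python
class Replica:
    """Toy replica that stores key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, timestamp, value):
        # Last write wins: only accept a value newer than the one already held.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(source, target):
    """Push every entry from source to target; the newest timestamp wins."""
    for key, (timestamp, value) in source.data.items():
        target.write(key, timestamp, value)


a, b = Replica(), Replica()
for replica in (a, b):
    replica.write("movie:42", 1, "catalog v1")

a.write("movie:42", 2, "catalog v2")  # the update has reached replica a only

stale = b.read("movie:42")
print(stale)  # catalog v1 (a stale read)

sync(a, b)  # replication catches up a moment later
fresh = b.read("movie:42")
print(fresh)  # catalog v2
```

The reader who hits replica `b` before the sync sees the old catalog for a moment and the new one shortly afterwards, which is the short-lived inconsistency the article argues is usually acceptable.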

 
