A detailed example of NoSQL database usage

Last Update:2015-07-25 Source: Internet

Author: User

Tags cassandra neo4j couchdb

First, the basics of NoSQL

1. About NoSQL

The term "NoSQL" was actually coined by a fellow Racker (Rackspace employee), Eric Evans, when Johan Oskarsson wanted to organize an event to discuss open source distributed databases. Both the name and the concept stuck.

Some people object to the term NoSQL because it sounds as if we are defining ourselves by what we are not rather than by what we are. That is true to a degree, but the term is still valuable: when a relational database is the only tool you know, every problem looks like a thumb (think of the person whose only tool is a hammer). NoSQL makes people aware that other options exist. We are not against relational databases when they really are the best tool for the job.

One real concern with the NoSQL name is that it is such a big tent that it covers very different designs. If this is not made clear when the various products are discussed, the result is confusion. So I would like to suggest three axes along which to think about the many database options: scalability, data and query model, and persistence design.

Unprecedented data volumes are driving companies to look at alternatives to the traditional relational database technology that has served us well for more than 30 years. Collectively, these alternatives have become known as "NoSQL databases."

The fundamental problem is that relational databases cannot handle many modern workloads. There are three specific problem areas: scaling out to data sets like those of Digg, the news voting site (3 TB for green badges), Facebook (50 TB for inbox search) or eBay (2 PB overall); per-server performance; and rigid schema design.

Note: Digg is a US website that relies entirely on its users. All content on the site is submitted by users, and where each item is ranked is also decided by users: once a story has collected enough "diggs" (votes), comments and so on, it can stand out from the mass of other content.

I recently wrote about Cassandra, a non-relational database we have committed resources to, and promised to follow up with a look at the other non-relational databases at work in what is being called "the NoSQL movement."

2. Some example NoSQL databases

I have chosen a number of NoSQL databases as examples. This is not an exhaustive list, but the concepts discussed are also crucial for evaluating others.

Scalability

Scaling reads is easy with replication, so when we talk about scaling here, we mean scaling writes by automatically partitioning data across multiple machines. We call systems that do this "distributed databases." They include Cassandra, HBase, Riak, Scalaris, Voldemort, and others. If your write volume or data size is more than one machine can handle, these are your only options unless you want to manage partitioning manually.

There are two things to look for in a distributed database: 1) support for multiple data centers and 2) the ability to add new machines to a live cluster transparently to your applications.
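As a rough illustration of automatic partitioning and of adding a machine to a live cluster, here is a minimal sketch of a consistent-hash ring. This is one common partitioning technique, shown under stated assumptions: the article does not say which scheme each database uses, and the class name `HashRing`, the node names, and the number of virtual nodes are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (MD5 is used only to spread keys)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; node names and vnode count are illustrative."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns several points so that load spreads more evenly.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # grow the cluster while it is "live"
after = {k: ring.node_for(k) for k in keys}

# Only keys taken over by the new node change owner; the rest stay put.
moved = [k for k in keys if before[k] != after[k]]
```

The application only ever calls `node_for(key)`, which is the sense in which adding a machine can be transparent: some keys are handed to the new node, and every other key keeps its existing owner.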

Second, using NoSQL databases

1. NoSQL data and query model

Non-distributed NoSQL databases include CouchDB, MongoDB, Neo4j, Redis, and Tokyo Cabinet. These can serve as the persistence layer of a distributed system: MongoDB offers limited sharding support, as does the separate Lounge project for CouchDB, and Tokyo Cabinet can be used as a Voldemort storage engine.

Data and query model

NoSQL databases offer a wide variety of data models and query APIs.

Some highlights:

The columnfamily model shared by Cassandra and HBase is inspired by the one described in section 2 of Google's Bigtable paper. (Cassandra drops historical versions and adds supercolumns.) In both systems you have rows and columns as you are used to seeing, but the rows are sparse: each row can have as many or as few columns as needed, and columns do not have to be defined in advance.

The key/value model is the simplest and easiest to implement, but it is inefficient when you are interested in querying or updating only part of a value. It is also difficult to build more sophisticated structures on top of a distributed key/value store.

Document databases are essentially the next level of key/value, allowing nested values to be associated with each key. They support querying those values more efficiently than simply returning the entire blob each time.

Neo4j has a truly unique data model, storing objects and relationships as nodes and edges in a graph. For queries that fit this model (for example, hierarchical data), it can be 1000 times faster than the alternatives.

Scalaris is unique in offering distributed transactions across multiple keys. (Discussing the trade-offs between consistency and availability is beyond the scope of this post, but it is another aspect to keep in mind when evaluating distributed systems.)
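To make the first three models above more concrete, here is a minimal sketch that mimics each one with plain Python dictionaries. It does not use any real database API: the store variables and the helpers `kv_update_city`, `doc_get` and `doc_set` are hypothetical names invented for illustration, and the sample records are made up.

```python
import json

# Column-family style: each row is a sparse mapping of column name -> value.
# Rows need not share columns, and columns are not declared ahead of time.
column_family = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Lin", "city": "Oslo", "last_login": "2015-07-25"},
}
column_family["user:1"]["city"] = "London"  # new column on a single row

# Key/value style: the store sees only an opaque blob per key, so changing
# one field means reading, decoding, modifying and rewriting the whole value.
kv_store = {"user:1": json.dumps({"name": "Ada", "address": {"city": "London"}})}


def kv_update_city(store, key, city):
    value = json.loads(store[key])   # fetch and decode the entire blob
    value["address"]["city"] = city
    store[key] = json.dumps(value)   # write the entire blob back


# Document style: the store understands nested values, so a single field
# can be read or updated by its path.
doc_store = {"user:1": {"name": "Ada", "address": {"city": "London"}}}


def doc_get(store, key, path):
    node = store[key]
    for part in path.split("."):
        node = node[part]
    return node


def doc_set(store, key, path, value):
    *parents, leaf = path.split(".")
    node = store[key]
    for part in parents:
        node = node[part]
    node[leaf] = value


kv_update_city(kv_store, "user:1", "Paris")
doc_set(doc_store, "user:1", "address.city", "Paris")
city_from_doc = doc_get(doc_store, "user:1", "address.city")
```

In this toy version everything lives in one process, so the cost difference is invisible. In a real deployment the key/value update would move the whole blob over the network twice, which is the inefficiency described above.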

Persistence design

By persistence design I mean, "How is the data stored internally?" The persistence model tells us a lot about what kinds of workloads these databases will be good at.

In-memory databases are very, very fast (Redis reaches more than 100,000 operations per second on a single machine), but they cannot work with data sets that exceed the available RAM. Durability (retaining data even if the server crashes or loses power) can also be a problem: the amount of data you can expect to lose between flushes (copying data to disk) is potentially large. Scalaris, the other in-memory database on our list, tackles the durability problem with replication, but because it does not support multiple data centers, your data will still be vulnerable to things like power outages.

Memtables and SSTables buffer writes in memory (a "memtable") after writing them to an append-only commit log for durability. When enough writes have been accepted, the memtable is sorted and written to disk all at once as an "sstable." This provides close to in-memory performance because no seeks are involved, while avoiding the durability problems of purely in-memory approaches. (This is described in more detail in sections 5.3 and 5.4 of the previously mentioned Bigtable paper, as well as in the log-structured merge-tree literature.)
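The write path just described can be sketched in a few dozen lines. This is a toy model under stated assumptions, not how Cassandra or HBase actually implement it: the class name `TinyLSM`, the JSON file format, the file names and the tiny `memtable_limit` are all hypothetical, and real systems add compaction, indexes and log replay on restart.

```python
import json
import os
import tempfile


class TinyLSM:
    """Toy store: commit log + in-memory memtable + sorted on-disk sstables."""

    def __init__(self, directory, memtable_limit=3):
        self.directory = directory
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sstables = []  # file paths, oldest first
        self.log_path = os.path.join(directory, "commit.log")
        self.log = open(self.log_path, "a", encoding="utf-8")

    def put(self, key, value):
        # 1) Append to the commit log first so the write survives a crash.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2) Buffer the write in memory.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Sort the memtable and write it out sequentially as one sstable.
        path = os.path.join(self.directory, f"sstable-{len(self.sstables)}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(sorted(self.memtable.items()), f)
            f.flush()
            os.fsync(f.fileno())
        self.sstables.append(path)
        self.memtable = {}
        # The flushed writes are on disk now, so the log can start over.
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.sstables):  # newest sstable wins
            with open(path, encoding="utf-8") as f:
                for k, v in json.load(f):
                    if k == key:
                        return v
        return None

    def close(self):
        self.log.close()


with tempfile.TemporaryDirectory() as d:
    db = TinyLSM(d, memtable_limit=3)
    for i, key in enumerate(["c", "a", "b", "a"]):
        db.put(key, i)
    result = (db.get("a"), db.get("b"), len(db.sstables), db.memtable)
    db.close()
```

Every write is a sequential append (to the log, and later to a fresh sstable file), so writes never need a seek. The price is paid on reads, which may have to consult the memtable and several sstables.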

B-trees have been used in databases since practically the beginning of time. They provide robust indexing support, but they perform poorly on rotating disks (which are still by far the most cost-effective) because of the multiple seeks involved in reading or writing anything.

An interesting variant is CouchDB's append-only B-tree, which avoids the overhead of seeks at the cost of limiting CouchDB to one write at a time.

Conclusion

The NoSQL movement exploded in 2009 as more and more companies grappled with massive amounts of data. The Rackspace Cloud is pleased to have played an early role in the NoSQL movement and continues to devote resources to Cassandra and to supporting NoSQL events.

This article draws on http://database.51cto.com/art/201004/192283.htm

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
