About NoSQL
NoSQL (NoSQL = Not Only SQL) means "not only SQL".
Modern computing systems generate a huge amount of data on the network every day.
A large share of this data is handled by relational database management systems (RDBMS). E. F. Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks" made data modeling and application programming much easier.
Practice has shown that the relational model is well suited to client-server programming, with benefits far beyond what was expected, and today it is the dominant technology for storing structured data in web and business applications.
NoSQL is a new, revolutionary movement in the database field. It was proposed early on and became a growing trend by 2009. NoSQL advocates promote non-relational data storage, which brings a genuinely new way of thinking compared with the overwhelming dominance of relational databases.
Relational databases follow the ACID rules
A transaction, much like a transaction in the real world, has the following four properties:
1. A (Atomicity)
Atomicity is easy to understand: the operations in a transaction are either all performed or none are. A transaction succeeds only if every operation in it succeeds; if any single operation fails, the whole transaction fails and must be rolled back.
For example, transferring 100 yuan from account A to account B takes two steps: 1) withdraw 100 yuan from account A; 2) deposit 100 yuan into account B. The two steps must either both complete or both not happen. If only the first step completed and the second failed, 100 yuan would simply disappear.
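The transfer above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production design: the `accounts` table and the `transfer` and `balances` helpers are hypothetical names, and a missing target account stands in for "the second step fails". Using the connection as a context manager commits if the block finishes and rolls back if it raises, so the first UPDATE is undone when the second one fails.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to account `dst` in one transaction."""
    # `with conn:` commits if the block completes and rolls back if it raises.
    with conn:
        # Step 1: withdraw from the source account.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        # Step 2: deposit into the target account.
        cur = conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        if cur.rowcount != 1:
            # Treat a missing target account as "step 2 failed".
            raise ValueError(f"unknown account {dst!r}")


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('A', 500), ('B', 0)")
conn.commit()

transfer(conn, "A", "B", 100)  # both steps succeed
try:
    transfer(conn, "A", "C", 100)  # step 2 fails, so step 1 is rolled back too
except ValueError:
    pass
print(balances(conn))  # {'A': 400, 'B': 100}
```

The failed transfer leaves account A at 400 rather than 300, because its withdrawal was rolled back together with the failed deposit.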
2. C (Consistency)
Consistency is also fairly easy to understand: the database must always be in a consistent state, and executing a transaction must not violate the database's existing integrity constraints.
For example, given an integrity constraint a + b = 10, if a transaction changes a, it must also change b so that a + b = 10 still holds; otherwise the transaction fails.
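A database can enforce a rule like a + b = 10 itself. The sketch below, again using sqlite3 with a hypothetical `pair` table and `update_pair` helper, declares the rule as a CHECK constraint. Note that SQLite checks CHECK constraints at the end of each statement rather than at commit, so in this sketch a and b have to change in the same UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity constraint a + b = 10 is declared once, in the schema.
conn.execute("CREATE TABLE pair (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, CHECK (a + b = 10))")
conn.execute("INSERT INTO pair (id, a, b) VALUES (1, 3, 7)")
conn.commit()


def update_pair(new_a, new_b=None):
    """Try to change a (and optionally b); return True if the transaction committed."""
    try:
        with conn:
            if new_b is None:
                conn.execute("UPDATE pair SET a = ? WHERE id = 1", (new_a,))
            else:
                conn.execute("UPDATE pair SET a = ?, b = ? WHERE id = 1", (new_a, new_b))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the change and the transaction was rolled back.
        return False


print(update_pair(4))     # False: changing only a would make a + b = 11
print(update_pair(4, 6))  # True: a and b change together, a + b is still 10
print(conn.execute("SELECT a, b FROM pair").fetchone())  # (4, 6)
```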
3. I (Isolation)
Isolation means that concurrent transactions do not affect one another. If a transaction reads data that another transaction is modifying, then as long as the other transaction has not committed, the data it reads is unaffected by the uncommitted changes.
For example, while a transaction transferring 100 yuan from account A to account B is still in progress, if B checks their own balance at that moment, they will not see the extra 100 yuan.
4. D (Durability)
Durability means that once a transaction commits, its changes are permanently saved in the database and survive even a crash or power outage.
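A rough way to see durability with sqlite3: commit one change, leave another uncommitted, close the connection, and reopen the file. Closing and reopening is only a stand-in for a real crash, and how strongly SQLite protects against power loss also depends on settings such as `PRAGMA synchronous`; the table and file name are hypothetical.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "durable.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE log (msg TEXT)")
conn.execute("INSERT INTO log VALUES ('committed')")
conn.commit()  # this change is now stored durably
conn.execute("INSERT INTO log VALUES ('never committed')")
conn.close()   # close() does not commit, so the open transaction is discarded

# Reopening the file plays the role of restarting after an outage.
conn = sqlite3.connect(path)
rows = [r[0] for r in conn.execute("SELECT msg FROM log")]
print(rows)  # ['committed']
conn.close()
```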
Distributed Systems
A distributed system consists of multiple computers and software components that communicate over a computer network (a local area network or a wide area network).
A distributed system is a software system built on top of a network. Because of the nature of software, distributed systems are highly cohesive and transparent.
Therefore, the difference between a network and a distributed system lies more in the high-level software (especially the operating system) than in the hardware.
Distributed systems can be used on different platforms such as PCs, workstations, LANs and WANs.
Benefits of distributed computing
Reliability (Fault tolerance):
One of the important advantages of distributed computing systems is reliability. A system crash on one server does not affect the rest of the servers.
Scalability:
In distributed computing systems, more machines can be added as needed.
Resource sharing:
Sharing data is essential for applications such as banking and reservation systems.
Flexibility:
Because the system is highly flexible, new services are easy to install, implement, and debug.
Faster Speed:
A distributed computing system combines the computing power of more than one computer, so it can process work faster than other systems.
Open Systems:
Because it is an open system, services can be accessed both locally and remotely.
Higher performance:
Compared with centralized computer network clusters, it can deliver higher performance (and better value for money).
Disadvantages of distributed computing
Troubleshooting:
Troubleshooting and diagnosing problems is difficult.
Software:
Limited software support is a major disadvantage of distributed computing systems.
Network:
Network infrastructure issues, including transmission problems, high load, and information loss.
Security:
The open nature of distributed computing systems brings problems such as data security risks and the risks of data sharing.
What is NoSQL?
NoSQL refers to non-relational databases. Sometimes read as an abbreviation of "Not Only SQL", it is a generic term for database management systems that differ from traditional relational databases.
NoSQL is used to store data at very large scale (for example, Google or Facebook collects trillions of bits of data about their users every day). These data stores do not require a fixed schema and can scale horizontally without extra effort.
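Horizontal scaling usually means spreading records over more machines. The toy store below is a hypothetical illustration (not any product's API): each key is assigned to a shard by hashing, and records are plain dictionaries, so different records can carry different fields.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads records over N shards (hypothetical)."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash (not Python's per-process randomized hash()) picks the owning shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, record):
        # No fixed schema: each record may carry different fields.
        self._shard_for(key)[key] = record

    def get(self, key):
        return self._shard_for(key).get(key)


store = ShardedStore(num_shards=4)
store.put("user:1", {"name": "Ann", "age": 30})
store.put("user:2", {"name": "Bo", "city": "Oslo", "tags": ["admin"]})
print(store.get("user:2")["city"])     # Oslo
print([len(s) for s in store.shards])  # how the two keys landed across the shards
```

Plain modulo placement moves most keys whenever the number of shards changes; systems such as Cassandra use consistent hashing to limit that movement.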
Why use NoSQL?
Today we can easily access and crawl data through third-party platforms (e.g., Google, Facebook). Users' personal information, social networks, geographic locations, user-generated data, and operation logs have multiplied. SQL databases are not well suited to mining this kind of user data, whereas NoSQL databases have developed to handle such big data very well.
Examples
Social networking:
Each record: UserID1, UserID2
Separate records: UserID, first_name, last_name, age, gender, ...
Task: find all friends of friends of friends of friends of a given user.
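With friendships stored as (UserID1, UserID2) records, the task is a breadth-first search over the friendship graph. The sketch below assumes friendships are mutual and reads "friends of friends of friends of friends" as users exactly four hops away who are not reachable by a shorter path; the sample data and the `friends_at_distance` function are made up for illustration. In a relational database the same query typically needs the friendship table joined with itself four times.

```python
from collections import defaultdict

# Each record is (UserID1, UserID2): one friendship, treated as mutual.
friendships = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 6), (6, 7)]


def friends_at_distance(edges, start, depth):
    """Return users exactly `depth` hops from `start` (breadth-first search)."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        # Expand one hop, skipping anyone already reached at a shorter distance.
        frontier = {n for u in frontier for n in graph[u]} - seen
        seen |= frontier
    return frontier


print(friends_at_distance(friendships, 1, 4))  # {5}
```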
Wikipedia page:
Large collection of documents
Combination of structured and unstructured data
Task: retrieve all pages about athletics at the Summer Olympics before 1950.
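The Wikipedia task mixes a structured filter (games and year) with a search of free text. The sketch below uses a handful of made-up page documents and a hypothetical `athletics_summer_before` function; a real document store would answer this with indexes rather than a linear scan.

```python
# Made-up page documents: some structured fields, some free text, not all fields present.
pages = [
    {"title": "Athletics at the 1948 Summer Olympics", "games": "Summer Olympics",
     "year": 1948, "text": "Track and field results ..."},
    {"title": "Swimming at the 1936 Summer Olympics", "games": "Summer Olympics",
     "year": 1936, "text": "Pool events ..."},
    {"title": "Athletics at the 1960 Summer Olympics", "games": "Summer Olympics",
     "year": 1960, "text": "Track and field results ..."},
    {"title": "1924 Olympic marathon", "games": "Summer Olympics",
     "year": 1924, "text": "An athletics event run over the marathon distance."},
    {"title": "History of athletics", "text": "Free-form article with no structured fields."},
]


def athletics_summer_before(docs, year_limit):
    """Titles of Summer Olympics pages before `year_limit` that mention athletics."""
    hits = []
    for doc in docs:
        year = doc.get("year")  # structured field; may be missing
        about_athletics = ("athletics" in doc.get("title", "").lower()
                           or "athletics" in doc.get("text", "").lower())
        if (year is not None and year < year_limit
                and doc.get("games") == "Summer Olympics" and about_athletics):
            hits.append(doc["title"])
    return hits


print(athletics_summer_before(pages, 1950))
# ['Athletics at the 1948 Summer Olympics', '1924 Olympic marathon']
```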
RDBMS vs NoSQL
RDBMS
- Highly organized, structured data
- Structured Query Language (SQL)
- Data and relationships are stored in separate tables
- Data manipulation language, data definition language
- Strict consistency
- Basic transactions
NoSQL
- Stands for "Not Only SQL"
- No declarative query language
- No predefined schema
- Key-value stores, column stores, document stores, graph databases
- Eventual consistency rather than ACID properties
- Unstructured and unpredictable data
- CAP theorem
- High performance, high availability, and scalability
A Brief History of NoSQL
The term NoSQL first appeared in 1998 as the name of a lightweight, open-source relational database developed by Carlo Strozzi that did not provide an SQL interface.
In 2009, Johan Oskarsson of Last.fm started a discussion about open-source distributed databases [2], and Eric Evans of Rackspace proposed the term NoSQL again. At that time NoSQL mainly referred to non-relational, distributed database designs that do not provide ACID guarantees.
The no:sql(east) conference held in Atlanta in 2009 was a milestone; its slogan was "Select Fun, Profit from Real_world where Relational=false;". Hence the most common interpretation of NoSQL is "non-relational", which emphasizes the advantages of key-value stores and document databases rather than simple opposition to RDBMSs.
CAP theorem
In computer science, the CAP theorem, also known as Brewer's theorem, states that a distributed computing system cannot simultaneously provide all three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (every request receives a response, whether it succeeded or failed)
- Partition tolerance (the system keeps operating despite the loss of arbitrary messages or the failure of part of the system)
The core of the CAP theorem is that a distributed system cannot meet all three requirements of consistency, availability, and partition tolerance at the same time; it can satisfy at most two of them.
Therefore, according to the CAP principle, NoSQL databases are divided into three categories: those satisfying CA, those satisfying CP, and those satisfying AP.
- CA: single-site clusters; systems that satisfy consistency and availability, but are usually less scalable.
- CP: systems that satisfy consistency and partition tolerance; their performance is usually not particularly high.
- AP: systems that satisfy availability and partition tolerance; they usually have weaker consistency requirements.
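The trade-off between CP and AP can be illustrated with a toy model of two replicas. This is a hypothetical sketch, not how any particular database works: while the replicas are partitioned, the CP variant refuses writes to stay consistent, and the AP variant accepts them locally and lets the replicas diverge. CA is not modeled because it assumes partitions never happen.

```python
class TwoReplicaStore:
    """Toy model of one key-value store replicated on two nodes (hypothetical)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]  # one dict per node
        self.partitioned = False  # True while the nodes cannot talk to each other

    def write(self, node, key, value):
        if not self.partitioned:
            # Healthy network: apply the write to every replica before answering.
            for replica in self.replicas:
                replica[key] = value
            return "ok"
        if self.mode == "CP":
            # The other replica is unreachable: refuse the write to stay consistent.
            return "unavailable"
        # AP: stay available by accepting the write locally; replicas now differ.
        self.replicas[node][key] = value
        return "ok (local only)"

    def read(self, node, key):
        return self.replicas[node].get(key)


cp = TwoReplicaStore("CP")
cp.write(0, "x", 1)
cp.partitioned = True
cp_result = cp.write(0, "x", 2)
print(cp_result, cp.read(0, "x"), cp.read(1, "x"))  # unavailable 1 1

ap = TwoReplicaStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap_result = ap.write(0, "x", 2)
print(ap_result, ap.read(0, "x"), ap.read(1, "x"))  # ok (local only) 2 1
```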
The pros/cons of NoSQL
Advantages:
- High scalability
- Distributed computing
- Low cost
- Flexible architecture, semi-structured data
- No complicated relationships
Disadvantages:
- No standardization
- Limited query functionality (so far)
- Eventual consistency makes programs less intuitive to write
BASE
BASE: Basically Available, Soft state, Eventually consistent. Defined by Eric Brewer.
BASE describes the relaxed requirements that NoSQL databases typically place on availability and consistency:
- Basically Available: the system remains basically available.
- Soft state: soft state / flexible transactions. "Soft state" can be understood as "connectionless", while "hard state" is "connection-oriented".
- Eventual consistency: the data eventually becomes consistent, which is also the ultimate goal of ACID.
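One common way to reach eventual consistency is last-writer-wins replication with periodic anti-entropy merges. In the hypothetical sketch below, each write carries a caller-supplied timestamp; replicas accept writes immediately (basically available), may temporarily disagree (soft state), and converge once they exchange state (eventual consistency). Last-writer-wins silently discards the older of two concurrent writes, which is one reason eventual consistency can be unintuitive to program against.

```python
class LwwReplica:
    """Replica of a key-value map using last-writer-wins merging (hypothetical sketch)."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, ts):
        # Accept every local write immediately, even if other replicas are unreachable.
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Anti-entropy: adopt each entry from `other` that is newer than ours.
        # Equal timestamps are not tie-broken here; real systems need a rule for that.
        for key, (ts, value) in other.data.items():
            mine = self.data.get(key)
            if mine is None or ts > mine[0]:
                self.data[key] = (ts, value)


a, b = LwwReplica(), LwwReplica()
a.write("cart", ["book"], ts=1)
b.write("cart", ["pen"], ts=2)
print(a.read("cart"), b.read("cart"))  # ['book'] ['pen']

a.merge_from(b)
b.merge_from(a)
print(a.read("cart"), b.read("cart"))  # ['pen'] ['pen']
```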
ACID vs BASE
ACID
- Atomicity
- Consistency
- Isolation
- Durability
BASE
- Basically Available
- Soft state / flexible transactions
- Eventual consistency
NoSQL Database Classification
Column Storage
HBase
Cassandra
Hypertable
As the name implies, data is stored by column. The main strengths are convenient storage of structured and semi-structured data and easy data compression; queries on one or a few columns gain a large I/O advantage.
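The difference between row and column layout can be sketched in a few lines of Python. The sample readings are made up; the point is that an aggregate over `temp` touches only that column's list, and a column of repeated values compresses well with run-length encoding (one of several schemes column stores use).

```python
from itertools import groupby

# Made-up records laid out row by row: (day, city, temp).
rows = [
    ("2024-01-01", "Oslo", 3),
    ("2024-01-02", "Oslo", 4),
    ("2024-01-03", "Oslo", 2),
    ("2024-01-01", "Rome", 12),
    ("2024-01-02", "Rome", 14),
]
column_names = ("day", "city", "temp")

# Column layout: one contiguous list per column instead of one tuple per record.
columns = {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def run_length_encode(values):
    """Compress runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# An aggregate over one column reads only that column's values.
avg_temp = sum(columns["temp"]) / len(columns["temp"])
print(avg_temp)                            # 7.0
print(run_length_encode(columns["city"]))  # [('Oslo', 3), ('Rome', 2)]
```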
Document storage
MongoDB
CouchDB
Document stores typically store data in a JSON-like format, and the stored content is document-based. This also makes it possible to index certain fields and implement some functions of a relational database.
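A toy document store shows how indexing a field works. This sketch is hypothetical and far simpler than MongoDB or CouchDB: documents are kept as JSON text, `create_index` builds a value-to-ids map for one field, and `find` uses the index when one exists and otherwise scans every document.

```python
import json
from collections import defaultdict


class DocumentStore:
    """Toy document store: JSON documents plus optional per-field indexes (hypothetical)."""

    def __init__(self):
        self.docs = {}     # doc_id -> JSON text
        self.indexes = {}  # field -> {value -> set of doc_ids}; values must be hashable

    def create_index(self, field):
        index = defaultdict(set)
        for doc_id, raw in self.docs.items():
            doc = json.loads(raw)
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def insert(self, doc_id, doc):
        # Assumes doc_id is new; updating an existing document is not handled here.
        self.docs[doc_id] = json.dumps(doc)
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find(self, field, value):
        if field in self.indexes:
            ids = self.indexes[field].get(value, set())
        else:
            # No index on this field: fall back to scanning every document.
            ids = {i for i, raw in self.docs.items() if json.loads(raw).get(field) == value}
        return sorted(ids)


store = DocumentStore()
store.insert(1, {"name": "Ann", "city": "Oslo"})
store.insert(2, {"name": "Bo", "city": "Rome", "tags": ["admin"]})
store.create_index("city")
store.insert(3, {"name": "Cy", "city": "Oslo"})
print(store.find("city", "Oslo"))  # [1, 3]  (answered from the index)
print(store.find("name", "Bo"))    # [2]     (unindexed field: full scan)
```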
Key-value Storage
Tokyo Cabinet/Tyrant
Berkeley DB
MemcacheDB
Redis
You can quickly look up a value by its key. In general, the store accepts values in any format. (Redis provides additional features.)
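The core of a key-value store is a map from keys to opaque values. The minimal sketch below (a hypothetical class, not the API of any store listed above) keeps values as raw bytes and never inspects them, which is why any value format is accepted.

```python
class KeyValueStore:
    """Minimal in-memory key-value store with opaque byte values (hypothetical sketch)."""

    def __init__(self):
        self._data = {}  # key (str) -> value (bytes)

    def set(self, key, value):
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("values are stored as raw bytes")
        # The store never parses the value, so any format (JSON, images, ...) is accepted.
        self._data[key] = bytes(value)

    def get(self, key):
        # A lookup is a single hash-table probe on the key.
        return self._data.get(key)

    def delete(self, key):
        return self._data.pop(key, None) is not None


kv = KeyValueStore()
kv.set("session:42", b'{"user": "ann"}')
kv.set("avatar:42", bytes([0x89, 0x50, 0x4E, 0x47]))
print(kv.get("session:42"))                         # b'{"user": "ann"}'
print(kv.delete("avatar:42"), kv.get("avatar:42"))  # True None
```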
Graph Storage
Neo4j
FlockDB
Best suited to storing graph relationships. Handling them with a traditional relational database performs poorly and is inconvenient to design and use.
Object storage
db4o
Versant
The database is manipulated with object-oriented syntax, and data is accessed through objects.
XML database
Berkeley DB XML
BaseX
Efficiently stores XML data and supports XML query languages such as XQuery and XPath.
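Python's standard `xml.etree.ElementTree` module is not an XML database, but it supports a limited subset of XPath that gives the flavor of such queries; full XPath and XQuery require a dedicated engine such as BaseX. The sample catalog below is made up.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<library>
  <book year="1998"><title>A</title><tag>nosql</tag></book>
  <book year="2009"><title>B</title><tag>cap</tag></book>
  <book year="2009"><title>C</title><tag>nosql</tag></book>
</library>
""".strip())

# [@attr='value'] selects elements whose attribute has the given value.
titles_2009 = [b.findtext("title") for b in catalog.findall("./book[@year='2009']")]
print(titles_2009)  # ['B', 'C']

# [tag='text'] selects elements that have a child <tag> with exactly that text.
nosql_books = [b.findtext("title") for b in catalog.findall("./book[tag='nosql']")]
print(nosql_books)  # ['A', 'C']
```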
Who uses NoSQL
Many companies now use NoSQL:
- Google
- Facebook
- Mozilla
- Adobe
- Foursquare
- LinkedIn
- Digg
- McGraw-Hill Education
- Vermont Public Radio