Source: http://blog.csdn.net/robinjwong/article/details/18502195
1. Relational database
A relational database is a database that organizes data according to the relational model.
The relational model was first proposed by IBM researcher Dr. E. F. Codd in 1970. Over the following decades the concept was developed extensively, and it gradually became the mainstream model for database structure.
Put simply, the relational model is a model of two-dimensional tables, and a relational database is a data organization made up of two-dimensional tables and the relationships between them.
Common concepts in the relational model:
- Relation: can be understood as a two-dimensional table. Each relation has a relation name, which is what we usually call the table name
- Tuple: can be understood as a row in the two-dimensional table, usually called a record in a database
- Attribute: can be understood as a column in the two-dimensional table, usually called a field in a database
- Domain: the range of values an attribute may take, that is, the value constraint on a column in a database
- Key: a set of attributes that uniquely identifies a tuple, usually called the primary key in a database; it consists of one or more columns
- Relation schema: the description of a relation, written as relation name (attribute 1, attribute 2, ..., attribute N); in a database this is called the table structure
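The concepts above can be mapped onto a small table. The following is a minimal sketch using Python's built-in sqlite3 module; the `student` table, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation schema: student(id, name, age), i.e. the table structure.
conn.execute("""
    CREATE TABLE student (
        id   INTEGER PRIMARY KEY,                   -- key: identifies a tuple
        name TEXT NOT NULL,                         -- attribute (field)
        age  INTEGER CHECK (age BETWEEN 0 AND 150)  -- domain of the attribute
    )
""")

# Each inserted row is one tuple (record) of the relation.
conn.executemany(
    "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
    [(1, "Alice", 20), (2, "Bob", 22)],
)

cursor = conn.execute("SELECT id, name, age FROM student ORDER BY id")
attributes = [column[0] for column in cursor.description]
tuples = cursor.fetchall()

# A value outside the declared domain is refused by the database.
try:
    conn.execute("INSERT INTO student (id, name, age) VALUES (3, 'Carol', 999)")
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

print(attributes)
print(tuples)
print(out_of_domain_rejected)
```

Here the table is the relation, each row a tuple, each column an attribute, the CHECK clause a stand-in for the domain, and the `id` column the key.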
The benefits of a relational database:
- Easy to understand: the two-dimensional table structure is very close to how we think about the logical world, so the relational model is easier to understand than the network, hierarchical, and other models
- Easy to use: the common SQL language makes it convenient to operate on relational databases
- Easy to maintain: rich integrity constraints (entity integrity, referential integrity, and user-defined integrity) greatly reduce the likelihood of data redundancy and data inconsistency
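As a rough illustration of the three kinds of integrity named in the last bullet, the sketch below uses Python's sqlite3 module. The `author` and `post` tables are hypothetical, and the `PRAGMA foreign_keys = ON` line is needed only because SQLite leaves foreign-key enforcement off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE post (
        id        INTEGER PRIMARY KEY,                    -- entity integrity
        author_id INTEGER NOT NULL REFERENCES author(id), -- referential integrity
        title     TEXT NOT NULL CHECK (length(title) > 0) -- user-defined integrity
    )
""")
conn.execute("INSERT INTO author VALUES (1, 'Alice')")
conn.execute("INSERT INTO post VALUES (10, 1, 'Hello')")


def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_key_rejected = rejected("INSERT INTO post VALUES (10, 1, 'Again')")
orphan_post_rejected = rejected("INSERT INTO post VALUES (11, 99, 'Orphan')")
empty_title_rejected = rejected("INSERT INTO post VALUES (12, 1, '')")
delete_parent_rejected = rejected("DELETE FROM author WHERE id = 1")

print(duplicate_key_rejected, orphan_post_rejected,
      empty_title_rejected, delete_parent_rejected)
```

Because the database itself refuses the inconsistent rows, application code does not need to re-check these rules on every write.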
2. Relational database bottlenecks
- High-concurrency read and write demands
Website user concurrency is very high, often reaching tens of thousands of read and write requests per second. For a traditional relational database, hard disk I/O is a major bottleneck.
- Efficient reading and writing of massive data
The amount of data generated every day is huge, and for a relational database, querying a table that holds massive amounts of data is very inefficient.
- High scalability and availability
In a Web-based architecture, the database is the hardest tier to scale horizontally. When the number of users and the traffic of an application keep growing, the database cannot extend its performance and load capacity simply by adding more hardware and service nodes, the way Web servers and application servers can. For the many websites that must provide uninterrupted 24-hour service, upgrading and scaling the database system is very painful and often requires downtime for maintenance and data migration.
Many features of a relational database are no longer needed by such websites:
- Transactional consistency
A relational database incurs considerable overhead in maintaining transactional consistency, yet many Web 2.0 systems do not require strong consistency for their reads and writes.
In a relational database, a query issued immediately after inserting a row is guaranteed to read that row. Many Web applications do not need such real-time behavior: for example, after a message is posted, it is perfectly acceptable for the update to become visible a few seconds, or even more than ten seconds, later.
- Complex SQL, especially multi-table join queries
Any Web system with a large volume of data avoids join queries across several large tables, as well as complex SQL report queries of the data-analysis type. SNS-type websites in particular are designed, from the requirements and product design stage, to avoid such queries. What remains is often just primary-key lookups on a single table and simple conditional paging queries on a single table, so much of the power of SQL goes unused.
The main causes of poor performance in a relational database are join queries across multiple tables and complex SQL report queries of the data-analysis type. To guarantee the ACID properties of the database, tables must be designed as far as possible according to the required normal forms, and every table in a relational database stores data in a fixed, formatted structure. Every tuple has the same set of fields: even if a given tuple does not need all of them, the database still allocates every field for every tuple. This structure makes joins between tables convenient, but from another point of view it is also a factor in the performance bottleneck of relational databases.
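The fixed row structure and the multi-table join described above can be sketched with Python's sqlite3 module. The `users` and `messages` tables and their contents are hypothetical; the example shows the shape of the queries, not their cost at scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        name     TEXT,
        phone    TEXT,
        blog_url TEXT
    );
    CREATE TABLE messages (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER,
        body    TEXT
    );
    INSERT INTO users VALUES (1, 'Alice', NULL, NULL);
    INSERT INTO users VALUES (2, 'Bob', '555-0100', NULL);
    INSERT INTO messages VALUES (1, 1, 'hi');
    INSERT INTO messages VALUES (2, 1, 'again');
    INSERT INTO messages VALUES (3, 2, 'hello');
""")

# Fixed structure: Alice's row carries every column, even the ones she never uses.
alice_row = conn.execute("SELECT * FROM users WHERE id = 1").fetchone()

# Multi-table join: every message has to be matched against the users table.
joined = conn.execute("""
    SELECT users.name, messages.body
    FROM messages
    JOIN users ON users.id = messages.user_id
    ORDER BY messages.id
""").fetchall()

print(alice_row)
print(joined)
```

With a handful of rows the join is trivial; the concern raised in the text is what happens when both tables are very large.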
3. NoSQL
The term NoSQL was first used by Carlo Strozzi in 1998 for a lightweight, open-source relational database he had developed that did not provide a SQL interface. That definition is very different from how NoSQL is understood today: the name literally meant a database "without SQL". As NoSQL developed, it drifted away from that original intent; what people want is not "no SQL" but "no relational", which is what we now usually call a non-relational database.
In early 2009, Johan Oskarsson organized a discussion about open-source distributed databases, and Eric Evans reintroduced the term "NoSQL" to refer to data storage systems that are non-relational, distributed, and generally do not guarantee the ACID properties. Eric Evans did not choose the term for its literal "no SQL" meaning; he simply felt that many classic relational databases had names of the form "**SQL", so he used "NoSQL" to signal a positioning very different from theirs.
Note: database transactions must have the ACID properties. ACID stands for atomicity, consistency, isolation, and durability.
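Atomicity, the first of the ACID properties, can be illustrated with a short sketch using Python's sqlite3 module. The `account` table, the balances, and the `transfer` helper are hypothetical; the point is only that a failed transaction leaves no partial changes behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("INSERT INTO account VALUES (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = transfer(conn, 1, 2, 30)    # enough funds: both updates are committed
second = transfer(conn, 1, 2, 500)  # debit violates the CHECK: credit is rolled back too
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()

print(first, second, balances)
```

In the second call the credit to account 2 has already been applied when the debit fails, yet the rollback undoes it, so the balances stay consistent.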
Non-relational databases take a different approach, for example storing data as key-value pairs with no fixed structure. Each tuple can have different fields and can add its own key-value pairs as needed, so it is not bound to a fixed structure, which can save some time and space overhead. Users can add whatever fields they need, so retrieving different pieces of information about a user does not require a multi-table join as it would in a relational database; the query is completed simply by fetching the corresponding value by ID. However, a non-relational database enforces few constraints and cannot offer queries on field values the way a SQL WHERE clause does, and it is hard to express integrity in the design. It is therefore best suited to storing relatively simple data, while a SQL database is more appropriate for data that needs more complex queries.
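The key-value idea can be sketched with a plain Python dictionary standing in for the store. This is an in-memory toy, not the API of any real key-value database; the keys, fields, and the `find_by_field` helper are hypothetical.

```python
# A toy key-value store: each key maps to a record with its own set of fields.
store = {}

# Records need not share a structure; each one carries only the fields it uses.
store["user:1"] = {"name": "Alice", "blog_url": "http://example.com/alice"}
store["user:2"] = {"name": "Bob", "phone": "555-0100", "tags": ["admin"]}

# A record can gain a new field later without any schema change.
store["user:1"]["city"] = "Paris"

# Reading everything about a user is a single lookup by ID, with no join.
alice = store["user:1"]


def find_by_field(store, field, value):
    """Query by field value: without a WHERE clause or index, scan every record."""
    return [key for key, record in store.items() if record.get(field) == value]


phone_owner = find_by_field(store, "phone", "555-0100")

print(alice)
print(phone_owner)
```

The lookup by ID is a single step, while the search by field value has to visit every record, which mirrors the trade-off described above.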
4. Relational database vs. non-relational database
The defining feature of a relational database is transactional consistency: reads and writes in a traditional relational database are transactional and have the ACID properties. This makes relational databases suitable for almost any system that requires consistency, such as a typical banking system.
In Web applications, however, and especially in SNS applications, consistency is less important. It is tolerable if user A and user B see inconsistent versions of user C's updates, or if two people see the same friend's update a few seconds apart. The defining feature of a relational database is therefore of little use here, or at least not so important.
Conversely, the heavy price a relational database pays to maintain consistency is poor read and write performance. SNS applications such as Weibo and Facebook place very high demands on concurrent reads and writes, and relational databases can no longer cope. (On the read side, the traditional way to work around this weakness and improve performance has been to add a Memcached layer that caches static Web pages; in an SNS the content changes too quickly for Memcached to help.) A new kind of data structure storage is therefore needed to replace the relational database.
Another characteristic of relational databases is the fixed table structure, which makes them hard to extend. In an SNS, system upgrades and new features often mean huge changes to the data structure, which a relational database also struggles to handle, so a new kind of structured data storage is needed.
Non-relational databases emerged for these reasons. Because no single data structure can meet all the new requirements, a non-relational database is, strictly speaking, not one kind of database but a collection of data-structure storage approaches.
It must be emphasized that persistent storage of data, especially of massive amounts of data, still calls for the veteran relational database.
5. Non-relational database classification
Non-relational databases are diverse by nature and have only existed for a short time, so unlike relational databases, where a few products dominate the market, there are a great many of them, and most are open source.
Most of these databases are in fact relatively simple to implement. Apart from some common traits, many of them arose from the needs of a specific application, and so they deliver very high performance for that kind of application. By structure and use case, they fall mainly into the following categories:
- Key-value databases for high-performance concurrent reads and writes:
The main feature of a key-value database is extremely high concurrent read and write performance. Redis, Tokyo Cabinet, and Flare are representative of this class.
- Document-oriented databases for massive data access:
This type of database can query quickly within massive amounts of data. Typical examples are MongoDB and CouchDB.
- Distributed databases for scalability:
This type of database aims to solve the scalability shortcomings of traditional databases; it can adapt to growth in data volume and to changes in data structure.
From a relational database to a non-relational database