1. Relational database
A relational database is a database that uses the relational model to organize data.
The relational model was first proposed by IBM researcher Dr. E.F. Codd in 1970. Over the following decades the relational model was fully developed and gradually became the mainstream model for database structure.
In simple terms, the relational model refers to the two-dimensional table model, and a relational database is a data organization composed of two-dimensional tables and the connections between them.
Concepts commonly used in the relational model:
Relation: can be understood as a two-dimensional table. Each relation has a name, usually called the table name
Tuple: can be understood as a row in a two-dimensional table, often referred to as a record in the database
Attribute: can be understood as a column in a two-dimensional table, often referred to as a field in the database
Domain: the range of values an attribute may take, that is, the value constraint on a column in the database
Key: a set of attributes that uniquely identifies a tuple, usually called the primary key in the database; it consists of one or more columns
Relational schema: the description of a relation, in the format relation name (attribute 1, attribute 2, ..., attribute N); in the database it becomes the table structure
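To make these terms concrete, here is a minimal sketch using Python's built-in sqlite3 module as the relational database. The relation `student(id, name, age)` and its data are hypothetical examples, not from the original text. The comments map each SQL element to the concept it represents.

```python
import sqlite3

# Hypothetical relation: student(id, name, age).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (                          -- relation name -> table name
        id   INTEGER PRIMARY KEY,                   -- key: uniquely identifies a tuple
        name TEXT NOT NULL,                         -- attribute -> column (field)
        age  INTEGER CHECK (age BETWEEN 0 AND 150)  -- domain: allowed values of age
    )
    """
)

# Each inserted row is one tuple (record).
conn.executemany(
    "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
    [(1, "Alice", 20), (2, "Bob", 22)],
)

# The relational schema student(id, name, age) can be read back from the catalog.
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
rows = conn.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()
print(columns)  # ['id', 'name', 'age']
print(rows)     # [(1, 'Alice', 20), (2, 'Bob', 22)]
```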
Advantages of relational databases:
Easy to understand: the two-dimensional table structure is very close to how we think about the logical world, and the relational model is easier to understand than other models such as the network and hierarchical models
Easy to use: the universal SQL language makes relational databases very convenient to operate
Easy to maintain: rich integrity constraints (entity integrity, referential integrity and user-defined integrity) greatly reduce the probability of data redundancy and data inconsistency
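The three kinds of integrity can be sketched with sqlite3. This is an illustration under assumptions: the `class` and `student` tables are hypothetical, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`. Each invalid insert is rejected by the database itself rather than by application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE class (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE student (
        id       INTEGER PRIMARY KEY,                   -- entity integrity
        class_id INTEGER NOT NULL REFERENCES class(id), -- referential integrity
        age      INTEGER CHECK (age >= 0)               -- user-defined integrity
    )
    """
)
conn.execute("INSERT INTO class VALUES (1, 'Databases')")
conn.execute("INSERT INTO student VALUES (1, 1, 20)")

def try_insert(sql):
    """Return 'ok' if the statement succeeds, else the constraint error message."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = {
    "duplicate key": try_insert("INSERT INTO student VALUES (1, 1, 21)"),
    "missing class": try_insert("INSERT INTO student VALUES (2, 99, 21)"),
    "negative age": try_insert("INSERT INTO student VALUES (3, 1, -5)"),
}
for case, outcome in results.items():
    print(case, "->", outcome)
```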
Relational database bottlenecks
1). High-concurrency read and write requirements
Website user concurrency is very high, often reaching tens of thousands of read and write requests per second. For traditional relational databases, disk I/O is a major bottleneck
2). Efficient reading and writing of massive data
Websites generate huge amounts of data every day. For a relational database, querying a table that holds massive amounts of data is very inefficient
3). High scalability and availability
In web architectures, the database is the hardest component to scale horizontally. When the number of users and the access volume of an application keep growing, there is no simple way to expand the database's performance and load capacity by adding hardware and service nodes. For the many websites that must provide uninterrupted 24-hour service, upgrading and expanding the database system is very painful, as it often requires downtime for maintenance and data migration.
For websites, many features of relational databases are no longer needed:
Transaction consistency: relational databases incur a large overhead to maintain transaction consistency, while many Web 2.0 systems have low consistency requirements for reads and writes
Real-time reads and writes: in a relational database, a query issued immediately after inserting a row is guaranteed to read that row. Many web applications do not need such strict real-time behavior. For example, after a user posts a message, it is perfectly acceptable for others to see it a few seconds or even a dozen seconds later
Complex SQL, especially multi-table join queries: any web system with a large amount of data strongly avoids multi-table joins and complex SQL report queries for data analysis. SNS websites in particular (SNS refers to social networking services, including social software and social networking sites) avoid them at the requirements and product-design level. Most queries are simply primary-key lookups on a single table or simple conditional paging queries on a single table, so the role of SQL is greatly diminished
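The two query shapes named above, a single-table primary-key lookup and a simple conditional paging query, can be sketched as follows. This assumes a hypothetical `post` table and uses sqlite3 only as a convenient relational engine; `LIMIT ... OFFSET` is one common way to page, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO post (id, user_id, body) VALUES (?, ?, ?)",
    [(i, i % 3, f"post {i}") for i in range(1, 11)],  # 10 posts spread over 3 users
)

def get_post(post_id):
    """Primary-key query on a single table."""
    return conn.execute(
        "SELECT id, user_id, body FROM post WHERE id = ?", (post_id,)
    ).fetchone()

def list_posts(user_id, page, page_size=2):
    """Simple conditional paging query on a single table, newest first."""
    return conn.execute(
        "SELECT id FROM post WHERE user_id = ? ORDER BY id DESC LIMIT ? OFFSET ?",
        (user_id, page_size, page * page_size),
    ).fetchall()

print(get_post(4))       # (4, 1, 'post 4')
print(list_posts(1, 0))  # [(10,), (7,)]
print(list_posts(1, 1))  # [(4,), (1,)]
```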
In relational databases, the main causes of poor performance are multi-table join queries and complex SQL report queries for data analysis. To guarantee the ACID properties of the database, tables must be designed according to the required normal forms. Tables in a relational database store a fixed, formatted data structure: every tuple has the same fields, and even if a tuple does not need all of them, the database still allocates every field for each tuple. This structure makes joins between tables and similar operations convenient, but from another perspective it is also a factor in the performance bottleneck of relational databases.
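A small sketch of both points: every row exposes the full, fixed set of columns, and assembling related information requires joins. The `account` and `follow` tables are hypothetical. The example shows only the logical structure. It makes no claim about how SQLite physically stores NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT, city TEXT, bio TEXT);
    CREATE TABLE follow  (follower_id INTEGER, followee_id INTEGER);
    INSERT INTO account (id, name) VALUES (1, 'alice'), (2, 'bob');  -- city, bio omitted
    INSERT INTO follow VALUES (1, 2);
    """
)

# Every tuple still has every column; the omitted fields come back as NULL (None).
alice = conn.execute("SELECT * FROM account WHERE id = 1").fetchone()
print(alice)  # (1, 'alice', None, None)

# "Whom does alice follow?" needs a join across the normalized tables.
followees = conn.execute(
    """
    SELECT a2.name
    FROM account AS a1
    JOIN follow  AS f  ON f.follower_id = a1.id
    JOIN account AS a2 ON a2.id = f.followee_id
    WHERE a1.name = 'alice'
    """
).fetchall()
print(followees)  # [('bob',)]
```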
2. NoSQL
The term NoSQL was first used by Carlo Strozzi in 1998 for a lightweight, open-source relational database he developed that did not provide a SQL interface. That definition is very different from today's meaning of NoSQL: it really was "no SQL" in the literal sense. The meaning of NoSQL has since drifted away from that original intent. What we mean now is not "no SQL" but "non-relational", which is the non-relational database we commonly talk about today.
In early 2009, Johan Oskarsson organized a discussion on open-source distributed databases. In that discussion Eric Evans proposed the term NoSQL again, this time to refer to data storage systems that are non-relational, distributed, and generally do not guarantee the ACID properties. Evans did not choose the word for its literal meaning of "no SQL". He simply noted that many classic relational databases have names ending in "SQL", and used "NoSQL" to signal that these systems are positioned differently from relational databases.
Note: database transactions must have the ACID properties. ACID stands for Atomicity, Consistency, Isolation and Durability.
Non-relational databases take a different approach. For example, data may be stored as key-value pairs with no fixed structure: each tuple can have different fields, and each tuple can add its own key-value pairs as needed. Because it is not bound to a fixed structure, this approach can reduce some time and space overhead, and users can add whatever fields they need. To fetch different pieces of information about a user, there is no need to join multiple tables as in a relational database; it is enough to fetch the corresponding value by id. However, because there are few constraints, a non-relational database cannot offer queries on field values the way SQL's WHERE clause does, and it is hard to express integrity in the design. It is therefore only suitable for storing relatively simple data. For data that requires more complex queries, a SQL database is more suitable.
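The trade-off described above can be illustrated with a plain Python dict standing in for a key-value store. This is a hypothetical model and does not follow any product's API. Records have different fields and are fetched by id in one step. Filtering by an attribute value, however, means scanning every record, because nothing like a WHERE clause is available.

```python
# A plain dict stands in for a key-value store (hypothetical; not any product's API).
store = {
    "user:1": {"name": "alice", "city": "Paris"},
    "user:2": {"name": "bob", "followers": 120, "verified": True},  # different fields
}

# Lookup by id: a single key access, no join needed.
print(store["user:1"]["name"])  # alice

# Each record can grow its own fields as needed.
store["user:1"]["nickname"] = "ali"

# No WHERE on values: filtering by an attribute means scanning every record.
verified = [key for key, record in store.items() if record.get("verified")]
print(verified)  # ['user:2']
```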
2-1. Classification of non-relational databases
Unlike relational databases, where a few products dominate, non-relational databases are naturally diverse and have existed for a relatively short time, so no single one has come to dominate. There are many non-relational databases, and most of them are open source.
Most of these databases have relatively simple implementations. Apart from some common traits, many of them were built for specific application requirements, and therefore deliver extremely high performance for those applications. By data-structuring approach and application, they can be divided into the following categories:
1). Key-value databases for high-performance concurrent reads and writes: the main feature of key-value databases is extremely high concurrent read and write performance. Redis, Tokyo Cabinet and Flare are representative of this type
2). Document-oriented databases for massive data access: these databases can query data quickly within massive data sets. Typical representatives are MongoDB and CouchDB
3). Scalability-oriented distributed databases: these databases address the scalability limits of traditional databases and can adapt to growth in data volume and changes in data structure
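One common way a document-oriented store can answer field queries quickly is by maintaining a secondary index on a document field. The toy class below is a hypothetical in-memory sketch of that idea. It is not the MongoDB or CouchDB API. Documents may have different fields, and `find` looks up matching ids through the index instead of scanning every document.

```python
from collections import defaultdict

class ToyCollection:
    """Hypothetical in-memory document collection with one secondary index."""

    def __init__(self, indexed_field):
        self.docs = {}                  # _id -> document (dict)
        self.indexed_field = indexed_field
        self.index = defaultdict(set)   # field value -> set of _id

    def insert(self, doc):
        # If a document with this _id is replaced, drop its old index entry first.
        old = self.docs.get(doc["_id"])
        if old is not None and self.indexed_field in old:
            self.index[old[self.indexed_field]].discard(doc["_id"])
        self.docs[doc["_id"]] = doc
        if self.indexed_field in doc:
            self.index[doc[self.indexed_field]].add(doc["_id"])

    def find(self, value):
        """Return the sorted ids of documents whose indexed field equals value."""
        return sorted(self.index.get(value, set()))

posts = ToyCollection(indexed_field="author")
posts.insert({"_id": 1, "author": "alice", "tags": ["db"]})
posts.insert({"_id": 2, "author": "bob"})
posts.insert({"_id": 3, "author": "alice", "likes": 5})  # fields differ per document
print(posts.find("alice"))  # [1, 3]
```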
3. Relational databases vs. non-relational databases
The biggest feature of relational databases is transactional consistency. Read and write operations in a traditional relational database are transactional and have the ACID properties, which makes relational databases usable in almost any system that requires consistency, such as a typical banking system.
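For the banking case, atomicity means a transfer's debit and credit succeed or fail together. Here is a minimal sketch with sqlite3, assuming a hypothetical `account` table with a non-negative balance check. The credit runs first on purpose. When the debit then violates the check, the rollback must also undo the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money between accounts; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

print(transfer("A", "B", 30), balances())   # True {'A': 70, 'B': 80}
print(transfer("A", "B", 500), balances())  # False {'A': 70, 'B': 80}
```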
However, in web applications, and especially in SNS applications, consistency is not so important. If user A and user B see different versions of user C's latest update, that is tolerable, and a delay of a few seconds between two people seeing the same friend's update is also acceptable. So the biggest feature of relational databases is of little use here, or at least not that important.
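The "a few seconds late is fine" behavior can be modeled with a primary copy and a replica that receives writes only after a delay. The class below is a hypothetical simulation, not any database's replication mechanism. A reader of the replica briefly sees stale data and then converges.

```python
class LaggingReplica:
    """Hypothetical primary/replica pair where replication is applied later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet shipped to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def sync(self):
        # Apply queued writes in order, as a delayed replication step would.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

db = LaggingReplica()
db.write("status:C", "new photo")
print(db.read_from_replica("status:C"))  # None: the reader still sees the old state
db.sync()
print(db.read_from_replica("status:C"))  # new photo: consistent a moment later
```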
Conversely, the high price relational databases pay to maintain consistency is relatively poor read and write performance. Applications such as Weibo and Facebook demand extremely high concurrent read and write throughput, which relational databases can no longer handle. (On the read side, the traditional way to work around these limits and improve performance was to add a memcache layer and serve static pages, but in SNS the data changes too quickly for memcache to help.) A new kind of structured data storage is therefore needed in place of the relational database.
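The sketch below shows why a cache in front of the database helps less when data changes quickly. It assumes a cache-aside pattern, with a plain dict standing in for memcache (not the memcache client API) and a hypothetical `profile` table in sqlite3. Every write must invalidate the cached entry, so frequently updated data keeps falling through to the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO profile VALUES (1, 'hello')")

cache = {}               # stand-in for memcache (a plain dict, not the memcache API)
stats = {"db_reads": 0}  # how often a read had to fall through to the database

def get_status(user_id):
    """Cache-aside read: try the cache first, load from the database on a miss."""
    if user_id in cache:
        return cache[user_id]
    stats["db_reads"] += 1
    row = conn.execute("SELECT status FROM profile WHERE id = ?", (user_id,)).fetchone()
    cache[user_id] = row[0]
    return row[0]

def set_status(user_id, status):
    """Write to the database and invalidate the cached copy."""
    conn.execute("UPDATE profile SET status = ? WHERE id = ?", (status, user_id))
    cache.pop(user_id, None)

get_status(1)              # miss: loaded from the database
get_status(1)              # hit: served from the cache
set_status(1, "at lunch")  # each update evicts the entry...
print(get_status(1), stats["db_reads"])  # at lunch 2  (...so the next read misses again)
```

If a status changes between almost every pair of reads, nearly every read becomes a miss and the cache stops shielding the database.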
Another characteristic of relational databases is their fixed table structure, which makes them scale poorly. In SNS, system upgrades and new features often mean large changes to the data structure, which relational databases also struggle to handle, so new structured data storage is required.
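Here is a minimal sketch of the difference, using a hypothetical `post` table in sqlite3 and a plain dict as a schemaless record. When a new feature needs a new field, the fixed table rejects it until the structure is altered. The schemaless record accepts the new key directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, body TEXT)")

# A new feature needs a "location" field: the fixed table structure rejects it...
error_message = None
try:
    conn.execute("INSERT INTO post (id, body, location) VALUES (1, 'hi', 'Paris')")
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print("rejected:", error_message)

# ...until the table structure itself is changed.
conn.execute("ALTER TABLE post ADD COLUMN location TEXT")
conn.execute("INSERT INTO post (id, body, location) VALUES (1, 'hi', 'Paris')")

# A schemaless record (a plain dict standing in for a document) just takes the new key.
doc = {"id": 1, "body": "hi"}
doc["location"] = "Paris"
```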
As a result, non-relational databases came into being. Because no single structured storage approach can meet every new requirement, non-relational databases are, strictly speaking, not one kind of database but a collection of structured data storage methods.
It must be stressed that persistent storage of data, especially persistent storage of massive data, still relies on the veteran relational database.