I. Database rankings and trends 1.1 complete ranking
- Links: https://db-engines.com/en/ranking
This site lists all the database rankings, and you can see the type of database you belong to, and update it once a month. You can have a look when you are free.
1.2 Db-engines ranking-trend Popularity
- Links: Https://db-engines.com/en/ranking_trend
You can see that Oracle, MySQL, and Microsoft SQL Server have been very mainstream, and NoSQL MongoDB, Redis, Cassandra, elasticsearch have risen very quickly.
II. relational database
The relational database (RDBMS) is a database based on the relational model, and the relationships between various entities and entities in the real world are represented by the relational model.
Relational model refers to two-dimensional tabular model, a relational database is a data organization composed of two-dimensional table and its relation, which consists of relational data structure, relational operation set and relationship integrity constraint.
2.1 Relational data structures
The Entity Relationship model (Entity-relationship models), or E-R model, is a set of database design tools presented by Dr. Chen Pinshan (Peter p.s Chen) in 1976, using the concept of things and relationships in the real world, To interpret the abstract data schema in the database.
2.2 Relationship Operations Collection
Additions and deletions, SQL (structured query Language) is a relational database-based language that performs operations on data in a relational database.
2.3 Relationship Integrity constraints 2.3.1 data integrity
The relationship integrity constraint is a kind of constraint condition or rule for the relational model to ensure the correctness and consistency of the data in the database.
Integrity typically includes domain integrity, entity integrity, referential integrity, and user-defined integrity.
Domain integrity, entity integrity, and referential integrity are the integrity constraints that the relational model must meet.
1) Domain integrity
Domain integrity is to ensure that the value of the database field is reasonable, the value can not exceed the domain, whether it can be pre-defined null and so on.
- Domain integrity constraints (domains Integrity constrains) are the simplest and most basic constraints. In today's relational DBMS, there is generally a domain integrity constraint check feature.
2) Entity integrity
The primary key is unique, cannot be duplicated, and cannot take "null".
- In relational mode, the primary key is used as the uniqueness identifier and cannot be taken as a null value; If the primary key is a combination of multiple attributes, none of the primary properties can be empty.
3) Referential integrity
means that the data for the primary and external keywords of the two tables should be consistent (the foreign key can only be the value in the referenced gradual domain, or null; the value of the primary key is deleted, the foreign key has some constraint-constraint rules optional), ensuring consistency of data between tables, preventing data loss or meaningless data from spreading in the database.
4) User-defined integrity
User-defined integrity (users defined integrity) is based on the requirements of the application environment and the actual needs of a specific application of the data involved in the binding conditions.
- This constraint mechanism should generally not be provided by the application, but should be provided by the relational model to define and verify that the user-defined integrity mainly includes the field validity constraints and record validity.
2.3.2 Data integrity constraints [1]
1) Constraints related to the table
A constraint defined in a table that defines the constraint (column constraint) in a column definition, or a constraint (table constraint) when the table definition is defined.
Related include column constraints (not NULL) and table constraints (PRIMARY key,foreign key,check-defines a column domain, UNIQUE).
Constraints are checked at both insert and update.
2) Domain constraints
A constraint that is defined in a domain definition that can be applied to any column in a particular field.
- The related constraint is check, which is defined as the data type followed by the column to be used.
3) Assertion
A constraint defined at the assertion definition that can be associated with one or more tables.
- The related constraints are also check.
2.4 Paradigm [2]
A paradigm is a set of relational patterns that conform to a certain level, representing the degree to which relationships within a relationship are rationalized. Its greatest significance is to avoid data redundancy and insert/delete/update exceptions.
At present the database has 1NF,2NF,3NF,BCNF,4NF,5NF, the database generally takes the most BCNF.
A design that conforms to a high-level paradigm must conform to a low-level paradigm.
2.4.1 First normal form (1NF)
Each attribute in a 1NF-compliant relationship is no longer divided.
- That is, the column is not duplicated. 1NF is the most basic requirement for all relational databases.
2.4.2 Second Normal form (2NF)
2NF is based on 1NF, eliminating the partial function dependency of non-main attribute on code.
2.4.3 Third paradigm (3NF)
3NF on the basis of 2NF, eliminate the non-main property of the code transfer function dependency.
2.4.3 BCNF
On the basis of #nf, the elimination of the main attribute is dependent on the part of the code and the transfer function. (If the relationship R conforms to 3NF, and the candidate code is only one, then the relationship R must conform to BCNF)
The specific explanation refers to the material at the end of the article.
2.5 Mainstream relational database [3]
From the full database rankings you can see that the popular relational database is Oracle, MySQL, Microsoft SQL Server, PostgreSQL, DB2, Microsoft Access.
- Of these, MySQL and PostgreSQL are open source, the others are commercial;
- Access operation is simple, is a small desktop database, the average number of records reached 100,000 or so when the performance will drop sharply, if the data reached about 100M, it is easy to cause the server IIS suspended animation (upper 2GB, assuming that each row recorded 1KB, about 2 million records);
- SQL Server operation is simple, does not obtain the security certificate, the volume is large, the user has the poor performance, is suitable in the window operating system (millions record);
- MySQL is a small open source database, no security certificate, small size, fast, easy to expand, almost unlimited number of users, processing up to 5000w Records, command execution is the fastest, widely used in small and medium enterprises (more than 50 million records);
- Oracle has high performance, obtains the highest ISO standard certificate, management maintenance trouble, most enterprises and institutions use Oracle (TENS record);
- DB2 is most suitable for Yu Hai volume data, for Data Warehouse, obtains the highest ISO standard attestation, the performance is high, has the very good parallelism, the operation is simple, obtains the widespread application in the giant enterprise.
Comparison of the specific parameters can be consulted:
Https://db-engines.com/en/system/Microsoft+SQL+Server%3BMySQL%3BOracle
- In addition, the Hadoop distributed framework is popular and can handle petabytes of data. A separate article on Hadoop-related components is presented later.
Third, NoSQL
NoSQL (not only SQL), refers to the non-relational database.
With the rise of web2.0, the traditional relational database in dealing with ultra-large and high concurrency of pure Dynamic Web site has become inadequate, and these data may not need structured storage can be directly scale-out, so the NoSQL database developed to deal with these large data well. [4]
NoSQL is used for storage of hyper-scale data. (for example, Google or Facebook collects trillions of bits of data for their users every day). These types of data stores do not require a fixed pattern and can be scaled horizontally without extra action.
3.1 The difference between RDBMS and NoSQL [5]
Before we tell the difference between the two, we'll look at a couple of theories:
3.1.1 CAP
The cap theorem means that for a distributed computing system, it is not possible to meet the following three points:
- Consistency (consistency): all nodes have the same data at the same time.
在任何时间点,不管哪台服务器应答了一个请求,所有服务器都会给出同样的答案。
- Availability (availability): Ensure that every request, regardless of success or failure, should be.
即使某些服务器宕机了,整个系统仍能正常工作。
- Segmentation tolerance (Partition tolerance): the loss or failure of any information in the system does not affect the continued operation of the system.
分区宽容度是指两台服务器之间的通信可以丢失,而系统仍能正常工作。
3.1.2 ACID
Refers to the four elements that the transactional database performs correctly, including atomicity (atomicity), consistency (consistency), isolation (isolation), persistence (durability). The aim is to ensure the correctness of the data.
- atomicity (Atomic:): Either the entire transaction succeeds or the whole is unsuccessful.
- Consistency (consistency): The database is in a consistent state between transactions. For example, if a record points to another record, and the point is invalid at the end of the transaction, the entire transaction must be rolled back.
Isolation (Isolation): Transactions do not see data that they have changed until the end of other transactions.
Persistence (Durability): Once the database system notifies the user that a transaction is successful, the data is never lost.
3.1.3 BASE
- basically Available (basic availability): Distributed System in the event of unpredictable failure, allow the loss of some of the availability, but not equal to the system is not available;
- Soft State (soft states): In contrast to the hard state, that is, to allow the data in the system to exist in the middle State, and that the existence of the intermediate State does not affect the overall availability of the system, that is, allow the system to synchronize data between the different nodes of the process of data synchronization exists delay;
- eventually consistent (final consistency): Emphasizes that all copies of the data in the system can eventually reach a consistent state after a period of synchronization. The essence of the system is to ensure that the final data can be consistent, without the need for real-time guarantee the strong consistency of the system data. [6]
事实上实在没看懂这三个理论。
3.1.4 RDBMS and NoSQL differences:
- The latter does not require pre-defined patterns: You do not need to define the data schema beforehand and predefine the table structure. Each record in the data may have different properties and formatting. When inserting data, it is not necessary to pre-define their patterns.
- The latter has no shared architecture: A fully shared architecture in a storage area network that stores all data. NoSQL often divides the data and stores it on each local server. Because the performance of reading data from a local disk tends to be better than the performance of reading data over a network, it improves the performance of the system.
- The latter is elastic and extensible: you can dynamically add or delete nodes while the system is running. No maintenance is required and data can be migrated automatically.
- partitioning: rather than storing data at the same node, a NoSQL database needs to partition the data and spread the records across multiple nodes. It is usually partitioned and replicated at the same time. This improves both parallel performance and guarantees that there is no single point of failure.
- The latter is asynchronous replication: Unlike a raid storage system, the replication in NoSQL is often a log-based asynchronous replication. In this way, the data can be written to a node as soon as possible without being delayed by the network transmission. The disadvantage is that consistency is not always guaranteed, and a small amount of data may be lost in the event of a failure.
- The RDBMS adheres to the acid principle, and NoSQL has the base principle , which conforms to the CP in the CAP, which conforms to the AP in the CAP (that is, the former sacrifices availability, the latter sacrifices consistency) [7].
3.2 Different types of NoSQL databases
[8]
[1] Data integrity constraints-Baidu Encyclopedia
[2] Paradigm-Know
[3] Data volume conversion and several relational database advantages and disadvantages comparison
[4] [8] nosql-Baidu Encyclopedia
[5] Introduction to NoSQL-Rookie tutorial
[6] from acid to cap to base
[7] Cap Theory 12 review: "Rules" have changed
relational databases and NoSQL databases