Original post: http://www.ucai.cn/blogdetail/7033?mid=1&f=5
The original page also lets you try the examples online.
Today's class is divided into three main parts. As the title says, we will discuss the nature of databases, their concepts, and their application in practice. The first part, the nature, takes a higher-level view of how data is produced and the scenarios in which it is used, and then summarizes the storage methods available for it. The second part, the concepts, is a run-through focused mainly on relational databases, because our open classes in June will also be based on relational databases. The third part consolidates the first two through some examples.
I. The origin and nature of databases
1.1 Origin of the database
Think of a computer. It has folders and files, and each file has a size, a location, and a format. As soon as the concept of a file system existed, there had to be a place that stored these file parameters, or metadata. In daily life, every object and every person likewise has its own characteristics, and when we want to store a batch of such descriptive characteristics as data, we also need a format to save them in. Besides saving the data, we have to manage it. This is the background from which databases emerged, and as usage scenarios grew more complex, databases branched into many derived variants.
So databases originate from the need to store and manage data.
1.2 Usage Scenarios for databases
Based on what we know so far, a few questions arise. The first is: what kinds of data need to be stored, and what storage methods are there? Let's brainstorm a little.
We can come up with the following cases, for example:
A. Data structures defined in the program itself, such as arrays. They live only in memory, and once the program stops the data is gone. Does that count as storage? It does.
What is stored in this case is descriptive data; it can also be binary arrays.
B. Saving to a text file, either in a format you define yourself or in a standard format such as XML or JSON.
What is stored here can only be descriptive data.
C. Using a file-based database, for example a text database that supports SQL access, or SQLite.
Apart from a few specific data types, it generally holds only textual descriptive data.
D. Using a network-based, distributable database system such as MySQL.
It is used to hold larger volumes of user data, relational data, and transaction data that must be queried and updated quickly in real time.
E. When the data volume is so large that a single machine cannot meet the requirements, a distributed database is needed, possibly a database cluster.
F. Using software such as memcached to hold cached data.
It is typically used to hold text data or counters.
G. Using the FastDFS distributed file system to store file data.
It is meant for storing large numbers of files; apart from the description of the files themselves, it is not used to hold descriptive data.
H. Using key-value database (KVDB) software such as Redis to store descriptive data with simple relationships, or data with a specific structure.
For example, when storing Weibo posts, the key can quickly locate the content of a post.
I. Similar to a KVDB: have you heard of queues? Aren't they also a special means of storing data? Someone produces data while someone else consumes it.
J. Storing geographic relationship data and providing operations on it. In English this is called spatial data. Oracle is the strongest at it, and MySQL now has spatial extensions as well:
http://dev.mysql.com/doc/refman/5.1/zh/spatial-extensions-in-mysql.html
K. For massive, continuously arriving data with relatively little regularity, we might store it in a Hadoop system.
L. Finally, one more scenario: a search engine like Baidu. Would it put all of its data in MySQL and then search it there? Obviously not. It has its own index format and a system that supports distributed queries.
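To make the first three cases concrete, here is a minimal sketch of options A, B, and C side by side. It assumes Python with its built-in json and sqlite3 modules; the sample records, the file names, and the person table are made up for illustration and do not come from the original class.

```python
import json
import os
import sqlite3
import tempfile

# Hypothetical sample records; names and fields are illustrative only.
people = [
    {"name": "Alice", "height_cm": 165},
    {"name": "Bob", "height_cm": 180},
]

workdir = tempfile.mkdtemp()

# (A) In memory only: `people` disappears when the program exits.

# (B) A text file in a standard format (JSON here).
json_path = os.path.join(workdir, "people.json")
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(people, f)
with open(json_path, encoding="utf-8") as f:
    from_json = json.load(f)

# (C) A file-based database with SQL access (SQLite).
db_path = os.path.join(workdir, "people.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE person (name TEXT, height_cm INTEGER)")
conn.executemany(
    "INSERT INTO person (name, height_cm) VALUES (:name, :height_cm)", people
)
conn.commit()
tall = conn.execute("SELECT name FROM person WHERE height_cm > 170").fetchall()
conn.close()

print(from_json == people)  # True
print(tall)                 # [('Bob',)]
```

The JSON file gives the data back only as a whole, while the SQLite file can answer a question such as "who is taller than 170 cm" directly, which is the step from saving data to managing it.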
1.3 Database Scenarios and classification summary
To summarize:
A. Before the computer era, humanity had already produced a great deal of information, and since the computer era began it has produced a large amount of digitized data. As the scenarios above show, the way this data is kept varies with the use case: some lives only in memory, some is saved in files, some is kept in dedicated, strict database systems, and some in less strict KVDBs.
B. Based on the storage scenarios above, we can also make a simple classification of data into structured and unstructured data, plus semi-structured data. Structured means the data has very regular attributes. Take a person: height, weight, and age form a structured description. If you take a photo, record audio, or shoot a video, that is also data, but it is not structured. The two kinds are stored differently: structured data typically takes the form of a two-dimensional table, while unstructured data is simply stored in its raw form.
Most of the scenarios above deal with structured data, as with text databases and MySQL. FastDFS and Hadoop hold a great deal of unstructured data.
C. We can also classify by the volume of the data itself, or by the specific technical application scenario.
For example, small amounts of data may simply be saved as text for convenience. A regular service website, such as Ucai (ucai.cn) with its user data, will store that data in MySQL and, for access speed, also use cache software such as memcached. FastDFS can be used to store small files. When large volumes of logs and similar data are generated and need analysis, for example hundreds of GB or even TB per day, Hadoop comes into play.
So different data volumes call for different storage methods: small data, medium to large data, and massive data, or big data.
D. The fourth angle: as seen above, there are many categories of data storage, each suited to different scenarios. Databases are very powerful, but not every scenario is a good fit for one. The full-text search scenario mentioned above, for example, generally needs dedicated storage and a dedicated indexing method. Specifically, full-text indexing uses an index called an inverted index.
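An inverted index can be sketched in a few lines. This is only an illustration of the idea, assuming Python and whitespace-separated English words; the sample documents and the function names build_inverted_index and search are hypothetical, and a real search engine adds tokenization, ranking, and distribution on top.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical documents, keyed by document id.
docs = {
    1: "mysql stores relational data",
    2: "hadoop stores massive log data",
    3: "redis is a key value store",
}
index = build_inverted_index(docs)
print(sorted(search(index, "stores data")))  # [1, 2]
```

A normal index goes from a document to its words; this one goes from a word to its documents, which is why a lookup does not have to scan every text.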
II. Database concepts
All right, we have covered the usage scenarios of data and the related software, and now have a picture of which software fits which scenario. Next, let's look at some terminology commonly used in the database field, which will come up throughout our study of databases.
Because time in the open class is limited, we will mainly go over the basic concepts; the advanced concepts will be discussed in later open classes.
2.1 Basic concepts of the database
To introduce these database concepts, we will use a company as an analogy.
2.1.1, Database
The database corresponds to the company itself; nearly all of the concepts below live inside this company.
Before using a database, we have to create it in the database software.
In MySQL, a connection is attached to one database at a time. Of course, if permissions allow, cross-database queries are also possible.
The syntax for creating a database is
CREATE DATABASE xxx;
In a default setup only the root user can create a database; more precisely, the user needs the CREATE privilege.
2.1.2, Table
A table corresponds to a department of the company: this department manages users, that department manages orders. Each one is a table.
Tables consist of records and fields.
The fields make up the table's definition.
The records make up the table's data.
CREATE TABLE `hstesttbl` (
  `k` int(11) NOT NULL AUTO_INCREMENT,
  `v` char(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
A field definition includes the field name, the field type, whether the field may be NULL, whether it auto-increments, its default value, and so on.
A record is the actual data values. All records in the same table share the same definition, which means they have the same set of attributes.
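The split between fields (definition) and records (data) can be observed directly. The following is a sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL, so the column types are spelled the SQLite way; the table and column names follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite spelling of the table above (an adaptation, not MySQL syntax).
conn.execute(
    "CREATE TABLE hstesttbl ("
    " k INTEGER PRIMARY KEY AUTOINCREMENT,"
    " v TEXT NOT NULL DEFAULT ''"
    ")"
)

# Fields: the table's definition. Column 1 of table_info is the field name.
fields = [row[1] for row in conn.execute("PRAGMA table_info(hstesttbl)")]

# Records: the table's data. k is assigned automatically.
conn.execute("INSERT INTO hstesttbl (v) VALUES ('first')")
conn.execute("INSERT INTO hstesttbl DEFAULT VALUES")  # v falls back to ''
records = conn.execute("SELECT k, v FROM hstesttbl ORDER BY k").fetchall()

print(fields)   # ['k', 'v']
print(records)  # [(1, 'first'), (2, '')]
```

The second insert supplies no values at all, so both the auto-increment behavior and the default value come from the field definitions.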
2.1.3, View
A view is like a virtual project team, whose members may come from one department or from several.
A view is called virtual because it holds no data of its own. Both its definition and its data are assembled from other tables.
CREATE VIEW hsview AS SELECT * FROM hstesttbl;
mysql> SHOW TABLES;
+--------------------+
| Tables_in_hstestdb |
+--------------------+
| hstesttbl          |
| hsview             |
+--------------------+
2 rows in set (0.02 sec)
mysql> SHOW CREATE TABLE hsview\G
*************************** 1. row ***************************
       View: hsview
Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `hsview` AS select `hstesttbl`.`k` AS `k`,`hstesttbl`.`v` AS `v` from `hstesttbl`
As the commands above show, a view is operated on much like a table. You can also rename fields on the way from table to view.
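Both points, that a view holds no data of its own and that it can rename fields, can be checked with a small sketch. It assumes Python's built-in sqlite3 module in place of MySQL; the aliases id and val are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hstesttbl (k INTEGER PRIMARY KEY, v TEXT NOT NULL DEFAULT '')"
)
conn.execute("INSERT INTO hstesttbl (v) VALUES ('a'), ('b')")

# The view stores only a query; field names can be changed on the way.
conn.execute("CREATE VIEW hsview AS SELECT k AS id, v AS val FROM hstesttbl")

before = conn.execute("SELECT id, val FROM hsview ORDER BY id").fetchall()
# A new row in the base table shows up in the view without touching the view.
conn.execute("INSERT INTO hstesttbl (v) VALUES ('c')")
after = conn.execute("SELECT id, val FROM hsview ORDER BY id").fetchall()

print(before)  # [(1, 'a'), (2, 'b')]
print(after)   # [(1, 'a'), (2, 'b'), (3, 'c')]
```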
2.1.4, Index
An index is like an attribute of each employee in a department that lets you find employees quickly, for example job title, hobby, or the school they graduated from. With such attributes you can classify the people in the department, and each query narrows the result set.
To create an index on a table, taking the table above as an example:
ALTER TABLE hstesttbl ADD INDEX (xxx);
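How an index narrows a query can be seen in the query plan. This sketch assumes Python's built-in sqlite3 module rather than MySQL, so it uses SQLite's CREATE INDEX and EXPLAIN QUERY PLAN; the employee table, its rows, and the index name idx_employee_job are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, job TEXT)"
)
conn.executemany(
    "INSERT INTO employee (name, job) VALUES (?, ?)",
    [("Ann", "engineer"), ("Ben", "designer"), ("Cy", "engineer")],
)

query = "SELECT name FROM employee WHERE job = 'engineer'"


def plan(sql):
    """Return SQLite's human-readable plan text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_employee_job ON employee (job)")
plan_after = plan(query)   # with the index: a search through idx_employee_job

print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, but the first plan reports a scan and the second one names the index.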
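2.1.5, Key
A key is like the most efficient, unique way to reach each person in the department, such as a phone number, email address, employee number, or QQ number.
Adding a key works much like adding an index, so here we show how to drop one.
Example:
ALTER TABLE hstesttbl DROP PRIMARY KEY;
Note that on the table defined above MySQL rejects this statement until AUTO_INCREMENT is removed from k, because an auto-increment column must be part of a key.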
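What sets a key apart from an ordinary index is uniqueness. The following sketch assumes Python's built-in sqlite3 module in place of MySQL; the staff table and the phone numbers are made up to match the company analogy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The phone number plays the role of the unique contact detail.
conn.execute("CREATE TABLE staff (phone TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO staff VALUES ('555-0100', 'Ann')")

try:
    # A second person with the same phone number violates the primary key.
    conn.execute("INSERT INTO staff VALUES ('555-0100', 'Ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

found = conn.execute(
    "SELECT name FROM staff WHERE phone = '555-0100'"
).fetchone()

print(duplicate_rejected)  # True
print(found)               # ('Ann',)
```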
2.1.6, Permissions
Permissions are like a company's internal confidentiality rules: people outside the company, for example, are not supposed to know certain internal matters.
We can use
GRANT ALL PRIVILEGES ON dbname.* TO 'user'@'127.0.0.1' IDENTIFIED BY 'password' WITH GRANT OPTION;
This statement grants a user full access to one database. The user can access only this database and has no other privileges. (MySQL 8.0 and later no longer accept IDENTIFIED BY inside GRANT; create the account with CREATE USER first.)
As mentioned above, creating a database is normally left to the root user. The other operations described above, such as creating tables, adding views, and adding or dropping indexes and keys, can all be done once the user has been authorized this way.
2.1.7, SQL
SQL is like the company's rules and directives. For example, at work we might say "this can be released to production". It sounds simple, but it implies that product acceptance and various tests have passed, so that operations can carry out the release.
SQL is the set of instructions we use to deal with the database. In fact, a few SQL commands have already appeared with each concept above; here I just want to add a few points.
Writing commands by hand is very important. Ucai requires its students to be able to write commands by hand, so that they are not left unable to work when phpMyAdmin is unavailable.
Although we have introduced a fair amount of SQL above, it can be classified, and in fact this classification was settled long ago: mainly DDL, DML, and DCL (there is also TCL, which we won't cover today).
(1) Data definition (SQL DDL): defines the creation and removal of SQL schemas, base tables, views, and indexes. Most of the statements above are DDL.
(2) Data manipulation (SQL DML): divided into two categories, data query and data update. Data update has three kinds of operation: insert, delete, and modify. Together these are the familiar create, read, update, and delete operations.
(3) Data control (SQL DCL): includes authorization on base tables and views, descriptions of integrity rules, transaction control, and so on. This is the permission handling we covered above.
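The three categories can be summarized as a lookup on a statement's leading keyword. This is only a rough sketch in Python following the grouping used above (queries counted under DML); the CATEGORIES table and the classify function are hypothetical helpers, and the keyword list is not exhaustive.

```python
# Leading keyword -> category, following the DDL / DML / DCL split above.
CATEGORIES = {
    "CREATE": "DDL", "ALTER": "DDL", "DROP": "DDL",
    "SELECT": "DML", "INSERT": "DML", "UPDATE": "DML", "DELETE": "DML",
    "GRANT": "DCL", "REVOKE": "DCL",
}


def classify(statement):
    """Classify a SQL statement by its first keyword; 'unknown' otherwise."""
    words = statement.strip().split(None, 1)
    if not words:
        return "unknown"
    return CATEGORIES.get(words[0].upper(), "unknown")


print(classify("CREATE DATABASE xxx;"))                      # DDL
print(classify("select * from hstesttbl"))                   # DML
print(classify("GRANT ALL PRIVILEGES ON dbname.* TO 'u'"))   # DCL
```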
2.2 Academic Concepts
1, Normal forms and redundancy
– First normal form: a column cannot be subdivided. Take an interests field that might contain basketball, movies, and several other items. In principle such a table does not satisfy first normal form. A dedicated user-interest table should be created, with a UID and an interest, so that one person has multiple records. So when defining a database field you need to state its data type explicitly; apart from text types, this makes it hard to cram more than one piece of information into a field.
– Second normal form: every other attribute must depend on the whole primary key. Take a student table with student number, college number, student name, age, college, and dean, where student number and college number together form the primary key. It looks fine: the college number tells you which college a student is in and who the dean is. But it does not satisfy second normal form, because the college and the dean are determined by the college number alone, not by the whole primary key. So they should be split off into their own table.
– Third normal form: every other attribute must be determined directly by the primary key. Take the table above with student number, college number, student name, age, college, and dean, this time with the student number as the primary key. It looks fine: each student belongs to a college, and the college has its dean. But it does not satisfy third normal form, because the dean is determined not directly by the student number but by the college. So it should be split off as well.
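The decomposition described for the student example might look like the following. This is a sketch assuming Python's built-in sqlite3 module in place of MySQL; the table names, column names, and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- College and dean depend on the college number, so they get their own table.
CREATE TABLE college (
    college_no INTEGER PRIMARY KEY,
    college    TEXT NOT NULL,
    dean       TEXT NOT NULL
);
CREATE TABLE student (
    student_no INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    age        INTEGER,
    college_no INTEGER NOT NULL REFERENCES college(college_no)
);
-- First normal form: one interest per row instead of 'basketball,movies'.
CREATE TABLE user_interest (
    uid      INTEGER NOT NULL REFERENCES student(student_no),
    interest TEXT NOT NULL,
    PRIMARY KEY (uid, interest)
);
INSERT INTO college VALUES (10, 'Engineering', 'Dean Li');
INSERT INTO student VALUES (1, 'Wang', 20, 10), (2, 'Zhao', 21, 10);
INSERT INTO user_interest VALUES (1, 'basketball'), (1, 'movies');
""")

# The dean is stored once, so one update is seen by every student's row.
conn.execute("UPDATE college SET dean = 'Dean Chen' WHERE college_no = 10")
deans = conn.execute(
    "SELECT s.name, c.dean FROM student s"
    " JOIN college c ON c.college_no = s.college_no"
    " ORDER BY s.student_no"
).fetchall()
interests = [
    row[0]
    for row in conn.execute(
        "SELECT interest FROM user_interest WHERE uid = 1 ORDER BY interest"
    )
]

print(deans)      # [('Wang', 'Dean Chen'), ('Zhao', 'Dean Chen')]
print(interests)  # ['basketball', 'movies']
```

This is the redundancy side of the topic: in the single wide table the dean's name would be repeated on every student row and would have to be changed in all of them.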
2, ACID:
– Atomicity: an operation must not fail halfway. Take a purchase: the buyer pays, so money is deducted from the buyer's account, and money is deposited into the seller's account. These are two steps, and on their own they are not atomic, because a failure can occur in between: the deduction succeeds, for example, but the deposit does not.
– Consistency: in the same example, when the transaction completes, the data must be consistent. One side cannot end up with more while the other ends up with less than it should.
– Isolation: avoids confusion under concurrency. With a single client accessing the data there is no problem; when many users access it under high concurrency, the data must still not get scrambled.
– Durability: data is saved permanently and is not lost. It must not vanish when the machine shuts down; losing even one record is unacceptable.
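The buyer and seller example can be made atomic by wrapping both steps in one transaction. The following is a sketch assuming Python's built-in sqlite3 module in place of MySQL; the account table, the CHECK constraint that forbids a negative balance, and the transfer helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("buyer", 100), ("seller", 0)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "buyer", "seller", 30)
failed = transfer(conn, "buyer", "seller", 500)  # would overdraw the buyer

print(ok, failed)      # True False
print(balances(conn))  # {'buyer': 70, 'seller': 30}
```

In the failed transfer the deposit into the seller's account has already run when the deduction is rejected, and the rollback undoes it, so the total stays at 100.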
2.3 More advanced terminology
1. Storage engine, transaction, composite index, connection pool, backup, recovery
2. Master-slave replication, row lock, table lock, slow query, binary log, temporary table, content distribution, synchronization, database partitioning, splitting databases, splitting tables, horizontal split, vertical split
These are further advanced terms that we can cover later, or in the full-stack project master course.
The nature and concepts of databases and their application in practice (Part 1)