The Nature and Concepts of Databases and Their Application in Practice (I)

Original article: http://www.ucai.cn/blogdetail/7033?mid=1&f=5

You can run the examples online there to see the results.

Today's class is divided into three parts, as the title indicates: the nature of data, database concepts, and application practice. The first part takes a high-level view, analyzing how data is produced and the scenarios in which it is used, and then summarizes the storage methods that go with them. The second part is a walkthrough of the concepts, focused mainly on relational databases, because our open classes in June will also be based on relational databases. The third part consolidates the first two through some examples.

I. The origin and nature of the database

1.1 Origin of the database

Think of a computer: it has directories and files, and every file has a size, a location, and a format. From the very beginning, the file system has needed some place to store these file parameters, or metadata. Likewise, in daily life every object and person has its own characteristics, and when we want to store a batch of such descriptions as data, we need a format in which to save them. Beyond saving data, we also have to manage it. This is the background against which the database emerged, and as usage scenarios have grown more complex, the database itself has spawned many derived variants.

So the database originates from the need to store and manage data.

1.2 Usage scenarios for databases

Let us start with a few questions. Based on what you know so far: what kinds of data need to be stored, and what storage methods are there? Let's brainstorm a little.

We arrive at the following cases:

A. Data defined inside the program, such as arrays, which lives only in memory; once the program stops, the data is gone. Does that count as storage too?

What is stored in this case is descriptive data, or possibly a binary array.

B. Data saved in a text file. The file may use a custom format or a standard one such as XML or JSON.

Only descriptive data is stored here.

C. File-based databases, including text databases that can be queried with SQL, such as SQLite and Access.

Apart from a few specific data types, these typically hold only textual descriptive data.

D. Network-based, distributed database systems such as MySQL.

These hold larger volumes of user data, relational data, and transaction data, all of which must be queried and updated quickly.

E. When the data volume is so large that a single machine cannot meet the requirements, a distributed database is needed, possibly a database cluster.

F. Software such as memcached, used to save cached data.

Typically used to hold text data or counters.

G. A distributed file system such as FastDFS, used to save file data.

It is used to hold large numbers of files, not descriptive data, apart from the description of the files themselves.

H. Key-value databases (KVDB) such as Redis, used to store descriptive data with simple relationships or specific structures.

For example, when saving Weibo data, the key can be used to locate the content of a microblog post quickly.

I. Similar to a KVDB: have you heard of queues? Are they not also a special means of data storage? Someone produces data while someone else consumes it.

J. Storage for geographic relational data, together with operations on it. In English this is called spatial data. Oracle is best at it, and MySQL now has spatial extensions as well.

http://dev.mysql.com/doc/refman/5.1/zh/spatial-extensions-in-mysql.html

K. Massive, continuously arriving data that is relatively irregular may be stored in a Hadoop system.

L. Finally, consider one more scenario: would a search engine like Baidu put all of its data in MySQL and then retrieve it from there? Obviously not. It has its own index format and a system that supports distributed queries.
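To make the first three cases concrete, here is a minimal Python sketch that keeps the same data in memory (case A), in a JSON text file (case B), and in a file-based SQLite database (case C). The sample records, file names, and the `person` table are assumptions made up for illustration; they do not come from the original lecture.

```python
import json
import os
import sqlite3
import tempfile

# Hypothetical sample data: two people described by a few attributes.
people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
]

# Case A: the list above lives only in memory and is gone when the
# program exits.

# Case B: save the same data in a text file, here in JSON format.
workdir = tempfile.mkdtemp()
json_path = os.path.join(workdir, "people.json")
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(people, f)
with open(json_path, encoding="utf-8") as f:
    loaded_from_json = json.load(f)

# Case C: save it in a file-based database (SQLite), which can answer
# queries instead of only handing back the whole file.
db_path = os.path.join(workdir, "people.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE person (name TEXT NOT NULL, age INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [(p["name"], p["age"]) for p in people],
)
conn.commit()
older_than_26 = [
    row[0] for row in conn.execute("SELECT name FROM person WHERE age > 26")
]
conn.close()

print(loaded_from_json == people)
print(older_than_26)
```

The text file only stores and returns the data as a whole, while even a small file-based database adds management on top of storage, in this case filtering by age.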

1.3 Database Scenarios and classification summary

To summarize:

A. Before the computer era, human beings had already produced a great deal of information, and since then a huge amount of digitized data has been produced. As the cases above show, the ways of preserving this data vary with the scenario: some data lives only in memory, some is saved in files, some is kept in a dedicated, strict database system, and some in a less rigorous KVDB.

B. Based on the storage scenarios above, we can make a simple classification of data into structured and unstructured data, plus, of course, semi-structured data. Structured data is data whose properties are highly regular. A person, for example, has a height, a weight, and an age, and such a description is structured. When you take a picture, record a sound, or shoot a video, the result is data too, but it is not structured. The two kinds of data are stored differently: structured data typically takes the form of a two-dimensional table, while unstructured data is stored only in its raw form.

Most of the scenarios above, such as text databases and MySQL databases, process structured data. FastDFS and Hadoop hold a great deal of unstructured data.

C. We can also classify by the size of the data itself, or by the specific technical application scenario.

For small volumes of data, we may simply save it as text for convenience. A website that provides a regular service will save data such as its user data in a MySQL database and, for faster access, also use caching software such as memcached. FastDFS can be used to save small files. When large volumes of logs and similar data are generated and need to be analyzed, for example when the daily data volume runs to hundreds of gigabytes or to terabytes, software such as Hadoop is used.

Different data volumes therefore call for different ways of saving: small data, medium to large data, and massive data, or big data.

D. The fourth angle: as seen above, there are many categories of data storage, each suited to different scenarios. Databases are very powerful, but not every scenario is appropriate for one. The full-text search scenario mentioned above, for example, generally needs a specific storage format and a specific indexing method. Specifically, full-text search uses an index called an inverted index.
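As a rough illustration of what an inverted index is, the following Python sketch maps each word to the set of documents that contain it, so a search looks up words instead of scanning every document. The function names and sample documents are hypothetical, and real search engines add tokenization, ranking, and distribution on top of this idea.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents, keyed by document id.
documents = {
    1: "MySQL is a relational database",
    2: "Redis is a key value database",
    3: "Hadoop stores massive data",
}
index = build_inverted_index(documents)

print(search(index, "database"))
print(search(index, "relational database"))
```

The point of the design is the direction of the mapping: a table stores "document to words", while the inverted index stores "word to documents", which is what a keyword query needs.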

II. What are the concepts of the database

All right, we have covered some usage scenarios for data and the related software, and now have an idea of which software fits which scenario. Let's look at some of the terminology commonly used in the database field, which we will use throughout our study of databases.

Because time in this open class is limited, we will mainly cover the basic concepts; the advanced concepts will be discussed in later open classes.

2.1 Basic concepts of the database

In introducing the database concepts below, we compare each of them to a part of a company.

2.1.1, database (library)

The database, also called a library, corresponds to the company itself: almost all of the concepts below are housed inside it.

Before using a database, we have to create it in the database software.

In MySQL, a connection works against one default database at a time; of course, as long as permissions allow, cross-database queries are also possible.

The syntax for creating a database is

CREATE DATABASE xxx;

On a fresh installation, only the root user can create a database; any other user must first be granted the CREATE privilege.

2.1.2, table

A table corresponds to a department of the company: this department manages users, that department manages orders. Each one is a table.

Tables consist of records and fields.

The fields are the table definition.

The records make up the table data.

CREATE TABLE `hstesttbl` (
  `k` int(11) NOT NULL AUTO_INCREMENT,
  `v` char(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

A field definition includes the field name, the field type, whether the field may be null, whether it auto-increments, its default value, and so on.

A record is the actual data values. All records in the same table share the same definition; in other words, they have the same set of characteristics.
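The split between fields (definition) and records (data) can be tried out with the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, so the column types and the `AUTOINCREMENT` spelling follow SQLite rather than the MySQL statement above; that substitution is an assumption made so the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Field definitions: name, type, NOT NULL, auto-increment, default value.
conn.execute(
    """
    CREATE TABLE hstesttbl (
        k INTEGER PRIMARY KEY AUTOINCREMENT,
        v TEXT NOT NULL DEFAULT ''
    )
    """
)

# Records: the actual data values. The second insert relies entirely on
# the auto-increment key and the default value.
conn.execute("INSERT INTO hstesttbl (v) VALUES ('hello')")
conn.execute("INSERT INTO hstesttbl DEFAULT VALUES")

# PRAGMA table_info lists the fields; column 1 is the name, column 2 the type.
fields = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(hstesttbl)")]
records = conn.execute("SELECT k, v FROM hstesttbl ORDER BY k").fetchall()

print(fields)
print(records)
```

Every record has the same two fields, and the second record shows the auto-increment and default-value rules from the field definition being applied.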

2.1.3, view

A view is like a virtual project team whose members can come from one department or from several.

A view is called virtual because it holds no actual data of its own. Both its definition and its data are assembled from other tables.

CREATE VIEW hsview AS SELECT * FROM hstesttbl;

mysql> SHOW TABLES;
+--------------------+
| Tables_in_hstestdb |
+--------------------+
| hstesttbl          |
| hsview             |
+--------------------+
2 rows in set (0.02 sec)

mysql> SHOW CREATE TABLE hsview\G
*************************** 1. row ***************************
       View: hsview
Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `hsview` AS select `hstesttbl`.`k` AS `k`,`hstesttbl`.`v` AS `v` from `hstesttbl`

As the commands above show, a view is operated on much like a table. A view can also rename the fields it takes from its tables.
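The sketch below illustrates both points, again using Python's `sqlite3` module as a stand-in for MySQL (an assumption for runnability). The renamed columns `id` and `text_value` are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hstesttbl (k INTEGER PRIMARY KEY, v TEXT NOT NULL DEFAULT '')"
)

# The view stores only a query, and it renames the fields on the way.
conn.execute("CREATE VIEW hsview AS SELECT k AS id, v AS text_value FROM hstesttbl")

conn.execute("INSERT INTO hstesttbl (k, v) VALUES (1, 'first')")
rows_before = conn.execute("SELECT id, text_value FROM hsview").fetchall()

# A row added to the base table shows up in the view immediately,
# because the view has no data of its own.
conn.execute("INSERT INTO hstesttbl (k, v) VALUES (2, 'second')")
rows_after = conn.execute(
    "SELECT id, text_value FROM hsview ORDER BY id"
).fetchall()

# The catalog lists the view next to the table, much like SHOW TABLES.
objects = conn.execute(
    "SELECT name, type FROM sqlite_master ORDER BY name"
).fetchall()

print(rows_before)
print(rows_after)
print(objects)
```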

2.1.4, Index

An index is like an attribute of each employee in a department that is used to find employees quickly, such as job title, hobby, or the school they graduated from. Such attributes let you classify the people in the department, so each query can narrow the result set.

To create an index on a table, taking the table above as an example: ALTER TABLE hstesttbl ADD INDEX (xxx);
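One way to see an index narrowing a search is to compare the query plan before and after creating it. The sketch below does this with Python's `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`, as a stand-in for MySQL's `EXPLAIN`; the `employee` table, its rows, and the index name `idx_employee_school` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, school TEXT)"
)
conn.executemany(
    "INSERT INTO employee (name, school) VALUES (?, ?)",
    [("Alice", "School A"), ("Bob", "School B"), ("Carol", "School A")],
)


def query_plan(connection):
    """Return SQLite's description of how it would look up a school."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM employee WHERE school = ?",
        ("School A",),
    ).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_employee_school ON employee (school)")
plan_after = query_plan(conn)   # the plan now mentions the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, so the sketch only relies on whether the index name appears in it.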

2.1.5, key

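A key is like the most efficient, unique pieces of contact information within the department, such as a telephone number, an email address, an employee number, or a QQ number.

Adding a key is similar to adding an index, so here we only show how to delete one.

For example:

ALTER TABLE hstesttbl DROP PRIMARY KEY;

Note that MySQL rejects this statement on the hstesttbl table exactly as defined above, because an AUTO_INCREMENT column must remain part of a key.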
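What makes a key different from an ordinary index is uniqueness, which the following sketch demonstrates with Python's `sqlite3` module standing in for MySQL. The `contact` table and the phone number are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The phone number is the primary key: it must be unique, like a piece
# of contact information that reaches exactly one person.
conn.execute("CREATE TABLE contact (phone TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO contact VALUES ('555-0100', 'Alice')")

try:
    # A second row with the same key value is refused by the database.
    conn.execute("INSERT INTO contact VALUES ('555-0100', 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

owner = conn.execute(
    "SELECT name FROM contact WHERE phone = ?", ("555-0100",)
).fetchone()[0]

print(duplicate_rejected)
print(owner)
```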

2.1.6, permissions

Permissions correspond to the company's internal rules and restrictions: for example, people outside the company are not allowed to know its internal affairs.

We can use a statement such as

GRANT ALL PRIVILEGES ON dbname.* TO 'user'@'127.0.0.1' IDENTIFIED BY 'password' WITH GRANT OPTION;

to give a user full access to a database. Without further authorization, this user can access only this database.

As noted above, creating a database is reserved for the root user by default. The other operations described above, such as creating tables, adding views, and adding or deleting indexes and keys, can be performed by any user who holds the authorization above.

2.1.7, SQL

SQL corresponds to the company's rules and directives. For example, when we say in the company that something "can be released to production", it sounds like a simple phrase, but it may imply product validation and various tests before operations can carry out the release.

SQL is the set of instructions we use to work with databases. In fact, we have already met some SQL commands while discussing the concepts above; here I would like to add a few points.

Being able to write commands by hand is very important. Ucai (ucai.cn) requires its students to be able to write commands by hand, so that they are not left unable to work without phpMyAdmin.

A good deal of SQL has been introduced above; can we classify it? In fact, this classification was made long ago. The main categories are DDL, DML, and DCL (and there is also TCL, which we will not cover today).

(1) Data definition (SQL DDL): defines the creation and removal of SQL schemas, base tables, views, and indexes. Most of the statements above are DDL.

(2) Data manipulation (SQL DML): divided into two categories, data query and data update. Data update comprises three operations: inserting, deleting, and modifying. Together these are the familiar insert, delete, update, and query operations.

(3) Data control (SQL DCL): includes authorization on base tables and views, the description of integrity rules, transaction control, and so on. This is the permission handling we covered above.
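The difference between DDL and DML can be seen in a short session, sketched here with Python's `sqlite3` module as a stand-in for MySQL. SQLite has no user accounts, so DCL statements such as GRANT are deliberately left out; the `account` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines a structure and stores no data.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, balance INTEGER NOT NULL)"
)

# DML, data update: insert, modify, delete.
conn.execute("INSERT INTO account (owner, balance) VALUES ('alice', 100)")
conn.execute("INSERT INTO account (owner, balance) VALUES ('bob', 50)")
conn.execute("UPDATE account SET balance = balance + 25 WHERE owner = 'bob'")
conn.execute("DELETE FROM account WHERE owner = 'alice'")

# DML, data query.
remaining = conn.execute("SELECT owner, balance FROM account").fetchall()

# DDL again: removes the structure itself, not just its rows.
conn.execute("DROP TABLE account")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(remaining)
print(tables_left)
```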

2.2 Academic Concepts

1, Normal forms and redundancy

– First normal form: columns must be atomic (indivisible). Take an interest field, which might contain basketball, movies, and many other items. Strictly speaking, such a table does not satisfy first normal form. A dedicated user-interest table should be created with a UID and an interest column, so that one person has multiple records. This is also why you must declare a data type explicitly when defining a database field: apart from text types, this makes it hard to pack multiple pieces of information into one field.

– Second normal form: the whole primary key must determine every other attribute. Consider a student table with student number, college number, student name, age, college, and dean, where student number and college number together form the primary key. It seems fine: the college number tells us which college the student is in and who its dean is. But the table does not satisfy second normal form, because college and dean are determined not by the whole key but by the college number alone. So they should be split off into a separate table.

– Third normal form: every other attribute must be determined directly by the primary key. Take the table above again, with student number, college number, student name, age, college, and dean, but now with student number alone as the primary key. It seems fine: it tells us the student's college and who the dean is. But the table does not satisfy third normal form, because the dean is determined not directly by the student number but by the college, which is a transitive dependency. So it should be split off.
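The splits described above might look like the following schema, sketched with Python's `sqlite3` module as a stand-in for MySQL. All table names, column names, and sample rows are hypothetical; the lecture names only the attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- College and dean depend only on the college number, so they get
    -- their own table (the split required by 2NF and 3NF).
    CREATE TABLE college (
        college_no   INTEGER PRIMARY KEY,
        college_name TEXT NOT NULL,
        dean         TEXT NOT NULL
    );
    CREATE TABLE student (
        student_no INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        age        INTEGER NOT NULL,
        college_no INTEGER NOT NULL REFERENCES college (college_no)
    );
    -- One row per interest instead of a list packed into one field (1NF).
    CREATE TABLE student_interest (
        student_no INTEGER NOT NULL REFERENCES student (student_no),
        interest   TEXT NOT NULL,
        PRIMARY KEY (student_no, interest)
    );
    INSERT INTO college VALUES (1, 'Computer Science', 'Dean Li');
    INSERT INTO student VALUES (1001, 'Wang', 20, 1), (1002, 'Zhao', 21, 1);
    INSERT INTO student_interest VALUES (1001, 'basketball'), (1001, 'movies');
    """
)

# Because the dean is stored once, one update changes it for every student.
conn.execute("UPDATE college SET dean = 'Dean Chen' WHERE college_no = 1")
deans = conn.execute(
    "SELECT s.name, c.dean FROM student AS s "
    "JOIN college AS c ON c.college_no = s.college_no "
    "ORDER BY s.student_no"
).fetchall()
interests = [
    row[0]
    for row in conn.execute(
        "SELECT interest FROM student_interest "
        "WHERE student_no = 1001 ORDER BY interest"
    )
]

print(deans)
print(interests)
```

This is the redundancy the heading refers to: in the unsplit table the dean's name would be repeated on every student row and would have to be updated in each of them.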

2, ACID:

– Atomicity: a transaction cannot fail halfway. When we buy something, the payment involves deducting money from the buyer's account and depositing it into the seller's account. These are two steps, so by themselves they are not atomic, because a failure can occur in between: for example, the deduction succeeds but the deposit does not. Atomicity requires that either both steps succeed or neither takes effect.

– Consistency: in the example above, when the transaction completes, the data must end up consistent. One side cannot have gained more than the other lost.

– Isolation: avoids confusion under concurrency. With a single client accessing the data there is no problem; when many users access it with high concurrency, the data must still not be corrupted.

– Durability: data is saved permanently and is not lost. It must not disappear when the machine shuts down; losing even a single record is unacceptable.
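The buyer and seller example can be sketched as a transaction, here with Python's `sqlite3` module as a stand-in for MySQL. The `account` table, the `transfer` and `balances` helpers, and the simulated failure (a deposit to an account that does not exist) are all assumptions made for illustration.

```python
import sqlite3


def transfer(conn, buyer, seller, amount):
    """Move amount from buyer to seller; both steps commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = ?",
                (amount, buyer),
            )
            deposited = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE owner = ?",
                (amount, seller),
            )
            if deposited.rowcount != 1:
                # The deduction already ran; raising here undoes it.
                raise LookupError("deposit failed: unknown seller")
        return True
    except LookupError:
        return False


def balances(conn):
    """Return the current balances as a dict of owner to balance."""
    return dict(conn.execute("SELECT owner, balance FROM account"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('buyer', 100), ('seller', 0)")
conn.commit()

ok = transfer(conn, "buyer", "seller", 30)       # both steps succeed
after_ok = balances(conn)
failed = transfer(conn, "buyer", "nobody", 30)   # deposit step fails
after_failed = balances(conn)

print(ok, after_ok)
print(failed, after_failed)
```

After the failed transfer the buyer's balance is unchanged, which is atomicity, and the total across both accounts is still 100, which is the consistency requirement from the same example.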

2.3 More advanced terminology

1. Storage engine, transaction, composite index, connection pool, backup, recovery

2. Master-slave replication, row lock, table lock, slow query, binary log, temporary table, content distribution, synchronization, database partitioning, database sharding, table sharding, horizontal split, vertical split

These are more advanced terms, which we will discuss later or in the full-stack engineer course.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
