1.1 Data Management Overview
1.1.1 What Is Data Management
Compared with humans, the biggest advantage of computers is that they work quickly and accurately. Running a computer means executing program code and instructions and processing data. Data processing is arguably the most basic function of a computer, and data management is an important part of data processing. Before going further, we need to clarify several basic concepts.
Information is the signals and messages that reflect the state, development, and laws of objective things in the real world. The "objective things" here can be concrete (such as people, animals, and birds) or abstract (events, phenomena, or behaviors, such as the rise and fall of the tide or the gathering of clouds). Put another way, information is knowledge that humans can learn, expressed through numbers, symbols, text, sounds, images, and other media. As the basis for analysis, decision-making, and action in people's work and life, information is objective, applicable, transmissible, and shareable. Of course, information is not exclusive to humans: what birds pass to one another is information for birds.
Data is the concrete representation of information, such as the numbers, symbols, text, sounds, and images mentioned above. In computer technology, data refers to the content processed while programs or commands execute; program code and commands themselves are not "data". In a broad sense, however, program code and instructions also fall into the category of "data".
Data is usually composed of a data name (the meaning of the data), a data type, a data value, and other elements. The data value is constrained by the data type and the permitted value range. Data can represent information qualitatively (for example, a person's weight condition is "malnutrition", "normal weight", or "obesity") or quantitatively (for example, a person's weight is 18.5).
Data processing is the collection, sorting, organization, processing, storage, and dissemination of data. In a broad sense, all work involving data can be regarded as data processing.
Data processing can be divided into three types: data management, data processing in the narrow sense, and data transmission:
- Data management: collect information and represent it as data; classify, organize, and save the data so that it can be provided when needed.
- Data processing (narrow sense): convert, extract, and compute on data. Processing existing data (raw data) produces more useful data (that is, processing results, such as the total price of the goods in one purchase or a member's total spending within a specified period) that guides people's behavior or the development of things.
- Data transmission: that is, information transmission. Data is transmitted in space or time in various forms, including data display operations. Transmission does not change the structure, nature, or content of the data, but it can present the data in different forms or with different effects (such as tables, images, physical models, and audio/video) so that more people can obtain the information.
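The difference between processing data and merely presenting it can be sketched in a few lines of Python. This is only an illustration: the purchase records, the field names, and the helper functions `total_price` and `as_table` are hypothetical and do not come from any particular system.

```python
# Raw data: the line items of one purchase (made-up values, prices in cents).
purchase = [
    {"item": "notebook", "unit_price": 450, "quantity": 2},
    {"item": "pen", "unit_price": 120, "quantity": 5},
]


def total_price(items):
    """Data processing: derive a new result (the total) from the raw items."""
    return sum(line["unit_price"] * line["quantity"] for line in items)


def as_table(items):
    """Data display: present the same data as text rows without changing it."""
    rows = [
        f"{line['item']:<10}{line['unit_price']:>6}{line['quantity']:>4}"
        for line in items
    ]
    return "\n".join(rows)


total = total_price(purchase)
print(as_table(purchase))
print("total (cents):", total)
```

Here `total_price` produces data that did not exist before, while `as_table` only changes the form in which the existing data is shown.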
As described above, data management refers to the classification, organization, encoding, storage, maintenance, retrieval, and statistics of data. Specifically, the collected data is sensibly classified and organized, encoded, and stored on physical storage devices (such as disks) so that it can be kept for a long time. Data maintenance means inserting new data, modifying existing data, and deleting invalid data as needed. Data retrieval (that is, data query) and data statistics functions are also required, so that the needed data can be obtained quickly to meet various usage requirements.
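As a minimal sketch of these operations, the following Python example uses the standard-library `sqlite3` module with an in-memory database. The `member` table, its columns, and the sample values are assumptions made up for illustration.

```python
import sqlite3

# In-memory database for the sketch; a file path here would persist data on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT NOT NULL, spent REAL NOT NULL)"
)

# Storage: save the collected and organized data.
conn.executemany(
    "INSERT INTO member (id, name, spent) VALUES (?, ?, ?)",
    [(1, "Ann", 120.0), (2, "Bob", 80.0), (3, "Cai", 0.0)],
)

# Maintenance: insert new data, modify existing data, delete invalid data.
conn.execute("INSERT INTO member (id, name, spent) VALUES (4, 'Dee', 40.0)")
conn.execute("UPDATE member SET spent = 100.0 WHERE id = 2")
conn.execute("DELETE FROM member WHERE id = 3")

# Retrieval (query).
names = [row[0] for row in conn.execute("SELECT name FROM member ORDER BY id")]

# Statistics.
count, total = conn.execute("SELECT COUNT(*), SUM(spent) FROM member").fetchone()

print(names, count, total)
conn.close()
```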
As humanity enters the information society, the demand for data management in computer applications keeps growing, and computer data management technology has developed rapidly. Its development can be divided into three stages:
1. Manual management stage
In the early stage of computer technology, computers were used mainly for scientific computing. Their storage hardware consisted mainly of external devices such as paper tape readers, card readers, and magnetic tape drives; direct-access storage devices such as disks had not yet been invented. Software was also in its infancy: there was no operating system and no software dedicated to data management. During this period, programmers organized, stored, and managed data by hand, so it is called the "manual management stage".
Features of manual data management:
-Data is not stored separately but is mixed with program code. Data therefore has no independence: modifying the data means modifying the program.
-The programmer must define the data storage structure in the program and implement the access operations, which forces the programmer to deal directly with physical devices and makes programming harder and less efficient.
-Data is tied to a program. Even when several programs use the same data, they cannot share it; each must define, store, and manage the data separately, which easily leads to high data redundancy and makes data consistency impossible to guarantee.
We can see that the data management efficiency in this phase is very low.
2. File management stage
In this stage, data is organized into files and stored long-term on the computer's storage devices, under the unified management of the file system in the operating system. The file system is specialized data management software: it provides data access, query, and maintenance functions and specifies the file structure and common access methods.
Data files are no longer managed for one specific application, but data and programs are still not independent of each other, which hinders data sharing. Another problem is that when the logical structure of the data changes, both the structure definition and the corresponding program code must be modified, which makes programs harder to maintain and upgrade.
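The coupling between a file's record layout and the program that reads it can be illustrated with a short Python sketch. The record layout `name,price`, the sample records, and the function `read_prices` are hypothetical; `io.StringIO` stands in for a real file.

```python
import io


def read_prices(f):
    """Read prices from a file whose record layout 'name,price' is hard-coded."""
    prices = {}
    for line in f:
        name, price = line.strip().split(",")
        prices[name] = int(price)
    return prices


# The original logical structure: two fields per record.
old_file = io.StringIO("pen,120\nnotebook,450\n")
print(read_prices(old_file))

# The logical structure changes: a 'category' field is added to each record.
new_file = io.StringIO("pen,stationery,120\nnotebook,stationery,450\n")
try:
    read_prices(new_file)
except ValueError as exc:
    # Unpacking three fields into two names fails, so the program must be edited.
    print("program must be modified:", exc)
```

Every program that reads this file would need the same change, which is the maintenance burden described above.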
3. Database management stage
Database management evolved from file management and is the advanced stage of computer data management technology. To meet the needs of centralized storage of large amounts of data, data sharing among different kinds of programs, concurrent access by multiple users, maintenance of data integrity, distributed processing, and security, people began trying to separate data storage and management completely from applications and to minimize data redundancy. The result was the database management system. The features of database management are described in detail in the next section.
1.2.1 Main Features of the Database
As the name suggests, a database is a place where data is stored. More precisely, a database is a collection of data organized according to a certain data model, and it also provides data management functions. Databases have the following features:
- Data sharing
Data sharing has two meanings. First, different users can access the data in the database at the same time. Second, users can access the database in different ways, for example by operating on the database directly with management tools or by accessing it through network client applications.
Compared with the file system, because the database system achieves data sharing, users do not each need to create their own dedicated data files. This reduces duplicate data (data redundancy) and makes it easier to keep data consistent.
- High independence between data and programs
Data independence has two aspects: the logical structure of the data in the database is independent of the applications, and changes to the physical storage structure of the data do not affect its logical structure. The database storage structure is described in detail later.
- Centralized data management
The database management system provides unified data storage, maintenance (adding, deleting, and modifying data), retrieval, and simple statistical functions. Applications that use the data only need to talk to the database management system through a unified interface and call its data management functions; they do not each have to implement these functions on their own.
The premise of centralized data management is a unified data model that organizes the data and reflects the relationships among the data. File management can hardly achieve this: the data it manages is scattered, file types and storage structures are not uniform, and its ability to ensure data consistency and support concurrent access is very limited.
- Data consistency, security, and reliability
Only centralized data management can ensure the reliability of shared data; otherwise, inconsistencies easily arise when multiple applications access data concurrently. The database management system applies concurrency control to concurrent access (multiple users accessing the same data at the same time) to manage the interaction between users and keep the data consistent.
Security and reliability are constant themes in database technology. The database management system provides techniques that prevent unauthorized operations, erroneous updates, and unauthorized use of data, and it provides tools or functions for fault diagnosis and recovery.
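One common mechanism behind such consistency guarantees is the transaction, in which a group of updates either all take effect or none do. The sketch below uses Python's standard-library `sqlite3` module; the `account` table, the `CHECK` constraint, and the `transfer` helper are assumptions chosen for illustration, and a real multi-user DBMS adds locking and recovery mechanisms that this sketch does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # rule: no negative balances
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer("a", "b", 60)        # allowed: a keeps a non-negative balance
rejected = transfer("a", "b", 60)  # refused: a's balance would become negative
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, rejected, balances)
```

In the refused transfer, the first update (crediting `b`) is rolled back together with the failed second update, so the stored balances never show a half-finished transfer.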
1.2.2 Database Terminology
Database (DB)
A database is a collection of data organized according to a certain data model. The data in the collection contains as little duplication as possible and serves the various applications of a particular organization. The data structure in the collection is independent of the applications that use it, and the addition, deletion, modification, and retrieval of data are managed and controlled by unified software (the database management system).
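The independence of the data structure from the applications that use it can be sketched with Python's standard-library `sqlite3` module. The `product` table, the function `price_of`, and the index name are hypothetical. The application refers to data by name, so adding a column (a logical change) or an index (a physical storage change) leaves the application code untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)", [("pen", 120), ("notebook", 450)]
)


def price_of(name):
    """Application code: asks for a column by name, not by storage position."""
    row = conn.execute(
        "SELECT price FROM product WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


before = price_of("pen")

# A change to the logical structure: a new column is added.
conn.execute("ALTER TABLE product ADD COLUMN category TEXT")
# A change to the physical storage structure: an index is created.
conn.execute("CREATE INDEX idx_product_name ON product (name)")

after = price_of("pen")
print(before, after)
```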
Database Management System (DBMS)
A database management system is large-scale software used to operate and manage databases; it is used to create, use, and maintain them. The DBMS manages and controls the database in a unified way, including the addition, deletion, modification, and retrieval of data, to ensure the security and integrity of the database. Users access data in the database through the DBMS, and database administrators also maintain the database through the DBMS. A DBMS provides a variety of functions and allows multiple applications and multiple users to access the database concurrently in different ways.
Database Administrator (DBA)
The database administrator is the person responsible for creating, monitoring, and maintaining the entire database. His or her responsibilities include database installation, database configuration and management, permission settings and security management, monitoring and performance tuning, backup and recovery, and solving other general problems, so that the data can be used effectively by any authorized person.
Database administration is a highly specialized position that is relatively stable and well paid, and it is generally held by people with strong technical skills. At present, the domestic division of labor in this field is not very fine-grained: most small and medium-sized enterprises have no dedicated DBA, and the network administrator fills the role as well.
Database Application System (DBAS)
In the database field, the various application systems that use databases are often referred to as database application systems. More precisely, a database application system is a software system that can store, maintain, and process data in real time. It consists of five parts: the database, the database management system, the database administrator, the applications, and the end users. The database stores the data centrally, and the database management system manages and maintains the database in a unified way; these two together are commonly referred to as database software. Applications interact with the database management system and thereby access and operate on the data in the database indirectly. Database administrators use the database management system to manage and maintain the database, while end users interact only with the applications.
[Figure: components of a database application system: database, database management system, application, end user, database administrator]
Data Warehouse (Data Warehousing)
A data warehouse is a subject-oriented, integrated, relatively stable (non-volatile) collection of data that reflects historical changes (time-variant) and is used to support management decision-making.
A data warehouse is an environment rather than a product. It provides users with current and historical data for decision support, data that is difficult or impossible to obtain from traditional operational databases. Data warehouse technology is a general term for the various technologies and modules used to integrate operational data effectively into a unified environment that provides decision-oriented data access. Its whole purpose is to let users query the information they need faster and more conveniently and to provide decision support.
Simply put, a data warehouse collects and organizes data from different databases for the analysis of long-term trends. Data warehouse applications typically involve complex operations on large amounts of data, but the degree of concurrency is not very high and the speed requirements are not very strict. Such applications require the database system to provide not only strong data management, such as integrity control, security control, and reliability assurance, but also support for large data volumes, effective statement optimization, powerful statistical functions, and strong batch processing capabilities.
1.2.3 Database Classification and Development History
As mentioned above, databases (database management systems) emerged to overcome the limitations of file management systems. Their subsequent development was driven by the following demands: providing greater independence between applications and data, and thus reducing application development and maintenance costs; managing increasingly complex data types and structures; lowering the technical requirements for database users, who should need neither a programming background nor a detailed understanding of how data is stored and managed in the database; providing convenient and fast data access services; supporting stronger data sharing, consistency, security, and reliability; and providing more powerful support for data statistics and decision-making.
The core and foundation of a database system is its data model; every database system works on the basis of some data model. Traditional database systems are therefore divided into three types: network databases, hierarchical databases, and relational databases.
Network Database
As the name suggests, a network database uses a network data model in which record types are the nodes. It is a navigational database: when operating on the database, the user must specify not only the object (data) to be accessed but also the access path.
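What "specifying the access path" means can be sketched in Python. This is a toy illustration only: the records, the link names `supplies` and `used_in`, and the function `navigate` are hypothetical, and the code does not reproduce the data manipulation language of any actual network DBMS.

```python
# Records are nodes; named links connect owner records to member records.
# A record may be reached from more than one owner (a network, not a tree).
records = {
    "supplier:S1": {"name": "Acme", "links": {"supplies": ["part:P1", "part:P2"]}},
    "part:P1": {"name": "bolt", "links": {"used_in": ["product:X"]}},
    "part:P2": {"name": "nut", "links": {"used_in": ["product:X", "product:Y"]}},
    "product:X": {"name": "gearbox", "links": {}},
    "product:Y": {"name": "pump", "links": {}},
}


def navigate(start, path):
    """Follow an explicit access path (a list of link names) from a start record."""
    current = [start]
    for link in path:
        reached = []
        for key in current:
            for target in records[key]["links"].get(link, []):
                if target not in reached:  # a record reachable twice is listed once
                    reached.append(target)
        current = reached
    return [records[key]["name"] for key in current]


# The caller names both the starting record and the route to follow.
print(navigate("supplier:S1", ["supplies", "used_in"]))
```

The caller has to know that products are reached from a supplier by way of parts; the query cannot simply state what is wanted and leave the route to the system.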
The world's first network database management system (also the first database management system of any kind) was developed successfully in the early 1960s by Charles Bachman and others at General Electric (GE). It was called IDS (Integrated Data Store). It laid the foundation for network databases and was widely promoted and applied at the time. As a database product of the initial stage, IDS could run only on GE's own mainframes.
In 1971, the Data Base Task Group (DBTG) of the US Conference on Data Systems Languages (CODASYL) presented the well-known DBTG report, which defined the network data model and its language; the report was revised and supplemented in 1978 and 1981. The network data model is therefore also called the CODASYL model or the DBTG model. In 1984, the American National Standards Institute (ANSI) proposed a Network Database Language (NDL).
In the 1970s, a large number of network DBMS products appeared. Well-known examples include IDMS from the software company Cullinet, Honeywell's IDS II, Univac's DMS 1100, and HP's IMAGE.
The network database model can represent both hierarchical and non-hierarchical things naturally. Network DBMSs were more widely used than hierarchical DBMSs, and network databases occupy an important place in the history of database development.
Hierarchical Database
Hierarchical databases appeared right after network databases. They use a hierarchical data model (similar to a "tree" in data structures) to represent things that are organized in levels in the real world. A hierarchical database management system also accesses data record by record. The most basic data relationship in the hierarchical data model is the basic hierarchical relationship, which represents a one-to-many relationship between two record types and is also known as a parent-child relationship (PCR). The database has exactly one record type that has no parent, called the root node; every other record type has exactly one parent.
The advantages of the hierarchical data model are its simple structure, clear levels, ease of understanding, and good integrity support. Its disadvantages are as follows: non-hierarchical relationships, such as many-to-many relationships, are cumbersome to handle and can be represented only by introducing redundant nodes or virtual nodes; a child node can be queried only by starting from the root node and passing through its parent, which reduces data access efficiency; and data insertion and deletion are subject to many restrictions.
The most famous hierarchical database product is IBM's IMS (Information Management System), the earliest large-scale database system developed by the company.
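The rule that a child record is reached from the root through its parent can be sketched with a small tree in Python. The tree contents and the function `find_path` are hypothetical and are not modeled on IMS or any other product.

```python
# Each record has exactly one parent, except the root, which has none.
tree = {
    "name": "company",
    "children": [
        {"name": "sales", "children": [{"name": "Ann", "children": []}]},
        {"name": "engineering", "children": [{"name": "Bob", "children": []}]},
    ],
}


def find_path(node, target, path=()):
    """Search depth-first from the root; return the root-to-target path or None."""
    path = path + (node["name"],)
    if node["name"] == target:
        return list(path)
    for child in node["children"]:
        found = find_path(child, target, path)
        if found is not None:
            return found
    return None


# Reaching "Bob" requires walking from the root through the parent "engineering".
print(find_path(tree, "Bob"))
```

If Bob also belonged to the sales department (a many-to-many relationship), the tree could express that only by storing a second, redundant "Bob" node under "sales".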
Relational Database
A relational database (RDB) uses relational table structures to store and manage data, and it specifies the data dependencies within and between tables. Put simply, the table structure in a relational database resembles the data tables we use frequently in daily life (such as tables in the Office applications Word and Excel).
The table structure can support more data types and can carry whatever constraints are required. This also reflects the basic principle that technology originates from the real world and serves production activities in the real world.
Relational databases use Structured Query Language (SQL) for communication between client programs and the database server: the client sends SQL commands to the server, and the server executes the commands and returns the query results. The relational model has a simple structure and simple concepts and is easy to learn and use. However, the query efficiency of the relational model is often lower than that of non-relational models. Queries therefore need to be optimized to improve performance, which adds to the user's burden to some extent and places higher demands on the user's knowledge.
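A minimal sketch of tables, constraints, dependencies between tables, and SQL commands is given below, using Python's standard-library `sqlite3` module. Note the assumptions: SQLite is an embedded engine, so there is no separate server process as in the client-server arrangement described above, and the `department` and `employee` tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(id)  -- dependency between tables
);
INSERT INTO department VALUES (1, 'sales'), (2, 'engineering');
INSERT INTO employee VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cai', 2);
""")

# An SQL query states what is wanted; no access path is given.
rows = conn.execute(
    "SELECT e.name, d.name FROM employee AS e "
    "JOIN department AS d ON d.id = e.dept_id "
    "WHERE d.name = ? ORDER BY e.name",
    ("engineering",),
).fetchall()
print(rows)

# The constraint rejects a row that refers to a department that does not exist.
try:
    conn.execute("INSERT INTO employee VALUES (4, 'Dee', 99)")
    violation = False
except sqlite3.IntegrityError as exc:
    violation = True
    print("rejected:", exc)
```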
In 1970, Edgar Codd, a researcher at IBM, published his famous paper "A Relational Model of Data for Large Shared Data Banks", a milestone that began the relational database software revolution. After more than 30 years of continuous development, relational databases have become the mainstream of today's applications. Popular large relational databases currently include Oracle, IBM DB2, Microsoft SQL Server, Sybase, and Informix.
Copyright notice:
This book is original teaching material of Beijing Xinkehai School, V512 Studio. It is copyrighted and may not be used for profit. When reprinting, please indicate the source.
Contact the author:
Zhang Liguo zhangliguo@tsinghua.org.cn
Related resources:
Xinkehai School http://www.jobedu.com.cn/
V512 Studio http://www.v512.com
HD video downloads http://www.verycd.com/topics/215898/