PostgreSQL Architecture Summary

Source: Internet
Author: User
Tags postgresql psql system log dedicated server





Reference: http://book.51cto.com/art/201201/313175.htm


System Architecture


PostgreSQL database consists of the connection management system (System Controller), the compiled execution system, the storage management system, the transaction system, the system table, and its composition structure and relationship 2-1 are shown.



The connection management system accepts the request of the external operation to the system, preprocessing and distributing the operation request, and plays the logical control function of the system.



The compiled execution system consists of the query compiler and the query executor, which completes the processing and transformation of the operation request in the database, and finally realizes the operation of the data in the physical storage medium.



The storage management system consists of index manager, memory manager, external memory manager, which is responsible for storing and managing physical data and providing support for compiling query system.



The transaction system consists of transaction manager, log Manager, concurrency control, lock manager, log Manager and transaction manager to complete the transaction consistency support for Operation request processing, and the lock manager and concurrency control provide consistent support for concurrent access data;



system table is the meta-Information Management center of PostgreSQL database, including database object information and database management control information. The system table manages the metadata information, which connects each module of PostgreSQL database organically to form an efficient data management system.













Database initialization (INITDB) process



 ? ? ? ? Execution of the INITDB program begins with the main function in the INITDB.C file, as shown in the execution flow 2-3 of the main function. Initdb performs the following tasks in sequence:



1) Gets the input command name based on the command line parameters entered by the user.



2) Set the system encoding to Lc_all, find the absolute path to execute the command, and set the path.



3) Set Environment variables (pg_data, etc.), get the source file path of the system configuration file (Postgres.bki, postgresql.conf.sample, etc.), and check the availability of the files under the path.



4) Set the interrupt signal processing function, the terminal command line SIGHUP, program interrupt SIGINT, program exit Sigquit, software interrupt sigterm and pipeline interruption sigpipe signal such as shielding, to ensure that the initialization of the work smoothly.



5) Create the data directory, and some necessary subdirectories under the directory, such as base, global, BASE/1, etc.



6) test the current server system performance, create the configuration file postgresql.conf, pg_hba.conf, pg_ident.conf with the test results, and make some settings for the parameters defined therein.



7) Create the database template1 in bootstrap mode, stored in the subdirectory base/1/of the data directory.



8) Create system views, system table toast tables, and so on, and copy template1 to create TEMPLATE0 and Postgres, which are all done with normal SQL commands.



9) The printing operation is successful and other relevant information, exit.



Initdb is a standalone program in PostgreSQL, whose main task is to initialize the dataset cluster, create a template database and system tables, and insert an initial tuple into the system tables. After that, users create databases, tables, views, indexes, and other database objects and perform other operations on the basis of the template database and system tables. ? ? ? ?





Database Set Cluster



After the PostgreSQL installation is complete, before doing anything else, you must first initialize the data store on the disk by using the INITDB program, which is the data set cluster, the user database that is managed by PostgreSQL, and the system database, which is always called the data set cluster . In PostgreSQL implementations, a database is a collection of files on a disk, except that they have a specific file name, storage location, and so on, and some files are associated with each other. By default, all data for PostgreSQL is stored in its data directory, which is typically referenced by the environment variable Pgdata, which will be used to refer to the data directory in Pgdata.



In PostgreSQL, the object identifier (OID) is used to uniquely identify a database object in the entire dataset cluster, which can be a database, a table, an index, a view, a tuple, a type, and so on. PostgreSQL provides the OID data type to represent the OID, which is actually an unsigned integer.



oid--Object identifiers



The OID is usually allocated starting from 1, but when the DataSet family is initialized, a subset of the OID is assigned to a database object such as a system table, a system table tuple, an index on a system table, and this part of the OID can be found in the header file corresponding to the system table. Also, in order to extend the scope for subsequent releases, a subset of OID resources are reserved for initializing the dataset cluster. In this way, the OID resources that can be allocated while the system is running are actually starting at 16384. In the PostgreSQL source code Src/include/catalog subdirectory There is a shell script unused_oids used to output the usage of the OID that is pre-allocated and reserved in the current version.





System Database


After the dataset cluster is created, the cluster contains three system databases Template1, TEMPLATE0, and postgres by default. Both Template0 and Postgres are copied from the TEMPLATE1 during the initialization process.



The template1 and TEMPLATE0 databases are used to create databases . PostgreSQL creates a new database in the same way that it is copied from the template database, and the "-t" option is used in the command to create the database to specify which database is to be created as a template to create a new database.



  template1 database is the default template for creating database commands template1 allows users to make a custom template database in which users can create tables, data, indexes, and so on that the application needs, you can use template1 as a template when you need to create a database of the same content multiple times in the future.



because template1 content is likely to be modified by the user, PostgreSQL provides the user with the need to create a "clean" database TEMPLATE0 Database As the most initial backup data , you can use Template0 as a template to generate a "clean" database when you need it.



The third initial database, Postgres, is used to provide the initial user with a database that can be connected, just like a user's home directory in a Linux system.



The above system database can be deleted, but the two template database must be deleted before deleting the Datistemplate property of the tuple in pg_database to false, otherwise you will be prompted to "cannot delete a template database".





system tables, views


In the relational database, in order to realize the control of the database system, the function of the data dictionary must be provided. The data dictionary stores not only the descriptive information of various objects, but also the details of the various objects required for the management of the system. In terms of content, a data dictionary contains descriptive information about all objects and their attributes in a database system, descriptive information about the relationships between objects, the natural language meaning of object properties, and the history of data dictionary changes (that is, the state information of the database). Data dictionary is the core of the management control information of relational database system, and the system table plays the role of data dictionary in PostgreSQL database system .



The system table is the place where the PostgreSQL database holds the structure metadata, which is represented in PostgreSQL as a normal table or view that holds system information. Users can delete and then rebuild the tables, add columns, insert and update values, but the user's ability to modify the system can result in inconsistencies in system information, resulting in system control disturbances. The system table information is automatically maintained by the system table operation associated with the SQL command, which should not normally be manually modified by the user. For example, creating a database statement inserts a row into the Pg_database system table and creates the database on disk.



Each database in PostgreSQL has its own set of system tables, most of which are copied from the template database when the database is created, so the data in these tables is related to the owning database. Only a handful of system tables are shared by all databases (such as Pg_database), and the data in these systems table is about all databases.



Because the system tables hold all the metadata for the database, access to the system tables is very frequent when the system is running. In order to improve the system performance, the shared system table cache is built in memory, and the hash function and hash table are used to improve the query efficiency, which will be described in detail in the 3rd chapter.





Process Structure


The main functions of the PostgreSQL system are focused on the Postgres program, where each process is a process that is formed by loading the Postgres program, except where the branch is located at runtime. PostgreSQL uses a dedicated server process architecture with the following main processes:






The entrance to the Postgres program is the main function of the main module.



Daemon Postmaster ( Postgres process in single-user mode):



After the data set cluster is initialized, the user can start a DB instance to run the database management system, and in multi-user mode A DB instance is managed by the database server daemon postmaster. Main functions:



1. It is a master process running on the server, which is responsible for the startup and shutdown of the whole system, and completes the system recovery in the event of an error in the service process.



2. It manages database files, listens for and accepts connection requests from clients (PSQL,JDBC, etc.), and fork a Postgres service process for client connection requests to execute various commands on the database on behalf of the client.



3. Postmaster also manages the worker processes associated with database operations. Users can start postmaster using the postmaster, Postgres, or PG_CTL commands.






PostgreSQL uses the C/S mode and the system assigns a service process to each client. Postmaster is like a dispatch center that handles client requests. When a front-end application wants to access a database, it calls the interface library (such as ODBC, LIBPQ) to send the user's request over the network to the daemon postmaster. Postmaster will start a new service process Postgres for the user, after which the front-end processes and service processes no longer pass through postmaster but communicate directly.



Service Process Postgres:



The service process Postgres accepts and executes commands (Interactive SQL queries)that are sent by the client (such as psql, or user applications through the interface such as JDBC) . It invokes each major function module (such as compiler, optimizer, executor, etc.) on top of the underlying module (such as storage, transaction management, indexing, etc.), completes various database operations on the client, and returns execution results.



Each time a client creates a database connection, postmaster generates a Postgres server process that directly accepts the user's commands for compilation execution and returns the results to the user. This loop is not connected until the user disconnects. the server process communicates with each other through semaphores and shared memory.



Worker processes (to implement different functions, respectively):





System Log Process Syslogger:



Background Write process Bgwriter:



Pre-write Log write process walwriter:



Pre-write Log archive process pg_archive:



System automatic Cleanup Process autovacuum: The data space in the external memory is automatically scheduled to be sorted (cleaned).



Statistical data collection Process Pg_state: Some dynamic information will be automatically counted in the system operation.





PostgreSQL Architecture Summary


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.