Study notes on "A Tour of PostgreSQL Internals": interprocess communication
2016-09-17 17:24
The Mid-Autumn Festival holiday went by quickly, and since it rained the whole time I simply stayed home and read. What I read was Tom Lane's "A Tour of PostgreSQL Internals", and this short post is my set of study notes on it. There are plenty of experts in this community; if I have got anything wrong, please point it out.
In these slides, Tom Lane introduces the internals of PostgreSQL from three viewpoints.
View 1: PostgreSQL processes and interprocess communication
This section is brief. It mainly covers client/server communication and communication inside the server.
1. Communication between client/server
A picture is worth a thousand words, so we start with the diagram from the slides. It divides into two parts, the client side and the server side:
1.1 Client
On the client side there are two main components. One is the client application itself, such as psql; these programs send database requests to the server and receive the query results it returns.
The other is the client interface library, which handles the frontend/backend communication protocol. Examples are libpq, ODBC, JDBC, Perl DBD, and so on. libpq deserves a special mention: it is the protocol library that PostgreSQL itself implements in C, and using it makes communicating with the backend much easier. Besides C, interfaces for other languages such as Perl and PHP are available, and these call libpq internally (psqlODBC has also used libpq for all frontend/backend communication since version 09.05.0400). There are also libraries that talk to PostgreSQL directly without relying on libpq. The representative example is Java: PostgreSQL's JDBC driver does not depend on libpq and speaks the protocol to PostgreSQL directly.
1.2 Server side
1.2.1 postmaster process
On the server side, postmaster is the resident process that manages the backends. By default it listens on a UNIX domain socket and on TCP/IP port 5432, waiting for connections from front ends. When a front end connects, postmaster creates a child postgres process with fork(2). On Windows, which has no fork(2), the new process is created with CreateProcess(). In that case the child does not inherit the parent's data, so the data it needs has to be passed on through shared memory.
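The accept-then-fork pattern described above can be sketched in a few lines of Python. This is only an illustration, not PostgreSQL's code: the names `handle_client`, `accept_and_fork`, and `demo` are made up for these notes, the "backend" merely echoes the query back, and the sketch assumes a Unix-like system where `os.fork` is available.

```python
import os
import socket


def handle_client(conn):
    """Child-side work: read one 'query' and send back a placeholder result."""
    data = conn.recv(1024)
    conn.sendall(b"result for: " + data)
    conn.close()


def accept_and_fork(listener):
    """Parent-side work: accept one connection and hand it to a forked child."""
    conn, _addr = listener.accept()
    pid = os.fork()
    if pid == 0:
        # Child: plays the role of a postgres backend process.
        listener.close()
        try:
            handle_client(conn)
        finally:
            os._exit(0)
    # Parent: plays the role of postmaster; it drops its copy of the
    # connection and is free to go back to accepting new clients.
    conn.close()
    return pid


def demo():
    """Run one client request through the parent/child pair and return the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"SELECT 1")

    pid = accept_and_fork(listener)

    # Read until the child closes its end of the connection.
    chunks = []
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)

    os.waitpid(pid, 0)
    client.close()
    listener.close()
    return b"".join(chunks)
```

Because each connection gets its own process, a crash in one child does not take the listening parent down with it, which is the main point of the design.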
1.2.2 postgres process
The postgres process accepts queries from the front end, searches the database, and then either returns the results (SELECT) or updates the database (UPDATE, DELETE, and so on). Updated data is also recorded in the transaction log, which PostgreSQL calls the WAL (write-ahead log). The WAL is used mainly for recovery when the server restarts after a crash such as a power failure. The WAL can also be archived and used when recovery is required. Since PostgreSQL 9.0, shipping the WAL to another PostgreSQL server makes it possible to replicate a database in near real time; this is the 'streaming replication' feature.
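The idea of logging a change before applying it, and replaying the log after a crash, can be shown with a toy model. Everything here (`TinyDB`, `apply_record`, the record layout) is hypothetical and invented for these notes; real WAL records describe page-level changes and are far more involved.

```python
def apply_record(record, table):
    """Apply one log record (op, key, value) to a dict acting as a table."""
    op, key, value = record
    if op == "set":
        table[key] = value
    elif op == "delete":
        table.pop(key, None)


class TinyDB:
    def __init__(self):
        self.wal = []        # stands in for the WAL on disk (survives a crash)
        self.data_file = {}  # stands in for the data files on disk (survives a crash)
        self.cache = {}      # in-memory state (lost on a crash)

    def execute(self, op, key, value=None):
        record = (op, key, value)
        self.wal.append(record)            # write the log record first ...
        apply_record(record, self.cache)   # ... then change the in-memory copy

    def crash(self):
        """Simulate a power failure: only the in-memory state is lost."""
        self.cache = {}

    def recover(self):
        """Rebuild the in-memory state from the data files plus the log."""
        self.cache = dict(self.data_file)
        for record in self.wal:
            apply_record(record, self.cache)
```

In this sketch nothing is ever written to `data_file`, so recovery replays the whole log; shortening that replay is what checkpoints are for.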
1.2.3 Other processes
In addition to the above two processes on the server side, there are many other worker processes that are initiated by the postmaster process.
a) Writer process
The writer process (background writer) writes dirty buffers in shared memory to disk at appropriate times. This prevents a large burst of disk writes at checkpoint time from degrading performance, so the server keeps relatively stable performance. The background writer stays resident once started, but it does not work continuously: after working for a while it sleeps. The sleep interval is set with the bgwriter_delay parameter in postgresql.conf, and the default is 200 milliseconds.
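The work-then-sleep rhythm of the background writer can be sketched as follows. This is a toy model under stated assumptions: `bgwriter_round` and `run_bgwriter` are names invented here, the per-round limit is loosely modeled on the real `bgwriter_lru_maxpages` setting, and the real process chooses which buffers to write far more carefully.

```python
import time


def bgwriter_round(dirty_pages, disk, max_pages):
    """Write at most max_pages dirty pages to disk; return how many were written."""
    written = 0
    while dirty_pages and written < max_pages:
        page_id, content = dirty_pages.popitem()
        disk[page_id] = content
        written += 1
    return written


def run_bgwriter(dirty_pages, disk, rounds, delay_ms=200, max_pages=100):
    """Alternate between a bounded burst of writes and a sleep of delay_ms."""
    total = 0
    for _ in range(rounds):
        total += bgwriter_round(dirty_pages, disk, max_pages)
        time.sleep(delay_ms / 1000.0)  # plays the role of bgwriter_delay
    return total
```

Bounding each round and sleeping in between is what spreads the writes out over time instead of leaving them all for the checkpoint.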
Another important job of this process is to perform checkpoints periodically. At a checkpoint, the dirty buffers in shared memory are written to the database files, so that memory and files are in a consistent state. This shortens the time needed to recover from the WAL after a crash, and it also keeps the WAL from growing without bound. The checkpoint interval is controlled with checkpoint_segments and checkpoint_timeout in postgresql.conf. (Since PostgreSQL 9.2 checkpoints are performed by a separate checkpointer process, and since 9.5 checkpoint_segments has been replaced by max_wal_size.)
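The two triggers mentioned above, elapsed time and amount of WAL written, combine with a simple "whichever comes first" rule. The function below is a hypothetical sketch of that rule, not PostgreSQL's code; its default values mirror the historical defaults of checkpoint_timeout (5 minutes) and checkpoint_segments (3).

```python
def checkpoint_due(seconds_since_last, segments_since_last,
                   checkpoint_timeout=300, checkpoint_segments=3):
    """Return True when either the time limit or the WAL volume limit is reached."""
    return (seconds_since_last >= checkpoint_timeout
            or segments_since_last >= checkpoint_segments)
```

A write-heavy server therefore checkpoints more often than the timeout alone would suggest, because the segment limit is reached first.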
b) WAL writer process
The WAL writer process writes the WAL buffered in shared memory to disk at appropriate times. This relieves the backend processes of some of the burden of writing out WAL themselves and improves performance. In addition, when asynchronous commit is enabled (synchronous_commit = off), it guarantees that the contents of the WAL buffer are written to the WAL file within a bounded interval.
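The difference between a synchronous and an asynchronous commit can be sketched with a toy WAL buffer. `WalBuffer` and `wal_writer_tick` are hypothetical names for these notes; the tick stands for the WAL writer waking up, which in PostgreSQL happens at an interval set by wal_writer_delay.

```python
class WalBuffer:
    def __init__(self):
        self.pending = []  # WAL records that exist only in shared memory
        self.on_disk = []  # WAL records already written to the WAL file

    def flush(self):
        """Move every pending record to disk."""
        self.on_disk.extend(self.pending)
        self.pending.clear()

    def commit(self, record, synchronous=True):
        self.pending.append(record)
        if synchronous:
            # The backend waits until its own WAL has reached disk.
            self.flush()
        # Otherwise return at once and leave the flush to the WAL writer.


def wal_writer_tick(buf):
    """What the WAL writer does each time it wakes up."""
    buf.flush()
```

With asynchronous commit a crash can lose the most recent transactions that were still in `pending`, but only those within the last tick interval.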
c) Archiver process
The archiver process copies WAL files to the archive. If you keep a base backup together with the archived WAL, you can restore the database to its latest state even when the disk is completely destroyed.
d) Stats collector process
This process collects statistics, such as the number of table accesses and the number of disk accesses. The collected information is used by autovacuum, and database administrators can also use it as reference information for managing the database.
e) Logger Process
This process writes PostgreSQL's activity to the server log file (not the transaction log) and rotates the log file at the specified interval.
f) Autovacuum launcher process
The autovacuum launcher does not start vacuum processes directly. Instead it asks postmaster to start them. Going through postmaster in this way improves the reliability of the system.
g) Autovacuum worker process
The autovacuum worker processes are the ones that actually perform the vacuum work. Several workers may be running at the same time.
h) WAL sender / WAL receiver
The WAL sender and WAL receiver processes implement PostgreSQL's streaming replication. The WAL sender transmits WAL over the network, and the WAL receiver on another PostgreSQL instance receives it. The PostgreSQL server that hosts the WAL receiver (the standby) replays the received WAL on its own database, producing a database identical to the one on the sending server (the master).
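The sender/receiver relationship can be modeled as the standby asking for everything after the position it has already replayed. This is a minimal sketch with invented names (`Primary`, `Standby`, `send_from`, `receive`); a plain list index stands in for the WAL position, and no network is involved.

```python
class Primary:
    def __init__(self):
        self.wal = []  # ordered list of (key, value) change records

    def write(self, key, value):
        self.wal.append((key, value))

    def send_from(self, position):
        """WAL sender role: return every record at or after position."""
        return self.wal[position:]


class Standby:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of WAL records replayed so far

    def receive(self, primary):
        """WAL receiver role: fetch the missing records and replay them in order."""
        for key, value in primary.send_from(self.applied):
            self.data[key] = value
            self.applied += 1
```

Because the standby remembers how far it has replayed, it can fall behind or disconnect and still catch up later by asking from that position.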
2. Intra-server communication
Again, let's start with the picture:
When postmaster starts, it creates the shared memory. Data is read from disk into this shared memory, where the postgres processes read and write it. In other words, a client's changes to data in a PostgreSQL database are not written back to the physical disk in real time. Instead, the writer process writes them back to disk periodically; see the processes described above.
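The read path and the deferred write-back can be illustrated with a toy buffer pool. `BufferPool` and its methods are hypothetical names for these notes, a dict stands in for the data files, and there is no eviction or locking, both of which the real shared buffer manager has to handle.

```python
class BufferPool:
    def __init__(self, disk):
        self.disk = disk     # page_id -> content, stands in for the data files
        self.buffers = {}    # copies of pages held in shared memory
        self.dirty = set()   # pages changed in memory but not yet on disk

    def read(self, page_id):
        """Return a page, loading it from disk on first access."""
        if page_id not in self.buffers:
            self.buffers[page_id] = self.disk[page_id]
        return self.buffers[page_id]

    def write(self, page_id, content):
        """Change a page in memory only and remember that it is dirty."""
        self.buffers[page_id] = content
        self.dirty.add(page_id)

    def flush(self):
        """Write every dirty page back to disk (the writer's job)."""
        for page_id in sorted(self.dirty):
            self.disk[page_id] = self.buffers[page_id]
        self.dirty.clear()
```

Between `write` and `flush` the disk copy is stale, which is exactly why changes must also be recorded in the WAL before they count as committed.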
To see this more clearly, the two points above can be combined into a more detailed picture:
3. View 1 summary
The benefits of this process architecture are:
- The "hard" separation between client and server gives the system good security and reliability;
- PostgreSQL works well in a network environment (this presumably refers to the client interface libraries);
- It runs easily on most UNIX systems and works well there.
The disadvantages are also obvious:
- Communication inside the server depends heavily on shared memory, whose size is limited, and this limits extensibility;
- Starting a PostgreSQL connection has a fixed cost, which is a large share of the total time for short-running client tasks. This can usually be mitigated with client-side connection pooling.
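The connection pooling remedy mentioned in the last point amounts to paying the start-up cost a fixed number of times and then reusing the connections. The sketch below uses a stand-in `FakeConnection` class that only counts how many connections were opened; both class names are invented here, and a real pool would wrap actual database connections.

```python
import queue


class FakeConnection:
    """Stand-in for a database connection; counts how many were opened."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1


class ConnectionPool:
    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())  # pay the start-up cost once

    def acquire(self):
        """Take an idle connection; blocks if every connection is in use."""
        return self._idle.get()

    def release(self, conn):
        """Hand the connection back so another task can reuse it."""
        self._idle.put(conn)
```

Ten short tasks run through a pool of two connections open only two connections in total, instead of ten.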
This post covers View 1 of the three views in "A Tour of PostgreSQL Internals". We will continue with View 2 and View 3 tomorrow.
Here is the link: "A Tour of PostgreSQL Internals"
This post also draws on "PostgreSQL srcstructure".