Following the previous post, "PostgreSQL compilation, installation and debugging (1)", this post continues with installing and debugging PostgreSQL.
The previous post explained how to compile and install PostgreSQL on a Linux system. This time we'll briefly cover how to debug and trace the code on Linux.
I remember reading an article saying that PostgreSQL once had only about 200,000 lines of code and has now passed 1 million. With that much code, trying to understand it without a debugging environment is like the blind men feeling the elephant.
To make debugging easier, before getting into the actual debugging we first need a good understanding of how the PostgreSQL code is organized.
- 1. PostgreSQL Directory Structure
First, enter the top-level directory of the PostgreSQL source tree.
The config folder mainly holds configuration files. The contrib folder holds third-party plug-ins and extensions, such as the commonly used pg_standby and postgres_fdw. The doc folder, needless to say, holds help documents and manuals. The most important is the src directory, where the PostgreSQL source code lives; it is the main directory we work in when debugging and tracing. The INSTALL document describes in more detail how to compile and install PostgreSQL, and configure and Makefile are the files used when the program is built.
Now go into the src directory.
The few Makefiles there need no introduction, so let's look at the folders:
bin/: source code of PostgreSQL's command-line programs, such as psql and initdb;
backend/: source code of the PostgreSQL backend (server) program;
include/: header files;
interfaces/: code of the front-end libraries (including libpq, PostgreSQL's C-language client library);
makefiles/: platform-specific make settings;
pl/: code of the stored procedure languages;
port/: platform portability code;
template/: platform-specific settings files;
test/: the test scripts that ship with PostgreSQL;
timezone/: time zone code;
tools/: various development tools and documentation;
tutorial/: tutorial material.
As you can see, the core consists of three directories: backend, bin and interfaces. backend corresponds to the back end (the server side), and the other two correspond to the front end (the client side).
For debugging, most of our attention goes to the back end, that is, the backend directory, which is broken down into many subdirectories:
access/: storage access methods, one per subdirectory: common/ (common functions), gin/ (Generalized Inverted Index), gist/ (Generalized Search Tree index), hash/ (hash index), heap/ (heap access methods), index/ (generic index functions), nbtree/ (B-tree functions), transam/ (transaction processing)
bootstrap/: initialization of the database (used by initdb)
catalog/: system catalogs
commands/: processing of SQL statements other than SELECT/INSERT/UPDATE/DELETE
executor/: the executor (runs query plans)
foreign/: FDW (Foreign Data Wrapper) processing
lib/: common functions
libpq/: front-end/back-end communication
main/: the main function of postgres
nodes/: functions for handling tree nodes
optimizer/: the optimizer
parser/: parses SQL text into a parse tree
port/: platform-specific code
postmaster/: the main function of postmaster (the resident postgres process)
replication/: streaming replication
regex/: regular expression processing
rewrite/: query rewriting for rules and views
snowball/: full-text search (stemming)
storage/: management of primary and secondary storage (shared memory, on-disk storage, cache and so on), with subdirectories: buffer/ (buffer cache management), file/ (files), freespace/ (free space map management), ipc/ (interprocess communication), large_object/ (access functions for large objects), lmgr/ (lock management), page/ (page access functions), smgr/ (storage manager)
tcop/: the main part of postgres (the database engine process)
tsearch/: full-text search
utils/: assorted modules, with subdirectories: adt/ (built-in data types), cache/ (cache management), error/ (error handling), fmgr/ (function manager), hash/ (hash functions), init/ (database initialization and postgres startup processing), mb/ (multibyte character processing), misc/ (miscellaneous), mmgr/ (memory management functions), resowner/ (management of resources held during query processing, such as buffer pins and table locks), sort/ (sorting), time/ (MVCC time management for transactions)
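With this layout in mind, one low-tech way to find where a backend function is defined is to search src/backend for lines that begin with the function name, since PostgreSQL's coding style puts a function's name at the start of the line in its definition. The following is only a sketch: the helper name find_definition is made up for this post, and it assumes GNU grep (for --include).

```shell
# Hypothetical helper: find_definition NAME SRC_ROOT
# Prints "file:line:text" for every line under SRC_ROOT/src/backend that
# starts with "NAME(". Function definitions in the PostgreSQL sources are
# laid out that way, while calls are normally indented and so do not match.
find_definition() {
    grep -rn --include='*.c' "^$1(" "$2/src/backend"
}

# Typical use from the top of the source tree:
#   find_definition ExecResult .
```

The file name in the output tells you which of the subdirectories above the function lives in.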
- 2. Debug PostgreSQL with GDB
First we need the gdb tool. If it is not installed, you can install it with the yum command (yum install gdb).
To debug PostgreSQL, let's start with the simplest possible SQL statement to demonstrate how to trace the code. For example:
SELECT 1;
First, we log in to PostgreSQL as the postgres user.
Since we want to trace the program with GDB, we first need to know the PID of the PostgreSQL backend process, and then attach to it for debugging (if you are not familiar with GDB commands, search Baidu first).
To get the PID of the PostgreSQL backend, we have two options.
Method 1: Use the ps command:
$ ps -ef | grep postgres
In the output, the process whose title ends in "[local] idle" is the one we want; here its PID is 16581.
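As a rough sketch of Method 1, the filter below picks backend PIDs out of ps output instead of reading them off by eye. It assumes the backend's process title starts with "postgres:" and ends in "idle", as in the listing described above; the function name filter_idle_backends is made up for this illustration.

```shell
# Hypothetical helper: read "PID ARGS" lines on stdin and print the PID of
# every PostgreSQL backend whose process title ends in "idle".
filter_idle_backends() {
    awk '$2 == "postgres:" && $NF == "idle" { print $1 }'
}

# Typical use on a live system (ps prints the PID, then the process title):
#   ps -eo pid,args | filter_idle_backends
```

If several sessions are idle at once, this prints several PIDs, and Method 2 is the safer way to tell which one is yours.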
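Method 2: After logging in to PostgreSQL, run the following query directly:
SELECT pg_backend_pid();
This also returns the PID of the backend process, quickly and conveniently. Run it in the session you intend to debug, because every connection is served by its own backend process.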
Once we have the PID of the process, we can start debugging with GDB.
Open another window and enter the following command:
$ gdb -p 16581
This brings us to the GDB command-line interface.
In this state GDB accepts commands. Here we use the b (break) command to set a breakpoint at ExecResult:
(gdb) b ExecResult
Now we go back to the PostgreSQL window and execute the SQL statement:
SELECT 1;
Because the postgres process is paused, the statement just hangs there. That is exactly what we want; otherwise, how could we debug it step by step (like the devil's footsteps)?
We go back to the gdb window and run the c (continue) command. The program resumes and then stops at the breakpoint (ExecResult):
(gdb) c
As curious babies, of course we want to know which files are visited along the execution path (obviously, or why debug at all?).
OK, we run GDB's bt (backtrace) command:
(gdb) bt
This long listing is the function call stack we have been dreaming of. It shows the chain of function calls from the start of the program down to ExecResult. Since it is a stack, we naturally read it in reverse, from the bottom up: the first call is the main function in the main.c file, which at line 228 calls the PostmasterMain function, and so on up the list, which gives us the whole call path.
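The manual session above can also be replayed non-interactively. The sketch below writes a small GDB command file; the helper name write_gdb_script and the /tmp path are made up for illustration, and 16581 stands in for whatever PID you found earlier.

```shell
# Hypothetical helper: write a GDB command file that repeats the manual
# session (set the breakpoint, continue, print the stack, detach).
write_gdb_script() {
    cat > "$1" <<'EOF'
break ExecResult
continue
backtrace
detach
quit
EOF
}

# Typical use, with 16581 standing in for your backend's PID:
#   write_gdb_script /tmp/execresult.gdb
#   gdb -p 16581 -batch -x /tmp/execresult.gdb
# GDB then waits at "continue" until the session runs a query such as
# "SELECT 1;" and the breakpoint is hit.
```

Keeping the commands in a file makes it easy to repeat the same experiment against a different breakpoint later.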
Knowing the call path, we can walk through how this statement is executed step by step. Take PostgreSQL 9.5.4 as an example (limited by space and time, this is only a rough outline):
#13, in main.c:
line99: the function MemoryContextInit() starts the essential subsystems: error handling and memory management;
line110: the function set_pglocale_pgservice() gets and sets environment variables;
line146~148: the function init_locale() initializes the locale settings;
line219~228: the input arguments decide which way the program goes; here it enters PostmasterMain() and jumps to the postmaster.c file.
#12 #11 #10 #9, in postmaster.c:
This file defines the main function interfaces and data structures used by the resident back-end process "postmaster". The postmaster accepts front-end requests and creates a new backend process for each.
line561~623: read context information and the configuration file, and complete initialization;
line630~812: parse the postmaster's command-line options;
line930~1000: set up socket communication;
line1100~1159: set up shared memory and semaphores as well as stacks and pipes, and initialize subsystems (stats collection, autovacuum);
line1296: enter the ServerLoop() function, jump to line1604;
line1604: ServerLoop() function entry. This function loops, watching for connection requests on the port;
line1673~1699: check whether there is a "legitimate" connection request; if so, fork a child process to handle it by entering the BackendStartup() function, jump to line3857;
line3857: BackendStartup() function entry. This function is responsible for starting a new backend process;
line3858~3914: do some initialization work (data structures, opening and closing some necessary processes, and so on);
line3917: enter the BackendRun() function, jump to line4179;
line4179: BackendRun() function entry. This function runs the backend process and mainly does two things: 1. build and initialize the argument list; 2. call the PostgresMain() function;
line4243: call the PostgresMain() function and enter the postgres.c file.
#8 #7, in postgres.c:
This file defines the main module of the postgres backend. It is effectively the backend's main and is responsible for dispatching the backend's work.
line3572: PostgresMain() function entry. Establishes a session based on the given dbname, username and input arguments;
line3573~3801: initialization work. Set up the environment and default parameters, install signal handlers, create memory contexts, set up shared buffers, and so on;
line3825: enter the main processing loop of postgres; this if statement is mainly used to check whether an error occurred during processing;
line3933: enter the processing loop. The loop waits for new query requests and determines the type of each request;
line4045: the request is a simple query, so call the exec_simple_query() function, jump to line884;
line884: exec_simple_query() function entry. This function does some initialization, starts a transaction command, performs basic syntax checks, parses and rewrites the query, builds a query plan for it, and returns the query results;
line1104: enter the function PortalRun() and move to the pquery.c file.
#6 #5, in pquery.c:
This file defines the postgres backend's code for processing query commands.
line706: PortalRun() function entry. This function is responsible for running one query or a set of queries;
line786: enter the PortalRunSelect() function, jump to line888;
line888: PortalRunSelect() function entry. This function can only execute a simple SELECT query;
line942: enter the ExecutorRun() function and move to the execMain.c file.
#4 #3 #2, in execMain.c:
This file provides the executor's four interface functions: ExecutorStart(), ExecutorRun(), ExecutorFinish() and ExecutorEnd().
line279: ExecutorRun() function entry. This is the main routine of the executor module; it accepts a query descriptor and actually executes the query;
line285: enter the standard_ExecutorRun() function, jump to line289;
line289: standard_ExecutorRun() function entry. It executes "standard" queries;
line337: enter the ExecutePlan() function, jump to line1517;
line1517: ExecutePlan() function entry. Remember the query plan that exec_simple_query() built earlier? This is where that plan gets executed;
line1541: enter the main loop of query plan execution;
line1549: enter the ExecProcNode() function and move to the execProcnode.c file.
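To watch the four executor entry points named above being called without stopping at each one by hand, GDB breakpoints can carry a command list that prints one stack frame and continues. The generator below is only a sketch; the name trace_breakpoints and the /tmp path are made up for illustration.

```shell
# Hypothetical helper: for each function name given, emit a GDB breakpoint
# whose command list prints the innermost frame and resumes automatically.
trace_breakpoints() {
    for fn in "$@"; do
        printf 'break %s\ncommands\nsilent\nbacktrace 1\ncontinue\nend\n' "$fn"
    done
}

# Typical use, with 16581 standing in for your backend's PID:
#   trace_breakpoints ExecutorStart ExecutorRun ExecutorFinish ExecutorEnd > /tmp/executor.gdb
#   gdb -p 16581 -x /tmp/executor.gdb
# Then type "c" in GDB and run a query in the psql session.
```

Each time one of the functions is entered, GDB prints a single frame line for it and lets the backend carry on.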
#1, in execProcnode.c:
This file provides the dispatch functions used to execute a query plan, namely:
ExecInitNode(): initializes a plan node and its subplans;
ExecProcNode(): gets a tuple by executing a plan node;
ExecEndNode(): shuts down a plan node and its subplans.
line367: ExecProcNode() function entry;
line385: enter the ExecResult() function and move to the nodeResult.c file.
#0, in nodeResult.c:
This file provides support for the Result plan node.
line67: ExecResult() function entry, which returns the tuples produced by the query plan.
That concludes this simple analysis of the function calls from #13 down to #0. It is entirely my own understanding; if anything is wrong, corrections are welcome so that we can improve together. After this come the returns from the function calls, which I won't go into here; let's leave that for all of us to ponder.
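If you save the bt output to a file, the same #13 to #0 call path can be condensed into one line per frame, outermost caller first. This is a sketch under two assumptions: each frame is printed on a single line (for example after "set width 0" in GDB), and the helper name call_path is made up for illustration.

```shell
# Hypothetical helper: read GDB "bt" output on stdin and print one
# "function (file:line)" entry per frame, outermost caller first.
call_path() {
    awk '
        /^#[0-9]+/ {
            # Frames other than #0 usually have "0x... in" before the name.
            name = ($3 == "in") ? $4 : $2
            frames[n++] = name " (" $NF ")"
        }
        END { for (i = n - 1; i >= 0; i--) print frames[i] }
    '
}

# Typical use, assuming the backtrace was saved to bt.txt:
#   call_path < bt.txt
```

Reading the result top to bottom follows the same order as the walkthrough above, from main down to ExecResult.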
I have to say that PostgreSQL's source code is written very elegantly and the comments are thorough, so reading it rarely leaves you confused; it really is a model for us all. Speaking of reading source code, I'd like to recommend a book called "Code Reading: Methods and Practice". The book is not easy to find; I only found it in the school library with the help of a senior labmate.
Also, the way I read code today is still rather primitive and inefficient. I plan to look into using Emacs tags or Eclipse to debug some harder examples, since this one is fairly simple. I'll leave those for "PostgreSQL compilation, installation and debugging (3)". It feels like this series could run to quite a few posts, haha.
PostgreSQL compilation, installation and debugging (2)