Following a Select Statement Through Postgres Internals



This is the third in a series of posts based on a presentation I gave at the Barcelona Ruby Conference called "20,000 Leagues Under ActiveRecord." (Other posts: one, and video.)

Preparing for this presentation over the summer, I decided to read through parts of the PostgreSQL C source code. I executed a very simple SELECT statement and watched what Postgres did with it using LLDB, a C debugger. How did Postgres understand my query? How did it actually find the data I was looking for?

This post is an informal journal of my trip through the guts of Postgres. I'll describe the path I took and what I saw along the way. I'll use a series of simple, conceptual diagrams to explain how Postgres executed my query. In case you understand C, I'll also leave you a few landmarks and signposts you can look for if you ever decide to hack on Postgres internals.

In the end, the Postgres source code delighted me. It was clean, well documented and easy to follow. Find out for yourself how Postgres works internally by following me on a journey deep inside a tool you use every day.

Finding Captain Nemo

Here's the example query from the first half of my presentation; we'll follow Postgres as it searches for Captain Nemo:

Professor Aronnax and Captain Nemo plot the course of the Nautilus.

Finding a single name in a string column like this should be straightforward, shouldn't it? We'll hold tightly onto this SELECT statement while we explore Postgres internals, like a rope deep sea divers use to find their way back to the surface.

The Big Picture

What does Postgres do with this SQL string? How does it understand what we meant? How does it know what data we are looking for?

Postgres processes each SQL command we send it using a four step process.

In the first step, Postgres parses our SQL statement and converts it into a series of C memory structures, a parse tree. Next, Postgres analyzes and rewrites our query, optimizing and simplifying it using a series of complex algorithms. After that, Postgres generates a plan for finding our data. Like an obsessive compulsive person who won't leave home without every suitcase packed perfectly, Postgres doesn't run our query until it has a plan. Finally, Postgres actually executes our query. In this presentation I'll briefly touch on the first three topics, and then focus more on the last step: Execute.

The C function inside of Postgres that implements this four step process is called exec_simple_query. You can find a link to it below, along with an LLDB backtrace which gives some context on exactly when and how Postgres calls exec_simple_query.

exec_simple_query


Parse

How does Postgres understand the SQL string we sent it? How does it make sense of the SQL keywords and expressions in our SELECT statement? Through a process called parsing, Postgres converts our SQL string into an internal data structure it understands, the parse tree.

It turns out that Postgres uses the same parsing technology that Ruby does, a parser generator called Bison. Bison runs during the Postgres C build process and generates parser code based on a series of grammar rules. The generated parser code is what runs inside of Postgres when we send it SQL commands. Each grammar rule is triggered when the generated parser finds a corresponding pattern or syntax in the SQL string, and inserts a new C memory structure into the parse tree data structure.

I won't take the time today to explain how parsing algorithms work in detail. If you're interested in that sort of thing, I'd suggest taking a look at my book Ruby Under a Microscope. In Chapter One I go through a detailed example of the LALR parse algorithm used by Bison and Ruby. Postgres parses SQL statements in exactly the same way.

Using LLDB and enabling some C logging code, I observed the Postgres parser produce this parse tree for our Captain Nemo query:

At the top is a node representing the entire SQL statement, and below that are child nodes or branches that represent the different portions of the SQL statement syntax: the target list (a list of columns), the FROM clause (a list of tables), the WHERE clause, the sort order and a limit count.
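As a rough illustration of the shape of that tree, here is a minimal Python sketch. The field names targetList, fromClause, whereClause, sortClause and limitCount follow the SelectStmt structure in the Postgres source, but the Python classes are hypothetical stand-ins for the C structs. The values are also an assumption pieced together from the clauses listed above: a users table, a name column compared with 'Captain Nemo', a sort on an assumed id column and a limit of 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnRef:
    """A reference to a column; "*" stands for every column."""
    name: str


@dataclass
class Comparison:
    """A boolean expression such as name = 'Captain Nemo'."""
    operator: str
    left: ColumnRef
    right: str


@dataclass
class SelectStmt:
    """Top-level node: one field per portion of the SELECT syntax."""
    targetList: List[ColumnRef] = field(default_factory=list)
    fromClause: List[str] = field(default_factory=list)
    whereClause: Optional[Comparison] = None
    sortClause: List[ColumnRef] = field(default_factory=list)
    limitCount: Optional[int] = None


# Assumed query: every column from users where name is 'Captain Nemo',
# sorted by an assumed id column, limited to one row.
parse_tree = SelectStmt(
    targetList=[ColumnRef("*")],
    fromClause=["users"],
    whereClause=Comparison("=", ColumnRef("name"), "Captain Nemo"),
    sortClause=[ColumnRef("id")],
    limitCount=1,
)
```

The real parser builds many more node types than this, but the idea is the same: one root node for the statement and one branch per clause.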

If you want to learn more about how Postgres parses SQL statements, follow the flow of control from exec_simple_query through another C function called pg_parse_query.

pg_parse_query

As you can see, there are many helpful and detailed comments in the Postgres source code that not only explain what is happening but also point out important design decisions.

All this hard work for nothing

The parse tree above should look familiar: it's almost precisely the same as the abstract syntax tree (AST) we saw ActiveRecord create earlier. Recall from the first half of the presentation that ActiveRecord generated our Captain Nemo SELECT statement when we executed this Ruby query:

We saw that ActiveRecord internally created an AST when we called methods such as where and first. Later (see the second post), we watched as the Arel gem converted the AST into our example SELECT statement using an algorithm based on the visitor pattern.
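To make the visitor idea concrete, here is a small Python sketch. It is not Arel's actual Ruby code: the node classes and the SqlVisitor name are hypothetical, and the query it produces is a simplified version of the Captain Nemo statement. It only shows the general technique of walking a tree and dispatching on each node's type to build up a SQL string.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str


@dataclass
class Literal:
    value: str


@dataclass
class Equality:
    left: Column
    right: Literal


@dataclass
class Select:
    table: str
    where: Equality
    limit: int


class SqlVisitor:
    """Walks the AST, dispatching on each node's class name."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__)
        return method(node)

    def visit_Select(self, node):
        return "SELECT * FROM {} WHERE {} LIMIT {}".format(
            node.table, self.visit(node.where), node.limit
        )

    def visit_Equality(self, node):
        return "{} = {}".format(self.visit(node.left), self.visit(node.right))

    def visit_Column(self, node):
        return node.name

    def visit_Literal(self, node):
        # Double any single quotes so the literal stays well formed.
        return "'" + node.value.replace("'", "''") + "'"


ast = Select("users", Equality(Column("name"), Literal("Captain Nemo")), 1)
sql = SqlVisitor().visit(ast)
print(sql)
```

Each visit_ method only knows how to render its own node type and delegates to visit for its children, which is what lets the same tree be rendered in a different way by swapping in a different visitor.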

Thinking about this, it's ironic that the first thing Postgres does with your SQL statement is convert it from a string back into an AST. Postgres's parse process reverses everything ActiveRecord did earlier; all of the hard work the Arel gem did was for nothing! The only reason for creating the SQL string at all was to communicate with Postgres over a network connection. Once Postgres has the string, it converts it back into an AST, which is a much more convenient and useful way of representing queries.

Learning this you might ask: Is there a better way? Is there some way of conceptually specifying the data we want to Postgres without writing a SQL statement? Without learning the complex SQL language or paying the performance overhead of using ActiveRecord and Arel? It seems like a waste of time to go to such lengths to generate a SQL string from an AST, just to convert it back into an AST again. Maybe we should be using a NoSQL database solution instead?

Of course, the AST Postgres uses is much different from the AST used by ActiveRecord. ActiveRecord's AST was composed of Ruby objects, while Postgres's AST is formed of a series of C memory structures. Same idea but very different implementations.

Analyze and Rewrite

Once Postgres has generated a parse tree, it then converts it into another tree structure using a different set of nodes. This is known as the query tree. Returning to the exec_simple_query C function, you can see it next calls another C function, pg_analyze_and_rewrite.

pg_analyze_and_rewrite

Waving my hands a bit and glossing over many important details, the analyze and rewrite process applies a series of sophisticated algorithms and heuristics to try to optimize and simplify your SQL statement. If you had executed a complex SELECT statement with sub-selects and multiple inner and outer joins, then there is a lot of room for optimization. It's quite possible that Postgres could reduce the number of sub-select clauses or joins to produce a simpler query that runs faster.

For our simple SELECT statement, here's the query tree that pg_analyze_and_rewrite produces:

I don't pretend to understand the detailed algorithms behind pg_analyze_and_rewrite. I simply observed that for our example the query tree largely resembled the parse tree. This means the SELECT statement was so straightforward Postgres wasn't able to simplify it further.


Plan

The last step Postgres takes before starting to execute our query is to create a plan. This involves generating a third tree of nodes that form a list of instructions for Postgres to follow. Here's the plan tree for our SELECT statement.

Imagine that each node in the plan tree is a machine or worker of some kind. The plan tree resembles a pipeline of data or a conveyor belt in a factory. In my simple example there is only one branch in the tree. Each node in the plan tree takes some output data from the node below, processes it, and returns results as input to the node above. We'll follow Postgres as it executes the plan in the next section.

The C function that starts the query planning process is called pg_plan_queries.

pg_plan_queries

Note the startup_cost and total_cost values in each plan node. Postgres uses these values to estimate how long the plan will take to complete. You don't have to use a C debugger to see the execution plan for your query. Just prepend the SQL EXPLAIN command to your query, like this:

This is a powerful way to understand what Postgres is doing internally with one of your queries, and why it might be slow or inefficient, despite the sophisticated planning algorithms in pg_plan_queries.
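Running EXPLAIN against Postgres needs a live server, so the sketch below swaps in SQLite, which ships with Python's standard library, purely to show the "prepend a keyword, get a plan instead of rows" workflow. SQLite's EXPLAIN QUERY PLAN is a different command with a different output format from Postgres's EXPLAIN, and the table definition and query text here are assumptions modeled on the Captain Nemo example.

```python
import sqlite3

# An in-memory stand-in for the users table (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("Laurianne Goodwin",), ("Candace",), ("Captain Nemo",)],
)

query = "SELECT * FROM users WHERE name = 'Captain Nemo' ORDER BY id LIMIT 1"

# Prepending the explain keyword returns the plan instead of the rows.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_details = [row[-1] for row in plan_rows]  # last column describes the step

for detail in plan_details:
    print(detail)
```

With Postgres itself you would send the query prefixed with EXPLAIN from psql or your database driver and read the plan nodes and their cost estimates from the result.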

Executing a Limit Plan Node

By now, Postgres has parsed your SQL statement and converted it back into an AST. Then it optimized and rewrote your query, possibly in a simpler way. Third, Postgres wrote a plan which it will follow to find and return the data you are looking for. Finally it's time for Postgres to actually execute your query. How does it do this? It follows the plan, of course!

Let's start at the top of the plan tree and move down. Skipping the root node, the first worker that Postgres uses for our Captain Nemo query is called Limit. The Limit node, as you might guess, implements the LIMIT SQL command, which limits the result set to the specified number of records. The same plan node also implements the OFFSET command, which starts the result set window at the specified row.

The first time Postgres calls the Limit node, it calculates what the limit and offset values should be, because they might be set to the result of some dynamic calculation. In our example, offset is 0 and limit is 1.

Next, the Limit plan node repeatedly calls the subplan, in our case Sort, counting until it reaches the offset value:

In our example the offset value is zero, so this loop will load the first data value and stop iterating. Then Postgres returns the last data value loaded from the subplan to the calling or upper plan. For us, this will be that first value from the subplan.

Finally, when Postgres continues to call the Limit node, it will pass the data values through from the subplan one at a time:

In our example, because the limit value is 1, Limit will immediately return NULL, indicating to the upper plan that there is no more data available.
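The behavior described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a translation of the C code: the LimitNode class, its next_row method and the convention of returning None for "no more data" are all hypothetical names chosen for illustration.

```python
from typing import Callable, Optional

Row = dict


class LimitNode:
    """Skips `offset` rows from the subplan, then passes through `limit` rows."""

    def __init__(self, subplan: Callable[[], Optional[Row]], offset: int, limit: int):
        self.subplan = subplan  # called to pull the next row from the node below
        self.offset = offset
        self.limit = limit
        self.position = 0       # rows pulled from the subplan so far
        self.done = False

    def next_row(self) -> Optional[Row]:
        """Return the next row in the window, or None when it is exhausted."""
        if self.done or self.limit <= 0:
            return None
        if self.position >= self.offset + self.limit:
            # The window is full: report "no more data" without
            # asking the subplan for anything else.
            self.done = True
            return None
        # Pull rows from the subplan until we step past the offset.
        while True:
            row = self.subplan()
            if row is None:
                self.done = True
                return None
            self.position += 1
            if self.position > self.offset:
                return row


rows = iter([{"name": "Captain Nemo"}, {"name": "Professor Aronnax"}])
limit_node = LimitNode(lambda: next(rows, None), offset=0, limit=1)
first = limit_node.next_row()
second = limit_node.next_row()
```

With offset 0 and limit 1, the first call returns the first row from the subplan and the second call returns None without pulling anything further from below.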

Postgres implements the Limit node using code in a file called nodeLimit.c.

ExecLimit

You can see the Postgres source code uses words such as tuple (a set of values, one from each column) and subplan. The subplan in this example is the Sort node, which appears below Limit in the plan.

Executing a Sort Plan Node

Where do the data values Limit filters come from? From the Sort plan node, which appears under Limit in the plan tree. Sort loads data values from its subplan and returns them to its calling plan, Limit. Here's what Sort does when the Limit node calls it for the first time, to get the first data value:

You can see that Sort functions very differently from Limit. It immediately loads all of the available data from the subplan into a buffer, before returning anything. Then it sorts the buffer using the Quicksort algorithm, and finally returns the first sorted value.

For the second and subsequent calls, Sort simply returns additional values from the sorted buffer, and never needs to call the subplan again:
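Here is a minimal Python sketch of that "load everything, sort once, then serve from the buffer" behavior. The SortNode class and its next_row method are hypothetical names, and Python's built-in list sort (Timsort) stands in for the Quicksort step mentioned above.

```python
class SortNode:
    """Drains its subplan on the first call, then serves rows from a sorted buffer."""

    def __init__(self, subplan, key):
        self.subplan = subplan  # called to pull the next row from the node below
        self.key = key          # function that extracts the sort key from a row
        self.buffer = None      # filled on the first call only
        self.index = 0

    def next_row(self):
        if self.buffer is None:
            # First call: load every row from the subplan before returning anything.
            self.buffer = []
            while True:
                row = self.subplan()
                if row is None:
                    break
                self.buffer.append(row)
            # Timsort here, standing in for the Quicksort step in the text.
            self.buffer.sort(key=self.key)
        if self.index >= len(self.buffer):
            return None
        row = self.buffer[self.index]
        self.index += 1
        return row


calls = []  # one entry per call made to the subplan
source = iter([{"id": 3}, {"id": 1}, {"id": 2}])


def pull():
    calls.append(1)
    return next(source, None)


sort_node = SortNode(pull, key=lambda row: row["id"])
first = sort_node.next_row()
calls_after_first = len(calls)
second = sort_node.next_row()
calls_after_second = len(calls)
```

In this sketch the subplan is called four times during the first next_row call (three rows plus the final "no more data" answer) and not at all during the second.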

The Sort plan node is implemented by a C function called ExecSort:

ExecSort

Executing a SeqScan Plan Node

Where does ExecSort get its values? From its subplan, the SeqScan node, which appears at the bottom of the plan tree. SeqScan stands for sequence scan, which means to look through the values in a table, returning values that match a given filter. To understand how the scan works with our filter, let's step through an imaginary users table filled with fake names, looking for Captain Nemo.

Postgres starts at the first record in a table (known as a relation in the Postgres source code) and executes the boolean expression from the plan tree. In simple terms, Postgres asks the question: "Is this Captain Nemo?" Because Laurianne Goodwin is not Captain Nemo, Postgres steps down to the next record.

No, Candace is also not Captain Nemo. Postgres continues:

... and eventually finds Captain Nemo!
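The scan just described can be modeled with a short Python sketch. The SeqScanNode class, its next_row method and the list-of-dicts table are hypothetical simplifications; the point is only the loop: look at the next record, evaluate the filter, return the record if it matches, otherwise step down.

```python
class SeqScanNode:
    """Walks a table from the top, returning rows that satisfy a filter."""

    def __init__(self, table, predicate):
        self.table = table          # the relation, modeled as a list of rows
        self.predicate = predicate  # the boolean expression from the plan
        self.position = 0           # index of the next record to examine

    def next_row(self):
        while self.position < len(self.table):
            row = self.table[self.position]
            self.position += 1
            if self.predicate(row):  # "Is this Captain Nemo?"
                return row
        return None                  # reached the end of the table


users = [
    {"name": "Laurianne Goodwin"},
    {"name": "Candace"},
    {"name": "Captain Nemo"},
    {"name": "Professor Aronnax"},
]
scan = SeqScanNode(users, lambda row: row["name"] == "Captain Nemo")
match = scan.next_row()
examined_so_far = scan.position
```

Note that the node remembers its position between calls, so a later call to next_row resumes from the record after the last match rather than starting over.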

Postgres implements the SeqScan node using a C function called ExecSeqScan.

ExecSeqScan

What Are We Doing Wrong?

Now we're done! We've followed a simple SELECT statement all the way through the guts of Postgres, and have seen how it was parsed, rewritten, planned and finally executed. After executing many thousands of lines of C code, Postgres has found the data we are looking for! Now all Postgres has to do is return the Captain Nemo string back to our Rails application and ActiveRecord can create a Ruby object. We can finally return to the surface of our application.

But Postgres doesn't stop! Instead of simply returning, Postgres continues to scan through the users table, even though we've already found Captain Nemo:

While returning from the South Pole, the air supply inside the Nautilus began to run out.

What's going on here? Why is Postgres wasting its time, continuing to search even though it's already found the data we're looking for?

The answer lies farther up the plan tree in the Sort node. Recall that in order to sort all of the users, ExecSort first loads all of the values into a buffer, by calling the subplan repeatedly until there are no values left. That means that ExecSeqScan will continue to the end of the table, until it has all of the matching users. If our users table contained thousands or even millions of records (imagine we work at Facebook or Twitter), ExecSeqScan will have to loop over every single user record and execute the string comparison for each one. This is obviously inefficient and slow, and will get slower as more and more user records are added.

If we have only one Captain Nemo record, then ExecSort will "sort" just that single matching record, and ExecLimit will pass that single record through its offset/limit filter ... but only after ExecSeqScan has iterated over all of the names.
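To see the effect of Sort on the scan, here is a toy version of the whole plan written with Python generators. Everything in it is an assumption made for illustration: the function names, the 1,000-row table and Captain Nemo's position at id 42. It counts how many rows the scan examines with and without a sort step between the scan and the limit.

```python
from itertools import islice


def seq_scan(table, predicate, examined):
    """Yield matching rows, recording every row the scan looks at."""
    for row in table:
        examined.append(row["id"])
        if predicate(row):
            yield row


def sort(rows, key):
    """sorted() must consume its whole input before yielding one row."""
    yield from sorted(rows, key=key)


def limit(rows, count, offset=0):
    """Pass through `count` rows after skipping `offset`, then stop."""
    yield from islice(rows, offset, offset + count)


def is_nemo(row):
    return row["name"] == "Captain Nemo"


# Hypothetical table: 1,000 users, with Captain Nemo at id 42.
users = [{"id": i, "name": "User %d" % i} for i in range(1, 1001)]
users[41]["name"] = "Captain Nemo"

# Plan shaped like the one in this post: Limit over Sort over SeqScan.
examined_sorted = []
plan = limit(
    sort(seq_scan(users, is_nemo, examined_sorted), key=lambda r: r["id"]),
    count=1,
)
result = list(plan)

# The same plan with the sort step removed, for comparison.
examined_unsorted = []
plan_without_sort = limit(seq_scan(users, is_nemo, examined_unsorted), count=1)
result_without_sort = list(plan_without_sort)

print(len(examined_sorted), len(examined_unsorted))
```

Both plans return the same single row, but in this sketch the one with the sort step examines all 1,000 rows, while the one without it stops after 42. The sort has to see every matching row before it can say which one comes first, so the scan beneath it cannot stop early.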

Next time

How do we fix this problem? What should we do if our SQL queries on the users table take more and more time to execute? The answer is simple: we create an index.

In the next and final post in this series we'll learn how to create a Postgres index and avoid the use of ExecSeqScan. More importantly, I'll show you what a Postgres index looks like: how it works and why it speeds up queries like this one.



