Discovering the Computer Science Behind Postgres Indexes

Source: Internet
Author: User
Tags: postgres, database, codeship

This is the last in a series of Postgres posts that Pat Shaughnessy wrote based on his presentation at the Barcelona Ruby Conference. You can also watch the video recording of the presentation. The series was originally published on his personal blog, and we are republishing it on Codeship with his kind permission. You can also read posts one, two, and three in the series.

We all know indexes are one of the most powerful and important features of relational database servers. How do you search for a value quickly? Create an index. What do you have to remember to do when joining two tables together? Create an index. How do you speed up a SQL statement that's beginning to run slowly? Create an index.

But what are indexes, exactly? And how do they speed up our database searches?

To find out, I decided to read the C source code inside the PostgreSQL database server to follow along as it searched an index for a simple string value. I expected to find sophisticated algorithms and efficient data structures, and I did.

Today I'll show you what indexes look like inside Postgres and explain how they work. What I didn't expect to find, and what I discovered for the first time reading the Postgres C source code, was the computer science theory behind what it was doing. Reading the Postgres source is like going back to school and taking that class I never had time for when I was younger. The C comments inside Postgres explain not only what Postgres does, but why.

Sequence Scans: A Mindless Search

When we left the crew of the Nautilus, they were exhausted and beginning to faint: the Postgres sequence scan algorithm was mindlessly looping over all of the records in the users table!

Recall in my last post we had executed this simple SQL statement to find Captain Nemo:

Postgres first parsed, analyzed and planned the query. Then ExecSeqScan, the C function inside of Postgres that implements the sequence scan (SeqScan) plan node, quickly found Captain Nemo:

But then, inexplicably, Postgres continued to loop through the entire users table, comparing each name to "Captain Nemo," even though we had already found what we were looking for!

Imagine if our users table had millions of records; this could take a very long time. Of course, we could have avoided this by removing the sort and rewriting our query to accept the first name, but the deeper problem here is the inefficient way Postgres searches for our target string.

Using a sequence scan to compare every single value in the users table with "Captain Nemo" is slow, inefficient and depends on the random order in which the names appear in the table. What are we doing wrong? There must be a better way!
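
To make the cost concrete, here is a minimal Python sketch of the idea behind a sequence scan. It is not Postgres's C implementation; the table contents, the ids and the sequence_scan helper are all invented for illustration. The point is only that the loop touches every row, whether or not a match has already turned up.

```python
def sequence_scan(rows, target_name):
    """Compare every row's name with the target, never stopping early."""
    matches = []
    comparisons = 0
    for row in rows:
        comparisons += 1
        if row["name"] == target_name:
            matches.append(row)
    return matches, comparisons


# Hypothetical table contents, for illustration only.
users = [
    {"id": 1, "name": "Julius Powlowski"},
    {"id": 2, "name": "Captain Nemo"},
    {"id": 3, "name": "Dr Edna Kunde"},
    {"id": 4, "name": "Monte Nicolas"},
]

matches, comparisons = sequence_scan(users, "Captain Nemo")
print(matches, comparisons)
```

The number of comparisons always equals the number of rows, so the work grows in step with the size of the table.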

The answer is simple: we forgot to create an index. Let's do it now.

Creating an Index

Creating an index is straightforward; we just need to run this command:

As Ruby developers, of course, we would use the add_index ActiveRecord migration instead; this would run the same CREATE INDEX command behind the scenes. When we rerun our SELECT statement, Postgres will create a plan tree as usual, but this time the plan tree will be slightly different:

Notice at the bottom Postgres now uses IndexScan instead of SeqScan. Unlike SeqScan, IndexScan won't iterate over the entire users table. Instead, it will use the index we just created to find and return the Captain Nemo records quickly and efficiently.

Creating an index has solved our performance problem, but it has also left us with many interesting, unanswered questions:

    • What is a Postgres index, exactly?
    • If I could go inside of a Postgres database and take a close look at an index, what would it look like?
    • And how does an index speed up searches?

Let's try to answer these questions by reading the Postgres C source code.

What Is a Postgres Index, Exactly?

We can get started with a look at the documentation for the CREATE INDEX command.

Here you can see all of the options we can use to create an index, such as UNIQUE and CONCURRENTLY. Notice there's an option called USING method. This tells Postgres what kind of index we want. Farther down the same page is some information on method, the argument to the USING keyword:

It turns out Postgres implements four different types of indexes. You can use them for different types of data and in different situations. Because we didn't specify USING at all, our index_users_on_name index is a "btree" (or B-tree) index, the default type.

Clue: a Postgres index is a B-tree. But what's a B-tree? Where can we find one? Inside of Postgres, of course! Let's search the Postgres C source code for files containing "btree":

The key result is in bold: "./backend/access/nbtree." Inside this directory is a README file; let's read it:

Amazingly, this README file turns out to be an extensive 12-page document! The Postgres source code not only contains helpful and interesting C comments, it also contains documentation about the theory and implementation of the database server.

Reading and understanding the code in open source projects can often be intimidating and difficult, but not for Postgres. The developers behind Postgres have gone to great lengths to help the rest of us understand their work.

The title of the README document, "Btree Indexing," confirms this directory contains the C code that implements Postgres B-tree indexes. But the first sentence is even more interesting: it's a reference to an academic paper that explains what a B-tree is and how Postgres indexes work: Efficient Locking for Concurrent Operations on B-Trees, by Lehman and Yao.

We'll find a B-tree inside this academic paper.

What Does a B-tree Index Look Like?

Lehman and Yao's paper explains an innovation they made to the B-tree algorithm in 1981. I'll discuss this a bit later. But they start with a simple introduction to the B-tree data structure, which was actually invented 9 years earlier, in 1972. One of their diagrams shows an example of a simple B-tree:

The "B" in B-tree is commonly read as "balanced." B-trees make searching easy and fast. For example, if we wanted to search for the value 53 in this example, we would first start at the root node, which contains the value 40:

We compare our target value of 53 with the value we find in the tree node. Is 53 greater than or less than 40? Because 53 is greater than 40, we follow the pointer down to the right. If we were searching for a value smaller than 40, we would go down to the left. Pointers on the right lead to larger values; pointers on the left lead to smaller ones.

Following the pointer down the tree to the next child tree node, we encounter a node that contains 2 values:

This time we compare 53 with both values and find that it falls between them: the first value < 53 < 62. Note the values in the tree node are sorted. This time we follow the center pointer down. Now we get to another tree node, this one with 3 values in it:

Looking through the sorted list of numbers, we find the two adjacent values that 53 falls between, and follow the second of four pointers down. Finally, we come to a leaf node in the tree:

And we've found the value 53!

B-trees speed up searches because:

    • They sort the values (known as keys) inside of each node.
    • They are balanced: B-trees evenly distribute the keys among the nodes, minimizing the number of times we have to follow a pointer from one node to another. Each pointer leads to a child node that contains more or less the same number of keys as every other child node does.
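
The walk down the tree described above can be sketched in a few lines of Python. This is a simplified illustration, not the Lehman and Yao algorithm or the Postgres code: the Node class and the search function are made up for this sketch, the tree is one level shallower than the one in the paper, and every key other than 40, 53 and 62 is invented. The sketch assumes child number i holds the keys less than or equal to separator key number i.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                     # sorted keys in this node
    children: list = field(default_factory=list)   # empty for a leaf node


def search(node, target):
    """Descend from the root to a leaf; return (found, child indexes followed)."""
    path = []
    while node.children:
        # Count the separator keys smaller than the target: that is the
        # position of the pointer to follow.
        i = bisect_left(node.keys, target)
        path.append(i)
        node = node.children[i]
    return target in node.keys, path


# A small, hypothetical balanced tree: every leaf is two pointers from the root.
left = Node([20], [Node([5, 12, 20]), Node([29, 33, 40])])
right = Node([47, 62], [Node([41, 45, 47]), Node([51, 53, 56, 62]), Node([70, 85])])
root = Node([40], [left, right])

print(search(root, 53))
```

Searching for 53 goes right at the root (53 is greater than 40) and then follows the center pointer of the two-key node, mirroring the steps in the text. However many keys the tree holds, the search only follows one pointer per level.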

What Does a Postgres Index Look Like?

Lehman and Yao drew this diagram decades ago; what does it have to do with how Postgres works today? Astonishingly, the index_users_on_name index we created earlier looks very similar to figure 2: we created an index that looks just like a diagram from 1981!

When we executed the CREATE INDEX command, Postgres saved all of the names from our users table into a B-tree. These became the keys of the tree. Here's what a node inside a Postgres B-tree index looks like:

Each entry in the index consists of a C structure called IndexTupleData, followed by a bitmap and a value. Postgres uses the bitmap to record whether any of the index attributes in a key are NULL, to save space. The actual values in the index appear after the bitmap.

Let's take a closer look at the IndexTupleData structures:

Above you can see each IndexTupleData structure contains:

    • t_tid: This is a pointer to either another index tuple or to a database record. Note this isn't a C pointer to physical memory; instead, it contains numbers Postgres can use to find the referenced value among its memory pages.
    • t_info: This contains information about the index tuple, such as the size of the tuple and whether or not it contains null values.
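
As a rough illustration of those two fields, here is a Python model of an index tuple header. The class and property names are invented for this sketch. The bit masks reflect my reading of the itup.h header (low 13 bits for the tuple size, the top bit for "has nulls"); treat them as an assumption and check them against the header for your Postgres version.

```python
from dataclasses import dataclass

# Assumed bit layout of the 16-bit t_info field (verify against itup.h).
INDEX_SIZE_MASK = 0x1FFF   # low 13 bits: size of the index tuple in bytes
INDEX_VAR_MASK = 0x4000    # set when the tuple has variable-width attributes
INDEX_NULL_MASK = 0x8000   # set when the tuple contains NULL attributes


@dataclass(frozen=True)
class ItemPointer:
    """Not a memory address: a page (block) number plus a slot within that page."""
    block: int
    offset: int


@dataclass(frozen=True)
class IndexTuple:
    t_tid: ItemPointer
    t_info: int

    @property
    def size(self):
        return self.t_info & INDEX_SIZE_MASK

    @property
    def has_nulls(self):
        return bool(self.t_info & INDEX_NULL_MASK)


# Hypothetical values: a 24-byte tuple with a NULL, pointing at block 3, slot 7.
entry = IndexTuple(ItemPointer(block=3, offset=7), INDEX_NULL_MASK | 24)
print(entry.size, entry.has_nulls)
```

Packing the size and the flags into one small integer keeps the fixed header of every index entry short, which matters when an index holds millions of entries.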

To understand this better, let's show a few entries from our index_users_on_name index:

Now I've replaced "value" with some names from my users table. The upper tree node includes the keys "Dr Edna Kunde" and "Julius Powlowski," while the lower tree node contains "Julius Powlowski" and "Juston Quitzon."

Notice that, unlike Lehman and Yao's diagram, Postgres repeats the parent keys in each child node. Here "Julius Powlowski" is a key in the upper node and in the child node. The t_tid pointer from Julius in the upper node references the same Julius name in the lower node.

To learn more about exactly how Postgres stores key values in a B-tree node, refer to the itup.h C header file.

Finding the B-tree Node Containing Captain Nemo

Now let's return to our original SELECT statement again:

How exactly does Postgres search our index_users_on_name index for "Captain Nemo"? Why is using the index faster than the sequence scan we saw in my last post? To find out, let's zoom out a bit and take a look at some of the user names in our index:

This is the root node of the index_users_on_name B-tree. I've turned the tree on its side so the names would fit. You can see 4 names and a NULL value. Postgres created this root node when I created index_users_on_name.

Note that, aside from the first NULL value, which represents the beginning of the index, the other 4 names are more or less evenly distributed in alphabetical order.

Remember, a B-tree is a balanced tree. In this example, the B-tree has 5 child nodes:

    • The names that appear before Dr Edna Kunde alphabetically
    • Names that appear between Dr Edna Kunde and Julius Powlowski
    • Names that appear between Julius Powlowski and Monte Nicolas
    • etc...

Because we're searching for Captain Nemo, Postgres follows the first, top arrow to the right. This is because Captain Nemo comes before Dr Edna Kunde alphabetically:

You can see that Postgres has found the B-tree node that contains Captain Nemo. For my test I added many names to the users table; this child node in the B-tree contained only a fraction of them. The B-tree has narrowed down Postgres's search considerably.

To learn more about the precise algorithm Postgres uses to search for the target B-tree node among all of the nodes in the tree, read the _bt_search function.
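
Choosing which arrow to follow from the root node can be sketched as follows. This is not _bt_search itself; choose_child is a made-up helper, only the three root keys quoted in this post are included (the fourth is not named in the text), and plain Python string comparison stands in for whatever collation the database actually uses.

```python
from bisect import bisect_right

# The three root keys named in the text; the NULL entry that marks the
# beginning of the index is represented implicitly by child number 0.
root_keys = ["Dr Edna Kunde", "Julius Powlowski", "Monte Nicolas"]


def choose_child(keys, target):
    """Return the position of the child node whose range covers the target.

    Child 0 holds names before keys[0]; child i holds names from keys[i - 1]
    up to (but not including) keys[i]. A parent key is repeated as the first
    key of its child, so an exact match goes to the right-hand child.
    """
    return bisect_right(keys, target)


print(choose_child(root_keys, "Captain Nemo"))
```

Captain Nemo sorts before Dr Edna Kunde, so the sketch picks the first child, while a name such as Juston Quitzon lands in the child that starts at Julius Powlowski.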

Finding Captain Nemo Inside a Single B-tree Node

Now that Postgres has narrowed down the search to a single B-tree node containing only some of the names, it still has to find Captain Nemo. How does it do this? Does it perform a sequence scan on this shorter list?

No. To search for a key value inside of a tree node, Postgres switches to a binary search algorithm. It starts by comparing the key that appears at the 50% position in the tree node with "Captain Nemo":

Because Captain Nemo comes after Breana Witting alphabetically, Postgres jumps down to the 75% position and performs another comparison:

This time Captain Nemo comes before Curtis Wolf, so Postgres jumps back a bit. Skipping a few more steps (it actually took Postgres 8 comparisons to find Captain Nemo in my example), Postgres eventually finds what we are looking for:

To learn more about exactly how Postgres searches for a value in a single B-tree node, read the _bt_binsrch function.
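
The halving strategy described above looks roughly like this in Python. It is a generic binary search with a comparison counter, not a port of _bt_binsrch; the function name, the node size of 240 keys and the synthetic names are all assumptions made for the example.

```python
def binary_search(keys, target):
    """Find target in a sorted list; return (position or None, comparisons made)."""
    low, high = 0, len(keys)
    comparisons = 0
    while low < high:
        mid = (low + high) // 2      # look at the middle of the remaining range
        comparisons += 1
        if keys[mid] < target:
            low = mid + 1            # target sorts after the middle key
        else:
            high = mid               # target is the middle key or sorts before it
    found = low < len(keys) and keys[low] == target
    return (low if found else None), comparisons


# A hypothetical node: 239 synthetic names plus the one we are looking for.
node_keys = sorted([f"Name {i:03d}" for i in range(239)] + ["Captain Nemo"])

position, comparisons = binary_search(node_keys, "Captain Nemo")
print(position, comparisons)
```

Each comparison halves the remaining range, so a node with a couple of hundred keys needs only a handful of comparisons, where a sequence scan over the same node could need hundreds.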

So Much More to Learn

I don't have space in this blog post to cover many other fascinating details about B-trees, database indexes or Postgres internals; maybe I should write Postgres Under a Microscope. But for now, here are just a few interesting bits of theory you can read about in Efficient Locking for Concurrent Operations on B-Trees or in the other academic papers it references.

  • Inserting into B-trees: The most beautiful part of the B-tree algorithm has to do with inserting new keys into a tree. Keys are inserted in sorted order into the proper tree node, but what happens when there's no more room for a new key? In this situation, Postgres splits the node into two, inserts the new key into one of them, and also adds the key from the split point into the parent node, along with a pointer to the new child node. Of course, the parent node might also have to be split to fit its new key, resulting in a complex, recursive operation.
  • Deleting from B-trees: The converse operation is also interesting. When deleting a key from a node, Postgres will combine sibling nodes together when possible, removing a key from their parent. This can also be a recursive operation.
  • B-link-trees: Lehman and Yao's paper actually discusses an innovation they researched related to concurrency and locking when multiple threads are using the same B-tree. Remember, Postgres's code and algorithms need to handle concurrent access because many clients could be searching or modifying the same index at the same time. By adding another pointer from each B-tree node to the next sibling node, the so-called "right arrow," one thread can search a tree even while a second thread is splitting a node, without locking the entire index.
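
Two of the ideas in the list above, splitting a full node and following a right link, fit in a short single-threaded Python sketch. It is a toy model under stated assumptions, not the Lehman and Yao algorithm or the Postgres implementation: the Leaf class, the insert and find functions, the capacity of 4 and the choice of split point are all invented here, and no real locking or concurrency is involved.

```python
from bisect import insort
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list
    high_key: Optional[int] = None    # largest key allowed here; None means no limit
    right: Optional["Leaf"] = None    # link to the next sibling node


def insert(leaf, key, capacity):
    """Insert key in sorted order; on overflow, split and return the key to
    add to the parent node (None when no split was needed)."""
    insort(leaf.keys, key)
    if len(leaf.keys) <= capacity:
        return None
    mid = (len(leaf.keys) + 1) // 2
    # The new sibling takes the upper half and inherits the old right link.
    sibling = Leaf(leaf.keys[mid:], leaf.high_key, leaf.right)
    leaf.keys = leaf.keys[:mid]
    leaf.high_key = leaf.keys[-1]
    leaf.right = sibling
    return leaf.high_key


def find(leaf, key):
    """Search from a node that may have been split since we found it: keep
    moving right while the key is beyond this node's high key."""
    while leaf.high_key is not None and key > leaf.high_key:
        leaf = leaf.right
    return key in leaf.keys


node = Leaf([10, 20, 30, 40])
separator = insert(node, 25, capacity=4)
print(separator, node.keys, node.right.keys, find(node, 40))
```

After the split, a search that still holds a reference to the original node finds 40 by following the right link, even though the parent has not yet been told about the new sibling. That is the property that lets readers continue during a split.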

Don't Be Afraid to Explore Beneath the Surface

Professor Aronnax risked his life and career to find the elusive Nautilus and to join Captain Nemo on a long series of amazing underwater adventures.

We should do the same: don't be afraid to dive underwater, inside and underneath the tools, languages and technologies that you use every day. You may know all about how to use Postgres, but do you really know how Postgres itself works internally?

Take a look inside; before you know it, you'll be on an underwater adventure of your own.

Studying the computer science at work behind the scenes of our applications isn't just a matter of having fun; it's part of being a good developer.

As software development tools improve year after year, and as building web sites and mobile apps becomes easier and easier, we shouldn't lose sight of the computer science we depend on. We're all standing on the shoulders of giants: people like Lehman and Yao, and the open source developers who used their theories to build Postgres.

Don't take the tools you use every day for granted; take a look inside them! You'll become a wiser developer and you'll find insights and knowledge you could never have imagined before.

Note: This is a very good article on the implementation of the PG btree; the comments on the original article are also worth a look.

In addition, the README in the PG source code explains the btree implementation in great detail.


