Five Tips from Instagram for Improving PostgreSQL Performance
As Instagram has grown, Postgres has continued to serve as our solid foundation and stores the vast majority of our user data. Less than a year ago we wrote on our blog that Instagram "stores a large amount of data" and was inserting 90 pieces of data per second; that number now peaks at 10,000 per second. Our underlying storage technology remains unchanged.
Over the past two and a half years we have picked up some experience and built some tools for scaling Postgres, and we want to share them; they are things we wish we had had when Instagram started. Some of them are specific to Postgres, and some can be applied to other databases as well. If you want to know how we shard horizontally, read this article.
1. Partial Index
If you often need to filter a table on a certain condition, and that condition matches only a small fraction of the rows, a partial index can be very effective.
For example, when searching hashtags on Instagram, we want to surface tags that have many photos attached. We would normally use a technology such as Elasticsearch for advanced search, but here the database's own query capability is enough. Let's first look at how Postgres handles a query that searches tags by name and sorts by the number of photos:
EXPLAIN ANALYZE SELECT id from tags WHERE name LIKE 'snow%' ORDER BY media_count DESC LIMIT 10;

                                                        QUERY PLAN
---------
 Limit  (cost=1780.73..1780.75 rows=10 width=32) (actual time=215.211..215.228 rows=10 loops=1)
   ->  Sort  (cost=1780.73..1819.36 rows=15455 width=32) (actual time=215.209..215.215 rows=10 loops=1)
         Sort Key: media_count
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Scan using tags_search on tags_tag  (cost=0.00..1446.75 rows=15455 width=32) (actual time=0.020..162.708 rows=64572 loops=1)
               Index Cond: (((name)::text ~>=~ 'snow'::text) AND ((name)::text ~<~ 'snox'::text))
               Filter: ((name)::text ~~ 'snow%'::text)
 Total runtime: 215.275 ms
(8 rows)
See that? Postgres had to sort through 15,000 rows to produce the result. Because tag popularity follows a long-tail distribution (a small number of tags account for most of the photos, while most tags have very few), and we generally only query for tags with 100 or more photos, we first create a partial index:
CREATE INDEX CONCURRENTLY ON tags (name text_pattern_ops) WHERE media_count >= 100;
Here is the new query plan:
EXPLAIN ANALYZE SELECT * from tags WHERE name LIKE 'snow%' AND media_count >= 100 ORDER BY media_count DESC LIMIT 10;

                                                        QUERY PLAN
---------
 Limit  (cost=224.73..224.75 rows=10 width=32) (actual time=3.088..3.105 rows=10 loops=1)
   ->  Sort  (cost=224.73..225.15 rows=169 width=32) (actual time=3.086..3.090 rows=10 loops=1)
         Sort Key: media_count
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Scan using tags_tag_name_idx on tags_tag  (cost=0.00..221.07 rows=169 width=32) (actual time=0.021..2.360 rows=924 loops=1)
               Index Cond: (((name)::text ~>=~ 'snow'::text) AND ((name)::text ~<~ 'snox'::text))
               Filter: ((name)::text ~~ 'snow%'::text)
 Total runtime: 3.137 ms
(8 rows)
As you can see, Postgres only has to visit 169 rows, which is why it is so much faster. The Postgres query planner is also good at reasoning about constraints: if we later decide to query only for tags with 500 or more photos, the partial index will still be used, since that result set is a subset of the indexed one.
2. Functional Indexes
On some of our tables, we need to index long strings, for example 64-character base64 tokens. Creating an index directly on the full string would duplicate a large amount of data, so in this case a functional index on a prefix of the string works well:
CREATE INDEX CONCURRENTLY ON tokens (substr(token, 0, 8));
Although many rows will share the same prefix in the index, filtering out the right row from those matches is fast, and the index itself is far smaller than an index over the full string would be.
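For illustration, a lookup on such a token might use the prefix index first and then re-check the full value. This is only a sketch: the token literal below is made up, and the expression in the WHERE clause has to match the indexed expression exactly for the index to be used.

-- hits the functional index on substr(token, 0, 8), then filters the few matching rows
SELECT *
FROM tokens
WHERE substr(token, 0, 8) = substr('full-token-value-here', 0, 8)
  AND token = 'full-token-value-here';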
3. Use pg_reorg to compact data
Over time, Postgres tables become increasingly fragmented (because of the MVCC concurrency model, among other reasons). Also, the order in which rows were inserted is usually not the order in which we want to return them. For example, if we often query photos by user, it is better to keep a user's rows packed together on disk, which reduces disk seek time.
We use pg_reorg to solve this problem. It compacts a table using the following process:
- Obtain an exclusive lock on the table
- Create a temporary table to accumulate changes, and add a trigger to the original table that replicates any changes into this temporary table
- Create the new table with CREATE TABLE ... SELECT FROM ... ORDER BY, so that it contains all the data of the original table in index order on disk (a simplified sketch of this step follows the list)
- Sync over the changes from the temporary table that happened after the CREATE TABLE started
- Switch over to the new table
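The rebuild in the third step is conceptually similar to the SQL below. This is only a simplified sketch with a made-up photos table clustered by user; pg_reorg generates its own temporary table names and also rebuilds indexes, which is not shown here.

-- simplified sketch of the third step for a hypothetical photos table
CREATE TABLE photos_reorged AS
    SELECT * FROM photos ORDER BY user_id, id;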
There are many details in each step, but that is the general picture. We vetted the tool, ran a number of tests, and then applied it to production. We have now run pg_reorg dozens of times across an environment of hundreds of machines without any problems.
4. Archiving and backing up WAL (write-ahead log) with WAL-E
We use WAL-E, a tool written by Heroku, to archive our WAL files, and we have contributed some code back to it. WAL-E greatly simplifies the process of backing up data and creating replica databases.
WAL-E uses Postgres's archive_command to archive every WAL file Postgres generates to Amazon S3. With these WAL files and a base backup of the database, we can restore the database to any point in time after that base backup. Using this method, we can quickly spin up a read-only replica or a failover standby database.
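As an illustration of the setup, and assuming WAL-E is installed and the AWS credentials and S3 prefix live in an envdir (the path below is just an example), the relevant postgresql.conf settings look roughly like this:

# postgresql.conf (illustrative paths and values)
wal_level = archive
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'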
We have written a simple wrapper script for WAL-E that monitors for repeated failures during archiving; it is available on GitHub.
5. Autocommit mode and asynchronous mode in psycopg2
We have also started to use some of the more advanced features of psycopg2, the Python driver for Postgres.
One is autocommit mode. In this mode, psycopg2 does not issue BEGIN/COMMIT, and each query runs in its own single-statement transaction. This is particularly useful for read-only queries that do not need a transaction. Enabling it is simple:
connection.autocommit = True
After enabling autocommit, the back-and-forth between our application servers and the database is greatly reduced, and the CPU usage of the database servers drops noticeably as well. We also use PgBouncer as our connection pool, and with autocommit enabled, connections are returned to the pool faster.
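Put together, a read-only query in autocommit mode looks roughly like the sketch below; the connection string and the query are placeholders, not taken from the original post.

import psycopg2

# Placeholder connection string; in autocommit mode psycopg2 issues no
# BEGIN/COMMIT, so each statement runs in its own single-statement transaction.
conn = psycopg2.connect("dbname=example host=db-replica user=app")
conn.autocommit = True

cur = conn.cursor()
cur.execute("SELECT id FROM tags WHERE name LIKE %s LIMIT 10", ("snow%",))
rows = cur.fetchall()
cur.close()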
For details about interaction with Django, see here.
psycopg2 also has a very useful feature: it can provide coroutine support by registering a wait callback. This allows querying across several connections at once, which is very useful for queries that hit multiple nodes; the socket is woken up when data is ready (we use the Python select module for this). It also plays well with cooperative multithreading libraries such as eventlet and gevent; see psycogreen for an implementation.
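Here is a minimal sketch of registering a wait callback, using the select()-based callback that ships with psycopg2; psycogreen registers eventlet- or gevent-aware callbacks in the same way.

import psycopg2.extensions
import psycopg2.extras

# After this call, blocking operations on psycopg2 connections go through
# wait_select, which polls the connection's socket with select() until it
# is ready instead of blocking inside libpq.
psycopg2.extensions.set_wait_callback(psycopg2.extras.wait_select)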
Overall, we are very happy with the performance and reliability of Postgres. Do you want to work on one of the largest Postgres installations in the world, alongside a team of infrastructure experts? Contact infrajobs@instagram.com.