A detailed description of Django's select_related and prefetch_related functions for QuerySet query optimization
When database tables are related by foreign keys, select_related() and prefetch_related() can reduce the number of database requests and improve performance. This article uses a simple example to show how the two functions work. Although the QuerySet documentation already describes them in detail, this article analyzes how they work through the SQL statements that a QuerySet triggers, in order to better understand how Django actually operates.
I originally intended to write a single article, but after finishing the select_related() part it was already fairly long, so I am writing it as a series of probably two or three articles. Links to the other articles will be added here once the series is complete.
1. Background for the examples
Suppose a personal information system needs to record each person's hometown, place of residence, and the cities they have visited. The database is designed as follows:
The content of models.py is as follows:
from django.db import models

class Province(models.Model):
    name = models.CharField(max_length=10)

    def __unicode__(self):
        return self.name

class City(models.Model):
    name = models.CharField(max_length=5)
    province = models.ForeignKey(Province)

    def __unicode__(self):
        return self.name

class Person(models.Model):
    firstname = models.CharField(max_length=10)
    lastname = models.CharField(max_length=10)
    visitation = models.ManyToManyField(City, related_name="visitor")
    hometown = models.ForeignKey(City, related_name="birth")
    living = models.ForeignKey(City, related_name="citizen")

    def __unicode__(self):
        return self.firstname + self.lastname
Note 1: The app created is named "Qsoptimize"
Note 2: To keep things simple, the qsoptimize_province table contains only 2 rows (Hubei Province and Guangdong Province), and the qsoptimize_city table contains only 3 rows (Wuhan, Shiyan, and Guangzhou).
2. select_related()
For one-to-one fields (OneToOneField) and foreign key fields (ForeignKey), you can use select_related to optimize a QuerySet.
Purpose and usage
When select_related() is applied to a QuerySet, Django fetches the objects referenced by the foreign keys in the same query, so no further database queries are needed when those objects are accessed later. Using the example above, if we need to print every city in the database together with its province, the most straightforward approach is:
>>> citys = City.objects.all()
>>> for c in citys:
...     print(c.province)
...
The number of SQL queries this produces grows linearly with the data: with n objects, each having k foreign key fields, it takes n*k+1 queries. In this example there are 3 City objects with one foreign key each, so 4 SQL queries are issued:
SELECT `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id` FROM `qsoptimize_city`;
SELECT `qsoptimize_province`.`id`, `qsoptimize_province`.`name` FROM `qsoptimize_province` WHERE `qsoptimize_province`.`id` = 1;
SELECT `qsoptimize_province`.`id`, `qsoptimize_province`.`name` FROM `qsoptimize_province` WHERE `qsoptimize_province`.`id` = 2;
SELECT `qsoptimize_province`.`id`, `qsoptimize_province`.`name` FROM `qsoptimize_province` WHERE `qsoptimize_province`.`id` = 1;
Note: The SQL statements shown here are taken directly from the output of Django's 'django.db.backends' logger.
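The n*k+1 pattern can be reproduced without Django. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the ORM; the simplified table names and the run() helper are assumptions made for illustration, not Django internals.

```python
import sqlite3

# In-memory database with the same shape as the article's example
# (table names shortened; this is an illustration, not Django's schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE province (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT,
                       province_id INTEGER REFERENCES province(id));
    INSERT INTO province VALUES (1, 'Hubei'), (2, 'Guangdong');
    INSERT INTO city VALUES (1, 'Wuhan', 1), (2, 'Guangzhou', 2), (3, 'Shiyan', 1);
""")

query_count = 0


def run(sql, params=()):
    """Hypothetical helper: execute one statement and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# One query for the cities ...
cities = run("SELECT id, name, province_id FROM city ORDER BY id")

# ... plus one query per city to look up its province lazily.
lazy_result = []
for _city_id, city_name, province_id in cities:
    province_name = run(
        "SELECT name FROM province WHERE id = ?", (province_id,)
    )[0][0]
    lazy_result.append((city_name, province_name))

lazy_queries = query_count  # n*k + 1 with n = 3 cities and k = 1 foreign key
print(lazy_queries, lazy_result)
```

With 3 cities and one foreign key each, the counter ends at 4, matching the four statements in the log above.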
If we use the select_related() function:
>>> citys = City.objects.select_related().all()
>>> for c in citys:
...     print(c.province)
...
Only one SQL query is issued, down from four:
SELECT `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id`, `qsoptimize_province`.`id`, `qsoptimize_province`.`name`
FROM `qsoptimize_city`
INNER JOIN `qsoptimize_province` ON (`qsoptimize_city`.`province_id` = `qsoptimize_province`.`id`);
As we can see here, Django uses an INNER JOIN to get the province information. For reference, the result of this SQL query is as follows:
+----+-----------+-------------+----+--------------------+
| id | name      | province_id | id | name               |
+----+-----------+-------------+----+--------------------+
|  1 | Wuhan     |           1 |  1 | Hubei Province     |
|  2 | Guangzhou |           2 |  2 | Guangdong Province |
|  3 | Shiyan    |           1 |  1 | Hubei Province     |
+----+-----------+-------------+----+--------------------+
3 rows in set (0.00 sec)
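As a rough illustration of what happens with the joined rows, the sketch below runs an equivalent INNER JOIN with Python's sqlite3 module and attaches each province to its city as a nested record. The table names and dict layout are assumptions chosen for the example; Django builds model instances rather than dicts.

```python
import sqlite3

# Same toy schema as the article's example, with shortened table names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE province (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT,
                       province_id INTEGER REFERENCES province(id));
    INSERT INTO province VALUES (1, 'Hubei'), (2, 'Guangdong');
    INSERT INTO city VALUES (1, 'Wuhan', 1), (2, 'Guangzhou', 2), (3, 'Shiyan', 1);
""")

# A single JOIN returns the city columns and the province columns together.
rows = conn.execute("""
    SELECT city.id, city.name, province.id, province.name
    FROM city
    INNER JOIN province ON city.province_id = province.id
    ORDER BY city.id
""").fetchall()

# Split each flat row into a city record with its province already attached,
# so reading city["province"] later needs no further query.
cities = [
    {"id": cid, "name": cname, "province": {"id": pid, "name": pname}}
    for cid, cname, pid, pname in rows
]

for city in cities:
    print(city["name"], "->", city["province"]["name"])
```

The trade-off is that province columns are repeated for every city in the same province, which is usually cheaper than extra round trips to the database.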
select_related() supports the following three usages:
*fields parameter
select_related() accepts a variable number of arguments. Each argument is the name of a foreign key field whose related object (the parent table's contents) should be fetched; it can also name a foreign key of that foreign key, a foreign key of that one, and so on. To follow a foreign key of a foreign key, join the field names with two underscores "__".
For example, to obtain the province where Zhang San currently lives, we can use the following:
>>> zhangs = Person.objects.select_related('living__province').get(firstname=u"Zhang", lastname=u"San")
>>> zhangs.living.province
The SQL query triggered is as follows:
SELECT `qsoptimize_person`.`id`, `qsoptimize_person`.`firstname`, `qsoptimize_person`.`lastname`, `qsoptimize_person`.`hometown_id`, `qsoptimize_person`.`living_id`, `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id`, `qsoptimize_province`.`id`, `qsoptimize_province`.`name`
FROM `qsoptimize_person`
INNER JOIN `qsoptimize_city` ON (`qsoptimize_person`.`living_id` = `qsoptimize_city`.`id`)
INNER JOIN `qsoptimize_province` ON (`qsoptimize_city`.`province_id` = `qsoptimize_province`.`id`)
WHERE (`qsoptimize_person`.`lastname` = 'San' AND `qsoptimize_person`.`firstname` = 'Zhang');
As you can see, Django uses 2 INNER JOINs to complete the request. It fetches the contents of the city and province tables and adds the corresponding columns to the result, so accessing zhangs.living does not trigger another SQL query.
+----+-----------+----------+-------------+-----------+----+-------+-------------+----+----------------+
| id | firstname | lastname | hometown_id | living_id | id | name  | province_id | id | name           |
+----+-----------+----------+-------------+-----------+----+-------+-------------+----+----------------+
|  1 | Zhang     | San      |           3 |         1 |  1 | Wuhan |           1 |  1 | Hubei Province |
+----+-----------+----------+-------------+-----------+----+-------+-------------+----+----------------+
1 row in set (0.00 sec)
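To make the double-underscore notation concrete, here is a small sketch that splits lookup paths such as 'living__province' into a nested structure, one level per JOIN. The function build_join_tree is a hypothetical helper written for this article, not part of Django's API.

```python
def build_join_tree(*fields):
    """Turn 'a__b' style paths into nested dicts, one level per relation hop."""
    tree = {}
    for path in fields:
        node = tree
        # Each "__"-separated name is one foreign key to follow.
        for name in path.split("__"):
            node = node.setdefault(name, {})
    return tree


single = build_join_tree("living__province")
both = build_join_tree("hometown__province", "living__province")

print(single)
print(both)
```

In this picture 'living__province' needs two JOINs (person to city, then city to province), which corresponds to the two INNER JOINs in the SQL above.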
However, foreign keys that were not specified are not included in the result. If you now access Zhang San's hometown, additional SQL queries are issued:
>>> zhangs.hometown.province
SELECT `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id` FROM `qsoptimize_city` WHERE `qsoptimize_city`.`id` = 3;
SELECT `qsoptimize_province`.`id`, `qsoptimize_province`.`name` FROM `qsoptimize_province` WHERE `qsoptimize_province`.`id` = 1;
Likewise, if you do not specify any foreign key, two queries are made, and the deeper the chain of relations, the more queries are needed.
It is worth mentioning that, starting with Django 1.7, select_related() behaves differently when chained. In this example, to obtain the provinces of both Zhang San's hometown and his current place of residence, the only way before 1.7 was:
>>> zhangs = Person.objects.select_related('hometown__province', 'living__province').get(firstname=u"Zhang", lastname=u"San")
>>> zhangs.hometown.province
>>> zhangs.living.province
However, in versions 1.7 and later, you can chain the calls, as with other QuerySet methods:
>>> zhangs = Person.objects.select_related('hometown__province').select_related('living__province').get(firstname=u"Zhang", lastname=u"San")
>>> zhangs.hometown.province
>>> zhangs.living.province
If you chain calls like this in versions below 1.7, only the last call takes effect: in this example only the place of residence is fetched, not the hometown. Printing the hometown province then triggers two more SQL queries.
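The difference between the two chaining behaviours can be pictured with a toy model. The ToyQuerySet class below is purely hypothetical and only mimics the behaviour described in the text; it is not how Django implements select_related().

```python
class ToyQuerySet(object):
    """Toy stand-in for a QuerySet that only tracks select_related fields."""

    def __init__(self, accumulate, fields=()):
        # accumulate=True models Django >= 1.7, False models Django < 1.7
        # (as described in this article).
        self.accumulate = accumulate
        self.fields = tuple(fields)

    def select_related(self, *fields):
        # Newer behaviour keeps earlier fields; older behaviour discards them.
        kept = self.fields if self.accumulate else ()
        return ToyQuerySet(self.accumulate, kept + fields)


new_style = (ToyQuerySet(accumulate=True)
             .select_related("hometown__province")
             .select_related("living__province"))
old_style = (ToyQuerySet(accumulate=False)
             .select_related("hometown__province")
             .select_related("living__province"))

print(new_style.fields)
print(old_style.fields)
```

In the old-style model only 'living__province' survives, which is why accessing the hometown province afterwards would need extra queries.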
depth parameter
select_related() also accepts a depth parameter, which determines how deep select_related goes. Django recursively follows all OneToOneField and ForeignKey fields within the specified depth. Note that this parameter exists only in older versions: it was deprecated in Django 1.5 and removed in 1.7. For example:
>>> zhangs = Person.objects.select_related(depth=d)
d=1 is equivalent to select_related('hometown', 'living')
d=2 is equivalent to select_related('hometown__province', 'living__province')
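The equivalences above can be worked out mechanically. The sketch below expands a depth value into the matching field paths for this article's models; the RELATIONS map and the paths_for_depth function are hypothetical, and the ManyToManyField visitation is left out because select_related only follows ForeignKey and OneToOneField.

```python
# Hypothetical map: model name -> {foreign key field name: target model}.
RELATIONS = {
    "Person": {"hometown": "City", "living": "City"},
    "City": {"province": "Province"},
    "Province": {},
}


def paths_for_depth(model, depth):
    """Return the field paths reachable from `model` within `depth` hops."""
    if depth == 0:
        return []
    paths = []
    for field, target in sorted(RELATIONS[model].items()):
        deeper = paths_for_depth(target, depth - 1)
        if deeper:
            # Extend the path through this field with "__".
            paths.extend(field + "__" + rest for rest in deeper)
        else:
            # Depth exhausted or no further relations: stop at this field.
            paths.append(field)
    return paths


print(paths_for_depth("Person", 1))
print(paths_for_depth("Person", 2))
```

Because Province has no outgoing foreign keys, any depth of 2 or more yields the same paths for these models.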
No parameters
select_related() can also be called with no arguments, which asks Django to follow relations as deeply as possible. For example: zhangs = Person.objects.select_related().get(firstname=u"Zhang", lastname=u"San"). Two caveats apply:
- Django has a built-in upper limit on recursion. For particularly complex table relationships, it may stop recursing at a point you do not expect, so the result may not be what you intended. I do not know exactly how this limit works.
- Django does not know which fields you actually need, so it fetches all of them, and the unnecessary work can hurt performance.
Summary
- select_related mainly optimizes one-to-one and many-to-one relationships.
- select_related uses SQL JOIN statements and improves performance by reducing the number of SQL queries.
- You can name the fields to select_related through variable-length arguments, and you can follow relations recursively by joining field names with the double underscore "__". Fields that are not specified are not cached, nor are relations beyond the specified path; accessing them makes Django issue another SQL query.
- You can also set the recursion depth with the depth parameter, and Django automatically caches all related fields within that depth. Accessing fields beyond the specified depth makes Django issue another SQL query.
- Calls with no arguments are also accepted, in which case Django follows all relations as deeply as possible. Be aware of Django's recursion limit and of the wasted performance.
- In Django >= 1.7, chained select_related calls are equivalent to passing variable-length arguments. In Django < 1.7, chaining discards the earlier select_related calls and keeps only the last one.