Optimizing database queries with prefetch_related() in the Django framework (Python), explained by example

Source: Internet
Author: User
Tags prefetch

Example background

Assume a personal information system that must record each person's hometown, current place of residence, and the cities he or she has visited. The database design is as follows:

The content of models.py is as follows:

from django.db import models


class Province(models.Model):
    name = models.CharField(max_length=10)

    def __unicode__(self):
        return self.name


class City(models.Model):
    name = models.CharField(max_length=5)
    province = models.ForeignKey(Province)

    def __unicode__(self):
        return self.name


class Person(models.Model):
    firstname = models.CharField(max_length=10)
    lastname = models.CharField(max_length=10)
    visitation = models.ManyToManyField(City, related_name="visitor")
    hometown = models.ForeignKey(City, related_name="birth")
    living = models.ForeignKey(City, related_name="citizen")

    def __unicode__(self):
        return self.firstname + self.lastname

NOTE 1: The created app is named "QSOptimize"

NOTE 2: For the sake of simplicity, the `qsoptimize_province` table has only two rows, Hubei Province and Guangdong Province, and the `qsoptimize_city` table has only three rows: Wuhan, Shiyan city, and Guangzhou city.

prefetch_related()

You can use prefetch_related() to optimize many-to-many fields (ManyToManyField) and one-to-many fields. You may object that there is no such thing as a OneToManyField. In fact, a ForeignKey is a many-to-one field, and seen from the model that the ForeignKey points to, the relationship is one-to-many.

 
Functions and Methods

Prefetch_related () and select_related () are designed to reduce the number of SQL queries, but they are implemented in different ways. The latter solves the problem in the SQL query through the JOIN statement. However, it is wise to use SQL statements to solve the many-to-many relationship, because the JOIN operation results in a long table, resulting in an increase in the SQL statement running time and memory usage. If there are n objects, each object's many-to-many fields correspond to Mi entries, a result table of Σ (n) Mi rows will be generated.
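To make the row-count argument concrete, here is a small sketch that uses Python's built-in sqlite3 module rather than Django. The table names are simplified stand-ins for the example schema and the sample data is invented for illustration. It only shows that a single JOIN returns one row per (person, city) pair, so the person columns are repeated Mi times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, firstname TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_visitation (person_id INTEGER, city_id INTEGER);
    INSERT INTO person VALUES (1, 'A'), (2, 'B');
    INSERT INTO city VALUES (1, 'Wuhan'), (2, 'Guangzhou'), (3, 'Shiyan');
    -- Person 1 visited 3 cities (M1 = 3), person 2 visited 1 city (M2 = 1).
    INSERT INTO person_visitation VALUES (1, 1), (1, 2), (1, 3), (2, 2);
""")

# A single JOIN across the many-to-many relation.
joined_rows = conn.execute("""
    SELECT person.id, person.firstname, city.name
    FROM person
    INNER JOIN person_visitation ON person.id = person_visitation.person_id
    INNER JOIN city ON city.id = person_visitation.city_id
    ORDER BY person.id, city.id
""").fetchall()

# The result has M1 + M2 rows, and person 1's columns appear once per city.
rows_for_person_1 = [row for row in joined_rows if row[0] == 1]
print(len(joined_rows), len(rows_for_person_1))
```

With more people and more visited cities per person, the duplicated person columns grow in the same proportion.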

The approach taken by prefetch_related() is to query each table separately and then resolve the relationships between them in Python. Continuing the example above, to get all the cities that Zhang San has visited, use prefetch_related() like this:

>>> zhangs = Person.objects.prefetch_related('visitation').get(firstname=u"Zhang", lastname=u"San")
>>> for city in zhangs.visitation.all():
...     print city
...

The SQL queries triggered by the above code are as follows:

SELECT `qsoptimize_person`.`id`, `qsoptimize_person`.`firstname`, `qsoptimize_person`.`lastname`, `qsoptimize_person`.`hometown_id`, `qsoptimize_person`.`living_id`
FROM `qsoptimize_person`
WHERE (`qsoptimize_person`.`lastname` = 'San' AND `qsoptimize_person`.`firstname` = 'Zhang');

SELECT (`qsoptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id`
FROM `qsoptimize_city`
INNER JOIN `qsoptimize_person_visitation` ON (`qsoptimize_city`.`id` = `qsoptimize_person_visitation`.`city_id`)
WHERE `qsoptimize_person_visitation`.`person_id` IN (1);

The first SQL query only fetches the Person object for Zhang San. The second query is the important one: it selects the rows of the relation table `qsoptimize_person_visitation` whose `person_id` is Zhang San's id, and INNER JOINs them with the city table (here an equi-join on the city id) to produce the result table.

+----+-----------+----------+-------------+-----------+
| id | firstname | lastname | hometown_id | living_id |
+----+-----------+----------+-------------+-----------+
|  1 | Zhang     | San      |           3 |         1 |
+----+-----------+----------+-------------+-----------+
1 row in set (0.00 sec)

+-----------------------+----+-------------+-------------+
| _prefetch_related_val | id | name        | province_id |
+-----------------------+----+-------------+-------------+
|                     1 |  1 | Wuhan       |           1 |
|                     1 |  2 | Guangzhou   |           2 |
|                     1 |  3 | Shiyan city |           1 |
+-----------------------+----+-------------+-------------+
3 rows in set (0.00 sec)

As the result shows, Zhang San has been to Wuhan, Guangzhou, and Shiyan.
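The "query each table separately, then join in Python" idea can be sketched outside Django with the built-in sqlite3 module. This is an illustration under assumptions, not Django's actual implementation: the function name fetch_people_with_cities, the simplified table names, and the dictionary layout are all hypothetical, and the sample data merely mirrors the example.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, province_id INTEGER);
    CREATE TABLE person_visitation (person_id INTEGER, city_id INTEGER);
    INSERT INTO person VALUES (1, 'Zhang', 'San');
    INSERT INTO city VALUES (1, 'Wuhan', 1), (2, 'Guangzhou', 2), (3, 'Shiyan city', 1);
    INSERT INTO person_visitation VALUES (1, 1), (1, 2), (1, 3);
""")


def fetch_people_with_cities(conn):
    """Hypothetical helper: two queries in total, relationships resolved in Python."""
    # Query 1: the main objects.
    people = [
        {"id": pid, "firstname": first, "lastname": last, "visitation": []}
        for pid, first, last in conn.execute(
            "SELECT id, firstname, lastname FROM person ORDER BY id")
    ]
    if not people:
        return people

    # Query 2: all related cities for all people at once, using IN.
    ids = [p["id"] for p in people]
    placeholders = ", ".join(["?"] * len(ids))
    rows = conn.execute(
        "SELECT pv.person_id, city.name FROM city "
        "INNER JOIN person_visitation AS pv ON city.id = pv.city_id "
        "WHERE pv.person_id IN (%s) ORDER BY city.id" % placeholders,
        ids,
    )

    # Group the related rows by owner id in Python, then attach them.
    cities_by_person = defaultdict(list)
    for person_id, city_name in rows:
        cities_by_person[person_id].append(city_name)
    for p in people:
        p["visitation"] = cities_by_person[p["id"]]
    return people


people = fetch_people_with_cities(conn)
print(people[0]["visitation"])
```

The number of queries stays at two no matter how many people are returned by the first query; only the length of the IN list changes.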

Or suppose we want the names of all cities in Hubei Province, as in the following code:

>>> hb = Province.objects.prefetch_related('city_set').get(name__iexact=u"Hubei Province")
>>> for city in hb.city_set.all():
...     city.name
...

SQL queries triggered:

SELECT `qsoptimize_province`.`id`, `qsoptimize_province`.`name`
FROM `qsoptimize_province`
WHERE `qsoptimize_province`.`name` LIKE 'Hubei Province';

SELECT `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id`
FROM `qsoptimize_city`
WHERE `qsoptimize_city`.`province_id` IN (1);

The resulting tables:

+----+----------------+
| id | name           |
+----+----------------+
|  1 | Hubei Province |
+----+----------------+
1 row in set (0.00 sec)

+----+-------------+-------------+
| id | name        | province_id |
+----+-------------+-------------+
|  1 | Wuhan       |           1 |
|  3 | Shiyan city |           1 |
+----+-------------+-------------+
2 rows in set (0.00 sec)

As you can see, prefetch_related() uses an IN clause. When the QuerySet contains a very large number of objects, the IN list becomes correspondingly long, and depending on the database backend this may cause performance problems.
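If you reproduce this two-query pattern by hand and worry about very long IN lists, one common workaround is to split the id list into batches. The following is only a sketch under assumptions, using the built-in sqlite3 module: the helper names chunked and cities_for_provinces and the batch_size value are hypothetical, and nothing here describes what Django does internally.

```python
import sqlite3


def chunked(ids, size):
    """Yield successive slices of ids with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def cities_for_provinces(conn, province_ids, batch_size=500):
    """Hypothetical helper: run one IN query per batch and merge the rows."""
    rows = []
    for batch in chunked(list(province_ids), batch_size):
        placeholders = ", ".join(["?"] * len(batch))
        sql = ("SELECT id, name, province_id FROM city "
               "WHERE province_id IN (%s) ORDER BY id" % placeholders)
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, province_id INTEGER);
    INSERT INTO city VALUES (1, 'Wuhan', 1), (2, 'Guangzhou', 2), (3, 'Shiyan city', 1);
""")

# A batch size of 1 is used here only to force two separate IN queries.
rows = cities_for_provinces(conn, [1, 2], batch_size=1)
print(rows)
```

The trade-off is a few more queries in exchange for a bounded IN list; whether that helps depends on the database in use.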

 
Usage
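*lookups parameter

In Django < 1.7, this is the only way to use prefetch_related(). Like select_related(), prefetch_related() also supports lookups that span several levels of relationships. For example, to get the provinces of the cities visited by everyone whose firstname is "Zhang":

>>> zhangs = Person.objects.prefetch_related('visitation__province').filter(firstname__iexact=u'Zhang')
>>> for i in zhangs:
...     for city in i.visitation.all():
...         print city.province
...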

SQL triggered:

SELECT `qsoptimize_person`.`id`, `qsoptimize_person`.`firstname`, `qsoptimize_person`.`lastname`, `qsoptimize_person`.`hometown_id`, `qsoptimize_person`.`living_id`
FROM `qsoptimize_person`
WHERE `qsoptimize_person`.`firstname` LIKE 'Zhang';

SELECT (`qsoptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id`
FROM `qsoptimize_city`
INNER JOIN `qsoptimize_person_visitation` ON (`qsoptimize_city`.`id` = `qsoptimize_person_visitation`.`city_id`)
WHERE `qsoptimize_person_visitation`.`person_id` IN (1, 4);

SELECT `qsoptimize_province`.`id`, `qsoptimize_province`.`name`
FROM `qsoptimize_province`
WHERE `qsoptimize_province`.`id` IN (1, 2);

Result:

+----+-----------+----------+-------------+-----------+
| id | firstname | lastname | hometown_id | living_id |
+----+-----------+----------+-------------+-----------+
|  1 | Zhang     | San      |           3 |         1 |
|  4 | Zhang     | Liu      |           2 |         2 |
+----+-----------+----------+-------------+-----------+
2 rows in set (0.00 sec)

+-----------------------+----+-------------+-------------+
| _prefetch_related_val | id | name        | province_id |
+-----------------------+----+-------------+-------------+
|                     1 |  1 | Wuhan       |           1 |
|                     1 |  2 | Guangzhou   |           2 |
|                     4 |  2 | Guangzhou   |           2 |
|                     1 |  3 | Shiyan city |           1 |
+-----------------------+----+-------------+-------------+
4 rows in set (0.00 sec)

+----+--------------------+
| id | name               |
+----+--------------------+
|  1 | Hubei Province     |
|  2 | Guangdong Province |
+----+--------------------+
2 rows in set (0.00 sec)

It is worth mentioning that chained prefetch_related() calls add their lookups together rather than replacing earlier ones, just as select_related() does in Django 1.7.

Note that once a chained operation on the related QuerySet changes the database request, the data cached by prefetch_related() is ignored. Django then queries the database again to obtain the data, which causes performance problems. "Changing the database request" here means operations such as filter() and exclude() that alter the final SQL. all() does not change the final database request, so it does not trigger a new query.

For example, getting, for every person, the visited cities whose names contain "city" leads to a large number of SQL queries:

plist = Person.objects.prefetch_related('visitation')
[p.visitation.filter(name__icontains=u"city") for p in plist]

With four people in the database, this causes 2 + 4 SQL queries:

SELECT `qsoptimize_person`.`id`, `qsoptimize_person`.`firstname`, `qsoptimize_person`.`lastname`, `qsoptimize_person`.`hometown_id`, `qsoptimize_person`.`living_id`
FROM `qsoptimize_person`;

SELECT (`qsoptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id`
FROM `qsoptimize_city`
INNER JOIN `qsoptimize_person_visitation` ON (`qsoptimize_city`.`id` = `qsoptimize_person_visitation`.`city_id`)
WHERE `qsoptimize_person_visitation`.`person_id` IN (1, 2, 3, 4);

SELECT `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id`
FROM `qsoptimize_city`
INNER JOIN `qsoptimize_person_visitation` ON (`qsoptimize_city`.`id` = `qsoptimize_person_visitation`.`city_id`)
WHERE (`qsoptimize_person_visitation`.`person_id` = 1 AND `qsoptimize_city`.`name` LIKE '%city%');

SELECT `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id`
FROM `qsoptimize_city`
INNER JOIN `qsoptimize_person_visitation` ON (`qsoptimize_city`.`id` = `qsoptimize_person_visitation`.`city_id`)
WHERE (`qsoptimize_person_visitation`.`person_id` = 2 AND `qsoptimize_city`.`name` LIKE '%city%');

SELECT `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id`
FROM `qsoptimize_city`
INNER JOIN `qsoptimize_person_visitation` ON (`qsoptimize_city`.`id` = `qsoptimize_person_visitation`.`city_id`)
WHERE (`qsoptimize_person_visitation`.`person_id` = 3 AND `qsoptimize_city`.`name` LIKE '%city%');

SELECT `qsoptimize_city`.`id`, `qsoptimize_city`.`name`, `qsoptimize_city`.`province_id`
FROM `qsoptimize_city`
INNER JOIN `qsoptimize_person_visitation` ON (`qsoptimize_city`.`id` = `qsoptimize_person_visitation`.`city_id`)
WHERE (`qsoptimize_person_visitation`.`person_id` = 4 AND `qsoptimize_city`.`name` LIKE '%city%');

Let's analyze these queries in detail.

QuerySets are lazy and access the database only when they are actually used. When the second line of Python code runs, the list comprehension iterates over plist, which triggers the database queries. The first two SQL queries are the ones issued by prefetch_related().

Although those results already contain all the city information needed, the loop body calls filter() on Person.visitation, which clearly changes the database request. These operations therefore ignore the cached data and issue new SQL queries.

So what should you do if you have this requirement? In Django >= 1.7, you can use the Prefetch object described in the next section. In Django < 1.7, you can do the filtering in Python instead:

plist = Person.objects.prefetch_related('visitation')
[[city for city in p.visitation.all() if u"city" in city.name] for p in plist]
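The difference in query counts between filtering in the database and filtering the already-fetched rows in Python can be reproduced with a toy model built on the sqlite3 module. This is not Django code: the CountingConnection class, the prefetch_cities function, and the visit data for persons 2 and 3 are assumptions made up for the illustration.

```python
import sqlite3


class CountingConnection:
    """Hypothetical wrapper that counts every query sent to the database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.query_count = 0

    def query(self, sql, params=()):
        self.query_count += 1
        return self.conn.execute(sql, params).fetchall()


db = CountingConnection()
db.conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_visitation (person_id INTEGER, city_id INTEGER);
    INSERT INTO person VALUES (1), (2), (3), (4);
    INSERT INTO city VALUES (1, 'Wuhan'), (2, 'Guangzhou'), (3, 'Shiyan city');
    INSERT INTO person_visitation VALUES (1, 1), (1, 2), (1, 3), (2, 3), (3, 1), (4, 2);
""")

JOIN_SQL = ("SELECT pv.person_id, city.name FROM city "
            "INNER JOIN person_visitation AS pv ON city.id = pv.city_id ")


def prefetch_cities(db):
    """Two queries: all persons, then all their cities via IN."""
    person_ids = [row[0] for row in db.query("SELECT id FROM person ORDER BY id")]
    placeholders = ", ".join(["?"] * len(person_ids))
    rows = db.query(
        JOIN_SQL + "WHERE pv.person_id IN (%s) ORDER BY pv.person_id, city.id"
        % placeholders, person_ids)
    cache = {pid: [] for pid in person_ids}
    for pid, name in rows:
        cache[pid].append(name)
    return cache


cache = prefetch_cities(db)
queries_after_prefetch = db.query_count

# Filtering in the database: one extra query per person.
in_db = [
    [name for _, name in db.query(
        JOIN_SQL + "WHERE pv.person_id = ? AND city.name LIKE ? ORDER BY city.id",
        (pid, "%city%"))]
    for pid in cache
]
queries_after_db_filter = db.query_count

# Filtering the cached rows in Python: no extra queries.
in_python = [[name for name in cache[pid] if "city" in name] for pid in cache]
queries_after_python_filter = db.query_count

print(queries_after_prefetch, queries_after_db_filter, queries_after_python_filter)
```

Both approaches return the same lists here, but only the database-side filter grows with the number of people. Filtering in Python trades that for transferring rows that are then discarded, so it suits cases where the prefetched set is modest in size.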

Prefetch object

In Django >= 1.7, you can use the Prefetch object to control the behavior of prefetch_related().

Note: Because I do not have a Django 1.7 environment installed, this section was written from the Django documentation and has not been tested.

Features of the Prefetch object:

  • A Prefetch object can only specify one prefetch operation.
  • A Prefetch object specifies its field in the same way as the string arguments of prefetch_related(): as field names joined with double underscores.
  • You can use the queryset parameter to specify manually the QuerySet used for the prefetch.
  • You can use the to_attr parameter to specify the name of the attribute under which the prefetched result is stored.
  • Prefetch objects and lookups parameters specified in string form can be mixed.

Continuing the example above, to get, for every person, the visited cities whose names contain "Wu" and those whose names contain "Zhou":

from django.db.models import Prefetch

wus = City.objects.filter(name__icontains=u"Wu")
zhous = City.objects.filter(name__icontains=u"Zhou")
plist = Person.objects.prefetch_related(
    Prefetch('visitation', queryset=wus, to_attr="wu_city"),
    Prefetch('visitation', queryset=zhous, to_attr="zhou_city"),
)
[p.wu_city for p in plist]
[p.zhou_city for p in plist]

Note: This code has not been tested in a real environment. If you find a mistake, corrections are welcome.
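The effect of combining a filtered queryset with to_attr can be imitated in plain Python on top of sqlite3, which may help in picturing what ends up on each object. This is a rough sketch under assumptions, not Django's implementation: PersonRecord and prefetch_to_attr are hypothetical names, and the data only mirrors the two persons shown earlier.

```python
import sqlite3
from collections import defaultdict


class PersonRecord:
    """Hypothetical stand-in for a model instance."""

    def __init__(self, person_id):
        self.id = person_id


def prefetch_to_attr(conn, people, name_fragment, to_attr):
    """Store on each person, under `to_attr`, the visited cities whose name
    contains `name_fragment`. Uses one query for all people."""
    ids = [p.id for p in people]
    placeholders = ", ".join(["?"] * len(ids))
    rows = conn.execute(
        "SELECT pv.person_id, city.name FROM city "
        "INNER JOIN person_visitation AS pv ON city.id = pv.city_id "
        "WHERE pv.person_id IN (%s) AND city.name LIKE ? ORDER BY city.id"
        % placeholders,
        ids + ["%" + name_fragment + "%"],
    )
    grouped = defaultdict(list)
    for person_id, city_name in rows:
        grouped[person_id].append(city_name)
    for p in people:
        # People with no matching city get an empty list.
        setattr(p, to_attr, grouped.get(p.id, []))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_visitation (person_id INTEGER, city_id INTEGER);
    INSERT INTO city VALUES (1, 'Wuhan'), (2, 'Guangzhou'), (3, 'Shiyan city');
    INSERT INTO person_visitation VALUES (1, 1), (1, 2), (1, 3), (4, 2);
""")

people = [PersonRecord(1), PersonRecord(4)]
prefetch_to_attr(conn, people, "Wu", "wu_city")
prefetch_to_attr(conn, people, "Zhou", "zhou_city")

print([p.wu_city for p in people])
print([p.zhou_city for p in people])
```

Because each filtered result lives under its own attribute, the two filters do not overwrite each other, and reading the attributes afterwards needs no further queries.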

By the way, Prefetch objects and string arguments can be mixed.

None

You can pass in None to clear the previous prefetch_related() lookups, like this:

>>> prefetch_cleared_qset = qset.prefetch_related(None)

Summary

  1. prefetch_related() is mainly for optimizing one-to-many and many-to-many relationships.
  2. prefetch_related() optimizes by fetching the content of each table separately and then resolving the relationships between them in Python.
  3. You can use variable-length arguments to specify the field names to prefetch. They are specified in the same way, and behave the same, as with select_related().
  4. In Django >= 1.7, you can use Prefetch objects to implement complex queries; in earlier versions it seems you have to implement them yourself.
  5. Prefetch objects and strings can be mixed as arguments to prefetch_related().
  6. Chained prefetch_related() calls add the corresponding lookups rather than replacing them, and this does not seem to differ between versions.
  7. You can pass in None to clear the previous prefetch_related() lookups.
