Introduction
If you have a successful App Engine application, you will inevitably need to change your datastore schema. This article uses a small example to introduce the two basic steps for changing the schema:
- Update the data model class definitions
- Update the existing entities in the datastore (this step is not always necessary, as explained below).
Before getting started
When updating your data model, you may need to temporarily stop your application from writing data. Whether this is necessary depends on your application, but in some cases temporarily disabling user input makes it much easier to update existing data.
Update your data model
Here is a simple example model for a picture:
from google.appengine.ext import db

class Picture(db.Model):
    author = db.UserProperty()
    png_data = db.BlobProperty()
    name = db.StringProperty(default='')  # Unique image name.
Let's modify this model to add a rating for each picture. To store the rating, we keep the number of user votes and the average rating. Updating the data model is easy. We just add two new properties:
class Picture(db.Model):
    author = db.UserProperty()
    png_data = db.BlobProperty()
    name = db.StringProperty(default='')  # Unique image name.
    num_votes = db.IntegerProperty(default=0)
    avg_rating = db.FloatProperty(default=0.0)
Now, all new entities saved to the datastore will receive a default rating of 0. Note that existing entities in the datastore are not modified automatically, so they do not have these properties.
Update existing data entities
The App Engine datastore does not require all entities to have the same set of properties. After you update your data model, existing entities will not have the new properties. In some cases this is fine, and you don't need to do anything. When do you want to update existing entities so that they have the new properties? One scenario is when you want to query on the new properties. In our picture example, a query for the "most popular" or "least popular" pictures does not return entities saved before the update, because they do not have the rating properties. To solve this problem, we need to update the existing entities in the datastore.
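As a rough illustration of why entities saved before the update drop out of such queries, here is a plain-Python sketch. It does not use the datastore API: the dictionaries and the functions `query_by_rating` and `backfill` are hypothetical stand-ins that only mimic the behavior described above.

```python
# Plain-Python stand-in for datastore entities: each dict holds only the
# properties that existed when the entity was last saved.
entities = [
    {"name": "old_sunset"},  # saved before the model change
    {"name": "old_beach"},   # saved before the model change
    {"name": "new_forest", "num_votes": 3, "avg_rating": 4.5},
    {"name": "new_city", "num_votes": 0, "avg_rating": 0.0},
]


def query_by_rating(items, descending=True):
    """Mimic a query ordered by avg_rating: entities lacking it are skipped."""
    with_rating = [e for e in items if "avg_rating" in e]
    return sorted(with_rating, key=lambda e: e["avg_rating"], reverse=descending)


def backfill(items):
    """Give every entity the new properties, using the model defaults of 0."""
    for e in items:
        e.setdefault("num_votes", 0)
        e.setdefault("avg_rating", 0.0)


before = [e["name"] for e in query_by_rating(entities)]
backfill(entities)
after = [e["name"] for e in query_by_rating(entities)]
```

Before the backfill only the two new pictures are returned; afterwards all four are, with the older ones carrying the default rating.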
Conceptually, updating existing entities is easy. You only need to create a request handler that retrieves each entity, sets the values of the new properties, and saves it. There are two problems that must be solved:
- A query returns at most 1000 results. If there are more than 1000 entities, multiple queries are needed to retrieve them all.
- App Engine requires each HTTP request to complete within a short period of time; otherwise the request times out. If there is a lot of data, the request handler cannot process all of it in a single request.
The solution is to update only a small part of the data in each request. By making multiple requests, we update all the data without exceeding the query limit or the request timeout. For simplicity, we update only one entity per request, using the following steps:
- Read a data entity
- Set the property value (if the property has a default value, it is automatically set)
- Save data
- Use a meta refresh tag so that the browser loads the URL that updates the next entity
Warning: when writing the query, avoid using OFFSET (it does not work well for large datasets). Instead, use a WHERE clause to restrict which entities are returned. This is easy if your data already has a property with unique values. In this example, name is unique, so we use name in the WHERE clause.
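To make the idea concrete, here is a minimal plain-Python sketch of walking a dataset with a filter on a unique value instead of an offset. It is an in-memory simulation, not datastore code: `NAMES`, `fetch_desc`, and `walk` are hypothetical names, and `fetch_desc` only imitates a query of the form WHERE name <= :1 ORDER BY name DESC with a fetch limit.

```python
# Hypothetical in-memory set of unique picture names.
NAMES = ["zebra", "tree", "river", "moon", "apple"]


def fetch_desc(names, upper_bound, limit):
    """Imitate 'WHERE name <= :1 ORDER BY name DESC' with a fetch limit."""
    matching = sorted((n for n in names if n <= upper_bound), reverse=True)
    return matching[:limit]


def walk(names):
    """Visit every name once, one record per step, without using an offset."""
    visited = []
    current = max(names)  # first step: start from the largest name
    while True:
        page = fetch_desc(names, current, limit=2)
        visited.append(page[0])   # the record processed in this step
        if len(page) < 2:
            return visited        # no next record: finished
        current = page[1]         # the next step starts from this name


order = walk(NAMES)
```

Each step fetches two records: the one to process and the one that tells the next step where to start, so no step ever has to skip over rows that were already handled.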
The request handler looks like this:
# Request handler for the URL /update_datastore
def get(self):
    name = self.request.get('name', None)
    if name is None:
        # First request, just get the first name out of the datastore.
        pic = models.Picture.gql('ORDER BY name DESC').get()
        name = pic.name

    # Fetch the current record plus the one that follows it.
    q = models.Picture.gql('WHERE name <= :1 ORDER BY name DESC', name)
    pics = q.fetch(limit=2)
    current_pic = pics[0]
    if len(pics) == 2:
        next_name = pics[1].name
        next_url = '/update_datastore?name=%s' % urllib.quote(next_name)
    else:
        next_name = 'FINISHED'
        next_url = '/'  # Finished processing, go back to main page.

    # In this example, the default values of 0 for num_votes and avg_rating are
    # acceptable, so we don't need to do anything other than call put().
    current_pic.put()

    context = {
        'current_name': name,
        'next_name': next_name,
        'next_url': next_url,
    }
    self.response.out.write(template.render('update_datastore.html', context))
The corresponding template shows which record we are updating and automatically moves on to the next record using a meta refresh:
<html>
  <head>
    <meta http-equiv="refresh" content="0; url={{ next_url }}"/>
  </head>
  <body>
    <h3>Update Datastore</h3>
    <ul>
      <li>Updated: {{ current_name }}</li>
      <li>About to update: {{ next_name }}</li>
    </ul>
  </body>
</html>
If your data does not have a property with unique values, the preceding example cannot be used directly (when one property value corresponds to many records, it becomes very slow). You need to extend it to handle this situation. However, the concept is the same: use a WHERE clause to limit the records returned by each query, update the data in batches, and then save the records one by one.
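One possible way to extend the idea, shown here only as an assumption and not as the method this article prescribes, is to break ties with a second value that is unique, so that the position in the dataset is described by a pair. The sketch below is plain Python over an in-memory list; `RECORDS`, `fetch_batch`, and `walk_in_batches` are hypothetical names and no datastore API is involved.

```python
# Hypothetical records as (author, unique_id) pairs; authors repeat.
RECORDS = [("ann", 1), ("bob", 2), ("ann", 3), ("bob", 4), ("ann", 5)]


def fetch_batch(records, after, limit):
    """Return up to `limit` records whose (author, id) pair sorts after `after`."""
    ordered = sorted(records)
    if after is not None:
        ordered = [r for r in ordered if r > after]
    return ordered[:limit]


def walk_in_batches(records, batch_size=2):
    """Visit every record exactly once, batch by batch."""
    batches = []
    cursor = None  # no position yet: start from the beginning
    while True:
        batch = fetch_batch(records, cursor, batch_size)
        if not batch:
            return batches
        batches.append(batch)  # update and save these records here
        cursor = batch[-1]     # resume after the last record handled


batches = walk_in_batches(RECORDS)
```

Because the pair is unique even when the author is not, each batch resumes exactly where the previous one stopped and no record is processed twice.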
Removing deleted properties from the datastore
If you remove a property from the data model, you will find that existing entities still have it. It is still displayed in the Admin console and still exists in the datastore. To truly clean up the old data, you need to go through each entity and remove the property value from each one.
- Make sure that you have deleted the attribute from the data model definition.
- If your model class inherits from db.Model, temporarily change it to inherit from db.Expando. (A db.Model instance cannot be modified dynamically, which is what we need to do next.)
- Go through all existing entities (as described above). For each entity, use delattr to delete the property, then save the entity.
- If your model class originally inherited from db.Model, do not forget to change it back.
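The per-entity step can be sketched in plain Python as follows. This is a simulation under stated assumptions: `FakeExpando` is a hypothetical stand-in for an Expando-style entity, its `put()` only records that a save happened, and `remove_property` and `obsolete_tag` are illustrative names.

```python
class FakeExpando(object):
    """Hypothetical stand-in for an entity with dynamic attributes."""

    def __init__(self, **props):
        self.saved = False
        for key, value in props.items():
            setattr(self, key, value)

    def put(self):
        # A real entity would be written back to the datastore here.
        self.saved = True


def remove_property(entity, prop_name):
    """Delete prop_name from the entity if it is present, then save it."""
    if hasattr(entity, prop_name):
        delattr(entity, prop_name)
    entity.put()


pic = FakeExpando(name="sunset", num_votes=3, obsolete_tag="x")
remove_property(pic, "obsolete_tag")
```

Checking with hasattr first keeps the step safe to repeat, which helps if the update is interrupted and restarted partway through.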
Looking ahead
For now, processing the data over multiple requests is a workable approach. However, we are working on offline processing solutions. When these become available, they may offer a better way to modify your data without adding load to the server. You can subscribe to my blog to follow the latest progress.