Django implements database sharding

Source: Internet
Author: User
Tags database sharding

As the business continues to expand, the user accumulation increases, and the database pressure increases. One way is to optimize the use method, that is, to optimize the SQL statement and add cache to reduce access. The other way is to modify the architecture, "database/table sharding" at the database level 」.

In the past, when I used a mobile game server, the database used the nxm structure, that is, N databases and m tables. User ID hash is used to distribute different users to different tables to achieve the goal of "balancing. Database/table sharding is a common solution to database pressure. It is applicable to many business scenarios, such as social apps, user tables, and user comments.

An example of what I need is: log statistics platform, of course, log storage is indispensable. The log feature is that there is no association between each other (simple business). It will always be incrementally reported (large volume), stored in a single table, and query performance problems will soon occur. This is probably the most suitable business scenario for database/table sharding.

Orm is almost the "natural enemy" of database splitting (essentially they have different design strategies ). I am using Django. Basically, Django's ORM can "split tables" and say goodbye. A model corresponds to a table. If there are 10 tables, we need to write 10 models, it is troublesome to use and is not easy to expand and maintain. Django provides a method to use multiple databases at the same time. By configuring routing rules to select the database to use, it seems that "vertical database sharding" is feasible, this article describes how to implement the sub-database of log storage in the log statistics platform.

BTW, before that, my log system was isolated from Django and directly encapsulated a layer of MySQL interface to implement a 10*10 log repository table structure, it took a while and there was no problem. The defect is that it is too cumbersome to add new functions, and multiple interfaces are encapsulated for queries of different services. Django is not the most annoying time.migrateAdding or deleting fields will become complicated.

The log statistics platform has its own business and other data to be stored.defaultThe storage of logs will be divided into 10 databases and then hashed to these 10 databases according to the server ID. The library alias issharding0,sharding1,...,sharding9.

Create a project

First, createlogstatProject, and then createreport.

>>> django-admin startproject logstat>>> cd logstat>>> django-startapp report
Database Configuration

ConfigurationsettingConfiguringdefaultDatabases:

DATABASES = {    ‘default‘: {        ‘ENGINE‘: ‘django.db.backends.mysql‘,        ‘NAME‘: ‘logstat‘,        ‘USER‘: ‘logstat_user‘,        ‘PASSWORD‘: ‘logstat_password‘,        ‘HOST‘: ‘localhost‘,        ‘PORT‘: ‘3306‘,        ‘CHARSET‘: ‘utf8‘,    },}

Next, configure the database with sub-database logs. For ease of writing the demo, change it to two databases:

DATABASES = {    ‘default‘: {        ‘ENGINE‘: ‘django.db.backends.mysql‘,        ‘NAME‘: ‘logstat‘,        ‘USER‘: ‘logstat_user‘,        ‘PASSWORD‘: ‘logstat_password‘,        ‘HOST‘: ‘localhost‘,        ‘PORT‘: ‘3306‘,        ‘CHARSET‘: ‘utf8‘,    },    ‘logsharding0‘: {        ‘ENGINE‘: ‘django.db.backends.mysql‘,        ‘NAME‘: ‘logsharding0‘,        ‘USER‘: ‘logstat_user‘,        ‘PASSWORD‘: ‘logstat_password‘,        ‘HOST‘: ‘localhost‘,        ‘PORT‘: ‘3306‘,        ‘CHARSET‘: ‘utf8‘,    },    ‘logsharding1‘: {        ‘ENGINE‘: ‘django.db.backends.mysql‘,        ‘NAME‘: ‘logsharding1‘,        ‘USER‘: ‘logstat_user‘,        ‘PASSWORD‘: ‘logstat_password‘,        ‘HOST‘: ‘localhost‘,        ‘PORT‘: ‘3306‘,        ‘CHARSET‘: ‘utf8‘,    },}
Create a database
create database logstat charset=‘utf8‘;grant all on logstat.* to ‘logstat_user‘@localhost identified by ‘logstat_password‘;create database logsharding0 charset=‘utf8‘;grant all on logsharding0.* to ‘logstat_user‘@localhost identified by ‘logstat_password‘;create database logsharding1 charset=‘utf8‘;grant all on logsharding1.* to ‘logstat_user‘@localhost identified by ‘logstat_password‘;
Add model class

Inlogstat/report/models.pyAdd the log format to be stored:

Class log (models. model): serverid = models. integerfield ('server id') logid = models. integerfield ('Log type') DESC = models. textfield ('Log content', blank = true) report_dt = models. datetimefield ('reporting Time ')

Then add the appINSTALLED_APPS../manage.py makemigrationsGenerate the migrations file.

Synchronize databases:

./manage.py migrate./manage.py migrate --database=logsharding0./manage.py migrate --database=logsharding1

At this time, we found that all migrations aredefault,logsharding0,logsharding1Tables are created separately. This is obviously not what we want. What we want isreportThe model in the app is notdefaultCreated inlogshardingxAnddefaultAnd do not wantlogshardingx.

In this case, we need to add a database router.

Database Router

Inlogstat/reportCreatelog_router.pyFile to add routing rules:

class LogRouter(object):    def allow_migrate(self, db, app_label, model_name=None, **hints):        if app_label == ‘report‘:            return db == ‘logsharding0‘ or db == ‘logsharding1‘        return None

InmigrateIf it isreportAnddatabaseYeslogshardingxYes. Otherwise, no.

Insetting.pyTo make it take effect:

DATABASE_ROUTERS = [‘report.log_router.LogRouter‘,]

Synchronize databases:

./manage.py migrate./manage.py migrate --database=logsharding0./manage.py migrate --database=logsharding1

At this time, we found thatreport_logThe table is no longerlog_statLibrary, but only appears inlogshardingx. HoweverlogshardingxThere will still beauth_group,auth_group_permissions... Tables of these Django components. By now, we have achieved database sharding. We don't have to worry about these extra tables, but we always feel that they are not elegant.

These additional apps areadmin,auth,contenttypes,sessions,messages,staticfiles, We also need to set routing rules for them, inlogstatCreatedefault_router.pyAdd routing rules so that other apps can only selectdefault:

class DefaultRouter(object):    def allow_migrate(self, db, app_label, model_name=None, **hints):        if app_label == ‘admin‘            or app_label == ‘auth‘            or app_label == ‘staticfiles‘            or app_label == ‘sessions‘            or app_label == ‘messages‘            or app_label == ‘contenttypes‘:            return db == ‘default‘        return None

AddDATABASE_ROUTERSMedium,

DATABASE_ROUTERS = [    ‘logstat.default_router.DefaultRouter‘,    ‘report.log_router.LogRouter‘,]

Run againmigrate --database=xxxOnly two tables are created,django_migrationsAndreport_logThis is what we want.

Note:django_migrationsThis table must exist. It is a database.migrateRecord to ensure replayingmigrate,migrationsFiles are not repeatedly executed.

Database shard usage

Django provides the usage of the specified database in the multi-database document, but I personally prefer a simple rule: "operate only one database at the same time 」. The data queried from that database, whether modified, saved, or deleted, operates only the same database. For example:

>>> user_obj.save(using=‘new_users‘)>>> user_obj.delete(using=‘legacy_users‘)

This type of user migration from one table to another should be more explicit and should be migrated in the business rather than usedusingKeyword and Django design. Although this is very convenient, it is a disaster for the person who maintains the code! The so-called business migration is to first create a new user (User.objects.using(‘new_users‘).create()), And then delete the old user (User.objects.using(‘legacy_users‘).filter().delete()). The logic is clear.

Automatically select a library

Log storage needs: Based onserveridSelect a model in the repository and encapsulate a function:

def db_slice(serverid):    slice_list = (        ‘logsharding0‘,        ‘logsharding0‘    )    return slice_list[serverid % 2]

Instance used:

# Create an object log. objects. using (db_slice (1 )). create (serverid = 1, logid = 1001, DESC = 'lala ', report_dt = datetime. now () # query object log. objects. using (db_slice (1 )). all ()
End

Since then, the example of a sub-database has been completed and is relatively simple. Let's talk about some personal ideas, select a framework, and select a technology. Because of the different business scenarios, it is difficult to have a perfect solution, and there is always a trade-off.

For example, it is not difficult to write a database/table sharding policy that is directly independent of Django, but the common API is not used because of Django (create,filter,deleteIt encapsulates an SQL operation for each operation. It is difficult to test, flexible, and poor scalability. It is important to prevent SQL injection.

If you use Django's multi-DB implementation policy, there is also a problem. The first is routing. If a new app forgets to set routing rules, it is easy to generate a table where it is not expected to be generated. In addition, the Django official documentation also says that it does not check non-defaultMigration consistency (may be supported in Versions later than 1.10 ). The second is the foreign key. After database sharding is used, the foreign key constraint does not exist. This is not a problem. In addition, each time you access the database, you must useusingSelect a database explicitly. I forgot that if there is no problem with the routing rule settings, it would be okay to directly report an error. If the selection is wrong, it would be a headache because the structure of each database is the same and it is difficult to find out the problem. This increases the complexity of writing code.

My suggestion is that since there is no perfect solution, we should try our best to ensure that the logic is simple and clear, so we should not rely too much on the framework and use less hack skills. Like the one above

>>> user_obj.save(using=‘new_users‘)>>> user_obj.delete(using=‘legacy_users‘)

Although feasible, I personally think this is very undesirable.

Django implements database sharding

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.