Discuz! X Cluster Deployment: System Schemes and Transformation Approaches

Source: Internet
Author: User
Tags: file, url, database, sharding

Data sharing and synchronization between web servers is the core issue in a multi-web-server deployment. In terms of storage, Discuz data consists of two parts: structured data such as users and posts, stored in MySQL, and files such as attachments and cache files, stored on disk. The data in MySQL can easily be shared among multiple servers, and mature solutions exist for its scaling and redundancy. Here we mainly discuss Discuz's file-type data; part of the discussion also touches on running multiple MySQL servers.
Discuz's file-type data is stored under the DISCUZ_ROOT/data directory. The main subdirectories are:

  • data/attachment — attachment files
  • data/log — running logs
  • data/cache — configuration-parameter cache (the default cache type is SQL, i.e. configuration parameters are cached in the pre_common_syscache table), CSS cache, and some JS cache
  • data/template — template caches
  • data/threadcache — forum page cache (an optimization for guests)

Several important lock files also live in DISCUZ_ROOT/data:

  • data/install.lock — installer lock. If this file exists, the installer under DISCUZ_ROOT/install/ cannot be executed.
  • data/sendmail.lock — mail-sending lock. By default Discuz triggers mail delivery from user browsing through a hidden page call such as home.php?mod=misc&ac=sendmail&rand=1379315574. The browser side uses a 300-second cookie to limit how often browsing triggers the mail-sending process, and the server side uses the mtime of sendmail.lock to limit the frequency (5 seconds). If you control the server, this mechanism is worth replacing.
  • data/updatetime.lock — a lock used by the admin back end.
  • data/update.lock — system update lock, created while a version-upgrade program (for example X2 to X3) is running.
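The sendmail.lock frequency control above boils down to an mtime check. The mechanism can be sketched as follows — a Python illustration of the idea only (Discuz itself is PHP, and the function name here is made up):

```python
import os
import time

def try_acquire(lock_path, min_interval):
    """Permit the action only if lock_path's mtime is at least
    min_interval seconds old, then refresh the mtime.
    Illustrative only; not Discuz's actual code."""
    now = time.time()
    try:
        if now - os.path.getmtime(lock_path) < min_interval:
            return False  # a request ran too recently; skip sending mail
    except OSError:
        pass  # lock file does not exist yet: first run is allowed
    with open(lock_path, "a"):
        pass  # create the file if needed
    os.utime(lock_path, (now, now))  # refresh mtime to record this run
    return True
```

Discuz uses 300 seconds on the browser side (cookie) and 5 seconds on the server side. Note that an mtime-based check like this relies on consistent clocks once the lock file lives on shared storage.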


The following features involve data sharing and synchronization between multiple web servers; by default, Discuz implements them through MySQL:

  • user session table pre_common_session
  • admin-panel session table pre_common_admincp_session
  • system configuration cache table pre_common_syscache


Assume two web servers are deployed, each also acting as the PHP application server (server roles can be split further, but that is not covered here). We need to solve the problem of sharing the data directory, and introducing an NFS service solves it simply. There is then a choice of what to place on NFS. From the analysis above, we can put only the data directory on NFS: each web server deploys its program files independently and mounts NFS at the data directory. You then have to deploy program files on every web server and solve the problem of updating them everywhere, but the advantage is that the web servers do not fetch program files from NFS over the network. For convenience you could instead put the program files on NFS too: there is only one copy of everything and program updates are easy, but the web servers then incur network overhead fetching program files. Balancing the two, we recommend the first approach.
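The first approach (program files local, only the data directory on NFS) can be wired up roughly as follows. The server names and paths are placeholders, not values from the original text:

```
# /etc/exports on the NFS server (placeholder paths and hostnames)
/export/discuz-data  webnode1(rw,sync,no_subtree_check) webnode2(rw,sync,no_subtree_check)

# /etc/fstab on each web node: mount only DISCUZ_ROOT/data from NFS
nfs-server:/export/discuz-data  /var/www/discuz/data  nfs  rw,hard  0 0
```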


The above solution has a problem: whenever a user accesses an attachment, every web server has to fetch the file from NFS over the network, which loads the internal network with unnecessary overhead. This can be mitigated by adding a caching layer at the web front end (for example Squid or the nginx proxy cache). Configuring a separate domain name for static resources is also worth doing, and Discuz makes it easy: by setting the "local attachment URL address" item, the URLs of attachment files under data/attachment are rewritten. However, advertisements published from the back end do not handle this rewriting well (a bug observed in version X3).


When using a file lock whose execution logic depends on time values such as the file's mtime, make sure the server clocks are consistent across nodes.


The above scheme is simple, requires little modification to Discuz, and has low maintenance cost; it suits deployments with a small number of servers. As traffic and web nodes grow, the internal network, NFS, and MySQL all need to scale. MySQL has mature scaling solutions, such as primary-replica replication. NFS is more troublesome: many criticize its file-sharing mechanism as insecure, and forum attachments are mostly small files of a few hundred bytes, while the block size of ext file systems on Linux is generally 4 KB, which wastes storage space and is hard on inode utilization. In the long run, NFS will become the bottleneck of the system, and the file sharing/synchronization mechanism will need to be re-planned.


Let us discuss file sharing first; the problems involved in MySQL scaling are explained further below.

Many companies now have solutions for storing large volumes of small files, such as GridFS, which is used by a large Internet company in China. Implementation details are out of scope here. The basic idea is to build a file storage service: attachment-type static files are stored on a remote system, which returns a URL for access and itself handles the various user-access optimizations. We will discuss what Discuz needs to pay attention to when connecting to such a service.


Before that, let's take a look at the process of uploading and storing Discuz attachments.

Attachment upload and post publishing are asynchronous. An uploaded attachment file is stored under a path like data/attachment/forum/201307/20/, and a record is added to the staging table pre_forum_attachment_unused. When the post is published, these records are hashed out to the corresponding attachment table (pre_forum_attachment_[0-9]) and marked with the pid and tid they belong to. This is the process for storing attachments locally.
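The hashing step can be pictured as a simple modulo over the topic id. This is only an illustration — consult the Discuz source for the exact function it uses:

```python
def attachment_table(tid, prefix="pre_forum_attachment_", shards=10):
    """Map a topic id to one of the sharded attachment tables
    pre_forum_attachment_0 .. pre_forum_attachment_9.
    Illustrative; Discuz's own implementation is in PHP."""
    return "%s%d" % (prefix, tid % shards)
```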


Remote attachment

Discuz supports remote attachments (Global - Upload Settings - Remote Attachment). This feature stores attachments on a remote system over FTP. If the site has no dedicated file storage service but you want to separate file storage out, this built-in feature is a good option: FTP is well understood and no Discuz modification is needed. Although Discuz only supports FTP by default, at the functional-interface level remote storage is a generic concept (add, delete, and so on). Therefore, to extend "remote attachments" to your own remote file storage service, a good practice is to inherit Discuz's ftp class, override the methods it defines with your remote-storage implementation, and then instantiate the new subclass wherever the ftp class is instantiated. If you do not intend to keep the default FTP mechanism, you can even modify the ftp class implementation directly, so the instantiation sites need no changes at all. This minimizes the modification to Discuz: the details stay hidden inside the ftp-class implementation, which keeps the same behavioral contract as FTP.
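The subclassing idea can be sketched like this. It is written in Python for brevity — Discuz's ftp class is PHP, and the method names and storage client below are hypothetical stand-ins:

```python
class DiscuzFtp:
    """Stand-in for Discuz's built-in ftp class; only the interface
    shape matters here, not the real method names."""
    def upload(self, local_path, remote_path):
        raise NotImplementedError

    def delete(self, remote_path):
        raise NotImplementedError


class RemoteStorage(DiscuzFtp):
    """Keep the ftp class's interface, but talk to your own
    file-storage service so call sites need no changes."""
    def __init__(self, client):
        self.client = client  # hypothetical storage-service client

    def upload(self, local_path, remote_path):
        return self.client.put(local_path, remote_path)

    def delete(self, remote_path):
        return self.client.remove(remote_path)
```

Wherever the original ftp class is instantiated, construct the subclass instead; because the method signatures match, the rest of the code behaves as if it were still talking to FTP.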


Remote attachments split off most of the static-resource traffic, and the two systems can then be optimized separately. But we still need to understand the behavior of Discuz's remote attachments — how the mechanism runs and where its traps are — so that when some feature of the system behaves abnormally, we know where to look.


1) Upload and storage process for remote attachments

The asynchronous upload phase is the same as for local storage: attachments are first uploaded to a path like data/attachment/forum/201307/20/ and recorded in pre_forum_attachment_unused. When the post is published, the records are hashed out to the corresponding attachment table (pre_forum_attachment_[0-9]) and marked with the pid and tid they belong to; the attachment data is then uploaded to the remote end, the local copy is deleted, and the remote field of pre_forum_attachment_[0-9] is set to mark the record as a remote attachment.

2) When does the remote upload happen?

Attachments are uploaded to the remote end at the moment the post is published, and all attachments of the post are uploaded synchronously, blocking within that request. This carries some risk: if the upload takes a long time (many attachments on one post, large attachments, or limited network bandwidth), the user interface visibly hangs, and the page may even time out and fail.

3) Data known to be stored on the remote end

Post attachments.

4) Data known to stay local (or to gain a local copy)

  • A post attachment is stored locally until it has been uploaded to the remote end.
  • Thumbnails, when the "guests view thumbnails" feature is enabled. A thumbnail URL looks like forum.php?mod=image&aid=4&size=100x100&key=05869b3720.ff990&type=1; the thumbnail is generated into a file like data/attachment/image/000/00/00/123.gif, and the local copy is retained.
  • The cover image of an activity post is not stored as a remote attachment.
  • When editing a post, the thumbnail shown is WYSIWYG: the source image is fetched over the network from the remote end and stored under data/attachment/temp; after the image data is output to the browser, the temporary file is deleted.
  • Image-attachment thumbnails are all served through forum.php?mod=image&xxx. Users can craft such requests to make the server generate local copies of remote attachments, so there is some risk here.


From the discussion of "remote attachments" above, we can see that even with the mechanism enabled, many Discuz features still use subdirectories under data, and some of those subdirectories (such as data/attachment) must still be shared among web servers. So unless we modify those features, NFS is still needed. After all, remote attachments have already split off most of the traffic, and NFS is the easiest way to keep the remaining features working — who knows how many other features depend on that directory?


MySQL Scaling

In Discuz's business model, the most widely used MySQL scaling mechanisms are master-slave replication and read/write splitting, both of which Discuz supports; the latest version also supports database sharding. Deployment details of these mechanisms are not discussed here, but deploying Discuz in such an environment requires some adjustments.

1) Master and slave data diverge because of replication delay

The delay of the replication mechanism is small but always present, and even this small delay is enough to cause abnormal system behavior, especially with read/write splitting. The following has actually happened in a Discuz deployment: parameters configured in the back end do not take effect. The configuration item was written to the master (pre_common_setting), and the configuration cache (pre_common_syscache) was deleted on the master at the same time; but before the deletion was replicated, another request found no cache in pre_common_syscache on the slave and triggered cache regeneration — reading the old configuration values from the slave. Many similar scenarios exist, such as credit refreshes and administrators failing to enter the back end. Solving these timing-related problems usually means handling them case by case. For the configuration cache, we recommend generating the cache at program-deployment time, which avoids online configuration operations entirely.
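One common way to defuse this race is "read your own writes": once a request has written to the primary, route its subsequent reads to the primary as well. A minimal sketch — not Discuz code, and the connection objects are placeholders:

```python
class ReadWriteRouter:
    """Send reads to the replica until the current request writes,
    then pin reads to the primary so the request never sees
    replication-delayed data (illustrative sketch)."""
    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.wrote = False  # per-request flag

    def write(self, sql):
        self.wrote = True
        return self.primary.execute(sql)

    def read(self, sql):
        target = self.primary if self.wrote else self.replica
        return target.execute(sql)
```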

2) Replicate only data that needs to be persisted to the slave

Tables such as pre_common_session, pre_common_admincp_session, and pre_forum_threadaddviews should not be replicated to the slave. They are updated frequently and are temporary: either exclude them with the replicate-wild-ignore-table option, or, better, move them out of the database entirely into mechanisms such as memcache or redis. When the slave replays the master's writes it takes write locks too; this overhead is unnecessary for such tables and should be eliminated, so that the slave can serve reads and queries of the core content as much as possible.
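On the replica, the exclusion can be expressed with replicate-wild-ignore-table in my.cnf. The database name "discuz" is an assumption; match it and the table prefix to your installation:

```
[mysqld]
# Do not replay writes to ephemeral Discuz tables on the replica
replicate-wild-ignore-table = discuz.pre_common_session
replicate-wild-ignore-table = discuz.pre_common_admincp_session
replicate-wild-ignore-table = discuz.pre_forum_threadaddviews
```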


Notes for deploying multiple web servers

Rework the built-in scheduled tasks.

By default, Discuz's built-in scheduled tasks are triggered by user browsing. If you control the server, change this to be driven by the operating system's scheduler. Discuz provides api.php?mod=cron as the entry point, so the change is small. Deploy the scheduled tasks on one web server only, and deploy each Discuz task as its own operating-system scheduled task. Another option is to deploy a single task on a per-minute cycle, whose every-minute polling drives Discuz's built-in scheduling mechanism; we do not recommend that practice — after all, the number of scheduled tasks is limited and their execution frequency has been planned.
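The per-minute fallback amounts to a single crontab entry on one web node. The hostname is a placeholder; check your Discuz version for the exact parameters api.php?mod=cron expects:

```
# /etc/crontab on ONE web node only - drive Discuz's internal scheduler
* * * * *  www-data  curl -s "http://bbs.example.com/api.php?mod=cron" >/dev/null 2>&1
```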


Cache configuration-item data with files, that is, $_config['cache']['type'] = 'file'.

By default, Discuz stores the configuration cache in MySQL (table pre_common_syscache, $_config['cache']['type'] = 'sql'). If a memory cache such as memcache is enabled, the cached data is stored there instead, relieving pressure on the database. These cache items are read on every dynamic page, so at high page views the network and other overhead of a remote cache is considerable.

We recommend caching this data in files. The cache files are deployed together with the program files, so each set of program files carries its own copy of the configuration cache, eliminating this overhead completely. Advantages: writing the configuration cache to files is a built-in Discuz mechanism and needs no modification; local disk file I/O is the most stable and cheapest option, and avoids the risk of a central-node failure; all configuration caches are generated at deployment time, objectively achieving a "warm cache" effect. Disadvantage: all configuration cache files must be generated before the system goes online. Discuz's default policy of "generate the cache when it is found missing" cannot meet this requirement — under high short-term concurrency that policy often makes multiple processes trigger cache writes at the same time — so some development is needed.
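The "generate before going online" requirement also implies that cache writers must never expose half-written files to concurrent readers. A standard trick is to write to a temporary file and rename it into place — a general sketch, not Discuz's own cache format:

```python
import os
import tempfile

def write_cache_atomic(path, data):
    """Write a cache file atomically: readers always see either the
    old complete file or the new complete file, never a partial one."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename on POSIX
    except Exception:
        if os.path.exists(tmp):
            os.remove(tmp)  # clean up the temp file on failure
        raise
```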


Some implementation suggestions: put the generated cache files under version control together with the program source, so configuration changes can be tracked; configuration data changed through back-end UI operations and stored only in the database is not as traceable. The configuration cache is easy to generate — just generate it from the records in the pre_common_syscache table.


Likewise, template cache files can be generated before deployment, but initializing them is more troublesome. We recommend collecting the most commonly visited page entry points and writing scripts to trigger them.


If you want to override Discuz's default configuration values, we recommend introducing a configuration file that overwrites the old values with the new ones, rather than using back-end UI operations — especially in the online environment, because a configuration parameter only takes effect once it is synchronized into the configuration cache files.


The program deployment path must be identical on all web servers, because some Discuz cache items contain absolute paths. As an example, take the "save location of local attachments" configuration item (Global - Upload Settings - Basic Settings in the admin back end):

mysql> SELECT * FROM pre_common_setting WHERE skey = 'attachdir';
+-----------+-------------------+
| skey      | svalue            |
+-----------+-------------------+
| attachdir | ./data/attachment |
+-----------+-------------------+

In the pre_common_syscache row with cname = 'setting':

mysql> SELECT * FROM pre_common_syscache WHERE `data` LIKE '%attachment%';

find 'attachdir' in the output, which contains something like:

s:9:"attachdir";s:39:"D:/Apache/htdocs/./data/attachment/";

These absolute paths must exist on all web servers and work for the relevant features. If the deployment path of the site changes, these cache values must be regenerated.

"Forum page cache" (effective for guests only) caches thread pages. The cached data is stored under data/threadcache by default and, notably, has no expiry or deletion mechanism, so large numbers of cache files accumulate and the cache directory must be monitored. We recommend modifying this cache to use something with expiry control, such as memcache; weigh the network load when doing so.

The "optimize thread view-count updates" and "delay attachment download-count updates" options: if this data is cached by writing files, we recommend keeping those files on each web server rather than on NFS, to reduce network overhead and locking; when aggregating the data into the database, remember to clear the logs on every web server. The latest X versions use the pre_forum_threadaddviews table to cache thread view counts; personally, I find the file-writing approach better balanced for performance.

After the optimizations above, only the data/attachment directory still needs to be shared among the web servers, so mount NFS on just that directory. Keeping only this location shared avoids many known and unforeseen problems with minimal modification:

mount -t nfs NFS_SERVER:/PATH /PATH/TO/DISCUZ_ROOT/data/attachment

