MySQL + mencached preferred choice for large-scale Web Applications

Last Update:2018-12-05 Source: Internet

Author: User

Tags server memory database sharding

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

This is the fotolog experience. It is said that fotolog has deployed 51 memcached instances on 21 servers and cached up to GB of content on 21 servers, this number is much larger than that of databases on many websites. The original Article is a bunch of great strategies for using memcached and MySQL better together. Here I will continue to provide selective translation and supplement based on my understanding, thanks to Todd Hoff, we can always give us some learning cases. From here, we can also see the open attitude of foreign technologies, which is not like us. In fact, we are still waiting for the answer. Well, let's get started.

1. About memcached

Do not know this yet? When you go to the interview, you will suffer a loss. Please go to the official website to check your website (Facebook is in use), eaccelerator, or xcache (developed by Chinese people). These products have better standalone performance, if you need a distributed cache solution, use memcached.

Ii. How does memcached work with MySQL?

Database sharding is used to solve the problem of database write extension. Database sharding is deployed on different servers, so that there is no primary server, and write operations become bottlenecks and possible "single point of failure ", generally, database sharding mainly divides services according to the business, and splits services as much as possible. Unrelated data is made into services independently.
The front-end MySQL and a bunch of memcached servers are used to cope with read problems. The application first obtains data from memcached and then obtains data from the database.
Stored in memcached. I have read an article saying that 95% of application data is obtained from memcache, and 3% of data is obtained from MySQL query.
The cache is obtained, and the remaining 2% are used for table search. How far is the gap between your applications?
Solve the read problem through MySQL replication (master-slave)
First, the MySQL database uses master-slave read/write splitting and multiple slave operations to handle application read operations.

Iii. Why not use MySQL query cache?

We all know that MySQL has a query cache that can cache the results of the last query, but it does not actually help much. below is the insufficiency of MySQL quety cache:

Only one instance can be created.
This means that the upper limit of the content you can store is the available memory on your server. How much memory can a server have? How much can you store?
MySQL query cache becomes invalid as long as there is a write operation.
As long as the database content changes slightly, other rows may be changed, and the query cache of MySQL will also become invalid.
MySQL query cache can only cache database data rows
This means that no other content is available, such as arrays, such as objects. In theory, memcached can cache any content or even file ^_^.

Iv. fotolog Cache Technology

Non-deterministic cache: you are not sure whether the data cache you want is in or not, and you do not know if it has expired. So you may tentatively ask memcached, what data do you want? Me
Don't use expired data. memcached tells you that you are happy to give it to you. If not, you need to get it from the database or somewhere else. This is a typical memcached.
. Mainly used in: 1. complex data needs to be read multiple times. Your Database performs sharding. It is a huge overhead to obtain and combine data from multiple databases.
Save it to memcached.
2. a good alternative to MySQL query cache, so that other database departments have changed, as long as they have not changed (pay attention to the database update issue, which will be mentioned later)
3. cache the relationship or list, such as the list of multiple articles under a topic.
4. data that is called by multiple pages and is slow to obtain, or data that is slow to update, such as the ranking of articles
5. If the cache overhead exceeds the overhead for re-retrieval, do not cache it.
6. Tag Cloud and automatic recommendations (similar to Google sugest)
For example, if a user uploads an image and the user's friends page lists the image, cache it.
Potential problems:
Memcached mainly consumes server memory, which consumes a small amount of CPU. Therefore, fotolog deploys memcached on their application servers (which seems to be the same ), they encountered a CPU usage rate of 90% (How can it be so high? What's wrong?), memory recycling (this is a big problem), and so on.
Status cache stores the current status of the Application Service in memcached. It is mainly used for: 1. Expensive operations with high overhead.
2. For sessions Sessions, check that the sessions are stored in the database. It is cheaper to store memcached. If the memecached server is down, log on again.
3. Record Online User Information (we did the same)
The deterministic cache caches all the content of certain databases to memcached. There is a dedicated Application Service to ensure that all your data is stored in memcached.
Other application services directly obtain data from memcached instead of the database, because all databases have been saved to memcached and kept in sync. Mainly used in: 1. Read Extension
Show, all reads are obtained from memcached, and the database has no load
2. "never expire" (relative) data, such as administrative planning data. The change is very small.
3. frequently called content
4. User authentication information
5. User Summary
6. USER parameter settings
7. List of commonly used media files, such as images
8. the user logs on to the database instead of passing through memcached (I personally think this is not very good. The login information still needs to be persistent, and the effect is good if it is similar to bdb)
Usage:
1. multiple specialized cache pools are not a large cache server. Multiple cache pools ensure high availability, and one cache instance is suspended from other cache instances, all the cache instances are suspended and go to the database (it is estimated that the database cannot resist ^_^)
2. All cache pools are maintained by programs. For example, when the database is updated, the program automatically synchronizes the updated content to multiple cache instances.
3. After the server is restarted, the cache is started first than the website. This means that when the website is started, all the caches are available.
4. Read requests can be balanced to multiple cache instances, featuring high performance and high reliability.
Potential problems:
1. You need enough memory to store so much data.
2. Data is recorded in rows, while memcached stores data in objects. Your logic needs to convert the data in rows and columns into cached objects.
3. It is very troublesome to maintain multiple cache instances. fotolog uses Java/hibernate and they write a client to poll
4. Managing multiple cache instances will increase the overhead of the application. However, these overhead are nothing compared with the benefits of multiple caches.
The active cache data appears in the cache in magic. When there is an update in the database, the cache is immediately filled, and the updated data is more likely to be called (such as a new article, reading people)
Of course), a kind of deformation of non-deterministic cache (the original article is it's non-deterministic caching with
Twist. I think this translation is strange ). The main applications are: 1. pre-fill cache: Make memcached call MYSQL as little as possible if the content is not displayed.
2. "push" cache: When you need to copy data across data centers
Procedure:
1. parse the binary logs of database updates. memcached is also updated when the database is updated.
2. Execute user-defined functions, set trigger call UDF update, specific reference http://tangent.org/586/Memcached_Functions_for_MySQL.html
3. Use blackhole policies
In the legend, Facebook also uses the MySQL blackhole storage engine to fill the cache, and the data written to blackhole is copied to the cache. Facebook uses this
The advantage of setting data invalidation and cross-border replication is that database replication does not take MySQL, which means there is no binary log and there is not much CPU usage? Through memcached
Store binary logs and then copy them to different databases? Experienced comrades can add on this topic .)
The file system caches the file directly in memcached. Wow, it's BT enough to reduce the burden on NFS. It's estimated that only those popular images will be cached.
Some page content is cached. If some parts of the page are difficult to obtain, it is better to cache the original data of the page to directly call the content of the page.
Application-level replication updates the cache through APIS. The API execution details are as follows: 1. an application writes data to a cache instance, which copies the content to other cache instances (memcached synchronization. automatically obtain the cache pool address and instance count
3. Update multiple cache instances at the same time
4. If a cache instance goes down, jump to the next instance until the update is successful.
The entire process is very efficient and cost-effective.
Other Tips 1. multiple nodes to cope with "single point of failure" 2. Using Hot Standby technology, when a node is down, another service is automatically replaced with its IP address, so that the client does not need
New memcached IP address 3. memcached can be accessed through TCP/UDP, and continuous connections can reduce load. The system is designed to withstand 1000 connections at the same time.
4. Different Application Services and different cache server groups
5. Check whether your data size matches your allocated cache. For more information, see http://download.tangent.org/talks/Memcached%20Study.pdf.
6. do not consider data row caching and cache complex objects.
7. Do not run memcached on your database server. Both are memory-consuming monsters.
8. Do not be troubled by TCP latency. The local TCP/IP has optimized the memory replication.
9. Try to process data in parallel
10. Not all memcached clients are the same. Study the corresponding language carefully (as if PHP and memcached work well together)
11. Try to expire the data instead of making the data invalid. memcached can set the expiration time.
12. select a good cache ID key, such as the version number added during the update.
13. Store the version number in memcached.

The author's last speech I will not translate, it seems that MySQL proxy is doing a project, automatic synchronization of MySQL and memcached, more refer to the http://jan.kneschke.de/2008/5/18/mysql-proxy-replicating-into-memcache

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More