Let me share some of my views on memcache and MongoDB. I look forward to your comments and corrections.
Memcache
The advantages of memcache are summarized as follows:
1) Distributed. Ten machines with 4 GB of memory each form a 40 GB memory pool; if that is not large enough, just add machines. Most hot business data can then live in memory, blocking the bulk of database read requests and relieving considerable pressure on the database (a minimal client-side sketch follows this list).
2) A single cache point. If the web servers or app servers sit behind a load balancer, caches kept in each server's own memory can diverge. Synchronizing them is troublesome (each copy has its own expiration time, and how do you distribute the updates?), and even without synchronization, users get an unfriendly experience from inconsistent data. A shared memcache pool avoids this.
3) High performance. Compared with a database this is beyond doubt; the root cause is that memory reads and writes are several orders of magnitude faster than disk reads and writes. When we complain that database reads and writes are too slow, it is worth looking at disk I/O first: if disk I/O really is the bottleneck, no database, however powerful, can do much about it, and the "strong" databases mostly differ in how fully they exploit memory.
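As a rough illustration of the "distributed pool" idea in point 1, here is a minimal sketch using the Python pymemcache client; the host names and key are my own placeholders, not from the original text. The client hashes each key onto one of the servers, so adding servers grows the pool.

```python
# Sketch only: spread keys across several memcache nodes from the client side.
# Hosts and keys are hypothetical placeholders.
from pymemcache.client.hash import HashClient

# Each server contributes its memory to one logical pool; the client
# hashes the key to decide which node stores it.
cache = HashClient([
    ("cache-01", 11211),
    ("cache-02", 11211),
    ("cache-03", 11211),
])

cache.set("user:42:profile", b'{"name": "alice"}', expire=300)
print(cache.get("user:42:profile"))
```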
However, memcache is not suitable as the cache in every situation:
1) If values are very large, it is not suitable. By default memcache only allows values up to 1 MB (the key-length limit is rarely the real problem). In practice it is not recommended to store very large objects in memcache anyway, because every access involves serialization and deserialization, and the CPU that consumes should not be underestimated. On that note, I have always felt that memcache suits output-oriented content caching rather than processing-oriented data caching: it is not a place to store large volumes of data that still need further processing, but a place for data that can be emitted directly, or used with little or no processing, as soon as it is fetched.
2) If expiration is not acceptable, it is not suitable. By default memcache entries expire after at most 30 days, and when the memory limit is reached it evicts the least recently used data. So if we want to treat a cached entry like a static variable, we must account for this and have a path that re-initializes the data. In fact that should be the norm anyway: since it is a cache, there must always be a process of fetching again and caching again; never assume an entry will be there forever (a minimal sketch follows).
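Point 2 implies the read path must always be able to rebuild an evicted or expired entry. A minimal cache-aside sketch, with the host, function, and key names as assumptions of mine:

```python
# Sketch: never assume the cached value exists; always be ready to rebuild it.
from pymemcache.client.base import Client

cache = Client(("cache-01", 11211))  # hypothetical host

def get_report(report_id, load_from_db):
    key = f"report:{report_id}"
    value = cache.get(key)
    if value is None:
        # Cache miss: the entry expired or was evicted under memory pressure,
        # so re-initialize it from the source of truth.
        value = load_from_db(report_id)
        cache.set(key, value, expire=3600)  # never rely on it living forever
    return value
```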
There are also some issues, and best practices, to keep in mind when using memcache:
1) Clearing data. Memcache is just a key/value pool, a public resource shared by everyone. With shared resources like this, problems appear easily if every user follows their own rules, so it is best to apply a namespace-like convention to keys: every user knows exactly which key range or prefix each function owns. The benefit is that when something needs to be cleared, we can find our own batch of keys from that convention and clear only those, instead of flushing everything. Some people instead use a version-upgrade approach: let the old keys fall out of use and be evicted eventually. That works too, but a standard key scheme is always an advantage, and it makes statistics easier as well.
2) Organization of values, that is, the granularity of the data we store. For example, when caching a list, whether to store the whole list under one key or each item under its own key depends on the business. If the granularity is small, it is best to fetch and store in batches; the fewer cross-network calls, the better. Think about it: if a page needs to output 100 rows and each row is fetched with a separate call, a page that makes hundreds of round trips to the cache will certainly have a performance problem (both practices are sketched just below).
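The two practices above might look like this; the prefix, host, and field names are illustrative assumptions, not anything from the original text:

```python
# Sketch: keys carry a function prefix ("namespace"), and a page of rows
# is fetched in one round trip instead of one call per row.
from pymemcache.client.base import Client

cache = Client(("cache-01", 11211))   # hypothetical host
NS = "orders:v1"                       # per-function prefix; bump to "v2" to retire old keys

def cache_rows(rows):
    cache.set_many({f"{NS}:{r['id']}": str(r).encode() for r in rows}, expire=600)

def load_page(ids):
    keys = [f"{NS}:{i}" for i in ids]
    found = cache.get_many(keys)       # one network call for the whole page
    missing = [k for k in keys if k not in found]
    return found, missing              # re-query the database only for the misses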
What is memcache mainly used for?
In fact, whenever something is cached in local memory, I think it is worth considering whether a distributed cache could be used instead. The main purpose is always the same: block read requests at the front end or in the middle, relieving pressure on the web server, the app server, and the database.
MongoDB
MongoDB is a good non-relational, document-oriented database. Its advantages are mainly:
1) Open source. Even if we never change a line of it, we can still dig into how it works; with MS SQL, beyond reading the documentation, who knows how it is implemented internally?
2) Free of charge. This means we can deploy a large number of instances on a large number of cheap servers; even if each one's performance is not stellar, sheer quantity makes up for it.
3) High performance. I have not compared it with other systems, but compared with MS SQL, for the same application (mainly write operations) the MS SQL version supported about 500 concurrent users while the MongoDB version handled about 2000. And once the data volume climbs into the millions without indexes, MS SQL's insert performance also falls apart. Everything is relative, of course: as features become more complete, some performance is sacrificed, and MS SQL offers very strong data-integrity and safety guarantees that MongoDB cannot match.
4) Simple, flexible configuration. In production, configuring database replication for failover and read/write splitting is a very common requirement. The configuration steps for an MS SQL failover cluster are intimidating, whereas MongoDB lets you set up a failover group in five minutes and read/write splitting in one. The flexibility shows in the topologies: one master and one slave; two masters and one slave (data written to the two masters is merged onto the slave for reading); one master and two slaves (data written to the master is mirrored on both slaves); or even many masters and many slaves (in theory ten masters and ten slaves: write to any master by round-robin, and read from any slave by round-robin as well, bearing in mind that not all slaves are guaranteed to hold consistent data at the same instant). You can also pair two masters as a failover cluster and attach two slave sets to that cluster, so four masters correspond to two slaves and the master side has failover (a minimal connection sketch appears after this list).
5) Flexible to use. In a previous article I mentioned that MongoDB can even support SQL-statement-style queries by converting SQL into JavaScript expressions. In any case, querying MongoDB is very convenient.
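To make points 4 and 5 a little more concrete, here is a minimal pymongo sketch, assuming a replica set named rs0 on hypothetical hosts: writes go to the primary, reads are allowed on secondaries, and the $where operator accepts a JavaScript expression as a query condition.

```python
# Sketch: connect to a hypothetical replica set with read/write splitting,
# then query with a JavaScript expression via $where.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://mongo-01:27017,mongo-02:27017,mongo-03:27017/"
    "?replicaSet=rs0&readPreference=secondaryPreferred"
)
db = client.mydb

db.users.insert_one({"name": "alice", "age": 30})        # goes to the primary
cursor = db.users.find({"$where": "this.age > 25"})      # JS expression, may read a secondary
for doc in cursor:
    print(doc["name"])
```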
As mentioned before, this does not mean every database application should be replaced with MongoDB. Its main disadvantages are:
1) The usual traits of open-source software: rapid updates and less polished tooling. Because updates come fast, our client library has to be updated along with the server to enjoy new features, and fast updates also mean an important feature may simply be missing at a given stage. In addition, MS SQL ships excellent GUI tools for database maintenance across the Dev/DBA/admin roles, while the tools MongoDB provides are not very friendly; our DBAs may find optimizing MongoDB queries quite depressing.
2) Transactions. MongoDB has no built-in transactions (which does not mean it has no transactional capability at all), so it is unsuitable for some applications. For most Internet applications, however, this is not a problem (a small sketch follows this list).
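On point 2, one common workaround, not a real transaction but part of what MongoDB does offer, is to keep the data that must change together inside one document and rely on atomic single-document updates; the collection and field names below are hypothetical.

```python
# Sketch: no multi-document transactions, but updates to a single document
# are atomic, so related fields can change together with $inc / $set.
from pymongo import MongoClient
from pymongo.collection import ReturnDocument

db = MongoClient("mongodb://mongo-01:27017").shop   # hypothetical host/db

# Deduct stock and count the sale in one atomic single-document update.
doc = db.items.find_one_and_update(
    {"_id": "sku-123", "stock": {"$gte": 1}},       # only if stock remains
    {"$inc": {"stock": -1, "sold": 1}},
    return_document=ReturnDocument.AFTER,
)
if doc is None:
    print("out of stock; no partial update happened")
```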
When using MongoDB, you may encounter the following problems:
1) Is horizontal scaling really that easy? With memcache we have already tasted how pleasant it is: we can add machines almost without limit, because the client decides which instance a key lives on, so when we read a key (or a batch of keys) we know exactly which instance to hit. Databases are different. There are many ways to shard, but the hard part is fetching a batch of data by arbitrary conditions. Suppose, for example, we shard by user ID, but a query does not care about user ID at all: it cares about age and education, and finally sorts by name. Which shard should serve it? Whether sharding is done on the client or on the server, this is very hard to do well, and even automatic sharding gives no performance guarantee. The simplest approach is to split data by function as far as possible, and then separate historical data from current data; truly distributing live data across nodes remains very difficult (a routing sketch follows this list).
2) Multi-threaded and multi-process writes. When the write speed is lower than expected, can we open several threads to write concurrently, or run several MongoDB processes (multiple instances on the same machine) and spread writes across them, to improve performance? Unfortunately the gain is very limited, arguably none at all. Why does opening more threads raise write throughput with memcache? Because we have not hit the limit of memory bandwidth; with disks, the I/O bottleneck of a few tens of megabytes per second is reached easily, and once it is reached, no number of processes will improve performance. Fortunately MongoDB uses memory-mapped files, and seeing it use more memory actually gives me a bit more confidence (better to put the memory to work than leave it idle); what worries me is a database that uses no memory at all and hits the I/O wall while memory and CPU sit underused.
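For point 1, the contrast with memcache-style key routing might be sketched like this; the shard hosts and the user_id scheme are purely my assumptions. Routing by the shard key is trivial, but a query that ignores the shard key has to scatter to every shard and merge on the client.

```python
# Sketch: client-side sharding by user_id is easy, but a query that does not
# use the shard key must hit every shard and merge the results.
from pymongo import MongoClient

shards = [MongoClient(f"mongodb://shard-{i:02d}:27017").app for i in range(3)]  # hypothetical

def save_user(user):
    shards[user["user_id"] % len(shards)].users.insert_one(user)   # one shard, chosen by key

def find_by_age_and_degree(min_age, degree):
    results = []
    for db in shards:                                               # scatter to every shard
        results.extend(db.users.find({"age": {"$gte": min_age}, "degree": degree}))
    return sorted(results, key=lambda d: d["name"])                 # merge and sort on the client
```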
Cooperation between memcache and MongoDB
In fact, with memcache and MongoDB together, more than 80% of applications could probably be freed from traditional relational databases, and I think the two genuinely complement each other's weaknesses:
Memcache is suited to fetching values by key, but sometimes we do not know in advance which keys we need to read. What then? My thought is to treat MongoDB (or any database) as the raw data, split into two parts: the fields used for querying (index fields) and the ordinary data fields. Store the bulky non-query fields in memcache at fine granularity. At query time we ask the database which records we need; a query page typically shows around 20 rows, and we then fetch those rows from memcache in one batch. In other words, MongoDB's read pressure falls mainly on the index fields, and its data fields are only read when the cache misses; memcache absorbs most reads of the substantive data. Conversely, when we want to clear entries from memcache, we also know exactly which keys to clear. A sketch of this pattern follows.
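Here is a rough sketch of that division of labour, with every name (collection, key prefix, hosts, page size) an assumption of mine: MongoDB serves the index fields and returns ids, memcache holds the full rows keyed by id, and a page query asks MongoDB for the ids, then batch-fetches the bodies from memcache, falling back to MongoDB only on misses.

```python
# Sketch: query MongoDB for the ids of a page (index fields only), then
# batch-fetch the row bodies from memcache; misses fall back to MongoDB.
from pymemcache.client.base import Client
from pymongo import MongoClient

cache = Client(("cache-01", 11211))                  # hypothetical host
db = MongoClient("mongodb://mongo-01:27017").app     # hypothetical host/db

def load_page(min_age, page=0, size=20):
    ids = [d["_id"] for d in
           db.people.find({"age": {"$gte": min_age}}, {"_id": 1})
                    .sort("name").skip(page * size).limit(size)]
    keys = [f"people:{i}" for i in ids]
    cached = cache.get_many(keys)                    # one batch read for the whole page
    rows = []
    for i, key in zip(ids, keys):
        if key in cached:
            rows.append(cached[key])
        else:                                        # cache miss: read the data fields once
            doc = db.people.find_one({"_id": i})
            body = repr(doc).encode()
            cache.set(key, body, expire=600)
            rows.append(body)
    return rows
```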