A Tour of the World of Caching

Source: Internet
Author: User
Those of us who write programs all deal with algorithms to some degree. So what, in general, is an algorithm? An algorithm is a trade between "time" and "space". We often look at the time complexity or space complexity of an algorithm. If we had unlimited time or unlimited space, we would not need clever algorithms at all. Unfortunately that is never the case, although sometimes we can afford to ignore one of the two. The "cache" we are discussing today is naturally a technique that trades space for time. Caching means temporarily storing some data somewhere, which may be memory or the hard disk. In short, the goal is to avoid time-consuming operations. The time-consuming operations we commonly meet are database queries and the computation of some result; we also cache to reduce pressure on the server. (In fact that pressure is also caused by queries or computation: each one may be short, but they are frequent and add up over time, requests start to queue badly, and the server cannot keep up.)

I will not dwell on the concepts for now; say too much and it turns into a lecture. Let's talk about the various kinds of cache.

A friend who is new to .NET will meet the DataSet class very early. Still in a fog, he looks at the DataSet sample programs and simply uses it without asking why. In fact, a DataSet is a cache. When we read a result set, if we process each row as we read it, our program stays connected to the database the whole time. If processing a row takes a negligible amount of time, or you are the only one using this database, it does not matter that the connection stays open, and the code does not need DataSet at all. In practice, though, processing is often not that cheap. If it is time-consuming, it holds the database connection until processing is complete. If there are many such queries, they tie up too many connections, and because the database locks tables during some operations, other requests have to wait, which leads to query timeouts, program exceptions, and similar problems. Therefore we should fetch the data first, process it afterwards, and close the database connection as early as possible so that the database can serve other requests. That is why it is important to choose between DataSet and DataReader appropriately (note: DataReader reads while holding the connection open).
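The "fetch first, process later" idea can be sketched roughly like this. It is only a minimal illustration: the class name DisconnectedQuery and the openConnection factory are made up for this example, and the factory is assumed to return a connection that is already open.

```csharp
using System;
using System.Data;

public static class DisconnectedQuery
{
    // Copies every remaining row of the reader into an in-memory DataTable.
    public static DataTable Snapshot(IDataReader reader)
    {
        var table = new DataTable();
        table.Load(reader);
        return table;
    }

    // openConnection is a hypothetical factory that returns an already-open connection.
    public static DataTable LoadAll(Func<IDbConnection> openConnection, string sql)
    {
        using (IDbConnection conn = openConnection())
        using (IDbCommand cmd = conn.CreateCommand())
        {
            cmd.CommandText = sql;
            using (IDataReader reader = cmd.ExecuteReader())
            {
                // The reader, command, and connection are all disposed when we leave
                // these blocks, so slow processing happens on the returned copy.
                return Snapshot(reader);
            }
        }
    }
}
```

The caller can then loop over the returned table for as long as it likes without holding a database connection.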

You may be puzzled: you have been using a cache (DataSet) without knowing it, because .NET did it all for you. But you may still not know how to use a cache yourself, or when. Don't worry. Let's take it one step at a time.

As mentioned above, cached data is nothing more than query results, computed results, and the results of frequent database lookups. So what kind of data like that do we meet in real development? Once you think about it, it is very common. Take a user's profile after login: the page refreshes every time he clicks a link, and we can't re-query the database every time, can we? We usually keep this person's information in the Session and clear the Session when he logs out. So the Session is a cache too, only .NET has already wrapped it in a ready-made class for us. Sorry, I have again given an example of a cache you never had to build yourself, haha. Besides, a Session holds private data, and Session data can only be reached through the SessionID (I will not go into details; Google it), so it does not really show what a cache is for.

Let's extend the problem. Suppose we are developing a multi-user blog system, and we query the blogger's information whenever someone visits one of the blogs. If A and B visit the same blog at the same time, the ideal is to query only once, rather than hit the database for both of them! Right? Well... yes and no! :) Here is the reason. If our blog site gets only a handful of visitors a day and never grows, we do not need a cache, because a cache adds development complexity: every time we update a blogger's data we must not only update the database but also deal with the cached copy. However, if the blog gets a very large number of visits, like cnblogs, then without a cache the database server would have been "game over" long ago :). Now let's look at how to use the cache.

The .NET Framework provides a ready-made cache class for us. The common one is System.Web.HttpRuntime.Cache. Every time we call BlogDataProvider.GetBlogInfo() (assume this is the method that fetches the blogger's information, as the name implies), we first try to get the data from the cache before querying. If the data is not there, we get the result from the database, store it in the cache, and return it. Below I write out the pseudocode of this method, so that friends who have never used a cache can get the idea.

public class SqlDataProvider
{
    public static object GetBlogInfo(string username)
    {
        // Query the database for the blog info here.
        return null;
    }
}

public class BlogDataProvider
{
    public static object GetBlogInfo(string username)
    {
        var cacheKey = "blog_" + username;
        var blog = CacheHelper.Get(cacheKey);
        if (blog == null)
        {
            // Cache miss: load from the database, then remember the result.
            blog = SqlDataProvider.GetBlogInfo(username);
            // Cache.Insert rejects null values, so only store real results.
            if (blog != null) CacheHelper.Set(cacheKey, blog);
        }
        return blog;
    }
}

public class CacheHelper
{
    public static object Get(string key)
    {
        return System.Web.HttpRuntime.Cache.Get(key);
    }

    public static void Set(string key, object value)
    {
        System.Web.HttpRuntime.Cache.Insert(key, value);
    }
}

 

The Chinese word for cache is made of two characters, and together they spell out what it really means. One is "store": we have just done the storing. The other is "buffer", in the sense of holding something for a while: a cache is generally temporary storage, and the fate of its contents is to be deleted or replaced, so every cache has an expiry problem. If you tell me your data will never expire, I suggest you simply write it into the code.
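To make the expiry problem concrete, here is a minimal sketch of a cache whose entries live for a fixed time. The names ExpiringCache, ttl, and now are invented for this illustration, the clock is injectable only so the behavior can be tested, and the class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

public class ExpiringCache
{
    private class Entry
    {
        public object Value;
        public DateTime ExpiresAt;
    }

    private readonly Dictionary<string, Entry> _items = new Dictionary<string, Entry>();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _now;

    // ttl: how long each entry stays valid; now: optional clock (defaults to UTC time).
    public ExpiringCache(TimeSpan ttl, Func<DateTime> now = null)
    {
        _ttl = ttl;
        _now = now ?? (() => DateTime.UtcNow);
    }

    public void Set(string key, object value)
    {
        _items[key] = new Entry { Value = value, ExpiresAt = _now() + _ttl };
    }

    // Returns the cached value, or null when the key is missing or has expired.
    public object Get(string key)
    {
        Entry entry;
        if (!_items.TryGetValue(key, out entry)) return null;
        if (_now() >= entry.ExpiresAt)
        {
            _items.Remove(key); // expired entries are dropped on access
            return null;
        }
        return entry.Value;
    }
}
```

Expired entries are only removed when someone asks for them, which keeps the sketch short; a real cache would also need some periodic cleanup.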

The example above uses the HttpRuntime.Cache class. It looks as if it can solve most of our caching problems, mainly the caching of public data (public data being data that is the same no matter whether you or I access it). For newcomers, learning how to look up a class like this on MSDN is really important, isn't it?

At the beginning we talked about "trading space for time", but so far we have only mentioned caching for frequent queries. Is there a more obvious example of sacrificing space to buy time? No problem. Read on!

A quick aside first: our company is hiring. One of the written test questions asks for the differences between List<T> and Dictionary<TKey, TValue> and when to use each. Unfortunately, of the many people I have interviewed, only one answered it properly; the others said all sorts of things. Have you figured out how you would answer? :) If you read on and find that your answer matches what follows, and you happen to be looking for a challenging job, send me a message.

In fact, I use the generic List<T> and Dictionary<TKey, TValue> in the question partly to throw people off, haha. Some candidates get dragged onto the topic of generics and end up fooled. I could just as well ask about ArrayList and Hashtable.

What is the data structure behind List? An array! More precisely, a dynamic array: "dynamic" because it can request more space as needed. What is the structure behind Dictionary? Some people answer "a dictionary". And what data structure is a dictionary? A hash table! Hash: as soon as you hear the name, you know it is a table in which the data is scattered. Scattered by what? By the keys, naturally. Each key corresponds to one value, which is why we call them "key-value pairs": keys and values come in pairs. Think of a Dictionary as an array. The hash value of each key (what is the hash value? In .NET every type has a GetHashCode method that returns an int, remember?) is reduced to fit the length of the array and used as the index, and the element stored at that index is the value! So getting the value for a key is very fast: we go straight to a known index, and the time complexity is O(1). Fast, isn't it? But do you think the hash values of all keys come out neatly in sequence? Obviously not. Who knows what keys you are going to use. So the array behind a dictionary has to be longer than the number of entries, and some positions sit empty and wasted. That is space traded for time. Of course, GetHashCode algorithms differ, and so does the distribution of the keys; some spread keys tightly and some loosely. (Consistent hashing, which you will often hear about alongside caches, is a related idea that spreads keys across servers rather than across array slots.)
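The idea of turning a key's hash code into an array index can be sketched as a toy hash table. This is only an illustration of the principle, not how Dictionary<TKey, TValue> is actually implemented; the name ToyHashTable and the default bucket count are made up, and keys that land on the same index are simply kept in a small list.

```csharp
using System.Collections.Generic;

public class ToyHashTable<TKey, TValue>
{
    // One slot per bucket; unused slots stay null (the "wasted" space).
    private readonly List<KeyValuePair<TKey, TValue>>[] _buckets;

    public ToyHashTable(int bucketCount = 16)
    {
        _buckets = new List<KeyValuePair<TKey, TValue>>[bucketCount];
    }

    // Turns the key's hash code into a non-negative index inside the array.
    private int IndexOf(TKey key)
    {
        return (key.GetHashCode() & 0x7FFFFFFF) % _buckets.Length;
    }

    public void Set(TKey key, TValue value)
    {
        int i = IndexOf(key);
        if (_buckets[i] == null) _buckets[i] = new List<KeyValuePair<TKey, TValue>>();
        var bucket = _buckets[i];
        for (int j = 0; j < bucket.Count; j++)
        {
            if (EqualityComparer<TKey>.Default.Equals(bucket[j].Key, key))
            {
                bucket[j] = new KeyValuePair<TKey, TValue>(key, value); // overwrite
                return;
            }
        }
        bucket.Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGet(TKey key, out TValue value)
    {
        // Jump straight to the bucket; only colliding keys are compared.
        var bucket = _buckets[IndexOf(key)];
        if (bucket != null)
        {
            foreach (var pair in bucket)
            {
                if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }
}
```

With many buckets and few collisions a lookup touches one slot, which is where the O(1) comes from; the price is the empty slots.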

Figure: actual memory layout of a dictionary

As the figure shows, the layout of a dictionary is not compact and sacrifices a lot of space, but data can be found as quickly as possible. So dict, hash, map, whatever the class is called, they are all hash tables, and they are mainly used for lookups. If we cache each blog with the username as the key, and visitors reach a blog by username, we can get the blogger's information without even knowing the blog ID, and the database is not touched at all.

List, being a compact collection, is generally used for batch processing. Of course, there are also data structures that balance space and speed, namely trees. A search does not need to traverse all the data, and the time complexity is generally O(log n). The space is compact too, although a tree uses linked nodes rather than one contiguous array. So a tree is not the best in either time or space compared with the first two, but it is used very widely. The indexes of the databases we use are basically trees. This keeps the occupied space small while the query speed is still not slow.
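For a feel of a tree-based lookup in .NET, SortedDictionary<TKey, TValue> is documented as a binary search tree with O(log n) retrieval. A tiny sketch, with invented names and sample data:

```csharp
using System;
using System.Collections.Generic;

public static class TreeIndexDemo
{
    // Builds a small tree-backed index from username to a made-up blog id.
    public static SortedDictionary<string, int> BuildIndex()
    {
        var index = new SortedDictionary<string, int>(StringComparer.Ordinal);
        index["carol"] = 3;
        index["alice"] = 1;
        index["bob"] = 2;
        return index;
    }

    // Keys come back in sorted order, whatever order they were inserted in.
    public static string OrderedKeys(SortedDictionary<string, int> index)
    {
        return string.Join(",", index.Keys);
    }
}
```

Unlike a hash table, the tree also keeps the keys ordered, which is part of why database indexes favor tree structures.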

In the previous section we covered the basic principle of hash tables, so now we understand where the speed of a cache comes from. In a real project, besides the cache class that the system provides, we can also try writing a cache class ourselves. Why not? Haha. We declare a variable, make it static and then public, which makes it effectively a global variable that can be reached from everywhere, and we use a dictionary for it because it is fast enough! Don't think of it as a chore. Go back and try writing one!
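A hand-rolled global cache along these lines might look like the sketch below. One deliberate change from the description above: since a web application serves requests on several threads, this sketch swaps the plain Dictionary for ConcurrentDictionary. The names GlobalCache, GetOrAdd, and Remove are chosen just for this example.

```csharp
using System;
using System.Collections.Concurrent;

public static class GlobalCache
{
    // Static, so the whole process shares one dictionary.
    private static readonly ConcurrentDictionary<string, object> Items =
        new ConcurrentDictionary<string, object>();

    // Returns the cached value, calling load(key) only when the key is absent.
    // Note: under heavy contention load may run more than once for the same key.
    public static object GetOrAdd(string key, Func<string, object> load)
    {
        return Items.GetOrAdd(key, load);
    }

    // Call this when the underlying data changes so stale data is not served.
    public static bool Remove(string key)
    {
        object removed;
        return Items.TryRemove(key, out removed);
    }
}
```

This version never expires anything, so it only suits small, rarely changing data unless you add an expiry or removal policy on top.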

 

As mentioned earlier, a cache is temporary storage, and there are different policies for deciding how long an entry stays. The most common is the time-based policy: data is valid for a fixed period, and on each access we check whether the cached data has expired and then decide whether to return it or remove it. Besides the time policy there is also a heat policy. Memory is limited, so a cache cannot grow forever, and we have to cap its size. Once the size is capped, someone has to leave before someone new can come in. That is the removal policy. We give every cache entry a heat value. Each time an entry is read, its heat goes up by 1. Each time we add an entry and the limit has been reached, we remove the entry with the lowest heat. A rather humane design, isn't it? I posted code of this kind in an earlier blog post, if you are interested.
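The heat policy described above could be sketched as follows. This is not the code from the earlier post, just a minimal illustration with made-up names; it assumes a capacity of at least 1, scans all entries to find the coldest one, and is not thread-safe.

```csharp
using System.Collections.Generic;

public class HeatCache
{
    private class Entry
    {
        public object Value;
        public int Heat;
    }

    private readonly Dictionary<string, Entry> _items = new Dictionary<string, Entry>();
    private readonly int _capacity;

    public HeatCache(int capacity) { _capacity = capacity; }

    public int Count { get { return _items.Count; } }

    // Every successful read makes the entry one degree hotter.
    public object Get(string key)
    {
        Entry entry;
        if (!_items.TryGetValue(key, out entry)) return null;
        entry.Heat++;
        return entry.Value;
    }

    public void Set(string key, object value)
    {
        Entry existing;
        if (_items.TryGetValue(key, out existing))
        {
            existing.Value = value; // update in place, keep the heat
            return;
        }
        if (_items.Count >= _capacity) RemoveColdest();
        _items[key] = new Entry { Value = value, Heat = 0 };
    }

    // Linear scan for the entry with the lowest heat; fine for a small cache.
    private void RemoveColdest()
    {
        string coldestKey = null;
        int coldestHeat = int.MaxValue;
        foreach (var pair in _items)
        {
            if (pair.Value.Heat < coldestHeat)
            {
                coldestHeat = pair.Value.Heat;
                coldestKey = pair.Key;
            }
        }
        if (coldestKey != null) _items.Remove(coldestKey);
    }
}
```

One known weakness of pure heat counting is that a brand-new entry starts at heat 0 and is the first candidate for removal, so in practice it is often combined with the time policy.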

Let's continue with cnblogs as the example. We know its traffic is already very large. (I don't know exactly how large, but posting a comment used to time out quite often. After the official team fixed the problem, they published a post about how they solved it, and in the comments plenty of people brought up the lack of caching. :)

When a website's traffic reaches a certain level, one machine can hardly handle all the HTTP requests, and we have to use several machines. If your program has never run on several machines at once, you may not yet understand caching deeply, because everyone who has goes through the same experience: oh, the Session cannot be shared across machines? Oh dear, my cache cannot live on two machines. What do I do?!

In fact, you are not to blame for this. If you must blame someone, blame Microsoft. Under IIS our web program lives in a single process, and each HTTP request is handled by a thread, so you have been relying on multithreading without ever writing any yourself. It does people no favors, haha. Still, as your project experience grows, especially on large projects, this stops being a problem. The reason I call it Microsoft's "fault" is that PHP, Ruby, and their usual servers (Apache, nginx, and so on) are typically multi-process: dozens of processes handle HTTP requests concurrently. Multiple processes mean that data in memory has to be shared explicitly, just as with multiple machines. At that point you need a shared cache process that all the web service processes can reach to read and write cached data. That is the distributed cache.

If you had not heard of memcached two or three years ago, that was excusable. Back then it was also popular to write your own Windows service for the job. But now NoSQL, MongoDB, memcached, and Redis are everywhere. If you still do not know them, you should read more blogs; if you do not keep an eye on new technology, you are already behind the times.

All of the names above are players in the caching game. NoSQL is a relatively new category, and there are many kinds of NoSQL databases. MongoDB sits somewhere between a traditional relational database and an in-memory database, and it is quite popular. Memcached is a well-known distributed cache service. And Redis (Remote Dictionary Server), get it?! Our cache server can use either memcached or Redis. Memcached is purely in memory, so restarting the process loses everything cached. Redis has the advantage that it can write data to the hard disk, which makes it better suited to storing computed results. Redis also supports a rich set of data types (list, set, hash, string), which makes it more flexible than memcached. Both have .NET drivers with examples and unit tests, which you can download from their official websites.

For how to use Redis, see this post by the expert Dai Zhenjun (daizhj): http://www.cnblogs.com/daizhj/archive/2011/02/21/1959511.html

As hardware develops, memory grows, and caches are applied in more and more places, we need to keep pace and learn new ideas. I will not say more about NoSQL here; please search for it yourself.

Other kinds of cache, such as the page cache (OutputCache), are not covered here. Thank you for reading, and you are welcome to exchange ideas. I hope the discussion stays friendly: please avoid snide comments that spoil the atmosphere, and please do point out any mistakes.
