[Original] Writing Your Own Cache System (tmcache)

Author: heiyeluren
Time:
Blog: http://blog.csdn.net/heiyeshuwu

[Principles]

tmcache is a cache server broadly similar to memcache, so it helps to have a general picture of how it executes. For ease of understanding, I will describe it briefly.

Request path:
Client (PHP/Java/C++) --> Cache Server --> Memory (shared memory)

Response path:
Memory (shared memory) --> Cache Server --> Client

In general terms: the client (any language or tool that can open a socket) connects to the cache server's port to store, read, and delete data. After receiving a command, the cache server performs the memory operation and writes the result back to the client. The cache server therefore contains these modules: socket communication, protocol parsing, data storage, and data expiration control.

The sections below describe these modules using code taken from tmcache (tiema memory cache, "tiny & mini"). tmcache currently supports the following features:

* Memory-based data storage
* Compatible with the memcached communication protocol
* A small set of operations that is simple to use
* Configurable port, max_clients, and memory usage limit
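
Because tmcache follows the memcached text protocol, a client only has to write plain text lines to the socket. The sketch below formats a storage request and a retrieval request in C. format_set() and format_get() are hypothetical helpers, not part of tmcache, and for simplicity the value is a NUL-terminated text string; the line layout is the standard memcached one (set <key> <flags> <exptime> <bytes>, then the data block, each line ending in \r\n).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a memcached text-protocol "set" request:
 *   set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if buf is too small. */
int format_set(char *buf, size_t buflen, const char *key,
               unsigned flags, unsigned exptime, const char *data)
{
    int n = snprintf(buf, buflen, "set %s %u %u %zu\r\n%s\r\n",
                     key, flags, exptime, strlen(data), data);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}

/* Hypothetical helper: format a "get" request for one key. */
int format_get(char *buf, size_t buflen, const char *key)
{
    int n = snprintf(buf, buflen, "get %s\r\n", key);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

In memcached, a successful set is answered with STORED\r\n, and a get for an existing key returns a VALUE <key> <flags> <bytes> line, the data block, and a closing END\r\n.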

Download tmcache:

Windows: http://heiyeluren.googlecode.com/files/tmcache-1.0.0_alpha-win32.zip
Unix/Linux: http://heiyeluren.googlecode.com/files/tmcache-1.0.0_alpha.tar.gz

[System implementation]

I. Communication Protocol Processing Module

This module mainly covers the listening socket; tmcache listens through its init_server_listen() function. Accepting concurrent connections is an important part of the program, and several I/O models are available: select/poll multiplexing, epoll/kqueue event notification, or one thread per connection. For compatibility and simplicity, tmcache uses threads.
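
The listening side can be sketched with the standard BSD socket calls. This is not tmcache's actual init_server_listen(): listen_on_port() is a hypothetical stand-in, and its (port, backlog) parameters, the SO_REUSEADDR option, and binding to all interfaces are choices made for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for init_server_listen(): create a TCP socket
 * bound to all interfaces on `port` and start listening.
 * Returns the listening descriptor, or -1 on failure. */
int listen_on_port(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Let a restarted server rebind while old connections sit in TIME_WAIT */
    (void)setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Passing port 0 asks the kernel for any free port, which getsockname() can then report; a server like tmcache would pass its configured port instead.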

Core thread-handling code:

```c
#include <netinet/in.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Defined elsewhere in tmcache (signatures assumed) */
void die(const char *msg);
void *tm_thread_callback(void *arg);

void tm_thread(int serversock, unsigned int max_client)
{
    int clientsock, *arg;
    struct sockaddr_in client_addr;
    socklen_t clientlen;
    pthread_attr_t thread_attr;

    (void)max_client;   /* client-limit handling is not part of this excerpt */

    /* Setting pthread attribute: detached threads free their own resources */
    pthread_attr_init(&thread_attr);
    pthread_attr_setdetachstate(&thread_attr, PTHREAD_CREATE_DETACHED);

    /* Run until canceled */
    while (1) {
        pthread_t thread;

        /* (tmcache also stamps the current time here via its getdate() helper) */
        clientlen = sizeof(client_addr);

        /* Wait for client connection. The extra parentheses matter: without
         * them clientsock receives the result of the "< 0" comparison. */
        if ((clientsock = accept(serversock, (struct sockaddr *)&client_addr,
                                 &clientlen)) < 0) {
            die("failed to accept client connection");
        }

        /* Use a thread to process the new connection. Each thread gets its own
         * heap copy of the descriptor (passing &clientsock would race with the
         * next accept()); the callback is expected to free() it. */
        arg = malloc(sizeof *arg);
        if (arg == NULL) {
            die("out of memory");
        }
        *arg = clientsock;
        if (pthread_create(&thread, &thread_attr, tm_thread_callback, arg) != 0) {
            die("create new thread failed");
        }
    }

    /* Destroy pthread attribute (not reached while the loop runs forever) */
    (void)pthread_attr_destroy(&thread_attr);
}
```
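
Each connection thread runs tm_thread_callback(), whose body is not shown here. The sketch below suggests what such a handler could look like, under assumptions: it receives a heap-allocated copy of the client descriptor and frees it, requests are "\r\n"-terminated lines as in the memcached protocol, and example_thread_callback() and read_request_line() are hypothetical names. The fixed ERROR reply is only a placeholder for real command processing.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one request line terminated by "\n"
 * (memcached clients send "\r\n"), one byte per read() call.
 * Stores the line without its terminator and returns its length,
 * or -1 on EOF, read error, or a line too long for buf. */
ssize_t read_request_line(int fd, char *buf, size_t buflen)
{
    size_t n = 0;
    char c;

    while (n + 1 < buflen) {
        ssize_t r = read(fd, &c, 1);
        if (r <= 0)
            return -1;
        if (c == '\n') {
            if (n > 0 && buf[n - 1] == '\r')
                n--;                    /* drop the '\r' of "\r\n" */
            buf[n] = '\0';
            return (ssize_t)n;
        }
        buf[n++] = c;
    }
    return -1;
}

/* Hypothetical per-connection thread entry point. It owns the
 * heap-allocated descriptor passed in arg and frees it first. */
void *example_thread_callback(void *arg)
{
    int clientsock = *(int *)arg;
    char line[1024];
    static const char reply[] = "ERROR\r\n";

    free(arg);
    while (read_request_line(clientsock, line, sizeof line) >= 0) {
        /* Placeholder: a real server parses and executes the command
         * here (tmcache does this in proc_request()). */
        if (write(clientsock, reply, sizeof reply - 1) < 0)
            break;
    }
    close(clientsock);
    return NULL;
}
```

Reading one byte per read() keeps the sketch short but costs a system call per byte; a buffered reader would be the natural refinement.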

Protocol processing is the heart of this module. It covers the storage commands set/add/replace/append, the retrieval commands get/gets, the deletion commands delete/remove, and status commands such as stats/STAT. The main handler is proc_request(), which parses the protocol and calls the relevant interfaces.
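
A first step in a handler like proc_request() is recognizing the command word at the start of the request line. The table-driven sketch below is one way to do it; tm_cmd_t, the CMD_* codes, and parse_command() are hypothetical names, and tmcache's own dispatch may be organized differently.

```c
#include <string.h>

/* Hypothetical command codes for the commands listed above */
typedef enum {
    CMD_UNKNOWN, CMD_SET, CMD_ADD, CMD_REPLACE, CMD_APPEND,
    CMD_GET, CMD_GETS, CMD_DELETE, CMD_REMOVE, CMD_STATS
} tm_cmd_t;

/* Map the first word of a request line to a command code. */
tm_cmd_t parse_command(const char *line)
{
    static const struct { const char *name; tm_cmd_t cmd; } table[] = {
        {"set", CMD_SET},       {"add", CMD_ADD},
        {"replace", CMD_REPLACE}, {"append", CMD_APPEND},
        {"get", CMD_GET},       {"gets", CMD_GETS},
        {"delete", CMD_DELETE}, {"remove", CMD_REMOVE},
        {"stats", CMD_STATS},
    };
    size_t len = strcspn(line, " \r\n");   /* length of the first word */

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strlen(table[i].name) == len &&
            strncmp(line, table[i].name, len) == 0)
            return table[i].cmd;
    }
    return CMD_UNKNOWN;
}
```

Comparing the whole first word, not just a prefix, keeps "get" from matching "gets" and rejects words such as "settle".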

 

 

II. Data Processing Module

This module is the core of data storage and processing. Data is stored in a hash table; a queue records the order in which data was stored, so that the oldest data can be dropped when memory runs short; and a probabilistic algorithm clears expired data from time to time.

1. Hash Table Data Storage

Data is stored in a hash table, which is simple and fast: lookups and inserts take O(1) time on average, which makes it well suited to key => value storage. The core hash function is the classic times33 algorithm:

```c
/* times33 hash: hash = hash * 33 + c for each character,
 * then reduced modulo table_size when table_size > 0 */
unsigned tm_hash(const char *str, unsigned table_size)
{
    unsigned long hash = 5381;
    int c;

    while ((c = *str++))
        hash = ((hash << 5) + hash) + c;   /* hash * 33 + c */
    hash = table_size > 0 ? hash % table_size : hash;
    return (unsigned)hash;
}
```

Collisions between data nodes are resolved by separate chaining: nodes whose keys hash to the same bucket form a linked list. In the hash node structure below, the next pointer links to the next node with the same hash result:

```c
/* Hash data item struct */
struct tm_hash_entry_t {
    char *key;                      /* data key string */
    char *data;                     /* data value string */
    size_t length;                  /* data length */
    unsigned created;               /* data create time (UNIX timestamp) */
    unsigned expired;               /* data expire time (UNIX timestamp) */
    struct tm_hash_entry_t *next;   /* next node with the same hash (collision chain) */
};
```
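
To make the chaining concrete, here is a small single-threaded sketch of lookup and insertion over a fixed bucket array, reusing the node layout and the times33 hash above. The table size, the hash_find()/hash_set() names, the ttl argument, and the convention that expired == 0 means "never expires" are assumptions for illustration; a multithreaded server like tmcache would also need locking around these operations.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Node layout from tmcache's struct tm_hash_entry_t */
struct tm_hash_entry_t {
    char *key;
    char *data;
    size_t length;
    unsigned created;
    unsigned expired;                 /* here: 0 means "never expires" (assumption) */
    struct tm_hash_entry_t *next;     /* next node in the same bucket */
};

#define TM_TABLE_SIZE 1024            /* hypothetical bucket count */
static struct tm_hash_entry_t *g_table[TM_TABLE_SIZE];

/* times33 hash, as in tmcache's tm_hash() */
static unsigned tm_hash(const char *str, unsigned table_size)
{
    unsigned long hash = 5381;
    int c;

    while ((c = *str++))
        hash = ((hash << 5) + hash) + c;      /* hash * 33 + c */
    return (unsigned)(table_size > 0 ? hash % table_size : hash);
}

/* Walk the key's bucket chain; return the matching node or NULL. */
struct tm_hash_entry_t *hash_find(const char *key)
{
    struct tm_hash_entry_t *e = g_table[tm_hash(key, TM_TABLE_SIZE)];

    while (e != NULL && strcmp(e->key, key) != 0)
        e = e->next;
    return e;
}

/* Store a copy of data under key. A new key is prepended to its bucket's
 * chain; an existing key has its value replaced. ttl is in seconds.
 * Returns 0 on success, -1 if memory allocation fails. */
int hash_set(const char *key, const char *data, size_t length, unsigned ttl)
{
    struct tm_hash_entry_t *e = hash_find(key);
    char *copy = malloc(length > 0 ? length : 1);
    unsigned now = (unsigned)time(NULL);

    if (copy == NULL)
        return -1;
    memcpy(copy, data, length);

    if (e == NULL) {
        size_t klen = strlen(key) + 1;
        unsigned idx = tm_hash(key, TM_TABLE_SIZE);
        char *kcopy = malloc(klen);

        e = calloc(1, sizeof *e);
        if (e == NULL || kcopy == NULL) {
            free(e);
            free(kcopy);
            free(copy);
            return -1;
        }
        memcpy(kcopy, key, klen);
        e->key = kcopy;
        e->next = g_table[idx];       /* chain in front of any colliding nodes */
        g_table[idx] = e;
    } else {
        free(e->data);                /* overwrite: release the old value */
    }
    e->data = copy;
    e->length = length;
    e->created = now;
    e->expired = ttl > 0 ? now + ttl : 0;
    return 0;
}
```

Prepending new nodes keeps insertion O(1) regardless of chain length; lookups stay fast as long as the table is large relative to the number of keys.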

2. Data Expiration Handling

There are currently two ways of handling expiration. The first is lazy: when a data node is accessed and the current time is already past its expired field, the node is removed. The second removes expired data from time to time during ordinary data operations, with a probability calculation deciding when a cleanup runs. Here is the implementation of the probability algorithm:

```c
#include <stdlib.h>
#include <sys/time.h>

/* Returns 1 (run GC now) with probability probability/divisor, else 0;
 * tmcache declares the return type as its own Status (TRUE/FALSE). */
int get_gc_probability(unsigned probability, unsigned divisor)
{
    int n;
    struct timeval tv;
    gettimeofday(&tv, NULL);
    srand((unsigned)(tv.tv_usec + tv.tv_sec));   /* reseeded on every call */
    n = 1 + (int)((float)divisor * rand() / (RAND_MAX + 1.0));  /* n in 1..divisor */
    return n <= (int)probability;
}
```

The cleanup probability is the ratio of the two parameters, probability / divisor. The default is 1/100, meaning that on average one in every hundred operations triggers a cleanup of expired data, which keeps the cost of cleanup low.
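
Both expiration strategies need a test for whether a node has expired, and the periodic one also needs a pass that unlinks expired nodes from the bucket chains. The sketch below shows both under assumptions: entry_expired(), sweep_chain(), and sweep_table() are hypothetical names, and expired == 0 is taken to mean that an item never expires. In the scheme described above, a sweep would run only when get_gc_probability() returns true.

```c
#include <stdlib.h>

struct tm_hash_entry_t {
    char *key;
    char *data;
    size_t length;
    unsigned created;
    unsigned expired;            /* expiry timestamp; 0 = never (assumption) */
    struct tm_hash_entry_t *next;
};

/* Lazy check used on access: is this node at or past its expiry time? */
int entry_expired(const struct tm_hash_entry_t *e, unsigned now)
{
    return e->expired != 0 && e->expired <= now;
}

/* GC pass over one bucket chain: unlink and free every expired node.
 * Walking with a pointer-to-pointer avoids special-casing the head.
 * Returns the number of nodes removed. */
size_t sweep_chain(struct tm_hash_entry_t **head, unsigned now)
{
    size_t removed = 0;
    struct tm_hash_entry_t **pp = head;

    while (*pp != NULL) {
        struct tm_hash_entry_t *e = *pp;
        if (entry_expired(e, now)) {
            *pp = e->next;       /* unlink */
            free(e->key);
            free(e->data);
            free(e);
            removed++;
        } else {
            pp = &e->next;
        }
    }
    return removed;
}

/* Sweep every bucket of a table of chain heads. */
size_t sweep_table(struct tm_hash_entry_t **table, unsigned size, unsigned now)
{
    size_t removed = 0;

    for (unsigned i = 0; i < size; i++)
        removed += sweep_chain(&table[i], now);
    return removed;
}
```

A full sweep touches every bucket, which is exactly why it is gated behind a low probability instead of running on every operation.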

3. Memory Limit Handling

Suppose tmcache is started with 16 MB of memory and that memory eventually runs out. New data can then be stored only by clearing the data that was inserted earliest. This is where the queue comes in: a queue follows the first-in, first-out principle, so its head is always the oldest entry. The code:

```c
/* Current memory use would exceed g_max_mem_size: remove the oldest node
 * from the queue and remove its key from the hash table */
if (get_mem_used() + length > g_max_mem_size) {
    struct tm_queue_node_t *qnode;
    while (get_mem_used() + length > g_max_mem_size) {
        qnode = tm_qremove(g_qlist);
        if (qnode == NULL)   /* queue empty (assuming tm_qremove returns NULL): item exceeds the whole cache */
            break;
        remove_data(qnode->key);
    }
}
```
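
The queue side of this eviction scheme can be sketched as a singly linked FIFO that also tracks the bytes it accounts for. struct fifo, fifo_push(), fifo_pop(), and make_room() are hypothetical; the running mem_used counter stands in for get_mem_used(), and a real cache would also delete each evicted key from the hash table, as remove_data() does. The loop also stops when the queue is empty, so an item larger than the whole budget cannot spin forever.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical FIFO node: one cached key and the bytes it occupies */
struct fifo_node {
    char key[64];
    size_t size;
    struct fifo_node *next;
};

/* Hypothetical insertion-order queue with a running byte count */
struct fifo {
    struct fifo_node *head;           /* oldest entry: evicted first */
    struct fifo_node *tail;           /* newest entry */
    size_t mem_used;                  /* stands in for get_mem_used() */
};

/* Append key at the tail. Returns 0 on success, -1 if malloc fails. */
int fifo_push(struct fifo *q, const char *key, size_t size)
{
    struct fifo_node *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;
    strncpy(n->key, key, sizeof n->key - 1);
    n->key[sizeof n->key - 1] = '\0';
    n->size = size;
    n->next = NULL;
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    q->mem_used += size;
    return 0;
}

/* Detach and return the oldest node, or NULL if the queue is empty. */
struct fifo_node *fifo_pop(struct fifo *q)
{
    struct fifo_node *n = q->head;

    if (n == NULL)
        return NULL;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    q->mem_used -= n->size;
    return n;
}

/* Evict oldest entries until `length` more bytes fit within max_mem.
 * Stops early if the queue empties. Returns the number of evictions. */
size_t make_room(struct fifo *q, size_t length, size_t max_mem)
{
    size_t evicted = 0;

    while (q->mem_used + length > max_mem) {
        struct fifo_node *n = fifo_pop(q);
        if (n == NULL)
            break;                    /* item larger than the whole budget */
        free(n);
        evicted++;
    }
    return evicted;
}
```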

The drawback of this approach is that data can be deleted before it has expired. A cache therefore cannot be treated like persistent storage: every time you query the cache you must be prepared for a miss, fetching the data from its original source and storing it again, because the data is not guaranteed to be in memory.

[Conclusion]

tmcache is clearly a very simple cache system, far behind memcache. More importantly, tmcache is only a learning project that offers some simple guiding ideas; we hope it serves as a simple reference for developing a more complex and stable cache system. tmcache is therefore not a stable, reliable cache system and is not suitable for production environments; it is better suited as a learning reference.

For anything not covered above, we recommend reading the tmcache source code.

Download: http://code.google.com/p/heiyeluren/downloads