My Architecture Experience Series: Backend Architecture Layer

Source: Internet
Author: User
Tags: callback, website, server





    • Log aggregation

So-called log aggregation means collecting all of a program's logs and exception records in one place. With only one server, writing logs to local files is not much of a problem; in a load-balanced environment, however, local logging quickly breaks down: when you want to inspect the site's logs, you have no idea which machine to query. Are you going to open remote connections to 100 machines one by one? For a large website system it is therefore necessary to aggregate log data into central storage that can be viewed and searched directly, which also makes it obvious which machine a business problem occurred on. Whether that log data ends up in an RDBMS, a nosql store, or even a search engine matters less than avoiding local text files, which would be a disaster. Of course, writing logs well is far from simple:

    1. For better performance, should logs be written to a local in-memory queue first and then flushed to the database in batches at regular intervals?
    2. Logs are hard to search through. Should extra search fields, such as the module name, be recorded alongside each entry?
    3. If the database becomes unavailable, should entries be written to local files first and aggregated later?
    4. Whether a log actually helps you locate a problem depends on how it is written. A log that records only the error message is recorded in vain; it should at least identify the faulty module, and where possible also include parameter values and the current state.
    5. Unhandled exceptions, by definition, cannot be recorded manually. Most frameworks and servers provide a hook that calls back into our own code, where we can collect these unhandled exceptions, record them in one place, and finally send the user to a friendly error page. Not recording unhandled exceptions at all is terrible: the user sees a garbled error page while the developers know nothing about it. The problem then persists, and even if a user complains, it is hard to reproduce.
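Point 1 above can be sketched as a small buffered writer: entries go into a local in-memory queue and a background thread flushes them to central storage in batches. This is a minimal illustration, not a production logger; `store` here is just a list standing in for a database or nosql client.

```python
import queue
import threading
import time

class BufferedLogWriter:
    """Buffer log entries locally and flush them centrally in batches."""

    def __init__(self, store, batch_size=100, flush_interval=1.0):
        self.store = store                 # stand-in for a central DB/nosql sink
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self.queue = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, module, level, message):
        # Extra search fields (module, level) are recorded per point 2.
        self.queue.put({"module": module, "level": level,
                        "message": message, "ts": time.time()})

    def _run(self):
        while not self._stop.is_set():
            self._flush()
            time.sleep(self.flush_interval)
        self._flush()                      # drain remaining entries on shutdown

    def _flush(self):
        batch = []
        while len(batch) < self.batch_size:
            try:
                batch.append(self.queue.get_nowait())
            except queue.Empty:
                break
        if batch:
            self.store.extend(batch)       # one bulk write instead of many small ones

    def close(self):
        self._stop.set()
        self._worker.join()
```

The request path only pays for an in-memory `put`; the expensive write to central storage happens off the request thread, in bulk.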


    • Configuration aggregation

Configuration aggregation follows the same idea as log aggregation: unified management. Almost any system has parameters that cannot be hard-coded into the program (at the very least, the database connection string and the addresses of external services). These are usually written to configuration files, which raises several problems. First, in a multi-server load-balanced cluster, the configuration must be modified on every server. Second, the service or website must be restarted for the change to take effect. Third, configuration cannot be managed in one place, so you cannot tell whether every website instance is running with the same parameter values. The solution is the same: store the configuration in one shared place, such as the database, and have each website read its configuration values from there. Implementations range from simple to complex; for example, consider the following:

    1. Should parameter values be stored as strongly typed values, or as strings? Should values support objects and arrays rather than only simple types?
    2. Values cannot be fetched from the database on every single read. How is that avoided, for example with a local cache and an expiry policy?
    3. Can the program modify values? In other words, should the program be read-only, with parameter values changed only through the database or an administration backend?
    4. Should different values be served depending on the deployment environment, the user's language, or the server's IP address?

In any case, even the simplest configuration service that reads parameter values from the database, with a cache that only refreshes when the service restarts, is already much better than reading from local configuration files.
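A minimal sketch of such a configuration service, addressing points 2 and 3 above: a plain dict stands in for the shared database table, and each reader caches values for a short TTL so the store is not hit on every access. The service only reads; changes are made in the store itself.

```python
import time

class ConfigService:
    """Read-only configuration reader over a shared store, with a TTL cache."""

    def __init__(self, db, ttl=30.0):
        self.db = db          # stand-in for a shared DB table keyed by name
        self.ttl = ttl
        self._cache = {}      # name -> (value, fetched_at)

    def get(self, name, default=None):
        hit = self._cache.get(name)
        if hit and time.time() - hit[1] < self.ttl:
            return hit[0]                        # served from local cache
        value = self.db.get(name, default)       # fall through to the store
        self._cache[name] = (value, time.time())
        return value
```

Because the store holds ordinary Python values here, a value can be an object or list as well as a string (point 1); a real implementation would serialize, e.g. as JSON.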


    • Cache

Caching comes up so often in architecture discussions that almost everyone knows it as a design and performance-optimization technique. In a website system there are many places where a cache can be applied; actually using caches well, putting them in the right places, monitoring the hit rate, and finding ways to raise that hit rate is not so easy. In general, caches can be applied at the following levels, from top to bottom:

    1. Browser and CDN caches: these two full-page caches minimize the chance of a request ever reaching the website server, which improves the server's load capacity. Static resources are the natural fit; dynamic resources can also be cached, provided the cache key is chosen appropriately for the access pattern, but pages that are fully personalized per user are hard to cache directly.
    2. Reverse-proxy cache: the reverse proxy can cache whole pages or page fragments on behalf of the server, reducing the number of requests that reach the website server directly.
    3. Data cache: once a request does reach the website server, we do not need to fetch everything from the database. Some data can be kept in a distributed cache; as long as the key is reasonable and requests follow a pattern, a fairly high hit rate can be maintained, reducing pressure on both the database and the website server.
    4. In-process cache for large data blocks: some data blocks are too large for the distributed cache. Rarely changing large blocks can be loaded into local memory when the website starts, which improves both data-access and computation efficiency.
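The data-cache level (point 3) together with the hit-rate monitoring mentioned above can be sketched as a tiny wrapper: the cache counts hits and misses so that a low hit rate, for example from a badly chosen key, becomes visible. The `loader` callback is an assumption standing in for the underlying database query.

```python
class MonitoredCache:
    """A data cache that tracks its own hit rate."""

    def __init__(self, loader):
        self.loader = loader      # called on a miss, e.g. a DB query
        self.data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.data:
            self.hits += 1
            return self.data[key]
        self.misses += 1
        self.data[key] = self.loader(key)   # fall through to the source
        return self.data[key]

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Exposing `hit_rate` to a dashboard is the cheapest way to notice that a cache is not actually earning its keep.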

For more information about caching, see one of my earlier shared posts.


    • Distributed cache

So-called distributed caching means the cached data is spread across multiple nodes. The first advantage is making full use of server resources: if each server has 1 GB of memory to spare, 50 servers have 50 GB between them, memory that would otherwise sit idle. (The memory a website server uses is generally fairly stable while the business is unchanged, and distributed caches such as memcached have very low CPU usage, so memcached can be deployed on large-memory web or application servers to build a distributed cache almost without being noticed.) The second advantage is reducing the impact of a single point of failure: distributed caches generally use a consistent hash algorithm, so even when one node fails, only a small portion of cached data misses, and the loss is limited. When using a distributed cache, consider the following:

    1. A cache is just a cache: its data may be evicted or lost. Treating it as a file system is a mistake.
    2. Data in a distributed cache is accessed over the network, which cannot match local memory for transfer efficiency; the performance overhead of serializing and deserializing the data must also be taken into account.
    3. The key-generation policy is critical. If the key is built from too many parameters, the hit rate can drop to almost zero, so the hit rate of the distributed cache needs to be monitored.
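The consistent-hash idea mentioned above can be sketched in a few lines: keys map onto a ring of virtual nodes, so removing one cache server only re-maps the small share of keys that lived on it. This is a toy illustration (md5 over a sorted list), not how any particular client library implements it.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to nodes so that node loss re-maps only a fraction of keys."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas      # virtual nodes per server, for balance
        self.ring = []                # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, ""))   # first virtual node >= key hash
        if idx == len(self.ring):
            idx = 0                               # wrap around the ring
        return self.ring[idx][1]
```

With naive modulo hashing (`hash(key) % n`), losing one of n servers re-maps almost every key; with the ring, only keys that hashed onto the failed node move.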


    • Queue

Queues are another essential tool for a high-performance architecture. By putting long-running tasks into a queue and capping the number of items being processed, the system can absorb high pressure while the services behind the queue see a relatively stable load. In the end, though, a queue is only a container: if the incoming pressure is always greater than the backend's processing capacity, the queue will eventually overflow, so queues are not omnipotent. They are also useless on their own: even with a queue in front of the backend services, if the front-end pages cannot take the load, or even the CAPTCHA cannot be rendered, the architecture fails all the same. Any weak point can drag down the entire site, so architecture optimization must target the weakest place, not the strongest. In form, queues come in two types:

    1. Producer/consumer: each piece of data produced by a producer is consumed by exactly one consumer, so each task is executed only once. Data in this form usually needs to be persisted.
    2. Publish/subscribe: a topic can have multiple publishers and multiple subscribers; whenever a publisher publishes content a subscriber is interested in, it is delivered to all of that topic's subscribers. Data in this form can generally be lost without persistence.
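The two forms can be contrasted with in-process stand-ins (a real system would use a message broker; the names here are illustrative): in form 1 each task is taken off the queue exactly once, while in form 2 every subscriber of a topic sees every event.

```python
import queue

# Form 1: producer/consumer -- each task is consumed exactly once.
task_queue = queue.Queue()

def produce(task):
    task_queue.put(task)

def consume():
    return task_queue.get()

# Form 2: publish/subscribe -- every subscriber of a topic sees the event.
class Broker:
    def __init__(self):
        self.subscribers = {}   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, event):
        for callback in self.subscribers.get(topic, []):
            callback(event)     # fan out to all interested subscribers
```

The persistence difference in the list above follows from the semantics: a lost task (form 1) means work never done, while a lost broadcast event (form 2) is often tolerable.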


    • Pool Technology

Database connection pools and thread pools are both applications of pooling. The idea is to keep objects that are expensive to create in a pool, avoiding repeated creation and destruction: an object is taken from the pool when needed and, after being reset, returned to the pool for reuse. A database connection pool avoids repeatedly establishing costly TCP connections, and a thread pool avoids repeatedly creating costly threads. The principle is simple, but a pool still has some policies to think through:

    1. What are the creation and reclamation policies for pooled objects?
    2. How are corrupted objects detected and replaced?

In general, you can refer to pool implementations available online to build a generic pool capable of managing objects of various kinds.
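A generic pool of the kind just described might look like this sketch. The `factory`, `reset`, and `validate` callbacks are assumptions standing in for connection- or thread-specific logic; `validate` addresses point 2 by replacing broken objects on checkout.

```python
import queue

class ObjectPool:
    """Reuse expensive objects instead of creating and destroying them."""

    def __init__(self, factory, size, reset=None, validate=None):
        self.factory = factory                      # creates a fresh object
        self.reset = reset or (lambda obj: obj)     # clean state before reuse
        self.validate = validate or (lambda obj: True)
        self.pool = queue.Queue()
        for _ in range(size):                       # eager creation policy (point 1)
            self.pool.put(factory())

    def acquire(self, timeout=None):
        obj = self.pool.get(timeout=timeout)
        if not self.validate(obj):                  # point 2: replace broken objects
            obj = self.factory()
        return obj

    def release(self, obj):
        self.pool.put(self.reset(obj))              # reset, then return to the pool
```

Blocking in `acquire` when the pool is empty also gives a natural cap on concurrent use of the expensive resource.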


    • Distributed File System

Not every website needs a distributed file system. A small website can store user-uploaded images directly on the web server, which is acceptable, but problems appear as the number of images grows. First, what do you do when one server can no longer hold them all, and how do you know which image lives on which server? Second, how do you split off read requests and synchronize images to the other servers? A distributed file system solves this by treating a group of servers as one file system: files are spread across multiple servers, with a degree of data redundancy guaranteed. While the site is still small, files can in fact live on a single server and be copied to another with a file-synchronization tool, with a reverse proxy in front; once the data volume is large, a distributed file system is required. In principle distributed file systems are not very complex, but stability and performance should still be weighed during selection. Some people store files in the database instead; that does give a single access point and backups, but it is clearly unreasonable, because it greatly increases the pressure on the database.


    • Distributed Search Engine

If you need on-site search, you need a search engine. There are many open-source search engines today, though many are wrappers around Lucene, so choose according to your needs. "Distributed" here means that when single-node indexing and querying can no longer meet performance and capacity requirements, the index and queries must be spread across multiple nodes. In principle a search engine is mainly word segmentation plus an inverted index, but real engines involve many more algorithms in the details, such as result ranking. Unless you truly have to, it is not recommended to implement a search engine yourself; wrapping and improving open-source components is usually enough. A search engine can also do far more than full-text search: you can offer search suggestions based on what users type, refine the index based on what users search for, and run extensive user-behavior analysis on search results to improve the website's products. A site with its own search module has a lot it can do.
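The word-segmentation-plus-inverted-index core mentioned above can be shown in miniature. This is a toy sketch: tokenization is a plain whitespace split standing in for real word segmentation, and ranking is a naive term-frequency sum rather than a real scoring algorithm.

```python
class InvertedIndex:
    """Map each term to the documents (and frequencies) it appears in."""

    def __init__(self):
        self.index = {}    # term -> {doc_id: term frequency}

    def add(self, doc_id, text):
        for term in text.lower().split():          # stand-in for word segmentation
            self.index.setdefault(term, {}).setdefault(doc_id, 0)
            self.index[term][doc_id] += 1

    def search(self, query):
        scores = {}
        for term in query.lower().split():
            for doc_id, freq in self.index.get(term, {}).items():
                scores[doc_id] = scores.get(doc_id, 0) + freq
        # Naive ranking: highest accumulated term frequency first.
        return sorted(scores, key=scores.get, reverse=True)
```

The inversion is the key point: queries look up terms directly instead of scanning every document, which is what makes search fast and what has to be partitioned when a single node no longer suffices.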


    • Nosql

Nosql means non-relational databases, and nosql is not meant to replace relational databases. The reason nosql is so popular is performance. In essence, a program is a body of code, and on the same hardware its performance comes down to how many CPU instructions and how many disk I/O operations the same logical operation requires. General-purpose systems carry a lot of computation and I/O overhead, while purpose-built systems are usually faster. We say RDBMS performance is not high, but we should also recognize that an RDBMS must guarantee data integrity and durability while providing general-purpose features; comparing its performance with a component that does its I/O in memory and flushes to disk periodically is unfair. So nosql should be chosen to fit the business: it is unsuitable for transaction-requiring, money-related business, and well suited to high-traffic business that tolerates latency and some data loss. Most nosql products were originally built by large companies for their own needs, and it is precisely this customization that makes them fast, which is also why there are so many nosql products on the market; when choosing one, check whether its intended scenarios match your business. Also bear in mind that nosql is comparatively new: its user base is nowhere near that of MySQL or Oracle, so do not expect very high stability, and because of its customized nature you usually cannot use standard SQL, which adds learning cost. Use it when you need it regardless: some applications simply cannot reach the required performance on an RDBMS alone, and in those cases not using nosql is a dead end. Even an unstable nosql may still be workable.


    • Nosql: MongoDB

MongoDB is a high-performance document database. It is popular not only for performance but also for its comprehensive feature set: compared with other nosql databases it can express more than 90% of common SQL operations, and it offers a wide range of high-availability configurations. In short, MongoDB is the nosql closest to a traditional RDBMS in functionality, at several times the performance. It suits applications with high performance demands, especially high write concurrency, such as storing business or system logs. Two points are worth mentioning:

    1. MongoDB is not omnipotent either. If you do not partition very large data sets, MongoDB cannot save you, and do not expect its automatic sharding to work wonders; automatic is, after all, less precise than manual. If you can work out how to shard your data, it is recommended to do it yourself.
    2. MongoDB's performance is good, but it has limits. With plenty of memory, hot data stays in memory and read performance is fine, but once the data volume grows so large that even the indexes no longer fit in memory, performance degrades badly. The reason is easy to see with a dictionary analogy: it does not matter where the actual content lives, as long as the index is at hand. If an encyclopedia spans 100 volumes but the index fits in your hand, looking things up is no problem; if the index itself spans 100 volumes, you can no longer consult it in one pass, and having to open different volumes just to search the index makes performance very poor.
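Manual sharding as suggested in point 1 ultimately means the application decides, deterministically, which shard a record belongs to. A hypothetical routing helper (the shard names and the choice of a user id as shard key are illustrative assumptions, not MongoDB API):

```python
import hashlib

def shard_for(user_id, shards):
    """Route a record to a shard by hashing its shard key.

    Deterministic: the same user_id always lands on the same shard,
    so both writes and reads know where to go without a lookup.
    """
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```

The trade-off versus a consistent-hash ring is that plain modulo routing re-maps most keys when the shard count changes, so this style suits a shard layout that is planned up front and rarely changes.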


    • Nosql: redis

Redis is a nosql component, not just a cache component. In many cases it can act as a compute server: it stores complex data types, which memcached cannot, and it has handy extras such as queues and pipelines. My personal experience with redis:

    1. Unlike memcached, redis can store data in multiple forms, not just strings. This is a highlight: it means that for a large data set we can run a computation directly on the server and have the server return only the result, instead of pulling all the data out of the cache, computing on the client, and writing the result back (the client here meaning the nosql client, not a browser). In other words, to use redis well we need to write some customized code and push part of our real business logic into redis; if we only store key-value pairs, its performance is not necessarily higher than memcached's.
    2. In general, do not rely too heavily on redis's disk VM. Holding 32 GB of data in 16 GB of memory may be fine, but using 16 GB of memory to serve 1 TB of data with redis is using redis wrongly. Nosql products are not very complex in principle, and before adopting one it is best to familiarize yourself with how it works so you can use it better. In short, my view remains: given the huge performance gap between memory and disk, doing most of the work in memory gives good performance, while constantly shuttling data between memory and disk does not.
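Point 1 can be illustrated with a leaderboard, the classic case for a redis sorted set: the server keeps the ranked structure and returns only the computed top N, instead of shipping every entry to the client to sort there. The class below is an in-process stand-in for that idea, not the real redis API, though the method name echoes redis's `ZINCRBY` command.

```python
class ScoreBoard:
    """In-process sketch of server-side computation, like a redis sorted set."""

    def __init__(self):
        self.scores = {}            # member -> score

    def zincrby(self, member, amount):
        # Increment a member's score, creating it on first use.
        self.scores[member] = self.scores.get(member, 0) + amount

    def top(self, n):
        # The "server" does the ranking and returns only the top n members,
        # instead of handing the full data set to the client.
        ranked = sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)
        return [member for member, _ in ranked[:n]]
```

With real redis the same shape is `ZINCRBY` on writes and `ZREVRANGE` on reads: the network carries n member names, not the whole score table.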


    • Nosql: hbase

Hbase and hadoop are suited to storing and computing over massive data volumes, and they embody a truly distributed idea: data can no longer live on a single node and must be spread across many machines, with computation likewise performed in parallel and the results aggregated afterwards. Do not introduce this stack lightly:

    1. Unless you have well beyond millions of records and the data format is relatively simple, consider carefully whether it is worth introducing. The hbase learning curve is not low, and it is best to spend some time in the community first; otherwise you may not even work out which hadoop version it pairs with, and a mismatched pairing is useless, either riddled with hard-to-fix bugs or unable to establish basic connectivity.
    2. To see real benefit, try hbase only when you have at least 20 or so servers; with one or two servers you have not reached the point where distribution helps. Think about how parallelism saves time: only with enough resources can data be distributed, computed on in parallel, and then aggregated. If the data is never split, there is no time to be saved.


