A Discussion of Internet Back-End Infrastructure

Source: Internet
Author: User

For an Internet company, back-end services are an essential component. For business applications, the underlying service infrastructure ensures that the business runs stably and reliably and stays maintainable and highly available. Looking at the Internet technology landscape as a whole and at the company's own situation, the back-end technologies and facilities considered essential or critical are shown in the figure below:

The back-end infrastructure discussed here mainly refers to the key components and services that applications depend on to run stably in production. Once these pieces of infrastructure are developed or set up, they can generally support the business for a long time. A complete architecture also includes many system-level infrastructure services that applications are unaware of, such as load balancing, automated deployment, and system security; these are outside the scope of this article.

API Gateway

In mobile app development, the interfaces provided by the back end typically need to support the following features: load balancing, API access control, and user authentication.

A common approach is to use Nginx for load balancing and to implement API access control and user authentication inside each business application. A somewhat better approach is to turn the latter two into a shared library that every business application calls. However, all three features are common requirements across businesses, so the preferred approach is to integrate them into a single service. This makes it possible to change access-control and authentication mechanisms dynamically, and it reduces the cost of integrating these mechanisms into each business application. This service is the API gateway (http://blog.csdn.net/pzxwhc/article/details/49873623). You can implement it yourself or use open source software such as Kong. As shown in the following illustration:

One problem with this design is that all API requests pass through the gateway, so it can easily become a performance bottleneck for the system. An alternative is to remove the API gateway and let business applications call the Unified Certification Center directly, with the base framework ensuring that every API call is first authenticated by the Unified Certification Center. Authentication results can be cached to avoid putting excessive load on the Unified Certification Center.

Business Applications and Back-End Infrastructure Frameworks

Business applications are divided into online business applications and internal business applications.

Online business applications: applications and interfaces that directly serve Internet users. Their typical characteristics are high request volume, high concurrency, high availability requirements, and low tolerance for failure.

Internal business applications: applications used inside the company, such as an internal data management platform or an advertising platform. Compared with online business applications, they are characterized by high data confidentiality, low load, low concurrency, and tolerance for occasional failures.

Business applications are developed on top of back-end infrastructure frameworks. For a Java back end, the following kinds of frameworks should be in place:

MVC framework: from Struts 1 and 2, popular ten years ago, to today's widely favored Spring MVC and Jersey, along with the China-developed JFinal and Alibaba's WebX, each of these frameworks, especially the more recent ones, has its own strengths. The main selection criterion is whether someone on your team can extend and customize the framework. These general-purpose frameworks often need some custom development to meet specific needs. For example, many teams pass parameters in underscore style (words joined by underscores), whereas Java names use lowerCamelCase. In Spring MVC an alias can be specified through an annotation, but specifying one for every parameter is inefficient, and ModelAttribute does not support aliases. A better approach is to convert parameter names to camel case uniformly at the framework level.

IoC framework: the benefits of IoC need no elaboration. Spring, the most popular framework in Java, has supported IoC natively since its inception.

ORM framework: MyBatis is currently the most popular ORM framework, and Spring's JdbcTemplate is also very good. For requirements such as database and table sharding and master-slave separation, you generally need a framework of your own at the ORM level, such as Alibaba's TDDL or Dangdang's Sharding-JDBC (which solves sharding and read-write splitting at the DataSource level, transparently to the application and with zero intrusion).

Database middleware: to solve sharding, master-slave separation, primary-standby switchover, caching, failure recovery, and similar problems uniformly at the service level, many companies have their own database middleware, such as Alibaba's Cobar, Qihoo 360's Atlas, and NetEase's DDB, as well as the official MySQL Proxy, the open source Mycat and Kingshard, and the commercial OneProxy. The one with production use at some scale is probably Kingshard; if budget is not a concern, OneProxy is also an option.

Caching framework: this mainly refers to a uniform wrapper around operations on cache servers such as Redis and memcached. Spring's RedisTemplate is the usual choice; you can also build your own wrapper on Jedis to support client-side distribution, master-slave setups, and so on.

Java EE application performance monitoring framework: for Java EE applications running in production, a unified framework needs to be integrated into every business application to measure the duration and status of every request, method invocation, JDBC connection, Redis connection, and so on. JWebAP is one performance monitoring tool that can be used, but since it has not been updated for a long time, it is advisable to do your own development on top of the project.
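The parameter-name conversion mentioned under the MVC framework item can be sketched as follows. This is a minimal illustration, not Spring MVC code: `ParamNameConverter`, `toLowerCamel`, and `convertKeys` are hypothetical names, and how the converted map is wired into request binding depends on the framework and is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spring MVC: renames underscore-style
// request parameter names to lowerCamelCase before they are bound.
class ParamNameConverter {

    // "user_name" becomes "userName"; names without underscores are unchanged.
    static String toLowerCamel(String name) {
        StringBuilder out = new StringBuilder(name.length());
        boolean upperNext = false;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '_') {
                // Capitalize only after an underscore that follows some output,
                // so a leading underscore does not capitalize the first letter.
                upperNext = out.length() > 0;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Applies the conversion to every key of a parameter map, keeping order.
    static Map<String, String> convertKeys(Map<String, String> params) {
        Map<String, String> converted = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            converted.put(toLowerCamel(e.getKey()), e.getValue());
        }
        return converted;
    }

    public static void main(String[] args) {
        Map<String, String> request = new LinkedHashMap<>();
        request.put("user_name", "alice");
        request.put("page_size", "20");
        System.out.println(convertKeys(request));
    }
}
```

Doing the conversion once, in a shared layer, is what removes the need to annotate each parameter individually.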

In general, the frameworks above are enough to build a prototype of a back-end application.

For these frameworks, the most important thing is to choose what best fits the team's technical makeup; having the ability to develop your own framework is even better. In addition, you should provide a back-end application template or generator (such as a Maven archetype) for team members, so that a prototype can be generated quickly when a new application is started, without repeating the framework setup work each time.

Cache, Database, Search Engine, Message Queue

Caches, databases, search engines, and message queues are the four basic back-end services that applications depend on. Their performance directly affects the overall performance of the application; sometimes, no matter how well your code is written, application performance cannot improve because of these services.

Cache

As the five-minute rule for caching says: if a piece of data is accessed frequently, it should be kept in memory. A cache here is a storage layer with very high read and write performance that can handle highly concurrent requests, usually without requiring persistence guarantees. However, compared with other storage, caches are generally memory-based and therefore expensive, so they should not be overused.

Caches can be divided into local caches and distributed caches.

Local cache: mainly refers to an in-memory caching mechanism inside the application process. In Java, Google Guava provides a local cache implementation. You can also implement your own local cache using Java's ConcurrentHashMap.

Distributed cache: refers to a standalone cache service. A few years ago memcached was the common choice, but it is only a KV store and supports too few data structures. Redis is now the most popular: it supports rich data structures, and its event-driven, single-threaded, non-blocking I/O model can also handle high-concurrency scenarios. For clustering, besides the official Redis Cluster, popular options include Wandoujia's Codis and Twitter's Twemproxy.
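As a rough illustration of the ConcurrentHashMap-based local cache mentioned above, the sketch below loads a value on first access and serves it from memory afterwards. `LocalCache` and its methods are names made up for this example, and the sketch deliberately has no expiry or size limit, which a production cache (for example Guava's) would provide.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Minimal local (in-process) cache: values are loaded on first access and
// then served from memory. No expiry or size limit; illustration only.
class LocalCache<K, V> {
    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    LocalCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // computeIfAbsent runs the loader at most once per missing key,
    // even when several threads ask for the same key concurrently.
    V get(K key) {
        return store.computeIfAbsent(key, loader);
    }

    // Drops a key so that the next get() loads it again.
    void invalidate(K key) {
        store.remove(key);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        LocalCache<String, String> cache = new LocalCache<String, String>(key -> {
            loads.incrementAndGet();      // stands in for a slow database query
            return "value-of-" + key;
        });
        cache.get("a");
        cache.get("a");                   // second read is served from memory
        System.out.println("loads = " + loads.get());
    }
}
```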

When using a cache, note the following points.

Cache expiration mechanism: once a key has been given a time to live, when does the cache delete it? There are generally two approaches: a daemon periodically scans the keys, finds those that have expired, and deletes them; or, when a key is read, the cache checks whether it has expired and, if so, deletes it and returns empty.

Cache eviction mechanism: how keys are removed from the cache when cache memory reaches its limit. Redis provides the following eviction policies:
volatile-lru: evict the least recently used keys among those that have an expiration set
volatile-ttl: evict the keys closest to expiring among those that have an expiration set
volatile-random: evict random keys among those that have an expiration set
allkeys-lru: evict the least recently used keys among all keys
allkeys-random: evict random keys among all keys
noeviction: do not evict any data
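The read-time expiration check and least-recently-used eviction described above can be sketched in a few lines of Java. This is an illustration of the two ideas only, not how Redis implements them; `ExpiringLruCache` is a hypothetical class, and the current time is passed in as a parameter purely so that the behavior is easy to follow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of lazy expiration plus LRU eviction. Illustration only; this is
// not Redis's implementation.
class ExpiringLruCache<K, V> {
    private static final class Timed<V> {
        final V value;
        final long expiresAtMillis;
        Timed(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final LinkedHashMap<K, Timed<V>> map;

    ExpiringLruCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first;
        // removeEldestEntry evicts it once the size limit is exceeded.
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized void put(K key, V value, long ttlMillis, long nowMillis) {
        map.put(key, new Timed<>(value, nowMillis + ttlMillis));
    }

    // Lazy expiration: a key is checked only when it is read; an expired
    // key is deleted and null ("empty") is returned.
    synchronized V get(K key, long nowMillis) {
        Timed<V> e = map.get(key);
        if (e == null) return null;
        if (nowMillis >= e.expiresAtMillis) {
            map.remove(key);
            return null;
        }
        return e.value;
    }

    synchronized int size() {
        return map.size();
    }

    public static void main(String[] args) {
        ExpiringLruCache<String, String> cache = new ExpiringLruCache<>(2);
        long now = System.currentTimeMillis();
        cache.put("a", "1", 1000, now);
        cache.put("b", "2", 1000, now);
        cache.get("a", now);                             // touch "a"; "b" is now least recently used
        cache.put("c", "3", 1000, now);                  // over the limit: "b" is evicted
        System.out.println(cache.get("b", now));         // evicted earlier
        System.out.println(cache.get("a", now + 2000));  // expired by now
    }
}
```

The periodic-scan approach would add a background task that walks the map and removes expired entries, instead of waiting for a read to find them.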

For how Redis implements these mechanisms, refer to the book "Redis Design and Implementation".

Cache update mechanism: there are usually four patterns: cache aside, read through, write through, and write behind caching. For details, see Chen Hao's summary: "Cache update routines".

Cache service overload protection: cache service overload refers to a sudden surge of load on the back-end service caused by cache expiry, which can go on to trigger an avalanche effect. This behavior is tied to cache updates: the strategy used to refresh the cache when an entry expires directly determines how the service is protected from overload. Countermeasures are usually divided into client-side and server-side ones. Client-side approaches include the simple timeout-based mode, the regular timeout-based mode, the simple refresh-based mode, the regular refresh-based mode, and the renewal refresh-based mode. Server-side approaches are the familiar flow control and service degradation. For details, see the article by the Meituan technical team: "A case study of service overload in the cache application".

Database

Databases are a very common service component in back-end development. The choice of database should be based on the characteristics of the business and of the data structures.

By storage medium, databases can be divided into:

In-memory databases: data is mainly kept in memory, though it can also be persisted to disk. Examples are Redis and H2 in in-memory mode. Because memory is expensive, you must quantify storage needs and estimate capacity in advance, to prevent the service from becoming unavailable when memory runs out.

Disk databases: databases that store data on disk are the most common kind. MySQL, Oracle, PostgreSQL, HBase, H2, SQLite, and so on are all disk databases. In addition, SSDB is an SSD-based KV database that supports rich data structures and is an alternative to Redis.

By data type and data model, databases can be divided into:

Relational databases: MySQL, Oracle, and PostgreSQL are all relational databases. They organize data using the relational model, that is, two-dimensional tables and the relationships between them.

Non-relational databases: defined in contrast to relational databases. Data is stored as key-value pairs and the structure is not fixed: each tuple can have different fields and can add its own key-value pairs as needed. This avoids the constraints of a fixed structure and can save some time and space. However, these databases lack the strict schema of relational databases and are not well suited to complex queries or to businesses that require strong transaction management.

Non-relational databases can be further divided into:

KV databases: mainly store data as (key, value) pairs. Representatives are Redis, RocksDB (LevelDB), and SSDB.

Document databases: also key-value pairs overall, but the value can hold a variety of data structures such as arrays, key-value pairs, and strings. Representatives are MongoDB and CouchDB.

Column databases: also known as sparse wide-table databases, generally used to store massive amounts of data. In contrast to row-oriented databases, they store data on disk column by column. Representatives are HBase and Cassandra.

An important aspect of databases is indexing. One saying goes that "mastering indexes equals mastering the database". Whether or not that is true, indexes do bear directly on a database's read and write performance, and you need a good understanding of how a database's indexes work in order to use it well. Generally speaking, MySQL, Oracle, and MongoDB use B-tree indexes, a choice that balances the characteristics of traditional hard disks, read and write performance, and the need for range lookups, while HBase uses an LSM tree, which sacrifices some read performance to improve write performance.

Search Engine

Search engines are also a very important component in back-end applications. For content and e-commerce applications in particular, searching for content or products by keyword is a very common user scenario. The more mature open source search engines are Solr and Elasticsearch, and many small and medium-sized Internet companies build their search systems on these two. Both are based on Lucene; they differ mainly in how the term index is stored, in their support for distributed architecture, and so on.

Adopting a search engine takes a long time, from getting familiar with the system to setting up the service to customizing features. During this process, pay attention to the following:

Integration of the search engine with the company's existing data systems. Which existing persistent, searchable data stores are there, and how can the search engine integrate with them seamlessly during full and incremental indexing, so as to make the most of its advantages such as real-time search and horizontal scalability (performance and capacity grow in proportion to the number of machines)?

As with databases, the indexing mechanism of the search engine also needs to be understood in depth.
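To make the indexing mechanism a little more concrete, here is a toy inverted index, the basic structure behind keyword search. It is a sketch for intuition only: `InvertedIndex` is a hypothetical class, tokenization is a plain split on non-word characters, and nothing here reflects Lucene's actual term index, scoring, or storage format.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted ids of the documents
// that contain it. Analysis, scoring, and on-disk storage are not modeled.
class InvertedIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Incremental indexing: adding one document touches only its own terms.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Returns ids of documents that contain every term of the query (AND).
    Set<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            Set<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);   // first term: start from its postings
            } else {
                result.retainAll(ids);         // later terms: intersect
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "Red shoes for running");
        index.add(2, "Blue running jacket");
        index.add(3, "Red jacket");
        System.out.println(index.search("running"));
        System.out.println(index.search("red jacket"));
    }
}
```

A full index build would call `add` for every existing record, while an incremental update calls it only for new or changed records.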

For a more detailed account of search engine engineering practice, see the article by a Youzan engineer: "Youzan Search Engine Practice (Engineering)".

In addition, search engines can also be used for multidimensional data analysis, such as the feature in GrowingIO and Mixpanel that lets you query data reports along any dimension. Druid may be a better solution for multidimensional analysis, and its official documentation includes a comparison with Elasticsearch: http://druid.io/docs/latest/comparisons/druid-vs-elasticsearch.html.

Message Queue

The organizational structure of software has evolved gradually from component-oriented design to SOA and SaaS. In today's era of microservices, you would be embarrassed to say that your system is a monolith that has not been decoupled into services. Small systems do not need to be split, of course, but for a complex system, splitting into services and adopting a microservices architecture really is something that has to be done.

So the question is: how should services communicate with each other? What protocol should be used? How should calls be made? These are all issues to consider.

Protocol aside, calls between services can be divided into synchronous and asynchronous calls. Synchronous calls need no further explanation, so how do asynchronous calls work? A common way is to use a message queue: the caller puts the request into the queue and returns immediately; the service provider then fetches the request from the queue, processes it, and returns the result to the caller (for example through a callback).
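The asynchronous call flow just described can be sketched inside a single process, with a `BlockingQueue` standing in for the message queue and a `CompletableFuture` standing in for the callback. This only shows the flow; `AsyncCallDemo` and its methods are hypothetical, and a real system would put a broker between the two sides.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// In-process stand-in for a message queue, only to show the call flow.
class AsyncCallDemo {
    // A request carries its payload and a future that acts as the callback channel.
    static final class Request {
        final String payload;
        final CompletableFuture<String> reply = new CompletableFuture<>();
        Request(String payload) {
            this.payload = payload;
        }
    }

    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<>();

    // Caller side: enqueue the request and return without waiting for processing.
    CompletableFuture<String> call(String payload) {
        Request request = new Request(payload);
        queue.add(request);
        return request.reply;
    }

    // Provider side: take requests from the queue, process them, complete the reply.
    Thread startProvider() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Request request = queue.take();
                    request.reply.complete("processed:" + request.payload);
                }
            } catch (InterruptedException stop) {
                Thread.currentThread().interrupt(); // leave the loop when interrupted
            }
        });
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    public static void main(String[] args) {
        AsyncCallDemo demo = new AsyncCallDemo();
        Thread provider = demo.startProvider();
        CompletableFuture<Void> done = demo.call("order-1")
                .thenAccept(result -> System.out.println("callback got " + result));
        done.join();            // wait here only so the demo can exit cleanly
        provider.interrupt();
    }
}
```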

Asynchronous invocation is a very common use case for message middleware. Other use cases for message queues include:

Decoupling: a transaction cares only about its core flow; work that depends on other systems but is less important can be handled by sending a notification, without waiting for the result.

Eventual consistency: the states of two systems end up consistent, either both succeeding or both failing. A certain delay is acceptable as long as consistency is eventually reached.

Broadcast: this is the most basic feature of a message queue. Producers only need to publish messages, without caring which subscribers consume them.

Peak shaving and flow control: when upstream and downstream systems have different processing capacity, a message queue can sit between the two systems as a buffer.

The main message queue software at present includes:

ActiveMQ: the simplest message queue in Java, an implementation of JMS, which does not specify message ordering, security, or redelivery.

RabbitMQ: an implementation of the AMQP protocol, with good support for message ordering, security, and redelivery. It is better suited to message delivery in business scenarios that cannot tolerate data loss and have transactional requirements.

Kafka: a log-based message queue that relies on sequential file access underneath and is append-only. It suits massive log transport scenarios that are insensitive to data loss and emphasize performance, and it has been a very hot technology in the big data field in recent years.

ZeroMQ: a library of network programming patterns that models and componentizes common forms of network requests (packet management, connection management, publish-subscribe, and so on); in short, it sits above sockets and below an MQ. For an MQ, network transport is only one part; more of the work lies in message storage, routing, broker service discovery and lookup, transactions, consumption modes (ACK, redelivery, and so on), clustering, and so on.

File Storage

Business applications, the back-end services they depend on, and all kinds of other services ultimately rely on the underlying file storage. In general, file storage needs to provide reliability, disaster tolerance, and stability: stored data must not be easily lost, there must be a recovery plan if a failure does occur, and high availability must be ensured. At the lowest level, traditional RAID can be used as a solution; one level up, Hadoop's HDFS is currently the most common distributed file storage solution, and shared file systems such as NFS and Samba also provide simple distributed storage.

In addition, if file storage really does become the application's bottleneck, or file storage performance must be improved to raise the performance of the whole system, the simplest and most direct approach is to abandon traditional mechanical hard disks and replace them with SSDs. Many companies now find that the final key to solving business performance problems is often SSDs. This is also the most direct and effective way to trade money for time and labor costs. The SSDB described in the database section is a high-performance KV database that wraps LevelDB and takes advantage of the characteristics of SSDs.

As for HDFS, using the data stored on it requires going through Hadoop. Technologies of the "XX on YARN" kind are solutions for running non-Hadoop technologies on HDFS (in addition, of course, to using MapReduce).

Unified Certification Center

The Unified Certification Center mainly provides authentication services for app users, internal users, and apps, including:

User registration, login verification, and token authentication
User management and login authentication for internal information systems
App management, including app secret generation and app information verification (such as verifying API signatures)
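As one possible reading of "verifying API signatures" with an app secret, the sketch below signs a request string with HMAC-SHA256 and checks the signature on the receiving side. The scheme is an assumption made for illustration; the article does not say which algorithm or message format the certification center uses, and `ApiSignature`, the sample secret, and the sample request string are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Assumed scheme for illustration: signature = hex(HMAC-SHA256(appSecret, message)).
class ApiSignature {

    // Client side: compute the signature that is sent along with the request.
    static String sign(String appSecret, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(appSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    // Server side: recompute the signature from the stored app secret and
    // compare with MessageDigest.isEqual rather than String.equals.
    static boolean verify(String appSecret, String message, String signature) {
        byte[] expected = sign(appSecret, message).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signature.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        String secret = "demo-app-secret";                          // hypothetical app secret
        String request = "app_id=42&path=/orders&ts=1700000000";    // hypothetical message format
        String signature = sign(secret, request);
        System.out.println(verify(secret, request, signature));
        System.out.println(verify(secret, request + "&extra=1", signature));
    }
}
```

Because only the app and the certification center know the secret, a request whose parameters were altered in transit no longer matches its signature.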

A unified certification center is needed in order to centrally manage the information used by all applications and to provide a unified authentication service for them. It is especially necessary when many businesses need to share user data. In addition, building single sign-on for mobile apps on top of the Unified Certification Center follows naturally (imitating the web mechanism by encrypting the authenticated information and storing it on the local disk for use by multiple apps).

Single Sign-On System

Many large websites now have a single sign-on system. Put simply, a user only needs to log in once to access multiple business applications (possibly with different permissions), which is very convenient for users. In mobile Internet companies, internal management and information systems also need a single sign-on system. The most mature and widely used single sign-on system is probably CAS, open-sourced by Yale University, which can be customized starting from https://github.com/apereo/cas/tree/master/cas-server-webapp. The open source Kisso from China is also a good option. The principle of single sign-on is basically as shown in the following illustration:

Unified Configuration Center

In Java back-end applications, a common way to handle configuration is to write it in Properties, YAML, or HOCON files; when a change is needed, you update the file and redeploy, so no code-level changes are required. A unified configuration center builds on this by managing the configuration of all business applications and underlying back-end services in one service, and has the following features:

Configuration can be modified dynamically online and the changes take effect
Configuration can be separated by environment (development, testing, production, and so on)
Easy to use: in Java, configuration can be brought in through annotations or XML
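The first two features above (dynamic changes and per-environment values) can be sketched with the standard `java.util.Properties` class. The convention that an `<environment>.<key>` entry overrides the plain `<key>` entry is an assumption made for this example, as are the `ConfigHolder` class and the idea that a central store calls `reload` when content changes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of environment-aware configuration with runtime refresh. The key
// convention ("<env>.<key>" overrides "<key>") is assumed for this example.
class ConfigHolder {
    private final String env;
    private final AtomicReference<Properties> current = new AtomicReference<>(new Properties());

    ConfigHolder(String env) {
        this.env = env;
    }

    // Called at startup and again whenever new configuration content arrives.
    void reload(String propertiesText) {
        Properties fresh = new Properties();
        try {
            fresh.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException("unreadable configuration", e);
        }
        current.set(fresh); // swap in one step so readers never see a half-loaded set
    }

    // The environment-specific value wins; otherwise fall back to the shared key.
    String get(String key) {
        Properties p = current.get();
        String specific = p.getProperty(env + "." + key);
        return specific != null ? specific : p.getProperty(key);
    }

    public static void main(String[] args) {
        ConfigHolder config = new ConfigHolder("production");
        config.reload("timeout=30\nproduction.timeout=5\n");
        System.out.println(config.get("timeout"));
        config.reload("timeout=60\n");   // simulated online change, no redeploy
        System.out.println(config.get("timeout"));
    }
}
```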

Disconf is one solution that can be used in a production environment; you can also develop your own configuration center based on your needs (ZooKeeper is an option for the configuration store).

Service Governance Framework

For external API calls or client access to back-end APIs, you can use HTTP or RESTful interfaces (or, of course, call over raw sockets). For internal service calls, however, an RPC mechanism is generally used. The current mainstream RPC protocols are: RMI, Hessian, Thrift, and Dubbo.

Each of these RPC protocols has its pros and cons; choose the one that best fits your business needs.

As the number of services in your system grows, the RPC call chain becomes more complex, and in many cases documents must be updated constantly to keep track of these call relationships. A framework for managing these services can save a great deal of the tedious manual work this entails.

The traditional ESB (Enterprise Service Bus) is essentially a service governance solution, but the ESB sits between the client and the server as a proxy, and all requests must pass through it, so it easily becomes a performance bottleneck. A better design builds on the traditional ESB, as shown in the following figure:

As shown in the diagram, the call relationship exists only between the client and the server that provides the service, and the configuration center serves only as the hub, which avoids the performance bottleneck of the traditional ESB. For this design, the ESB should support the following features: registration and management of service providers; registration and management of service consumers; service version management; load balancing, traffic control, and service degradation; and fault tolerance for services, such as circuit breaking.
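A few of these features (provider registration, versioning, and load balancing) can be sketched with an in-memory registry. This is an illustration of the idea rather than any framework's API: `ServiceRegistry`, its method names, and the sample addresses are hypothetical, and a real registry would live in a separate, replicated store.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for a service registry. Names are hypothetical;
// this is not Dubbo's API.
class ServiceRegistry {
    private final Map<String, List<String>> providers = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    private static String key(String service, String version) {
        return service + ":" + version;
    }

    // Providers register their address under a service name and version.
    void register(String service, String version, String address) {
        providers.computeIfAbsent(key(service, version), k -> new CopyOnWriteArrayList<>()).add(address);
    }

    void unregister(String service, String version, String address) {
        List<String> list = providers.get(key(service, version));
        if (list != null) {
            list.remove(address);
        }
    }

    // Consumers ask the registry for an address and then call that provider
    // directly. Round-robin selection; returns null if none is registered.
    String select(String service, String version) {
        List<String> list = providers.get(key(service, version));
        if (list == null) return null;
        String[] snapshot = list.toArray(new String[0]); // stable view under concurrent changes
        if (snapshot.length == 0) return null;
        int n = counters.computeIfAbsent(key(service, version), k -> new AtomicInteger()).getAndIncrement();
        return snapshot[Math.floorMod(n, snapshot.length)];
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.register("UserService", "1.0", "10.0.0.1:20880");
        registry.register("UserService", "1.0", "10.0.0.2:20880");
        for (int i = 0; i < 3; i++) {
            System.out.println(registry.select("UserService", "1.0"));
        }
    }
}
```

Since the registry only hands out addresses, request traffic flows straight from consumer to provider and never through the registry itself.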

Alibaba's open source Dubbo implements the above very well and is the solution many companies currently use. For some reason, however, Dubbo is no longer maintained, and the recommendation is to use the version later maintained by Dangdang.
