("Architecture Design: Inter-system Communication (31)--Other message middleware and scenario applications (1)")
5-3. Solution Two: An improved semi-intrusive scheme
5-3-1 Problems with solution one
Solution one is not the best semi-intrusive solution, but the architect's design intent is easy to understand: it achieves at least business-level isolation. Its greatest advantage is that the log acquisition logic and the business processing logic are isolated from each other, so when the business logic changes, the log acquisition logic is not affected.
But the problems we can list for solution one far outnumber its merits:
- A separate client API package must be provided for each development language. The example described earlier uses Java, so the event/log acquisition system provides a Java client API package. If the business systems that integrate the event/log acquisition system are all developed by teams inside your company, this is not a big problem: at least you know which clients to develop first, and the set of languages you must cover is limited. But if you want to release the acquisition system as shareware, or put it on sale, this problem will limit the rapid growth of your product.
- Because the client code of the event/log acquisition system has to be written into the business system, upgrading the API package is also a problem: a major upgrade may be incompatible with previous versions, forcing business systems to change the code that calls the acquisition system. Again, if all business systems are inside your company, this is not a big problem. But remember, your goal is to turn your system into a product.
- Although good code structure can isolate the business logic from the log acquisition logic inside the business system, log collection still runs inside the business system and will affect its processing to some degree. For example, when the message producer slows down, it may reduce the processing efficiency of the business system; and when messages waiting to be sent pile up on the business system side, they consume memory that should be used for business data.
It seems we need another semi-intrusive solution to solve these problems.
5-3-2 The idea behind solution two
In solution two, we only ask the business system to load a piece of JavaScript code on its pages in order to capture the business system's events/logs. The event/log data is sent across domains to the event/log acquisition system over HTTP.
The first advantage of HTTP is that it is an industry-wide protocol, familiar to everyone from fresh graduates to senior engineers with 20 years of development experience. Second, the protocol is independent of the programming language: whether your business system is developed in a JVM language, in PHP, in Node.js, or in something else, as long as it renders pages in a browser, HTTP is involved.
Collecting access logs through a JavaScript script embedded in the business system's pages does have limitations: if the events you need to collect are not page accesses (for example, collecting how many order fees the business server settles in a scheduled task), the solution two approach is not very applicable. Fortunately, according to the statistical requirements mentioned earlier, what we need to count is precisely access to product orders and the price movements of products.
5-3-2-1 Load Layer Design
The load layer design of solution two is completely different from that of solution one. In solution one, because the producer side of the message queue is integrated into the business system, load distribution is handled entirely by the partitions on the Kafka brokers. In solution two, however, the business system sends messages to the acquisition system over HTTP, so the load layer of the acquisition system needs to be adjusted accordingly:
The adjusted load layer is a typical HTTP-based load balancing scheme. It is described in detail in my other post, "Architecture design: Load balancer layer Design (7)--lvs + keepalived + nginx Installation and Configuration", so it is not repeated here. If you feel this load layer is still not strong enough, you can add techniques such as DNS round-robin on top of it.
5-3-2-2 Why should I continue to use MQ?
In solution two, we still use Apache Kafka inside the event/log acquisition system to send and receive messages. Some readers may object: the message has already been delivered from the external business system (more precisely, from the browser of the business system's user) into the acquisition system over HTTP, so the acquisition system only needs to store these raw logs (or hand them to a real-time analysis system). Why do we still need a message queue inside the acquisition system?
Consider what happens when several business systems that integrate the acquisition system hit traffic peaks at the same time and generate large amounts of log data. If there is no caching mechanism inside the acquisition system, the acquisition system becomes the processing bottleneck of the entire architecture. Whatever persistent storage scheme the acquisition system uses, persisting a log record takes a comparatively long time. So in solution two, the MQ inside the acquisition system serves as a buffer for messages.
Of course, you can remove MQ and use some other mechanism to buffer log messages that cannot be processed in time, but some such caching mechanism must exist. The acquisition system generally spends more time processing a single log record than the business system does; after all, the business system is only responsible for sending the log data.
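To make the role of the cache concrete, here is a minimal sketch of a bounded in-memory buffer placed between the HTTP intake layer and a slow storage worker. It is not how Kafka works internally; the class name BoundedLogBuffer and its methods are hypothetical and only illustrate the idea that intake stays fast while storage drains at its own pace.

```javascript
// Hypothetical bounded buffer between the HTTP intake layer and slow storage.
class BoundedLogBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
    this.dropped = 0; // messages rejected because the buffer was full
  }

  // Called by the intake layer; returns false when the buffer is full.
  offer(message) {
    if (this.items.length >= this.capacity) {
      this.dropped += 1;
      return false;
    }
    this.items.push(message);
    return true;
  }

  // Called by the storage worker; removes and returns up to maxBatch of the oldest messages.
  drain(maxBatch) {
    return this.items.splice(0, maxBatch);
  }
}

// Usage: a burst of four messages arrives, the buffer holds three,
// and the storage worker later takes them out in batches of two.
const demoBuffer = new BoundedLogBuffer(3);
const demoAccepted = ["a", "b", "c", "d"].map((m) => demoBuffer.offer(m));
const demoBatch = demoBuffer.drain(2);
console.log(demoAccepted, demoBatch, demoBuffer.dropped);
```

A real deployment would more likely rely on the MQ itself, which also survives process restarts; an in-memory buffer like this one loses its content when the process stops.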
Combining the adjusted load balancing layer with the existing Kafka message queue scheme, we can draw the complete system architecture diagram of solution two:
5-3-2-3 How to solve the cross-domain problem
In solution two, the business system sends HTTP requests to the acquisition system from pages rendered in the browser, through the embedded JavaScript script. However, the business system and the acquisition system are very likely to use different domain names (in reality, as the architect of the event/log acquisition system, you cannot control the domain names of the business systems).
In such a cross-domain scenario, a page of the business system cannot, by default, use the browser's XMLHttpRequest object to call a service of the acquisition system working in another domain. To solve this problem, we need a way to complete cross-domain HTTP calls on the browser side.
Fortunately, our predecessors have left us plenty of proven approaches to this problem: proxy, Flash, IFRAME, JSONP, CORS and so on. Based on the technical requirements of the acquisition system, we introduce two usable solutions here: IFRAME and CORS.
CORS stands for Cross-Origin Resource Sharing. This cross-domain technique relies mainly on browser support. When the browser detects that an XMLHttpRequest call is cross-domain, it first lets the call go through (for simple requests such as the GET used in this article) and then checks the HTTP response returned by the other side. If the response headers contain an Access-Control-Allow-Origin entry and the calling domain is allowed, the call is considered successful; otherwise the browser reports an error similar to: "No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin XXXXX is therefore not allowed access."
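The browser-side check described above can be modelled roughly as follows. This is a simplified sketch for requests sent without credentials, not the browser's actual implementation; the function name isCorsResponseAllowed is hypothetical.

```javascript
// Simplified model of the browser's check, for requests without credentials.
// allowOriginHeader: value of Access-Control-Allow-Origin in the response,
//                    or null/undefined if the header is absent.
// pageOrigin: origin of the calling page, e.g. "http://shop.example.com".
function isCorsResponseAllowed(allowOriginHeader, pageOrigin) {
  if (allowOriginHeader === null || allowOriginHeader === undefined) {
    return false; // header missing: the browser reports the error quoted above
  }
  const value = allowOriginHeader.trim();
  // "*" allows every origin; otherwise the value must equal the page's origin.
  return value === "*" || value === pageOrigin;
}

console.log(isCorsResponseAllowed("*", "http://shop.example.com"));
console.log(isCorsResponseAllowed(null, "http://shop.example.com"));
```

Note that the comparison is against the whole origin (scheme, host and port), not just the domain name.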
Because cross-domain calls in CORS mode require browser support, browser versions matter. The following list, excerpted from http://enable-cors.org/, shows the browser versions that support CORS:
Red tiles represent browser versions that do not support CORS, yellow tiles represent versions that partially support it, and green tiles represent versions that fully support it. Enabling CORS is also very simple: the server in the target domain only needs to write the "Access-Control-Allow-Origin" header into its HTTP response, as shown in the following Java code:
- Allow any domain to call services in this domain
......response.setHeader("Access-Control-Allow-Origin", "*");......
- Allow only the XXXXX domain to call services in this domain
......response.setHeader("Access-Control-Allow-Origin", "XXXXX");......
Note that if you use CORS and an HTTP proxy service such as Nginx sits in front of the service, you need to add the Access-Control-Allow-Origin header in the Nginx configuration, similar to the following:
http {
    ......
    add_header Access-Control-Allow-Origin *;
    add_header Access-Control-Allow-Headers X-Requested-With;
    add_header Access-Control-Allow-Methods GET,POST,OPTIONS;
    ......
}
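Between "allow everyone" and "allow one fixed domain" there is a middle ground that suits an acquisition system with many registered business systems: answer with the caller's own origin only when that origin is on an allowlist. The sketch below shows the header selection only; the function name corsHeadersFor and the allowlist contents are assumptions for illustration, not part of the original design.

```javascript
// requestOrigin: value of the request's Origin header (may be undefined).
// allowedOrigins: Set of origins registered with the acquisition system (hypothetical data).
function corsHeadersFor(requestOrigin, allowedOrigins) {
  if (!requestOrigin || !allowedOrigins.has(requestOrigin)) {
    return {}; // no CORS headers: the browser will refuse to expose the response
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    "Vary": "Origin", // tells caches that the response depends on the Origin header
  };
}

const demoAllowed = new Set(["http://shop.example.com", "http://mall.example.org"]);
console.log(corsHeadersFor("http://shop.example.com", demoAllowed));
console.log(corsHeadersFor("http://unknown.example.net", demoAllowed));
```

The same selection logic can live either in the application or in the proxy layer, as long as only one of them writes the header.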
Using the IFRAME tag essentially avoids using the XMLHttpRequest object on the browser side. The IFRAME tag is supported by every browser version; browsers differ only in their support for some of its attributes. The following is an example of calling a service in another domain using an IFRAME tag:
......<iframe style="display: none" src="http://192.168.1.100:9090/templateSSHProject/showSomething"></iframe>......
The display style ensures that the IFRAME tag does not appear on the rendered page. Using IFRAME tags for cross-domain calls has obvious drawbacks: it disturbs the page structure the front-end developer intended, and if the IFRAME tag is not hidden, it also breaks the assumptions the developer made when writing JavaScript scripts.
Because both approaches have problems, the two can be combined in practice. First determine the current browser version: if it supports CORS, prefer CORS (after all, it does not change the page's existing HTML structure); if it does not, fall back to the IFRAME tag. The HTTP interface exposed by the log server always adds the Access-Control-Allow-Origin header to its responses.
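As an alternative to checking browser versions, the same decision can be made by feature detection: an XMLHttpRequest implementation that supports CORS exposes a withCredentials property. This is a swapped-in technique, not the version check this article uses; the function names below are hypothetical, and the constructor is passed in as a parameter so the sketch can also run outside a browser.

```javascript
// XhrCtor: the XMLHttpRequest constructor of the current environment, or undefined.
function supportsCors(XhrCtor) {
  if (typeof XhrCtor !== "function") {
    return false;
  }
  // CORS-capable XMLHttpRequest objects have a "withCredentials" property.
  return "withCredentials" in new XhrCtor();
}

// Returns "xhr" when CORS can be used, otherwise "iframe" as the fallback.
function chooseTransport(XhrCtor) {
  return supportsCors(XhrCtor) ? "xhr" : "iframe";
}

// In a browser this would be called as:
//   chooseTransport(window.XMLHttpRequest)
// Outside a browser, stand-in constructors show both outcomes:
function DemoModernXhr() { this.withCredentials = false; }
function DemoLegacyXhr() {}
console.log(chooseTransport(DemoModernXhr), chooseTransport(DemoLegacyXhr));
```

Feature detection avoids maintaining a table of browser versions, at the cost of not being able to special-case a specific buggy release.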
5-4. Solution Two Coding Example
Many of the technical points in solution two are the same as in solution one: Apache Kafka is used as the MQ, Spring is used to support it, and neither solution affects how the message consumer stores data in an "appropriate storage scheme". So in this section we only give the code that differs and reflects the working characteristics of solution two; the rest is not repeated.
5-4-1 Mixing CORS and IFRAME
To make integration easy for third-party business systems, the JavaScript snippet provided by the acquisition system should be as simple as possible, preferably a single JavaScript file referenced by the business system, as follows:
// The business system references the script file provided by the acquisition system on its pages in the following form
......
<script type="text/javascript" src="http://www.logsservice.com/analysis.js?34ab834ea98ee838ac76ed3986347546"></script>
......
In the snippet above, "www.logsservice.com" is the domain name of the acquisition system, analysis.js is the JS file provided for embedding in each business system, and "34ab834ea98ee838ac76ed3986347546" is the verification string that the acquisition system's "registration management platform" generates for a third-party business system. The acquisition system treats a request as valid only if the domain name bound to the verification string is the same as the domain name of the page that embeds the JS file.
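Reading the verification string back out of such a reference is a small parsing step. The sketch below uses the standard URL class; the function name extractVerificationString is hypothetical, and it assumes the src value is an absolute URL (in a browser, the src property of a script element always is).

```javascript
// Returns the verification string that follows "?" in a reference to analysis.js,
// or null if the URL is not such a reference or carries no string.
function extractVerificationString(scriptSrc) {
  const url = new URL(scriptSrc); // throws on relative or malformed URLs
  if (!url.pathname.endsWith("/analysis.js")) {
    return null;
  }
  const token = url.search.slice(1); // drop the leading "?"
  return token === "" ? null : token;
}

console.log(
  extractVerificationString(
    "http://www.logsservice.com/analysis.js?34ab834ea98ee838ac76ed3986347546"
  )
);
```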
The following is a sample script code for the "analysis.js" file:
// Chrome major versions treated as supporting CORS
// (the list in the source text is garbled; these values are illustrative)
var _supportChromeVersion = ["47", "48", "49", "50", "51", "52"];

// Whichever method is used to send HTTP data to the acquisition system,
// first read the encrypted verification string passed where the page references this JS file.
// The log service uses it to verify user permissions, match the business system's domain name, and so on.
var encrypted = null;
var scripts = document.getElementsByTagName("script");
for (var index = 0; index < scripts.length; index++) {
    var script = scripts[index];
    // If the condition holds, this is where the page references this JS file; record the encrypted parameter
    if (script.src.indexOf("analysis.js") >= 0 && script.src.indexOf("?") >= 0) {
        encrypted = script.src.split("?")[1];
    }
}

// If no encrypted string was passed, treat this as an invalid JS reference and do nothing
if (encrypted != null && encrypted != "") {
    // Determine whether the current browser supports CORS
    var browserInfos = getVersion();
    var supportCors = false;
    // This example only checks the Chrome version; other browsers are checked on the same principle
    if (browserInfos.browser == "chrome") {
        var currentVersionArray = browserInfos.ver.split(".");
        var currentVersion = currentVersionArray[0];
        if (contains(_supportChromeVersion, currentVersion)) {
            supportCors = true;
        }
    }
    // =================
    // Support checks for other browsers can be added here
    // =================

    // The timestamp prevents HTTP 304 (cached) responses
    var timestamp = new Date().getTime();
    var url = "http://127.0.0.1:9090/templateSSHProject/analysisSomething?encrypted=" + encrypted + "&" + timestamp;
    if (supportCors) {
        // If CORS is supported, send the request directly with XMLHttpRequest
        var req = createXMLHttpRequest();
        req.open("GET", url, true);
        req.send(null);
    } else {
        // If not, send the request through an IFRAME
        var context = "<iframe style=\"display:none\" src=\"" + url + "\"></iframe>";
        document.write(context);
    }
}

// Get the browser name and version.
// This method is for testing only; the list of browsers is not complete.
function getVersion() {
    var sys = { browser: "", ver: "" };
    var ua = navigator.userAgent.toLowerCase();
    var re = /(msie|firefox|chrome|opera|version).*?([\d.]+)/;
    var m = ua.match(re);
    if (m) {
        sys.browser = m[1].replace(/version/, "safari");
        sys.ver = m[2];
    }
    return sys;
}

// Get an XMLHttpRequest object (old versions of IE fall back to ActiveXObject)
function createXMLHttpRequest() {
    if (window.XMLHttpRequest) {
        return new XMLHttpRequest();
    } else if (window.ActiveXObject) {
        return new ActiveXObject("Microsoft.XMLHTTP");
    }
}

// Check whether a collection contains an element
function contains(collection, obj) {
    var index = collection.length;
    while (index--) {
        if (collection[index] === obj) {
            return true;
        }
    }
    return false;
}
According to the code above, if the browser does not support CORS, the script writes an IFRAME tag into the page and completes the cross-domain call through it (the tag is, of course, not visible on the page). The resulting IFRAME tag is as follows:
If the browser supports CORS, the script creates an XMLHttpRequest object and completes the cross-domain call through it (old versions of IE create the object through ActiveXObject).
Note: to make debugging easier, the sample code uses the author's local debugging URL instead of "www.logsservice.com". Readers can replace it with their own URL.
5-4-2 Acquisition system producer code
That completes the JavaScript file the acquisition system provides to business systems. Next is the HTTP interface layer code of the acquisition system:
package templateSSHProject.controller;

import java.io.PrintWriter;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

import test.interrupter.producer.ProducerService;

/**
 * HTTP control layer built with the Spring MVC component
 * @author yinwenjie
 */
@Controller
@RequestMapping("/")
public class AnalysisController {
    /**
     * The message producer object.
     * It works the same way as in solution one.
     */
    @Autowired
    private ProducerService producerService;

    /**
     * Perform some analysis action
     * @param request
     * @param response
     */
    @RequestMapping("/analysisSomething")
    public void analysisSomething(HttpServletRequest request, HttpServletResponse response) {
        String param = request.getParameter("encrypted");
        // Send the message through the Kafka producer
        this.producerService.sendeMessage(param);
        System.out.println("public void sendeMessage(String message):" + param);

        // Write the response; the most important part is the header setting.
        // There is no body content, which does not matter.
        response.setHeader("Access-Control-Allow-Origin", "*");
        response.setCharacterEncoding("utf-8");
        response.setContentType("text/html; charset=utf-8");
        PrintWriter out = null;
        try {
            out = response.getWriter();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        out.print("");
    }
}
One of the keys to maintaining high throughput in the acquisition system is that the Apache Kafka producer object producerService used in the web control layer can send messages quickly. You can follow the Apache Kafka message producer settings given in solution one.
5-5 Other design considerations in solution two
- Ensure log collection permissions
A log/event acquisition system designed along the lines of solution two can be offered to the public as a product. An open system involves per-user permissions: at a minimum, it must ensure that the business system in which user A integrates the acquisition system is a valid business system, and that each user can see only the statistics of his own business systems on the acquisition platform.
The acquisition system can provide a business system registration function: every business system that wants to use the acquisition system must first register through the registration page. After successful registration, the acquisition system generates a unique check code for the business system. During log collection, the acquisition system considers the data valid only if the check code corresponds to the business system and the domain name is exactly the same as the one that business system registered.
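The validity rule can be sketched as a lookup plus an exact host comparison. How the acquisition system learns which page sent the log is not specified in this article; the sketch assumes the page URL is available to the server (for example from the request's Referer header), and the registry contents, the domain shop.example.com and the function name isLogAccepted are all hypothetical.

```javascript
// Hypothetical registry: check code -> domain name registered for that business system.
const demoRegistry = new Map([
  ["34ab834ea98ee838ac76ed3986347546", "shop.example.com"],
]);

// checkCode: the check code received with the log data.
// pageUrl: URL of the page that sent the data (assumed to be known to the server).
function isLogAccepted(checkCode, pageUrl, registeredDomains) {
  const registeredDomain = registeredDomains.get(checkCode);
  if (registeredDomain === undefined) {
    return false; // unknown check code
  }
  let host;
  try {
    host = new URL(pageUrl).hostname;
  } catch (e) {
    return false; // malformed or missing page URL
  }
  return host === registeredDomain; // must match exactly
}

console.log(
  isLogAccepted("34ab834ea98ee838ac76ed3986347546", "http://shop.example.com/orders/1", demoRegistry)
);
```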
Another key issue in the architecture design of the event/log acquisition system is ensuring that it can still produce log statistics normally when several business systems reach peak traffic at the same time, without affecting the normal work of those business systems. You cannot, after all, require that the daily PV of every business system using the acquisition system stay below some maximum threshold xxxxx.
Besides using a high-throughput MQ inside the acquisition system to buffer the log messages that consumers cannot process in time during traffic peaks (this is the reason MQ is used in solution two), you can also go further with Kafka partitions, for example by creating a separate topic for each business system and setting a different number of partitions depending on the service plan the user purchased. You also need to reserve about 40% idle resources for the entire log acquisition system, so that the performance of each physical node can be upgraded quickly or new service nodes can be added. Cloud servers are a good choice for this.
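The per-business-system topic idea can be sketched as a small mapping from service plan to topic settings. The plan names, partition counts and the "logs." naming prefix below are invented for illustration; only the restriction on topic name characters (letters, digits, ".", "_" and "-") reflects Kafka itself.

```javascript
// Hypothetical service plans and the partition count each one buys.
const PARTITIONS_BY_PLAN = new Map([
  ["basic", 2],
  ["standard", 6],
  ["premium", 12],
]);

// Returns the topic name and partition count for one business system.
function topicSpecFor(businessSystemId, plan) {
  if (!/^[A-Za-z0-9_-]+$/.test(businessSystemId)) {
    throw new Error("invalid business system id: " + businessSystemId);
  }
  if (!PARTITIONS_BY_PLAN.has(plan)) {
    throw new Error("unknown plan: " + plan);
  }
  return {
    topic: "logs." + businessSystemId,
    partitions: PARTITIONS_BY_PLAN.get(plan),
  };
}

console.log(topicSpecFor("shopA", "standard"));
```

The resulting specification would then be handed to whatever tool creates the topics; that step is deliberately left out here.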
Note that in Apache Kafka a topic's partitioning cannot be freely changed once the topic is created (the partition count can be increased later, but not reduced), and this limits its potential for horizontal scaling. So if you really want to design a fully open acquisition system for very large data traffic, whether to use Apache Kafka as the core message delivery mechanism needs careful consideration.
In fact, if you have read all the articles in the author's three columns, the most critical issues in distributed systems have already been introduced (apart from data consistency and data recovery): service node discovery, service coordination and election rules, network IO models, caching, and asynchronous processing. So why not write your own MQ that meets your technical requirements? Alibaba's open source project RocketMQ is also a good choice.
- MQ design shared with solution one
Compared with solution one, the message consumer code in solution two, including the "appropriate storage scheme" it calls, does not need any change. The HTTP interface that the log system provides to business systems is designed to ensure compatibility across business systems, and MQ continues to be used inside the log system to ensure that the log system does not become a bottleneck for any external system that calls it. In this way, the design problems left over from solution one are addressed in solution two.
5-6 Baidu Webmaster Statistics tool
A typical application of the approach in solution two, embedding JavaScript code in browsed pages to collect access logs, is the "Baidu Webmaster Statistics Tool" launched by Baidu (http://tongji.baidu.com/). To use this product, you first register a user account and tell the statistics tool the working domain name of the business system you want statistics for.
Next, the Baidu statistics tool generates a piece of JavaScript code for you, containing verification information, as shown in the following:
In fact, if you read the generated code carefully, you will find that its main job is to generate another JavaScript reference tag. Finally, you only need to include this JavaScript code in the pages of your business system.
[Figure: example of statistical results in the "Baidu Webmaster Statistics Tool"]
Architecture Design: Inter-system Communication (32)--Other message middleware and scenario applications (2)