At work I am responsible for a proxy module in our group. The module fronts the Microsoft Office 365 mail portal, OWA. Once it is deployed, users no longer need to enter the Office 365 URL: they simply enter the address of our proxy, and we forward their requests to Office 365 OWA, so the experience is the same as accessing Office 365 OWA directly.
The principle is simple. We use Node.js to build an HTTP server that receives the client's request (in practice, the browser's), forwards it to Office 365, and returns the Office 365 response to the client. That is the whole function of the proxy.
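The forwarding idea can be sketched roughly as follows. This is a minimal illustration, not the actual module: `TARGET_HOST`, `buildUpstreamOptions`, `createProxyServer` and port 8080 are names and values assumed here, and cookie handling and URL rewriting are left out entirely.

```javascript
var http = require('http');
var https = require('https');

// Assumption: a single upstream host; the real module deals with several.
var TARGET_HOST = 'outlook.office365.com';

// Build the options for the upstream request from the incoming client request.
function buildUpstreamOptions(clientReq) {
    // Copy the client's headers, but point Host at the target site.
    var headers = Object.assign({}, clientReq.headers, { host: TARGET_HOST });
    return {
        hostname: TARGET_HOST,
        port: 443,
        method: clientReq.method,
        path: clientReq.url,
        headers: headers
    };
}

// Create (but do not start) the proxy server.
function createProxyServer() {
    return http.createServer(function (clientReq, clientRes) {
        var upstreamReq = https.request(buildUpstreamOptions(clientReq), function (upstreamRes) {
            // Relay status, headers and body back to the browser unchanged.
            clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
            upstreamRes.pipe(clientRes);
        });

        upstreamReq.on('error', function () {
            // The target could not be reached: answer with 502 Bad Gateway.
            if (!clientRes.headersSent) {
                clientRes.writeHead(502);
            }
            clientRes.end();
        });

        // Stream the client's request body (if any) to the target.
        clientReq.pipe(upstreamReq);
    });
}

// Usage (hypothetical port): createProxyServer().listen(8080);
```

Keeping the option building in its own function makes the forwarding rule easy to check without opening any network connection.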
Of course, the actual implementation involves many details, including cookie handling and URL conversion, which I will not cover here.
While developing and maintaining this module, I ran into a problem: besides simply forwarding requests, the proxy has to handle many requests itself, and some complex ones need research before we can support them. So as the proxy's maintainer I have to know what kinds of requests the target site, Office 365, uses. In practice this means knowing its different URLs, and different URLs really just differ in their paths.
So I made an optimization. Because the proxy is essentially an HTTP server, I print every request URL sent by the client to the log, together with the response status code returned for that URL. That way I can collect all the URLs from the log and see whether each one is handled correctly: a status code of 200 means it is OK.
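One way to add this kind of logging is sketched below, assuming the proxy is built on the standard Node.js `http` server. `formatLogLine`, `withUrlLogging`, `proxyHandler` and the `log` callback are hypothetical names, not taken from the real module. The wrapper waits for the response's `finish` event so that the status code is final when the line is written.

```javascript
// Format one log line as "<url>, <status code>".
function formatLogLine(url, statusCode) {
    return url + ', ' + statusCode;
}

// Wrap a request handler so that every request is logged once its
// response has been sent. `handler` is the existing proxy handler and
// `log` is any function that accepts a string (both are assumptions).
function withUrlLogging(handler, log) {
    return function (req, res) {
        res.on('finish', function () {
            log(formatLogLine(req.url, res.statusCode));
        });
        handler(req, res);
    };
}

// Usage (hypothetical):
// http.createServer(withUrlLogging(proxyHandler, console.log)).listen(8080);
```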
With this logging in place, we get the following log:
    /___/outlook.office365.com/, 302
    /owa/, 302
    /__/LOGIN/LOGIN.SRF, 200
    /owa/prefetch.aspx, 200
    /___/r1.res.office365.com/owa/prem/16.801.12.1741001/scripts/preboot.js, 200
    /___/r1.res.office365.com/owa/prem/16.801.12.1741001/scripts/boot.worldwide.0.mouse.js, 200
    /___/OUTLOOK.OFFICE365.COM/GETUSERREALM.SRF, 200
    /___/r1.res.office365.com/owa/prem/16.801.12.1741001/scripts/boot.worldwide.1.mouse.js, 200
    /OWA/EV.OWA2, 200
    /OWA/EV.OWA2, 200
    /___/outlook.office365.com/, 302
    /OWA/EV.OWA2, 200
    /owa/, 302
    /__/LOGIN/LOGIN.SRF, 200
    /OWA/EV.OWA2, 200
    /OWA/SERVICE.SVC, 200
    /owa/prefetch.aspx, 200
    /___/r1.res.office365.com/owa/prem/16.807.12.1742334/scripts/preboot.js, 200
    /OWA/SERVICE.SVC, 200
    /___/r1.res.office365.com/owa/prem/16.807.12.1742334/scripts/boot.worldwide.0.mouse.js, 200
    /OWA/EV.OWA2, 200
    /OWA/EV.OWA2, 200
    /OWA/SERVICE.SVC, 200
    /OWA/SERVICE.SVC, 200
    /___/OUTLOOK.OFFICE365.COM/GETUSERREALM.SRF, 200
    /___/r1.res.office365.com/owa/prem/16.807.12.1742334/scripts/boot.worldwide.1.mouse.js, 200
    /__/LOGIN/PPSECURE/POST.SRF, 200
    /owa/, 302
Each line starts with the request URL, followed by the response status code that the request received.
I also wrote a script to parse the log data, because the entries repeat and need to be deduplicated and sorted.
The script is as follows:
    var lineReader = require('line-reader');
    var fs = require('fs');

    var fileReadData = "URLs.log";
    var fileWriteData = "Result.txt";

    var ignoreNormalStatusCode = false;
    if (process.argv && process.argv[2]) {
        // any non-empty command-line argument turns this on
        ignoreNormalStatusCode = process.argv[2];
    }

    console.log("ignoreNormalStatusCode: " + ignoreNormalStatusCode);

    // create a data object from one "url, status code" line
    var createDataObjectFromLine = function (str) {
        var data = str.split(",");

        var obj = {
            url: data[0].trim(),
            statusCode: data[1].trim(),
            number: 1
        };

        return obj;
    };

    // get the index of obj in the array, or -1 if it is not there
    var indexOfObjInArray = function (array, obj) {
        var pos = -1;

        for (var i = 0; i < array.length; i++) {
            var e = array[i];

            if (e.url === obj.url && e.statusCode === obj.statusCode) {
                pos = i;
                break;
            }
        }

        return pos;
    };

    // compare by number, to sort in descending order
    var compare_number = function (a, b) {
        return b.number - a.number;
    };

    // write the array's data to the result file
    var writeResultToFile = function (result, number) {
        var string = "";
        string += "Here is this URL scan result below,\n";
        string += "original URL number: " + number + "\n";
        string += "unrepeat URL number: " + result.length + "\n";
        string += "------------------------------------------\n\n";
        string += "req url, this url's response status code (200 is OK), number statistics\n";

        fs.appendFileSync(fileWriteData, string);

        for (var i = 0; i < result.length; i++) {
            fs.appendFileSync(fileWriteData, result[i].url + ", " + result[i].statusCode + ", " + result[i].number + "\n");
        }
    };

    // create an array to save the URLs
    var result = [];

    // count the original number of URLs
    var number = 0;

    // main function
    lineReader.eachLine(fileReadData, function (line, last) {
        number++;

        // parse the data from every line
        var obj = createDataObjectFromLine(line);
        // console.log(obj);

        var pos = indexOfObjInArray(result, obj);
        if (pos !== -1) {
            // this object already exists in the result array
            result[pos].number++;
        }
        else {
            if (ignoreNormalStatusCode && obj.statusCode === '200') {
                // skip URLs that the proxy already handles normally
            }
            else {
                // add this obj to result
                result.push(obj);
            }
        }

        if (last) {
            // sort the array by number
            result.sort(compare_number);

            // write the result to file
            writeResultToFile(result, number);

            // stop reading lines from the file
            return false;
        }
    });
Here the Node.js module line-reader is used to read the file one line at a time.
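As a side note, the linear search through the result array could be replaced by a lookup table. The sketch below is an alternative under that assumption, not the script I actually ran: it counts entries with a `Map` keyed on the URL and status code, and `tallyLines` and its `ignoreStatus` parameter are hypothetical names. It takes an array of lines, so it works with whatever reads the file.

```javascript
// Count "<url>, <status code>" lines with a Map keyed on the pair.
// `ignoreStatus` (optional) is a status code string to leave out, e.g. '200'.
function tallyLines(lines, ignoreStatus) {
    var counts = new Map();

    lines.forEach(function (line) {
        // Split on the last comma so a comma inside the URL does no harm.
        var idx = line.lastIndexOf(',');
        if (idx === -1) {
            return; // skip malformed lines
        }

        var url = line.slice(0, idx).trim();
        var statusCode = line.slice(idx + 1).trim();
        if (ignoreStatus && statusCode === ignoreStatus) {
            return;
        }

        var key = url + ', ' + statusCode;
        counts.set(key, (counts.get(key) || 0) + 1);
    });

    // Turn the Map into an array sorted by count, most frequent first.
    return Array.from(counts.entries())
        .map(function (e) { return { entry: e[0], number: e[1] }; })
        .sort(function (a, b) { return b.number - a.number; });
}
```

For a log of a few hundred lines the difference hardly matters, but a keyed lookup avoids rescanning the whole array for every line as the log grows.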
Running the script gives the parsed result:
    Here is this URL scan result below,
    original URL number: 142
    unrepeat URL number: 6
    ------------------------------------------

    req url, this url's response status code (200 is OK), number statistics
    /owa/, 302, 10
    /___/outlook.office365.com/, 302, 5
    /owa/auth/15.1.225/themes/resources/segoeui-regular.ttf, 404, 3
    /owa/auth/15.1.225/themes/resources/segoeui-semilight.ttf, 404, 1
    /___/outlook.office365.com/favicon.ico, 302, 1
    /owa/auth/15.1.219/themes/resources/segoeui-semilight.ttf, 404, 1
Note that the result above does not show URLs with status code 200. Those are the URLs the proxy already handles normally, so there is no need to count or analyze them.
From the result it is obvious that there are several 404 URLs that our proxy does not handle correctly. They need further analysis and support in the code. That completes this optimization of the module.
A small personal reflection: work is full of small things, and if you think something is right, you should stick with it. A small optimization, as long as it is meaningful, can be of great use :-)
Kevin Song
2015-7-22
[Work] Proxy server optimization: detecting URL changes on the target site