Working with binary data in JS: the LZ77 algorithm

Source: Internet
Author: User

Working with binary data in JS is troublesome, and there is no good lossless compression tool for compressing plain text.

So I dug into gzip for a while, and found in the end that only the LZ77 part is reasonably easy to implement. The Huffman coding part of gzip is too hard to do in JS.

The code is below. It is thoroughly commented, so I won't say more; if you are interested, study it carefully:

<html>
<head>
<title>LZ77</title>
<style>
* { font-size: 12px; }
body { overflow: auto; background-color: buttonface; }
textarea { width: 100%; height: 240px; overflow: auto; }
#btn1 { width: 100px; }
</style>
<script>
window.onload = init;

function $(s) { return document.getElementById(s); }

function init() {
    $("txts").focus();
    $("btn1").onclick = run;
    $("txts").onkeydown = function (e) {
        e = e || window.event;
        // The key code is missing in the source; 13 (Enter) is an assumption, so Ctrl+Enter runs.
        if (e.keyCode == 13 && e.ctrlKey) { run(); }
    };
}

function run() {
    var str = $("txts").value;
    $("txts").value = "";
    var lzc = new LZ77CompressDefer(str);
    var t = new Date();
    lzc.start(function (result) {
        $("txtr").value = LZ77SelfExtract(result);
        var tc = new Date() - t;
        // Drop the leading "eval" and evaluate the rest, which returns the decompressed text
        $("txts").value = eval($("txtr").value.substring(4));
        var td = new Date() - t - tc;
        alert("Compression finished\r\nCompression ratio: " +
            ($("txtr").value.length / str.length * 100).toFixed(2) +
            "%\r\nCompression time: " + tc + "ms\r\nDecompression time: " + td +
            "ms\r\nCheck: " + (str == $("txts").value ? "OK" : "failed"));
    });
    function showProgress() {
        var p = lzc.status();
        if (p < 1) {
            $("txts").value = "Compressing ... " + (p * 100).toFixed(2) + "%";
            setTimeout(showProgress, 300);
        }
    }
    showProgress();
    /* Synchronous alternative:
    $("txtr").value = LZ77Compress(str);
    var tc = new Date() - t;
    $("txts").value = LZ77Decompress($("txtr").value);
    var td = new Date() - t - tc;
    alert($("txtr").value.length / $("txts").value.length + " : " + tc + " : " + td + " : " + (str == $("txts").value));
    */
}

/*
 * JS text compression algorithm based on the LZ77 principle
 * Author: Hutia
 *
 * Basic principle of LZ77:
 * 1. Starting from the current compression position, look at the data not yet encoded and
 *    try to find the longest matching string in the sliding window. If one is found, go to
 *    step 2, otherwise go to step 3.
 * 2. Output the triple (off, len, c), where off is the offset of the matching string relative
 *    to the window boundary, len is the match length, and c is the next character. Then slide
 *    the window forward by len + 1 characters and continue with step 1.
 * 3. Output the triple (0, 0, c), where c is the next character. Then slide the window forward
 *    by len + 1 characters (here 1) and continue with step 1.
 *
 * Common variant:
 * 1. Matching strings and unmatched single characters are encoded and output separately;
 *    a match is output without the character that follows it.
 *
 * Variant used by this algorithm:
 * 1. A leading character P with a low probability of occurrence distinguishes match output
 *    from unmatched characters. For a match, output (P, off, len); for an unmatched
 *    character, output c. When P itself appears as an unmatched character, output PP instead.
 *    The character P therefore must not appear in the encoded (off, len) of a match.
 *    This example uses the backtick (`) as the leading character.
 * 2. For a match the output is: leading character (`) + offset (3 digits in base 92,
 *    92^3 = 778688) + match length (2 digits in base 92, 92^2 = 8464). So the sliding window
 *    size is 778688 and the index length is 7. A match code costs 6 characters, and the code
 *    below only encodes matches longer than 7 characters.
 * 3. This algorithm targets JS files. To keep it simple, the window does not slide for now
 *    (JS files are usually not larger than 700K). For files larger than 778688 bytes this
 *    algorithm produces errors. A sliding window or segmented compression can be added later.
 * 4. To keep it simple, off and len are converted to base-92 strings with the low digit on
 *    the left and the high digit on the right.
 *
 * Author: Hutia
 * Email: hutia2@163.com
 * Please credit the source when reprinting.
 */

// 92-character alphabet. It must not contain the leading character (`).
var nc = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*()-=[]\\;',./_+{}:\"<>?".split("");
var cn = {};
for (var i = 0; i < nc.length; i++) cn[nc[i]] = i;

function LZ77Compress(input) {
    /* LZ77 compression algorithm - Hutia - JS version */

    /* Variable declarations */
    var p = 0;                      // scan pointer
    var lp = 0;                     // linked-list query pointer
    var len = input.length;         // length of the input string
    var output = [];                // output
    var index = "";                 // index
    var head = Object.create(null); // index head information (plain dictionary, so an index
                                    // such as "valueOf" cannot hit an inherited property)
    var prev = [];                  // position linked list
    var match_off = 0;              // offset of the match position
    var match_len = 0;              // length of the match
    var last_match_off = 0;         // offset of the previous match position
    var last_match_len = 0;         // length of the previous match
    var j = 0;                      // loop variable

    /* Scan loop */
    for (p = 0; p < len; p++) {
        index = input.substring(p, p + 7); // the 7 characters starting here are the index

        /* Linked-list maintenance */
        prev[p] = head[index]; // push the current head position onto the list
        head[index] = p;       // store the current position as the new head

        /* Matching */
        lp = p;        // initialize the list query pointer
        match_len = 0; // initialize the match length
        match_off = 0; // initialize the match position

        if (prev[lp]) { // an earlier occurrence exists on the list (position 0 is never used)
            /* Match query */
            while (prev[lp]) { // visit each position on the list in turn
                lp = prev[lp]; // take the previous position on the list
                // Find the longest match at this position. The length cannot exceed 8464
                // (2 digits in base 92), and the match cannot reach the current pointer.
                for (j = 1; j < 8464 && lp + j < p; j++) {
                    if (input.substring(lp, lp + j) != input.substring(p, p + j)) break;
                }
                j--; // longest match length
                if (j > 7 && j > match_len) { // longer than the best match so far
                    match_len = j; // record the match length
                    match_off = lp; // record the match position
                }
            }

            /* Match handling */
            if (match_len > 7) { // a usable match was found
                if (last_match_len != 0 && last_match_len < match_len) {
                    // A previous match exists and is shorter than this one
                    /* Lazy mode */
                    output_unmatch(input.charAt(p - 1)); // drop the previous match, output its character
                    last_match_off = match_off; // record this match position
                    last_match_len = match_len; // record this match length
                } else if (last_match_len != 0) {
                    // A previous match exists and is at least as long as this one
                    /* Resolve the previous lazy match */
                    output_match(); // output the previous match
                } else {
                    // No previous match
                    /* Lazy mode */
                    last_match_off = match_off; // record this match position
                    last_match_len = match_len; // record this match length
                }
            } else { // no usable match (for example, it would reach the current pointer)
                if (last_match_len != 0) {
                    output_match(); // output the previous match
                } else {
                    output_unmatch(input.charAt(p)); // output the current character directly
                }
            }
        } else { // no match at the current position
            if (last_match_len != 0) {
                output_match(); // output the previous match
            } else {
                output_unmatch(input.charAt(p)); // output the current character directly
            }
        }
    } // end of scan loop

    if (last_match_len != 0) { // a lazy match is still pending
        output_match();
    }

    /* Output */
    return output.join("");

    function output_match() {
        output.push("`");                    // leading character
        output.push(n2c(last_match_off, 3)); // 3-digit offset
        output.push(n2c(last_match_len, 2)); // 2-digit match length
        // Move the pointer to the end of the matched string. Because of lazy mode, p points
        // one position past the start of the match, and the loop adds 1 more, hence -2.
        p += last_match_len - 2;
        last_match_off = 0; // clear the match position
        last_match_len = 0; // clear the match length
    }

    function output_unmatch(c) {
        output.push(c == "`" ? "``" : c); // output an unmatched character
    }
}

function LZ77Decompress(input) {
    /* LZ77 decompression algorithm - Hutia - JS version */

    /* Variable declarations */
    var p = 0;              // scan pointer
    var len = input.length; // length of the input string
    var output = [];        // output
    var match_off = 0;      // offset of the match position
    var match_len = 0;      // length of the match

    /* Scan loop */
    for (p = 0; p < len; p++) {
        if (input.charAt(p) == "`") { // leading character found
            if (input.charAt(p + 1) == "`") { // escaped leading character
                output.push("`"); // output the character itself
                p++;              // skip the next character
            } else { // match code
                match_off = c2n(input.substring(p + 1, p + 4)); // characters 1-3: offset
                match_len = c2n(input.substring(p + 4, p + 6)); // characters 4-5: match length
                output = [].concat(output.join("")); // collapse the output into one string
                // Copy the encoded string from the given offset of the output so far
                output.push(output[0].substring(match_off, match_off + match_len));
                p += 5; // skip the next 5 characters
            }
        } else { // no leading character
            output.push(input.charAt(p)); // output the character directly
        }
    }

    /* Output */
    return output.join("");
}

/* LZ77 decompression algorithm - Hutia - JS / mini version (self-contained, embedded by LZ77SelfExtract) */
var Hutia = function (s) {
    var A = "charAt", p = -1, l = s.length, o = [], m,
        a = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*()-=[]\\;',./_+{}:\"<>?".split(""),
        _ = {};
    while (++p < 92) _[a[p]] = p;
    function $(c) { var l = c.length, r = 0, i = -1; while (++i < l) r += _[c[A](i)] * Math.pow(92, i); return r; }
    p = -1;
    while (++p < l) {
        if (s[A](p) == "`") {
            if (s[A](p + 1) == "`") p++, o.push("`");
            else { m = $(s.substring(p + 1, p + 4)); o = [].concat(o.join("")); o.push(o[0].substring(m, m + $(s.substring(p + 4, p + 6)))); p += 5; }
        } else o.push(s.charAt(p));
    }
    return o.join("");
};

function LZ77SelfExtract(s) {
    return "eval((" + String(Hutia) + ")(\"" +
        s.replace(/\\/g, "\\\\").replace(/\r/g, "\\r").replace(/\n/g, "\\n").replace(/\"/g, "\\\"") +
        "\"))";
}

function LZ77CompressDefer(input) {
    /* LZ77 compression algorithm - Hutia - JS / deferred version */
    /* Same algorithm as LZ77Compress, run in time slices so the page stays responsive */

    /* Variable declarations */
    var p = 0;                      // scan pointer
    var lp = 0;                     // linked-list query pointer
    var len = input.length;         // length of the input string
    var output = [];                // output
    var index = "";                 // index
    var head = Object.create(null); // index head information
    var prev = [];                  // position linked list
    var match_off = 0;              // offset of the match position
    var match_len = 0;              // length of the match
    var last_match_off = 0;         // offset of the previous match position
    var last_match_len = 0;         // length of the previous match
    var j = 0;                      // loop variable
    var callback;                   // callback function
    // Positions scanned per time slice. The value is missing in the source; 400 is an assumption.
    var SLICE = 400;

    this.start = function (fn) {
        this.start = function () {}; // can only be started once
        callback = fn;
        run();
    };

    this.status = function () {
        return p / len; // progress, 0..1
    };

    function run() {
        var inner_i = 0;
        /* Scan loop */
        for (; p < len; p++) {
            if (++inner_i > SLICE) { return setTimeout(run, 0); } // yield, resume at the same p

            index = input.substring(p, p + 7); // the 7 characters starting here are the index

            /* Linked-list maintenance */
            prev[p] = head[index];
            head[index] = p;

            /* Matching */
            lp = p;
            match_len = 0;
            match_off = 0;

            if (prev[lp]) {
                /* Match query */
                while (prev[lp]) {
                    lp = prev[lp];
                    for (j = 1; j < 8464 && lp + j < p; j++) {
                        if (input.substring(lp, lp + j) != input.substring(p, p + j)) break;
                    }
                    j--;
                    if (j > 7 && j > match_len) {
                        match_len = j;
                        match_off = lp;
                    }
                }

                /* Match handling */
                if (match_len > 7) {
                    if (last_match_len != 0 && last_match_len < match_len) {
                        /* Lazy mode */
                        output_unmatch(input.charAt(p - 1));
                        last_match_off = match_off;
                        last_match_len = match_len;
                    } else if (last_match_len != 0) {
                        /* Resolve the previous lazy match */
                        output_match();
                    } else {
                        /* Lazy mode */
                        last_match_off = match_off;
                        last_match_len = match_len;
                    }
                } else {
                    if (last_match_len != 0) {
                        output_match();
                    } else {
                        output_unmatch(input.charAt(p));
                    }
                }
            } else {
                if (last_match_len != 0) {
                    output_match();
                } else {
                    output_unmatch(input.charAt(p));
                }
            }
        } // end of scan loop

        if (last_match_len != 0) { // a lazy match is still pending
            output_match();
        }

        /* Output through the callback */
        callback(output.join(""));
    } // end of run

    function output_match() {
        output.push("`");                    // leading character
        output.push(n2c(last_match_off, 3)); // 3-digit offset
        output.push(n2c(last_match_len, 2)); // 2-digit match length
        p += last_match_len - 2;             // see LZ77Compress for the -2
        last_match_off = 0;
        last_match_len = 0;
    }

    function output_unmatch(c) {
        output.push(c == "`" ? "``" : c); // output an unmatched character
    }
}

function c2n(c) { // convert a base-92 string (high digit on the right) to a decimal number
    var len = c.length;
    var re = 0;
    for (var i = 0; i < len; i++) {
        re += cn[c.charAt(i)] * Math.pow(92, i);
    }
    return re;
}

function n2c(n, len) { // convert a decimal number to a base-92 string of the given length, high digit on the right
    var re = [];
    for (var i = 0; i < len; i++) {
        re[i] = nc[n % 92];
        n = (n / 92) | 0;
    }
    return re.join("");
}
</script>
</head>
<body>
<textarea id="txts"></textarea>
<textarea id="txtr"></textarea>
<br/>
<input type="button" value="Go" id="btn1">
</body>
</html>
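The match format described in the code comments (a leading character, then a 3-digit base-92 offset and a 2-digit base-92 length, low digit first) can be tried in isolation. The following is only a minimal sketch: the names `ALPHABET`, `toDigits`, `fromDigits`, `encodeMatch`, `encodeLiteral` and `decode` are illustrative and not part of the original code, and it assumes the backtick as leading character and a 92-character alphabet that does not contain it.

```javascript
// Assumed alphabet: 92 distinct characters, none of them the leading character.
const ALPHABET =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" +
  "~!@#$%^&*()-=[]\\;',./_+{}:\"<>?";
const BASE = ALPHABET.length; // 92
const LEAD = "`";             // assumed leading character

// Fixed-width base-92 string, low digit on the left.
function toDigits(n, width) {
  let s = "";
  for (let i = 0; i < width; i++) {
    s += ALPHABET[n % BASE];
    n = Math.floor(n / BASE);
  }
  return s;
}

// Inverse of toDigits: read the digits from the right (high) end.
function fromDigits(s) {
  let n = 0;
  for (let i = s.length - 1; i >= 0; i--) {
    n = n * BASE + ALPHABET.indexOf(s[i]);
  }
  return n;
}

// A match costs 6 characters: lead + 3 offset digits + 2 length digits.
function encodeMatch(off, len) {
  return LEAD + toDigits(off, 3) + toDigits(len, 2);
}

// A literal is output as is; the leading character itself is doubled.
function encodeLiteral(c) {
  return c === LEAD ? LEAD + LEAD : c;
}

// Decode a token stream; offsets are absolute positions in the output so far.
function decode(tokens) {
  let out = "";
  for (let p = 0; p < tokens.length; p++) {
    const c = tokens[p];
    if (c !== LEAD) {
      out += c;
    } else if (tokens[p + 1] === LEAD) {
      out += LEAD;
      p += 1;
    } else {
      const off = fromDigits(tokens.slice(p + 1, p + 4));
      const len = fromDigits(tokens.slice(p + 4, p + 6));
      out += out.slice(off, off + len);
      p += 5;
    }
  }
  return out;
}

console.log(toDigits(100, 3));                        // "iba" (100 = 8 + 1 * 92)
console.log(encodeMatch(0, 8));                       // "`aaaia"
console.log(decode("abcdefgh-" + encodeMatch(0, 8))); // "abcdefgh-abcdefgh"
```

Since a match always costs 6 characters, encoding only pays off for matches of at least 7 characters, which fits the 7-character index used by the compressor.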
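For comparison, the basic LZ77 scheme from the code comments, which outputs a triple (off, len, c) at every step, can be sketched in a few lines. This is a brute-force illustration rather than the author's implementation: the function names and the `windowSize` parameter are hypothetical, and `off` is taken here as the distance back from the current position, which is one possible reading of "offset relative to the window boundary".

```javascript
// Basic LZ77: every step emits (off, len, c), then advances by len + 1.
function lz77Encode(input, windowSize = 4096) {
  const triples = [];
  let pos = 0;
  while (pos < input.length) {
    let bestOff = 0;
    let bestLen = 0;
    const start = Math.max(0, pos - windowSize);
    // Brute-force search for the longest match that starts inside the window.
    for (let cand = start; cand < pos; cand++) {
      let len = 0;
      // Stop one character early so that a next character c always exists.
      while (pos + len < input.length - 1 &&
             input[cand + len] === input[pos + len]) {
        len++;
      }
      if (len > bestLen) {
        bestLen = len;
        bestOff = pos - cand;
      }
    }
    // With no match this is the triple (0, 0, c).
    triples.push([bestOff, bestLen, input[pos + bestLen]]);
    pos += bestLen + 1;
  }
  return triples;
}

function lz77Decode(triples) {
  let out = "";
  for (const [off, len, c] of triples) {
    const from = out.length - off;
    // Copy one character at a time so a match may overlap the text being written.
    for (let i = 0; i < len; i++) {
      out += out[from + i];
    }
    out += c;
  }
  return out;
}

const triples = lz77Encode("abcabcabcx");
console.log(JSON.stringify(triples)); // [[0,0,"a"],[0,0,"b"],[0,0,"c"],[3,6,"x"]]
console.log(lz77Decode(triples));     // "abcabcabcx"
```

Every unmatched character costs a full triple in this scheme, which is the overhead the leading-character variant avoids by outputting literals as they are.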


