A few days ago someone contacted me to discuss the extreme-compression algorithm I designed two years ago. I was a bit surprised: it is a basic, textbook-grade algorithm with no money in it, and yet in the two years since I designed it and posted it online there have been people interested in it. So, in the spirit of open source, I am tidying it up and publishing it again. Copyright declaration: what is open-sourced is the algorithm; that does not mean it can be commercialized. Personal research is no problem, but turning it into a commercial product still requires my authorization. Not that I expect anything to come of this, but around here it is better to make some things clear up front.
Speaking of compression algorithms, the most famous is probably the Zip algorithm; it is likely the most widely used, and you can see it everywhere. If you study its core idea (note: only the core idea, not the source code; this article is pure theory and will not discuss how to write efficient code), you find that it boils down to merging identical elements. For example, given the string of data 111111111111111, compressing it with something Zip-like might give you a result that effectively says "fifteen 1s", which translates back to the original. Of course the real process is not quite like that; this is only an illustration. Zip has a core structure called a data dictionary. This is not the data dictionary we usually talk about with databases, but a collection of the data that occurs with relatively high frequency in the file being compressed (if that is unclear, go back and review probability theory; I failed that exam myself, but I did read the book, and data structures has this thing too, whose name I forget, because I passed data structures with a crib sheet). Sound a little familiar? Right, you have surely heard of the famous information theory, and of the part of it that deals with data compression; Zip is essentially an implementation of those theories.
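To make the "merge identical elements" idea a bit more concrete, here is a tiny run-length sketch of my own. It is only an illustration of the idea above, not real Zip: actual Zip (DEFLATE) builds its dictionary from repeated substrings (LZ77) and then applies Huffman coding on top.

```python
# A minimal run-length sketch (illustration only, not Zip/DEFLATE itself):
# runs of the same character become (character, count) pairs, so
# "111111111111111" becomes ("1", 15).

def run_length_encode(data: str) -> list[tuple[str, int]]:
    """Collapse consecutive identical characters into (char, count) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def run_length_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in runs)


if __name__ == "__main__":
    original = "111111111111111"            # the fifteen 1s from the example above
    encoded = run_length_encode(original)   # [("1", 15)]
    assert run_length_decode(encoded) == original
    print(encoded)
```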
Of course there are other compression algorithms, such as the tar/gzip tools commonly used under Linux, and so on; they implement the same kind of algorithm, and in the final analysis their source is information theory. It has to be said that the influence of information theory is far-reaching. But what does that matter to me? I am not denying information theory, because it is basically correct, but accepting it does not mean obeying it; those are two different things. Information theory opened a window for us onto the information in this chaotic world, making information easier to see and to use, but it also restricts us to touching that world only through the window. At some points we can actually go beyond the restriction.
Well, now let us set information theory aside and look at two basic principles of the computer world. First: a computer does not understand anything except binary, and a computer never considers the process, only the result. Second: all information stored on a computer is, in the end, a binary string.
You may have some doubts about these two, so let me explain. The first half of the first principle should be easy to understand; the difficulty is the second half: a computer never considers the process, only the result. Quantum computers may not fit this rule, but transistor-based computers certainly do. (This principle, by the way, is the foundation on which I designed the algorithm; as far as I know, other algorithms and theories are not built on it.) For a file, before the computer reads it, it does not care what the file looks like; it only cares what it gets at the moment it reads, and even before it has finished reading, it does not care what the earlier parts were like. The people who play with cryptography understand this best, and cryptography was in fact my source of inspiration; I went and dug through cryptography before working this out. The second principle should be even easier to understand: anyone who has touched a computer knows what binary is, so there is nothing more to say.
Well, the theory is over; time for the real stuff (don't start throwing rotten eggs yet). Please note that this series will cover two algorithms, both newly designed. This article is about the element recursive collapse compression algorithm. Every single word in that name is understandable, but the combination makes you dizzy: what on earth is this... Fine, I will explain it piece by piece. First, "element": the word is easy to understand, the basic atom; in information theory it may be perfectly clear, but placed here it might make you a little dizzy. Never mind, no need to be dizzy: an element you can simply understand as a piece of data. Next, "recursion": a crazy word; play it well and it is wonderfully convenient, play it badly and it can make you cry, so I will not explain it, because here it means exactly what it says. "Collapse": easy enough to understand? Put together, the name means a piece of data is collapsed by way of recursion. Wait, no, why would... oh, right, it means compressed. Yes, that is exactly what it means!
Well, now let us formally go through the process of the algorithm. First, you need a copy of the file to be compressed. Then record the basic information of this file, to be used for comparison when restoring; I will not spell out exactly which information here, but it is nothing more than a hash value and the like. Next comes the real work. How? Earlier we mentioned elements, that is, data fragments. These are not arbitrary fragments; there is a requirement, although there is no requirement on the content: the whole algorithm simply ignores the encoding problem and works at the binary level, so this article will not consider encoding at all. The requirements are two: first, the fragments are cut to a fixed length; second, the result produced by the operation is also of fixed length. For example, I cut a fragment of 8 bytes from the end of the file and compute a result from it, then I cut the rest of the file into 8-byte fragments as well, leaving only the last piece, which may be a little short or a little long. In this way we kill 8 bytes of data, and once we have done this over the entire file, can you imagine how much data we have shaved off? What, 1/16? Are you kidding? Can 1/16 be called a recursive collapse? Recursion, my friend: don't you write the computed result back to the end of the file, cut the next element off, and run the operation again? Will wasting a bit of electricity kill you? You don't pay the power bill anyway, the company does... So we arrive at a result: after a whole night of computation, my file is finally compressed. But what is this 1 KB-sized thing? Is 1 KB all that is left of my nice HD Blu-ray original disc? Element recursive collapse compression algorithm, what on earth did you do? Nobody cuts corners like you do. 1 KB, are you kidding me?
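As an illustration of the cutting step just described, here is a rough sketch of my own; the function name and layout are made up for the example, not a finished implementation, and the invertible operation itself does not appear here at all.

```python
# A minimal sketch of the fixed-length cutting step: the file is treated as
# raw bytes (encoding is ignored, as stated above) and split into 8-byte
# "elements". The leftover piece at the end, which may be shorter than 8
# bytes, is kept separately.

ELEMENT_SIZE = 8  # bytes; the fragment length used in the example above


def cut_into_elements(data: bytes) -> tuple[list[bytes], bytes]:
    """Split raw bytes into fixed-length elements plus a leftover tail."""
    full = len(data) // ELEMENT_SIZE * ELEMENT_SIZE
    elements = [data[i:i + ELEMENT_SIZE] for i in range(0, full, ELEMENT_SIZE)]
    return elements, data[full:]


if __name__ == "__main__":
    elements, tail = cut_into_elements(b"x" * 100)
    print(len(elements), "elements of 8 bytes, tail of", len(tail), "bytes")
```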
Sometimes, when computing 16 elements down to 15, the information contained in those 16 elements has already reached the limit and simply cannot be fitted in. So we find another way. What way? Simple: ignore it. Take that group out, put it into another file, mark it carefully, and keep recursing; whenever we meet something that cannot be recursed, take it out, until we reach the end of the whole file. Then we can merge the recursion result with the records we have kept. Continuing to recurse on that is probably not possible, but there is another way to think about it; as for what that other way is, I will talk about it later.
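To show what one pass with the side file could look like, here is a sketch. The "can this group be reduced" test in it is a stand-in I invented purely for illustration (a group counts as reducible only when its last element equals the XOR of the other fifteen, so the dropped element can be recomputed later); the real operation that would make this a genuine compressor is exactly the unsolved part discussed below.

```python
# A sketch of one recursive pass under a made-up stand-in rule: a group of
# 16 eight-byte elements is collapsible only when its last element equals
# the XOR of the other 15, in which case that element is dropped. Groups
# that are not collapsible (their information is "at the limit") are set
# aside in a side list with a position mark, as described above.

GROUP = 16  # elements per group; 16 -> 15 removes 1/16 of the data


def xor_of(elements: list[bytes]) -> bytes:
    """XOR a list of 8-byte elements together."""
    out = bytes(8)
    for e in elements:
        out = bytes(a ^ b for a, b in zip(out, e))
    return out


def one_pass(elements: list[bytes]) -> tuple[list[bytes], list[tuple[int, list[bytes]]]]:
    """Collapse each full group of 16 elements to 15 when possible.

    Returns the surviving elements plus a side list of (group index, group)
    entries for groups that could not be collapsed.
    """
    kept: list[bytes] = []
    side: list[tuple[int, list[bytes]]] = []
    full_groups = len(elements) // GROUP
    for gi in range(full_groups):
        group = elements[gi * GROUP:(gi + 1) * GROUP]
        if group[-1] == xor_of(group[:-1]):
            kept.extend(group[:-1])      # drop the redundant last element
        else:
            side.append((gi, group))     # mark it and set it aside
    kept.extend(elements[full_groups * GROUP:])  # trailing partial group
    return kept, side


if __name__ == "__main__":
    base = [bytes([i]) * 8 for i in range(15)]
    reducible = base + [xor_of(base)]                    # satisfies the stand-in rule
    stubborn = [bytes([i + 1]) * 8 for i in range(16)]   # does not satisfy it
    kept, side = one_pass(reducible + stubborn)
    print(32, "elements ->", len(kept), "kept,", len(side), "group(s) set aside")
```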
Having read this far, you all know the core idea of the algorithm: how to change the number of elements through an operation on three (or fewer) elements while still being able to restore them. That operation I cannot work out myself, because my math is not very good, but I believe it can be handled, especially by the crowd that plays with cryptography. Also, please do not ask about formats or encodings: in this algorithm there is no encoding and there is no format. I treat all data purely as mathematical values, so the problem can be attacked with pure mathematical operations instead of a pile of confusing format and encoding issues. At the very least, even with the Zip algorithm, your compression ratio on an ISO basically does not get past ten-odd percent, and for text files it is no less than ...
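For that same made-up XOR stand-in (redefined here so the snippet stands alone), the "can be restored" requirement looks like this; it only demonstrates the shape of the collapse/restore pair, not an actual solution to the problem above.

```python
# Restorability sketch for the illustrative XOR rule: a collapsed group of
# 15 elements gets its dropped 16th element back by recomputing the XOR.
# Groups parked in the side file are simply copied back verbatim at their
# marked positions. The rule itself is my assumption, not the algorithm's
# actual (still unsolved) operation.

def xor_of(elements: list[bytes]) -> bytes:
    """XOR a list of 8-byte elements together."""
    out = bytes(8)
    for e in elements:
        out = bytes(a ^ b for a, b in zip(out, e))
    return out


def collapse(group16: list[bytes]):
    """Return 15 elements if the group is reducible under the XOR rule, else None."""
    return group16[:-1] if group16[-1] == xor_of(group16[:-1]) else None


def restore(group15: list[bytes]) -> list[bytes]:
    """Recompute the dropped 16th element of a collapsed group."""
    return group15 + [xor_of(group15)]


if __name__ == "__main__":
    base = [bytes([i]) * 8 for i in range(15)]
    reducible = base + [xor_of(base)]
    assert restore(collapse(reducible)) == reducible   # the round trip is exact
    print("16 elements ->", len(collapse(reducible)), "-> 16")
```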
If you have any questions about the algorithm, please leave a comment below. As for discussing the problem together with code... I can only say I have had no time to work on it. Do you believe that? Anyway, I believe it.
I have written this rather messily; read it as you see fit. The element recursive collapse compression algorithm is too idealistic and rather hard to implement. The next article will talk about an algorithm based on the Zip algorithm, which is easier; the trade-off, likewise, is that the compression ratio is not as high, but it will be easier to implement. It is basically information theory + cryptography + coding theory.
Lonely Green in the abyss
December 1, 2014
Element recursive collapse compression algorithm and same-element construction compression algorithm (Part 1)