Common algorithms used in real-world projects

Source: Internet
Author: User

http://cstheory.stackexchange.com/questions/19759/core-algorithms-deployed/

The original content of this article comes from Stack Exchange and is licensed under the cc-wiki agreement.

Emanuele Viola recently asked a question on Stack Exchange: he wanted a list of concrete cases in which algorithms are used in current software and hardware, as evidence of the importance of algorithms. He set a few requirements for the answers:

    1. The software or hardware that uses the algorithm should be widely used;
    2. Examples should be specific, and give a reference to the exact system and algorithm;
    3. The algorithm or data structure should be taught in a typical undergraduate or Ph.D. course.

Vijay D's reply was accepted as the best answer. It reads as follows:

  Basic data structures and algorithms in the Linux kernel

  1. Linked lists, doubly linked lists, and lock-free linked lists.
  2. B+ trees, with comments in the code that tell you what you can't find in textbooks:

    This is a relatively simple B+ tree implementation. I wrote it as a learning exercise to understand how B+ trees work. It turned out to be useful as well.

    ...

    A trick was used that is not commonly found in textbooks: the lowest values are to the right, not to the left. All used slots within a node are on the left, and all unused slots contain NUL values. Most operations simply loop once over all slots and terminate on the first NUL.

  3. Priority-sorted lists, used for mutexes, drivers, etc.

  4. Red-black trees, used for scheduling, virtual memory management, tracking file descriptors and directory entries, etc.
  5. Interval trees.
  6. Radix trees, used for memory management, NFS-related lookups, and networking-related functionality.

    A common use of the radix tree is to store pointers to struct pages.

  7. Priority heap, which is literally a textbook implementation, used in the control group system.

    A simple insertion-only, static-sized priority heap containing pointers, based on CLR (Introduction to Algorithms), chapter 7.

  8. Hash functions, with a reference to Knuth and to a paper:

    Knuth recommends primes in approximately golden ratio to the maximum integer representable by a machine word for multiplicative hashing. Chuck Lever verified the effectiveness of this technique:

    http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf

    These primes are chosen to be bit-sparse, that is, operations on them can use shifts and additions instead of multiplications on machines where multiplications are slow.

  9. Some parts of the code, such as this driver, implement their own hash function.

  10. Hash tables, used to implement inodes, file system integrity checks, etc.
  11. Bit arrays, used for dealing with flags, interrupts, etc., and featured in Knuth Vol. 4.
  12. Semaphores and spin locks.
  13. Binary search, used for interrupt handling, register cache lookup, etc.
  14. Binary search with B-trees.
  15. Depth-first search and its variants, used in directory configuration.

    Performs a modified depth-first walk of the namespace tree, starting (and ending) at the node specified by start_handle. The callback function is called whenever a node that matches the type parameter is found. If the callback function returns a non-zero value, the search is terminated immediately and this value is returned to the caller.

  16. Breadth-first search, used to check the correctness of locking at run time.
  17. Merge sort on linked lists, used for garbage collection, file system management, etc.
  18. Bubble sort is, amazingly, implemented too, in a driver library.
  19. Knuth-Morris-Pratt string matching:

    Implements a linear-time string-matching algorithm due to Knuth, Morris, and Pratt [1]. Their algorithm avoids the explicit computation of the transition function DELTA altogether. Its matching time is O(n), where n is the length of the text, using just an auxiliary function PI[1..m], where m is the length of the pattern, precomputed from the pattern in time O(m). The array PI allows the transition function DELTA to be computed efficiently "on the fly" as needed. Roughly speaking, for any state "q" = 0,1,...,m and any character "a" in SIGMA, the value PI["q"] contains the information that is independent of "a" and is needed to compute DELTA("q", "a") [2]. Since the array PI has only m entries, whereas DELTA has O(m|SIGMA|) entries, we save a factor of |SIGMA| in the preprocessing time by computing PI rather than DELTA.

    [1] Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms, 2nd Edition, MIT Press

    [2] See finite automata theory

  20. Boyer-Moore pattern matching, with references and recommendations for when to prefer the alternative:

    Implements the Boyer-Moore string matching algorithm:

    [1] A Fast String Searching Algorithm, R.S. Boyer and Moore. Communications of the Association for Computing Machinery, 20(10), 1977, pp. 762-772. http://www.cs.utexas.edu/users/moore/publications/fstrpos.pdf

    [2] Handbook of Exact String Matching Algorithms, Thierry Lecroq, 2004. http://www-igm.univ-mlv.fr/~lecroq/string/string.pdf

    Note: Since Boyer-Moore (BM) searches for matches from right to left, it is still possible that a match could be spread over multiple blocks, in which case this algorithm won't find it.

    If you want to ensure that this never happens, use the Knuth-Pratt-Morris (KMP) implementation instead. In conclusion, choose the proper string search algorithm depending on your setting.

    Say you are using the textsearch infrastructure for filtering, network intrusion detection (NIDS), or any similar security-focused purpose: then go with KMP. Otherwise, if you really care about performance, say you are classifying packets to apply Quality of Service (QoS) policies, and you don't mind possible matches spread over multiple fragments, then go with BM.
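The point about matches spread over multiple blocks can be illustrated with a small sketch. It is not the kernel's textsearch code: the `StreamingKMP` class and its `feed` method are hypothetical names chosen for this example. It computes the PI array (0-indexed here) and carries the matcher state from one block to the next, which is what lets a KMP-style matcher find a pattern that straddles a block boundary.

```python
from typing import List


def prefix_function(pattern: str) -> List[int]:
    """pi[i] is the length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it (the PI array, 0-indexed)."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


class StreamingKMP:
    """Hypothetical helper: feed text block by block, keeping state in between."""

    def __init__(self, pattern: str):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.pi = prefix_function(pattern)
        self.state = 0      # number of pattern characters currently matched
        self.consumed = 0   # characters seen in earlier blocks

    def feed(self, block: str) -> List[int]:
        """Return absolute start offsets of the matches that end in this block."""
        hits = []
        for offset, ch in enumerate(block):
            while self.state > 0 and ch != self.pattern[self.state]:
                self.state = self.pi[self.state - 1]
            if ch == self.pattern[self.state]:
                self.state += 1
            if self.state == len(self.pattern):
                hits.append(self.consumed + offset - len(self.pattern) + 1)
                self.state = self.pi[self.state - 1]
        self.consumed += len(block)
        return hits
```

Feeding "xxab" and then "abab" reports matches of "abab" at offsets 2 and 4, even though the first match begins in one block and ends in the next.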

  Data structures and algorithms in the Chromium web browser

    1. Splay trees

      The tree is also parameterized by an allocation policy (Allocator). The policy is used for allocating lists in the C free store or the zone; see zone.h.

    2. Voronoi diagrams, used in a demo.
    3. Tabbing based on Bresenham's algorithm.

The Chromium code also includes third-party code that contains data structures and algorithms such as:

    1. Binary trees
    2. Red-black trees
    3. AVL trees
    4. Rabin-Karp string matching, used for compression
    5. Computing the suffixes of an automaton
    6. A Bloom filter implemented by Apple Inc.
    7. Bresenham's algorithm
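As a minimal illustration of the Rabin-Karp entry in the list above, here is a rolling-hash matcher. It is a sketch, not the third-party code shipped with Chromium, and it shows plain substring search rather than the compression use mentioned in the list. The function name, the base 256, and the modulus are assumptions made for this example.

```python
from typing import List


def rabin_karp(text: str, pattern: str,
               base: int = 256, modulus: int = 1_000_000_007) -> List[int]:
    """Return the start offsets of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, modulus)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus
    hits = []
    for start in range(n - m + 1):
        # Compare the strings only when the hashes agree, to rule out collisions.
        if t_hash == p_hash and text[start:start + m] == pattern:
            hits.append(start)
        if start + m < n:
            # Roll the window: drop text[start], append text[start + m].
            t_hash = (t_hash - ord(text[start]) * high) % modulus
            t_hash = (t_hash * base + ord(text[start + m])) % modulus
    return hits
```

The hash of each window is derived from the previous one in constant time, so the expected running time is linear in the text length when collisions are rare.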

  Programming language libraries

    1. The C++ STL, which includes lists, heaps, stacks, vectors, and algorithms for sorting, searching, and heap manipulation.
    2. The Java API, which is very extensive and covers much more.
    3. The Boost C++ library, which includes algorithms such as Boyer-Moore and Knuth-Morris-Pratt string matching.

  Allocation and scheduling algorithms

    1. Least Recently Used (LRU) can be implemented in several ways; the Linux kernel has a list-based implementation.
    2. Other policies worth knowing are First In First Out (FIFO), Least Frequently Used, and Round Robin.
    3. A variant of FIFO was used by the VAX/VMS system.
    4. The Clock algorithm by Richard Carr is used for page frame replacement in Linux.
    5. The Intel i860 processor used a random replacement policy.
    6. Adaptive Replacement Cache is used in some IBM storage controllers, and was used in PostgreSQL only briefly because of patent concerns.
    7. The buddy memory allocation algorithm, which Knuth discusses in TAOCP Vol. 1, is used in the Linux kernel, and the jemalloc concurrent allocator is used by FreeBSD and at Facebook.
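To make the clock idea in the list above concrete, here is a sketch of the basic clock (second-chance) replacement policy. It is a teaching model under simplifying assumptions, not the Linux page-replacement code: the `ClockCache` class and its `access` method are hypothetical, and each frame carries a single reference bit.

```python
class ClockCache:
    """Fixed set of frames arranged in a circle, swept by a 'hand'."""

    def __init__(self, num_frames: int):
        if num_frames <= 0:
            raise ValueError("num_frames must be positive")
        self.pages = [None] * num_frames       # page held by each frame
        self.referenced = [False] * num_frames  # one reference bit per frame
        self.index = {}                         # page -> frame number
        self.hand = 0

    def access(self, page) -> bool:
        """Touch a page. Return True on a hit, False on a fault."""
        frame = self.index.get(page)
        if frame is not None:
            self.referenced[frame] = True
            return True
        # Fault: sweep the hand, clearing reference bits, until a frame
        # whose bit is already clear is found. That frame is the victim.
        while self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % len(self.pages)
        victim = self.pages[self.hand]
        if victim is not None:
            del self.index[victim]
        self.pages[self.hand] = page
        self.referenced[self.hand] = True
        self.index[page] = self.hand
        self.hand = (self.hand + 1) % len(self.pages)
        return False
```

A page that was touched since the hand last passed gets a "second chance": its bit is cleared instead of the page being evicted, which approximates LRU without maintaining a fully ordered list.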

  Core utilities in *nix systems

    1. grep and awk both implement the Thompson-McNaughton-Yamada construction of NFAs from regular expressions.
    2. tsort implements topological sort.
    3. fgrep implements the Aho-Corasick string matching algorithm.
    4. GNU grep, according to its author Mike Haertel, implements the Boyer-Moore algorithm.
    5. crypt(1) on Unix implemented a variant of the encryption algorithm in the Enigma machine.
    6. Unix diff, implemented by Doug McIlroy and based on a prototype co-written with James Hunt, performs better than the standard dynamic programming algorithm used to compute Levenshtein distances. The Linux version computes the shortest edit distance.
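The tsort entry above can be illustrated with Kahn's algorithm, one standard way to compute a topological order. This is a sketch and makes no claim about how tsort itself is implemented. The input format (pairs meaning "first item comes before second item") mirrors what tsort reads, and the function name is an assumption.

```python
from collections import defaultdict, deque


def topological_sort(pairs):
    """Order items so that, for every (before, after) pair, before comes first."""
    successors = defaultdict(list)
    in_degree = {}
    for before, after in pairs:
        in_degree.setdefault(before, 0)
        in_degree.setdefault(after, 0)
        if before != after:  # a pair (x, x) only declares that x exists
            successors[before].append(after)
            in_degree[after] += 1
    # Start from the items that nothing needs to precede.
    ready = deque(node for node, degree in in_degree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(in_degree):
        raise ValueError("input contains a cycle")
    return order
```

For example, `topological_sort([("a", "b"), ("b", "c"), ("a", "c")])` returns `["a", "b", "c"]`, and a cyclic input raises `ValueError`.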

  Cryptographic algorithms

    1. Merkle trees, specifically the Tiger Tree Hash variant, were used in peer-to-peer programs such as GTK Gnutella and LimeWire.
    2. MD5 is used to provide checksums for software packages and for integrity checks on *nix systems (Linux implementation), and is also supported on Windows and OS X.
    3. OpenSSL implements many cryptographic algorithms, including AES, Blowfish, DES, SHA-1, SHA-2, RSA, etc.
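A Merkle tree, as mentioned in the list above, can be sketched in a few lines. Note the substitutions: Python's hashlib does not guarantee a Tiger hash, so SHA-256 stands in for it, and the prefix bytes and the handling of an unpaired node are design choices of this sketch rather than a faithful Tiger Tree Hash implementation. The function name is hypothetical.

```python
import hashlib


def merkle_root(blocks):
    """Return the root hash of a Merkle tree built over a list of byte blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    # Distinct prefixes keep a leaf hash from being confused with an interior hash.
    level = [hashlib.sha256(b"\x00" + block).digest() for block in blocks]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            parents.append(
                hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            )
        if len(level) % 2 == 1:
            parents.append(level[-1])  # an unpaired node is promoted unchanged
        level = parents
    return level[0]
```

Because every interior hash depends on its children, changing any single block changes the root, so a peer can verify downloaded blocks against one trusted root hash.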

  Compilers

    1. LALR parsing is implemented by yacc and bison.
    2. Dominator algorithms are used in most optimizing compilers based on SSA form.
    3. lex and flex compile regular expressions into NFAs.
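For the SSA-related entry above, the underlying notion is dominance in a control-flow graph: node d dominates node n if every path from the entry to n passes through d. The sketch below uses the simple iterative set-based formulation, not the faster algorithms production compilers tend to use. It assumes every node is reachable from the entry, and the function name and graph encoding are made up for this example.

```python
def dominators(successors, entry):
    """Map each node to the set of nodes that dominate it.

    successors: dict mapping a node to an iterable of its successor nodes.
    """
    nodes = set(successors)
    for targets in successors.values():
        nodes.update(targets)
    predecessors = {node: set() for node in nodes}
    for node, targets in successors.items():
        for target in targets:
            predecessors[target].add(node)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {node: set(nodes) for node in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for node in nodes:
            if node == entry:
                continue
            preds = predecessors[node]
            if preds:
                new = set.intersection(*(dom[p] for p in preds)) | {node}
            else:
                new = {node}  # unreachable node; outside this sketch's assumptions
            if new != dom[node]:
                dom[node] = new
                changed = True
    return dom
```

In a diamond-shaped graph (entry branches to a and b, which both rejoin at exit), neither a nor b dominates exit, only entry and exit itself do.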

  Compression and image processing

    1. The Lempel-Ziv algorithms used for the GIF image format are implemented in image manipulation programs, ranging from the *nix utility convert to complex programs.

    2. Run-length encoding is used to generate PCX files (used by the original Paintbrush program), compressed BMP files, and TIFF files.

    3. Wavelet compression is the basis of JPEG 2000, so every digital camera that produces JPEG 2000 files implements this algorithm.

    4. Reed-Solomon error correction is implemented in the Linux kernel, CD drives, and barcode readers, and was combined with convolution for image transmission from Voyager.
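Run-length encoding, listed above, is simple enough to show in full. This sketch uses a made-up byte layout of (count, value) pairs with runs capped at 255. It is not the actual PCX, BMP, or TIFF encoding, each of which has its own format details.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as a sequence of (run length, byte value) pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the next byte repeats, up to the 255 cap.
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode."""
    if len(encoded) % 2 != 0:
        raise ValueError("encoded data must consist of (count, value) pairs")
    out = bytearray()
    for i in range(0, len(encoded), 2):
        count, value = encoded[i], encoded[i + 1]
        out += bytes([value]) * count
    return bytes(out)
```

The scheme pays off on images with long runs of identical pixels and can expand data that has no repetition, since every isolated byte costs two bytes in the output.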

  Conflict-Driven Clause Learning

Since 2000, the running time of SAT (Boolean satisfiability) solvers on industrial benchmarks has decreased nearly exponentially every year. A very important part of this development is the Conflict-Driven Clause Learning algorithm, which combines the Boolean constraint propagation algorithm from the original paper of Davis, Logemann, and Loveland with the technique of clause learning that originated in constraint programming and artificial intelligence research. For specific industrial modelling, SAT is considered an easy problem (see discussion). To me, this is one of the greatest success stories of recent times, because it combines algorithmic advances, clever engineering ideas, experimental evaluation, and a concerted communal effort to solve the problem. The CACM article by Malik and Zhang is a good read. Many universities teach this algorithm, but usually in a logic or formal methods course.
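The Boolean constraint propagation step mentioned above can be sketched on its own. This shows only unit propagation: decisions, conflict analysis, and clause learning, which are what make a solver conflict-driven, are left out. Clauses are lists of nonzero integers in the DIMACS style (3 means variable 3 is true, -3 means it is false), assumed here to contain no repeated literals. The function name is hypothetical.

```python
def unit_propagate(clauses, assignment=None):
    """Repeatedly assign the last free literal of every unit clause.

    Returns (assignment, conflict): assignment maps variable -> bool, and
    conflict is True if some clause has every literal falsified.
    """
    assignment = dict(assignment or {})
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var, wanted = abs(lit), lit > 0
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == wanted:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return assignment, True  # every literal is false: conflict
            if len(unassigned) == 1:
                # Unit clause: its only free literal is forced to be true.
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment, False
```

On the clauses `[[1], [-1, 2], [-2, 3]]` propagation alone forces all three variables to true, while `[[1], [-1]]` is reported as a conflict.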

  Heated discussion on Weibo

@hashjoin, co-founder of the big data company Databricks, was the first to share this content on Weibo:

Many students and software engineers are curious about the practical value of the algorithms they have learned. This Stack Exchange answer lists classic algorithms used in several open source projects. http://t.cn/8kAP4yG The author lists code in Chromium and the Linux kernel, ranging from the most basic hash tables to string matching and encryption algorithms. Reading open source code is a good way to learn algorithms.

Others have also shared their views:

@GeniusVczh:

Reading an algorithm implementation is much like memorizing a text by rote, so unless you are learning syntax, don't read programming books that are full of code, or the code inside them. For the purpose of learning, build things yourself and then use them; once they have served you badly, you will know why they are not good.

@ Left Ear Mouse:

People who say algorithms are useless are essentially admitting that they only work at the bottom of the pit, where business-function code is simply piled up.

@ Shi Zhenghua-Chinese Academy of Sciences:

I have always felt that before teaching any technology, it is best to let everyone know what the technology can do, what it has already done, and where it may be used in the future. This increases people's interest in the technology, helps them understand and apply it flexibly, and makes them learn better. It matters a great deal.

  Original question link: Core algorithms deployed

Thanks to Wu for reviewing this article.
