Applications of the bloom filter data structure

Source: Internet
Author: User

Source: http://www.xymyeah.com/645.html

Application 1: storing a dictionary. You are probably familiar with the spell-check feature in Word: when you misspell a word, it is automatically underlined in red. How Word implements this is not known, but UNIX spell-checkers do use a bloom filter. They store all dictionary words in a bloom filter and then run each lookup directly against it.

Error: a misspelled word may go undetected, because a false positive makes it appear to be in the dictionary.
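The dictionary lookup described above can be sketched roughly as follows. This is a minimal illustration, not how any real UNIX spell-checker is implemented: the `BloomFilter` class, the sizes `m` and `k`, the salted SHA-256 hashing, and the four-word dictionary are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """A bit array of m bits, probed at k positions per item (illustrative sizes)."""

    def __init__(self, m=1024, k=3):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, kept simple for readability

    def _positions(self, item):
        # Derive k positions by salting a single hash function with an index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True only if every probed bit is set.
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical dictionary; a real one would hold many thousands of words.
dictionary = BloomFilter()
for word in ["bloom", "filter", "cache", "proxy"]:
    dictionary.add(word)


def is_probably_spelled_correctly(word):
    # False means "definitely not in the dictionary".
    # True means "probably in the dictionary"; false positives are possible.
    return word.lower() in dictionary
```

A word that was added is always found, so a correctly spelled word is never flagged. The only possible error is the one noted above: a misspelled word whose bits all happen to be set is accepted.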

Application 2: semi-join operations in a database. For example, table TA stores the employee/city fields and table TB stores the city/cost of living fields. Now we need to find all employees who live in a city where the cost of living is greater than $50,000. Obviously we need to join TA and TB. The intuitive approach is to send all the city/cost of living field values of TB to TA and then join on the matching cities to obtain all the employee/city values. This is time-consuming and laborious. The bloom filter method is to build a bloom filter from the city field of TB (for the cities that satisfy the condition) and send it to TA, match the cities in TA against the bloom filter, and then send only the matching employee/city rows to TB for the join. This filters out in advance the rows of TA whose city cannot match, which improves execution efficiency.

Error: none. A false positive only lets a few extra rows through, and the final join with TB discards them.
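The semi-join can be sketched in a few lines. The table contents, the helper names (`positions`, `build_filter`, `might_contain`), and the filter size are hypothetical and chosen only to keep the example small. Both tables live in one process here, whereas in a real system the filter would be shipped between sites.

```python
import hashlib


def positions(item, m, k):
    # k bit positions for an item, derived from salted SHA-256 (an assumption).
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()[:8], "big") % m
        for i in range(k)
    ]


def build_filter(items, m=256, k=3):
    bits = bytearray(m)
    for item in items:
        for pos in positions(item, m, k):
            bits[pos] = 1
    return bits


def might_contain(bits, item, k=3):
    return all(bits[pos] for pos in positions(item, len(bits), k))


# Hypothetical data: TA holds employee/city, TB holds city/cost of living.
TA = [("alice", "Zurich"), ("bob", "Lagos"), ("carol", "Oslo"), ("dave", "Hanoi")]
TB = {"Zurich": 90000, "Oslo": 70000, "Lagos": 20000, "Hanoi": 15000}

# At the site holding TB: filter over the cities that satisfy the condition.
expensive = {city for city, cost in TB.items() if cost > 50000}
city_filter = build_filter(expensive)

# At the site holding TA: keep only rows whose city might be in the filter.
candidates = [row for row in TA if might_contain(city_filter, row[1])]

# Back at TB: the exact join removes any false positives.
result = [(emp, city) for emp, city in candidates if city in expensive]
print(result)
```

Only `candidates` crosses the network instead of a whole table, and because the last step is an exact comparison, `result` is the same as it would be for a full join.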

Application 3: distributed caching. The principle of web caching is that when a proxy needs a web page, it first checks whether other proxies already have it, rather than sending the request directly to the web. This improves page download speed. To reduce the amount of data transmitted over the network, each proxy periodically broadcasts a bloom filter of the URLs it has cached, instead of broadcasting its full URL list to the other proxies.

Error: a proxy may believe that another proxy has a URL when it actually does not, which can cause some delay.

Because cached content changes frequently, a proxy usually uses a counting bloom filter to track its own cached URLs, and only a plain 0-1 bloom filter to store the cached URLs of other proxies.
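The difference between the two kinds of filter can be illustrated with a small sketch. A counting bloom filter keeps a counter per slot so that URLs can be removed when they leave the cache, and it can be flattened to plain 0-1 bits before being broadcast. The class name, the method names, the sizes, and the example.com URLs below are assumptions for illustration only.

```python
import hashlib


def positions(url, m, k):
    # k slot indexes for a URL, derived from salted SHA-256 (an assumption).
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()[:8], "big") % m
        for i in range(k)
    ]


class CountingBloomFilter:
    """Counters instead of bits, so entries can be deleted again."""

    def __init__(self, m=512, k=3):
        self.m, self.k = m, k
        self.counts = [0] * m

    def add(self, url):
        for pos in positions(url, self.m, self.k):
            self.counts[pos] += 1

    def remove(self, url):
        # Only call this for URLs that were added earlier;
        # removing anything else would corrupt the counters.
        for pos in positions(url, self.m, self.k):
            if self.counts[pos] > 0:
                self.counts[pos] -= 1

    def __contains__(self, url):
        return all(self.counts[pos] > 0 for pos in positions(url, self.m, self.k))

    def to_bits(self):
        # The 0-1 form that would be broadcast to other proxies.
        return bytearray(1 if c else 0 for c in self.counts)


def bits_might_contain(bits, url, k=3):
    # Lookup on the receiving side, against a broadcast 0-1 filter.
    return all(bits[pos] for pos in positions(url, len(bits), k))


own_cache = CountingBloomFilter()
own_cache.add("http://example.com/a")
own_cache.add("http://example.com/b")
own_cache.remove("http://example.com/a")  # page /a was evicted from the cache

snapshot = own_cache.to_bits()  # what the other proxies would store
```

The receiving proxies never delete from the snapshot; they simply replace it when the next broadcast arrives, which is why plain bits are enough on their side.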

Application 4: P2P. The basic principle of P2P technology is that a peer downloads files both from the source site and from other peers. Generally, a large file is distributed across many peers, so the peers must cooperate with each other during the download. When peer A needs to send peer B the data that A has but B does not, a bloom filter makes it easy to compute the set difference SA-SB and find the data to transmit.

Error: some data in SA-SB may be missed and not passed to peer B.
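The approximate set difference can be sketched as follows, assuming peer B sends a bloom filter of SB to peer A and A transmits every item that the filter does not report. The chunk names, the helper functions, and the filter size are hypothetical.

```python
import hashlib


def positions(item, m, k):
    # k bit positions for an item, derived from salted SHA-256 (an assumption).
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()[:8], "big") % m
        for i in range(k)
    ]


def build_filter(items, m=256, k=3):
    bits = bytearray(m)
    for item in items:
        for pos in positions(item, m, k):
            bits[pos] = 1
    return bits


def might_contain(bits, item, k=3):
    return all(bits[pos] for pos in positions(item, len(bits), k))


# Hypothetical chunk identifiers held by each peer.
SA = {"chunk-1", "chunk-2", "chunk-3", "chunk-4"}
SB = {"chunk-2", "chunk-4"}

# Peer B summarizes what it already has and sends only the filter to A.
filter_b = build_filter(SB)

# Peer A sends everything the filter says B definitely lacks.
to_send = {chunk for chunk in SA if not might_contain(filter_b, chunk)}
```

`to_send` never contains anything B already has, but it can be smaller than the true SA-SB: a chunk that hits a false positive in `filter_b` is wrongly assumed to be on B and is skipped, which is the error described above.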



