Review of common interview algorithms for Big Data

Source: Internet
Author: User

  

1. Given massive log data, extract the IP address that visited Baidu the most times in a single day.

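Solution: An IPv4 address consists of four numbers, each from 0 to 255, so there are at most 2^32 distinct addresses.

Scan the log and write every IP whose first number is n to file n. This produces 256 files, and all occurrences of any given IP land in the same file.

For each small file, find the IP that visited Baidu most often (count occurrences with a dictionary). This yields 256 candidate IPs; the one with the highest count among them is the answer. The overall running time is O(N).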
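The partition-then-count steps can be sketched in Python as follows. This is a minimal illustration, not a production implementation: the function name `most_frequent_ip` and the bucket file names are hypothetical, the input is assumed to be an iterable of already-extracted IPv4 strings, and it assumes the process is allowed to keep 256 files open at once.

```python
import os
import tempfile
from collections import Counter


def most_frequent_ip(ips, num_buckets=256):
    """Return (ip, count) for the most frequent IP, or (None, 0) if empty."""
    with tempfile.TemporaryDirectory() as workdir:
        paths = [os.path.join(workdir, f"bucket_{i}.txt")
                 for i in range(num_buckets)]

        # Pass 1: route each IP to the file named after its first number,
        # so every occurrence of one IP ends up in the same file.
        files = [open(p, "w") for p in paths]
        try:
            for ip in ips:
                first_octet = int(ip.split(".")[0])
                files[first_octet % num_buckets].write(ip + "\n")
        finally:
            for f in files:
                f.close()

        # Pass 2: count one small file at a time with a dictionary and
        # keep only the best candidate seen so far.
        best_ip, best_count = None, 0
        for p in paths:
            counts = Counter()
            with open(p) as f:
                for line in f:
                    counts[line.strip()] += 1
            if counts:
                ip, count = counts.most_common(1)[0]
                if count > best_count:
                    best_ip, best_count = ip, count
        return best_ip, best_count
```

Only one bucket's dictionary is in memory at a time, which is the point of splitting the log first. If one bucket is still too large, it could be split again on the second number in the same way.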

2. Assume there are currently 10 million query-string records. The strings are highly repetitive: the total is 10 million, but after removing duplicates there are no more than 3 million. The more often a query string is repeated, the more users searched for it and the more popular it is. Find the 10 most popular query strings using no more than 1 GB of memory.

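Solution: Use a trie together with a min-heap of size 10. Insert each string record into the trie, where the value stored for a string is its number of occurrences (a string that has already been seen during the scan just has its count incremented). This structure is fast to search.

The min-heap holds at most 10 (string, count) pairs: a string is inserted if its count is larger than the count at the top of the heap, in which case the top is evicted; otherwise it is discarded. The heap can be maintained while the trie is being built, but then a string that is already in the heap must have its entry updated rather than inserted again. It is simpler to finish counting first and then pass each distinct string through the heap. At the end, the heap contains the 10 most popular strings.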
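A minimal Python sketch of the trie plus size-10 min-heap idea is shown below. The names `TrieNode`, `count_queries`, `iter_counts`, and `top_k` are hypothetical, and the sketch counts all strings first and only then runs the heap, which avoids updating heap entries in place. A dictionary-based trie in Python is far less compact than a hand-tuned one, so this illustrates the logic rather than the 1 GB memory budget.

```python
import heapq


class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.count = 0      # occurrences of the string ending at this node


def count_queries(queries):
    """Insert every query into a trie, counting repeated strings."""
    root = TrieNode()
    for q in queries:
        node = root
        for ch in q:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
            node = child
        node.count += 1
    return root


def iter_counts(root):
    """Yield (string, count) for every distinct string stored in the trie."""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if node.count:
            yield prefix, node.count
        for ch, child in node.children.items():
            stack.append((child, prefix + ch))


def top_k(queries, k=10):
    """Return the k most frequent queries as (count, query), highest first."""
    heap = []  # min-heap: heap[0] is the weakest of the current top k
    for query, count in iter_counts(count_queries(queries)):
        if len(heap) < k:
            heapq.heappush(heap, (count, query))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, query))
    return sorted(heap, reverse=True)
```

For example, `top_k(["a", "b", "a"], k=1)` returns `[(2, "a")]`. With at most 3 million distinct strings, the heap pass costs at most 3 million comparisons against the heap top, each followed by an O(log 10) update only when the string qualifies.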

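3. Find the non-repeating integers (those that appear exactly once) among 250 million integers. Note that memory is not large enough to hold all 250 million integers.

This is a classic bitmap problem, and a bitmap is the fastest approach. Use 2 bits per possible value: 00 means not seen, 01 means seen once, 10 means seen more than once. Assuming the integers are 32-bit, the bitmap takes 2^32 × 2 bits = 1 GB. After one scan of the data, the values whose state is 01 are the non-repeating integers.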
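One common way to apply a bitmap here is to keep 2 bits of state per possible value (0 = not seen, 1 = seen once, 2 = seen more than once). The Python sketch below illustrates that idea under stated assumptions: the function name `find_unique` and the `universe_size` parameter are hypothetical, and the inputs are assumed to be non-negative integers smaller than `universe_size`. For 32-bit values, `universe_size` would be 2^32 and the bitmap would occupy 1 GB; the sketch is meant to be read and tried with a small universe.

```python
def find_unique(numbers, universe_size):
    """Return, in ascending order, the values that occur exactly once."""
    # 2 bits per value, so 4 values are packed into each byte.
    bitmap = bytearray((universe_size + 3) // 4)

    for n in numbers:
        byte, slot = divmod(n, 4)
        shift = slot * 2
        state = (bitmap[byte] >> shift) & 0b11
        if state < 2:
            # Advance 0 -> 1 (seen once) or 1 -> 2 (seen more than once).
            cleared = bitmap[byte] & ~(0b11 << shift)
            bitmap[byte] = cleared | ((state + 1) << shift)

    # Second pass over the bitmap: keep values whose state is exactly 1.
    result = []
    for n in range(universe_size):
        byte, slot = divmod(n, 4)
        if (bitmap[byte] >> (slot * 2)) & 0b11 == 1:
            result.append(n)
    return result
```

The memory used depends only on the range of possible values, not on how many integers are in the input, which is why the approach works even though the 250 million integers do not fit in memory.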
