Common Data Structures for Massive Data

Source: Internet
Author: User

Data Structure

Application scenarios

Example

Hash table

All key-value pairs must fit in memory. A lookup completes in constant time on average.

• Extract from a log the IP address that accessed Baidu most frequently

• Count the number of distinct phone numbers
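
A minimal Python sketch of the first example, using a hash table (here collections.Counter) keyed by IP address. The log format is an assumption: each line is taken to start with the client IP followed by whitespace, and the name most_frequent_ip is illustrative only.

```python
from collections import Counter


def most_frequent_ip(log_lines):
    """Return (ip, hits) for the most frequent IP, or None for an empty log.

    Assumption: the first whitespace-separated field of each line is the IP.
    """
    counts = Counter()  # hash table: IP address -> number of hits
    for line in log_lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0]


sample_log = [
    "10.0.0.1 GET /index.html",
    "10.0.0.2 GET /search?q=a",
    "10.0.0.1 GET /search?q=b",
]
result = most_frequent_ip(sample_log)
```

The whole table has to stay in memory, so this only works while the number of distinct keys is manageable; otherwise the input is usually partitioned first.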

Heap

Insertion and adjustment take O(log n) time, where n is the number of elements in the heap. Reading the top element takes only constant time.

• Find the top K items in massive data

• Compute the median of a massive data stream
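
Both examples can be sketched with Python's heapq module. The function and class names below (top_k, RunningMedian) are illustrative. top_k keeps a min-heap of at most k items, and RunningMedian keeps two heaps that split the stream into a smaller half and a larger half.

```python
import heapq


def top_k(stream, k):
    """Return the k largest items of an iterable, largest first."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the k largest items seen so far
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # drop the smallest, insert x
    return sorted(heap, reverse=True)


class RunningMedian:
    """Median of a stream, using a max-heap and a min-heap."""

    def __init__(self):
        self.low = []   # smaller half, stored negated to act as a max-heap
        self.high = []  # larger half, a plain min-heap

    def add(self, x):
        # Route x through low so every item in low stays <= every item in high.
        heapq.heappush(self.low, -x)
        heapq.heappush(self.high, -heapq.heappop(self.low))
        # Rebalance so that low holds the extra item when the count is odd.
        if len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        if not self.low:
            raise ValueError("no data yet")
        if len(self.low) > len(self.high):
            return -self.low[0]
        return (-self.low[0] + self.high[0]) / 2
```

top_k uses O(k) memory regardless of the stream length. RunningMedian still stores every item, so for truly massive streams it is a starting point rather than a complete answer.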

Bitmap

Usually records whether each integer has occurred, one bit per value, which allows fast lookup, duplicate detection, and deletion of elements.

• Count the number of distinct phone numbers

• Count the repeated integers among 250 million integers
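
A minimal sketch of a bitmap backed by a bytearray, with one bit per possible value. The class name Bitmap and the helper count_distinct_and_repeated are illustrative, and the helper assumes all inputs are integers between 0 and max_value. It uses two bitmaps: one for "seen at least once" and one for "seen more than once".

```python
class Bitmap:
    """Fixed-size set of integers in range(size), one bit per value."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)

    def add(self, n):
        """Set bit n and return True if it was already set."""
        index, mask = n >> 3, 1 << (n & 7)
        already = bool(self.bits[index] & mask)
        self.bits[index] |= mask
        return already

    def __contains__(self, n):
        return bool(self.bits[n >> 3] & (1 << (n & 7)))

    def discard(self, n):
        """Clear bit n."""
        self.bits[n >> 3] &= ~(1 << (n & 7)) & 0xFF


def count_distinct_and_repeated(numbers, max_value):
    """Return (distinct values, values that occur more than once)."""
    seen = Bitmap(max_value + 1)
    repeated = Bitmap(max_value + 1)
    distinct = repeats = 0
    for n in numbers:
        if seen.add(n):
            if not repeated.add(n):
                repeats += 1  # first time this value is seen again
        else:
            distinct += 1
    return distinct, repeats
```

Memory depends on the value range, not on how many numbers are read: covering all 2^32 unsigned 32-bit values takes 512 MiB per bitmap.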

Double Bucket

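Addresses a value in two steps, first locating its bucket and then its position inside that bucket, which saves memory. Usually used to find the K-th largest value, the median, or repeated numbers.

• Find the median of 250 million integers

• Find the K-th largest value in massive data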
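
A sketch of the two-step idea applied to the median example, under these assumptions: the values are non-negative integers below 2**bits, the data can be read twice, and read_numbers is a hypothetical zero-argument callable that returns a fresh iterator each time (for instance, one that re-opens a file). The first pass counts values per coarse bucket (high bits); the second pass counts only inside the bucket that contains the median.

```python
def bucket_median(read_numbers, bits=32, high_bits=16):
    """Return the lower median using two counting passes over the data."""
    low_bits = bits - high_bits

    # Pass 1: count how many values fall into each coarse bucket.
    coarse = [0] * (1 << high_bits)
    total = 0
    for n in read_numbers():
        coarse[n >> low_bits] += 1
        total += 1
    if total == 0:
        raise ValueError("no data")

    target = (total - 1) // 2  # zero-based rank of the lower median

    # Locate the coarse bucket that contains the target rank.
    below = 0  # number of values in earlier buckets
    for bucket, count in enumerate(coarse):
        if below + count > target:
            break
        below += count

    # Pass 2: count exact values, but only inside that one bucket.
    fine = [0] * (1 << low_bits)
    mask = (1 << low_bits) - 1
    for n in read_numbers():
        if n >> low_bits == bucket:
            fine[n & mask] += 1

    for low, count in enumerate(fine):
        if below + count > target:
            return (bucket << low_bits) | low
        below += count


data = [70000, 3, 500000, 12, 70001]
median = bucket_median(lambda: iter(data))
```

With the default split, each pass needs only 65,536 counters instead of one counter per possible value. The same scan over the counters finds the K-th largest value if target is set to the corresponding rank.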

Inverted Index

Maps words to the documents that contain them, or attributes to the objects that have them, so that a lookup can go from content back to its source.

• Keyword-based search

• Auto-completion in the search box
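
A minimal sketch of a word-to-document index for the keyword search example. The tokenization (lowercasing and splitting on whitespace) and the names build_index and search are simplifying assumptions; real search engines do considerably more.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], ()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    1: "massive data sorting",
    2: "data structures for search",
    3: "search massive logs",
}
index = build_index(docs)
hits = search(index, "massive data")
```

Answering a query touches only the posting sets of the query words, not the documents themselves.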

External Sort

Uses disk space to sort massive data that does not fit in memory.

• A 1 GB file with one word per line and 1 MB of memory: return the 100 most frequent words
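
One way to sketch the example: sort the file in small runs that fit in memory, write each run to disk, merge the runs with heapq.merge, and count equal neighbours in the merged stream while keeping only the k best counts in a heap. The names sorted_runs and top_words and the lines_per_run parameter are illustrative; a real solution would size the runs by bytes rather than by line count.

```python
import heapq
import itertools
import os
import tempfile


def read_words(handle):
    """Yield the lines of an open text file without the trailing newline."""
    for line in handle:
        yield line.rstrip("\n")


def sorted_runs(path, lines_per_run):
    """Split the input into sorted runs on disk and return their paths."""
    runs = []
    with open(path, encoding="utf-8") as src:
        words = read_words(src)
        while True:
            chunk = sorted(itertools.islice(words, lines_per_run))
            if not chunk:
                break
            fd, run_path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w", encoding="utf-8") as out:
                out.writelines(word + "\n" for word in chunk)
            runs.append(run_path)
    return runs


def top_words(path, k, lines_per_run=10000):
    """Return the k most frequent words as (count, word), most frequent first."""
    if k <= 0:
        return []
    runs = sorted_runs(path, lines_per_run)
    handles = [open(p, encoding="utf-8") for p in runs]
    try:
        # Merging sorted runs puts equal words next to each other.
        merged = heapq.merge(*(read_words(h) for h in handles))
        best = []  # min-heap of (count, word), at most k entries
        for word, group in itertools.groupby(merged):
            entry = (sum(1 for _ in group), word)
            if len(best) < k:
                heapq.heappush(best, entry)
            elif entry > best[0]:
                heapq.heapreplace(best, entry)
        return sorted(best, reverse=True)
    finally:
        for h in handles:
            h.close()
        for p in runs:
            os.remove(p)
```

At any moment the merge holds only one line per run plus the k-entry heap, which is what makes the approach workable under a tight memory limit.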

Prefix Tree

Build a prefix tree (trie) over all words in the set.

• Find popular query strings

• Find words that are repeated frequently
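
A minimal trie sketch in which each node stores how many times the word ending there was inserted, so repeated words and popular queries can be read off by count. The class and method names are illustrative.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.count = 0      # times a word ending at this node was inserted


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.count += 1

    def _find(self, prefix):
        """Return the node reached by prefix, or None if it is absent."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def count(self, word):
        node = self._find(word)
        return node.count if node else 0

    def with_prefix(self, prefix):
        """Return (word, count) pairs under prefix, most frequent first."""
        start = self._find(prefix)
        if start is None:
            return []
        found = []
        stack = [(start, prefix)]
        while stack:
            node, word = stack.pop()
            if node.count:
                found.append((word, node.count))
            for ch, child in node.children.items():
                stack.append((child, word + ch))
        return sorted(found, key=lambda item: (-item[1], item[0]))


trie = Trie()
for query in ["data", "date", "data", "dog", "data", "date"]:
    trie.insert(query)
suggestions = trie.with_prefix("da")
```

Words that share a prefix share nodes, so the memory cost grows with the number of distinct prefixes rather than with the total number of insertions.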

MapReduce

Distributed processing: the data is partitioned and handed to different machines for processing, and the partial results are then merged (reduced) into the final result.

• Massive log analysis

• Data mining

• Intelligent recommendation systems
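
The flow can be illustrated with a single-process word count. This is only a sketch of the map, shuffle, and reduce steps, not the API of Hadoop or any other framework; in a real deployment each chunk would be mapped on a different machine. All function names are illustrative.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    for word in chunk.split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge the values of one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["error warn error", "warn info"])
```

Because map_phase looks at one chunk at a time and reduce_phase at one key at a time, both steps can run in parallel without shared state.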
