Massive Data Search Algorithm Optimization: Storage, Query, and Sorting Algorithms

Source: Internet
Author: User
Massive database applications include national population management systems and household registration file management systems. In such applications, database storage design and structural optimization (such as index optimization), query optimization, and paging algorithms are particularly important.

With the spread of the Internet, the rapid growth of information, and the arrival of grid computing, the market increasingly needs mass data storage products and mass data storage technology solutions.

At the same time, real-world massive data processing involves many details, including massive data storage (physical storage, logical storage, and backup of massive databases), data collection, massive data queries (paging and sorting of massive data), and massive data security and management.

Questions on Baidu and Google Massive Data Search Algorithms

The following are two questions on massive data search algorithms from Baidu and Google tests:

Google and Baidu store enormous amounts of data, and their test questions are clear-cut: they do not require a specific language, only that the program be efficient and feasible. Most of the questions concern massive data search algorithms.

Baidu and Google Massive Data Search Algorithm Questions

1. There are 100 million floating-point numbers. Find the 10,000 largest. Hint: assume each floating-point number occupies 4 bytes, so 100 million of them take up considerable space (about 400 MB). Therefore, you cannot read all the data into memory and sort it at once.

2. Given an English article (words are separated by spaces), count the number of times the word "csdn" appears. The solution must be as efficient as possible; write the algorithm and state its time complexity.

Peak Wong's Answers to the Massive Data Search Algorithm Questions

In fact, the memory usage (about 400 MB) is not that large; it is acceptable.

If the data cannot be read into memory all at once, you can try the following:

Method 1: Read a chunk of the data and find its 10,000 largest values. If the chunk is representative, the smallest of those 10,000 values can serve as a threshold that filters out about 99% of the 100 million records. Then find the 10,000 largest among the remaining 1% (about 1 million records).
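A minimal Python sketch of Method 1, assuming the numbers arrive as an iterable (for example, read from a file in pieces). The function name `top_k_by_threshold` and the `sample_size` parameter are illustrative, not from the original. The threshold filter is exact: the chunk's 10,000th largest value can be no larger than the whole data set's 10,000th largest, so no correct answer is ever filtered out. How representative the chunk is affects only how many records survive the filter.

```python
import heapq
from typing import Iterable, List


def top_k_by_threshold(stream: Iterable[float], k: int = 10_000,
                       sample_size: int = 1_000_000) -> List[float]:
    """Return the k largest values of `stream`, largest first (Method 1)."""
    if sample_size < k:
        raise ValueError("sample_size must be at least k")
    it = iter(stream)

    # Step 1: read a first chunk of the data (its size is an assumption).
    sample: List[float] = []
    for x in it:
        sample.append(x)
        if len(sample) >= sample_size:
            break
    if len(sample) < k:
        # The whole stream fit in the chunk, so no filtering is needed.
        return heapq.nlargest(k, sample)

    # The chunk's k-th largest value is the threshold. Every value in the
    # global top k is >= it, so filtering never drops a correct answer.
    threshold = heapq.nlargest(k, sample)[-1]
    survivors = [x for x in sample if x >= threshold]

    # Step 2: stream the rest of the data, keeping only values that pass.
    survivors.extend(x for x in it if x >= threshold)

    # Step 3: find the k largest among the (hopefully few) survivors.
    return heapq.nlargest(k, survivors)
```

Only the chunk and the survivors are held in memory; the rest of the data is examined once and discarded.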

Method 2: Split the data into blocks, for example 1 million records per block, and find the 10,000 largest entries in each block. The overall 10,000 largest are then among these per-block results.
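A minimal Python sketch of Method 2, assuming the input is an iterable that can be consumed block by block. The name `top_k_by_blocks` and the `block_size` parameter are illustrative. It keeps each block's top k as candidates and then selects the final top k from those candidates.

```python
import heapq
from itertools import islice
from typing import Iterable, List


def top_k_by_blocks(stream: Iterable[float], k: int = 10_000,
                    block_size: int = 1_000_000) -> List[float]:
    """Return the k largest values of `stream`, largest first (Method 2)."""
    it = iter(stream)
    candidates: List[float] = []
    while True:
        # Read the next block; only one block is held in memory at a time.
        block = list(islice(it, block_size))
        if not block:
            break
        # A block contributes at most k values to the global top k, and they
        # are its own largest values, so keeping its top k loses nothing.
        candidates.extend(heapq.nlargest(k, block))
    # Merge step: the answer lies among the per-block candidates.
    return heapq.nlargest(k, candidates)
```

With 100 million records in blocks of 1 million, there are 100 blocks, so the candidate list holds at most 100 × 10,000 = 1,000,000 values.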

Once the largest records can be found this way, we can go further and discuss how to find the 10,000th largest number:

Use quicksort's partitioning step to split the data into two groups around a pivot. If the number N of elements in the larger group is greater than 10,000, partition the larger group again. If N is less than 10,000, partition the smaller group and look for the (10,000 - N)th largest number there. If N is exactly 10,000, the smallest element of the larger group is the answer. Repeating this recursively finds the 10,000th largest number. It is said that the STL's nth_element() works this way.
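A minimal Python sketch of this partition-and-recurse idea (quickselect), written as a loop instead of recursion; the name `kth_largest` is illustrative. It picks a random pivot and splits the data three ways (larger, equal, smaller) so that repeated values cannot make it loop forever. For clarity it builds new lists rather than partitioning in place; in C++, `std::nth_element` from `<algorithm>` performs the selection in place.

```python
import random
from typing import List, Sequence


def kth_largest(values: Sequence[float], k: int) -> float:
    """Return the k-th largest value (k = 1 is the maximum) via quickselect."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range")
    data: List[float] = list(values)
    while True:
        pivot = random.choice(data)
        # Partition around the pivot: values above it, and how many equal it.
        larger = [x for x in data if x > pivot]
        equal_count = sum(1 for x in data if x == pivot)
        if k <= len(larger):
            # The answer is in the larger group: keep partitioning it.
            data = larger
        elif k <= len(larger) + equal_count:
            # The pivot itself is the k-th largest.
            return pivot
        else:
            # Skip the larger group and the pivot copies, then look for the
            # (k - N - equal_count)-th largest in the smaller group.
            k -= len(larger) + equal_count
            data = [x for x in data if x < pivot]
```

Each round discards one side of the partition, so the expected running time is linear in the number of values.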

Following the method above for finding the 10,000th largest number, I believe the original poster can find the 10,000 largest numbers in a similar way.
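One simple way to go from the 10,000th largest value to the 10,000 largest values, sketched in Python under the assumption that the k-th largest value `kth` has already been found by some selection method. The name `top_k_from_kth` is illustrative. One more pass collects every value strictly above `kth`, and copies of `kth` fill the remaining slots, which handles ties correctly.

```python
from typing import List, Sequence


def top_k_from_kth(values: Sequence[float], kth: float, k: int) -> List[float]:
    """Collect the k largest values, given the k-th largest value `kth`."""
    # Fewer than k values can be strictly greater than the k-th largest,
    # and all of them belong to the answer.
    result = [x for x in values if x > kth]
    # Values tied with the k-th largest fill the remaining slots.
    result.extend([kth] * (k - len(result)))
    return result
```

The result is not sorted; sort it afterward if order matters.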

The second question is actually very simple.

Assume matching is case-insensitive. Since English has only 26 letters, each letter fits in 5 bits (values 0 to 31), so a word can be mapped to a number. With letters uppercased first, "csdn" maps to:

('C' - 'A') * 32*32*32 + ('S' - 'A') * 32*32 + ('D' - 'A') * 32 + ('N' - 'A')

That is: ('C' - 'A') * (1 << 15) + ('S' - 'A') * (1 << 10) + ('D' - 'A') * (1 << 5) + ('N' - 'A')
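The answer stops at the mapping. Here is a minimal Python sketch of how the mapping could be used to count occurrences, assuming words contain only ASCII letters and that a word with attached punctuation (such as "csdn,") does not count as a match. The function names are illustrative. With this mapping, "csdn" becomes 2 * 32768 + 18 * 1024 + 3 * 32 + 13 = 84077. Because 'A' maps to 0, leading A's do not change a word's code ("AB" and "B" collide), so the sketch also compares word lengths.

```python
def word_code(word: str) -> int:
    """Map a word to an integer, 5 bits per letter, case-insensitively.

    Returns -1 if the word contains anything other than ASCII letters.
    """
    code = 0
    for ch in word:
        if not ("a" <= ch <= "z" or "A" <= ch <= "Z"):
            return -1
        code = (code << 5) | (ord(ch.upper()) - ord("A"))
    return code


def count_word(text: str, target: str = "csdn") -> int:
    """Count whitespace-separated occurrences of `target`, ignoring case."""
    target_code = word_code(target)
    if target_code < 0:
        raise ValueError("target must consist of ASCII letters only")
    count = 0
    for word in text.split():
        # Length check first: it is cheap and rules out leading-'A' collisions.
        if len(word) == len(target) and word_code(word) == target_code:
            count += 1
    return count
```

Each character of the article is examined a constant number of times, so counting takes O(n) time in the length of the article.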

