Baidu Web Search Department

Source: Internet
Author: User

First, comparing the efficiency of two loops

Title: Arrays A and B contain the same elements, but A is already sorted and B is unordered. Given the median mid of the array, the program below runs the same counting loop over each array. Compare the efficiency of the two loops and analyze the reasons.

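int g;

int main() {
    // A, B, n and mid are assumed to be defined elsewhere:
    // A is sorted, B holds the same elements in arbitrary order.
    g = 0;
    for (int i = 0; i < n; i++) {    // loop 1: sorted array
        if (A[i] > mid)
            g++;
    }
    for (int i = 0; i < n; i++) {    // loop 2: unordered array
        if (B[i] > mid)
            g++;
    }
}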

When a pipelined processor handles a branch instruction there is a problem: depending on whether the condition is true or false, a jump may occur, and this interrupts the instructions already in the pipeline, because the processor cannot determine the next instruction until the branch has executed. The longer the pipeline, the longer the processor waits, because it must wait for the branch instruction to finish before it knows which instruction should enter the pipeline next.

I had come across this question online before and knew that the loop over the sorted array is actually much faster than the loop over the unordered one, but I did not know why. Searching now, it turns out to be a classic StackOverflow question and answer. The cause is not the compiler but the CPU: the CPU uses a technique called branch prediction, and this is what makes the sorted array so fast. The CPU executes instructions in a pipeline. A simple branch prediction scheme guesses the outcome of the next branch from the outcomes of recent ones (in effect, a statistical pattern over the elements already processed). For the sorted array the prediction is almost always right. For the unordered array the outcome is effectively random, so prediction does not help, the next instructions cannot be loaded into the pipeline in advance, and CPU time is lost.
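To make the effect concrete, here is a toy model of a single branch predicted by a 2-bit saturating counter. This is only a sketch: real CPUs use far more sophisticated predictors, and the function name countMispredictions is made up for illustration. It simply counts how often the guess for `x > mid` would be wrong.

```cpp
#include <cstddef>
#include <vector>

// Toy model (an assumption, not how any specific CPU works):
// one branch, predicted by a 2-bit saturating counter.
// States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
std::size_t countMispredictions(const std::vector<int>& data, int mid) {
    int state = 0;             // start at "strongly not taken"
    std::size_t misses = 0;
    for (int x : data) {
        bool taken = x > mid;          // the branch from the loop above
        bool predicted = state >= 2;   // the predictor's guess
        if (predicted != taken) {
            ++misses;
        }
        // Move the counter one step toward the actual outcome.
        if (taken) {
            if (state < 3) ++state;
        } else {
            if (state > 0) --state;
        }
    }
    return misses;
}
```

On sorted input the outcome flips only once, so this model mispredicts only a couple of times around the flip. On input whose outcomes keep alternating it can miss on a large fraction of the elements.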

Second, a simple algorithm to find the unique element

Title: In an array, exactly one number appears once and every other number appears twice. Find that number.

A fairly simple problem. I had seen it before, so I got through it quickly: XOR all the elements together and the result is the answer.
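A minimal sketch of the XOR idea, assuming the input really does satisfy the problem statement (the function name findSingle is just for illustration). Since x ^ x = 0 and x ^ 0 = x, every paired value cancels out and only the single value remains.

```cpp
#include <vector>

// XOR all elements together; values that appear twice cancel out.
// Assumes exactly one value appears once and all others appear twice.
int findSingle(const std::vector<int>& nums) {
    int result = 0;
    for (int x : nums) {
        result ^= x;
    }
    return result;
}
```

This takes one pass over the array and uses constant extra space.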

Third, a DP problem

Title: An m*n board has a number in each cell. Compute the maximum path sum from the upper-left corner to the lower-right corner, where each step can only go right or down.

I had done this in ACM practice. Back then I only knew it was DP without really understanding it; now it is much clearer.

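First initialize the two-dimensional array s with a double for loop (the Arrays.fill method can only initialize a one-dimensional array), then fill it in. Here a is the board with n rows and m columns, and s[i][j] is the best sum of any path ending at cell (i, j):

s[0][0] = a[0][0];
// First column: can only be reached from above.
for (int i = 1; i < n; i++)
    s[i][0] = a[i][0] + s[i-1][0];
// First row: can only be reached from the left.
for (int j = 1; j < m; j++)
    s[0][j] = a[0][j] + s[0][j-1];
// Every other cell: take the better of the cell above and the cell to the left.
for (int i = 1; i < n; i++)
    for (int j = 1; j < m; j++)
        s[i][j] = a[i][j] + Math.max(s[i-1][j], s[i][j-1]);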
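The same recurrence as a self-contained function, as a sketch. The name maxPathSum and the use of long long for the sums are my own choices, and the grid is assumed to be rectangular.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Maximum path sum from the top-left to the bottom-right cell,
// moving only right or down. Assumes every row has the same length.
long long maxPathSum(const std::vector<std::vector<int>>& a) {
    if (a.empty() || a[0].empty()) {
        return 0;
    }
    const std::size_t n = a.size();
    const std::size_t m = a[0].size();
    std::vector<std::vector<long long>> s(n, std::vector<long long>(m, 0));

    s[0][0] = a[0][0];
    for (std::size_t i = 1; i < n; ++i) {
        s[i][0] = s[i - 1][0] + a[i][0];   // first column: only from above
    }
    for (std::size_t j = 1; j < m; ++j) {
        s[0][j] = s[0][j - 1] + a[0][j];   // first row: only from the left
    }
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t j = 1; j < m; ++j) {
            s[i][j] = a[i][j] + std::max(s[i - 1][j], s[i][j - 1]);
        }
    }
    return s[n - 1][m - 1];
}
```

For the board {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}} the best path goes 1, 4, 7, 8, 9 for a sum of 29.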

Fourth, massive data processing

Problem: There are two URL files, each with 2 billion records, and each record is a URL of about 1KB. The files contain duplicate URL records. How do you remove the duplicates?

In the first-round interview I had learned that a sorted array can be deduplicated quickly, so my first thought was to sort. But how do you sort two files this large on a single machine? With external sorting, that is, a K-way merge sort. The interviewer then went on to ask me about K-way merge sort and how to estimate its running time. Since a K-way merge sort spends most of its time on disk IO, I guessed that disk IO is the bottleneck. Each element ends up going in and out of the disk 4 times, so my estimate was: number of elements * 4 * average disk IO time. I don't know whether this estimate is right.
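The merge phase with deduplication can be sketched as follows. This is an illustration only: the in-memory vectors stand in for sorted run files on disk, and the name mergeRunsUnique is hypothetical. A min-heap always yields the smallest remaining record, so duplicates arrive next to each other and can be dropped by comparing with the last record written.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Merge k sorted runs into one sorted sequence without duplicates.
// Each inner vector plays the role of one sorted run file (assumption).
std::vector<std::string> mergeRunsUnique(
        const std::vector<std::vector<std::string>>& runs) {
    using Entry = std::pair<std::string, std::size_t>;  // (record, run index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<std::size_t> next(runs.size(), 0);      // read position per run

    // Seed the heap with the first record of every non-empty run.
    for (std::size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) {
            heap.emplace(runs[r][0], r);
            next[r] = 1;
        }
    }

    std::vector<std::string> out;
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        // Equal records come out consecutively, so one comparison suffices.
        if (out.empty() || out.back() != top.first) {
            out.push_back(top.first);
        }
        // Refill the heap from the run the record came from.
        const std::size_t r = top.second;
        if (next[r] < runs[r].size()) {
            heap.emplace(runs[r][next[r]], r);
            ++next[r];
        }
    }
    return out;
}
```

In a real external sort the runs would be read and the output written in large buffered blocks, which is where the disk IO cost discussed above comes from.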

As for extending to multiple machines, a MapReduce program should be able to do it, but I don't know much about Hadoop, so I did not give this answer. Scaling out to multiple machines naturally means divide and conquer, and I thought a multi-machine K-way merge sort would also work. Then the interviewer hinted: could it be done without sorting? I realized that the original problem only asks to remove duplicates, so splitting the data and using hashing should give a very fast algorithm. (Looking back at the book "Large Web Site Architecture", the consistent hashing and Simhash approaches should address this kind of problem.)

My very first thought had in fact been hashing, because I had previously seen the problem of finding the most visited IP, which is solved by hashing the IPs, but I did not know how to apply the hash here. Over the next two days I will study hash-based processing of massive data specifically.
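One way the split-and-hash idea could look, as a small in-memory sketch. The name dedupeByHashPartition and the bucketCount parameter are made up for illustration; in the real problem each bucket would be a separate file or machine sized to fit in memory. Identical URLs always hash to the same bucket, so each bucket can be deduplicated independently.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Partition URLs by hash, then deduplicate each partition on its own.
// bucketCount must be greater than zero. The buckets are in-memory
// vectors here; at real scale they would be files or machines (assumption).
std::vector<std::string> dedupeByHashPartition(
        const std::vector<std::string>& urls, std::size_t bucketCount) {
    std::vector<std::vector<std::string>> buckets(bucketCount);
    std::hash<std::string> hasher;

    // Step 1: split. Equal URLs always land in the same bucket.
    for (const std::string& url : urls) {
        buckets[hasher(url) % bucketCount].push_back(url);
    }

    // Step 2: deduplicate each bucket with a hash set.
    std::vector<std::string> unique;
    for (const std::vector<std::string>& bucket : buckets) {
        std::unordered_set<std::string> seen;
        for (const std::string& url : bucket) {
            if (seen.insert(url).second) {   // true only on first sighting
                unique.push_back(url);
            }
        }
    }
    return unique;
}
```

No sorting is involved, and the output order is not meaningful. The buckets can be processed in parallel because no URL can appear in two different buckets.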

Reference Documents http://www.cnblogs.com/weixliu/p/3900633.html

Fifth, a chess problem

Title: In Chinese chess, consider the shuai (red general), the jiang (black general), and one advisor next to the jiang. Output all of their valid position combinations.

When I first saw this problem I got stuck for a moment, but after numbering the palace cells 1, 2, 3, 4, 5, 6, 7, 8, 9 it became clear. My code had a lot of if/else branches enumerating three classes of cases. Later, with the interviewer's hints, I merged the conditions, which saved a lot of code.

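#include <stdio.h>
#include <stdbool.h>

// s: cell of the shuai, j: cell of the jiang, jsb: cell of the advisor beside the jiang.
// Cells in each palace are numbered 1..9, so cell % 3 identifies the column.
bool validPosition(int s, int j, int jsb) {
    // Two pieces cannot occupy the same cell.
    if (jsb == j)
        return false;
    // Invalid when the jiang and the shuai face each other on the same column
    // and the advisor is not in front of the jiang to block them.
    if (s % 3 == j % 3 && !(jsb % 3 == j % 3 && jsb < j))
        return false;
    return true;
}

int main(void) {
    for (int s = 1; s <= 9; s++) {
        for (int j = 1; j <= 9; j++) {
            for (int jsb = 1; jsb <= 9; jsb += 2) {    // advisor cells: 1, 3, 5, 7, 9
                if (validPosition(s, j, jsb))
                    printf("%d,%d,%d\n", s, j, jsb);
            }
        }
    }
    return 0;
}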
