Brainstorming: a classic algorithm problem behind Linux commands commonly used in a project

Source: Internet
Author: User

When I was a child, my family subscribed to the monthly magazine "Reader", which once carried this story: one day a beautiful woman suddenly arrived in a remote village. She settled there with her children, and the family became local gentry. Her descendants kept her secret for generations, until it could no longer bring disaster on the family. She was Chen Yuanyuan. In those years Wu Sangui led the Qing troops through the pass "in a towering rage for the sake of a beauty" and rewrote the history of China, yet she was the one who managed to walk away unharmed.

On Friday I did my routine check of the data and logs of the offline data push project. After splitting the log with awk, I wanted to know the top 10 data IDs that were sent repeatedly in the real-time data, and how many times each was sent, to find a starting point for further optimization. God knows how many optimizations I have already made to this project. So the Linux command is:

cat Transmission.log | grep 'incrementalbumservice.java:146' | awk '{print $6}' | awk -F ',' '{print ...}' | sort | uniq -c | sort -nr | head

The results made me rather sad.

(For data security reasons the output is not shown, because it would reveal the ID rules of our project.)

This is partly related to how the upstream side operates, and I am supposed to detect changes before data is sent out, but with such a high resend rate there are clearly points to optimize, whether in the interface of the update service or in the offline service. My way of thinking has always been different from those "male gods" who need a hair dryer and a hand-held watering can to create a picturesque scene. Besides this result, it also reminded me of another classic algorithm problem: given a text file of about 10,000 lines with one word per line, find the 10 most frequent words.
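As a baseline for the word-count problem just described, here is a minimal Python sketch that counts with a hash table. The function name top_words is my own, and it assumes the whole word list fits in memory.

```python
from collections import Counter


def top_words(lines, k=10):
    """Return the k most frequent non-empty lines as (word, count) pairs."""
    counts = Counter(line.strip() for line in lines if line.strip())
    # most_common sorts by count, descending; equal counts keep first-seen order.
    return counts.most_common(k)
```

For example, top_words(["apple", "pear", "apple", "fig", "pear", "apple"], k=2) gives [("apple", 3), ("pear", 2)].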

This is exactly the problem the Linux pipeline above solves with sort | uniq -c | sort -nr | head. Its overall time complexity is the largest of the following steps:

1> First comes a sort. The usual candidates:

Direct insertion sort: repeatedly inserts elements into an already ordered list; worst-case time complexity is O(n^2).

Shell sort: diminishing-increment insertion sort; unstable; depends on the choice of increment sequence; worst-case time complexity is O(n^2) for simple sequences.

Simple selection sort: selects the minimum (or maximum) of the unsorted elements and swaps it into the first unsorted position; worst-case time complexity is O(n^2).

Binary selection sort: each pass of simple selection sort settles two elements (the minimum and the maximum), which halves the number of passes; it is still O(n^2).

Heap sort: tree selection sort using a max-heap or a min-heap; worst-case time complexity is O(n log n).

Bubble sort: compares each pair of adjacent elements and swaps them when out of order; worst-case time complexity is O(n^2).

Quick sort: picks a pivot element and partitions the elements around it on each pass; worst-case time complexity is O(n^2), O(n log n) on average.

Merge sort: merges two ordered lists into a new ordered list; worst-case time complexity is O(n log n).

Bucket sort: trades space for time; complexity is close to O(n) when the keys are evenly distributed.

Radix sort: distributes and collects the keys one digit at a time; time complexity is O(d*n) for keys of d digits.

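2> uniq -c makes a single pass over the sorted lines, comparing each line only with its neighbor, so its time complexity is O(n).

3> The second sort has the same order of time complexity as 1>.

4> The output is already in order, so taking the first 10 lines with head is O(1).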
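To make the four steps concrete, here is a rough Python sketch of what the pipeline does stage by stage. It only illustrates the cost of each stage and is not how sort or uniq are implemented; the name sort_uniq_top is made up, and everything is held in memory.

```python
from itertools import groupby


def sort_uniq_top(lines, k=10):
    """Mimic sort | uniq -c | sort -nr | head on an in-memory list of lines."""
    # Step 1, sort: O(n log n). Equal lines end up next to each other.
    ordered = sorted(lines)
    # Step 2, uniq -c: one linear pass that collapses runs of equal lines.
    counted = [(len(list(group)), word) for word, group in groupby(ordered)]
    # Step 3, sort -nr: O(m log m) for m distinct lines, largest count first.
    counted.sort(key=lambda pair: pair[0], reverse=True)
    # Step 4, head: just the first k rows of an already ordered list.
    return counted[:k]
```

With the input ["b", "a", "b", "c", "a", "b"] and k=2 this returns [(3, "b"), (2, "a")]. Lines with equal counts may come out in a different order than with the real commands, because this sketch breaks ties by the order left over from step 1.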

Which algorithm is used also depends on the size of the file. If the file is too large and holds too much data, it has to be split into pieces that are sorted separately and then combined with a multi-way merge.
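The split, sort, and multi-way merge idea can be sketched like this. It is a simplified illustration under my own assumptions: the name external_sort and the chunk_size parameter are invented, and the input lines must not contain newline characters.

```python
import heapq
import tempfile


def _read_run(run):
    """Yield the lines of one sorted run without their trailing newline."""
    for line in run:
        yield line.rstrip("\n")


def external_sort(lines, chunk_size=1000):
    """Yield lines in sorted order using sorted runs on disk and a k-way merge."""
    runs = []
    chunk = []

    def flush():
        # Sort one chunk in memory and spill it to a temporary file.
        run = tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="\n")
        run.writelines(line + "\n" for line in sorted(chunk))
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    try:
        # heapq.merge lazily merges the already sorted runs.
        yield from heapq.merge(*(_read_run(run) for run in runs))
    finally:
        for run in runs:
            run.close()
```

Only one chunk is sorted in memory at a time, and the merge keeps a single pending line per run, so memory use is bounded by chunk_size plus the number of runs.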

Without Linux commands, the classic solution is to count the word frequencies with a dictionary tree and then use a max-heap. First, the dictionary tree, also called a trie. Search engines commonly use it for text frequency statistics, and word segmentation algorithms use it as a basic data structure, so I know a little about it. Its advantage is that it minimizes unnecessary string comparisons, and its lookups can be more efficient than those of a hash table. The core idea is to share common prefixes to reduce query time, trading space for time. So the first thing that comes to mind is the dictionary tree. If you maintained an array of the ten highest frequencies while counting, every iteration of the loop would cost up to 10 extra comparisons. So it is more time-efficient to finish counting first and then take the top 10.
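A minimal sketch of counting word frequency with a dictionary tree might look like this. The class and method names (TrieNode, Trie, add, count, items) are my own choices for illustration, and each node simply keeps its children in a dict.

```python
class TrieNode:
    """One node per character; count > 0 marks the end of a stored word."""

    def __init__(self):
        self.children = {}
        self.count = 0


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, word):
        """Insert one occurrence of word, creating nodes along its path."""
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        node.count += 1

    def count(self, word):
        """Return how many times word was added (0 if never)."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return 0
        return node.count

    def items(self):
        """Yield (word, count) for every stored word, depth-first."""
        stack = [(self.root, "")]
        while stack:
            node, prefix = stack.pop()
            if node.count:
                yield prefix, node.count
            for ch, child in node.children.items():
                stack.append((child, prefix + ch))
```

Words such as "tea" and "ten" share the nodes for "t" and "e", which is where the prefix sharing comes from. The pairs produced by items() are what the heap step would then consume.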

To be honest, I do not really understand algorithms; I just know how to use them. A colleague of mine read an article I wrote and asked me, "Is a feed stream very technical work?" His question reminded me of the scene in "Chinese Paladin" where Li Xiaoyao, playing the big spender in a restaurant, orders what he calls the most expensive dish: "beef fried with greens". Everyone bursts out laughing, and Ling'er asks him, "Brother Xiaoyao, is beef fried with greens a very expensive dish?" Although my colleague was sincerely asking for advice, I felt like the one who had never seen the world. Anyone can implement a feed stream as business logic; whether there is any technical depth depends on how you do it. I have written a patent that describes a method of assembling a feed stream. Because the application process is not finished, I will not disclose the algorithm before then. But think about it, and there are still plenty of points to optimize. The year before last I still liked using the friend circle (WeChat Moments), and I often found that posts I had deleted reappeared, or that the latest posts, mine or other people's, were suddenly all gone, leaving only very old data from, say, two years earlier, and then recovered automatically a day later. That is a matter of strategy. The friend circle has many such problems. But since our product manager, adored by everyone, is a family member of the architect, I will not complain too much.

Although today is Sunday and I can let my imagination run wild, there still has to be a topic. The previous example contains a classic top K problem. Because search engines often need to count the hottest query strings, the top K problem is a foundation. The top K problem is solved with a min-heap. Maintain a min-heap of size K and traverse the elements, comparing each one with the root. If the element is smaller than the root, it cannot be among the top K and is discarded. If it is larger than the root, the root is discarded and the new element takes its place. Then readjust the tree into a min-heap and continue comparing.
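The procedure above can be sketched with Python's standard heapq module, which implements a min-heap on a list. The function name top_k and the (word, count) input format are my own assumptions; ties in count are broken by comparing the words.

```python
import heapq


def top_k(pairs, k=10):
    """Return the k (word, count) pairs with the largest counts, largest first."""
    if k <= 0:
        return []
    heap = []  # min-heap of (count, word); heap[0] is the weakest of the current top k
    for word, count in pairs:
        if len(heap) < k:
            heapq.heappush(heap, (count, word))
        elif (count, word) > heap[0]:
            # Larger than the root: evict the root and sift the newcomer into place.
            heapq.heapreplace(heap, (count, word))
        # Otherwise it cannot be in the top k and is discarded.
    return [(word, count) for count, word in sorted(heap, reverse=True)]
```

For example, top_k([("a", 5), ("b", 2), ("c", 9), ("d", 7)], k=2) returns [("c", 9), ("d", 7)]. Each element costs at most O(log K), so the whole pass is O(n log K) rather than the O(n log n) of a full sort.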

A min-heap is a complete binary tree in which the value of every non-leaf node is no greater than the values of its child nodes. If this rule is broken, the tree is adjusted bottom-up, starting from the last non-leaf node and working back to the root.
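Here is a small sketch of that adjustment on a heap stored in a plain list, where the children of index i sit at 2*i+1 and 2*i+2. The function names sift_down, build_min_heap and is_min_heap are only for illustration.

```python
def sift_down(heap, i, size):
    """Restore the min-heap rule for the subtree rooted at index i."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < size and heap[left] < heap[smallest]:
            smallest = left
        if right < size and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return
        # Swap with the smaller child and keep sinking.
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest


def build_min_heap(values):
    """Heapify in place, from the last non-leaf node up to the root."""
    size = len(values)
    for i in range(size // 2 - 1, -1, -1):
        sift_down(values, i, size)
    return values


def is_min_heap(values):
    """Check that no node is smaller than its parent."""
    return all(values[(i - 1) // 2] <= values[i] for i in range(1, len(values)))
```

In the top K procedure only the root is replaced, so a single sift_down(heap, 0, len(heap)) is enough to restore the heap after each replacement.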

I have decided to interview at Hulu next week. There is nothing to lose by trying, though I probably will not pass. Two years ago a former colleague referred me to Amazon, and I never even got an interview; I comforted myself that they probably were not really hiring at the time. I have never interviewed at a foreign company like this, so I do not know the routine. If I start preparing now, I should be about ready by the October 1 holiday. I think my interview performance is not entirely bad, just unstable.

Friends who read my articles probably think I write in a messy, miscellaneous way. I am like that in life, too. My knowledge is broad, and I am whimsical and reckless. On one hand this lays the foundation for my creativity; on the other it affects how I perform on the spot. The brain is like a computer. Mine runs a lot of programs in parallel, with not enough memory and a lot of data, so memory paging causes constant swapping to disk, and an otherwise valid operation can easily time out before it returns. I hold so many technology patents that I can no longer remember what I invented. Just now on the bus, because there were few passengers, the driver asked me where I was getting off, meaning he would skip the stops where nobody got off. It took me a long time to remember. My brain mostly runs in asynchronous non-blocking mode, but an interview works better in synchronous mode.

Still, no problem is without a solution; failing to find one only means my ability falls short, and there is nothing to argue about. An interview examines overall ability, such as teamwork and communication skills. I believe nobody in our department would object to the phrase "quiet and very clever", and I believe colleagues inside and outside the department do not find me hard to communicate with or hard to get along with. But in an interview I would probably forget how to speak. If I fail an interview for that reason, I have no complaints. The interviewers are one's future colleagues and leaders; if we are not on the same wavelength, I might not be able to show my ability after joining anyway. If you do badly in an interview and still feel your ability is sufficient, your horizons are probably not broad enough, and you have not yet seen what truly excellent people look like. But I am the kind of person who does not turn back until I hit the wall. If I decide to give something up, it is because it is not worth doing.

I like working, and my goal is to still be doing creative work at the age of 60. So I fear that a domestic internet company would make me retire at 40. There is one more important thing: I want to build my own search engine middleware. At a domestic internet company, where the job takes up most of my time, I am afraid I would hardly have the energy for it. Of course, even if I do not get into Hulu, I will still work on the search engine. It is just a question of how to allocate my time.

I actually like hitting walls, probably because I do not want to grow up so soon. To appear mature and graceful every day, you have to hide whatever you are not good at and avoid anything that might go wrong. Every day would then be pleasant, but a whole lifetime might pass just like that. History has many famous figures who started out as playboys and became great only after a fall. Books describe two kinds of turning points in life: meeting a mentor, and running into setbacks. When people are young and open-minded, a mentor who opens up their thinking can bring a sudden insight. As experience grows, people become more selective about the information they take in, and by then it probably takes many setbacks to make them rethink life. If I can see a better future, I am willing to stake everything and burn my bridges. A life of ups and downs is better than years that all look like one day. Life should be lived wonderfully ~~
