The charm of the shell in the age of big data: some thoughts prompted by a Baidu big data interview question

Source: Internet
Author: User

For developers working on Linux, the shell is a basic skill.

For operations engineers it is a required skill as well, and for release teams and software configuration management it plays a critical role too. This is especially true now that distributed systems are in full swing and so many open source projects are distributed (as if a system that is not distributed were too embarrassing to show anyone). The shell is just as important in configuring and managing distributed systems, even if the job is often nothing more than copying files around. But then, that is exactly what the shell was born to do.


Of course, the above is not the subject of this article. The topic of this article is the role of the shell in the field of big data analysis.

Consider a classic Baidu interview question:

Given a user log file in which each line records one user query string of 1 to 255 bytes, with tens of millions of lines in total, list the most frequent queries (the top 100, judging by the solution in this article). You may construct the log yourself.

For those who use C++ or Java, this is not something you can turn into working code in a few minutes; however you go about it, it takes dozens of lines.

Of course, that also makes it a fair test of a candidate's basic programming and design ability.

But I believe that if you solve it with the shell, the interviewer (or at least I) will be very pleased, because the shell was born to do this.

One line of code does it:

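awk '{print $1}' "$file" | sort | uniq -c | sort -k1nr | head -n 100

Here $file holds the path to the log. Note that $1 selects only the first whitespace-separated field of each line; if queries may contain spaces, print $0 (the whole line) instead.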
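The pipeline can be tried on a tiny, made-up log. The following is a minimal sketch, not the author's code: the helper name top_queries and the sample queries are assumptions for illustration, and it counts whole lines so that queries containing spaces stay intact.

```shell
# top_queries FILE N: print the N most frequent lines of FILE with their counts.
# (Hypothetical helper name, not from the original article.)
top_queries() {
    # sort groups identical lines, uniq -c counts each group,
    # the second sort orders by count (numeric, descending).
    sort "$1" | uniq -c | sort -k1,1nr | head -n "$2"
}

# Build a small sample log: one query string per line (made-up data).
log=$(mktemp)
printf '%s\n' 'weather today' 'news' 'weather today' 'maps' 'news' 'weather today' > "$log"

# Show the two most frequent queries.
top_queries "$log" 2

rm -f "$log"
```

On this sample the output lists "weather today" with a count of 3, followed by "news" with a count of 2. The first sort is required because uniq only merges adjacent identical lines.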

Don't worry about memory. Tens of millions of lines fit entirely in memory, and on today's clusters a node without tens of gigabytes of RAM would be embarrassed to show its face (of course, if the machines in your production environment have only single-digit gigabytes of memory, well...).
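A back-of-the-envelope check of the memory claim, as a sketch. The line count of 50 million is an assumed figure standing in for "tens of millions"; 256 bytes is the 255-byte maximum query length plus one newline.

```shell
lines=50000000        # assumed line count (hypothetical, not from the article)
bytes_per_line=256    # 255-byte maximum query plus a newline

# Worst-case raw input size in MiB (integer arithmetic).
total_mib=$(( lines * bytes_per_line / 1024 / 1024 ))
echo "worst-case input size: ${total_mib} MiB"
```

Under these assumptions the upper bound is roughly 12 GiB, and real queries are usually far shorter than 255 bytes. GNU sort also spills to temporary files when its buffer fills up, so the pipeline still completes on a machine with less memory, only more slowly.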


This is especially handy when you have just launched a feature of your own and want to look at the relevant data quickly: analyzing the data for a given period of time gives you a good picture of how the newly launched feature is performing, and so on.
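Restricting the analysis to a time window could look something like the sketch below. The article does not describe a log format, so everything here is an assumption: a tab-separated line of the form "timestamp, tab, query" with ISO 8601 timestamps (which compare correctly as plain strings), and a hypothetical helper named count_in_window.

```shell
# count_in_window FILE START END: rank the queries whose timestamp (field 1)
# falls within [START, END]. Assumes a hypothetical tab-separated format:
#   YYYY-MM-DDTHH:MM:SS<TAB>query
count_in_window() {
    awk -F '\t' -v s="$2" -v e="$3" '$1 >= s && $1 <= e { print $2 }' "$1" |
        sort | uniq -c | sort -k1,1nr
}

# Made-up sample data: five timestamped queries.
log=$(mktemp)
printf '%s\t%s\n' \
    '2024-05-01T09:59:00' 'maps' \
    '2024-05-01T10:05:00' 'weather today' \
    '2024-05-01T10:20:00' 'news' \
    '2024-05-01T10:40:00' 'weather today' \
    '2024-05-01T11:30:00' 'news' > "$log"

# Rank the queries issued between 10:00:00 and 10:59:59.
count_in_window "$log" '2024-05-01T10:00:00' '2024-05-01T10:59:59'

rm -f "$log"
```

For the sample data this reports "weather today" twice and "news" once; the entries at 09:59 and 11:30 fall outside the window and are ignored.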


Copyright notice: This is an original article by the blog owner and may not be reproduced without consent.
