The data engineer you don't know: 80% of the time goes to "big data drudgery"

Source: Internet
Author: User

Do you think data engineers spend their days typing away like http://hackertyper.net/ and turning out one great product after another? Wrong! A New York Times reporter interviewed a number of big data engineers, who said they spend 80% of their time on "big data drudgery" that outsiders can hardly imagine. It is tedious work: extracting useful data from massive amounts of raw data, sorting it, converting formats, and reshaping it into structured data in one uniform format that an algorithm can understand ...

As a result, these data engineers call themselves "data cleaners", "data porters", "data shapers" and so on ... Monica Rogati, vice president of data science at Jawbone, a leading maker of health-tracking wristbands, told the reporter that most people, even ordinary programmers, would find the job extremely dull and hard to bear, but for data engineers it is something they have to do every day.

Jeffrey Heer, a professor at the University of Washington and a founder of the big data start-up Trifacta, said that simply running an algorithm over a pile of raw data and expecting the results to pop out on their own is a myth ... So it is not surprising that data engineers have to convert enormous volumes of data in different formats into neatly formatted data that the algorithm can understand.
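To make that conversion step concrete, here is a minimal sketch that pulls the same kind of record out of a CSV source and a JSON source and reshapes both into one uniform structure. The field names (`name`, `steps`), the sample data, and the helpers `normalize_record` and `load_all` are all hypothetical, chosen only for illustration; real pipelines deal with far messier input.

```python
import csv
import io
import json

# Hypothetical raw inputs: the same kind of record in two different formats.
RAW_CSV = "Name,Steps\nalice, 1200\nBob,980\n"
RAW_JSON = '[{"name": "Carol", "steps": "1,500"}]'


def normalize_record(record):
    """Return a record with lowercase keys, a tidy name, and integer steps."""
    cleaned = {key.strip().lower(): str(value).strip() for key, value in record.items()}
    return {
        "name": cleaned["name"].title(),
        "steps": int(cleaned["steps"].replace(",", "")),
    }


def load_all(raw_csv, raw_json):
    """Parse both sources and return one sorted, uniformly structured list."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    rows.extend(json.loads(raw_json))
    return sorted((normalize_record(row) for row in rows), key=lambda r: r["name"])


records = load_all(RAW_CSV, RAW_JSON)
for record in records:
    print(record)
```

Even in this toy case, most of the code handles inconsistent capitalization, stray whitespace, and number formatting rather than any analysis.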

Iodine is a medical start-up. Its employees said that its products can warn users about drug side effects by mining raw data from the U.S. Food and Drug Administration (FDA), the National Health Center, and the text and images provided by pharmaceutical companies. But things are far less simple than you might think.

Mild drowsiness alone can be described in three ways: "drowsiness", "somnolence" and "sleepiness". A human reader understands all three words at once, but do not expect an algorithm to know that they mean the same thing.
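One simple way to handle this kind of problem is a hand-built synonym table that maps every variant to a single canonical term before counting. The sketch below only illustrates the idea and is not how Iodine actually does it; the table `CANONICAL_TERMS`, the helper `canonicalize`, and the sample reports are hypothetical.

```python
# Hypothetical synonym table: each variant maps to one canonical term.
CANONICAL_TERMS = {
    "drowsiness": "drowsiness",
    "somnolence": "drowsiness",
    "sleepiness": "drowsiness",
}


def canonicalize(term):
    """Map a reported side effect to its canonical name, if one is known."""
    key = term.strip().lower()
    return CANONICAL_TERMS.get(key, key)


# Made-up reports with inconsistent wording, case, and whitespace.
reports = ["Somnolence", "sleepiness ", "drowsiness", "nausea"]
counts = {}
for report in reports:
    name = canonicalize(report)
    counts[name] = counts.get(name, 0) + 1

print(counts)
```

Without the mapping step, the three spellings would be counted as three separate side effects.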

So most of the so-called "big data" start-ups are now working, through different channels and from different angles, toward the same goal: producing standardized, easy-to-use data processing software, so that data engineers no longer wear themselves out and can simply feed in all the raw data and get the results back. ClearStory Data, a start-up in Palo Alto, is doing something like this.

The company offers a product that integrates raw data in a variety of formats into a visual presentation such as a table, picture, or map. The company's CEO, Sharmila Shahani-Mulligan, said ClearStory's products can consolidate 6 to 8 different data formats, and that the results are suitable for end users who have no expertise in data.

You could also process the data by hand, but I bet you will never find enough data engineers to do it ...

(Responsible editor: Mengyishan)
