Do you think data engineers spend their days typing furiously, as on http://hackertyper.net/, and turning out one great product after another? Wrong! A New York Times reporter interviewed a number of big data engineers, who said they spend 80% of their time as "big data drudges", doing work quite unlike what most people imagine. It is tedious: extracting useful data from massive amounts of raw data, sorting it, converting formats, and reshaping it into structured data in a consistent format that an algorithm can understand ...
As a result, these data engineers call themselves "data cleaners", "data porters", "data shapers" and so on ... Monica Rogati, vice president of data science at Jawbone, a leading maker of health-tracking wristbands, told the reporter that most people, even ordinary programmers, would find the job extremely dull and unbearable, but for data engineers it is something they have to do every day.
Jeffrey Heer, a professor at the University of Washington and a founder of Trifacta, a big data start-up, said that simply running an algorithm over a pile of raw data and expecting results to pop out on their own is a fantasy ... It is not surprising, then, that data engineers need to convert massive amounts of data in different formats into neatly formatted data that an algorithm can understand.
Iodine is a medical start-up. According to its staff, its product warns users about drug side effects by mining raw data from the Food and Drug Administration (FDA) and the National Institutes of Health, along with text and images provided by pharmaceutical companies. But things are far less simple than you might think.
Mild drowsiness alone is described in three ways: "drowsiness", "somnolence" and "sleepiness". A user who sees these three words will certainly understand them, but do not expect an algorithm to understand that they mean the same thing.
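To make the point concrete, here is a minimal sketch of how software might be told that the three terms are equivalent. The synonym table and the function name are hypothetical, chosen only for illustration; this is not Iodine's actual method.

```python
# Hypothetical synonym table: an illustrative assumption, not a real vocabulary.
SIDE_EFFECT_SYNONYMS = {
    "drowsiness": "drowsiness",
    "somnolence": "drowsiness",
    "sleepiness": "drowsiness",
}


def normalize_side_effect(term: str) -> str:
    """Map a raw side-effect term to one canonical label."""
    key = term.strip().lower()
    # Unknown terms pass through (trimmed and lowercased) rather than being dropped.
    return SIDE_EFFECT_SYNONYMS.get(key, key)


# Made-up raw terms, as they might appear in differently formatted sources.
raw_reports = ["Drowsiness", " somnolence ", "SLEEPINESS", "nausea"]
normalized = [normalize_side_effect(term) for term in raw_reports]
print(normalized)
```

A hand-written table like this only covers the terms someone thought to list, which is part of why the cleanup work is so laborious at scale.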
So what the so-called "big data" start-ups have been doing lately is, by and large, approaching one task through different channels and entry points: producing standardized, simple data-processing software so that data engineers are less worn out and can simply feed in all the raw data and pull out the results. ClearStory Data, a start-up in Palo Alto, is doing something like this.
The company offers a product that integrates raw data in a variety of formats into a visual presentation as a table, chart, or map. The company's CEO, Shahani-Mulligan, said ClearStory's product can consolidate 6 to 8 different data formats, and that the results are suitable for end users with no data expertise.
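As a rough illustration of what consolidating formats involves, the sketch below reads two made-up sources, one CSV and one JSON, into a single common record shape. The sample data, field names, and function names are all assumptions for the example and say nothing about how ClearStory's product works.

```python
import csv
import io
import json

# Made-up sample inputs in two formats (assumptions for illustration only).
CSV_TEXT = "drug,side_effect\nDrugA,drowsiness\n"
JSON_TEXT = '[{"name": "DrugB", "effect": "nausea"}]'


def records_from_csv(text):
    """Read CSV rows into the common {'drug', 'side_effect'} shape."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"drug": row["drug"], "side_effect": row["side_effect"]} for row in reader]


def records_from_json(text):
    """Rename the JSON source's fields to the common shape."""
    return [{"drug": item["name"], "side_effect": item["effect"]} for item in json.loads(text)]


# Once every source yields the same shape, the records can be combined directly.
combined = records_from_csv(CSV_TEXT) + records_from_json(JSON_TEXT)
for record in combined:
    print(record)
```

Each additional format needs its own small adapter like these, so the effort grows with every new source.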
You could also process the data by hand, but I bet you will never find enough data engineers to do it ...
(Responsible editor: Mengyishan)