Big-data scientist sounds like a glamorous job, but much of it is "moving bricks"


(Originally published in The New York Times; compiled by Huxiu intern Wei Cen)

The buzzword "big data" covers a wide variety of digital data, from networks and sensors to mobile phones and computers. Mining this data with intelligent software can yield many discoveries, and it makes data-driven decision-making possible in many fields. This is why data scientist has become a popular career. But do you know what data scientists actually spend their time doing?

Organizing data

In fact, data scientists spend a large share of their time sorting out messy data until it is fit for the analysis known as "data mining". This meticulous, tedious work is not as easy as one might imagine. We are in the middle of a modern big-data wilderness: the data has been collected and is available, but the land still has to be cleared before anything can grow on it.

Timothy Weaver, the CIO of Del Monte Foods, calls the data-wrangling predicament big data's "iceberg" problem: people see only the results, not the large amount of labor hidden beneath them.

But the problem also brings opportunity. Some startups are trying to break this big-data bottleneck by developing software that automatically collects, cleans, and organizes data.

In the future, more and more data sources will be available to reveal how a company operates. In the food industry, for example, the available data include production volumes, origin and transportation, weather, retail sales, and reviews on social networks. Together these serve as signals of shifts in sentiment and demand. As a result, a company can see every step of its operation more clearly than ever before and can begin to tailor its production plans and inventories accordingly.

However, problems arise when different categories of data are brought together. Data from sensors, documents, the web, and traditional databases come in different formats and must be cleaned and converted into a uniform format before an algorithm can work with them.
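
As a rough illustration of that conversion step, the Python sketch below maps two differently shaped inputs onto one common record layout. It is not taken from the article: the field names (epoch, store_id, reading, time, store, amount), the sample values, and the target layout are all hypothetical.

```python
import csv
import io
import json
from datetime import datetime, timezone


def _record(timestamp, store, value):
    # The uniform layout every source is converted into (an assumed schema).
    return {
        "timestamp": timestamp.isoformat(),
        "store": store.strip().upper(),
        "value": float(value),
    }


def from_sensor_csv(text):
    # Hypothetical sensor export: CSV with epoch seconds and a reading.
    for row in csv.DictReader(io.StringIO(text)):
        ts = datetime.fromtimestamp(int(row["epoch"]), tz=timezone.utc)
        yield _record(ts, row["store_id"], row["reading"])


def from_web_json(text):
    # Hypothetical web feed: JSON list with ISO-style UTC timestamps.
    for item in json.loads(text):
        ts = datetime.strptime(item["time"], "%Y-%m-%dT%H:%M:%SZ")
        yield _record(ts.replace(tzinfo=timezone.utc), item["store"], item["amount"])


# Made-up sample inputs, one per source format.
SENSOR_CSV = "epoch,store_id,reading\n0, s1 ,21.5\n"
WEB_JSON = '[{"time": "1970-01-01T00:00:00Z", "store": "S1", "amount": "3"}]'

records = list(from_sensor_csv(SENSOR_CSV)) + list(from_web_json(WEB_JSON))
for record in records:
    print(record)
```

Each source gets its own small parser, and all of them funnel through a single record builder, so a new source only needs one more parser function.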

Human language

Data format is only one problem; another is the ambiguity of human language. Iodine is a health start-up that gives customers information about the side effects and interactions of drugs. But the FDA's terminology for the same side effect often varies slightly: "drowsiness", "somnolence" and "sleepiness" are all in use. Humans recognize these as synonyms, but software has to be programmed to read them that way. This kind of painstaking work has to be repeated again and again in data projects.
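
A minimal sketch of what "programming in" that recognition can look like: a hand-written lookup table that folds variant terms onto one canonical label before counting. The table, the function names, and the sample reports are hypothetical and do not describe Iodine's actual method.

```python
from collections import Counter

# Hypothetical synonym table: variant term -> canonical label.
SYNONYMS = {
    "drowsiness": "drowsiness",
    "somnolence": "drowsiness",
    "sleepiness": "drowsiness",
    "sleepy": "drowsiness",
}


def normalize_side_effect(term):
    # Lowercase and trim, then map known variants to the canonical label.
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def count_side_effects(reports):
    # Count reports after normalization so synonyms are tallied together.
    return Counter(normalize_side_effect(r) for r in reports)


# Made-up sample reports.
reports = ["Sleepy", "drowsiness", " Somnolence ", "nausea"]
counts = count_side_effects(reports)
print(counts)
```

Terms missing from the table pass through unchanged, which is one reason this kind of mapping has to be revisited whenever new data arrives.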

Data experts try to automate every step of the process. "In practice, though, because the data is so complex, you spend a good long time as a data janitor before you get to the fascinating results," said Matt Mohebbi, a data scientist and founder of Iodine.

Data-preparation software could do more than save scientists time; it could become a major contributor to big-data computing as a whole.

History shows that a new technology starts out in the hands of a small elite. Over time, as technical progress and investment increase, the tools grow more powerful, the economics improve, businesses begin to adapt, and the technology eventually enters the mainstream. In the big-data age, this pattern still holds.

John Akred, CTO of Silicon Valley Data Science, sees a similar pattern in the development of the modern data world. "We are witnessing the beginning of that revolution, one dedicated to making data problems solvable by a far larger population," he said.

ClearStory Data, a start-up in Palo Alto, Calif., develops software that identifies and combines data from many sources and presents the results visually as charts, graphs and data maps. Its goal is to open big data to a much larger market of users through software.

A visual report typically draws on six to eight data sources. For example, a report for a retailer might include scanned point-of-sale data, weather forecasts, website traffic, competitors' pricing data, visits to the retailer's smartphone app, and video surveillance of parking-lot traffic. Collating all of this by hand would likely never keep up.
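
To give a sense of what combining such sources involves, here is a small Python sketch that blends rows from several sources on a shared date field. It is only an illustration under assumed inputs: the source names, field names, and sample values are invented, and it does not describe how ClearStory's software works.

```python
def blend_by_date(*sources):
    # Each source is a (name, rows) pair; every row is a dict with a "date" key.
    merged = {}
    for name, rows in sources:
        for row in rows:
            date = row["date"]
            combined = merged.setdefault(date, {"date": date})
            for field, value in row.items():
                if field != "date":
                    # Prefix each field with its source name to avoid clashes.
                    combined[f"{name}.{field}"] = value
    # Return one combined row per date, in chronological order.
    return [merged[date] for date in sorted(merged)]


# Made-up sample data for two of the sources a retail report might use.
sales = [
    {"date": "2014-08-01", "units": 120},
    {"date": "2014-08-02", "units": 95},
]
weather = [{"date": "2014-08-01", "high_c": 31}]

blended = blend_by_date(("sales", sales), ("weather", weather))
for row in blended:
    print(row)
```

Dates that appear in only some sources still produce a row, just with fewer fields, so the gaps stay visible instead of being silently dropped.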

Algorithms still cannot replace manual labor

Still, data scientists stress that manual labor remains essential in data preparation. "At first you prepare the data for a specific goal, but soon you find something new and your goals change," says Cathy O'Neil, a data scientist at Columbia Journalism School.

There is no doubt, though, that data scientists need sharper tools to ease the burden of data preparation. After all, as the saying goes, a craftsman who wants to do good work must first sharpen his tools.
