Scientific research and the misuse of the big data concept

Source: Internet
Author: User

As if a spring breeze swept in overnight, and ten thousand pear trees burst into bloom. In 2012 the "big data" trend took off and made "data" the most fashionable term in IT circles. Arguably no other piece of IT jargon has attracted so much attention or been used so widely. Beyond the traditional IT industry and the sectors close to it, businesses of every kind, such as catering, real estate, and finance, could not wait to announce their own "big data" strategies.

Microsoft Research's book The Fourth Paradigm: Data-Intensive Scientific Discovery added a fourth paradigm, data-intensive scientific research, to the three existing paradigms of experimental science, theoretical science, and computational simulation.



So the big data wind inevitably blew into the field of scientific research as well.

In this era of hype, a group of science and technology workers has stayed calm. The term "big data" was first proposed by the scientific community, but it is on the Internet that the idea has really been promoted and used. In particular, the commonly accepted characteristics of big data, from the original 3V or 4V to today's 11V, all match the characteristics of the data streams generated by the Internet. Does the scientific community really need them?


First, big data starts from the idea of "fast": data is produced fast, spreads fast, changes fast, and has to be processed fast. In scientific research, though, much of the data is not that fast. In many fields related to geographic information, such as land use, soil change, and administrative divisions, it is very common for the data to stay the same for years or to change only rarely.

Second, the question of dimensions. Big data carries the idea of gathering as much data as possible, whether or not the data can be used now and whether or not it is information we currently care about: collect whatever can be collected, fear neither breadth nor volume, fear only having too little. (Many companies and researchers have often fallen into a state of data obsession.) The popularity of NoSQL in particular has spread this attitude and led many researchers to cheer, "Mom no longer has to worry about my data storage schema ...". In science, however, the first thing to define is the scientific goal, and it must be defined clearly. The data structure should then be designed from the very beginning to serve the research objectives, so that the work is purposeful. Without detailed definition and design in advance, the goals weaken and get lost as the research proceeds.
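As an illustration of designing the data structure around a research goal, here is a minimal Python sketch. Everything in it is a hypothetical assumption made for the example and does not come from the article: the research question (soil organic carbon by land use), the `SoilSample` record, its field names, and the allowed land-use categories.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical categories, fixed in advance by the (assumed) research question.
ALLOWED_LAND_USE = {"cropland", "forest", "grassland", "urban"}


@dataclass(frozen=True)
class SoilSample:
    """One field record; only the fields the study needs are defined."""
    site_id: str
    x: float                   # easting or longitude of the sampling point
    y: float                   # northing or latitude of the sampling point
    sampled_on: date
    land_use: str
    organic_carbon_pct: float

    def __post_init__(self):
        # Reject records that do not fit the design instead of storing them as-is.
        if self.land_use not in ALLOWED_LAND_USE:
            raise ValueError(f"unknown land use: {self.land_use!r}")
        if not 0.0 <= self.organic_carbon_pct <= 100.0:
            raise ValueError("organic_carbon_pct must be between 0 and 100")


sample = SoilSample("S001", 116.4, 39.9, date(2014, 9, 1), "cropland", 1.8)
```

A record that does not fit the design, for example one with `land_use="ocean"`, raises `ValueError` when it is created, rather than surfacing later during analysis.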

Third, the question of data value. Internet data can be described as effortless to obtain, especially at the Internet companies we like to cite, such as Twitter, Google, and Facebook. In scientific research, by contrast, every piece of data is hard-won: whether it comes from an experiment or from fieldwork, each record can carry a very high cost in labor and time.

Getting more data is the ideal, but when every record is expensive, it is almost impossible for scientific research to reach the data volumes of the Internet.

Of course, in the big data era, big data is not only about large volume; it also includes the idea of analyzing the complete data set rather than a sample.

In scientific research, obtaining and analyzing complete data is also only an ideal. Take geographic information: sampling sites exist as points. By the definition of geographic feature types, a point feature has only an (x, y) position; it indicates a location and cannot represent an extent. So however many samples are collected, they can never cover the entire study area. This is why algorithms that estimate the whole from samples, including spatial sampling and statistical analysis, are so important in geographic information.
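To make "estimating the whole from samples" concrete, here is a minimal Python sketch of inverse distance weighting (IDW), one common spatial interpolation technique. The article does not name a specific algorithm, so the choice of IDW is an assumption made for illustration, and the function name `idw_estimate`, the `power` parameter, and the sample values are likewise hypothetical.

```python
import math


def idw_estimate(samples, x, y, power=2.0):
    """Estimate the value at (x, y) from point samples by inverse distance weighting.

    samples: list of (sx, sy, value) tuples measured at point locations.
    Nearer samples get larger weights: weight = 1 / distance ** power.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return value  # the query point coincides with a sample point
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical measurements at the four corners of a 10 x 10 study area.
samples = [
    (0.0, 0.0, 10.0),
    (10.0, 0.0, 20.0),
    (0.0, 10.0, 30.0),
    (10.0, 10.0, 40.0),
]
center = idw_estimate(samples, 5.0, 5.0)
```

The center of the area is equally far from all four sample points, so its estimate is the plain average of the four values. Any other location is pulled toward its nearest samples, which is how a handful of points can yield an estimate for every position in the study area.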

Big data is a way of thinking, but it must not be applied dogmatically: do not pile up data for the sake of volume, and do not call something big data just because it matches some of the Vs. We need to understand each case properly and apply the idea accordingly. As Comrade Xiaoping said: black cat or white cat, the one that catches mice is a good cat!


Original link: http://blog.csdn.net/allenlu2008/article/details/39611789
