The Big Data Brain Drain: Why Research Is in Trouble

In recent years, however, the rapid shift of scientific research toward data-centered methods has had a negative side effect: the skills that make a successful scientific researcher and the skills that make a successful industry practitioner are increasingly indistinguishable. The famously inert academic community has only just begun to adapt to this shift, while other sectors have already begun to recognize and reward these skills at scale. Unfortunately, the result is that many talented would-be academic researchers are being driven into the arms of industry.

The unreasonable effectiveness of data

In 1960, the physicist Eugene Wigner published his article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences". It explores the degree, a staggering one, to which abstract mathematical concepts remain valid in contexts far beyond those in which they were conceived. After all, who would have thought that Riemann's 19th-century study of non-Euclidean geometry would become the basis for Einstein's rethinking of gravitation? Or that the codification of the rotation groups of abstract solids would eventually lead physicists to successfully predict the existence of the Higgs boson?

The Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira responded to this view in 2009 with an article titled "The Unreasonable Effectiveness of Data". The article offers a startling insight: given enough data, the careful choice of mathematical model ceases to matter so much. In particular, for the automated language translation they were studying, "simple models and a lot of data trump more elaborate models based on less data."
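
To make the claim concrete, here is a minimal sketch, my own illustration rather than an experiment from Halevy et al.'s article, comparing an elaborate model trained on little data against a simple model trained on much more data. It uses scikit-learn on synthetic data; the dataset and model choices are assumptions made purely for illustration.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Synthetic classification problem standing in for a real task.
    X, y = make_classification(n_samples=50_000, n_features=20,
                               n_informative=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=10_000, random_state=0)

    # Elaborate model, little data: an RBF-kernel SVM on 500 examples.
    fancy = SVC(kernel="rbf").fit(X_train[:500], y_train[:500])

    # Simple model, lots of data: logistic regression on 40,000 examples.
    simple = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    print("elaborate model,    500 examples:", fancy.score(X_test, y_test))
    print("simple model,    40,000 examples:", simple.score(X_test, y_test))

On many such tasks the simple model with more data matches or beats the elaborate one, though of course the outcome depends on the problem.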

If we are bold enough to assume that this insight can be extended (at least in part) beyond natural language processing, then the skill of data mining will gradually come to trump domain knowledge. I believe this prophecy is already being fulfilled: in many academic fields, the ability to work effectively with data is supplanting other, more classical modes of research.

I am not saying that mastery of a field is obsolete. Without an understanding of the theory of particle interactions, the data streaming from the Large Hadron Collider (LHC) at some 10 gigabytes per second would be useless, just as the roughly 15 terabytes per night of raw image data from the Large Synoptic Survey Telescope (LSST) can advance our understanding of cosmology only in the light of theories of the physical processes that drive stellar explosions. But the LHC and LSST reflect an increasingly common phenomenon: scientific results that depend entirely on careful analysis of large quantities of data. Indeed, we find that even where datasets are more modest, the researchers who can process, abstract, mine, and learn from data are increasingly the ones driving scientific progress.

The new scientist

In one sense, data-driven research is simply a continuation of long-standing trends. Ever since the scientific revolution of the 16th and 17th centuries separated science from Aristotelian philosophy, scientific progress has depended largely on experiment and observation. Recall that it was Tycho Brahe's seminal 16th-century observations of the sky that enabled Kepler's 17th-century laws of planetary motion, paving the way for Newton's law of universal gravitation and, eventually, Einstein's general theory of relativity. Scientists have always wrestled with data; what has changed is that this effort now sits at the heart of the scientific process.

The steady shift toward data-centric research, however, has produced a new breed of problem-solver. In the era of the LHC and the LSST, exciting research is driven by those who can explore massive datasets with high-performance parallel statistical algorithms, and who can apply new statistical methods, machine learning algorithms, and high-speed code at unprecedented scale. In short, the new scientist must be a broadly trained expert, proficient in statistics, computing, algorithms, and software design, in addition to whatever domain expertise the work demands. In fields from particle physics to biology, chemistry, neuroscience, oceanography, and atmospheric physics, research is becoming ever more data-driven, and the rate at which data is collected shows no sign of slowing.
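
As a flavor of what such data-intensive work looks like in practice, here is a minimal sketch, my own and not drawn from any particular project, of an out-of-core computation: reducing a dataset too large for memory by streaming it through NumPy in fixed-size chunks. The file name and sizes are hypothetical.

    import numpy as np

    def chunked_mean(path, dtype=np.float64, chunk_size=1_000_000):
        """Mean of a huge binary array, without loading it into memory."""
        data = np.memmap(path, dtype=dtype, mode="r")  # maps the file lazily
        total, count = 0.0, 0
        for start in range(0, data.size, chunk_size):
            chunk = data[start:start + chunk_size]
            total += chunk.sum(dtype=np.float64)  # accumulate in double precision
            count += chunk.size
        return total / count

    # Hypothetical usage, for a file of raw float64 measurements:
    # mean = chunked_mean("observations.dat")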

The fundamental role of scientific software

The common denominator of this kind of science is that the work is inseparable from writing code. High-quality, well-organized public code has a major impact on reproducibility, which is vital to the scientific process. Much has already been written about the current crisis of non-reproducible science, the need for new forms of publication that include code and data, and open access; I will not rehearse those arguments here.

What I want to discuss in detail here is the central role of optimized, specialized software in analyzing and abstracting large datasets, a process that has become the core of modern scientific research. My collaborator Gael Varoquaux and his colleagues recently published on exactly this point (see Gael's summary), making the case that public, well-organized, robust scientific code is essential both to the reproducibility of modern research and to the progress of research itself. Results that are merely mentioned in a paper, while the actual process that produced them remains undocumented, cannot serve as the foundation for new research. As Buckheit and Donoho put it:

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.
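
In that spirit, here is a minimal sketch, my own rather than an example from the paper being discussed, of what shipping "the complete set of instructions" can look like: a self-contained analysis script that pins its random seed and records the versions it ran under, so that others can regenerate its numbers exactly.

    import sys
    import numpy as np

    SEED = 42  # pinned so every rerun draws identical data

    def run_analysis(seed=SEED):
        rng = np.random.default_rng(seed)
        samples = rng.normal(loc=0.0, scale=1.0, size=10_000)
        return samples.mean(), samples.std()

    if __name__ == "__main__":
        mean, std = run_analysis()
        # Record the environment alongside the result.
        print(f"python={sys.version.split()[0]} numpy={np.__version__}")
        print(f"mean={mean:.6f} std={std:.6f}")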

Public code can look like an afterthought, but in general, simply releasing code is not enough. As Brandon Rhodes argued in his RuPy 2013 talk, it takes far more for a program to be useful to others than for it merely to run. Making code usable by anyone beyond its authors requires considerable investment. Projects that make that investment, such as NumPy and scikit-learn, are of incalculable value: they provide a framework within which code can be shared, reviewed, and published on GitHub for the benefit of the research community.
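
A small example of the payoff: because scikit-learn exposes every algorithm through the same estimator interface, a researcher can evaluate a method in a few lines instead of re-implementing it. The dataset and model below are arbitrary choices made for illustration.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(n_estimators=100, random_state=0)

    # The same call works unchanged for any scikit-learn estimator.
    scores = cross_val_score(model, X, y, cv=5)
    print("5-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))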

The fault of academia

Herein lies the malady of academia: although well-engineered, high-quality software sits at the heart of the current model of scientific research, and the practices that produce it contribute directly to research success, academia has effectively discouraged those practices. Under the "publish or perish" model that governs most research universities, papers are the de facto currency of the academic reward structure, and time spent building and documenting software tools is time not spent writing papers. Except in unusual circumstances, this makes promotion difficult for people who specialize in reusable open software. These unfortunate souls, who hope to contribute through the development of scientific software rather than through research papers, often find themselves on the margins of the academic community.

In a way, this disconnect has always existed. Academia has always rewarded certain skills and marginalized others: teaching, for one, has long been undervalued. But two key differences make the current disconnect far more worrying:

First, the skills being pushed to the margins of the academic reward structure, namely building and documenting software tools, are precisely the skills on which the success of modern research depends.

Second, while nearly everyone else in the world is embracing intensive data-mining tools, the very skills academia overlooks are among those industry prizes most highly.

This perfect storm is driving skilled researchers away from research and into industry. Software-focused positions do exist in academia, but they tend to offer low base salaries, little status, and few opportunities for promotion. Industry, by contrast, is extremely attractive: it is tackling interesting and pressing problems, it offers excellent salaries and benefits, it frees researchers from the treadmill of temporary postdoctoral positions, and it often even encourages research and publication on fundamental topics. In this situation, it would be a miracle if anyone stayed in academia.

The situation in my own fields of astronomy and astrophysics is especially worrying. The LSST, scheduled to see first light at the end of this decade, has set itself an extremely ambitious goal: processing 30TB of data per night, in real time, for ten years. To handle data on this scale, the project will need to recruit dozens of data-centric astronomy researchers over the coming years. Given the skills required, the current pay levels, and the prospects academia offers engineering-oriented careers, I doubt it will attract enough qualified candidates.
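
For a sense of scale, a back-of-the-envelope calculation from the figures quoted above; this is my own arithmetic, assuming for simplicity that the survey observes every night of the year.

    tb_per_night = 30      # quoted nightly data volume, in terabytes
    nights_per_year = 365  # simplifying assumption; real surveys observe fewer
    years = 10
    total_pb = tb_per_night * nights_per_year * years / 1000
    print(f"roughly {total_pb:.0f} petabytes over the full survey")  # ~110 PB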

How academia can adapt

I am not the only one thinking about these problems. I have discussed these themes with many people inside and outside academia, and I have learned that some policy-makers and funding agencies are wrestling with the same hard questions. The more practical question is how to solve them and keep the situation from getting worse. Complaining about academic culture is a favorite pastime of academics, and Deirdre McCloskey's law of academic prestige confirms part of this article's point: the more practical the field, the lower its status. She was lamenting the lowly status of teaching introductory courses, but the observation applies equally well to the present topic.

I believe prestige is the key. Academia moves cautiously in its competition with industry; granting more prestige to the developers of the software that data-driven research depends on would go a long way toward solving these problems. Researchers, funding agencies, and policy-makers can all take steps to push the process forward. Here are some suggestions:

Academic journals should continue to emphasize the importance of reproducibility. Reproducibility is an essential element of the scientific process itself, and it relies on high-quality open-source code. Treating that code as an integral part of a publication would raise the standing of software developers within the academic community.

Promote new standards for tenure and promotion review. The new standards should count the development and maintenance of public software alongside traditional publishing and teaching, so that time spent writing clean public code is no longer penalized.

Create and fund new academic career tracks spanning doctoral students, postdoctoral fellows, research staff, and tenured faculty. These positions should place special emphasis on, and reward, the development of public, cross-disciplinary research software, giving researchers willing to build and maintain shared foundational software a viable academic career path.

Raise the pay of postdoctoral research positions. This proposal may be controversial, but current salary levels are simply unsustainable. The NIH base salary for a newly graduated postdoc is below $40,000 per year, rising to only about $50,000 after seven years of postdoctoral work. Graduates proficient in building and maintaining software tools can earn several times that in industry, where their computational skills are respected and where they can apply those skills to problems that interest them. I worry that if adjustments like these are not made in time, academic research will run into serious trouble over the next few years.

We live in an exciting era. Our accelerating ability to collect, store, process, and learn from massive amounts of data is giving our scientific understanding of the world ever greater breadth and depth. To keep up this pace of discovery, we need to find ways to make talented researchers content to stay in the research community. This is not an easy problem to solve, but the effort will help ensure that scientific research remains healthy and sustainable far into the future.
