The era of "big data" has arrived. Big data has become the next major disruptive technology revolution in the IT industry, after cloud computing and the Internet of Things. As the technology spreads, big data is being applied in biology, finance, retail, energy, transportation and other fields, and it is permeating every aspect of people's lives.
Unlike in other fields, China started out in big data at the same point as other countries, so the big data era is also a new development opportunity for China. For this reason, this newspaper is opening a column to introduce the applications of big data in different fields and the difficulties facing China's big data development, and to welcome the big data era together with its readers.
Although China produces a very large amount of data, its biological big data still lags behind that of other countries.
World Cup forecasts, college entrance exams, what kind of men are most popular today... These familiar analyses all use big data, but few people know that big data has already extended its "reach" into biomedicine and begun to predict disease.
Countries in Europe and America attach great importance to development in the biological field. In March this year, Britain announced that the Medical Research Council (MRC) would invest £32 million to fund the first five major projects to improve capability, capacity and core infrastructure in medical bioinformatics. The "Medical Bioinformatics Program", with an expected total investment of £50 million, will address key medical problems by creating new ways of linking complex biological data with health records.
As early as March 2012, the Obama administration announced a "Big Data Research and Development Initiative" that raised big data to the level of national strategy, promising to invest more than $200 million. In 2014, the United States government also launched the Big Data to Knowledge program to make full use of biomedical big data.
Chinese companies are also making frequent moves in big data. But Li Yicho, director of the Shanghai Biological Information Technology Research Center and of the Biological Information Center at the Shanghai Life Science Research Institute of the Chinese Academy of Sciences, told a reporter from "China Science newspaper": "Although China's big data development in other areas far exceeds that abroad, its biological big data still lags behind other countries."
No free lunch
China's biological big data lags behind other countries, but not for lack of data.
According to Chen Runsheng, an academician of the Chinese Academy of Sciences and one of the earliest Chinese researchers in theoretical biology and bioinformatics, gene sequencing institutions, represented by Huada Gene, have made an important contribution to the production of biological big data.
China's current sequencing output is about 40% of the international total. As the technology develops and sequencing costs fall, almost all research universities and research institutes, whether in agriculture, forestry or medicine, have become involved in genetic sequencing.
"The development of biological big data at the genome level has led to the mass production of related biological data at other levels, such as the proteome, the metabolome and biological networks, but our country has not yet established a unified bioinformatics center," Chen Runsheng points out.
Li Yicho also believes that the most fundamental reason China's biological big data lags behind is that the country has no large, comprehensive biological database and no big data center platform for biology.
This means that when Chinese scientists carry out research, they can only "seek help" from large databases abroad. Although these databases claim to share data freely with anyone who submits an application, there is no free lunch.
The reporter learned that some large international biological databases require data users to submit detailed statements of how the data will be used. Even when the data in question were originally submitted by Chinese scientists, it is not easy to get them back out for use.
Li Yicho said: "Data sharing is free only in theory. In practice, the core data will not be handed over to the applicant promptly, and experts will find that the data they receive are non-core or incomplete. Large databases related to clinical medicine have a special committee to review data applicants, and if it rejects an application, it gives no reason."
Therefore, in order to qualify for the data, Chinese scientists often have to prepare application materials again and again, only to hear nothing further. "Being at the mercy of others is very frustrating," Li Yicho exclaimed.
Building a big platform runs into a "difficult problem"
A national biological database is usually a public-interest undertaking, and it needs long-term, stable investment and a professional technical team. According to experts, the relevant departments in China considered establishing a national-level data center as early as the 1990s, but to this day no specific implementation plan has been agreed.
So, why has the data centre been delayed?
Li Yicho told the reporter that Britain and the United States have invested heavily in establishing their databases. Most of the staff they recruit hold doctorates, and the annual costs are covered by congressional appropriations, a relatively stable source, at about 100 million dollars a year.
"If we set up a similar biological big data center, we would also need hundreds of people and a long-term, stable investment of hundreds of millions of yuan a year. If the money is to be paid by the government, the obstacles are still not small," Li Yicho said.
Chen Runsheng also points out that the scientific community agrees on the need for a national-level data platform, but there are many competing proposals on where the center should be located, what form it should take (physical or virtual), what it should cover, what rights are involved, and how and by whom it should be managed.
In this situation, the inability to centralize and uniformly deploy the country's biological data has become a difficult problem.
"For genome data you have to talk to one party, and for protein data you have to talk to another. Small data centers do exist, but each goes its own way, with no unified coordination or management. Without a national biological data center, the data cannot be coordinated," Chen Runsheng said of the difficulty.
At the same time, although China's output of biological data is large, its utilization rate is far from sufficient. The data contain valuable information, but there are many aspects to uncovering their full value. When the volume of data is large, analyzing and mining that value in a short time is a problem in itself.
Moreover, the generation of big data requires corresponding theories, techniques and methods to keep pace, and new tools are needed. Industry insiders pointed out that although China's existing capacity for analyzing biological big data does not trail that of the United States and Europe, its data analysis frameworks, software systems and advanced IT still need to be upgraded.
Rooted in China's "soil"
Li Yicho pointed out that although China's biological big data faces these obstacles, it can catch up with the international pace by "setting clear targets and cooperating closely".
In his view, China's advantage in developing biological big data is its massive number of samples.
What China should do now is protect its domestic biological data resources and find valuable ways to use them.
"The first thing is to clarify the goals and channels for developing biological big data. The second is to be open-minded and work together with domestic research units to do a good job with biological data," Li Yicho stressed.
For example, China's Medical Union project has already achieved some results in the Shanghai area.
Hospitals and community hospitals have linked residents' medical records and health files and established a database system that includes 34 million electronic medical records and files.
Li Yicho said that calling up health files and examination results with the patient's consent not only improves doctors' efficiency but also does not compromise privacy protection. Such small drops of data can eventually converge into an ocean, which benefits the protection and use of biological big data as a national strategic resource.
Chen Runsheng also pointed out that the development of biological big data should be down-to-earth, rooted in China's "soil", and should draw on the country's own characteristics.
"China's species are highly diverse, so it is entirely possible to form a distinctive biological big data system. Looking at the problem from this angle makes the development of biological big data more purposeful, and this is what we should pay attention to," Chen Runsheng said.