Biomedical big data: current status and prospects

Source: Internet
Author: User
Keywords: big data, biomedical, real time

Biomedical data cover a wide range of areas related to human health: clinical medicine, public health, pharmaceutical research and development, medical markets and costs, individual behavior and emotions, human genetics and genomics, social demography, the environment, and health networks and media data.

Big data is a dataset that, because it is too large and too complex, cannot be captured, managed, stored, retrieved, shared, transmitted, and analyzed by conventional software within a reasonable period of time.

Big data has four defining features, the "4V":

1. Volume: the data capacity is large, often at the PB level or above (1 PB = 2^50 bytes);

2. Variety: there are many types of data, often with different data types (structured, semi-structured and unstructured) and data sources;

3. Velocity: data are generated and updated quickly (for example, real-time data streams), and timeliness requirements are high;

4. Value: the scientific value is large; although the value density is low, the data often contain new knowledge or important predictive value.

Mankind has entered the age of big data. Figures from the International Data Corporation (IDC) show that the amount of data produced globally in 2011 was as high as 1.82 ZB. In May 2012, the United Nations issued a white paper, "Big Data for Development: Challenges and Opportunities", pointing out that big data is a historic opportunity: the abundant data resources now available can be used for unprecedented real-time analysis of society and the economy, helping governments respond better to social and economic developments.
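To give a feel for the scales involved, the following minimal sketch converts between byte units. It assumes binary prefixes (1 PB = 2^50 bytes, 1 ZB = 2^70 bytes); the IDC figure may well be stated in decimal units, so the printed result is indicative only. The names UNIT_EXPONENTS and convert are hypothetical and chosen for illustration.

```python
# Binary byte units, expressed as powers of two.
# Assumption: binary prefixes; decimal (10^15, 10^21) prefixes would differ slightly.
UNIT_EXPONENTS = {
    "B": 0, "KB": 10, "MB": 20, "GB": 30,
    "TB": 40, "PB": 50, "EB": 60, "ZB": 70,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity from one binary byte unit to another."""
    shift = UNIT_EXPONENTS[from_unit] - UNIT_EXPONENTS[to_unit]
    return value * 2.0 ** shift


if __name__ == "__main__":
    # Number of bytes in one petabyte.
    print(convert(1, "PB", "B"))
    # The 2011 global data volume quoted above, expressed in petabytes.
    print(convert(1.82, "ZB", "PB"))
```

Under these assumptions, the 2011 global figure corresponds to nearly two million petabytes, which shows how far it lies above the PB threshold used to characterize big data.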

Big data is being taken more and more seriously. Many colleges and universities in Europe and America have set up data science research institutions and opened data science courses. Nature and Science published special issues on big data in 2008 and 2011, respectively, to discuss the challenges it poses. As one of the most active areas of scientific research, the biomedical field has attracted particular attention for its big data.

Sources of biomedical big data:

The following factors contribute to the emergence of big data in the biomedical field.

1. The integrity of life and the complexity of disease. For example, chronic diseases, which seriously threaten human health, are complex diseases: they have complicated genetic and molecular mechanisms and are influenced by genes, the environment and the interactions between them, so studies of their etiology produce a great deal of data.

2. The development of high-throughput technologies and the decline in genome sequencing costs. High-throughput sequencing techniques allow millions of DNA molecules to be sequenced simultaneously, making it possible to analyze a species' transcriptome and genome in detail and comprehensively. With the completion of the Human Genome Project and the rapid development of computing power, the cost of sequencing a genome has fallen from millions of dollars to thousands of dollars (and will continue to decline). This will result in massive amounts of sequencing data.

3. Hospital informatization and the rapid development of the IT industry. The human body is itself a major source of biomedical data, and with the rapid development of hospital information systems and the IT industry, more and more human data can be stored and used. For example, a single imaging study ranges from about 30 MB for an X-ray to about 1 GB for a 3D CT scan, with intermediate 3D modalities at roughly 120 to 150 MB, and by 2015 the average U.S. hospital was expected to manage 665 TB of data.

4. The breadth of the field. Biomedical big data covers a wide range of areas related to human health: clinical medicine, public health, pharmaceutical research and development, medical markets and costs, individual behavior and emotions, human genetics and genomics, social demography, the environment, and health networks and media data (Table 1).

Applications of biomedical big data:

1. Omics research and studies of the associations between different omics layers. These range from the exposome (the environment, individual lifestyle and behavior and other exposures), through the molecular-level omics of the individual (genomics, epigenomics, transcriptomics, proteomics, metabolomics, metagenomics), to the phenome that describes individual health and disease status. Using big data to integrate the various omics layers not only provides a comprehensive and new understanding of the occurrence, prevention and treatment of disease, but also facilitates the development of individualized medicine: by integrating systems biology and clinical data, the risk and prognosis of disease in an individual can be predicted more accurately, and prevention and treatment can be tailored accordingly.

2. Rapid identification of biomarkers and drug development. Using omics data from a population with a given disease, biomarkers of disease occurrence, prognosis or therapeutic effect can be identified quickly. In drug research and development, big data has led to a deeper understanding of the etiology and pathogenesis of disease, which helps to identify biological targets and develop drugs. It also accelerates drug screening by making full use of massive omics data, research data on existing drugs, and high-throughput drug screening.

3. Rapid screening of unknown pathogens and detection of suspected pathogenic microorganisms. Samples of the unknown pathogen are collected and sequenced, and its genetic sequence is compared with those of known pathogens to determine its origin and the most closely related pathogen type. From this, the source and route of transmission can be inferred, and drug screening and disease prevention measures can be planned.

4. Real-time biological monitoring and public health monitoring. Public health monitoring includes surveillance of communicable diseases, monitoring of chronic non-communicable diseases and their risk factors, and other health-related monitoring (such as birth defects and food safety risks). In addition, the prevalence of certain infectious diseases can be monitored through a nationwide epidemiological database of electronic patient records, or by monitoring social media and frequently searched terms.

For example, Google Trends can predict influenza outbreaks in some areas before hospital emergency departments see an increase in flu patients, by detecting spikes in search terms such as "flu symptoms" and "flu treatment".

5. Understanding changes in the disease spectrum of the population. This helps to develop new strategies for disease control. The Global Burden of Disease study is an example of applying big data: it draws on a wide range of data sources and a large volume of data, and nearly 4700 desktop computers running in parallel carried out automated, standardized data preparation, data warehouse construction, and data mining analysis.

Its findings for China were as follows: among the top 25 causes of years of life lost in China, chronic non-communicable diseases rose markedly and infectious diseases fell markedly between 1990 and 2010, indicating that chronic non-communicable diseases have become a major threat to the health of the Chinese population.

6. Real-time health management. Real-time, continuous monitoring of individual physiological data (heart rate, pulse rate, respiratory rate, body temperature, calorie expenditure, blood pressure, blood glucose, blood oxygen, body fat content, etc.) through wearable devices makes it possible to provide real-time health guidance and advice and so to implement health management more effectively.

7. More powerful data mining. The tasks of data mining include association analysis, cluster analysis, classification analysis, anomaly analysis, etc. Mining big data improves the ability to capture and discover weak correlations.
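The search-term monitoring described under application 4, and the anomaly analysis mentioned under application 7, can be illustrated with a small sketch. It flags a week as a spike when its count exceeds the mean of the preceding weeks by more than a chosen number of standard deviations. This is only an assumed, simplified rule for illustration and is not a description of how Google's service works; the function name find_spikes, the window size and the threshold are all hypothetical.

```python
from statistics import mean, stdev


def find_spikes(counts, window=4, threshold=3.0):
    """Return the indices of counts that exceed the mean of the preceding
    `window` values by more than `threshold` sample standard deviations."""
    spikes = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # A flat baseline (sigma == 0) gives no scale for comparison, so skip it.
        if sigma > 0 and (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Made-up weekly counts of searches for a flu-related term.
    weekly_counts = [100, 102, 98, 101, 99, 100, 250, 103]
    print(find_spikes(weekly_counts))
```

A trailing window is used so that each week is judged only against earlier data, which is what a real-time monitor would have available.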

Big data programs related to biomedicine:

Table 2

Major problems and trends in biomedical big data:

As an emerging area, big data is also controversial:

1. Since data volumes are always increasing, is it necessary to distinguish between big data and traditional data?

2. Is big data mostly commercial hype?

3. The types of variables in big data are increasingly numerous and complex, and the probability of obtaining false positive correlations rises as the number of variables increases.

4. Bigger data are not necessarily better data; the representativeness and the quality of the data must be taken into account.

5. Is it ethical to use population data without informing the individuals concerned?

These controversies must be kept in view as big data develops.
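Point 3 above can be made concrete with a short calculation. Assuming that the tests are independent and that every null hypothesis is true (both simplifying assumptions), the probability of at least one false positive among m tests at significance level alpha is 1 - (1 - alpha)^m. The sketch below computes this, together with the Bonferroni-adjusted per-test level, a standard and conservative correction; the function names are hypothetical.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive among `num_tests`
    independent tests, each run at significance level `alpha`."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(num_tests, alpha=0.05):
    """Per-test significance level that keeps the familywise error
    rate at or below `alpha` (Bonferroni correction)."""
    return alpha / num_tests


if __name__ == "__main__":
    for m in (1, 10, 100, 1000):
        print(m, round(familywise_error_rate(m), 4), bonferroni_alpha(m))
```

With 100 independent tests at the 0.05 level, a spurious "significant" correlation is almost guaranteed, which is why analyses over many variables need some form of multiple-testing control.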

From an epidemiological point of view, big data in biomedicine has the following advantages:

1. Large samples. This addresses the sample size problem of epidemiological studies: large samples improve the precision of the results and reduce random (sampling) error.

2. Objective methods of acquisition can reduce information bias. Big data is often collected more objectively and can record individual behavior dynamically, which reduces information bias compared with traditional epidemiological surveys that rely on asking participants to recall certain behaviors.

However, in contrast to traditional probability-based random sampling, big data may suffer from selection bias, because its collection channels often cover only people with certain characteristics (such as Medicare patients or people who use wearable devices).
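The trade-off described above can be illustrated numerically. The sketch below assumes simple random sampling and a binary outcome, and computes the standard error of an estimated proportion, which shrinks with the square root of the sample size. It also adds a fixed systematic bias to show that a larger sample does nothing to remove it. The function names and the bias value of 0.02 are hypothetical.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


def total_error_bound(p, n, bias, z=1.96):
    """Rough error bound: a fixed systematic bias plus a z-scaled random
    part (about 95% coverage under the normal approximation for z=1.96)."""
    return abs(bias) + z * standard_error(p, n)


if __name__ == "__main__":
    for n in (1_000, 100_000, 10_000_000):
        print(
            n,
            round(standard_error(0.3, n), 5),
            round(total_error_bound(0.3, n, bias=0.02), 5),
        )
```

As n grows, the random part of the error tends to zero while the bound levels off at the size of the bias, so a very large but unrepresentative sample can still give a misleading answer.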

Major problems in biomedical big data

1. How to standardize and normalize biomedical data. Data standardization is the prerequisite for data sharing: only standardized data can be effectively fused and integrated, and only then can the value of big data be realized.

2. How to break down data silos and share biomedical data. Data sharing is a prerequisite for applying biomedical big data, and data should not be accessible only to a few individuals or institutions. Many public funding agencies have begun to require that data from the research they fund be shared within a certain scope.

3. Storage and management of biomedical big data. In the biomedical field the volume of data is very large and data are generated and updated quickly; the storage model affects not only the efficiency of data analysis but also the cost of data storage.

4. How to use biomedical data efficiently. China has accumulated a huge amount of biomedical data; the key is how to use it, which to some extent depends on the development of big data technology.

5. Analysis, integration and mining of biomedical big data. In particular, handling semi-structured and unstructured data (such as ECG and medical imaging data) and streaming data (real-time video, sensor data, data from medical monitoring equipment) is an important challenge for biomedical data analysis.

6. Lack of interdisciplinary talent spanning biomedicine and information science. This difficulty faces biomedical big data both in China and abroad, and needs to be addressed through interdisciplinary education in computer science and biology.
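The standardization problem in point 1 above can be illustrated with a very small example. The sketch assumes that body temperature readings arrive from different sources in either Celsius or Fahrenheit, and maps them onto one unit so that they can be pooled. The record layout (patient_id, value, unit) and the function name are hypothetical; real systems would rely on an agreed data standard rather than an ad hoc dictionary.

```python
def standardize_temperature(record):
    """Return a copy of `record` with the temperature expressed in
    degrees Celsius. Raises ValueError for an unrecognized unit."""
    value = record["value"]
    unit = record["unit"].strip().upper()
    if unit in ("C", "°C", "CELSIUS"):
        celsius = value
    elif unit in ("F", "°F", "FAHRENHEIT"):
        celsius = (value - 32.0) * 5.0 / 9.0
    else:
        # Refuse to guess: silently mixing units would corrupt pooled data.
        raise ValueError(f"unknown temperature unit: {record['unit']!r}")
    return {
        "patient_id": record["patient_id"],
        "value": round(celsius, 2),
        "unit": "C",
    }


if __name__ == "__main__":
    # Made-up records from two hypothetical source systems.
    raw = [
        {"patient_id": "p1", "value": 98.6, "unit": "F"},
        {"patient_id": "p2", "value": 37.4, "unit": "celsius"},
    ]
    for rec in raw:
        print(standardize_temperature(rec))
```

Rejecting unknown units, rather than passing values through unchanged, is a deliberate choice here: an error at ingestion time is easier to fix than a hidden unit mismatch in an integrated dataset.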

Future development trends of biomedical big data

1. From "concept" to "value", as the basis of "intelligent health". Biomedical big data will generate new knowledge, use information to change medical practice, and ultimately improve individual and public health.

2. Integration and translation of medical scientific evidence, and generation of evidence for evidence-based medicine. Biomedical big data is useful for producing such evidence: for example, pooling large volumes of health data yields more reliable evidence, and real-time data from the Web make "virtual clinical trials" possible.

3. Development of data security and privacy protection technology. In mining massive data, disclosure of private information is a major risk. Data security and privacy protection are receiving more and more attention; the relevant policies and legislation need to be strengthened, and the corresponding technology will play an important role.

4. Big-data-oriented cohort research has become a hotspot. Such cohort studies are large in scale (hundreds of thousands of people), prospective (long-term follow-up over decades), multidisciplinary (collaboration among basic, clinical, preventive and information sciences), multi-disease (able to study multiple diseases), multi-factor (able to explore multiple risk factors), integrated (integration of monitoring systems, information systems and health insurance systems), and shared (sharing of biological specimens and data resources); through long-term follow-up they can produce a large amount of population data.

5. Visualization of biomedical big data. Visualization is closely related to information graphics, information visualization, scientific visualization and statistical graphics, and it can communicate the information contained in big data more clearly and effectively.

6. Individualized health management based on biomedical big data is becoming more prevalent. On the one hand, real-time sensors (wearable devices) allow continuous, real-time health monitoring and assessment of the individual and provide real-time health guidance; on the other hand, individualized prevention, diagnosis and treatment will be achieved as individualized medicine based on biomedical big data develops.

7. Biomedical big data is becoming a strategic industry. Many countries have already made big data a national strategy, and the industrialization of biomedical big data is already underway.
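The real-time monitoring in trend 6 above can be sketched as a small streaming check. The example keeps a short rolling window of heart-rate readings and raises an alert only when the window average leaves a configured range, so that a single noisy reading does not trigger it. The class name, the window size and the thresholds of 50 and 120 beats per minute are hypothetical values chosen for illustration and are not clinical guidance.

```python
from collections import deque


class HeartRateMonitor:
    """Rolling-window monitor that flags sustained out-of-range averages."""

    def __init__(self, low=50, high=120, window=3):
        self.low = low
        self.high = high
        # deque with maxlen drops the oldest reading automatically.
        self.readings = deque(maxlen=window)

    def add(self, bpm):
        """Record one reading; return "high", "low", or None."""
        self.readings.append(bpm)
        if len(self.readings) < self.readings.maxlen:
            return None  # not enough readings to fill the window yet
        average = sum(self.readings) / len(self.readings)
        if average > self.high:
            return "high"
        if average < self.low:
            return "low"
        return None


if __name__ == "__main__":
    monitor = HeartRateMonitor()
    # Made-up stream of readings in beats per minute.
    for bpm in [70, 72, 75, 130, 140, 150]:
        print(bpm, monitor.add(bpm))
```

Averaging over a window trades a short delay for fewer false alarms; a real device would tune both the window and the thresholds to the individual.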

Prospects:

Mankind has entered the age of big data. Big data science, a new interdisciplinary subject spanning information science, social science, network science, systems science, biomedicine, psychology, economics and other fields, is gradually taking shape and has become a hotspot of scientific research.

The biomedical field has massive amounts of data; the key is how to share, standardize, manage and use them. At the same time, the training of biomedical big data professionals is an urgent need. Biomedical big data will change the model of medical practice, improve the quality of medical services, and ultimately help to achieve the medical goals of individualized treatment and population-level prevention.
