Gene sequencing has a wide range of industrial applications and is of great help in areas such as preventing birth defects, detecting hereditary diseases, and guiding cancer treatment. Genomics research has made rapid progress in recent years. With its strong research and development capabilities, Huada Gene has become the world's largest genomics research center, and its research results have had wide influence around the world.
As the genomics industry develops rapidly, the cost of gene sequencing has fallen just as quickly, dropping from the sky-high prices of the past into the "thousand-yuan (about 144 USD)" era that the public can easily afford. At the same time, a number of killer clinical applications have adopted gene sequencing, driving the number of users to multiply. With the development of high-throughput sequencers, the amount of genetic data that must be computed and stored is growing exponentially. Gene sequencing has entered an explosive phase, and the surge in users and applications has led to an explosion in data computation.
With the explosive growth of data in the life sciences, how to acquire, quickly analyze, and safely store such huge volumes of data is an urgent problem for researchers, and it is a challenge that Huada Gene faces as well. At the Yunqi Conference in Guangzhou on November 22, 2017, Huang Zehui, Director of BGI Online Products at Huada Gene, shared the challenges Huada Gene has faced on this issue and the solutions it has adopted.
Analyzing a thousand human genomes within 24 hours?
The amount of genetic data that must be managed and stored over its life cycle is very large, and the rapid gene retrieval and query that BGI provides involves reducing the dimensionality of genetic data during analysis. The analysis tasks in this process are both data-intensive and CPU-intensive and demand high computing power. They also produce a wide variety of result files, and the data is unstructured, which makes data mining and visualization harder. In addition, there are multiple sequencing centers and users are widely distributed, so sharing and transmitting data is difficult.
The traditional solution is to purchase large computing and storage equipment. Because of cost and the speed at which servers can keep up with demand, BGI Online has moved to the cloud and offers customized, personalized experiences at the analysis level. This not only lowers the threshold for data analysis but also lets users start an analysis directly online.
Fully embracing cloud computing to solve data storage, transmission, analysis, and security issues
As a large-scale bioinformatics analysis platform, BGI Online is called the “application market” of the genomics industry. Built on services deployed on the Alibaba Cloud computing platform, BGI Online can handle large-scale genomic data analysis more easily.
Based on BGI Online's requirements for genetic data analysis, Huada Gene designed a cloud platform architecture on Alibaba Cloud that covers elastic scaling of computing resources, multi-level storage, computing over mass storage, and data security.
Over dedicated-line access, the sequencing centers transmit tens of terabytes of data per day. The platform is deployed across multiple data centers so that computing power sits where the data is, serving the United States, Europe, and China. A mix of Alibaba Cloud computing services is used, and data is exchanged among them through OSS. ECS delivers gene sequencing analysis results efficiently online, large-batch sequencing analysis reduces costs substantially, and MaxCompute runs MapReduce jobs that complete sequencing analysis within hours.
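The idea of placing computing power where the data is can be illustrated with a rough sketch. Everything below is hypothetical and not taken from BGI Online: the region labels, the `Dataset` type, the function names, and the 10 Gbit/s link speed are assumptions chosen only to show why moving tens of terabytes per day between regions is costly compared with running the job next to the data.

```python
from dataclasses import dataclass

# Hypothetical region labels; the article only names the United States, Europe, and China.
REGIONS = {"us", "eu", "cn"}


@dataclass(frozen=True)
class Dataset:
    name: str
    region: str     # region where the sequencing data is stored
    size_tb: float  # size in terabytes


def pick_compute_region(dataset: Dataset) -> str:
    """Schedule the analysis in the region that already holds the data."""
    if dataset.region not in REGIONS:
        raise ValueError(f"unknown region: {dataset.region}")
    return dataset.region


def transfer_hours(size_tb: float, link_gbps: float) -> float:
    """Hours needed to move size_tb terabytes over a link of link_gbps gigabits per second."""
    bits = size_tb * 8e12  # 1 TB = 10^12 bytes = 8 * 10^12 bits
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    run = Dataset(name="daily-batch", region="cn", size_tb=45.0)
    print("compute region:", pick_compute_region(run))
    # Assumed 10 Gbit/s link, for illustration only.
    print("hours to copy elsewhere:", transfer_hours(run.size_tb, 10.0))
```

Under these assumed numbers, copying a single day's output to another region would occupy the link for a large part of the day, which is the motivation for running the analysis in the data's own region.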
During the cooperation, Alibaba Cloud provided Huada Gene with enterprise-level support services and expert services to ensure that its cloud business was established and ran smoothly. These included guidance on product selection and usage, APM reports with the corresponding performance analysis and optimization, identification and remediation of system security risks, and dedicated support for tasks at peak production. It used to take three or four days to produce one person's genetic analysis. Now the long-held dream of completing a thousand-person genome analysis within 22 hours has been realized.
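To put the two figures side by side, the following back-of-the-envelope calculation compares a thousand genomes in 22 hours with three or four days per genome. It assumes, purely for illustration, that the earlier figure applied to genomes processed one after another; the article does not say how much parallelism existed before, so the result is an upper-bound style estimate rather than a measured speed-up. The function names are hypothetical.

```python
def aggregate_speedup(n_genomes: int, days_per_genome_before: float,
                      hours_total_after: float) -> float:
    """Ratio of serial processing time (before) to total wall-clock time (after)."""
    serial_hours_before = n_genomes * days_per_genome_before * 24
    return serial_hours_before / hours_total_after


def seconds_per_genome(n_genomes: int, hours_total: float) -> float:
    """Average wall-clock seconds per genome when the whole batch takes hours_total."""
    return hours_total * 3600 / n_genomes


if __name__ == "__main__":
    for days in (3, 4):
        print(days, "days/genome ->", round(aggregate_speedup(1000, days, 22)), "x")
    print("average seconds per genome:", seconds_per_genome(1000, 22))
```

Under that serial assumption, the throughput gain works out to roughly 3,300 to 4,400 times, or an average of about 79 seconds of wall-clock time per genome across the batch.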
The value of life deserves our efforts
Genomics data is "natural" big data. The value of computational analysis will exceed that of the sequencing itself, and the combination of cloud computing and big data technology is what the industry needs. Yin Wei, CEO of Huada Gene Co., Ltd., said, "Genetic sequencing brings changes to biotechnology, and the value created for life is worth our effort."
Over the years, Huada Gene has built up a diverse and rich client base spanning pharmaceutical, scientific research, clinical, and personal customers, and has gained a deeper understanding of market needs. In the future, Huada Gene will continue to use its accumulated technology and experience to develop offerings at different levels, focusing on the development of bioinformatics analysis pipelines and the mining of genetic data. Future applications should be shared across platforms, with cloud computing services continuing to provide the underlying data storage and compression optimization. In this way, this is bound to provide a strong driving force for China's sustainable growth in the life sciences and bio-industries in the coming decades.