In 1972, the Japanese-American geneticist Susumu Ohno coined the term "junk DNA" for the stretches of DNA that do not encode protein. However, the ENCODE results published on September 5, 2012, which reported that roughly 80% of the genome is functional, show that much of this "junk" DNA is actually a huge control panel: it regulates the activity of tens of thousands of genes and helps determine how those genes behave. Without these regulatory switches, genes do not work properly, and defects in these regions can cause disease in humans. The discovery shocked the entire scientific community!
But for those of us in the computing field, the interesting part is not just the ENCODE project's results; it is the infrastructure that supported them. The press release also reported that ENCODE generated more than 15 TB of raw data, and that analyzing it consumed the equivalent of more than 300 years of execution time. That may not sound like much to a company that makes its living on big data (Facebook has reported processing more than 500 TB per day), but remember: ENCODE's data must be shared and accessed across the entire scientific community!
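To put "300 years of execution time" in perspective, here is a back-of-the-envelope calculation; the 1,000-core cluster size below is purely an illustrative assumption, not a figure from the ENCODE project.

```python
# Rough conversion of "300 years of serial execution time" into
# wall-clock time on a parallel cluster. The cluster size is a
# hypothetical assumption for illustration only.
HOURS_PER_YEAR = 365 * 24                  # ~8,760 hours

serial_cpu_hours = 300 * HOURS_PER_YEAR    # ~2.63 million CPU-hours
cores = 1_000                              # assumed cluster size

wall_clock_days = serial_cpu_hours / cores / 24
print(f"Serial compute: {serial_cpu_hours:,} CPU-hours")
print(f"On {cores:,} cores: roughly {wall_clock_days:.0f} days wall-clock")
```

Even under this generous assumption, the analysis alone is a months-long undertaking, before any of the sharing and coordination problems discussed below.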
For anyone trying to build large, well-organized data repositories, the ENCODE project is worth studying. It is not just a source of new genetic insights; it is a global collaboration in which 32 laboratories carried out more than 1,600 experiments on 147 cell types, and then mined the resulting data for further discoveries.
In a recent report on ENCODE, Jim Kent, director of the UCSC Genome Browser project and head of the ENCODE Data Coordination Center, described some of these problems. The challenges came from verifying that experiments were independent and valuable, and that they could still produce accurate data.
Kent and his data-coordination group at the University of California, Santa Cruz's Center for Biomolecular Science and Engineering faced many challenges arising from the project's scale. First, they had to coordinate scientists producing data all over the world. As Kent described it, the team kept five data wranglers traveling among the laboratories, and weekly four-hour conference calls were supplemented by two large meetings a year plus countless e-mails and Internet calls.
Quality-assurance management of the data and the processing workflows posed similar challenges. Sultan Meghji, a vice president at Appistry, a company that manages genetic data, says most of the effort goes into managing the data so that it stays up to date.
The project also produced large datasets, and the researchers developed tools for analyzing the results. These include the databases HaploReg and RegulomeDB, designed to track the details of genetic analyses, as well as a pre-configured virtual machine that hosts and analyzes the data generated by the project. The data itself is open to researchers, and the project encourages interested users to learn how to work with it, providing a portal site for that purpose.
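As one illustration of what "open to researchers" looks like in practice, the sketch below pulls a few experiment records from the ENCODE portal's public JSON search API (https://www.encodeproject.org). The endpoint, query parameters, and field names reflect the portal's current public API and are assumptions relative to this article; they are not details from the original report.

```python
# Minimal sketch: query the ENCODE portal's JSON search API for a few
# ChIP-seq experiments. Endpoint and field names are assumptions based
# on the portal's current public API, not details from this article.
import json
import urllib.request

URL = ("https://www.encodeproject.org/search/"
       "?type=Experiment&assay_title=ChIP-seq&format=json&limit=5")

req = urllib.request.Request(URL, headers={"Accept": "application/json"})
with urllib.request.urlopen(req) as resp:
    results = json.load(resp)

# Search hits are returned JSON-LD style under the "@graph" key.
for hit in results.get("@graph", []):
    target = hit.get("target")
    label = target.get("label") if isinstance(target, dict) else target
    print(hit.get("accession"), "-", label)
```

Machine-readable access of this kind is what lets a dataset of this scale be reused far beyond the 32 laboratories that produced it.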
As information technology develops, the world's ability to interoperate has improved markedly, and the single-laboratory research of the past will gradually give way to coordinated multi-laboratory research. With the growth of cloud computing, advances in big data, and the arrival of new technologies, today's problems will be solved one by one!