In 1972, the Japanese-American geneticist Susumu Ohno coined the term "junk DNA" for the stretches of DNA that do not encode protein. However, the ENCODE results published on September 5, 2012, which reported that roughly 80% of the genome is functional, show that much of this "junk" DNA is actually a huge control panel: it regulates the activity of tens of thousands of genes and helps determine how those genes behave. Without these regulatory switches, genes do not work properly, and defects in these regions can cause disease in humans. The discovery shocked the entire scientific community!
But for those of us in the computing field, the interesting part is not just the ENCODE project's results; it is the infrastructure that supported them. The press release also reported that ENCODE generated more than 15 TB of raw data, and that analyzing it consumed the equivalent of more than 300 years of execution time. That may not sound like much to a company that makes its living on big data (Facebook has reported processing more than 500 TB per day), but remember: ENCODE's data must be shared and accessed across the entire scientific community!
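To put "300 years of execution time" in perspective, here is a back-of-the-envelope calculation; the 1,000-core cluster size below is purely an illustrative assumption, not a figure from the ENCODE project.

```python
# Rough conversion of "300 years of serial execution time" into
# wall-clock time on a parallel cluster. The cluster size is a
# hypothetical assumption for illustration only.
HOURS_PER_YEAR = 365 * 24                  # ~8,760 hours

serial_cpu_hours = 300 * HOURS_PER_YEAR    # ~2.63 million CPU-hours
cores = 1_000                              # assumed cluster size

wall_clock_days = serial_cpu_hours / cores / 24
print(f"Serial compute: {serial_cpu_hours:,} CPU-hours")
print(f"On {cores:,} cores: roughly {wall_clock_days:.0f} days wall-clock")
```

Even under this generous assumption, the analysis alone is a months-long undertaking, before any of the sharing and coordination problems discussed below.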
For anyone trying to build large, well-organized data repositories, the ENCODE project is worth studying. It is not just a source of new genetic insights; it is a global collaboration in which 32 laboratories carried out more than 1,600 experiments on 147 cell types, and then mined the resulting data for further discoveries.
In a recent report on ENCODE, Jim Kent, director of the UCSC Genome Browser project and head of the ENCODE Data Coordination Center, described some of these problems. The challenges came from verifying that experiments were independent and valuable, and that they could still produce accurate data.
Kent and his data-coordination group at the University of California, Santa Cruz's Center for Biomolecular Science and Engineering faced many challenges arising from the project's scale. First, they had to coordinate scientists producing data all over the world. As Kent described it, the team kept five data wranglers traveling among the laboratories, and weekly four-hour conference calls were supplemented by two large meetings a year plus countless e-mails and Internet calls.
Quality-assurance management of the data and the processing workflows posed similar challenges. Sultan Meghji, a vice president at Appistry, a company that manages genetic data, says most of the effort goes into managing the data so that it stays up to date.
The project also produced large datasets, and the researchers developed tools for analyzing the results. These include the databases HaploReg and RegulomeDB, designed to track the details of genetic analyses, as well as a pre-configured virtual machine that hosts and analyzes the data generated by the project. The data itself is open to researchers, and the project encourages interested users to learn how to work with it, providing a portal site for that purpose.
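As one illustration of what "open to researchers" looks like in practice, the sketch below pulls a few experiment records from the ENCODE portal's public JSON search API (https://www.encodeproject.org). The endpoint, query parameters, and field names reflect the portal's current public API and are assumptions relative to this article; they are not details from the original report.

```python
# Minimal sketch: query the ENCODE portal's JSON search API for a few
# ChIP-seq experiments. Endpoint and field names are assumptions based
# on the portal's current public API, not details from this article.
import json
import urllib.request

URL = ("https://www.encodeproject.org/search/"
       "?type=Experiment&assay_title=ChIP-seq&format=json&limit=5")

req = urllib.request.Request(URL, headers={"Accept": "application/json"})
with urllib.request.urlopen(req) as resp:
    results = json.load(resp)

# Search hits are returned JSON-LD style under the "@graph" key.
for hit in results.get("@graph", []):
    target = hit.get("target")
    label = target.get("label") if isinstance(target, dict) else target
    print(hit.get("accession"), "-", label)
```

Machine-readable access of this kind is what lets a dataset of this scale be reused far beyond the 32 laboratories that produced it.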
As information technology develops, the world's ability to interoperate has improved markedly, and the single-laboratory research of the past will gradually give way to coordinated multi-laboratory research. With the growth of cloud computing, advances in big data, and the arrival of new technologies, today's problems will be solved one by one!