Third-generation sequencing articles

Source: Internet
Author: User

Long-read sequence assembly of the gorilla genome

http://science.sciencemag.org/content/352/6281/aae0344

Insights into hominid evolution from the gorilla genome sequence

http://www.nature.com/nature/journal/v483/n7388/full/nature10842.html#methods

PacBio at AGBT

http://www.bio-itworld.com/2015/3/3/pacbio-agbt.html

Defining a personal, allele-specific, and single-molecule long-read transcriptome

http://www.pnas.org/content/111/27/9869

Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum

The diploid genome sequence of an individual human

http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0050254#s4

Current status of knowledge and the perspectives in Korean pear genomics

http://www.plantbreedbio.org/journal/view.html?uid=231&vmd=Full

Useful, but just a conference paper

https://pag.confex.com/pag/asia2015/webprogram/Paper18227.html

Company

http://aa314.gondor.co/webinar/brett-hannigan-simplifying-de-novo-assembly-with-pacbio-tools-available-on-dnanexus-falcon/

Company

http://www.pacb.com/tag/falcon/

Data release: ~54x long-read coverage for PacBio-only de novo human genome assembly

http://www.pacb.com/blog/data-release-54x-long-read-coverage-for/

Toward platinum genomes: PacBio releases a new, higher-quality CHM1 assembly to NCBI

http://www.pacb.com/blog/toward-platinum-genomes-pacbio-releases-a-new-higher-quality-chm1-assembly-to-ncbi/

At SFAF, great science and high-quality genomes

http://www.pacb.com/blog/at-sfaf-2014-great-science-and-hig/

Planning, running, and understanding the FALCON genome assembly pipeline (video)

https://speakerdeck.com/pacbio/planning-running-and-understanding-the-falcon-genome-assembly-pipeline

Company

https://www.dnanexus.com/falcon

By Bio-IT World

March 3, 2015 | The Advances in Genome Biology & Technology conference wrapped up over the weekend on Marco Island, Florida, after four days of presentations from the front lines of genome analysis. With less than the usual amount of razzle-dazzle on display in this year's product launches, the event was stolen by some outstanding scientific achievements pulled off with existing platforms. Pacific Biosciences, this year's gold sponsor, highlighted several of these in a star-studded workshop Friday afternoon to show off the feats that can be accomplished with its SMRT (Single Molecule, Real-Time) sequencers, the instruments of choice for recovering long-range structural information on the genome. Speakers included J. Craig Venter, who runs the world's largest genome sequencing center at his company Human Longevity, Inc., and is best known for competing with the Human Genome Project to produce the first whole human genome sequence; Deanna Church, who has helped shape improvements to the human reference genome in her work with the Genome Reference Consortium; and Gene Myers, one of the world's premier bioinformaticians and co-author of the foundational genome analysis tool BLAST.

In a piece this January looking back at last year's milestones in genomics, Bio-IT World wrote that last year could be looked at as "the year of PacBio, when the [midsize] company proved there is a market for a pricier instrument that won't flinch at high GC coverage, large indels, or de novo assembly." The present moment might eventually come to be seen as the peak of PacBio's powers, a window in which the company was truly producing the most comprehensive, highest-quality genomes money could buy.

PacBio's commercial future is murky: companies like 10X Genomics are toying with more affordable ways to get reliable long-range genomic information, and if Oxford Nanopore gets a handle on its error rates and releases the production-scale PromethION, they're likely to undercut PacBio on price while delivering the same top-of-the-line features. But whatever its market prospects, scientifically PacBio is driving some of the most innovative sequencing projects going on today. Among other accomplishments, the PacBio workshop at AGBT presented multiple users' de novo assemblies of whole human genomes. Until very recently, this was a vanishingly rare type of project, because no high-throughput instrument could deliver the type of data needed to put together a whole human genome without aligning reads to a reference genome.

De Novo Assemblies as a Commodity?

Today, the very presence of SMRT sequencers on the market has encouraged bioinformaticians to build a whole suite of analytical tools to deal with multi-kilobase reads. As the AGBT workshop made clear, PacBio users now have something like a standard pipeline for going all the way from raw reads to a whole genome. A typical workflow might use Gene Myers' Daligner to find local alignments between reads, FALCON for assembly, and Quiver for variant calling. As PacBio CEO Mike Hunkapiller announced in his opening remarks, DNAnexus recently used this Daligner-FALCON pipeline to create a new diploid assembly of J. Craig Venter's genome, following a sequencing effort that took less than a month to generate all the required raw data on SMRT instruments.

Diploid assembly, correctly distinguishing between the maternal and paternal copies of each chromosome, is the gold standard for a full genome sequence. This ability sets FALCON assemblies apart from even the human reference genome, which, as Deanna Church memorably pointed out in her own presentation, has historically included "Franken-alleles" stitched together from different copies of the same chromosomes.

DNAnexus also appears to have set a world record for the fastest human genome assembly last week, patching together the genome of a peculiar breast cancer cell line, SK-BR-3, in a matter of hours. The process wrapped up that Friday morning, just in time for a shout-out at the workshop from W. Richard McCombie of the Cold Spring Harbor Laboratory. DNAnexus will now be making this workflow available to all customers through its cloud-based informatics service, offering rapid assembly to any labs with the sequencing capacity to drive through enough PacBio reads.

All this is starting to make de novo assembly look less like a titanic enterprise, and a little more like a commodity. Venter, giving the first talk at the workshop, revealed plans to produce extraordinary new reference genomes at Human Longevity, Inc., combining SMRT sequencers with his bank of ultra-high-throughput Illumina HiSeq X instruments. "I'm delighted with the focus I'm hearing here, on getting back to assembled genomes," said Venter. "If we're going to understand each of our genomes, we need to do de novo assembly."

The collection of new reference-grade assemblies at Human Longevity isn't just a matter of showing off; getting new reference genomes from donors with diverse ethnic and geographical backgrounds would help with all future interpretation of large structural variants, which differ widely between human populations and are difficult to square with a single reference assembly. (Sadly unmentioned was whether and when Human Longevity might share its reference genomes with the wider scientific community.)

Venter, of course, has a knack for thinking big. His reference assemblies would represent just a small fraction of the one million whole genomes he intends to sequence by 2020. In his presentation, Venter even spoke glibly about the pace at which he hopes to see his massively expensive bank of sequencers (an investment in excess of $21 million) become obsolete, based on the historical trend toward ever-cheaper sequencing. "We're counting on $ genomes in three or four years, and hopefully we can truck away to the dumpster all the machines we have [now]," Venter said.

Many of our readers should also be interested to hear that Venter casually mentioned looking to hire new bioinformaticians in 2015.

The second speaker, Gene Myers, was also keenly interested in the possibilities PacBio has opened up for relatively straightforward de novo assembly. Myers spent many years more or less out of the limelight, reportedly because he was dissatisfied with the industry's trend toward using short-read sequencers and reference alignment for most applications. However, he reemerged at AGBT last year, after a conversation with Hunkapiller in which Myers learned that SMRT sequencers deliver long reads with both random sampling of the genome and random, unbiased error rates at any point in the genome.

"As a mathematician, when Mike used this word 'random' in those two places I got incredibly excited," said Myers at this year's workshop. "Because I understood, from theory alone, that what that meant, immediately, is that perfect assembly is back on the table."

Since then, Myers has been hard at work making perfect assembly a reality. In addition to building Daligner, he has also started work on a new tool called Dascrub, which was a major focus of his workshop presentation. The purpose of Dascrub is to clean up raw PacBio reads, which are error-prone and vulnerable to sequencing artifacts, without sacrificing valuable data. Myers presented an E. coli assembly, made with 30x coverage of the sample, that produced a complete circular genome without requiring any correction steps between running Daligner and performing full assembly, except for using Dascrub to clear out artifacts.
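Two ideas in this account lend themselves to a quick back-of-the-envelope check: what 30x coverage of E. coli means in raw data, and why random errors matter so much for assembly. The Python sketch below is illustrative only and is not part of Daligner, Dascrub, or FALCON. It assumes an E. coli genome of roughly 4.6 megabases, a hypothetical mean read length of 10 kilobases, a hypothetical 15% per-read error rate, and a simplified model in which every read covering a position is independently wrong with the same probability. The function names are invented for this example.

```python
import math


def bases_needed(genome_size_bp: int, coverage: float) -> int:
    """Total sequenced bases required to reach the given average coverage."""
    return math.ceil(genome_size_bp * coverage)


def reads_needed(genome_size_bp: int, coverage: float, mean_read_length_bp: int) -> int:
    """Approximate read count for a target coverage at a given mean read length."""
    return math.ceil(bases_needed(genome_size_bp, coverage) / mean_read_length_bp)


def consensus_error(per_read_error: float, depth: int) -> float:
    """Probability that a majority vote over `depth` reads is wrong.

    Simplified model (an assumption for illustration): each read is
    independently wrong at this position with probability `per_read_error`,
    and the vote fails only when strictly more than half the reads are wrong.
    """
    first_failing = depth // 2 + 1  # fewest wrong reads that outvote the rest
    return sum(
        math.comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(first_failing, depth + 1)
    )


if __name__ == "__main__":
    ecoli_bp = 4_600_000   # approximate E. coli genome size (assumption)
    mean_read_bp = 10_000  # hypothetical mean read length
    print("bases for 30x:", bases_needed(ecoli_bp, 30))
    print("reads for 30x:", reads_needed(ecoli_bp, 30, mean_read_bp))
    for depth in (1, 5, 15, 31):
        # 0.15 is a hypothetical per-read error rate, not a figure from the talk
        print(f"depth {depth:>2}: majority-vote error ~ {consensus_error(0.15, depth):.2e}")
```

Under these assumptions, the chance that a majority of reads agree on a wrong answer shrinks rapidly as depth grows, which is one way to read Myers's excitement about unbiased errors. If errors were systematic rather than random, extra coverage would not help in the same way.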

Key Genomes

None of these advances in de novo assembly would do much to advance the field if we don't choose samples that truly have something to teach us. The last three speakers at PacBio's AGBT workshop rounded out the afternoon with some compelling applications for this burgeoning technology.

Deanna Church, formerly of the National Center for Biotechnology Information and now Senior Director of Genomics and Content at genetic diagnostics company Personalis, gave her thoughts on using long-read data to update the human reference genome, and in particular to deal with regions of high structural complexity and large differences between human haplotypes. This is a subject Church has spoken about with Bio-IT World before; in fact, in Hunkapiller's opening remarks he quoted an interview we ran with Church in April, in which she said that "if we are truly going to be successful in having genomics affect clinical medicine and we want to understand variation within individuals, we have to do de novo assembly."

At AGBT, Church noted that the reference genome is essential even when working with de novo assemblies, both as a resource for calling variants and as a coordinate system for describing them. That means missing or confounded sequence in the reference can cause problems for interpretation no matter how scrupulous a new genome may be.

Church touted the addition of many alternate loci in the latest update to the human reference genome, which allow geneticists to consider multiple "paths" through variable regions. She also urged bioinformaticians to update their tools to take these alternate loci into account, something that few groups have done to date. "In aggregate, these alt loci contribute an additional 3.6 megabases of novel sequence that contain 153 unique genes," said Church. "So if you are not using these sequences in your analyses, you are missing part of the exome, and you are missing some important sequence."

At the same time, Church acknowledged, the patchwork of alternate loci is probably not, in the long term, the most efficient way to represent large structural variants across the genome. In a question-and-answer session, she mentioned the Global Alliance for Genomics and Health, which is working on an alternative: representing chromosomal positions as a branching graph that spans an entire chromosome. "I think this movement toward graph-based representation is really the way to go," she said, "because it allows us to represent the complexity in a much more natural way." While Church expects it to take some time before this structure is ready to be as widely adopted as the current standards for representing genetic variation, she did say that the alternate loci provide a 'graph-lite' approach in the current human reference assembly.
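The idea of a branching graph with multiple "paths" through a variable region can be made concrete with a toy example. The Python sketch below is a minimal illustration under stated assumptions, not the Global Alliance for Genomics and Health's data model or any real file format: the segment names, sequences, and helper functions are all hypothetical. Each node carries a stretch of sequence, and each route from the left flank to the right flank spells out one possible haplotype.

```python
from typing import Dict, List

# Hypothetical toy region: node name -> sequence carried by that node.
SEGMENTS: Dict[str, str] = {
    "left": "ACGT",   # shared flanking sequence
    "ref": "G",       # allele found in the linear reference
    "alt": "GTTTA",   # alternate allele carrying an insertion
    "right": "CC",    # shared flanking sequence
}

# Directed edges: which nodes may follow each node.
EDGES: Dict[str, List[str]] = {
    "left": ["ref", "alt"],
    "ref": ["right"],
    "alt": ["right"],
    "right": [],
}


def enumerate_paths(edges: Dict[str, List[str]], start: str, end: str) -> List[List[str]]:
    """Return every route from `start` to `end`; assumes the graph has no cycles."""
    if start == end:
        return [[end]]
    paths: List[List[str]] = []
    for nxt in edges.get(start, []):
        for tail in enumerate_paths(edges, nxt, end):
            paths.append([start] + tail)
    return paths


def spell(path: List[str], segments: Dict[str, str]) -> str:
    """Concatenate the sequences of the nodes along one path."""
    return "".join(segments[node] for node in path)


if __name__ == "__main__":
    for path in enumerate_paths(EDGES, "left", "right"):
        print(" -> ".join(path), ":", spell(path, SEGMENTS))
```

In this picture, a linear reference corresponds to committing to a single path, while an alternate locus adds one extra branch for a single region, which is roughly the sense in which the alt loci are 'graph-lite'.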

The fourth speaker, Jeong-Sun Seo of Seoul National University and Macrogen, presented on a critical new resource for genomics, a diploid assembly of a whole Asian genome. "We have to consider seriously ethnic differences for personalized medicine," Seo reminded the audience. Ultimately, Seo's work on this new assembly, of a genome donated by an Altaic Korean individual, is meant to support an Asian Genome Project recruiting patient volunteers for whole genome sequencing across South Korea, Japan, China, and Mongolia.

Like Human Longevity, Macrogen has a bank of HiSeq X instruments and has been using a cross-platform approach to generating new reference assemblies. Interestingly, Seo mentioned that his team is also using an Irys device from Bionano, which uses fluorescent markers to map out very large structural variation on the order of hundreds of kilobases. In an interview with Bio-IT World, Bionano CEO Eric Holmlin recently told us that the Irys has been paired with SMRT sequencing but declined to reveal more details; Seo's presentation offers at least one example of both techniques for getting long-range genomic information being used in parallel.

Highlighting the magnitude of difference between the Korean assembly his group performed and the standard reference genome, Seo noted that on a single chromosome alone, he was able to pinpoint numerous structural variants, totaling many kilobases inserted or deleted relative to the reference. He also shared one example of a phenotypic difference that appears to be traceable to one of these structural variants, an 8-kilobase insertion in the NINL gene related to pigmentation. "NINL is the most significantly differentially expressed gene between Asians and Caucasians," Seo observed, a fact that can likely be attributed to this large insertion. Other structural variants that differ widely between ethnic groups are likely to have direct relevance to health and disease risks.

The final speaker was W. Richard McCombie, whose own assembly of interest is the previously mentioned SK-BR-3 cell line, collected from a HER2-positive case of breast cancer. The SK-BR-3 genome is profoundly disordered, so much so that Hunkapiller, introducing McCombie's talk, said that looking at this genome makes you wonder how in the heck this thing is alive.

McCombie, much like Myers, believes that short-read sequencing has been a mixed blessing for the genomics community, offering more data than ever before at the cost of distracting researchers from profoundly important sources of variation. He quoted Evan Eichler's term "the seduction of next-gen sequencing," which he called "very appropriate. You can get really good SNP data from a very large number of individual genomes ... but you do miss ... a lot of the structural variants."

Turning to the SK-BR-3 genome, McCombie showed some detailed data, derived from SMRT sequencing, on complex translocations between chromosome 8 and a second chromosome, which occurred across multiple different sites on both chromosomes. With more precise information on exactly how these regions are arranged, which translocations have undergone inversion, and the complete sequence of gene fusions, McCombie's team is now trying to reconstruct the history of the structural events that have produced the SK-BR-3 chromosome, particularly at the locus where the HER2 gene resides. Happily, McCombie announced that all of his data on this genome are publicly available online, and that he'll soon be releasing methylation data as well, something that can be recovered routinely off SMRT sequencers.

PacBio is still very much a niche player in sequencing, and with a notably lower throughput and higher costs than its competitors, that's unlikely to change any time soon. Nonetheless, the company has done a remarkable job drawing attention to features like haplotypes and structural variants that cannot be captured by short-read sequencing. While the genomics community never really forgot about these factors, they have been shortchanged in favor of cheaper data in the next-generation sequencing era.

Today, it seems possible that projects like those presented at PacBio's AGBT workshop are just the leading edge of a cultural shift in genomics toward full representations of genomic variation and more routine use of de novo assembly. The full force of that shift will have to wait for technology that brings long-read data in reach of the average user. But whether this comes from future PacBio instruments, a new contender like Oxford Nanopore, a parallel platform like 10X Genomics, or a combination of all three, this year's AGBT demonstrated that the groundwork has been laid to make the best use of this data once we have it.
