Oar v2.0 was built using next-generation sequence data derived from one female and one male Texel. The primary de novo assembly was performed using 75-fold Illumina GA sequence coverage from the female; the mate-pair characteristics of the paired-end reads were then used to produce scaffolds spanning 2.71 Gb, or approximately 91% of the sheep genome (scaffold build v1.2). A further 45-fold coverage of the male Texel was used for gap filling before the scaffolds were anchored onto the 27 sheep chromosomes. Scaffolds that were clearly chimeric were identified by comparison with the bovine UMD3 assembly and manually split in the gap between adjacent contigs that mapped to two different bovine chromosomes. Superscaffolds were built from the set of scaffolds and split scaffolds >2 kb in length using the end sequences derived from the male Texel BAC library CHORI-243 and the predicted locations on OARv1.0 of the SNPs included on the Illumina Ovine SNP50 BeadChip. This was undertaken as a single integrated process in which the numbers of non-congruent BACs and out-of-position SNPs were minimised. Several rounds of manual checking and final error correction were carried out using the end sequences of the BACs in the bovine CHORI-240 library and 454 mate-pair sequence data derived from 8 kb and 20 kb insert libraries of the male Texel. Ambiguous positions were resolved using the predicted locations of the SNPs based on OARv1.0 and conserved synteny with the UMD3 bovine genome assembly. Superscaffolds were initially ordered and oriented into chromosomes using the locations of the SNPs in OARv1.0. The positions of the SNPs in the sheep linkage map and the sheep RH map were then used to identify remaining errors and to refine the assembly.
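The chimeric-scaffold step described above can be illustrated with a minimal Python sketch. It is not the consortium's procedure, which was manual and restricted to clearly chimeric scaffolds. The data layout (a scaffold as an ordered list of contig name and bovine chromosome pairs), the function name split_chimeric, the contig and chromosome labels, and the choice to keep an unmapped contig with the piece that precedes it are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (contig name, bovine chromosome it maps to,
# or None when the contig has no bovine mapping).
Contig = Tuple[str, Optional[str]]


def split_chimeric(scaffold: List[Contig]) -> List[List[Contig]]:
    """Split a scaffold in the gap between adjacent mapped contigs
    that fall on two different bovine chromosomes."""
    pieces: List[List[Contig]] = []
    current: List[Contig] = []
    last_chrom: Optional[str] = None  # chromosome of the last mapped contig seen

    for contig in scaffold:
        _, chrom = contig
        # A change of bovine chromosome marks a putative chimeric join.
        if chrom is not None and last_chrom is not None and chrom != last_chrom:
            pieces.append(current)
            current = []
        # Assumption: unmapped contigs stay with the preceding piece.
        current.append(contig)
        if chrom is not None:
            last_chrom = chrom

    if current:
        pieces.append(current)
    return pieces


# Illustrative input: four contigs, the first two mapping to one bovine
# chromosome and the last two to another (labels are made up).
example = [("c1", "BTA1"), ("c2", "BTA1"), ("c3", "BTA7"), ("c4", "BTA7")]
for piece in split_chimeric(example):
    print([name for name, _ in piece])
```

For the illustrative input, the scaffold is cut once, between c2 and c3. A scaffold whose contigs all map to the same bovine chromosome is returned as a single piece.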
OARv2.0 should be considered a working draft release that contains both known and unknown errors and discrepancies. For example, the ISGC is aware of cases where sequence contigs are assigned to the correct genomic location but in an orientation that is inconsistent with analysis of related species. If you believe that you have identified a discrepancy in the data, please contact the consortium as we work to improve the current release.
Improvements to the interim assembly are underway and the ISGC anticipates that a subsequent release (Oar v3.0) will occur in the second half of 2011. We intend to fill gaps, assign more of the currently unassigned sequence to chromosomes, provide data for the 5 million SNPs identified, and release associated transcriptomic datasets. Oar v3.0 is intended to form the basis of ISGC publications on the sheep genome and is the version to be submitted to the public databases EMBL and GenBank.
All projects undertaken by the ISGC are conducted within the public domain. In order to promote public benefit arising from early access to the sheep genome assembly, this interim assembly (OARv2.0) is being made available prior to the publication of an ISGC paper describing global analysis of the sheep genome. This prepublication data release is in accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement (Toronto International Data Release Workshop 2009), which provides guidelines for both data generators and data users. The ISGC asks that users adhere to each of these policies and refrain from publishing, before the ISGC does, any global analysis of the sheep genome that uses consortium data. Global analysis includes the identification of complete (whole-genome) sets of genomic features such as genes, gene families, regulatory elements, repeat structures and GC content. Global analysis also includes chromosome-wide or whole-genome scale comparisons with other species and reassemblies of the sheep data. Further, users are asked to cite the ISGC "marker paper" (ISGC et al. (2010) Animal Genetics 41: 449-453) as the citable reference when using this prepublication data.
Richard Talbot, Alan Archibald (The Roslin Institute): Illumina GA sequencing of the male Texel
Kim Worley (Human Genome Sequencing Center): 454 sequencing of the male Texel
Wen Wang, Jiang Yu (Kunming Institute of Zoology) and Wenguang Zhang (Inner Mongolia Agricultural University): Contig assembly
Brian Dalrymple, James Kijas (CSIRO Livestock Industries): High order assembly
John McEwan, Rudi Brauning (AgResearch): QC of assembly by identification of polymorphism, and alignment with existing independently assembled BACs
Jillian Maddox (ISGC) and Thomas Faraut (INRA): Genetic and Physical mapping
Noelle Cockett (USU) and Hutton Oddy (UNE): Project Coordination
The International Sheep Genomics Sequencing Consortium is grateful to the following for funding support for the sheep genome sequencing project: The Roslin Institute, University of Edinburgh and Biotechnology and Biological Sciences Research Council, U.K.; The Scottish Government, U.K.; Defra/HEFC/SHEFC Veterinary Training and Research Initiative, U.K.; USDA-ARS, USA; USDA-NRICGP, USA (grant numbers 2008-03923 and 2009-03305); USDA-NRSP-8, USA; Meat and Livestock Australia and Australian Wool Innovation Limited through sheepGENOMICS, Australia; Australian Government International Science Linkages Grant (CG090143), Australia; University of Sydney, Australia; CSIRO, Australia; AgResearch, NZ; Beef + Lamb NZ through Ovita, New Zealand; INRA and ANR project SheepSNPQTL, France; European Union through FP7 Quantomics and 3-SR projects; a 973 Program (No. 2007CB815700), 100 Talents Program of Chinese Academy of Sciences and a CAS-Max Planck Society Fellowship to Kunming Institute of Zoology, China; The National Natural Science Foundation of China (30725008, 30960246), Shenzhen (ZYC200903240077A, ZYC200903240078A), the Ole Rømer grant from Danish Natural Science Research Council, and the Solexa project (272-07-0196) to BGI Shenzhen, China.
For queries and feedback please contact Brian Dalrymple