Sheep Genome v3.1
Downloads (right click save as)
Complete genome assembly unmasked as three downloads
To obtain the complete assembly download the following 3 files
Complete genome assembly masked as single download
Includes the unassigned scaffolds and contigs
GFF files of genome annotation
A single Texel ewe and a single Texel ram were sequenced. Illumina reads giving 75-fold coverage of the Texel ewe were assembled de novo into contigs and scaffolds with SOAPdenovo. Illumina sequences giving 120-fold coverage from both the ewe and the ram were then used for a round of gap-filling. At this point, the contig N50 length was 18 kb and the scaffold N50 length was 1.1 Mb; the assembly had a total length of 2.64 Gb, of which 6.9% was gaps. This assembly was the previous Oar v2.0.
The current assembly, Oar_v3.1, was created with another round of gap-filling, adding to the original datasets 21-fold coverage of GC-content-unbiased Illumina sequencing data from the male and 3 Gb of MeDIP-seq data for high-GC-content sequence from the female. Approximately 200,000 gaps were filled, including about 5000 that were filled using CHORI-243 BAC library sequences and 454 reads generated for the Oar v1.0 assembly (ACIV000000000, BioProject PRJNA33937).
Segmental duplicates were identified by Whole Genome Assembly Comparison (WGAC), and their read coverage was checked with GC content adjustment. Probable artificial tandem duplicates were identified using the larger-insert 454 and BAC libraries and by comparison to the UMD3 bovine genome assembly, and one copy of each tandem pair was removed. The assembly of 775 overlapping scaffold ends was revised and these adjacent scaffold pairs were linked together. Artificially duplicated copies that had been generated by multiple gap-filling steps using GapCloser were removed, and erroneously assembled scaffolds identified during the error checking were manually split. A high-density RH map with 39,042 SNP markers and Ovine SNP50 genotyping linkage data were used to check scaffold integrity and to anchor scaffolds and super-scaffolds to chromosomes, leaving roughly 5700 unplaced scaffolds.
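The N50 and gap figures quoted above can be recomputed from any set of sequences. The following is a minimal Python sketch, not the ISGC's own pipeline: the function names are hypothetical, and it assumes that gaps are represented as runs of N characters and that contigs are the stretches between those runs.

```python
import re


def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def contig_lengths(scaffold):
    """Split a scaffold at runs of N (assumed to mark gaps) and
    return the lengths of the resulting contigs."""
    return [len(piece) for piece in re.split(r"[Nn]+", scaffold) if piece]


def gap_fraction(scaffold):
    """Return the fraction of bases in a scaffold that are N."""
    if not scaffold:
        return 0.0
    return scaffold.upper().count("N") / len(scaffold)


if __name__ == "__main__":
    # Toy scaffolds for illustration only; real input would be read from
    # the downloaded FASTA files.
    scaffolds = ["ACGTNNNNAC", "ACGTACGTACGT", "ACNNGT"]
    print("scaffold N50:", n50([len(s) for s in scaffolds]))
    contigs = [n for s in scaffolds for n in contig_lengths(s)]
    print("contig N50:", n50(contigs))
    total = sum(len(s) for s in scaffolds)
    gaps = sum(s.upper().count("N") for s in scaffolds)
    print("gap fraction:", gaps / total)
```

Applied to the whole assembly, the scaffold-level and contig-level calls correspond to the two N50 values reported above, and the gap fraction corresponds to the percentage of gaps.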
To facilitate annotation of the genome assembly, a full update of the assembly is not planned until mid to late 2014. In the meantime, we intend to fill large gaps and improve the coverage in the 5' ends of genes. These improvements will be released as sequence patches prior to the release of the Oar v4.0 reference genome assembly.
All projects undertaken by the ISGC are conducted within the public domain. In order to promote public benefit arising from early access to the sheep genome assembly, this interim assembly (Oar v3.1) is being made available prior to the publication of an ISGC paper describing global analysis of the sheep genome. This prepublication data release is in accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement (Toronto International Data Release Workshop 2009), which provides guidelines for both data generators and data users. The ISGC asks that users adhere to each of these policies and refrain from publishing any global analysis of the sheep genome using consortium data before the ISGC does. Global analysis includes the identification of complete (whole genome) sets of genomic features such as genes, gene families, regulatory elements, repeat structures and GC content. Global analysis also includes chromosome-wide or whole-genome scale comparisons with other species and reassemblies of the sheep data. Further, users are asked to cite the ISGC "marker paper" (ISGC et al. (2010) Animal Genetics 41: 449-453) as the citable reference when using this prepublication data.
Richard Talbot, Alan Archibald (The Roslin Institute): Illumina GA sequencing of the male Texel
Kim Worley and Richard Gibbs (Human Genome Sequencing Center, Baylor College of Medicine): 454 sequencing of the male Texel
Wen Wang, Yu Jiang (Kunming Institute of Zoology) and Wenguang Zhang (Inner Mongolia Agricultural University): Contig assembly
Yu Jiang, Brian Dalrymple and James Kijas (CSIRO Animal, Food and Health Sciences): High order assembly
John McEwan, Rudi Brauning (AgResearch): QC of assembly by identification of polymorphism, and alignment with existing independently assembled BACs
Jillian Maddox (ISGC) and Thomas Faraut (INRA): Genetic and Physical mapping
Noelle Cockett (USU) and Hutton Oddy (UNE): Project Coordination
The International Sheep Genomics Sequencing Consortium is grateful to the following for funding support for the sheep genome sequencing project: The Roslin Institute, University of Edinburgh and Biotechnology and Biological Sciences Research Council, U.K.; The Scottish Government, U.K.; Defra/HEFC/SHEFC Veterinary Training and Research Initiative, U.K.; USDA-ARS, USA; USDA-NRICGP, USA (grant numbers 2008-03923 and 2009-03.15); USDA-NRSP-8, USA; Meat and Livestock Australia and Australian Wool Innovation Limited through sheepGENOMICS, Australia; Australian Government International Science Linkages Grant (CG090143), Australia; University of Sydney, Australia; CSIRO, Australia; AgResearch, NZ, Beef + Lamb NZ through Ovita, New Zealand; INRA and ANR project SheepSNPQTL, France; European Union through FP7 Quantomics and 3-SR projects; a 973 Program (No. 2007CB815700), 100 Talents Program of Chinese Academy of Sciences and a CAS-Max Planck Society Fellowship to Kunming Institute of Zoology, China; The National Natural Science Foundation of China (30725008, 30960246), Shenzhen (ZYC200903240077A, ZYC200903240078A), the Ole Rømer grant from Danish Natural Science Research Council, and the Solexa project (272-07-0196) to BGI Shenzhen, China.
For queries and feedback please contact Brian Dalrymple