SNP candidates displayed as "low quality SNPs" represent ANY base pair change in the clustered sequences. These low quality SNPs are passed through a quality control step that attempts to differentiate putative SNPs from sequencing errors. Those that remain are displayed as "putative SNPs". Putative SNP candidates that would cause an amino acid change at their site are displayed as "Amino Acid Changing SNPs".
Each of these three categories of SNPs is further divided into two subcategories. If the sequence variant occurs in only one sequence of the multiple sequence alignment file, the SNP candidate is denoted with a light grey background, as shown above. If the same sequence variant occurs in more than one sequence, it is called a >1 (greater than one) SNP and the SNP candidate is boxed in black. This type of putative SNP has the best chance of being real: estimates indicate that approximately 50% of putative >1 SNPs are real.
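The split into single-sequence and >1 candidates can be illustrated with a minimal Python sketch. It is not the IBISS implementation: the function name, the toy alignment, and the treatment of "-" and "N" as uninformative are all assumptions made for illustration, and the quality control step is not modelled.

```python
from collections import Counter

def classify_column_variants(aligned_reads, consensus):
    """Hypothetical helper: list every non-consensus base per alignment column.

    Returns tuples of (position, base, count, label), where label is ">1"
    when the same variant base appears in more than one read and "single"
    when it appears in exactly one read.
    """
    results = []
    for pos, cons_base in enumerate(consensus):
        # Count variant bases in this column, skipping gaps and ambiguous calls.
        counts = Counter(
            read[pos]
            for read in aligned_reads
            if read[pos] not in "-N" and read[pos] != cons_base
        )
        for base, n in sorted(counts.items()):
            label = ">1" if n > 1 else "single"
            results.append((pos, base, n, label))
    return results

# Toy alignment (invented data): five reads of equal length.
reads = [
    "ACGTAC",
    "ACGTAC",
    "ATGTAC",
    "ATGTAC",
    "ACGTAG",
]
consensus = "ACGTAC"
variants = classify_column_variants(reads, consensus)
```

In this toy input, the T at position 1 is seen in two reads and would be boxed as a >1 candidate, while the G at position 5 is seen only once and would get the light grey background.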
This SNP analysis has been undertaken at the level of the contig (ct#) and is therefore an underestimate of the true number of SNPs in the data set. Judicious analysis at the cluster level will enable more SNPs to be identified. The estimate of SNPs that change an amino acid is also an underestimate. The amino acid differences are currently generated from BLAST comparisons with human RefSeq. If there is a hit, the alternate base can be tested for an amino acid change. If there is no RefSeq hit, however, there is no protein sequence to examine and an amino acid change cannot be identified.
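The dependence on a RefSeq hit can be sketched as follows. This is an illustrative assumption about how such a test might work, not the IBISS code: the function name and the `frame` parameter are hypothetical, with `frame=None` standing in for "no RefSeq hit, so no reading frame is known". The standard genetic code is used for translation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def changes_amino_acid(seq, pos, alt_base, frame=None):
    """Hypothetical check: does alt_base at 0-based pos change the amino acid?

    frame is the offset (0, 1 or 2) of the first complete codon in seq,
    assumed to come from a protein alignment. Returns True or False, or
    None when the change cannot be assessed (no frame, partial codon,
    or a codon containing ambiguous bases).
    """
    if frame is None:
        return None  # no protein hit: nothing to translate against
    offset = pos - frame
    if offset < 0:
        return None  # position lies before the first complete codon
    start = frame + (offset // 3) * 3
    idx = offset % 3
    ref_codon = seq[start:start + 3].upper()
    if len(ref_codon) < 3:
        return None  # trailing partial codon
    alt_codon = ref_codon[:idx] + alt_base.upper() + ref_codon[idx + 1:]
    ref_aa = CODON_TABLE.get(ref_codon)
    alt_aa = CODON_TABLE.get(alt_codon)
    if ref_aa is None or alt_aa is None:
        return None
    return ref_aa != alt_aa

# Toy coding sequence ATG GCC AAA (Met Ala Lys), invented for illustration.
nonsynonymous = changes_amino_acid("ATGGCCAAA", 4, "A", frame=0)  # GCC -> GAC
synonymous = changes_amino_acid("ATGGCCAAA", 5, "T", frame=0)     # GCC -> GCT
unassessed = changes_amino_acid("ATGGCCAAA", 4, "A", frame=None)  # no hit
```

Returning `None` rather than `False` when no frame is available keeps "not testable" distinct from "tested and synonymous", which is why the count of amino acid changing SNPs is an underestimate.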
SNPs within IBISS were validated by resequencing regions of 39 genes and by parsing variation information from 41 SNP sequences in the public domain. These sequences were compared to all IBISS consensus sequences to determine the percentage of predicted SNPs that are genuine. As of this writing, 53% of the putative >1 SNPs and 35% of the amino acid changing >1 SNPs appear to be genuine.