dS/dN ratio

Nucleotide substitutions in protein-coding genes are either synonymous (also called silent; they do not change the encoded amino acid) or non-synonymous (they change the amino acid).  Most non-synonymous changes are expected to be eliminated by purifying selection, but under certain conditions positive (Darwinian) selection may lead to their retention.  Comparing the numbers of synonymous and non-synonymous substitutions may therefore provide information about the type and degree of selection operating on a locus.

This program uses the Nei and Gojobori (1986) method of estimating the numbers of synonymous and non-synonymous substitutions, which is an unweighted pathway method (see also Nei, Molecular Evolutionary Genetics, 1987, Columbia University Press, New York).  The first step in the procedure is to count the synonymous and non-synonymous sites present at each codon; a single site may be partly synonymous and partly non-synonymous.

fi is defined as the proportion of the three possible nucleotide changes at the ith position of a codon that are synonymous.  The number of synonymous sites, s, for this codon is therefore given by

$$ s = \sum_{i=1}^{3} f_i $$

and consequently the number of non-synonymous sites, n, is given by n = 3 - s.  As an example, for codon ATT (Ile), f1 = 0, f2 = 0 and f3 = 2/3 (because two of the three possible changes at position 3, to ATC and ATA, do not result in a change of amino acid).  Summing over the three positions, s = 2/3 and n = 7/3.

For a sequence of r codons, the total number of synonymous sites, S, is given by

$$ S = \sum_{j=1}^{r} s_j $$

where sj is the value of s at the jth codon, and the total number of non-synonymous sites is N = 3r - S.
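The site-counting step can be sketched in a few lines of Python. This is a minimal illustration, not the program's own code: the names `CODON_TABLE`, `synonymous_sites` and `total_sites` are hypothetical, the standard genetic code is assumed, and a change that produces a stop codon is simply counted as non-synonymous (an assumption; the text above does not say how such changes are treated).

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Return s, the sum over the three positions of f_i for one codon."""
    amino_acid = CODON_TABLE[codon]
    s = Fraction(0)
    for pos in range(3):
        synonymous = 0
        for base in BASES:
            if base == codon[pos]:
                continue  # not a change
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Assumption: a change to a stop codon counts as non-synonymous.
            if CODON_TABLE[mutant] == amino_acid:
                synonymous += 1
        s += Fraction(synonymous, 3)  # f_i for this position
    return s


def total_sites(sequence):
    """Return (S, N) for an in-frame sequence whose length is a multiple of 3."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    S = sum((synonymous_sites(c) for c in codons), Fraction(0))
    N = 3 * len(codons) - S
    return S, N


print(synonymous_sites("ATT"))   # 2/3
print(total_sites("ATTATG"))     # (Fraction(2, 3), Fraction(16, 3))
```

Exact fractions are used so that the ATT example reproduces s = 2/3 and n = 7/3 without rounding.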

The next stage is to determine the number of synonymous and non-synonymous differences between each pair of aligned sequences, codon by codon.  Where there is one nucleotide difference, it is obvious whether the change is synonymous or non-synonymous.  When there are two or three differences, however, there are two or six possible pathways between the codons respectively, and each pathway is treated as equally likely.  In the case of two differences, the numbers of synonymous and non-synonymous differences per codon, sd and nd respectively, add up to 2, and each of the two possible pathways has two steps (making a total of 4 possible steps).  For example, comparing CTA and GTT, the possible pathways are:

Pathway 1: CTA (Leu) --> GTA (Val) --> GTT (Val) 1 non-synonymous, 1 synonymous change.

Pathway 2: CTA (Leu) --> CTT (Leu) --> GTT (Val) 1 synonymous, 1 non-synonymous change.

As two of these four steps are synonymous changes, sd = 1 (or 2/4 x 2) and nd = 1.  Had only one of the four steps been synonymous, the values would have been sd = 0.5 (or 1/4 x 2) and nd = 1.5.

With three nucleotide changes, there are six possible pathways, each with three mutational steps.  Considering each of these possible steps, the values of sd and nd are determined in the same way, with sd and nd adding up to 3.  If any of the possible pathways goes via a stop codon, that pathway is removed from the calculation.
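The pathway counting can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than the program's implementation: `CODON_TABLE` and `count_differences` are hypothetical names, the standard genetic code is assumed, both input codons are assumed to be sense codons, and sd and nd are averaged over the pathways that remain after discarding any that pass through a stop codon. The codon pair TTT and GTA is chosen here purely for illustration.

```python
from itertools import permutations, product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_differences(codon_a, codon_b):
    """Return (sd, nd) for two sense codons, averaged over equally likely pathways."""
    differing = [i for i in range(3) if codon_a[i] != codon_b[i]]
    if not differing:
        return 0.0, 0.0

    syn_total = nonsyn_total = valid_paths = 0
    # Each ordering of the differing positions is one mutational pathway.
    for order in permutations(differing):
        current = codon_a
        syn = nonsyn = 0
        hits_stop = False
        for pos in order:
            step = current[:pos] + codon_b[pos] + current[pos + 1:]
            if CODON_TABLE[step] == "*":
                hits_stop = True  # pathway passes through a stop codon
                break
            if CODON_TABLE[step] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
        if not hits_stop:
            valid_paths += 1
            syn_total += syn
            nonsyn_total += nonsyn

    if valid_paths == 0:
        raise ValueError("every pathway passes through a stop codon")
    return syn_total / valid_paths, nonsyn_total / valid_paths


# TTT (Phe) vs GTA (Val): one pathway goes via GTT (Val), the other via TTA (Leu).
print(count_differences("TTT", "GTA"))   # (0.5, 1.5)
```

In the TTT/GTA comparison only one of the four steps (GTT to GTA) is synonymous, so sd = 0.5 and nd = 1.5.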

Summing over all codons, the total numbers of synonymous and non-synonymous differences, Sd and Nd respectively, are given by

$$ S_d = \sum_{j=1}^{r} s_{dj} \qquad N_d = \sum_{j=1}^{r} n_{dj} $$

where sdj and ndj are the numbers of synonymous and non-synonymous differences for the jth codon, and r is the number of codons compared.

The proportions of synonymous (pS) and non-synonymous (pN) differences are estimated by the equations pS = Sd / S and pN = Nd / N, with mean values taken over every pairwise comparison.

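The numbers of synonymous (dS) and non-synonymous (dN) substitutions per site are estimated using the Jukes-Cantor formula as below:

$$ d_S = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p_S\right) \qquad d_N = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p_N\right) $$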
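As a rough illustration of this final step, the sketch below converts difference counts into proportions and applies the Jukes-Cantor correction. The function names `jukes_cantor` and `ds_dn` and the input values are hypothetical and are not taken from the program. The correction is undefined when a proportion reaches 0.75, so the sketch raises an error in that case; how the program itself handles such saturated comparisons is not stated here.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def ds_dn(Sd, Nd, S, N):
    """Return (dS, dN) for one pairwise comparison."""
    pS = Sd / S  # proportion of synonymous differences
    pN = Nd / N  # proportion of non-synonymous differences
    return jukes_cantor(pS), jukes_cantor(pN)


# Made-up counts for a single pair of sequences, for illustration only.
dS, dN = ds_dn(Sd=19.0, Nd=5.0, S=95.0, N=337.0)
print(f"dS = {dS:.4f}, dN = {dN:.4f}, dN/dS = {dN / dS:.4f}")
```

The correction always gives a distance at least as large as the raw proportion, because it allows for multiple substitutions at the same site.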


This analysis requires both allelic profiles and the allele sequences to be loaded. Again, individual isolates may be selected or deselected from inclusion using the 'Isolate Selection' dialog reachable from the 'Options' menu. This analysis also requires that the open reading frames are identified for each locus and the program will prompt for this information. If necessary, it will take you to the 'Reading Frames' dialog where these can be auto-detected.

To run the analysis, click 'Analysis ... Tests for Selection ... dS/dN ratio'.

Program Output

Mean number of synonymous sites, S = 96.2
Mean number of non-synonymous sites, N = 335.9
Number of coding sites analysed, S+N = 432

Number of pairwise comparisons made : 190
Mean synonymous substitutions per synonymous site, dS = 0.2280
Std. Deviation (dS) = 0.1117; 95% Confidence Interval (dS) : 0.2121 - 0.2439

Mean non-synonymous substitutions per non-synonymous site, dN = 0.0162
Std. Deviation (dN) = 0.0118; 95% Confidence Interval (dN) : 0.0145 - 0.0179

dN/dS = 0.0710
dS/dN = 14.1
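
As an aid to reading this output, the short check below shows that the reported confidence intervals are consistent with a normal approximation, mean ± 1.96 x SD / sqrt(number of comparisons). This is an inference from the figures shown, not a documented description of how the program computes them, and `normal_ci` is a hypothetical helper. The 190 comparisons are likewise consistent with every pair drawn from 20 sequences.

```python
import math


def normal_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a mean of n observations."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Values copied from the example output above.
ds_low, ds_high = normal_ci(0.2280, 0.1117, 190)
dn_low, dn_high = normal_ci(0.0162, 0.0118, 190)
print(f"dS interval: {ds_low:.4f} - {ds_high:.4f}")   # 0.2121 - 0.2439
print(f"dN interval: {dn_low:.4f} - {dn_high:.4f}")   # 0.0145 - 0.0179

# 20 sequences give 20 * 19 / 2 = 190 pairwise comparisons.
print(math.comb(20, 2))   # 190
```

The ratio of the rounded means, 0.0162 / 0.2280, is about 0.071, in line with the reported dN/dS; small differences in the last digit are expected because the program presumably works from unrounded values.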