# *d*_{S}/*d*_{N} ratio

Nucleotide substitutions in protein-coding genes are either synonymous (also called silent; they do not change the amino acid) or non-synonymous (they change the amino acid). Most non-synonymous changes are expected to be eliminated by purifying selection, but under certain conditions positive (Darwinian) selection may lead to their retention. Comparing the numbers of synonymous and non-synonymous substitutions may therefore provide information about the degree of selection operating on a system.

This program uses the Nei and Gojobori (1986) method of estimating synonymous and non-synonymous substitutions, which is an unweighted pathway method (see also Nei, Molecular Evolutionary Genetics, 1987, Columbia University Press, New York). The first step in the procedure is to count the synonymous and non-synonymous sites present at each codon, where each site may be partly synonymous and partly non-synonymous.

*f*_{i} is defined as the proportion of synonymous changes at the *i*th position of a codon. The number of synonymous sites, *s*, for this codon is therefore given by

$$s = \sum_{i=1}^{3} f_i$$

and consequently the number of non-synonymous sites, *n*, is given by *n* = 3 - *s*. As an example, for codon ATT (Ile), *f*_{1} = 0, *f*_{2} = 0 and *f*_{3} = 2/3 (because two of the three possible changes at position 3 do not result in a change of amino acid). Summing over the positions, *s* = 2/3 and *n* = 7/3.

For a sequence of *r* codons, the total number of synonymous sites, *S*, is given by

$$S = \sum_{j=1}^{r} s_j$$

where *s*_{j} is the value of *s* at the *j*th codon, and the total number of non-synonymous sites is *N* = 3*r* - *S*.
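The site-counting step can be sketched in Python as follows. This is a minimal illustration rather than the program's own code: the function names `syn_sites` and `total_sites` are hypothetical, the standard genetic code is assumed, and a change that produces a stop codon is counted here as non-synonymous, which is an assumption the text above does not settle.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Return s, the number of synonymous sites in one sense codon."""
    s = 0.0
    for i in range(3):
        # The three codons reachable by changing position i only.
        neighbours = [codon[:i] + b + codon[i + 1:] for b in BASES if b != codon[i]]
        # f_i: fraction of those changes that keep the amino acid.
        # Assumption: a change to a stop codon counts as non-synonymous.
        s += sum(CODE[nb] == CODE[codon] for nb in neighbours) / 3
    return s


def total_sites(seq):
    """Return (S, N) for an in-frame coding sequence without stop codons."""
    codons = [seq[j:j + 3] for j in range(0, len(seq), 3)]
    S = sum(syn_sites(c) for c in codons)
    return S, 3 * len(codons) - S


print(syn_sites("ATT"))       # s for the Ile codon used in the text
print(total_sites("ATTCTA"))  # S and N for a two-codon toy sequence
```

For ATT the function reproduces *s* = 2/3 from the worked example; the two-codon sequence is only a toy input.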

The next stage is to determine the number of synonymous and non-synonymous changes between each pair of aligned sequences, codon by codon. Where there is one nucleotide difference, it is obvious whether the change is synonymous or non-synonymous. When there are two or three differences, however, the number of possible pathways between the codons increases to two or six respectively. In the case of two differences, the numbers of synonymous and non-synonymous differences per codon, *s*_{d} and *n*_{d} respectively, add up to 2, with each possible pathway having two steps (making a total of 4 possible steps). For example, comparing CTA and GTT, the possible pathways are:

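Pathway 1: CTA (Leu) --> GTA (Val) --> GTT (Val) 1 non-synonymous, 1 synonymous change

Pathway 2: CTA (Leu) --> CTT (Leu) --> GTT (Val) 1 synonymous, 1 non-synonymous change

As two of these four steps are synonymous changes, *s*_{d} = 1 (or 2/4 x 2) and *n*_{d} = 1. The pathways need not agree: for TTT (Phe) and GTA (Val), the pathway via GTT (Val) contains one synonymous step whereas the pathway via TTA (Leu) contains none, so only one of the four steps is synonymous, giving *s*_{d} = 0.5 (or ¼ x 2) and *n*_{d} = 1.5.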

With three nucleotide differences, there are six possible pathways, each with three mutational steps. Considering each of these possible steps, the values of *s*_{d} and *n*_{d} are determined in the same way, with *s*_{d} and *n*_{d} adding up to 3. If any of the possible pathways goes via a stop codon, that pathway is removed from the calculation.
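The pathway counting described above can be sketched as below. This is an illustrative sketch, not the program's implementation: `codon_differences` and `sequence_differences` are hypothetical names, the standard genetic code is assumed, and the input sequences are assumed to be aligned, in frame and free of stop codons. Every ordering of the differing positions is one pathway, pathways that pass through a stop codon are dropped, and the synonymous and non-synonymous steps are averaged over the pathways that remain.

```python
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_differences(c1, c2):
    """Return (s_d, n_d) for two sense codons, averaged over all pathways."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    syn = nonsyn = n_paths = 0
    for order in permutations(diff):
        cur, steps = c1, []
        for i in order:
            # Change one differing position at a time towards c2.
            nxt = cur[:i] + c2[i] + cur[i + 1:]
            steps.append((CODE[cur], CODE[nxt]))
            cur = nxt
        # Remove pathways that go via a stop codon.
        if any(after == "*" for _, after in steps):
            continue
        n_paths += 1
        syn += sum(before == after for before, after in steps)
        nonsyn += sum(before != after for before, after in steps)
    if n_paths == 0:
        raise ValueError("every pathway passes through a stop codon")
    return syn / n_paths, nonsyn / n_paths


def sequence_differences(seq1, seq2):
    """Sum s_d and n_d over all codons of two aligned sequences."""
    S_d = N_d = 0.0
    for j in range(0, len(seq1), 3):
        s_d, n_d = codon_differences(seq1[j:j + 3], seq2[j:j + 3])
        S_d += s_d
        N_d += n_d
    return S_d, N_d


print(codon_differences("TTT", "GTA"))  # two differences, two pathways
```

Dividing the step counts by the number of pathways is equivalent to taking the fraction of synonymous steps and multiplying by the number of differences, as in the ¼ x 2 calculation.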

Summing over all codons, the total numbers of synonymous and non-synonymous differences, *S*_{d} and *N*_{d} respectively, are given by

$$S_d = \sum_{j=1}^{r} s_{dj}$$

and

$$N_d = \sum_{j=1}^{r} n_{dj}$$

where *s*_{dj} and *n*_{dj} are the numbers of synonymous and non-synonymous differences for the *j*th codon, and *r* is the number of codons compared.

The proportions of synonymous (*p*_{S}) and non-synonymous (*p*_{N}) differences are estimated by the equations *p*_{S} = *S*_{d} / *S* and *p*_{N} = *N*_{d} / *N*, with mean values taken over every pairwise comparison.

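The numbers of synonymous (*d*_{S}) and non-synonymous (*d*_{N}) substitutions per site are estimated using the Jukes-Cantor formula as below:

$$d_S = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p_S\right)$$

and

$$d_N = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p_N\right)$$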
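The Jukes-Cantor correction converts a proportion of differences *p* into an estimated number of substitutions per site, *d* = -(3/4) ln(1 - (4/3)*p*). The sketch below applies it to one pair of sequences. The function name `jukes_cantor` is hypothetical and the counts are invented for illustration only; they are not taken from any data set.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion of differences p."""
    # The logarithm is only defined while p is below 3/4.
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical counts for a single pairwise comparison.
S, N = 96.0, 336.0      # synonymous and non-synonymous sites
S_d, N_d = 19.0, 5.0    # synonymous and non-synonymous differences

p_S, p_N = S_d / S, N_d / N
d_S, d_N = jukes_cantor(p_S), jukes_cantor(p_N)
print(f"dS = {d_S:.4f}, dN = {d_N:.4f}, dN/dS = {d_N / d_S:.4f}")
```

The corrected value always exceeds the raw proportion for *p* > 0, because it allows for multiple substitutions at the same site.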

This analysis requires both allelic profiles and the allele sequences to be loaded. Again, individual isolates may be selected or deselected from inclusion using the 'Isolate Selection' dialog reachable from the 'Options' menu. This analysis also requires that the open reading frames are identified for each locus and the program will prompt for this information. If necessary, it will take you to the 'Reading Frames' dialog where these can be auto-detected.

To run the analysis, click 'Analysis ... Tests for Selection ... dS/dN ratio'.

## Program Output

Mean number of synonymous sites, S = 96.2

Mean number of non-synonymous sites, N = 335.9

Number of coding sites analysed, S+N = 432

Number of pairwise comparisons made : 190

Mean synonymous substitutions per synonymous site, *d*_{S} = 0.2280

Std. Deviation (*d*_{S}) = 0.1117; 95% Confidence Interval (*d*_{S}) : 0.2121 - 0.2439

Mean non-synonymous substitutions per non-synonymous site, *d*_{N} = 0.0162

Std. Deviation (*d*_{N}) = 0.0118; 95% Confidence Interval (*d*_{N}) : 0.0145 - 0.0179

*d*_{N}/*d*_{S} = 0.0710

*d*_{S}/*d*_{N} = 14.1
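The confidence intervals in this example are numerically consistent with a normal-approximation interval, mean ± 1.96 x SD / √(number of pairwise comparisons). That relationship is inferred from the printed figures and is not a documented description of how the program calculates them. The sketch below, with the hypothetical helper `ci95`, recomputes the intervals from the values shown above.

```python
import math


def ci95(mean, sd, n):
    """Normal-approximation 95% interval: mean +/- 1.96 * sd / sqrt(n)."""
    half_width = 1.96 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


n_pairs = 190                    # pairwise comparisons in the output
ds_mean, ds_sd = 0.2280, 0.1117  # printed mean and SD of dS
dn_mean, dn_sd = 0.0162, 0.0118  # printed mean and SD of dN

ds_lo, ds_hi = ci95(ds_mean, ds_sd, n_pairs)
dn_lo, dn_hi = ci95(dn_mean, dn_sd, n_pairs)
print(f"dS 95% CI: {ds_lo:.4f} - {ds_hi:.4f}")
print(f"dN 95% CI: {dn_lo:.4f} - {dn_hi:.4f}")
print(f"dS/dN: {ds_mean / dn_mean:.1f}")
```

Ratios recomputed from the rounded means can differ from the printed ratios in the last digit.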