Homoplasy Test

The Homoplasy Test (Maynard Smith & Smith, 1998) aims to measure the importance of recombination between members of a population.  It is only valid where sequences differ by ~5% of nucleotides or less.  The test tries to determine if there is a statistically significant excess of homoplasies (shared similarities found in different branches of a phylogenetic tree not inherited directly from an ancestor) derived from the dataset, compared to an estimate of the number of homoplasies expected by mutation in the absence of recombination.  An excess of homoplasies is likely to have been brought about by recombination.  The test requires at least six sequences containing at least ten 'informative sites' (sites at which the rarer of two alternative bases is present at least twice).  A 'homoplasy ratio' is calculated which should range from zero, for a clonal population, to one, for a population under free recombination.
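The definition of an informative site given above can be illustrated with a minimal sketch. It assumes equal-length aligned sequences and assumes that sites with more than two distinct bases are not counted, which the description above does not specify. The function names are hypothetical and are not part of the package.

```python
from collections import Counter


def is_informative(column):
    """True if the site has exactly two bases and the rarer occurs at least twice.

    Assumption: sites with three or more distinct bases are skipped.
    """
    counts = Counter(column)
    if len(counts) != 2:
        return False
    return min(counts.values()) >= 2


def count_informative_sites(sequences):
    """Count informative sites across equal-length aligned sequences."""
    return sum(is_informative(column) for column in zip(*sequences))


# Toy alignment of six sequences (illustrative data only).
seqs = ["ACGT", "ACGA", "ATGA", "ATGT", "ACGT", "ACGT"]
n_informative = count_informative_sites(seqs)
```

In this toy alignment the second and fourth sites qualify, because the rarer base appears twice at each; the first and third sites are invariant.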

This analysis requires both the allelic profiles and the allele sequences to be loaded.  Again, individual isolates may be included in or excluded from the analysis using the 'Isolate Selection' dialog reachable from the 'Options' menu.  The analysis also requires that the open reading frame is identified for each locus, and the program will prompt for this information.  If necessary, it will take you to the 'Reading Frames' dialog where the reading frames can be auto-detected.

To run the analysis, click 'Analysis ... Tests for Recombination ... Homoplasy Test'.  The test only looks at third site positions and removes all non-synonymous codons from the analysis.  During the first pass, the number of variable third sites and the number of informative sites are determined for each locus.  You are then presented with a dialog box containing a check box for each locus.  Check the loci you wish to analyse.  If a locus has fewer than 10 informative sites, its box will be unchecked and the analysis will not be performed on it.  For large data sets on older, slower computers, computation time is significant and may run into hours.

You are also prompted to enter an estimate of the 'Effective site number', Se, expressed as a proportion of the number of third sites, S.  Se is an estimate of the number of sites that are 'free to change', with the rest being selectively constrained.  For a fuller explanation please see Maynard Smith & Smith, 1998.  Changing the value of Se affects the significance of the results.

The most reliable method of estimating Se is to use an outgroup sequence that has the same codon bias and GC ratio as the ingroup sequences.  At the moment, Se estimation using an outgroup is not supported by the package, but it is envisaged that future versions will offer the option.  In the absence of an outgroup, a value of Se = 0.6S is conservative: if recombination is detected, it is likely to be real.  A value of Se = S, which assumes that all sites are equally likely to change, is likely to overestimate Se, so you may conclude that recombination has occurred when it has not.  The default setting is 0.6S.
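As a rough illustration of how S and Se relate, the sketch below extracts the third position of each complete codon and scales the count by the chosen proportion. It assumes the reading frame is given as a 0-based offset to the first base of the first codon, and it does not remove non-synonymous codons as the program does. The function names and the toy allele are hypothetical, and the way the program rounds Se is not stated here.

```python
def third_sites(sequence, frame_offset=0):
    """Return the third-position base of every complete codon.

    Assumption: frame_offset is the 0-based index of the first base
    of the first codon in the sequence.
    """
    coding = sequence[frame_offset:]
    n_codons = len(coding) // 3
    return "".join(coding[3 * i + 2] for i in range(n_codons))


def effective_site_number(n_third_sites, proportion=0.6):
    """Se expressed as a proportion of the number of third sites, S."""
    return proportion * n_third_sites


allele = "ATGGCCAAATTT"        # four codons: ATG GCC AAA TTT (toy data)
thirds = third_sites(allele)   # third base of each codon
S = len(thirds)
Se = effective_site_number(S)  # default proportion of 0.6
```

Passing `proportion=1.0` corresponds to the Se = S setting, which treats every third site as free to change.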

The program outputs the number of observed homoplasies, h, and the number of homoplasies expected if the data set were clonal, hc.  It then randomly shuffles the sequence matrix to simulate free recombination and repeats the test.  It does this ten times to provide a mean value, hr.  The observed number of homoplasies should fall between the value expected for a clonal data set, hc, and that for one under free recombination, hr.  A homoplasy ratio, defined as (h - hc)/(hr - hc), is then calculated, which should be a value between 0 and 1.  The significance of this value is also calculated and reported as a P value (which should be below 0.05 for significance at the 95% confidence level).
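The shuffling step can be sketched as follows. This is one plausible reading, assuming that shuffling means permuting the bases at each site independently among the sequences, so that every site keeps its base composition while associations between sites are broken. The exact scheme used by the program is not described here, and the function name is hypothetical.

```python
import random


def shuffle_columns(sequences, rng):
    """Permute the bases within each site independently across sequences.

    Assumption: this is how 'shuffling the sequence matrix' is done;
    the base composition of every site is preserved.
    """
    columns = [list(column) for column in zip(*sequences)]
    for column in columns:
        rng.shuffle(column)
    return ["".join(row) for row in zip(*columns)]


rng = random.Random(1)  # fixed seed so the sketch is repeatable
seqs = ["ACGT", "ACGA", "ATGA", "ATGT", "ACGT", "ACGT"]
shuffled = shuffle_columns(seqs, rng)
```

Counting homoplasies in a maximum-parsimony tree built from each shuffled matrix is the expensive part and is not shown.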

Program Output

First pass determines the number of informative sites at each locus.

Locus   Alleles analysed   Variable sites   Informative sites
abcZ    20                 49               37
adk     15                 14               9
aroE    17                 42               29
fumC    25                 37               24
gdh     18                 20               16
pdhC    23                 54               51
pgm     24                 47               35

Second Pass determines the observed number of homoplasies in a maximum-parsimony tree (MPT) and the expected homoplasies if there was no recombination.

fumC

Value of Se assumed: 92
True homoplasies: 35
Expected homoplasies if clonal: 9.9

Homoplasies in 10 shuffled matrices:
62 61 64 62 68 66 63 62 61 64

Mean number of homoplasies (10 trials) in a MPT of a matrix obtained by shuffling real matrix: 63.3
0 case(s) of (expected homoplasies > true homoplasies) out of 1000 trials
P(exph >= trueh) = 0.000
Homoplasy ratio (trueh-exph)/(meansh-exph) = 0.470
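The summary figures reported for fumC can be checked by hand from the values in the output above. The sketch below recomputes the mean of the ten shuffled counts, the homoplasy ratio and the P value. The variable and function names are hypothetical; only the numbers come from the output.

```python
from statistics import mean


def homoplasy_ratio(h, hc, hr):
    """Homoplasy ratio (h - hc) / (hr - hc)."""
    return (h - hc) / (hr - hc)


# Values copied from the fumC output above.
true_homoplasies = 35
expected_if_clonal = 9.9
shuffled_counts = [62, 61, 64, 62, 68, 66, 63, 62, 61, 64]

mean_shuffled = mean(shuffled_counts)  # hr: mean over the 10 shuffled matrices
ratio = homoplasy_ratio(true_homoplasies, expected_if_clonal, mean_shuffled)

# P value: fraction of trials in which expected >= true homoplasies.
cases, trials = 0, 1000
p_value = cases / trials

print(f"Mean shuffled homoplasies: {mean_shuffled:.1f}")
print(f"Homoplasy ratio: {ratio:.3f}")
print(f"P = {p_value:.3f}")
```

The ratio works out as (35 - 9.9)/(63.3 - 9.9) = 25.1/53.4, which rounds to the 0.470 shown in the output.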