concat.pl - Multi Locus Concatenator
Written by Keith Jolley
Copyright (c) 2004, University of Oxford
General
This Perl script will create a concatenated FASTA file from a list of allelic profiles and allele FASTA files. Allele files can be stored locally or retrieved automatically from a web site. It is assumed that alleles are numbered sequentially from 1 in the FASTA files.
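The numbering assumption can be pictured with a minimal sketch. It is not taken from concat.pl itself. It assumes that "numbered sequentially from 1" means allele n is the nth record in the FASTA file, and the helper names (read_alleles, allele_sequence), the record headers and the sequences are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records in file order.
# Header text is ignored; only the position of each record is used.
sub read_alleles {
    my ($fh) = @_;
    my @sequences;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;          # strip line ending and trailing spaces
        next if $line eq '';
        if ($line =~ /^>/) {
            push @sequences, '';    # a header line starts a new record
        }
        else {
            die "Sequence data before first header\n" unless @sequences;
            $sequences[-1] .= $line;    # join multi-line sequences
        }
    }
    return \@sequences;
}

# Assumption: allele n is the nth record, so it sits at array index n - 1.
sub allele_sequence {
    my ($sequences, $allele_number) = @_;
    die "No allele $allele_number\n"
        if $allele_number < 1 || $allele_number > @$sequences;
    return $sequences->[ $allele_number - 1 ];
}

# Made-up example data held in memory rather than in a file.
my $fasta = ">pilA-1\nATGAAA\nGCT\n>pilA-2\nATGAAG\nGCC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $sequences = read_alleles($fh);
close $fh;

print allele_sequence($sequences, 2), "\n";    # prints ATGAAGGCC
```

If your allele files have gaps in the numbering, a positional lookup like this would return the wrong sequence, which is why the assumption matters.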
Usage
concat.pl <config file>
Configuration file
All configuration options are passed to the script in a configuration file. For each locus in the profile, this file should contain one line giving the name of the locus, the path (or web URL) to its FASTA file, and the reading frame (1, 2 or 3), i.e. the nucleotide position at which the open reading frame starts. It should also contain a line giving the path to the profiles file.
For example, the configuration file to concatenate some of the Neisseria MLST alleles (retrieved from the MLST website) along with pilA alleles (retrieved from a local FASTA file) would be:
locus:abcZ=http://pubmlst.org/neisseria/alleles/abcZ.tfa:1
locus:adk=http://pubmlst.org/neisseria/alleles/adk_.tfa:1
locus:aroE=http://pubmlst.org/neisseria/alleles/aroE.tfa:2
locus:pilA=pilA.tfa:1
profiles=profiles.txt
Note that the open reading frame for aroE starts at nucleotide 2, so the first nucleotide of each aroE allele will be removed from the concatenation. All other loci start at nucleotide 1. The pilA.tfa file and the profiles file 'profiles.txt' should be saved in the current directory.
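To make the line format concrete, here is a minimal sketch of how such configuration lines could be parsed and how a reading frame could be applied. It is an illustration under stated assumptions, not the code used by concat.pl: the helper names (parse_config, trim_to_frame) and the test sequence are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: parse configuration lines of the form shown above.
# Returns a reference to a list of locus records and the profiles file path.
sub parse_config {
    my (@lines) = @_;
    my (@loci, $profiles);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '';
        # A URL contains colons of its own, so the greedy (.+) leaves only
        # the final ":<frame>" for the last capture group.
        if ($line =~ /^locus:([^=]+)=(.+):([123])$/) {
            push @loci, { name => $1, source => $2, frame => $3 };
        }
        elsif ($line =~ /^profiles=(.+)$/) {
            $profiles = $1;
        }
        else {
            die "Unrecognised configuration line: $line\n";
        }
    }
    return (\@loci, $profiles);
}

# Hypothetical helper: drop the nucleotides before the reading frame start.
sub trim_to_frame {
    my ($sequence, $frame) = @_;
    return substr($sequence, $frame - 1);
}

my ($loci, $profiles) = parse_config(
    'locus:aroE=http://pubmlst.org/neisseria/alleles/aroE.tfa:2',
    'locus:pilA=pilA.tfa:1',
    'profiles=profiles.txt',
);
printf "%s\t%s\tframe %d\n", @{$_}{qw(name source frame)} for @$loci;
print "profiles file: $profiles\n";
print trim_to_frame('ACGTAC', 2), "\n";    # prints CGTAC
```

Matching the frame as the last colon-separated field is what allows a web URL and a local file name to share the same line format.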
Profiles file
The profiles file should be in tab-delimited text format (you can save in this format from any spreadsheet). The first row should contain headers. The first header is for the sequence id and can be anything. The other headers should match the names of the loci given in the configuration file but can be in any order.
For example:
id	abcZ	adk	aroE	pilA
1	2	3	4	2
2	9	6	9	7
3	6	7	5	3
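The following sketch shows one way a tab-delimited profiles file of this shape could be read so that alleles are looked up by locus name, which is what lets the header columns appear in any order. It is an illustration only: the helper name read_profiles and the returned data structure are assumptions, not taken from concat.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: read a tab-delimited profiles file from a filehandle.
# Returns the locus headers and one record per profile row.
sub read_profiles {
    my ($fh) = @_;
    my $header = <$fh>;
    die "Profiles file is empty\n" unless defined $header;
    $header =~ s/\r?\n$//;
    # The first header names the id column; the rest are locus names.
    my ($id_header, @loci) = split /\t/, $header;
    my @profiles;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n$//;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($id, @alleles) = split /\t/, $line;
        die "Wrong number of fields for id '$id'\n" unless @alleles == @loci;
        my %allele_for;
        @allele_for{@loci} = @alleles;    # key each allele number by locus
        push @profiles, { id => $id, alleles => \%allele_for };
    }
    return (\@loci, \@profiles);
}

# Example data held in memory rather than in a file.
my $text = join "\n",
    join("\t", qw(id abcZ adk aroE pilA)),
    join("\t", qw(1 2 3 4 2)),
    join("\t", qw(2 9 6 9 7)),
    '';
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my ($loci, $profiles) = read_profiles($fh);
close $fh;

for my $profile (@$profiles) {
    print "id $profile->{id}: aroE allele $profile->{alleles}{aroE}\n";
}
```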
Running the script
Once you have created your configuration file, save it under a name of your choice, e.g. 'config.txt'. Save your profiles file under the name you used to refer to it in the configuration file, e.g. 'profiles.txt'. Make sure any local allele files that you require are in the current directory (or in the path you have specified in the configuration). Run the script by typing:
concat.pl config.txt
This sends the concatenated output to the screen. You may want to do this first to make sure there are no errors. Once you are happy that it is working, you can save the output to a file, e.g.:
concat.pl config.txt > output.txt