concat.pl - Multi Locus Concatenator

Written by Keith Jolley
Copyright (c) 2004, University of Oxford

General

This Perl script creates a concatenated FASTA file from a list of allelic profiles and the corresponding allele FASTA files. Allele files can be stored locally or retrieved automatically from a website. The script assumes that the alleles in each FASTA file are numbered sequentially from 1.
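To make the idea concrete, the following Python sketch shows one way the concatenation could work. It is not taken from concat.pl. It assumes that, because alleles are numbered sequentially from 1, allele n is simply the nth record in its FASTA file. The function names and the toy sequences are invented for illustration, and reading-frame handling is left out here.

```python
def read_fasta(text):
    """Return the sequences in a FASTA string as a list, in file order."""
    sequences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            sequences.append("")
        elif sequences:
            # Sequence data may be wrapped over several lines.
            sequences[-1] += line
    return sequences


def allele_sequence(sequences, allele_number):
    """Look up an allele by number, assuming sequential numbering from 1."""
    if not 1 <= allele_number <= len(sequences):
        raise ValueError(f"allele {allele_number} not found")
    return sequences[allele_number - 1]


# Toy data (made up): two loci with two alleles each.
abcZ_fasta = ">abcZ1\nAAAA\n>abcZ2\nCCCC\n"
adk_fasta = ">adk1\nGG\nGG\n>adk2\nTTTT\n"
alleles = {"abcZ": read_fasta(abcZ_fasta), "adk": read_fasta(adk_fasta)}

# One profile: sequence id 1 has abcZ allele 2 and adk allele 1.
profile = {"abcZ": 2, "adk": 1}
concatenated = "".join(
    allele_sequence(alleles[locus], profile[locus]) for locus in ("abcZ", "adk")
)
print(">1")
print(concatenated)
```

If the alleles in a file were not in sequential order, a positional lookup like this would silently return the wrong sequence, which is why the numbering assumption matters.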

Usage

concat.pl <config file>

Configuration file

All configuration options are passed to the script in a configuration file. For each locus in the profile, this should contain one line giving the name of the locus, the path (or web URL) to its FASTA file, and the open reading frame (1, 2 or 3). It should also contain a line giving the path to the profiles file.

For example, the configuration file to concatenate some of the Neisseria MLST alleles (retrieved from the MLST website) along with pilA alleles (retrieved from a local FASTA file) would be:

locus:abcZ=http://pubmlst.org/neisseria/alleles/abcZ.tfa:1
locus:adk=http://pubmlst.org/neisseria/alleles/adk_.tfa:1
locus:aroE=http://pubmlst.org/neisseria/alleles/aroE.tfa:2
locus:pilA=pilA.tfa:1
profiles=profiles.txt

Note that the open reading frame for aroE starts at nucleotide 2, so the first nucleotide of this allele will be removed from the concatenation. All other loci start at nucleotide 1. The pilA.tfa file and the profiles file 'profiles.txt' should be saved in the current directory.
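As an illustration of the format, here is a minimal Python sketch that reads configuration lines like those above. It is not the parser used by concat.pl, and the names parse_config and trim_to_frame are hypothetical. Because a web URL itself contains colons, the sketch splits on the last colon to find the reading frame. It also assumes that a frame of 3 removes the first two nucleotides, by analogy with the aroE case described above.

```python
def parse_config(text):
    """Parse configuration text into (loci, profiles_path).

    loci is a list of (name, source, frame) tuples in file order.
    """
    loci = []
    profiles_path = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("locus:"):
            name, rest = line[len("locus:"):].split("=", 1)
            # The source may be a URL containing ':', so split on the LAST colon.
            source, frame = rest.rsplit(":", 1)
            frame = int(frame)
            if frame not in (1, 2, 3):
                raise ValueError(f"bad reading frame in line: {line}")
            loci.append((name, source, frame))
        elif line.startswith("profiles="):
            profiles_path = line[len("profiles="):]
    return loci, profiles_path


def trim_to_frame(sequence, frame):
    """Drop the leading nucleotides before the open reading frame starts."""
    return sequence[frame - 1:]


config_text = """\
locus:abcZ=http://pubmlst.org/neisseria/alleles/abcZ.tfa:1
locus:aroE=http://pubmlst.org/neisseria/alleles/aroE.tfa:2
locus:pilA=pilA.tfa:1
profiles=profiles.txt
"""
loci, profiles_path = parse_config(config_text)
for name, source, frame in loci:
    print(name, source, frame)
print("profiles file:", profiles_path)
```

With a frame of 2, trim_to_frame("ACGT", 2) returns "CGT", which matches the behaviour described for aroE.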

Profiles file

The profiles file should be in tab-delimited text format (you can save in this format from any spreadsheet). The first row should contain headers. The first header is for the sequence id and can be anything. The other headers should match the names of the loci given in the configuration file but can be in any order.

For example:

id	abcZ	adk	aroE	pilA
1	2	3	4	2
2	9	6	9	7
3	6	7	5	3
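The layout above can be read with a few lines of Python. This is only a sketch of the file format, not code from concat.pl, and read_profiles is a hypothetical name. Each row is keyed by the header names, so the locus columns can appear in any order, as described above.

```python
import csv
import io


def read_profiles(handle):
    """Read tab-delimited profiles into a list of (id, {locus: allele}) pairs."""
    reader = csv.reader(handle, delimiter="\t")
    headers = next(reader)
    # The first header labels the sequence id; the rest are locus names.
    locus_names = headers[1:]
    profiles = []
    for row in reader:
        if not row:
            continue
        alleles = {locus: int(value) for locus, value in zip(locus_names, row[1:])}
        profiles.append((row[0], alleles))
    return profiles


# The example profiles from this document, held in memory instead of a file.
sample = "id\tabcZ\tadk\taroE\tpilA\n1\t2\t3\t4\t2\n2\t9\t6\t9\t7\n3\t6\t7\t5\t3\n"
profiles = read_profiles(io.StringIO(sample))
for seq_id, alleles in profiles:
    print(seq_id, alleles)
```

To read a real file, pass an open file handle instead of the io.StringIO object, e.g. open('profiles.txt', newline='').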

Running the script

Once you have created your configuration file, save it as e.g. 'config.txt'. Save your profiles file with the name you have used to refer to it in the configuration file e.g. 'profiles.txt'. Make sure any local allele files that you require are in the current directory (or in the path you have specified in the configuration). Run the script by typing:

concat.pl config.txt

This sends the concatenated output to the screen - you may want to do this first to make sure there are no errors. Once you are happy that it is working, you can save the output to a file, e.g.

concat.pl config.txt > output.txt

Download concat.pl

Version 0.3 Beta

This is a test version of the software. Please send any comments regarding potential improvements, bugs etc. to Keith Jolley.

Installation

Simply download the script and make it executable by typing:

chmod a+x concat.pl