Database structure
The Neisseria MLST database has a new distributed structure, with allelic profiles separated from isolate data. This offers the following advantages over the original single database:
- Allelic profiles are not replicated. The database engine can therefore enforce integrity, ensuring that an allelic profile cannot be assigned to more than one sequence type (ST), and vice versa.
- Multiple isolate databases can be set up, each communicating with the profiles database. These databases do not need to be located on the same machine and can be widely dispersed, communicating via the Internet. Public and private data can now be kept distinct. Isolate databases can also be set up for individual projects, or for collections that represent true populations rather than the arbitrary selection of isolates in the original database. The old database has become PubMLST and will continue as before. As an example, a new database is available containing just the 107 isolates used in the original validation of the MLST scheme. These databases store the ST number for an isolate and look up the allelic profile and clonal complex from the profiles database. Alternatively, databases can store the allelic profiles and look up the ST number. This would suit project databases where sequence data is entered as it is obtained.
- A user is not confronted with large amounts of mainly irrelevant isolate data when making queries about an allelic profile.
- With the profiles separate, an automated script can now assign allelic profiles to clonal complexes based on their similarity to previously identified 'central genotypes'. The complex assignment is available both to queries made directly against the profiles database and to any isolate database that connects to it over the network. Using an automated script ensures that complexes are named consistently, which is important when searching by complex.
- The database structure is not limited to MLST, since the isolate databases may also link to other data sources. An example of this is the PorA variable region database. Both the PubMLST and the 107 reference strains databases connect to the PorA variable region database to retrieve the amino acid sequence of each variable region. Such connections can be made by a simple addition to the XML database configuration script and do not require any further programming.
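The clonal complex assignment described above can be illustrated with a minimal sketch. It is not the script used by the database: the complex names, the allele numbers and the threshold of four shared loci (`MIN_SHARED_LOCI`) are all assumptions made for illustration, as is the rule that the alphabetically first complex wins a tie.

```python
from typing import Dict, Optional, Tuple

Profile = Tuple[int, ...]

# Hypothetical central genotypes: complex name -> seven-locus allelic profile.
# The names and allele numbers are invented for this example.
CENTRAL_GENOTYPES: Dict[str, Profile] = {
    "complex A": (1, 1, 1, 1, 1, 1, 1),
    "complex B": (2, 2, 2, 2, 2, 2, 2),
}

# Assumed threshold: minimum number of loci that must match a central genotype.
MIN_SHARED_LOCI = 4


def shared_loci(a: Profile, b: Profile) -> int:
    """Count the loci at which two profiles carry the same allele."""
    return sum(x == y for x, y in zip(a, b))


def assign_complex(profile: Profile) -> Optional[str]:
    """Return the best-matching complex name, or None if none is close enough.

    Complexes are visited in alphabetical order and only a strictly better
    score replaces the current best, so ties go to the first name.
    """
    best_name: Optional[str] = None
    best_score = 0
    for name, central in sorted(CENTRAL_GENOTYPES.items()):
        score = shared_loci(profile, central)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= MIN_SHARED_LOCI else None
```

Because every profile passes through the same function, two profiles that are equally close to a central genotype always receive the same complex name, which is the consistency that searching by complex relies on.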
If you would like to connect a database to the profiles database, your server IP address needs to be registered with us. Our server will then accept one-way connections on its database port and allow you to make queries using standard SQL commands. Please contact the webmaster for further details.
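As a rough illustration of the kind of lookup an isolate database might perform, the sketch below uses an in-memory SQLite database as a stand-in for the profiles database. The real schema is not described here, so the table name `profiles`, the column names and the example rows are all hypothetical. The locus names are those of the seven-locus Neisseria MLST scheme. The uniqueness constraints show one way a database engine can guarantee a one-to-one relationship between STs and allelic profiles.

```python
import sqlite3

# The seven loci of the Neisseria MLST scheme, used here as column names.
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")


def make_profiles_db() -> sqlite3.Connection:
    """Build a stand-in profiles database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f"{locus} INTEGER NOT NULL" for locus in LOCI)
    # PRIMARY KEY: one row per ST. UNIQUE: one ST per allelic profile.
    conn.execute(
        f"CREATE TABLE profiles (st INTEGER PRIMARY KEY, {columns}, "
        f"clonal_complex TEXT, UNIQUE ({', '.join(LOCI)}))"
    )
    conn.executemany(
        "INSERT INTO profiles VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (9001, 1, 1, 1, 1, 1, 1, 1, "complex A"),  # invented rows
            (9002, 2, 2, 2, 2, 2, 2, 2, None),
        ],
    )
    return conn


def lookup_profile(conn: sqlite3.Connection, st: int):
    """ST -> (allelic profile as a dict, clonal complex), or None if unknown."""
    row = conn.execute(
        f"SELECT {', '.join(LOCI)}, clonal_complex FROM profiles WHERE st = ?",
        (st,),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(LOCI, row[: len(LOCI)])), row[len(LOCI)]


def lookup_st(conn: sqlite3.Connection, alleles):
    """Allelic profile -> ST number, or None if the profile is not defined."""
    where = " AND ".join(f"{locus} = ?" for locus in LOCI)
    row = conn.execute(
        f"SELECT st FROM profiles WHERE {where}", tuple(alleles)
    ).fetchone()
    return row[0] if row else None
```

An isolate database that stores ST numbers would use something like `lookup_profile`, while a project database that stores allelic profiles would use something like `lookup_st`. Against the real server, the connection call and the table layout would differ from this sketch.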