Isolate database structure
The isolate databases contain tables for user and isolate information. These tables are linked so that every submitted isolate is associated with a sender in the users table.
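The link between the two tables can be pictured as a foreign key from the isolates table to the users table. The following is a minimal sketch using Python's built-in sqlite3 module. The column types, the in-memory database, the sample rows, and the helper name `unknown_sender_is_rejected` are assumptions made for illustration and are not taken from the actual database definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE isolates (
    id     INTEGER PRIMARY KEY,
    strain TEXT,
    sender INTEGER NOT NULL REFERENCES users(id)
);
""")

# Hypothetical sample data: one user and one isolate sent by that user.
conn.execute("INSERT INTO users (id, username) VALUES (1, 'asmith')")
conn.execute("INSERT INTO isolates (id, strain, sender) VALUES (1, 'strain-A', 1)")


def unknown_sender_is_rejected(connection):
    """Return True if an isolate with a non-existent sender id is refused."""
    try:
        connection.execute(
            "INSERT INTO isolates (id, strain, sender) VALUES (2, 'strain-B', 99)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


rejected = unknown_sender_is_rejected(conn)
```

With the constraint in place, an isolate cannot be stored unless its sender already exists in the users table.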
Users table
The users table contains contact information for anybody who sends data to the database or curates it. Every allele sequence and allelic profile records both a sender and a curator, each of which corresponds to an entry in the users table. The information from this table is not displayed on the website, but the curator can use it to trace who provided a sequence or profile. The fields in this table are:
- id - the unique identifier for the entry - this is an integer that can be used in the profile and locus tables to link to the user.
- username - a short version of the user's name, displayed to the curator in a drop-down list from which it can be selected when entering data. This removes the need for the curator to look up the id number of a user.
- surname - the user's surname.
- first_name - the user's first name.
- email - the user's E-mail address.
- affiliation - the organisation for which the user works.
- status - this can be either 'user' or 'curator'. Only a user with a status of 'curator' is allowed by the software to edit data.
- datestamp - the date that the user information was added (or last edited).
- curator - the name of the curator who added the user.
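The fields above, and the rule that only a 'curator' may edit data, can be sketched as a small record type. This is an illustrative sketch only: the `User` class, the `can_edit` function, and the Python types chosen for each field are assumptions, not part of the database software.

```python
from dataclasses import dataclass
from datetime import date

VALID_STATUSES = ("user", "curator")


@dataclass
class User:
    id: int            # unique integer identifier
    username: str      # short name shown in the curator's drop-down list
    surname: str
    first_name: str
    email: str
    affiliation: str
    status: str        # either 'user' or 'curator'
    datestamp: date    # date the record was added or last edited
    curator: str       # name of the curator who added the user

    def __post_init__(self):
        # Reject any status other than the two described in the field list.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"status must be one of {VALID_STATUSES}")


def can_edit(user: User) -> bool:
    """Only a user whose status is 'curator' may edit data."""
    return user.status == "curator"
```

For example, a `User` created with `status="user"` would fail the `can_edit` check, while one created with `status="curator"` would pass it.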
Isolates table
Isolate databases can contain any field that is deemed appropriate for the particular dataset. However, most databases will have the following fields:
- id - the unique identifier for the entry.
- strain - the strain identifier. This isn't guaranteed to be unique for PubMLST datasets where any isolate can be deposited. For some datasets it can be appropriate to use the strain identifier as the id.
- ST or allelic profile fields - The database can be configured to accept either an ST number (in which case the allelic profile and clonal complex will be retrieved from the profiles database) or a full allelic profile (in which case the ST and clonal complex will be retrieved from the profiles database). The former is usually used for databases where complete data is being deposited, such as the PubMLST isolate databases, while the latter is more appropriate for project databases where data is deposited as it is obtained since it allows partial profiles.
- reference1 - reference5 - five reference fields, each of which can accept a PubMed id. If the details of the isolate are published, adding the PubMed reference in one of these fields will allow the database to be searched by reference or author.
- date_entered - the date that the information was entered.
- datestamp - the date that the information was last edited.
- curator - an integer that represents the curator of the data. This corresponds to an id number in the users table.
- sender - an integer that represents the sender of the data. This corresponds to an id number in the users table.
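The two configurations described for the ST or allelic profile fields amount to lookups in opposite directions against the profiles database. The sketch below uses a plain dictionary as a stand-in for that database. The locus names, allele numbers, ST numbers, clonal complex labels, and function names are all invented for illustration.

```python
# Hypothetical loci making up a profile (real schemes define their own).
LOCI = ("locusA", "locusB", "locusC")

# Stand-in for the profiles database: ST -> (allele numbers, clonal complex).
PROFILES = {
    1: ((1, 3, 1), "complex-1"),
    2: ((2, 3, 4), "complex-1"),
    3: ((5, 1, 2), None),  # an ST that is not assigned to a clonal complex
}


def profile_from_st(st):
    """ST entered: retrieve the allelic profile and clonal complex."""
    entry = PROFILES.get(st)
    if entry is None:
        return None, None
    alleles, clonal_complex = entry
    return dict(zip(LOCI, alleles)), clonal_complex


def st_from_profile(profile):
    """Allelic profile entered: retrieve the ST and clonal complex.

    A partial profile (any locus missing or None) is accepted but cannot
    be resolved to an ST yet, so (None, None) is returned.
    """
    if any(profile.get(locus) is None for locus in LOCI):
        return None, None
    key = tuple(profile[locus] for locus in LOCI)
    for st, (alleles, clonal_complex) in PROFILES.items():
        if alleles == key:
            return st, clonal_complex
    return None, None
```

In this sketch a partial profile is simply left unresolved, which mirrors how a project database can hold an isolate whose loci have not all been typed.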
Other fields will be specific for each database. It is possible to set a list of allowed values for any field which ensures that data is entered consistently (and can hence be easily searched). This would be appropriate for fields such as sex, serogroup, disease, and source.
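A list of allowed values can be thought of as a per-field set that each submitted record is checked against. The following sketch shows the idea. The `ALLOWED_VALUES` mapping, its example values, and the `validate_record` function are hypothetical and do not reflect how the software stores its configuration.

```python
# Hypothetical allowed values; a real database would define its own lists.
ALLOWED_VALUES = {
    "sex": {"male", "female", "unknown"},
    "source": {"blood", "CSF", "other"},
}


def validate_record(record):
    """Return a list of error messages for fields with disallowed values.

    Fields without an allowed-values list are accepted as free text.
    """
    errors = []
    for field, value in record.items():
        allowed = ALLOWED_VALUES.get(field)
        if allowed is not None and value not in allowed:
            errors.append(
                f"{field}: '{value}' is not one of {sorted(allowed)}"
            )
    return errors
```

Rejecting a value such as 'Male' or 'M' at entry time is what keeps a later search for 'male' from silently missing records.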
Some fields can be set to look up information from other external databases. For example, in the Neisseria PubMLST database, there are PorA VR1 and VR2 fields where entering a PorA variant will enable the software to retrieve the appropriate peptide sequence hyperlinked to the PorA website.