Pneumococcal Genome Library information

How was the Pneumococcal Genome Library (PGL) created?

We screened 1,091 PubMed papers published between 2000 and 2020 and identified 211 peer-reviewed publications with genome data from at least one naturally occurring pneumococcal isolate.

Overall, 55% (n=115) of the publications provided all genome and contextual data reported in their papers. There were discrepancies in the remaining papers, e.g. the inclusion of invalid accession numbers, an incorrect total of accession numbers, or other data integrity issues. Papers were included in the PGL if all data could be recovered.
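As an illustration of the kind of discrepancy check described above, the sketch below screens a paper's accession list for malformed entries, duplicates, and a mismatch with the total reported in the paper. It is not the procedure used to build the PGL. It assumes the accessions are INSDC sequencing run accessions (ERR, SRR or DRR followed by at least six digits), and the function name `check_accessions` is hypothetical. A syntactic check like this cannot tell whether an accession actually resolves in a public archive.

```python
import re
from collections import Counter

# Assumption: accessions are INSDC run accessions (ERR/SRR/DRR + >= 6 digits).
RUN_ACCESSION = re.compile(r"^[EDS]RR\d{6,}$")


def check_accessions(accessions, reported_total):
    """Flag simple integrity problems in a list of accession strings."""
    # Drop blank entries and surrounding whitespace before checking.
    cleaned = [a.strip() for a in accessions if a.strip()]
    # Entries that do not match the assumed accession pattern.
    malformed = [a for a in cleaned if not RUN_ACCESSION.match(a)]
    # Accessions listed more than once.
    duplicates = sorted(a for a, n in Counter(cleaned).items() if n > 1)
    return {
        "malformed": malformed,
        "duplicates": duplicates,
        # Does the number of unique accessions match the paper's stated total?
        "total_matches": len(set(cleaned)) == reported_total,
    }
```

A paper would pass this screen only if both lists are empty and `total_matches` is true; anything else would need follow-up with the original authors.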

What data are currently in the Pneumococcal Genome Library?

The first version of the PGL contains 33,303 genomes from 129 publications. The quality of each genome sequence was assessed based upon several criteria, including species identification and standard measures of genome quality (genome size, GC content, number of contigs, N50, number of Ns, and number of gaps). Overall, 93.0% (n=30,976) of the genomes passed all quality control metrics.
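The standard assembly metrics listed above can be computed directly from the contig sequences. The following is a minimal sketch, not the PGL pipeline: the names `AssemblyStats`, `assembly_stats` and `passes_qc` are hypothetical, a "gap" is assumed to mean a run of consecutive N characters, and the thresholds are supplied by the caller because the cut-offs used for the PGL are not stated here. Species identification is a separate step and is not covered.

```python
import re
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    genome_size: int   # total bases across all contigs
    gc_content: float  # fraction of G and C among A/C/G/T bases
    n_contigs: int
    n50: int
    n_count: int       # total number of N characters
    n_gaps: int        # number of runs of consecutive Ns (assumed definition)


def assembly_stats(contigs):
    """Compute basic assembly metrics from a list of contig sequences."""
    seqs = [c.upper() for c in contigs if c]
    lengths = sorted((len(s) for s in seqs), reverse=True)
    total = sum(lengths)

    # N50: walking contigs from longest to shortest, the length of the
    # contig at which the running total first reaches half the assembly.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    gc = sum(s.count("G") + s.count("C") for s in seqs)
    acgt = sum(s.count(base) for s in seqs for base in "ACGT")
    n_count = sum(s.count("N") for s in seqs)
    n_gaps = sum(len(re.findall(r"N+", s)) for s in seqs)

    gc_content = gc / acgt if acgt else 0.0
    return AssemblyStats(total, gc_content, len(seqs), n50, n_count, n_gaps)


def passes_qc(stats, limits):
    """Return True only if every metric falls within the supplied limits."""
    size_lo, size_hi = limits["genome_size"]
    gc_lo, gc_hi = limits["gc_content"]
    return (
        size_lo <= stats.genome_size <= size_hi
        and gc_lo <= stats.gc_content <= gc_hi
        and stats.n_contigs <= limits["max_contigs"]
        and stats.n50 >= limits["min_n50"]
        and stats.n_count <= limits["max_n_count"]
        and stats.n_gaps <= limits["max_n_gaps"]
    )
```

Requiring every check to pass mirrors the "passed all quality control metrics" criterion: a genome that fails any single metric is flagged.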

Articles were published between 2001 and 2020, and individual genomes were included in up to 52 publications. Pneumococci were recovered from 80 countries between 1916 and 2018, and 96 serotypes are represented in the PGL, including 24 serotypes that are included in one or more licensed pneumococcal vaccines.

Will more genomes be added to the Pneumococcal Genome Library?

Yes. The first version of the PGL was created in 2019 with a major update in 2022. Among the publications that had discrepancies or data issues, many could likely be included in the PGL if more information and/or data were provided by the original authors. If you would like your paper added to the PGL, please contact Angela Brueggemann (angela.brueggemann [at] ndph.ox.ac.uk).

Using and contributing to the Pneumococcal Genome Library

We used the PGL to develop a pneumococcal core genome multilocus sequence typing (cgMLST) scheme and a Life Identification Number (LIN) barcoding system, which together provide fine-scale clustering for population-based analyses. Read more here.

The PGL, cgMLST scheme and LIN barcodes are open-access resources for the scientific community and we welcome feedback. If you have studies that you would like added to the library, please contact Angela Brueggemann (angela.brueggemann [at] ndph.ox.ac.uk).