Website Documentation

Microbial Genome list

The Microbial Genome tab provides metadata related to bacterial strains in the Wormbiome database. This includes:

  • Taxonomic assignment of bacterial strains.
  • Links to public repositories for genomic data.
  • Sampling information for each strain.
  • Details of the laboratory that isolated the strain.

Wormbiome annotation process

Our genomic annotations are derived from four tools: Bakta, Prokka, BV-BRC (formerly PATRIC), and IMG.

The Wormbiome database is created with the following key steps:

  1. Annotations are combined into a multi-track database.
  2. Overlapping annotations (80% overlap) from different pipelines are merged into a single master feature.
  3. Merged features retain tool-specific details and are assigned unique WormBiome IDs (e.g., WBM_BH3_0000012).
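
The merging steps above can be illustrated with a minimal Python sketch. It is not the Wormbiome pipeline itself. The `Feature` class, the greedy grouping strategy, and the definition of "80% overlap" (shared length divided by the longer feature's length, on the same contig and strand) are assumptions made for illustration. The ID format mirrors the example WBM_BH3_0000012.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    tool: str    # annotation pipeline that produced the call, e.g. "Bakta"
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlap_fraction(a, b):
    """Shared length divided by the longer feature's length (assumed definition)."""
    if a.contig != b.contig or a.strand != b.strand:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    longest = max(a.end - a.start + 1, b.end - b.start + 1)
    return shared / longest


def merge_features(features, genome, threshold=0.8):
    """Greedy grouping: attach each feature to the first group whose seed it overlaps."""
    groups = []  # each group is a list; its first element is the seed feature
    for feat in sorted(features, key=lambda f: (f.contig, f.start, f.end)):
        for group in groups:
            if overlap_fraction(group[0], feat) >= threshold:
                group.append(feat)  # tool-specific details stay attached
                break
        else:
            groups.append([feat])
    # Assign one ID per master feature, zero-padded to seven digits.
    return {f"WBM_{genome}_{i:07d}": g for i, g in enumerate(groups, start=1)}


# Made-up coordinates for three hypothetical gene calls.
features = [
    Feature("Bakta", "c1", 100, 400, "+"),
    Feature("Prokka", "c1", 103, 400, "+"),
    Feature("IMG", "c1", 380, 900, "+"),
]
merged = merge_features(features, "BH3")
```

In this toy input the Bakta and Prokka calls overlap almost entirely and share one master feature, while the IMG call overlaps them only marginally and receives its own ID.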

A pan-genomic analysis is performed on the Bakta-based gene predictions with Anvio to identify clusters of annotated and hypothetical genes with similar sequences. The Anvio pan-genomic pipeline first uses the DIAMOND software to calculate the similarity between all predicted protein sequences in the GenBank files generated by our Bakta annotation pipeline. It then discards weak hits from the search results using a minbit heuristic and resolves gene clusters from the remaining hits with the MCL algorithm (for more details, see the Anvio website). Because our pan-genomic comparison includes distantly related microbial genomes, we use a loose MCL inflation parameter (2) to generate clusters that are not overly specific to individual microbial taxa.
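
To make the weak-hit filtering step more concrete, here is a small Python sketch of the minbit heuristic. It is commonly defined as the bit score between two sequences divided by the smaller of their two self-hit bit scores. The function names, the input layout, and the 0.5 cutoff are assumptions chosen for illustration; the cutoff used for the Wormbiome pan-genome is not stated here.

```python
def minbit(bitscore_ab, self_a, self_b):
    """Bit score of a hit scaled by the smaller of the two self-hit bit scores."""
    return bitscore_ab / min(self_a, self_b)


def discard_weak_hits(hits, self_scores, cutoff=0.5):
    """Keep (query, target, bitscore) hits whose minbit reaches the cutoff."""
    kept = []
    for query, target, bitscore in hits:
        if query == target:
            continue  # self-hits only serve as the normalisation reference
        if minbit(bitscore, self_scores[query], self_scores[target]) >= cutoff:
            kept.append((query, target, bitscore))
    return kept


# Made-up bit scores for three hypothetical proteins.
self_scores = {"g1": 200.0, "g2": 180.0, "g3": 400.0}
hits = [("g1", "g2", 150.0), ("g1", "g3", 60.0)]
kept = discard_weak_hits(hits, self_scores)
```

Only the hits that survive this filter would be passed on to the clustering step.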

Available data

Our database currently gathers 87 different types of information. A detailed description is available in the collapsible section below:

Column details
Column Name Description
WBM_geneID Unique Gene ID specific to the Wormbiome database
Genome Genome Strain ID
gene_cluster_id Pangenome cluster identifier
Contig_name Common contig name
contig_Bakta Contig-level identifier assigned by Bakta
contig_gapseq Contig-level identifier assigned by GapSeq
contig_IMG Contig-level identifier assigned by IMG
contig_PATRIC Contig-level identifier assigned by PATRIC
contig_Prokka Contig-level identifier assigned by Prokka
Bakta_start Start position of gene prediction by Bakta
Bakta_end End position of gene prediction by Bakta
Bakta_frame Reading frame of gene prediction by Bakta
Bakta_strand Bakta gene prediction DNA strand (+/-)
Bakta_type Type of gene feature (e.g., CDS, tRNA)
Bakta_Gene Gene symbol assigned by Bakta
Bakta_product Functional annotation of the gene product from Bakta
Bakta_BlastRules NCBI BlastRules used by Bakta for gene annotation
Bakta_Cazy Custom CAZy annotation based on Bakta gene calling
Bakta_COG Cluster of Orthologous Groups (COG) annotation from Bakta
Bakta_EC Enzyme Commission (EC) number annotations
Bakta_GO Gene Ontology (GO) annotations
Bakta_ID Bakta-generated unique identifier for the gene
Bakta_IS Insertion sequence (IS) element annotations from Bakta
Bakta_KO Custom KEGG Orthology (KO) annotation based on Bakta gene calling
Bakta_NCBIFam NCBI Protein family classification for the gene product
Bakta_NCBIProtein NCBI protein accession number associated with the gene
Bakta_PFAM Protein family (PFAM) annotation
Bakta_RefSeq RefSeq accession for the gene product
Bakta_RFAM RNA family (RFAM) annotation
Bakta_score Confidence score assigned to Bakta predictions
Bakta_SO Sequence Ontology term for the feature
Bakta_UniParc UniParc identifier for the protein
Bakta_UniRef UniRef identifier for the protein cluster
Bakta_VFDB Virulence Factor Database (VFDB) annotation
gapseq_start Start position of gene prediction by GapSeq
gapseq_end End position of gene prediction by GapSeq
gapseq_strand DNA strand for GapSeq prediction (+/-)
gapseq_frame Reading frame of gene prediction by GapSeq
gapseq_type Type of feature predicted by GapSeq
gapseq_ID Unique identifier for GapSeq-predicted gene
gapseq_BiocycRxn BioCyc reaction identifier from GapSeq annotation
gapseq_SeedID SEED subsystem identifier for GapSeq prediction
gapseq_substances Predicted metabolic substances associated with GapSeq gene
gapseq_tc Transport classification (TCDB) for GapSeq gene
IMG_ID Unique identifier for IMG-predicted gene
IMG_start Start position of gene prediction by IMG
IMG_end End position of gene prediction by IMG
IMG_strand DNA strand for IMG prediction (+/-)
IMG_product Functional annotation of the gene product from IMG
IMG_frame Reading frame of gene prediction by IMG
IMG_type Type of feature predicted by IMG
IMG_cog Cluster of Orthologous Groups (COG) annotation from IMG
IMG_ko KEGG Orthology (KO) annotation from IMG
IMG_pfam Protein family (PFAM) annotation from IMG
IMG_score Confidence score assigned to IMG predictions
IMG_signalp Signal peptide prediction by IMG
IMG_smart SMART domain annotation from IMG
IMG_superfam Superfamily annotation from IMG
IMG_tigrfam TIGRFAM annotation from IMG
IMG_tmhmm Transmembrane helix prediction by IMG
PATRIC_start Start position of gene prediction by PATRIC
PATRIC_end End position of gene prediction by PATRIC
PATRIC_strand DNA strand for PATRIC prediction (+/-)
PATRIC_frame Reading frame of gene prediction by PATRIC
PATRIC_ID Unique identifier for PATRIC-predicted gene
PATRIC_type Type of feature predicted by PATRIC
PATRIC_product Functional annotation of the gene product from PATRIC
PATRIC_class Classification of gene product by PATRIC
PATRIC_pathID Pathway identifier from PATRIC
PATRIC_Pathway Pathway annotation from PATRIC
PATRIC_score Confidence score assigned to PATRIC predictions
PATRIC_SPclassification Subsystem classification from PATRIC
PATRIC_SPproperty Subsystem property from PATRIC
PATRIC_subclass Subclass annotation from PATRIC
PATRIC_subsystem Subsystem annotation from PATRIC
PATRIC_superclass Superclass annotation from PATRIC
Prokka_start Start position of gene prediction by Prokka
Prokka_end End position of gene prediction by Prokka
Prokka_strand DNA strand for Prokka prediction (+/-)
Prokka_frame Reading frame of gene prediction by Prokka
Prokka_ID Unique identifier for Prokka-predicted gene
Prokka_type Type of feature predicted by Prokka
Prokka_gene Gene symbol assigned by Prokka
Prokka_product Functional annotation of the gene product from Prokka
Prokka_COG Cluster of Orthologous Groups (COG) annotation from Prokka
Prokka_EC_number Enzyme Commission (EC) number annotations from Prokka
Prokka_KO KEGG Orthology (KO) annotation from Prokka
Prokka_score Confidence score assigned to Prokka predictions
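
Because most column names carry the name of the pipeline that produced them, the columns can be grouped by source with a few lines of Python. The sketch below assumes a tab-separated export with a header row; the file layout, the `columns_by_source` helper, and the sample values are all hypothetical.

```python
import csv
import io

# Pipeline prefixes as they appear in the column names listed above.
TOOLS = {"Bakta", "gapseq", "IMG", "PATRIC", "Prokka"}


def columns_by_source(header):
    """Group column names by pipeline prefix; everything else goes under 'shared'."""
    groups = {}
    for name in header:
        prefix = name.split("_", 1)[0]
        key = prefix if prefix in TOOLS else "shared"
        groups.setdefault(key, []).append(name)
    return groups


# Tiny made-up excerpt; a real table holds many more columns and rows.
SAMPLE = (
    "WBM_geneID\tGenome\tBakta_product\tProkka_product\tIMG_product\n"
    "gene_1\tBH3\tproduct A\tproduct A\tproduct B\n"
)
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
groups = columns_by_source(rows[0].keys())
```

Grouping columns this way makes it easy to compare, for a given gene, what each pipeline reports for the same kind of annotation.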

Wormbiome Download information

All the data used for the Wormbiome database is publicly available.

Raw data:

  • Genome assemblies are available on NCBI.
  • IMG and PATRIC assemblies can be accessed on their respective websites.

Wormbiome processed data: custom annotation tables are publicly available on the Wormbiome Zenodo project page:

  • Consensus Database Entry: Compiled consensus table of curated annotations.
  • Individual strain entries with all associated files generated by the different annotation pipelines.

Individual links to the different public repositories are available on the Microbial Genome tab.

Gene Search

The Gene Search tab allows text-based queries on the Wormbiome database.

How to Use:

  1. Enter a query in the search box and hit "Search."
  2. Narrow your search:
     • By Taxonomy: Select a taxonomic level (e.g., Genus) and specific taxa (e.g., Ochrobactrum).
     • By Genome: Filter by strain ID.
     • By Column: Search specific fields like gene name or KEGG annotations.

Genes of interest can then be saved to a User Cart by ticking the gene in the list and clicking "Select Gene."
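
The same kind of query can be approximated offline on a downloaded annotation table. The following Python sketch assumes rows are dictionaries keyed by the column names documented above and that matching is a case-insensitive substring test. Both are assumptions for illustration and may not reflect how the website implements its search; the taxonomy filter is left out because it needs strain metadata not shown here.

```python
def search(rows, query, genome=None, column=None):
    """Return rows containing the query text, optionally restricted by genome or column."""
    needle = query.lower()
    hits = []
    for row in rows:
        if genome is not None and row.get("Genome") != genome:
            continue  # "By Genome" filter
        fields = [column] if column else list(row)  # "By Column" filter
        if any(needle in (row.get(name) or "").lower() for name in fields):
            hits.append(row)
    return hits


# Made-up rows; "XX1" is a hypothetical strain ID.
rows = [
    {"WBM_geneID": "gene_1", "Genome": "BH3", "Bakta_product": "example kinase"},
    {"WBM_geneID": "gene_2", "Genome": "XX1", "Bakta_product": "example kinase"},
    {"WBM_geneID": "gene_3", "Genome": "BH3", "Bakta_product": "example transporter"},
]
kinases_in_bh3 = search(rows, "KINASE", genome="BH3", column="Bakta_product")
transporters = search(rows, "transporter")
```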

Annotation Browser

The annotation browser lets users list all the genes of a specific bacterial strain or taxonomic group.

The annotation database option displays a default set of columns specific to the different annotation pipelines used for the Wormbiome database. Users can choose which columns to display using the scrolling list in the left side panel.

As in the Gene Search tab, users can list either all the genes of a specific bacterial strain or all the genes associated with specific microbial taxa.

Tools

We offer two interactive tools to browse and analyze the genomes available in the Wormbiome database:

  • An annotation comparison tool called Compare Feature.
  • A custom BLAST search tool.

Compare Feature

The annotation comparison tool allows simple comparisons between taxonomic groups or among one to four custom groups.

Blast

Users can perform sequence-based queries on the Wormbiome database to look for specific sequences of interest.

User Cart

The User Cart stores genes selected from the Gene Search or Annotation Browser. The tab has three different displays:

  • Overview Table: Displays the selected genes with their associated information. Users can choose which columns to display.
  • Taxonomic Overview: Visualizes the distribution of selected genes across genomes.
  • Annotation Summary: Displays the count of specific associated annotations (e.g., KEGG, CAZy, PFAM).

There are two export options:

  • Download raw or displayed data.
  • Export generated visualizations.
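
The counts shown in the Annotation Summary display can also be reproduced from downloaded cart data. This Python sketch assumes each gene is a dictionary keyed by column name and that a cell holding several annotations separates them with commas. The separator, the `annotation_summary` helper, and the sample values are assumptions for illustration.

```python
from collections import Counter


def annotation_summary(cart, column, sep=","):
    """Count how often each annotation term occurs in one column of the cart."""
    counts = Counter()
    for gene in cart:
        cell = gene.get(column) or ""  # missing or empty cells contribute nothing
        counts.update(term.strip() for term in cell.split(sep) if term.strip())
    return counts


# Made-up cart content with placeholder PFAM accessions.
cart = [
    {"WBM_geneID": "gene_1", "Bakta_PFAM": "PF00001, PF00002"},
    {"WBM_geneID": "gene_2", "Bakta_PFAM": "PF00001"},
    {"WBM_geneID": "gene_3", "Bakta_PFAM": ""},
]
pfam_counts = annotation_summary(cart, "Bakta_PFAM")
```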