Website Documentation
Microbial Genome list
The Microbial Genome tab provides metadata related to bacterial strains in the Wormbiome database. This includes:
- Taxonomic assignment of bacterial strains.
- Links to public repositories for genomic data.
- Sampling information for each strain.
- Details of the laboratory that isolated the strain.
Wormbiome annotation process
Our genomic annotations are derived from four tools: Bakta, Prokka, BV-BRC (formerly PATRIC), and IMG.
The Wormbiome database is created with the following key steps:
- Annotations are combined into a multi-track database.
- Overlapping annotations (80% overlap) from different pipelines are merged into a single master feature.
- Merged features retain tool-specific details and are assigned unique WormBiome IDs (e.g., WBM_BH3_0000012).
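As a rough illustration of the merging step, the sketch below groups features on the same contig and strand when they overlap by at least 80%. The overlap definition (shared length divided by the length of the shorter feature), the `Feature` class, the greedy grouping, and the ID numbering are all assumptions made for this example; they are not taken from the Wormbiome pipeline itself.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    tool: str     # annotation pipeline that produced the feature
    contig: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: str   # "+" or "-"


def overlap_fraction(a, b):
    """Shared length divided by the shorter feature's length (assumed definition)."""
    if a.contig != b.contig or a.strand != b.strand:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return shared / shorter


def merge_features(features, threshold=0.8):
    """Greedily group features; each group stands for one master feature."""
    groups = []
    for feat in sorted(features, key=lambda f: (f.contig, f.start, f.end)):
        for group in groups:
            if any(overlap_fraction(feat, member) >= threshold for member in group):
                group.append(feat)
                break
        else:
            groups.append([feat])
    return groups


def assign_ids(groups, genome="BH3"):
    """Label each group with an ID shaped like WBM_BH3_0000012 (format inferred from the example ID)."""
    return {f"WBM_{genome}_{i:07d}": group for i, group in enumerate(groups, start=1)}


# Made-up coordinates for illustration only.
features = [
    Feature("Bakta", "c1", 100, 400, "+"),
    Feature("Prokka", "c1", 103, 400, "+"),
    Feature("IMG", "c1", 900, 1200, "-"),
]
merged = assign_ids(merge_features(features))
```

Here the Bakta and Prokka calls end up in one master feature that keeps both tool-specific records, while the IMG call on the opposite strand stays separate.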
A pan-genomic analysis is performed on the Bakta-based gene predictions with Anvio to identify clusters of annotated and hypothetical genes with similar sequences. The Anvio pan-genomic pipeline first uses the DIAMOND software to calculate the similarity between all predicted protein amino acid sequences in the GenBank files generated by our Bakta annotation pipeline. It then discards weak hits from the search results using a minbit heuristic and resolves gene clusters from the remaining hits with the MCL algorithm (for more details, see the Anvio website). Because our pan-genomic comparison includes distantly related microbial genomes, we used a loose MCL inflation parameter (2) to generate clusters that are not overly specific to individual microbial taxa.
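The weak-hit filter can be sketched in a few lines. The minbit value of a hit between sequences A and B is the bit score of the hit divided by the smaller of the two self-hit bit scores, and hits below a cutoff are dropped before clustering. The 0.5 cutoff, the gene names, and the bit scores below are assumptions for illustration; the Anvio documentation is the reference for the values used in practice.

```python
def minbit(bitscore_ab, self_a, self_b):
    """Bit score of the A-B hit relative to the weaker of the two self-hits."""
    return bitscore_ab / min(self_a, self_b)


def filter_hits(hits, self_scores, threshold=0.5):
    """Keep only hits whose minbit value reaches the threshold."""
    kept = []
    for query, target, bitscore in hits:
        if query == target:
            continue  # self-hits only serve as the normalisation baseline
        score = minbit(bitscore, self_scores[query], self_scores[target])
        if score >= threshold:
            kept.append((query, target, round(score, 3)))
    return kept


# Hypothetical self-hit bit scores and pairwise hits.
self_scores = {"geneA": 500.0, "geneB": 480.0, "geneC": 300.0}
hits = [("geneA", "geneB", 420.0), ("geneA", "geneC", 60.0)]
kept_hits = filter_hits(hits, self_scores)
```

Only the geneA-geneB hit survives in this toy input; the surviving hits are what the MCL step would then cluster.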
Available data
Our database currently comprises 87 different types of information. A detailed description is available in the collapsible tab below:
Column details
| Column Name | Description |
|---|---|
| WBM_geneID | Unique Gene ID specific to the Wormbiome database |
| Genome | Genome Strain ID |
| gene_cluster_id | Pangenome cluster identifier |
| Contig_name | Common contig name |
| contig_Bakta | Contig-level identifier assigned by Bakta |
| contig_gapseq | Contig-level identifier assigned by GapSeq |
| contig_IMG | Contig-level identifier assigned by IMG |
| contig_PATRIC | Contig-level identifier assigned by PATRIC |
| contig_Prokka | Contig-level identifier assigned by Prokka |
| Bakta_start | Start position of gene prediction by Bakta |
| Bakta_end | End position of gene prediction by Bakta |
| Bakta_frame | Reading frame of gene prediction by Bakta |
| Bakta_strand | Bakta gene prediction DNA strand (+/-) |
| Bakta_type | Type of gene feature (e.g., CDS, tRNA) |
| Bakta_Gene | Gene symbol assigned by Bakta |
| Bakta_product | Functional annotation of the gene product from Bakta |
| Bakta_BlastRules | NCBI BlastRules used by Bakta for gene annotation |
| Bakta_Cazy | Custom CAZy annotation based on Bakta gene calling |
| Bakta_COG | Cluster of Orthologous Groups (COG) annotation from Bakta |
| Bakta_EC | Enzyme Commission (EC) number annotations |
| Bakta_GO | Gene Ontology (GO) annotations |
| Bakta_ID | Bakta-generated unique identifier for the gene |
| Bakta_IS | Insertion sequence (IS) element annotations from Bakta |
| Bakta_KO | Custom KEGG Orthology (KO) annotation based on Bakta gene calling |
| Bakta_NCBIFam | NCBI Protein family classification for the gene product |
| Bakta_NCBIProtein | NCBI protein accession number associated with the gene |
| Bakta_PFAM | Protein family (PFAM) annotation |
| Bakta_RefSeq | RefSeq accession for the gene product |
| Bakta_RFAM | RNA family (RFAM) annotation |
| Bakta_score | Confidence score assigned to Bakta predictions |
| Bakta_SO | Sequence Ontology term for the feature |
| Bakta_UniParc | UniParc identifier for the protein |
| Bakta_UniRef | UniRef identifier for the protein cluster |
| Bakta_VFDB | Virulence Factor Database (VFDB) annotation |
| gapseq_start | Start position of gene prediction by GapSeq |
| gapseq_end | End position of gene prediction by GapSeq |
| gapseq_strand | DNA strand for GapSeq prediction (+/-) |
| gapseq_frame | Reading frame of gene prediction by GapSeq |
| gapseq_type | Type of Feature predicted by GapSeq |
| gapseq_ID | Unique identifier for GapSeq-predicted gene |
| gapseq_BiocycRxn | BioCyc reaction identifier from GapSeq annotation |
| gapseq_SeedID | SEED subsystem identifier for GapSeq prediction |
| gapseq_substances | Predicted metabolic substances associated with GapSeq gene |
| gapseq_tc | Transport classification (TCDB) for GapSeq gene |
| IMG_ID | Unique identifier for IMG-predicted gene |
| IMG_start | Start position of gene prediction by IMG |
| IMG_end | End position of gene prediction by IMG |
| IMG_strand | DNA strand for IMG prediction (+/-) |
| IMG_product | Functional annotation of the gene product from IMG |
| IMG_frame | Reading frame of gene prediction by IMG |
| IMG_type | Type of Feature predicted by IMG |
| IMG_cog | Cluster of Orthologous Groups (COG) annotation from IMG |
| IMG_ko | KEGG Orthology (KO) annotation from IMG |
| IMG_pfam | Protein family (PFAM) annotation from IMG |
| IMG_score | Confidence score assigned to IMG predictions |
| IMG_signalp | Signal peptide prediction by IMG |
| IMG_smart | SMART domain annotation from IMG |
| IMG_superfam | Superfamily annotation from IMG |
| IMG_tigrfam | TIGRFAM annotation from IMG |
| IMG_tmhmm | Transmembrane helix prediction by IMG |
| PATRIC_start | Start position of gene prediction by PATRIC |
| PATRIC_end | End position of gene prediction by PATRIC |
| PATRIC_strand | DNA strand for PATRIC prediction (+/-) |
| PATRIC_frame | Reading frame of gene prediction by PATRIC |
| PATRIC_ID | Unique identifier for PATRIC-predicted gene |
| PATRIC_type | Type of Feature predicted by PATRIC |
| PATRIC_product | Functional annotation of the gene product from PATRIC |
| PATRIC_class | Classification of gene product by PATRIC |
| PATRIC_pathID | Pathway identifier from PATRIC |
| PATRIC_Pathway | Pathway annotation from PATRIC |
| PATRIC_score | Confidence score assigned to PATRIC predictions |
| PATRIC_SPclassification | Subsystem classification from PATRIC |
| PATRIC_SPproperty | Subsystem property from PATRIC |
| PATRIC_subclass | Subclass annotation from PATRIC |
| PATRIC_subsystem | Subsystem annotation from PATRIC |
| PATRIC_superclass | Superclass annotation from PATRIC |
| Prokka_start | Start position of gene prediction by Prokka |
| Prokka_end | End position of gene prediction by Prokka |
| Prokka_strand | DNA strand for Prokka prediction (+/-) |
| Prokka_frame | Reading frame of gene prediction by Prokka |
| Prokka_ID | Unique identifier for Prokka-predicted gene |
| Prokka_type | Type of Feature predicted by Prokka |
| Prokka_gene | Gene symbol assigned by Prokka |
| Prokka_product | Functional annotation of the gene product from Prokka |
| Prokka_COG | Cluster of Orthologous Groups (COG) annotation from Prokka |
| Prokka_EC_number | Enzyme Commission (EC) number annotations from Prokka |
| Prokka_KO | KEGG Orthology (KO) annotation from Prokka |
| Prokka_score | Confidence score assigned to Prokka predictions |
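Because most column names carry the annotation tool as a prefix, a downloaded table can be split per tool with the Python standard library. The sketch below uses a tiny in-memory excerpt in place of a real file; its values are made up, and the tab delimiter is an assumption about the export format.

```python
import csv
import io
from collections import defaultdict

# Made-up one-row excerpt using column names from the table above.
sample = (
    "WBM_geneID\tGenome\tBakta_product\tBakta_KO\tProkka_product\tIMG_ko\n"
    "WBM_BH3_0000012\tBH3\thypothetical protein\t\thypothetical protein\t\n"
)

TOOLS = ("Bakta", "gapseq", "IMG", "PATRIC", "Prokka")


def columns_by_tool(fieldnames):
    """Group column names by tool prefix; everything else goes under 'shared'."""
    grouped = defaultdict(list)
    for name in fieldnames:
        prefix = name.split("_", 1)[0]
        grouped[prefix if prefix in TOOLS else "shared"].append(name)
    return dict(grouped)


reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
grouped_columns = columns_by_tool(reader.fieldnames)
table_rows = list(reader)
```

Columns such as `WBM_geneID`, `Genome`, or `contig_Bakta` do not start with a tool name and therefore land in the `shared` group with this simple rule.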
Wormbiome Download information
All the data used for the Wormbiome database is publicly available.
The Raw Data:
- Genome assemblies are available on NCBI.
- IMG and PATRIC assemblies can be accessed on their respective websites.
The Wormbiome Processed Data:
Custom annotation tables are publicly available on its Zenodo project page:
- Consensus Database Entry: Compiled consensus table of curated annotations.
- Individual strain entries with all associated files generated by the different annotation pipelines.
Individual links to the different public repositories are available on the Microbial Genome tab.
Gene Search
The Gene Search tab allows text-based queries on the Wormbiome database.
How to Use:
- Enter a query in the search box and click "Search."
- Narrow your search:
  - By Taxonomy: Select a taxonomic level (e.g., Genus) and specific taxa (e.g., Ochrobactrum).
  - By Genome: Filter by strain ID.
  - By Column: Search specific fields like gene name or KEGG annotations.
Genes of interest can then be saved to a User Cart by ticking the gene in the list and clicking "Select Gene."
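Conceptually, the genome and column filters act as successive restrictions on the annotation table. The sketch below mimics that behaviour offline on a list of row dictionaries. The matching rule (case-insensitive substring), the function name `search_genes`, and the sample rows (including the strain ID `XX1`) are assumptions for illustration; the website's search may work differently. The taxonomy filter is left out because it needs the genome metadata table.

```python
def search_genes(rows, query, genome=None, columns=None):
    """Return rows containing the query, optionally limited to a genome and to given columns."""
    needle = query.lower()
    matches = []
    for row in rows:
        if genome is not None and row.get("Genome") != genome:
            continue
        fields = columns if columns is not None else row.keys()
        if any(needle in str(row.get(field, "")).lower() for field in fields):
            matches.append(row)
    return matches


# Hypothetical rows; real tables contain many more columns.
demo_rows = [
    {"WBM_geneID": "WBM_BH3_0000012", "Genome": "BH3", "Bakta_product": "Chitinase"},
    {"WBM_geneID": "WBM_XX1_0000001", "Genome": "XX1", "Bakta_product": "chitinase A"},
]

all_hits = search_genes(demo_rows, "chitinase")
bh3_hits = search_genes(demo_rows, "chitinase", genome="BH3")
```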
Annotation Browser
The Annotation Browser lists all the genes of a specific bacterial strain or taxonomic group.
The annotation database option displays a default set of columns specific to the different annotation pipelines used for the Wormbiome database. Users can choose which columns to display using the scrolling list on the left side panel.
As with the Gene Search tab, users can list either all the genes of a specific bacterial strain or all the genes associated with specific microbial taxa.
Tools
We offer two interactive tools to browse and analyze the genomes available in the Wormbiome database:
- An annotation comparison tool called Compare Feature.
- A custom BLAST search tool.
Compare Feature
The annotation comparison tool allows simple comparisons between taxonomic groups or among one to four custom groups.
Blast
Users can perform sequence-based queries on the Wormbiome database to look for specific sequences of interest.
User Cart
The User Cart stores genes selected from the Gene Search or Annotation Browser. The tab has three different displays:
- Overview Table: Displays the selected genes with their associated information. Users can choose which columns to display.
- Taxonomic Overview: Visualizes the distribution of selected genes across genomes.
- Annotation Summary: Displays the count of specific associated annotations (e.g., KEGG, CAZy, PFAM).
There are two export options:
- Download raw or displayed data.
- Export generated visualizations.
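The Annotation Summary display amounts to counting annotation values across the genes in the cart, which can be reproduced on downloaded data. In the sketch below, the comma separator for multi-valued cells, the function name `annotation_summary`, and the sample cart contents are assumptions; adjust them to the format of the exported file.

```python
from collections import Counter


def annotation_summary(cart, column, separator=","):
    """Count how many genes in the cart list each annotation value in the given column."""
    counts = Counter()
    for gene in cart:
        cell = gene.get(column) or ""
        for value in cell.split(separator):
            value = value.strip()
            if value:
                counts[value] += 1
    return counts


# Hypothetical cart content; gene IDs and PFAM accessions are placeholders.
cart = [
    {"WBM_geneID": "gene_1", "Bakta_PFAM": "PF00001,PF00002"},
    {"WBM_geneID": "gene_2", "Bakta_PFAM": "PF00001"},
    {"WBM_geneID": "gene_3", "Bakta_PFAM": ""},
]
pfam_counts = annotation_summary(cart, "Bakta_PFAM")
```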