Website Documentation

Microbial Genome list

The Microbial Genome tab provides metadata related to bacterial strains in the Wormbiome database. This includes:

Taxonomic assignment of bacterial strains.
Links to public repositories for genomic data.
Sampling information for each strain.
Details of the laboratory that isolated the strain.

Wormbiome annotation process

Our genomic annotations are derived from four tools: Bakta, Prokka, BV-BRC (formerly PATRIC), and IMG.

The Wormbiome database is built in the following key steps:

Annotations are combined into a multi-track database.
Overlapping annotations from different pipelines (80% overlap) are merged into a single master feature.
Merged features retain tool-specific details and are assigned unique WormBiome IDs (e.g., WBM_BH3_0000012).
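The merging step can be illustrated with a minimal Python sketch. It is not the Wormbiome pipeline itself and rests on assumptions not stated above: features merge only when they lie on the same contig and strand, "80% overlap" is read as a reciprocal overlap (the shared span must cover at least 80% of both features), each new feature is compared with the first feature of every existing group, and IDs follow a `WBM_<genome>_<7-digit counter>` pattern inferred from the example ID. `Feature`, `MasterFeature`, `reciprocal_overlap`, and `merge_features` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    tool: str     # annotation pipeline, e.g. "Bakta" or "Prokka"
    contig: str
    start: int    # 1-based, inclusive
    end: int      # inclusive
    strand: str   # "+" or "-"


@dataclass
class MasterFeature:
    wbm_id: str
    members: list = field(default_factory=list)  # tool-specific features kept as-is


def reciprocal_overlap(a, b):
    """Overlap length divided by the longer feature's length (assumed definition)."""
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    longer = max(a.end - a.start + 1, b.end - b.start + 1)
    return overlap / longer


def merge_features(features, genome, threshold=0.8):
    """Group features from different pipelines into master features with WBM-style IDs."""
    masters = []
    for feat in sorted(features, key=lambda f: (f.contig, f.start, f.end)):
        for master in masters:
            seed = master.members[0]  # compare against the group's first feature
            if (seed.contig == feat.contig and seed.strand == feat.strand
                    and reciprocal_overlap(seed, feat) >= threshold):
                master.members.append(feat)
                break
        else:
            # No existing group matched: start a new master feature.
            wbm_id = f"WBM_{genome}_{len(masters) + 1:07d}"
            masters.append(MasterFeature(wbm_id, [feat]))
    return masters


# Made-up coordinates for illustration.
calls = [
    Feature("Bakta", "contig_1", 100, 1000, "+"),
    Feature("Prokka", "contig_1", 110, 1000, "+"),
    Feature("IMG", "contig_1", 100, 400, "+"),
    Feature("PATRIC", "contig_1", 120, 1000, "-"),
]
for master in merge_features(calls, "BH3"):
    print(master.wbm_id, [f.tool for f in master.members])
```

In this toy input the Bakta and Prokka calls merge (about 99% reciprocal overlap), while the much shorter IMG call and the opposite-strand PATRIC call each start their own master feature.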

A pan-genomic analysis is performed with Anvio on the Bakta-based gene predictions to identify clusters of annotated and hypothetical genes with similar sequences. The Anvio pan-genomic pipeline first uses DIAMOND to calculate the similarity between all predicted protein sequences in the GenBank files generated by our Bakta annotation pipeline. It then discards weak hits using the minbit heuristic and resolves gene clusters from the remaining BLAST-style search results with the MCL algorithm (for more details, see the Anvio website). Because our pan-genomic comparison includes distantly related microbial genomes, we used a loose MCL inflation parameter (2) to generate clusters that are not overly specific to microbial taxa.
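The filtering and clustering logic can be illustrated with a small, self-contained Python sketch. It is not Anvio's code. The minbit score is computed as the pairwise bit score divided by the smaller of the two self-hit bit scores; the 0.5 cutoff is an assumption (we believe it matches Anvio's default, but check its documentation); edges that pass the cutoff are treated as unweighted; and MCL is a textbook implementation (expansion by squaring the matrix, inflation by element-wise powers) rather than the `mcl` program. All function names are hypothetical.

```python
def minbit(bits_ab, self_a, self_b):
    """Minbit score: pairwise bit score divided by the smaller self-hit bit score."""
    return bits_ab / min(self_a, self_b)


def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix), in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-9):
    """Basic Markov Clustering on a symmetric adjacency matrix (lists of floats)."""
    n = len(adj)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (two steps of a random walk).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to the inflation power, then renormalize.
        inflated = [[v ** inflation for v in row] for row in expanded]
        normalize_columns(inflated)
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep non-zero mass are attractors; their non-zero columns form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    clusters.discard(frozenset())
    return clusters


def cluster_genes(hits, self_bits, min_minbit=0.5, inflation=2.0):
    """Drop weak hits by minbit, then cluster the remaining similarity graph with MCL."""
    genes = sorted(self_bits)
    index = {g: i for i, g in enumerate(genes)}
    adj = [[0.0] * len(genes) for _ in genes]
    for a, b, bits in hits:
        if minbit(bits, self_bits[a], self_bits[b]) >= min_minbit:
            # Unweighted edges for simplicity; the real pipeline's edge weights may differ.
            adj[index[a]][index[b]] = adj[index[b]][index[a]] = 1.0
    clusters = mcl(adj, inflation=inflation)
    return sorted(sorted(genes[i] for i in c) for c in clusters)


# Made-up bit scores for five hypothetical proteins.
self_bits = {"g1": 500, "g2": 480, "g3": 510, "g4": 300, "g5": 310}
hits = [
    ("g1", "g2", 450), ("g1", "g3", 460), ("g2", "g3", 440),
    ("g4", "g5", 280),
    ("g1", "g4", 60),  # minbit = 60 / 300 = 0.2, so this weak hit is discarded
]
print(cluster_genes(hits, self_bits))
```

Without the minbit filter, the single weak g1/g4 hit would connect the two groups into one graph component; discarding it keeps g1 to g3 and g4 to g5 in separate clusters.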

Available data

Our database currently contains 87 different types of information. A detailed description of each column is given below:

Column details

Column Name	Description
WBM_geneID	Unique Gene ID specific to the Wormbiome database
Genome	Genome Strain ID
gene_cluster_id	Pangenome cluster identifier
Contig_name	Common contig name
contig_Bakta	Contig-level identifier assigned by Bakta
contig_gapseq	Contig-level identifier assigned by GapSeq
contig_IMG	Contig-level identifier assigned by IMG
contig_PATRIC	Contig-level identifier assigned by PATRIC
contig_Prokka	Contig-level identifier assigned by Prokka
Bakta_start	Start position of gene prediction by Bakta
Bakta_end	End position of gene prediction by Bakta
Bakta_frame	Reading frame of gene prediction by Bakta
Bakta_strand	DNA strand for Bakta prediction (+/-)
Bakta_type	Type of gene feature (e.g., CDS, tRNA)
Bakta_Gene	Gene symbol assigned by Bakta
Bakta_product	Functional annotation of the gene product from Bakta
Bakta_BlastRules	NCBI BlastRules used by Bakta for gene annotation
Bakta_Cazy	Custom CAZy annotation based on Bakta gene calling
Bakta_COG	Cluster of Orthologous Groups (COG) annotation from Bakta
Bakta_EC	Enzyme Commission (EC) number annotations from Bakta
Bakta_GO	Gene Ontology (GO) annotations from Bakta
Bakta_ID	Bakta-generated unique identifier for the gene
Bakta_IS	Insertion sequence (IS) element annotations from Bakta
Bakta_KO	Custom KEGG Orthology (KO) annotation based on Bakta gene calling
Bakta_NCBIFam	NCBIfam protein family classification for the gene product
Bakta_NCBIProtein	NCBI protein accession number associated with the gene
Bakta_PFAM	Protein family (PFAM) annotation from Bakta
Bakta_RefSeq	RefSeq accession for the gene product
Bakta_RFAM	RNA family (RFAM) annotation
Bakta_score	Confidence score assigned to Bakta predictions
Bakta_SO	Sequence Ontology term for the feature
Bakta_UniParc	UniParc identifier for the protein
Bakta_UniRef	UniRef identifier for the protein cluster
Bakta_VFDB	Virulence Factor Database (VFDB) annotation
gapseq_start	Start position of gene prediction by GapSeq
gapseq_end	End position of gene prediction by GapSeq
gapseq_strand	DNA strand for GapSeq prediction (+/-)
gapseq_frame	Reading frame of gene prediction by GapSeq
gapseq_type	Type of feature predicted by GapSeq
gapseq_ID	Unique identifier for GapSeq-predicted gene
gapseq_BiocycRxn	BioCyc reaction identifier from GapSeq annotation
gapseq_SeedID	SEED subsystem identifier for GapSeq prediction
gapseq_substances	Predicted metabolic substances associated with GapSeq gene
gapseq_tc	Transport classification (TCDB) for GapSeq gene
IMG_ID	Unique identifier for IMG-predicted gene
IMG_start	Start position of gene prediction by IMG
IMG_end	End position of gene prediction by IMG
IMG_strand	DNA strand for IMG prediction (+/-)
IMG_product	Functional annotation of the gene product from IMG
IMG_frame	Reading frame of gene prediction by IMG
IMG_type	Type of feature predicted by IMG
IMG_cog	Cluster of Orthologous Groups (COG) annotation from IMG
IMG_ko	KEGG Orthology (KO) annotation from IMG
IMG_pfam	Protein family (PFAM) annotation from IMG
IMG_score	Confidence score assigned to IMG predictions
IMG_signalp	Signal peptide prediction by IMG
IMG_smart	SMART domain annotation from IMG
IMG_superfam	Superfamily annotation from IMG
IMG_tigrfam	TIGRFAM annotation from IMG
IMG_tmhmm	Transmembrane helix prediction by IMG
PATRIC_start	Start position of gene prediction by PATRIC
PATRIC_end	End position of gene prediction by PATRIC
PATRIC_strand	DNA strand for PATRIC prediction (+/-)
PATRIC_frame	Reading frame of gene prediction by PATRIC
PATRIC_ID	Unique identifier for PATRIC-predicted gene
PATRIC_type	Type of feature predicted by PATRIC
PATRIC_product	Functional annotation of the gene product from PATRIC
PATRIC_class	Classification of gene product by PATRIC
PATRIC_pathID	Pathway identifier from PATRIC
PATRIC_Pathway	Pathway annotation from PATRIC
PATRIC_score	Confidence score assigned to PATRIC predictions
PATRIC_SPclassification	Subsystem classification from PATRIC
PATRIC_SPproperty	Subsystem property from PATRIC
PATRIC_subclass	Subclass annotation from PATRIC
PATRIC_subsystem	Subsystem annotation from PATRIC
PATRIC_superclass	Superclass annotation from PATRIC
Prokka_start	Start position of gene prediction by Prokka
Prokka_end	End position of gene prediction by Prokka
Prokka_strand	DNA strand for Prokka prediction (+/-)
Prokka_frame	Reading frame of gene prediction by Prokka
Prokka_ID	Unique identifier for Prokka-predicted gene
Prokka_type	Type of feature predicted by Prokka
Prokka_gene	Gene symbol assigned by Prokka
Prokka_product	Functional annotation of the gene product from Prokka
Prokka_COG	Cluster of Orthologous Groups (COG) annotation from Prokka
Prokka_EC_number	Enzyme Commission (EC) number annotations from Prokka
Prokka_KO	KEGG Orthology (KO) annotation from Prokka
Prokka_score	Confidence score assigned to Prokka predictions
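The consensus table can also be explored offline. The Python sketch below assumes the table is downloaded as a tab-separated file whose header matches the column names above, and that a pipeline with no call for a gene leaves its cells empty; neither detail is confirmed here. It reads the table with the standard `csv` module and lists which sources contributed coordinates to each WormBiome gene, using the `*_start` columns. `read_table` and `supporting_tools` are hypothetical helpers.

```python
import csv
import io

# Start-coordinate column for each source in the column table above.
START_COLUMNS = {
    "Bakta": "Bakta_start",
    "GapSeq": "gapseq_start",
    "IMG": "IMG_start",
    "PATRIC": "PATRIC_start",
    "Prokka": "Prokka_start",
}


def read_table(handle):
    """Read a tab-separated Wormbiome table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def supporting_tools(row):
    """Sources with a non-empty start coordinate (assumes an empty cell means no call)."""
    return [tool for tool, col in START_COLUMNS.items() if (row.get(col) or "").strip()]


# A one-row, made-up excerpt with only a few of the columns.
sample = io.StringIO(
    "WBM_geneID\tGenome\tBakta_start\tgapseq_start\tIMG_start\tPATRIC_start\tProkka_start\n"
    "WBM_BH3_0000012\tBH3\t100\t\t100\t\t110\n"
)
rows = read_table(sample)
print(rows[0]["WBM_geneID"], supporting_tools(rows[0]))
```

For a real download, open the file with `open(path, newline="")` and pass the handle to `read_table`.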

Wormbiome Download information

All the data used for the Wormbiome database is publicly available.

Raw data:

Genome assemblies are available on NCBI.
IMG and PATRIC assemblies can be accessed on their respective websites.

Processed data: custom annotation tables are publicly available on the Wormbiome Zenodo project page:

Consensus Database Entry: Compiled consensus table of curated annotations.
Individual strain entries with all associated files generated by the different annotation pipelines.

Individual links to the different public repositories are available on the Microbial Genome tab.

Gene Search

The Gene Search tab allows text-based queries on the Wormbiome database.

How to Use:

Enter a query in the search box and hit "Search."
Narrow your search:
By Taxonomy: Select taxonomic level (e.g., Genus) and specific taxa (e.g., Ochrobactrum).
By Genome: Filter by strain ID.
By Column: Search specific fields such as gene name or KEGG annotations.

Genes of interest can then be saved to the User Cart by ticking them in the results list and clicking "Select Gene."
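The same filters can be reproduced on a downloaded table. The sketch below is a hypothetical local equivalent, not the website's search engine: it performs a case-insensitive substring match, optionally restricted to one column, one genome, or one taxon. Because the gene table only stores the `Genome` ID, taxonomic filtering needs a separate genome-to-lineage mapping (`genome_taxonomy`, an assumed input that could be built from the Microbial Genome tab). The genomes, genera, and rows are made up.

```python
def search_genes(rows, query, column=None, genome=None,
                 genome_taxonomy=None, rank=None, taxon=None):
    """Case-insensitive substring search over table rows with optional filters.

    genome_taxonomy maps a genome ID to {rank: taxon}; it is a hypothetical
    input, not a column of the Wormbiome table.
    """
    q = query.lower()
    hits = []
    for row in rows:
        if genome is not None and row["Genome"] != genome:
            continue
        if taxon is not None:
            lineage = (genome_taxonomy or {}).get(row["Genome"], {})
            if lineage.get(rank) != taxon:
                continue
        # Search one column if given, otherwise every value in the row.
        values = [row.get(column)] if column else list(row.values())
        if any(q in (v or "").lower() for v in values):
            hits.append(row["WBM_geneID"])
    return hits


# Made-up rows and taxonomy for illustration.
example_rows = [
    {"WBM_geneID": "WBM_G1_0000001", "Genome": "G1", "Bakta_Gene": "ureC"},
    {"WBM_geneID": "WBM_G1_0000002", "Genome": "G1", "Bakta_Gene": "dnaA"},
    {"WBM_geneID": "WBM_G2_0000001", "Genome": "G2", "Bakta_Gene": "ureC"},
]
example_taxonomy = {"G1": {"Genus": "GenusA"}, "G2": {"Genus": "GenusB"}}

print(search_genes(example_rows, "ure", column="Bakta_Gene"))
print(search_genes(example_rows, "ure", genome_taxonomy=example_taxonomy,
                   rank="Genus", taxon="GenusB"))
```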

Annotation Browser

The Annotation Browser lists all the genes of a specific bacterial strain or taxonomic group.

The annotation database option displays a default set of columns specific to the different annotation pipelines used for the Wormbiome database. Users can choose which columns to display using the scrolling list in the left side panel.

As with Gene Search, users can list either all the genes of a specific bacterial strain or all the genes associated with a specific microbial taxon.

Tools

We offer two interactive tools to browse and analyze the genomes available in the Wormbiome database:

Compare Feature, an annotation comparison tool.
Blast, a custom BLAST search tool.

Compare Feature

The annotation comparison tool allows simple comparisons of annotations between taxonomic groups or among one to four custom groups.
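This page does not specify what Compare Feature computes. Purely as an illustration of one simple group comparison, the sketch below reports, for each annotation term, the fraction of genomes in each group that carry it, and enforces the one-to-four group limit mentioned above. The per-genome term sets, the placeholder terms, and the name `compare_groups` are all assumptions.

```python
def compare_groups(genome_terms, groups):
    """Fraction of genomes in each group that carry each annotation term.

    genome_terms: genome ID -> set of terms (e.g. KO or PFAM identifiers)
    groups: group name -> list of genome IDs (one to four groups)
    """
    if not 1 <= len(groups) <= 4:
        raise ValueError("expected between one and four groups")
    if any(len(members) == 0 for members in groups.values()):
        raise ValueError("every group needs at least one genome")
    # Collect every term seen in any selected genome.
    terms = set()
    for members in groups.values():
        for genome in members:
            terms |= genome_terms.get(genome, set())
    return {
        term: {name: sum(term in genome_terms.get(g, set()) for g in members) / len(members)
               for name, members in groups.items()}
        for term in sorted(terms)
    }


# Placeholder genomes and terms.
genome_terms = {"G1": {"termA", "termB"}, "G2": {"termA"}, "G3": {"termB"}}
result = compare_groups(genome_terms, {"group1": ["G1", "G2"], "group2": ["G3"]})
for term, fractions in result.items():
    print(term, fractions)
```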

Blast

Users can perform sequence-based queries on the Wormbiome database to look for specific sequences of interest.
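The settings used by the web tool are not documented here. If a comparable search is run locally with BLAST+ (for example `blastp` with `-outfmt 6`) against downloaded Wormbiome protein sequences, the tabular output can be filtered and ranked as sketched below. The 12 field names are the BLAST+ defaults for `-outfmt 6`; the e-value cutoff of 1e-5, the sample hits, and the name `parse_outfmt6` are assumptions.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(handle, max_evalue=1e-5):
    """Parse -outfmt 6 lines, keep hits with e-value <= max_evalue, best bit score first."""
    hits = []
    for record in csv.reader(handle, delimiter="\t"):
        if not record or record[0].startswith("#"):
            continue
        hit = {}
        for name, raw in zip(OUTFMT6_FIELDS, record):
            if name in ("qseqid", "sseqid"):
                hit[name] = raw
            elif name in FLOAT_FIELDS:
                hit[name] = float(raw)
            else:
                hit[name] = int(raw)
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["bitscore"], reverse=True)


# Made-up search output for illustration.
output = io.StringIO(
    "query1\tWBM_G1_0000001\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t580\n"
    "query1\tWBM_G2_0000003\t45.0\t280\t150\t3\t5\t284\t10\t289\t0.01\t90.1\n"
    "query1\tWBM_G3_0000007\t60.2\t295\t110\t2\t1\t295\t1\t295\t2e-80\t310\n"
)
for hit in parse_outfmt6(output):
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```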

User Cart

The User Cart stores genes selected from the Gene Search or Annotation Browser. The tab has three displays:

Overview Table: Displays the selected genes with their associated information. Users can choose which columns to display.
Taxonomic Overview: Visualizes the distribution of selected genes across genomes.
Annotation Summary: Displays the count of specific associated annotations (e.g., KEGG, CAZy, PFAM).
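The two summaries can be approximated offline from exported cart data: genes per genome (as in the Taxonomic Overview) and term counts for one annotation column (as in the Annotation Summary). This is a sketch under assumptions: the export is taken to contain the `Genome` column and the chosen annotation column, multi-valued cells are assumed to be comma-separated, and the cart contents and terms are placeholders. `summarize_cart` is a hypothetical name.

```python
from collections import Counter


def summarize_cart(cart_rows, annotation_column, sep=","):
    """Count selected genes per genome and terms in one annotation column.

    Assumes multi-valued cells are separated by `sep` (not confirmed for the
    Wormbiome export).
    """
    per_genome = Counter(row["Genome"] for row in cart_rows)
    terms = Counter()
    for row in cart_rows:
        cell = row.get(annotation_column) or ""
        terms.update(t.strip() for t in cell.split(sep) if t.strip())
    return per_genome, terms


# Made-up cart contents with placeholder terms.
cart = [
    {"Genome": "G1", "Bakta_PFAM": "termA,termB"},
    {"Genome": "G1", "Bakta_PFAM": "termA"},
    {"Genome": "G2", "Bakta_PFAM": ""},
]
genes_per_genome, pfam_counts = summarize_cart(cart, "Bakta_PFAM")
print(genes_per_genome.most_common())
print(pfam_counts.most_common())
```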

There are two export options:

Download raw or displayed data.
Export generated visualizations.