Publication date: Jun 02, 2025
Protein domains are key structural and functional units within proteins, driving essential activities such as signaling, DNA binding, and catalysis. Because domains are conserved across species, they can be identified in uncharacterized proteins using profile hidden Markov models (HMMs). Resources such as Pfam and SUPERFAMILY offer HMM libraries for annotated domains, facilitating the analysis of conserved domains in novel sequences. HMMER, a powerful software suite, applies these models to identify homologous sequences and domain organization in protein databases, enabling comprehensive genome-wide analysis. This protocol presents a framework using the HMMER suite to identify domain sequences within a target protein sequence database and to generate multiple sequence alignments (MSAs) for phylogenomic studies. I demonstrate this approach by identifying homologs of the SARS-CoV-2 receptor-binding domain (RBD) in the UniProt database using its HMM profile. The resulting MSA reveals conserved features across species, and Jalview is used to visualize and edit the alignment for phylogenetic analysis. This protocol provides a starting point for identifying conserved domains and building MSAs to explore their evolutionary relationships, supporting both functional annotation and comparative analysis of protein domain organization in viral and other genomes.
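The workflow summarized above (build a profile HMM, search a sequence database, collect hits into an alignment) can be sketched roughly as follows. This is a minimal illustration, not the protocol's actual commands: the file names (`rbd_seed.sto`, `rbd.hmm`, `uniprot.fasta`, and so on) and the E-value cutoff of 1e-5 are assumptions chosen for the example. The script only assembles `hmmbuild` and `hmmsearch` command lines and parses the tabular `--tblout` output; it does not run HMMER itself.

```python
import shlex


def hmmer_commands(seed_msa, hmm_out, seq_db, tbl_out, hits_msa, evalue=1e-5):
    """Return HMMER command lines as argument lists (nothing is executed)."""
    # hmmbuild <hmmfile_out> <msafile>: build a profile HMM from a seed alignment.
    build = ["hmmbuild", hmm_out, seed_msa]
    # hmmsearch: scan the sequence database with the profile.
    #   -E        report sequences with E-value <= threshold
    #   --tblout  write a per-sequence hit table
    #   -A        save an alignment of all included hits (Stockholm format)
    search = [
        "hmmsearch", "-E", str(evalue),
        "--tblout", tbl_out,
        "-A", hits_msa,
        hmm_out, seq_db,
    ]
    return [build, search]


def parse_tblout(lines, max_evalue=1e-5):
    """Extract (target name, full-sequence E-value) pairs from --tblout lines."""
    hits = []
    for line in lines:
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        # 18 whitespace-separated columns, then a free-text description.
        fields = line.split(maxsplit=18)
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits.append((target, evalue))
    return hits


if __name__ == "__main__":
    # Hypothetical file names, used only to show the shape of the commands.
    for cmd in hmmer_commands("rbd_seed.sto", "rbd.hmm", "uniprot.fasta",
                              "rbd_hits.tbl", "rbd_hits.sto"):
        print(shlex.join(cmd))
```

The Stockholm alignment written by `-A` is one possible input for Jalview; an alternative under the same assumptions would be to extract the hit sequences and realign them to the profile with `hmmalign`.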
Semantics
Type | Source | Name |
---|---|---|
disease | IDO | protein |
drug | DRUGBANK | Altretamine |
disease | MESH | COVID-19 |