Protein sequence alignment, the comparison of protein sequences to find similarities, is fundamental to modern biology and medicine. By reconstructing evolutionary relationships (a process technically called homology inference), it illuminates gene functions and can inform drug development. When scientists discover or design a new protein, they can align it with known protein sequences to infer its structure and function.
This homology search can reveal promising drug targets (by comparing pathogen proteins to human proteins, for example) or pinpoint disease-causing mutations (by comparing a patient’s protein to a healthy version). However, the rapid expansion of genomic and metagenomic data now strains traditional alignment tools.
This post explores how recent advances in protein alignment accelerate protein science by using GPU-optimized alignment to enhance AI-driven drug discovery, structural prediction, and protein design at unprecedented speed.
Protein sequence alignment scales scientific insight
Sequence alignment might sound technical, but its importance is straightforward: scientists can compare protein sequences to find similarities. Similar sequences often imply similar functions or structural features. This is the basis of homology inference: if Protein A resembles Protein B, they might share a biological role.
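One simple way to quantify how closely two proteins resemble each other is percent identity over an existing alignment. The sketch below is only an illustration and is not taken from any tool discussed in this post: the function name `percent_identity` and both toy sequences are made up, and it assumes the two sequences are already aligned, with `-` marking gaps. Percent identity also has several competing definitions; this one counts only columns where neither sequence has a gap.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy, made-up aligned sequences (not real proteins).
protein_a = "MKT-AYIAKQR"
protein_b = "MKTLAYLAKQR"
print(f"{percent_identity(protein_a, protein_b):.1f}% identity")  # 90.0% identity
```

High identity is a hint rather than proof of shared function, which is why search tools also report statistical scores such as E-values.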
Protein sequence alignment is essential for functional annotation, evolutionary studies, disease research, and drug discovery by identifying conserved regions, predicting protein functions, and detecting unlikely mutations that could lead to disease. Evolutionary information encoded in sequence alignments can also guide drug target selection and optimization.

The human proteome exhibits immense complexity. A typical mammalian cell contains approximately 10 billion protein molecules, spanning a dynamic range of 10^6 in abundance per cell. Body fluids such as plasma exhibit an even greater 10^10-fold variation between the most and least abundant proteins. This complexity poses significant challenges for comprehensive proteomic analysis.
Mapping protein interactions is even more daunting. While nearly 200 million possible pairwise interactions exist in the human proteome, only ~53,000 have been experimentally confirmed, making this mapping akin to finding a needle in a haystack.
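The "nearly 200 million" figure is consistent with simple pair counting. The sketch below assumes roughly 20,000 human proteins (a commonly cited approximate count of protein-coding genes, and an assumption here rather than a number from this post) and counts unordered pairs as N(N-1)/2.

```python
from math import comb

N_PROTEINS = 20_000       # assumption: approximate number of human protein-coding genes
CONFIRMED_PAIRS = 53_000  # experimentally confirmed interactions, as quoted in the text

possible_pairs = comb(N_PROTEINS, 2)  # unordered pairs: N * (N - 1) / 2
confirmed_fraction = CONFIRMED_PAIRS / possible_pairs

print(f"possible pairs: {possible_pairs:,}")
print(f"experimentally confirmed: {confirmed_fraction:.4%}")
```

Under this assumption, the confirmed interactions cover only a few hundredths of a percent of the possible pairs.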
De novo protein design, the most computationally complex of these challenges, involves navigating an astronomical search space (20^N sequences for a protein of length N) and solving an NP-hard problem in folding and function optimization. Advances in AI and automation have significantly improved success rates (from <10% to ~30–50% for some design classes), but experimental validation remains resource-intensive, often requiring iterative testing and refinement. While recent breakthroughs have accelerated progress, these fundamental problems stay at the frontier of structural biology.
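To get a feel for how quickly the design space grows, the short sketch below evaluates 20^N on a log scale for a few lengths. The helper name `sequence_space_log10` and the chosen lengths are illustrative only.

```python
import math

AMINO_ACIDS = 20  # standard amino acid alphabet


def sequence_space_log10(length: int) -> float:
    """Return log10 of the number of possible sequences of the given length."""
    return length * math.log10(AMINO_ACIDS)


for n in (10, 50, 100, 300):
    print(f"N = {n:>3}: about 10^{sequence_space_log10(n):.0f} possible sequences")
```

Even a modest 100-residue protein has on the order of 10^130 possible sequences, which is why exhaustive enumeration is out of the question and guided search is required.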
Evolution of alignments from BLAST to MMseqs2
Bioinformaticians have developed increasingly efficient algorithms to accelerate sequence alignment. BLAST revolutionized search speeds in the 1990s but struggled with growing data, leading to the development of DIAMOND and MMseqs2 in the 2010s.
MMseqs2 achieves better sensitivity than PSI-BLAST while running over 400 times faster: in profile searches with three iterations, it was 433 times faster than PSI-BLAST and considerably more sensitive.
MMseqs2 and DIAMOND are now widely used in genome annotation and drug discovery, replacing BLAST in pipelines that once took weeks to compute. However, as data volumes continue to explode, even the fastest CPU-based tools face limits, prompting a shift toward GPU acceleration. Despite advances in AI, protein alignments remain critical. Deep learning models like AlphaFold2 rely on multiple sequence alignments (MSAs) to predict protein structures, demonstrating the enduring importance of fast and scalable sequence search methods (Table 1).
| MMseqs2 use case | Examples | References |
| --- | --- | --- |
| Expanding and cascading MSA searches for protein structure prediction | 1. ColabFold; 2. RoseTTAFold and ColabFold Search; 3. OpenFold and ColabFold Search | Highly Accurate Protein Structure Prediction with AlphaFold; ColabFold: Making Protein Folding Accessible to All; Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network; OpenFold: Retraining AlphaFold2 Yields New Insights into Its Learning Mechanisms and Capacity for Generalization |
| Filtering redundant or homologous sequences | ADOPT: Identifies intrinsically disordered protein regions | ADOPT: Intrinsic Protein Disorder Prediction Through Deep Bidirectional Transformers |
| Clustering for protein interaction analysis | SENSE-PPI | SENSE-PPI Reconstructs Interactomes Within, Across, and Between Species at the Genome Scale |
The potential upside of faster MSA in deep learning-based workflows is large. MSA search can be computationally expensive, often dominating inference and training times. For example, MSA search accounts for ~70–90% of total inference time for AlphaFold2 and OpenFold. Splitting MSA search and folding into two separate compute jobs, rather than running them as a single job, has yielded cost savings of 57% for AlphaFold2 and 51% for OpenFold.
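A back-of-the-envelope way to reason about this upside is Amdahl's law: if a fraction f of runtime is MSA search and only that part is accelerated by a factor s, the end-to-end speedup is 1 / ((1 - f) + f / s). The sketch below applies this to the 70–90% range quoted above. The 20x MSA speedup is an assumed input chosen for illustration, and the printed numbers are arithmetic, not measured results.

```python
def overall_speedup(msa_fraction: float, msa_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the MSA stage is accelerated."""
    if not 0.0 <= msa_fraction <= 1.0:
        raise ValueError("msa_fraction must be between 0 and 1")
    if msa_speedup <= 0:
        raise ValueError("msa_speedup must be positive")
    return 1.0 / ((1.0 - msa_fraction) + msa_fraction / msa_speedup)


ASSUMED_MSA_SPEEDUP = 20.0  # illustrative assumption, not a measurement

for fraction in (0.7, 0.8, 0.9):
    speedup = overall_speedup(fraction, ASSUMED_MSA_SPEEDUP)
    print(f"MSA = {fraction:.0%} of runtime -> about {speedup:.1f}x end to end")
```

The formula also shows the ceiling: with 90% of the time spent in MSA search, no MSA speedup can deliver more than 10x overall, because the folding step eventually dominates.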
MMseqs2-GPU and the next leap in speed
MMseqs2-GPU leverages novel GPU-specific accelerations to unlock multiple sequence alignments on CUDA. The joint research team that developed MMseqs2-GPU was led by researchers at Seoul National University, Johannes Gutenberg University Mainz, and NVIDIA. The team created a novel GPU-optimized “gapless” filtering algorithm to replace the CPU k-mer-based prefilter. In simple terms, instead of scanning for short matching substrings (k-mers) as BLAST and MMseqs2 do on CPU, the GPU version uses a highly parallel algorithm that directly scores alignments without gaps (mismatches allowed but no insertions/deletions) across the sequences.
This approach, implemented with CUDA, is tailored to avoid memory bottlenecks and simultaneously keep thousands of GPU cores busy. After this fast prefilter finds promising matches, the GPU can also carry out the more precise gapped alignment (using an optimized Smith-Waterman algorithm) for those hits.
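The difference between gapless scoring and gapped Smith-Waterman alignment can be illustrated with a small CPU-only sketch. This is not the MMseqs2-GPU implementation: it is plain Python, it uses a toy match/mismatch score in place of a substitution matrix such as BLOSUM62, and it uses a linear gap penalty where production tools typically use affine penalties. The function names, scoring values, and toy sequences are all assumptions made for the example.

```python
def substitution_score(a: str, b: str, match: int = 2, mismatch: int = -1) -> int:
    """Toy scoring; real tools use a substitution matrix instead."""
    return match if a == b else mismatch


def gapless_score(query: str, target: str) -> int:
    """Best-scoring ungapped local segment over all diagonals."""
    best = 0
    for diagonal in range(-(len(query) - 1), len(target)):
        i = max(0, -diagonal)  # start row on this diagonal
        j = i + diagonal       # matching start column
        running = 0
        while i < len(query) and j < len(target):
            # Reset to zero when the running segment score turns negative.
            running = max(0, running + substitution_score(query[i], target[j]))
            best = max(best, running)
            i += 1
            j += 1
    return best


def smith_waterman_score(query: str, target: str, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty."""
    previous = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        current = [0]
        for j in range(1, len(target) + 1):
            diagonal = previous[j - 1] + substitution_score(query[i - 1], target[j - 1])
            cell = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


query = "ACDEFG"
target = "ACDWEFG"  # the query with one inserted residue (W)
print("gapless:", gapless_score(query, target))                # gapless: 6
print("smith-waterman:", smith_waterman_score(query, target))  # smith-waterman: 10
```

Each diagonal in the gapless scan is independent of the others, which hints at why this style of scoring maps well onto thousands of parallel GPU threads. The gapped pass recovers the full match across the insertion, so it is reserved for the hits that survive the prefilter.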
To put this advancement in perspective, key performance comparisons from the MMseqs2 GPU technical report include the following:
- MMseqs2-GPU achieves up to 100 TCUPS (trillions of cell updates per second) across eight GPUs for gapless filtering, outperforming previous acceleration methods by one to two orders of magnitude.
- On a single NVIDIA L40S GPU, MMseqs2-GPU is 20x faster and 71x cheaper than MMseqs2 k-mer running on a 128-core CPU for protein sequence searches.
- In ColabFold, MMseqs2-GPU accelerates structure prediction 23x compared to AlphaFold2 using JackHMMER, while maintaining equivalent prediction accuracy.
- For protein structure alignment in Foldseek, MMseqs2-GPU provides up to 27x speedup compared to the CPU-based version.
- MMseqs2-GPU enables faster homology searches even on cost-effective GPUs such as the NVIDIA L4, offering a ten-fold speed increase over JackHMMER when searching UniRef90.
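To put a throughput figure like 100 TCUPS in context: a "cell update" is one entry of the query-by-target scoring matrix, so the work for one query is roughly its length multiplied by the number of residues in the database. The sketch below turns that into an idealized time estimate. The query length and database size are hypothetical round numbers, and the result ignores I/O, later pipeline stages, and the fact that peak throughput is rarely sustained.

```python
def gapless_filter_seconds(query_length: int, database_residues: int, tcups: float) -> float:
    """Idealized time to score one query against a database at a given TCUPS rate."""
    if tcups <= 0:
        raise ValueError("tcups must be positive")
    cells = query_length * database_residues  # one cell per (query, target) residue pair
    return cells / (tcups * 1e12)             # TCUPS = 10^12 cell updates per second


QUERY_LENGTH = 300                  # hypothetical query of 300 residues
DATABASE_RESIDUES = 50_000_000_000  # hypothetical database of 50 billion residues

seconds = gapless_filter_seconds(QUERY_LENGTH, DATABASE_RESIDUES, tcups=100.0)
print(f"idealized gapless filter time: {seconds:.2f} s")
```

Under these assumptions the filter stage finishes in a fraction of a second per query, which is the kind of budget that makes large-scale or interactive searches practical.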
Sequence alignment and protein structure prediction
As an example of MMseqs2-GPU solving a real-world problem in a drug discovery workflow, the following illustrates the role of sequence alignment in AI-driven target structure prediction. Here, MMseqs2-GPU (MSA-Search) and the OpenFold model are each packaged as containerized NVIDIA NIM microservices (Figure 2).

For example, the MSA-Search NIM and OpenFold2 NIM can be used together to model multiple conformational states of a protein of interest. The workflow involves the following steps:
- Query → Alignment: Send the FASTA sequence to MSA-Search NIM → Get an A3M MSA plus a template-hit HHR file.
- Pick templates: Choose two PDB IDs (open versus closed) and extract their individual HHR blocks.
- Predict each state: Call OpenFold2 NIM twice, passing the same A3M but a different HHR slice each time.
- Write results: Save the two returned PDB strings as hClpP_active.pdb and hClpP_inactive.pdb.
The following code shows how that could be done, using the human hClpP protein as an example:

"""Example: model two conformations of human ClpP (hClpP)."""
import json
import os
import pathlib

import requests

# ------------------------------------------------------------------
# Config
# ------------------------------------------------------------------
API_KEY = os.environ["NIM_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
# MSA_URL is used below but was not defined in the original listing. It is
# read from an environment variable here; the variable name is a placeholder.
MSA_URL = os.environ["MSA_SEARCH_URL"]
OF2_URL = "http://localhost:8000/biology/openfold/openfold2/predict-structure-from-msa-and-template"
SEQ = """>hClpP_HUMAN
MARGKIIGELASKKKVEAMAAKLAEAG... (FASTA truncated for clarity)"""

# ------------------------------------------------------------------
# 1) Run GPU-MMseqs2 via MSA-Search NIM
# ------------------------------------------------------------------
msa_payload = {
    "sequence": SEQ,
    "databases": ["uniref90-2024_02", "pdb70"],
    "return_templates": True,  # tells the service to emit an HHR-formatted hit list
}
msa_resp = requests.post(MSA_URL, headers=HEADERS, data=json.dumps(msa_payload), timeout=900)
msa_resp.raise_for_status()
msa = msa_resp.json()
a3m_alignment = msa["alignments"]["uniref90-2024_02"]["a3m"]["alignment"]

# Helper: pick two PDB templates that represent distinct states.
ACTIVE_PDB = "7DKF"    # hClpP active/open
INACTIVE_PDB = "7D7G"  # hClpP inactive/closed
hhr_all = msa["templates"]["pdb70"]["hhr"]["alignment"]  # full HHR text


def slice_hhr_for(pdb_id: str, hhr_text: str) -> str:
    """Return a minimal HHR block for a single template PDB hit."""
    keep = []
    write = False
    for line in hhr_text.splitlines():
        if line.startswith(">PDBID:") and pdb_id in line:
            write = True   # start of the requested hit
        elif line.startswith(">PDBID:") and write:
            break          # next hit begins; stop copying
        if write:
            keep.append(line)
    return "\n".join(keep)


hhr_active = slice_hhr_for(ACTIVE_PDB, hhr_all)
hhr_inactive = slice_hhr_for(INACTIVE_PDB, hhr_all)

# ------------------------------------------------------------------
# 2) Predict each conformation with OpenFold 2
# ------------------------------------------------------------------
def run_openfold2(hhr_block: str, tag: str) -> pathlib.Path:
    """Predict one structure from the shared MSA and a single template block."""
    payload = {
        "sequence": SEQ,
        "alignments": {
            "uniref90-2024_02": {"a3m": {"alignment": a3m_alignment, "format": "a3m"}}
        },
        "templates": {
            "pdb70": {"hhr": {"alignment": hhr_block, "format": "hhr"}}
        },
        "selected_models": [1],  # run a single model for speed; omit for ensemble
    }
    r = requests.post(OF2_URL, headers=HEADERS, data=json.dumps(payload), timeout=1800)
    r.raise_for_status()
    pdb_text = r.json()["predictions"][0]["structure"]
    out = pathlib.Path(f"hClpP_{tag}.pdb")
    out.write_text(pdb_text)
    print(f"Wrote {out}")
    return out


active_pdb = run_openfold2(hhr_active, "active")
inactive_pdb = run_openfold2(hhr_inactive, "inactive")
Future directions in GPU-accelerated bioinformatics
Advancements in sequence alignment, from BLAST to MMseqs2-GPU, have revolutionized protein science, enabling faster insights into function, evolution, and drug discovery.
MMseqs2-GPU is already used for synthetic dataset generation, an essential part of the AI model development lifecycle, and for accelerated inference in test-time scaling and real-time predictions. It is also widely adopted in industry, including by companies such as Basecamp Research, VantAI, and Iambic Therapeutics.
As AI-driven models integrate alignment into predictive workflows, GPU acceleration is redefining molecular research. The convergence of AI, HPC, and bioinformatics promises even greater breakthroughs, accelerating discoveries in medicine and biotechnology. Learn more about the NVIDIA BioNeMo Blueprint for generative protein binder design.
Try MMseqs2-GPU as the NVIDIA MSA-Search NIM and OpenFold as the NVIDIA OpenFold2 NIM.