What is UniProt?
Definition
UniProt (Universal Protein Resource) is the most comprehensive, freely accessible database of protein sequence and functional information. It is maintained by a consortium of the European Bioinformatics Institute (EMBL-EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). UniProt contains over 250 million protein sequences, with its curated section (Swiss-Prot) providing expert-reviewed annotations for approximately 570,000 proteins.
Detailed Explanation
UniProt has three components that matter most here. Swiss-Prot contains manually annotated, reviewed protein entries, each curated by expert biologists who synthesize information from published literature, large-scale studies, and computational analyses. TrEMBL contains automatically annotated entries that have not yet been manually reviewed; together, Swiss-Prot and TrEMBL make up the UniProt Knowledgebase (UniProtKB). UniRef provides protein sequence sets clustered at three identity thresholds (UniRef100, UniRef90, and UniRef50, for 100%, 90%, and 50% identity) for efficient similarity searches. For drug design and pharmacogenomics, Swiss-Prot is the gold standard because its annotations include detailed information on protein function, domain architecture, post-translational modifications, disease associations, and variant effects.
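As a rough illustration of restricting a lookup to reviewed (Swiss-Prot) entries, the sketch below builds a search URL for the public UniProt REST service. The helper name `swissprot_query_url`, the default organism (human, taxonomy ID 9606), and the chosen return fields are assumptions made for this example; the query field names (`gene_exact`, `organism_id`, `reviewed`) should be checked against the current UniProt API documentation before use. The code only constructs the URL and does not contact the network.

```python
from urllib.parse import urlencode

# Base URL of the UniProtKB search endpoint
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def swissprot_query_url(gene, organism_id=9606,
                        fields=("accession", "gene_names", "length"),
                        size=5):
    """Build a search URL limited to reviewed (Swiss-Prot) entries.

    Hypothetical helper for illustration; query and return field names
    are assumptions to verify against the UniProt API documentation.
    """
    query = f"gene_exact:{gene} AND organism_id:{organism_id} AND reviewed:true"
    params = {
        "query": query,
        "format": "json",
        "fields": ",".join(fields),
        "size": size,
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


url = swissprot_query_url("CYP2D6")
print(url)
```

Dropping the `reviewed:true` clause would let the same query return TrEMBL entries as well, which is usually not what a variant-interpretation workflow wants.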
The power of UniProt for drug design lies in its feature annotations. Each protein entry includes the complete amino acid sequence, known functional domains (e.g., active sites, binding sites, transmembrane regions), experimentally characterized variants with their phenotypic effects, tissue expression patterns, protein-protein interactions, and cross-references to over 170 external databases including PDB (3D structures), Pfam (protein families), and Reactome (biological pathways). This interconnected annotation allows researchers to understand not just what a protein does, but how specific genetic variants alter its function at the molecular level.
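To make the idea of feature annotations concrete, here is a minimal sketch that groups the features of one entry by type and pulls out cross-references for a given database. The `sample_entry` dictionary is hand-written with invented values; its layout (`features`, `location.start.value`, `uniProtKBCrossReferences`) is assumed to mirror the JSON returned by the UniProt REST service and should be confirmed against a real entry. The helper names are hypothetical.

```python
from collections import defaultdict

# Hand-written sample with invented values; layout assumed to follow
# the UniProtKB JSON entry format.
sample_entry = {
    "primaryAccession": "X00000",
    "sequence": {"value": "MKTAYIAKQRQISFVKSHFSRQ", "length": 22},
    "features": [
        {"type": "Domain", "description": "Example domain",
         "location": {"start": {"value": 3}, "end": {"value": 18}}},
        {"type": "Active site", "description": "Nucleophile",
         "location": {"start": {"value": 10}, "end": {"value": 10}}},
        {"type": "Binding site", "description": "Substrate",
         "location": {"start": {"value": 12}, "end": {"value": 14}}},
    ],
    "uniProtKBCrossReferences": [
        {"database": "PDB", "id": "0XXX"},
        {"database": "Pfam", "id": "PF00000"},
    ],
}


def features_by_type(entry):
    """Return {feature type: [(start, end, description), ...]}."""
    grouped = defaultdict(list)
    for feat in entry.get("features", []):
        loc = feat["location"]
        grouped[feat["type"]].append(
            (loc["start"]["value"], loc["end"]["value"],
             feat.get("description", ""))
        )
    return dict(grouped)


def cross_refs(entry, database):
    """Return the IDs of all cross-references to one external database."""
    return [ref["id"]
            for ref in entry.get("uniProtKBCrossReferences", [])
            if ref["database"] == database]


print(features_by_type(sample_entry))
print(cross_refs(sample_entry, "PDB"))
```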
PepFold uses UniProt as its primary protein knowledge base. After retrieving variant information from ClinVar, the pipeline queries UniProt to obtain the full protein sequence, identify the functional domain affected by the variant, retrieve known 3D structural information, and map the variant's position relative to active sites, binding pockets, and protein-protein interaction interfaces. This structural and functional context drives the peptide design algorithm: candidates are generated to target specific protein domains with full awareness of how the patient's genetic variant alters the target protein's structure and function.
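The variant-mapping step described above can be sketched in a few lines. This is not PepFold's implementation, which the text does not show; it is an illustrative example with a hypothetical `Feature` type and `locate_variant` function. It reports which annotated features contain a variant position and how far, in residues along the sequence, the variant lies from the nearest active or binding site. Sequence distance is only a crude proxy; a real pipeline would presumably also use 3D distances from a structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    kind: str        # e.g. "Domain", "Active site", "Binding site"
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    label: str = ""


def locate_variant(position, features,
                   site_kinds=("Active site", "Binding site")):
    """Describe a variant position relative to annotated features."""

    def distance(feat):
        # Zero when the position falls inside the feature,
        # otherwise the gap to its nearer boundary.
        if feat.start <= position <= feat.end:
            return 0
        return min(abs(position - feat.start), abs(position - feat.end))

    containing = [f for f in features if f.start <= position <= f.end]
    sites = [f for f in features if f.kind in site_kinds]
    nearest = min(sites, key=distance, default=None)
    return {
        "containing": containing,
        "nearest_site": nearest,
        "residues_to_site": distance(nearest) if nearest else None,
    }


# Invented annotations for illustration only
features = [
    Feature("Domain", 3, 18, "Example domain"),
    Feature("Active site", 10, 10, "Nucleophile"),
    Feature("Binding site", 12, 14, "Substrate"),
]

result = locate_variant(16, features)
print(result)
```

In this made-up case, a variant at residue 16 falls inside the example domain and sits two residues from the binding site, which is the kind of context that would flag it as potentially affecting ligand binding.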
Related Terms
ClinVar is a freely accessible public database maintained by the National Center for Biotechnology Information (NCBI) that aggregates information about the relationships between human genetic variants and observed health conditions (phenotypes). Submitters, including clinical laboratories, research groups, and expert panels, classify variants using a standardized five-tier system: pathogenic, likely pathogenic, uncertain significance (reported as a variant of uncertain significance, or VUS), likely benign, and benign.
What is ESMFold?
ESMFold is a protein structure prediction model developed by Meta AI (formerly Facebook AI Research) that predicts the three-dimensional structure of a protein directly from its amino acid sequence. Unlike AlphaFold, ESMFold does not require multiple sequence alignments (MSAs), enabling predictions in seconds rather than minutes, which makes it particularly suitable for high-throughput peptide design pipelines.
What is Binding Affinity?
Binding affinity is a quantitative measure of the strength of interaction between two molecules, typically a drug (ligand) and its biological target (receptor or protein). It is most commonly expressed as the dissociation constant (Kd), the free ligand concentration at which 50% of the target binding sites are occupied at equilibrium. A lower Kd indicates stronger binding; nanomolar (nM) or picomolar (pM) affinities are typical for effective drugs.
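The link between Kd and occupancy can be checked with a short calculation. The sketch below assumes the simplest case: one ligand binding one site at equilibrium, with the free ligand concentration known, so that the occupied fraction is [L] / ([L] + Kd). The function name `fraction_bound` and the example concentrations are chosen for illustration.

```python
def fraction_bound(ligand_conc, kd):
    """Fraction of target sites occupied under simple 1:1 equilibrium binding.

    Both arguments must use the same concentration units (e.g. molar).
    """
    if kd <= 0 or ligand_conc < 0:
        raise ValueError("kd must be positive and ligand_conc non-negative")
    return ligand_conc / (ligand_conc + kd)


kd = 10e-9  # 10 nM, an example value

# At a ligand concentration equal to Kd, half of the sites are occupied.
half = fraction_bound(10e-9, kd)

# At nine times Kd, occupancy rises to 90%.
high = fraction_bound(90e-9, kd)

print(f"{half:.2f} {high:.2f}")
```

The same ligand concentration against a weaker binder (higher Kd) gives lower occupancy, which is why a lower Kd corresponds to stronger binding.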
What is Pharmacogenomics?
Pharmacogenomics (PGx) is the study of how an individual's genetic makeup influences their response to medications. It combines pharmacology (the science of drugs) and genomics (the study of genes and their functions) to develop effective, personalized drug therapies based on a patient's DNA.