What is ClinVar?

Definition

ClinVar is a freely accessible public database maintained by the National Center for Biotechnology Information (NCBI) that aggregates information about the relationships between human genetic variants and observed health conditions (phenotypes). Submitters, including clinical laboratories, research groups, and expert panels, classify most variants using a standardized five-tier system: pathogenic, likely pathogenic, uncertain significance (VUS), likely benign, and benign. ClinVar also accepts other classification terms, such as risk factor and drug response, for variants that do not fit this scale.

Detailed Explanation

ClinVar was launched in 2013 to address a critical gap in clinical genetics: the lack of a centralized, publicly accessible repository of variant interpretations. Before ClinVar, genetic testing laboratories maintained proprietary databases, which led to inconsistent variant classifications across labs. ClinVar aggregates submissions from hundreds of sources, including major clinical labs such as GeneDx, Invitae, and Ambry Genetics, as well as expert panels such as those convened by ClinGen. As of 2024, the database contains over 2.5 million submitted variant interpretations covering more than 1.2 million unique variants.

The five-tier classification system follows guidelines published in 2015 by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP). A 'pathogenic' classification means there is strong evidence that the variant causes disease. 'Likely pathogenic' indicates greater than 90% certainty that the variant is disease-causing. 'Variant of uncertain significance' (VUS) means the evidence is insufficient or conflicting. This is the most challenging category for clinicians, because a VUS should not by itself be used to guide clinical decisions. 'Benign' means there is strong evidence that the variant does not cause disease, and 'likely benign' indicates greater than 90% certainty that it does not. Conflicts between submitters are flagged, and a star rating (0 to 4 stars) indicates the level of review behind each record.
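As a rough illustration of how the tiers and star ratings can be handled in code, the sketch below maps ClinVar review-status strings to their star counts and checks whether a label belongs to the five-tier scale. The function and constant names are hypothetical and are not part of ClinVar or PepFold. The status strings follow ClinVar's current wording, so older records that use earlier wording would need extra entries.

```python
# Star ratings for ClinVar aggregate review statuses (current wording).
# Older records may use earlier phrasing and would fall through to 0 here.
REVIEW_STATUS_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, conflicting classifications": 1,
    "criteria provided, single submitter": 1,
    "no assertion criteria provided": 0,
    "no classification provided": 0,
}

# The five ACMG/AMP tiers described above, from most to least concerning.
FIVE_TIERS = (
    "pathogenic",
    "likely pathogenic",
    "uncertain significance",
    "likely benign",
    "benign",
)


def stars_for(review_status: str) -> int:
    """Return the star rating for a review status; unknown strings get 0."""
    return REVIEW_STATUS_STARS.get(review_status.strip().lower(), 0)


def is_five_tier(classification: str) -> bool:
    """Return True if the label is one of the five ACMG/AMP tiers."""
    return classification.strip().lower() in FIVE_TIERS


if __name__ == "__main__":
    print(stars_for("Reviewed by expert panel"))
    print(is_five_tier("risk factor"))
```

Treating unknown statuses as zero stars is a conservative choice for this sketch: a record whose review level cannot be recognized is handled as unreviewed rather than trusted.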

PepFold queries ClinVar as the first step in its pharmacogenomic pipeline. When a user submits an rsID, the system retrieves the variant's clinical significance, associated conditions, review status, and molecular consequence from ClinVar's API. This annotation determines how the variant is handled downstream: pathogenic variants in drug-metabolizing enzymes trigger different peptide design strategies than risk-factor variants in disease-associated proteins. The ClinVar data feeds directly into PepFold's variant annotation layer, ensuring every peptide candidate is designed with full clinical context.
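PepFold's internal implementation is not public, so the following is only a minimal sketch of what an rsID lookup and downstream routing step could look like. It assumes the lookup goes through NCBI's E-utilities `esearch` endpoint with `db=clinvar`. The function names and the strategy labels returned by `route_variant` are hypothetical, and the code only builds the request URL and parses a response, so it runs without network access.

```python
from urllib.parse import urlencode

# Public base URL for NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def clinvar_esearch_url(rsid: str) -> str:
    """Build an esearch URL that looks up an rsID in the ClinVar database."""
    params = {"db": "clinvar", "term": rsid, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def extract_ids(esearch_json: dict) -> list[str]:
    """Pull the list of matching record IDs out of an esearch JSON response."""
    return esearch_json.get("esearchresult", {}).get("idlist", [])


def route_variant(significance: str) -> str:
    """Choose a downstream strategy label (hypothetical, for illustration).

    Mirrors the idea that pathogenic variants and risk-factor variants
    are handled by different design strategies.
    """
    label = significance.strip().lower()
    if label == "pathogenic":
        return "pathogenic_strategy"
    if label == "risk factor":
        return "risk_factor_strategy"
    return "needs_review"


if __name__ == "__main__":
    print(clinvar_esearch_url("rs429358"))
    print(route_variant("risk factor"))
```

The IDs returned by `esearch` would then be passed to a second E-utilities call, such as `esummary`, to fetch clinical significance, conditions, and review status. The field names in that response vary between ClinVar releases, so they are left out of this sketch.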

Related Terms

What is a SNP (Single Nucleotide Polymorphism)?

A single nucleotide polymorphism (SNP, pronounced 'snip') is a variation at a single position in a DNA sequence among individuals. SNPs are the most common type of genetic variation in humans, with approximately 4-5 million SNPs per individual genome and over 660 million cataloged in the dbSNP database.

What is an rsID?

An rsID (reference SNP cluster ID) is a unique identifier assigned by the NCBI dbSNP database to a specific genetic variant, typically a single nucleotide polymorphism (SNP). The format is 'rs' followed by a number — for example, rs429358 identifies the APOE4-defining variant. rsIDs serve as the universal language for referencing genetic variants across research, clinical testing, and bioinformatics tools.
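Because the format is simply 'rs' followed by digits, user input can be checked before it is sent to any database. The sketch below is one way to do that. The name `parse_rsid` is hypothetical, and accepting an uppercase 'RS' prefix is an assumption made for convenience rather than a dbSNP rule.

```python
import re

# 'rs' followed by one or more digits; case-insensitive as a convenience.
RSID_PATTERN = re.compile(r"rs(\d+)", re.IGNORECASE)


def parse_rsid(text: str) -> int:
    """Return the numeric part of an rsID, or raise ValueError if malformed."""
    match = RSID_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid rsID: {text!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(parse_rsid("rs429358"))
```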

What is Pharmacogenomics?

Pharmacogenomics (PGx) is the study of how an individual's genetic makeup influences their response to medications. It combines pharmacology (the science of drugs) and genomics (the study of genes and their functions) to develop effective, personalized drug therapies based on a patient's DNA.

What is UniProt?

UniProt (Universal Protein Resource) is the most comprehensive, freely accessible database of protein sequence and functional information. It is maintained by a consortium of the European Bioinformatics Institute (EMBL-EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). UniProt contains over 250 million protein sequences, with its curated section (Swiss-Prot) providing expert-reviewed annotations for approximately 570,000 proteins.

Apply This Knowledge with PepFold

Submit rsIDs and get ranked peptide candidates with 3D structures and Fmoc-SPPS synthesis protocols in under 2 minutes.