Top-level module

hgvs is a package to parse, format, and manipulate biological sequence variants. See https://github.com/biocommons/hgvs/ for details.

Example use:

>>> import hgvs.assemblymapper
>>> import hgvs.dataproviders.uta
>>> import hgvs.parser
>>> import hgvs.variantmapper

# start with these variants as strings
>>> hgvs_g, hgvs_c = "NC_000007.13:g.36561662C>T", "NM_001637.3:c.1582G>A"

# parse the genomic variant into a Python structure
>>> hp = hgvs.parser.Parser()
>>> var_g = hp.parse_hgvs_variant(hgvs_g)
>>> var_g
SequenceVariant(ac=NC_000007.13, type=g, posedit=36561662C>T, gene=None)

# SequenceVariants are composed of structured objects, e.g.,
>>> var_g.posedit.pos.start
SimplePosition(base=36561662, uncertain=False)

# format by stringification
>>> str(var_g)
'NC_000007.13:g.36561662C>T'
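Parsing and stringification are inverses: a variant string is split into structured parts, and `str()` reassembles them. The sketch below illustrates that round trip for simple substitutions only. It is a hypothetical toy (`parse_toy` and `ToySubstitution` are invented names), not the hgvs parser, which handles the full HGVS grammar and builds `SequenceVariant` objects.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: accepts only simple DNA substitutions such as
# "NC_000007.13:g.36561662C>T"; real HGVS syntax is far richer.
_SUBST = re.compile(
    r"^(?P<ac>[A-Z]{2}_\d+\.\d+):(?P<type>[gcnm])\."
    r"(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


@dataclass(frozen=True)
class ToySubstitution:
    """Illustrative stand-in for a parsed variant (not hgvs's SequenceVariant)."""
    ac: str
    type: str
    pos: int
    ref: str
    alt: str

    def __str__(self) -> str:
        # Stringification reassembles the structured pieces into HGVS text.
        return f"{self.ac}:{self.type}.{self.pos}{self.ref}>{self.alt}"


def parse_toy(hgvs_string: str) -> ToySubstitution:
    """Split a simple substitution string into its parts (hypothetical helper)."""
    match = _SUBST.match(hgvs_string)
    if match is None:
        raise ValueError(f"unsupported variant: {hgvs_string!r}")
    return ToySubstitution(
        ac=match["ac"],
        type=match["type"],
        pos=int(match["pos"]),
        ref=match["ref"],
        alt=match["alt"],
    )


var = parse_toy("NC_000007.13:g.36561662C>T")
print(var.pos)   # 36561662
print(str(var))  # NC_000007.13:g.36561662C>T
```

Anything beyond a single-base substitution (deletions, insertions, intronic offsets, uncertain positions) raises `ValueError` here, which is exactly why a real grammar-based parser is needed.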

# initialize the mapper for GRCh37 with splign-based alignments
>>> hdp = hgvs.dataproviders.uta.connect()
>>> am = hgvs.assemblymapper.AssemblyMapper(hdp,
...     assembly_name="GRCh37", alt_aln_method="splign",
...     replace_reference=True)

# identify transcripts that overlap this genomic variant
>>> transcripts = am.relevant_transcripts(var_g)
>>> sorted(transcripts)
['NM_001177506.1', 'NM_001177507.1', 'NM_001637.3']
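Conceptually, finding the relevant transcripts is an interval-overlap query: which transcript alignments span the variant's genomic position? A minimal sketch, assuming a hypothetical in-memory table of transcript spans in place of the UTA data provider. The accessions `TX_A`, `TX_B`, `TX_C` and their coordinates are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical transcript spans in genomic coordinates (start, end), inclusive.
# Accessions and numbers are invented for illustration only.
TOY_TRANSCRIPT_SPANS: Dict[str, Tuple[int, int]] = {
    "TX_A": (1_000, 5_000),
    "TX_B": (4_500, 9_000),
    "TX_C": (12_000, 15_000),
}


def overlapping_transcripts(start: int, end: int,
                            spans: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return sorted accessions whose [tx_start, tx_end] overlaps [start, end]."""
    return sorted(
        ac for ac, (tx_start, tx_end) in spans.items()
        # Two closed intervals overlap iff each starts before the other ends.
        if tx_start <= end and start <= tx_end
    )


# A single-nucleotide variant has start == end.
print(overlapping_transcripts(4_800, 4_800, TOY_TRANSCRIPT_SPANS))  # ['TX_A', 'TX_B']
```

As in the real example, several transcripts (often alternative isoforms of one gene) can overlap a single genomic position, so the caller still has to choose which one to map to.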

# map genomic variant to one of these transcripts
>>> var_c = am.g_to_c(var_g, "NM_001637.3")
>>> var_c
SequenceVariant(ac=NM_001637.3, type=c, posedit=1582G>A, gene=None)
>>> str(var_c)
'NM_001637.3:c.1582G>A'
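Note that the genomic `C>T` becomes `G>A` on the transcript. That is what one expects when the transcript is aligned to the reverse (minus) strand of the chromosome: the transcript alleles are the reverse complements of the genomic ones. The sketch below reproduces only that allele step; the position change (36561662 to 1582) depends on the exon alignment and is what `g_to_c` computes, which is not modeled here. `reverse_complement` is a hypothetical helper, not part of hgvs.

```python
# Watson-Crick complement for unambiguous uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (uppercase A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Genomic alleles from NC_000007.13:g.36561662C>T
g_ref, g_alt = "C", "T"

# For a reverse-strand transcript, transcript alleles are the reverse
# complements of the genomic alleles.
c_ref, c_alt = reverse_complement(g_ref), reverse_complement(g_alt)
print(f"{c_ref}>{c_alt}")  # G>A
```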

# CDS coordinates use BaseOffsetPosition to support intronic offsets
>>> var_c.posedit.pos.start
BaseOffsetPosition(base=1582, offset=0, datum=Datum.CDS_START, uncertain=False)
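In HGVS `c.` notation an intronic position is written as the nearest exonic base plus a signed offset into the intron: `c.1582+3` is three bases into the intron after `c.1582`, and `c.1583-2` is two bases before `c.1583`. An exonic position such as the one above simply has offset 0. The toy class below (`ToyBaseOffset`, a hypothetical name, not the hgvs `BaseOffsetPosition` class) sketches how base and offset combine when formatted; it ignores the datum, so 5' UTR (`c.-N`) and 3' UTR (`c.*N`) positions are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBaseOffset:
    """Illustrative c. position: `base` is an exonic coordinate and `offset`
    the distance into the adjacent intron (0 for exonic positions)."""
    base: int
    offset: int = 0

    def __str__(self) -> str:
        if self.offset == 0:
            return str(self.base)
        # Positive offsets point into the intron after `base`, negative ones
        # into the intron before it; the sign is always written explicitly.
        return f"{self.base}{self.offset:+d}"


print(ToyBaseOffset(1582))      # 1582
print(ToyBaseOffset(1582, 3))   # 1582+3
print(ToyBaseOffset(1583, -2))  # 1583-2
```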