Creating a SequenceVariant from scratch

0. Overview

A SequenceVariant consists of an accession (a string), a sequence type (a string), and a PosEdit, like this:

var = hgvs.sequencevariant.SequenceVariant(ac='NM_01234.5', type='c', posedit=…)

Unsurprisingly, a PosEdit consists of separate position and Edit objects. A position is generally an Interval, which in turn is composed of SimplePosition or BaseOffsetPosition objects. An edit is an instance of a subclass of Edit, such as NARefAlt (for substitutions, deletions, and insertions) or Dup (for duplications).

Importantly, each of the objects we’re building has a rule in the parser, which means that you have the tools to serialize and deserialize (parse) each of the components that we’re about to construct.

1. Make an Interval to define the position of the edit

(BaseOffsetPosition(base=200, offset=-6, datum=Datum.CDS_START, uncertain=False),
 '200-6')
(BaseOffsetPosition(base=22, offset=0, datum=Datum.CDS_END, uncertain=False),
 '*22')
(Interval(start=200-6, end=*22, uncertain=False), '200-6_*22')

2. Make an edit object

(NARefAlt(ref='A', alt='T', uncertain=False), 'A>T')
(PosEdit(pos=200-6_*22, edit=A>T, uncertain=False), '200-6_*22A>T')

3. Make the variant

(SequenceVariant(ac=NM_01234.5, type=c, posedit=200-6_*22A>T),
 'NM_01234.5:c.200-6_*22A>T')

Important: It is possible to construct bogus variants with the hgvs package. For example, the interval above spans multiple bases, which is incompatible with a single-nucleotide substitution (SNV) such as A>T. See hgvs.validator.Validator for validation options.

4. Update your variant

Stringification happens on the fly, so you can update components of the variant and see the effects immediately.

'NM_01234.5:c.456-6_*22A>T'
'NM_01234.5:c.200-6_*22delinsCT'
'NM_01234.5:c.200-6_*22A>T'