Creating a SequenceVariant from scratch

0. Overview

A SequenceVariant consists of an accession (a string), a sequence type (a string), and a PosEdit, like this:

var = hgvs.variant.SequenceVariant(ac='NM_01234.5', type='c', posedit=...)

Unsurprisingly, a PosEdit consists of separate position and Edit objects. A position is generally an Interval, which in turn is composed of SimplePosition or BaseOffsetPosition objects. An edit is an instance of a subclass of Edit, such as NARefAlt (for substitutions, deletions, and insertions) or Dup (for duplications).
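To make the nesting concrete, here is a toy model of the same hierarchy built from plain dataclasses. The Toy* classes are hypothetical stand-ins written only for illustration, not the hgvs implementations; positions are reduced to bare strings and the edit only handles substitutions.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the hgvs classes; they model only the nesting
# and how each level builds its string from the level below.
@dataclass
class ToyInterval:
    start: str  # e.g. "200-6" (a position object in hgvs)
    end: str    # e.g. "*22"

    def __str__(self) -> str:
        # An interval whose ends coincide prints as a single position.
        return self.start if self.start == self.end else f"{self.start}_{self.end}"

@dataclass
class ToyEdit:
    ref: str
    alt: str

    def __str__(self) -> str:
        return f"{self.ref}>{self.alt}"  # substitutions only, for brevity

@dataclass
class ToyPosEdit:
    pos: ToyInterval
    edit: ToyEdit

    def __str__(self) -> str:
        return f"{self.pos}{self.edit}"

@dataclass
class ToyVariant:
    ac: str
    type: str
    posedit: ToyPosEdit

    def __str__(self) -> str:
        return f"{self.ac}:{self.type}.{self.posedit}"

var = ToyVariant("NM_01234.5", "c",
                 ToyPosEdit(ToyInterval("200-6", "*22"), ToyEdit("A", "T")))
print(var)  # NM_01234.5:c.200-6_*22A>T
```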

Importantly, each of the objects we're about to build corresponds to a rule in the parser, so you can serialize (with str()) and deserialize (parse) each component on its own, not just complete variants.
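As a small illustration of what a deserialize/serialize pair for one component looks like, the sketch below parses simple c. position strings such as 200-6 and *22 into (base, offset, datum) and formats them back. The regex and both function names are hypothetical and deliberately simplified; the real hgvs grammar also handles uncertainty, other sequence types, and more. The datum labels echo the hgvs.location constants used in step 1.

```python
import re

# Toy grammar for c. positions: optional "*", a (possibly negative) base,
# and an optional signed intronic offset.
POS_RE = re.compile(r"^(?P<star>\*)?(?P<base>-?\d+)(?P<offset>[+-]\d+)?$")

def parse_c_pos(text):
    """Parse e.g. '200-6' or '*22' into (base, offset, datum)."""
    m = POS_RE.match(text)
    if m is None:
        raise ValueError(f"unparseable position: {text!r}")
    datum = "CDS_END" if m.group("star") else "CDS_START"
    offset = int(m.group("offset")) if m.group("offset") else 0
    return int(m.group("base")), offset, datum

def format_c_pos(base, offset, datum):
    """Inverse of parse_c_pos: rebuild the position string."""
    prefix = "*" if datum == "CDS_END" else ""
    suffix = f"{offset:+d}" if offset else ""  # e.g. "-6" or "+1"
    return f"{prefix}{base}{suffix}"

for s in ["200-6", "*22", "-14", "88+1"]:
    parsed = parse_c_pos(s)
    assert format_c_pos(*parsed) == s  # the round trip is lossless
    print(s, "->", parsed)
```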

1. Make an Interval to define the position of the edit

>>> import hgvs.location
>>> import hgvs.posedit
>>> start = hgvs.location.BaseOffsetPosition(base=200, offset=-6, datum=hgvs.location.CDS_START)
>>> start, str(start)
(BaseOffsetPosition(base=200, offset=-6, datum=1, uncertain=False), '200-6')
>>> end = hgvs.location.BaseOffsetPosition(base=22, datum=hgvs.location.CDS_END)
>>> end, str(end)
(BaseOffsetPosition(base=22, offset=0, datum=2, uncertain=False), '*22')
>>> iv = hgvs.location.Interval(start=start, end=end)
>>> iv, str(iv)
(Interval(start=200-6, end=*22, uncertain=False), '200-6_*22')
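The datum argument says which landmark a base is counted from: in the output above, CDS_START shows up as datum=1 and CDS_END as datum=2, which is why the end position renders as *22. The sketch below maps such a c. position onto a 1-based transcript coordinate using the HGVS counting rules (c.1 is the first CDS base, there is no c.0, and c.*1 is the first base after the stop codon). The function name, its string datum labels, and the cds_start/cds_end values in the example are hypothetical; hgvs itself performs this kind of mapping with transcript data it fetches, not with a helper like this.

```python
CDS_START, CDS_END = "CDS_START", "CDS_END"

def c_to_transcript(base, datum, cds_start, cds_end, offset=0):
    """Toy mapping of a c. position to a 1-based transcript coordinate.

    cds_start/cds_end: transcript coordinates of the first and last CDS bases
    (hypothetical inputs for this sketch).
    """
    if offset:
        # Positions like 200-6 lie in an intron, outside the transcript.
        raise ValueError("intronic offsets have no transcript coordinate")
    if datum == CDS_END:
        return cds_end + base        # c.*22 is 22 nt past the last CDS base
    if base > 0:
        return cds_start + base - 1  # c.1 is the first CDS base
    if base < 0:
        return cds_start + base      # c.-1 is the base just before c.1
    raise ValueError("c.0 does not exist")

# Example with a hypothetical CDS occupying transcript bases 101..400:
print(c_to_transcript(200, CDS_START, 101, 400))  # 300
print(c_to_transcript(22, CDS_END, 101, 400))     # 422
```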

2. Make an edit object

>>> import hgvs.edit, hgvs.posedit
>>> edit = hgvs.edit.NARefAlt(ref='A', alt='T')
>>> edit, str(edit)
(NARefAlt(ref=A, alt=T, uncertain=False), 'A>T')
>>> posedit = hgvs.posedit.PosEdit(pos=iv, edit=edit)
>>> posedit, str(posedit)
(PosEdit(pos=200-6_*22, edit=A>T, uncertain=False), '200-6_*22A>T')
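A single NARefAlt class can cover substitutions, deletions, and insertions because the string form is chosen from which of ref and alt are present. The function below is a hypothetical, simplified rendering of those HGVS naming rules, not the hgvs source; it reproduces the A>T string above and the delAinsCT string seen in step 4.

```python
def format_refalt(ref, alt):
    """Toy HGVS-style rendering of a ref/alt nucleotide edit (simplified)."""
    if ref and alt:
        if ref == alt:
            return f"{ref}="            # no change
        if len(ref) == 1 and len(alt) == 1:
            return f"{ref}>{alt}"       # substitution
        return f"del{ref}ins{alt}"      # deletion-insertion
    if ref:
        return f"del{ref}"              # deletion (alt empty or None)
    if alt:
        return f"ins{alt}"              # insertion (ref empty or None)
    raise ValueError("an edit needs a ref or an alt")

for ref, alt in [("A", "T"), ("A", "CT"), ("A", None), (None, "T")]:
    print(ref, alt, "->", format_refalt(ref, alt))
```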

3. Make the variant

>>> import hgvs.variant
>>> var = hgvs.variant.SequenceVariant(ac='NM_01234.5', type='c', posedit=posedit)
>>> var, str(var)
(SequenceVariant(ac=NM_01234.5, type=c, posedit=200-6_*22A>T),
 'NM_01234.5:c.200-6_*22A>T')
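The top level of the string mirrors the three constructor arguments: accession, then ":", then the sequence type, then ".", then the posedit. The hypothetical helper below recovers those three pieces by plain string splitting, only to show that correspondence; to actually parse variants you would use the hgvs parser (for example hgvs.parser.Parser().parse_hgvs_variant), which also parses the posedit into objects.

```python
def split_variant(text):
    """Toy split of 'ac:type.posedit' into its three top-level parts."""
    ac, sep, rest = text.partition(":")  # accession may contain '.', so split on ':' first
    if not sep:
        raise ValueError("missing ':' after accession")
    type_, sep, posedit = rest.partition(".")
    if not sep:
        raise ValueError("missing '.' after sequence type")
    return ac, type_, posedit

print(split_variant("NM_01234.5:c.200-6_*22A>T"))
# ('NM_01234.5', 'c', '200-6_*22A>T')
```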

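Important: the hgvs package intentionally permits callers to create invalid variants. For example, the interval above runs from an intronic position 6 bases before c.200 to c.*22 in the 3' UTR, so it spans many bases, which is incompatible with a single-nucleotide substitution such as A>T. See hgvs.validator.Validator for validation options.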
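One of the simplest consistency checks is whether the reference sequence is as long as the interval it claims to cover. The sketch below is a hypothetical toy, not hgvs.validator.Validator; it uses plain 1-based integer positions (as in g. coordinates), because the interval in the example above contains an intronic offset and a 3' UTR position whose length cannot be computed without transcript data.

```python
def ref_length_matches(start, end, ref):
    """Toy check: does ref cover exactly bases start..end (1-based, inclusive)?"""
    if end < start:
        raise ValueError("interval end precedes start")
    return len(ref) == end - start + 1

print(ref_length_matches(200, 200, "A"))    # True: one base, one-base ref
print(ref_length_matches(200, 250, "A"))    # False: 51 bases, one-base ref
```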

4. Update your variant

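Stringification happens on the fly: str() rebuilds the HGVS string from the variant's current components each time it is called, so you can update any component and see the effect immediately. The examples below use copy.deepcopy so that each change starts from an independent copy and leaves var untouched.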

>>> import copy
>>> var2 = copy.deepcopy(var)
>>> var2.posedit.pos.start.base = 456
>>> str(var2)
'NM_01234.5:c.456-6_*22A>T'
>>> var2 = copy.deepcopy(var)
>>> var2.posedit.edit.alt = 'CT'
>>> str(var2)
'NM_01234.5:c.200-6_*22delAinsCT'
>>> var2 = copy.deepcopy(var)
>>> var2.posedit.pos.end.uncertain = True
>>> str(var2)
'NM_01234.5:c.200-6_(*22)A>T'
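Why deepcopy rather than copy.copy? A variant is a tree of nested mutable objects, and a shallow copy shares those nested objects with the original. The toy classes below (hypothetical, not hgvs) compute __str__ on every call, like hgvs does, and show that a deep copy is independent while a shallow copy's nested edit leaks back into the original.

```python
import copy
from dataclasses import dataclass

@dataclass
class Pos:
    base: int
    uncertain: bool = False

    def __str__(self):
        s = str(self.base)
        return f"({s})" if self.uncertain else s  # uncertain positions get parentheses

@dataclass
class Var:
    ac: str
    start: Pos
    end: Pos
    edit: str

    def __str__(self):
        # Rebuilt on every call, so it always reflects the current field values.
        return f"{self.ac}:c.{self.start}_{self.end}{self.edit}"

var = Var("NM_01234.5", Pos(100), Pos(105), "del")

var2 = copy.deepcopy(var)     # nested Pos objects are copied too
var2.end.uncertain = True
print(var2)  # NM_01234.5:c.100_(105)del
print(var)   # NM_01234.5:c.100_105del  (unchanged)

shallow = copy.copy(var)      # nested Pos objects are shared with var
shallow.start.base = 90
print(var)   # NM_01234.5:c.90_105del  (changed through the shallow copy)
```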