Variant Object Representation

hgvs.edit

Representation of edit operations in HGVS variants

NARefAlt and AARefAlt are abstractions of several major variant types, which are distinguished by the ref and alt elements of the structure. The HGVS grammars for NA and AA are subtly different (e.g., the ref AA in a protein substitution is part of the location).

class hgvs.edit.AAExt(ref=None, alt=None, aaterm=None, length=None, uncertain=False)[source]

Bases: hgvs.edit.Edit

aaterm
alt
format(conf=None)[source]
length
ref
type

return the type of this Edit

Returns: edit type (str)
uncertain
class hgvs.edit.AAFs(ref=None, alt=None, length=None, uncertain=False)[source]

Bases: hgvs.edit.Edit

alt
format(conf=None)[source]
length
ref
type

return the type of this Edit

Returns: edit type (str)
uncertain
class hgvs.edit.AARefAlt(ref=None, alt=None, uncertain=False, init_met=False)[source]

Bases: hgvs.edit.Edit

alt
format(conf=None)[source]
init_met
ref
type

return the type of this Edit

Returns: edit type (str)
uncertain
class hgvs.edit.AASub(ref=None, alt=None, uncertain=False, init_met=False)[source]

Bases: hgvs.edit.AARefAlt

format(conf=None)[source]
type

return the type of this Edit

Returns: edit type (str)
class hgvs.edit.Conv(from_ac=None, from_type=None, from_pos=None, uncertain=False)[source]

Bases: hgvs.edit.Edit

Conversion

from_ac
from_pos
from_type
type

return the type of this Edit

Returns: edit type (str)
uncertain
class hgvs.edit.Dup(ref=None, uncertain=False)[source]

Bases: hgvs.edit.Edit

format(conf=None)[source]
ref
ref_s

returns a string representing the ref sequence, if it is not None and smells like a sequence

type

return the type of this Edit

Returns: edit type (str)
uncertain
class hgvs.edit.Edit[source]

Bases: object

format(conf=None)[source]
class hgvs.edit.Inv(ref=None, uncertain=False)[source]

Bases: hgvs.edit.Edit

Inversion

ref
ref_n

returns an integer from the ref instance variable if it’s a number, or None otherwise

ref_s
type

return the type of this Edit

Returns: edit type (str)
uncertain
class hgvs.edit.NACopy(copy=None, uncertain=False)[source]

Bases: hgvs.edit.Edit

Represent copy number variants (Invitae-specific use)

This class is intended for Invitae use only and does not represent a standard HGVS concept. The class may be changed, moved, or removed without notice.

copy
type

return the type of this Edit

Returns: edit type (str)
uncertain
class hgvs.edit.NARefAlt(ref=None, alt=None, uncertain=False)[source]

Bases: hgvs.edit.Edit

represents substitutions, deletions, insertions, and indels.

Variables:
  • ref – reference sequence or length
  • alt – alternate sequence
  • uncertain – boolean indicating whether the variant is uncertain/undetermined
alt
format(conf=None)[source]
ref
ref_n

returns an integer, either from the ref instance variable if it’s a number, or the length of ref if it’s a string, or None otherwise

>>> NARefAlt("ACGT").ref_n
4
>>> NARefAlt("7").ref_n
7
>>> NARefAlt(7).ref_n
7
ref_s

returns a string representing the ref sequence, if it is not None and smells like a sequence

>>> NARefAlt("ACGT").ref_s
u'ACGT'
>>> NARefAlt("7").ref_s
>>> NARefAlt(7).ref_s
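
The doctests for ref_n and ref_s can be mirrored by a small standalone sketch. This is not the hgvs implementation: the function names are hypothetical, and the rule used for "smells like a sequence" (a non-empty string drawn only from the letters ACGTUN) is an assumption made for illustration.

```python
def ref_n_sketch(ref):
    """Return ref as an integer count, or None if it cannot be read as one."""
    try:
        return int(ref)              # 7 or "7" -> 7
    except (TypeError, ValueError):
        pass                         # None, or a non-numeric string
    if isinstance(ref, str):
        return len(ref)              # "ACGT" -> 4
    return None


def ref_s_sketch(ref):
    """Return ref if it looks like a sequence string, else None."""
    # Assumption: a "sequence" is a non-empty string over the letters ACGTUN.
    if isinstance(ref, str) and ref and all(c in "ACGTUN" for c in ref.upper()):
        return ref
    return None
```

Under these assumptions, a numeric ref yields a length but no sequence string, which matches the empty output shown for NARefAlt("7").ref_s and NARefAlt(7).ref_s.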
type

return the type of this Edit

Returns: edit type (str)
uncertain
class hgvs.edit.Repeat(ref=None, min=None, max=None, uncertain=False)[source]

Bases: hgvs.edit.Edit

format(conf=None)[source]
max
min
ref
type

return the type of this Edit

Returns: edit type (str)
uncertain

hgvs.hgvsposition

Represent partial HGVS tags that refer to a position without alleles

class hgvs.hgvsposition.HGVSPosition(ac, type, pos, gene=None)[source]

Bases: object

HGVSPosition – Represent partial HGVS tags that refer to a position without alleles

Parameters:
  • ac (str) – sequence accession
  • type (str) – type of sequence and coordinate
  • pos (str) – sequence position
  • gene (str) – gene symbol (may be None)
ac
gene
pos
type

hgvs.location

Provides classes for dealing with the locations of HGVS variants

This module provides classes for representing the location of variants in HGVS nomenclature, including:

  • integers and integer intervals (e.g., NC_012345.6:g.3403243_3403248A>C)
  • CDS positions and intervals (e.g., NM_01234.5:c.56+12_56+14delAC)
  • CDS stop coordinates (e.g., NM_01234.5:c.*13A>C)

Classes:

class hgvs.location.AAPosition(base=None, aa=None, uncertain=False)[source]

Bases: object

aa
base
format(conf=None)[source]
is_uncertain

return True if the position is marked uncertain or undefined

pos

return base, for backward compatibility

uncertain
validate()[source]
class hgvs.location.BaseOffsetInterval(start=None, end=None, uncertain=False)[source]

Bases: hgvs.location.Interval

BaseOffsetInterval is an Interval of BaseOffsetPositions. The only additional functionality over Interval is to ensure that the datums of start and end are compatible.

check_datum()[source]
class hgvs.location.BaseOffsetPosition(base=None, offset=0, datum=<Datum.SEQ_START: 1>, uncertain=False)[source]

Bases: object

Class for dealing with CDS coordinates in transcript variants.

This class models CDS positions using a base coordinate, which is measured relative to a specified datum (CDS_START or CDS_END), and an offset, which is 0 for exonic positions and non-zero for intronic positions. Positions and offsets are 1-based, with no 0, per the HGVS recommendations. (If you're using this with UTA, be aware that UTA uses interbase coordinates.)

hgvs     datum      base  offset  meaning
r.55     SEQ_START    55       0  RNA position 55
c.55     CDS_START    55       0  CDS position 55
c.55+1   CDS_START    55       1  intronic variant +1 from boundary
c.-55    CDS_START   -55       0  5’ UTR variant, 55 nt upstream of ATG
c.1      CDS_START     1       0  start codon
c.1234   CDS_START  1234       0  stop codon (assuming CDS length is 1233)
c.*1     CDS_END       1       0  STOP + 1
c.*55    CDS_END      55       0  3’ UTR variant, 55 nt after STOP
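
The base/offset model can be illustrated with a minimal stand-in class. This is a sketch, not the hgvs class: PositionSketch and DatumSketch are hypothetical names, only the SEQ_START and CDS_START rows of the table are covered, and rendering a position whose offset is unknown is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DatumSketch(Enum):
    """Hypothetical stand-in for the datum; CDS_END is omitted in this sketch."""
    SEQ_START = 1
    CDS_START = 2


@dataclass(frozen=True)
class PositionSketch:
    base: int
    offset: Optional[int] = 0            # None means the offset is unknown
    datum: DatumSketch = DatumSketch.SEQ_START

    @property
    def is_intronic(self) -> bool:
        # Intronic when the offset is unknown (None) or non-zero.
        return self.offset is None or self.offset != 0

    def format(self) -> str:
        """Render the position part of an HGVS string, e.g. '55+1'."""
        if self.offset is None:
            # Out of scope here: this sketch does not render unknown offsets.
            raise ValueError("unknown offset is not handled by this sketch")
        suffix = "" if self.offset == 0 else f"{self.offset:+d}"
        return f"{self.base}{suffix}"
```

For example, PositionSketch(55, 1, DatumSketch.CDS_START).format() gives the position part of c.55+1, and the same object reports is_intronic as True.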
base
datum
format(conf)[source]
is_intronic

returns True if the variant is intronic (if the offset is None or non-zero)

is_uncertain

return True if the position is marked uncertain or undefined

offset
uncertain
validate()[source]
class hgvs.location.Interval(start=None, end=None, uncertain=False)[source]

Bases: object

end
format(conf=None)[source]
is_uncertain

return True if the position is marked uncertain or undefined

start
uncertain
validate()[source]
class hgvs.location.SimplePosition(base=None, uncertain=False)[source]

Bases: object

base
format(conf)[source]
is_uncertain

return True if the position is marked uncertain or undefined

uncertain
validate()[source]

hgvs.posedit

implements a (position,edit) tuple that represents a localized sequence change

class hgvs.posedit.PosEdit(pos=None, edit=None, uncertain=False)[source]

Bases: object

represents a simple variant, consisting of a single position and edit pair

edit
format(conf=None)[source]

Format the PosEdit as a string.

length_change(on_error_raise=True)[source]

Returns the net length change for this posedit.

The method for computing the net length change depends on the type of variant (dup, del, ins, etc). The length_change method hides this complexity from callers.

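Parameters:
  • on_error_raise (bool) – whether to raise an exception (True, the default) or return None (False) when the net length change cannot be determined
Returns: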

A signed int for the net change in length. Negative values imply net deletions, 0 implies a balanced insertion and deletion (e.g., SNV), and positive values imply a net insertion.

Raises:

HGVSUnsupportedOperationError – When determining the length for this variant type is ill-defined or unsupported.

There are many circumstances in which the net length change cannot be determined, is ill-defined, or is unsupported. In these cases, the result depends on the value of on_error_raise: when on_error_raise is True, an exception is raised; when False, the exception is caught and None is returned. Callers might wish to pass on_error_raise=False in list comprehensions to avoid dealing with exceptions.
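
The on_error_raise behaviour described above can be sketched independently of the library. Everything below is hypothetical: the function takes a plain edit-type string and lengths instead of a PosEdit, the exception class is a stand-in for HGVSUnsupportedOperationError, and only a handful of edit types are handled.

```python
class UnsupportedOperationSketch(Exception):
    """Hypothetical stand-in for HGVSUnsupportedOperationError."""


def _net_change(edit_type, ref_len, alt_len):
    # ref_len: number of reference bases affected; alt_len: number of new bases.
    if edit_type == "sub":
        return 0
    if edit_type == "del" and ref_len is not None:
        return -ref_len
    if edit_type == "ins" and alt_len is not None:
        return alt_len
    if edit_type == "delins" and ref_len is not None and alt_len is not None:
        return alt_len - ref_len
    if edit_type == "dup" and ref_len is not None:
        return ref_len
    raise UnsupportedOperationSketch(f"cannot compute length change for {edit_type!r}")


def length_change_sketch(edit_type, ref_len=None, alt_len=None, on_error_raise=True):
    """Return the signed net length change, or None if undetermined and not raising."""
    try:
        return _net_change(edit_type, ref_len, alt_len)
    except UnsupportedOperationSketch:
        if on_error_raise:
            raise
        return None


# Passing on_error_raise=False keeps a list comprehension free of try/except.
edits = [("del", 3, None), ("delins", 3, 1), ("conv", None, None)]
changes = [length_change_sketch(t, r, a, on_error_raise=False) for t, r, a in edits]
```

Here changes holds one entry per edit, with None marking the edit whose length change this sketch cannot determine.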

pos
uncertain
validate()[source]

hgvs.sequencevariant

represents simple sequence-based variants

class hgvs.sequencevariant.SequenceVariant(ac, type, posedit, gene=None)[source]

Bases: object

represents a basic HGVS variant. The only requirement is that each component can be stringified; for example, pos may be passed as either a string or an hgvs.location.CDSInterval.
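
The "each component can be stringified" requirement can be shown with a small stand-in. This is a sketch, not the hgvs class: VariantSketch and PosEditSketch are hypothetical names, the gene field is left out, and the ac:type.posedit layout is inferred from the example strings elsewhere in this document (e.g., NM_01234.5:c.*13A>C).

```python
from dataclasses import dataclass


@dataclass
class PosEditSketch:
    """Hypothetical component that is not a string but can be stringified."""
    pos: str
    edit: str

    def __str__(self) -> str:
        return f"{self.pos}{self.edit}"


@dataclass
class VariantSketch:
    ac: object
    type: object
    posedit: object

    def format(self) -> str:
        # Each component is only required to support str().
        return f"{self.ac}:{self.type}.{self.posedit}"


# A plain string and a structured object produce the same result.
from_string = VariantSketch("NM_01234.5", "c", "*13A>C").format()
from_object = VariantSketch("NM_01234.5", "c", PosEditSketch("*13", "A>C")).format()
```

Both calls render the same text, which is the point of requiring only stringification rather than a specific type.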

ac
fill_ref(hdp)[source]
format(conf=None)[source]

Format the sequence variant as a string.

Parameters: conf – a dict of formatting options; None means use the global settings.

See hgvs.config.

gene
posedit
type
validate()[source]