Mapping

hgvs.variantmapper

class hgvs.variantmapper.EasyVariantMapper(hdp, primary_assembly=u'GRCh37', alt_aln_method=u'splign', replace_reference=True, normalize=True)

Bases: hgvs.variantmapper.VariantMapper

Provides simplified variant mapping for a single assembly and transcript-reference alignment method.

EasyVariantMapper is instantiated with a primary_assembly and alt_aln_method. These enable the following conveniences over VariantMapper:

  • The primary assembly and alignment method are used to automatically select an appropriate chromosomal reference sequence when mapping from a transcript to a genome (i.e., c_to_g(...) and n_to_g(...)).
  • A new method, relevant_transcripts(g_variant), returns a list of transcript accessions available for the specified variant. These accessions are candidates for mapping from genomic to transcript coordinates (i.e., g_to_c(...) and g_to_n(...)).

IMPORTANT: Callers should be prepared to catch HGVSError exceptions. These will be thrown whenever a transcript maps ambiguously to a chromosome, such as for pseudoautosomal region transcripts.

Parameters:
  • primary_assembly (str) – assembly name (‘GRCh37’)
  • alt_aln_method (str) – genome-transcript alignment method (‘splign’, ‘blat’, ‘genewise’)
  • replace_reference (bool) – replace reference (entails additional network access)
  • normalize (bool) – normalize variants
Raises:

HGVSError subclasses – for a variety of mapping and data lookup failures

c_to_g(var_c)
c_to_n(var_c)
c_to_p(var_c)
g_to_c(var_g, tx_ac)
g_to_n(var_g, tx_ac)
n_to_c(var_n)
n_to_g(var_n)
relevant_transcripts(var_g)

Returns a list of transcript accessions (strings) for the given variant, selected by genomic overlap.
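The pattern described above (list the overlapping transcripts, project onto each, and be prepared for mapping errors) might look like the following sketch. It is written against any object that exposes relevant_transcripts(var_g) and g_to_c(var_g, tx_ac), so it runs without a data provider. The helper name map_to_relevant_transcripts, the _StubMapper class, and the accessions in it are hypothetical and exist only for illustration. With the real library, error_type would presumably be hgvs.exceptions.HGVSError (an assumed import path).

```python
def map_to_relevant_transcripts(mapper, var_g, error_type=Exception):
    """Project var_g onto every overlapping transcript.

    mapper     -- object with relevant_transcripts() and g_to_c(),
                  for example an EasyVariantMapper
    error_type -- exception class to catch (assumption: HGVSError
                  when used with the real library)
    Returns (mapped, failed): dicts keyed by transcript accession.
    """
    mapped, failed = {}, {}
    for tx_ac in mapper.relevant_transcripts(var_g):
        try:
            mapped[tx_ac] = mapper.g_to_c(var_g, tx_ac)
        except error_type as exc:
            # Keep going: one unmappable transcript should not hide the rest.
            failed[tx_ac] = str(exc)
    return mapped, failed


class _StubMapper(object):
    """Hypothetical stand-in for a mapper; all data here is made up."""

    def relevant_transcripts(self, var_g):
        return ["NM_000001.1", "NR_000002.1"]

    def g_to_c(self, var_g, tx_ac):
        if tx_ac.startswith("NR_"):
            raise ValueError("no CDS on " + tx_ac)
        return tx_ac + ":c.(projected)"


mapped, failed = map_to_relevant_transcripts(
    _StubMapper(), "NC_000001.10:g.100A>T", error_type=ValueError)
```

Collecting failures per transcript, rather than letting the first exception propagate, is one reasonable design choice; callers that need strict behaviour can simply re-raise.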

class hgvs.variantmapper.VariantMapper(hdp)

Bases: object

Maps SequenceVariant objects between g., n., r., c., and p. representations.

g⟷{c,n,r} projections are similar in that c, n, and r variants may use intronic coordinates. There are two essential differences that distinguish the three types:

  • Sequence start: In n and r variants, position 1 is the sequence start; in c variants, position 1 is the translation start site (the first base of the start codon).
  • Alphabet: In n and c variants, sequences are DNA; in r variants, sequences are RNA.

These differences are summarized in this diagram:

g ----acgtatgcac--gtctagacgt----      ----acgtatgcac--gtctagacgt----      ----acgtatgcac--gtctagacgt----
      \         \/         /              \         \/         /              \         \/         /
c      acgtATGCACGTCTAGacgt         n      acgtatgcacgtctagacgt         r      acguaugcacgucuagacgu   
           1                               1                                   1
p          MetHisValTer

The g excerpt and exon structures are identical in all three panels. The g⟷n transformation, which is the most basic, accounts for the offset of the aligned sequences (shown with “1”) and the exon structure. The g⟷c transformation is akin to the g⟷n transformation, but requires an additional offset to account for the translation start site (c.1); the CDS is shown in uppercase. The g⟷r transformation is akin to the g⟷n transformation with a change of alphabet.

Therefore, this code uses g⟷n as the core transformation between genomic and c, n, and r variants: all c⟷g and r⟷g transformations use n⟷g after accounting for the above differences. For example, c_to_g accounts for the translation start site offset, then calls n_to_g.
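To make the layering concrete, here is a toy model of the diagram above, not the library's implementation. It assumes a plus-strand transcript with two 10-nt exons aligned to g.5-14 and g.17-26 and a CDS starting at n.5, which is how the diagram reads when the g excerpt is numbered from 1. All names are illustrative. Intronic offsets, 3' UTR (c.*) positions, and minus-strand transcripts are deliberately left out.

```python
# Exons as (n_start, n_end, g_start), 1-based inclusive, plus strand.
EXONS = [(1, 10, 5), (11, 20, 17)]
CDS_START_N = 5  # n. position of c.1, the A of ATG


def c_to_n(c_pos):
    """Shift a c. position to n. coordinates; HGVS has no c.0."""
    if c_pos == 0:
        raise ValueError("c.0 does not exist")
    return c_pos + CDS_START_N - (1 if c_pos > 0 else 0)


def n_to_c(n_pos):
    """Inverse of c_to_n, skipping over the nonexistent c.0."""
    c_pos = n_pos - CDS_START_N + 1
    return c_pos if c_pos > 0 else c_pos - 1


def n_to_g(n_pos):
    """Core transformation: walk the exon structure."""
    for n_start, n_end, g_start in EXONS:
        if n_start <= n_pos <= n_end:
            return g_start + (n_pos - n_start)
    raise ValueError("n.%d is outside the transcript" % n_pos)


def g_to_n(g_pos):
    for n_start, n_end, g_start in EXONS:
        g_end = g_start + (n_end - n_start)
        if g_start <= g_pos <= g_end:
            return n_start + (g_pos - g_start)
    raise ValueError("g.%d is intronic or outside the transcript" % g_pos)


def c_to_g(c_pos):
    """c -> g is the CDS offset followed by the core n -> g step."""
    return n_to_g(c_to_n(c_pos))


def g_to_c(g_pos):
    return n_to_c(g_to_n(g_pos))


start_codon_g = c_to_g(1)    # 9: the A of ATG in the g excerpt
across_intron = c_to_g(7)    # 17: first base of the second exon
```

Note that c.6 and c.7 are adjacent in the CDS but land on g.14 and g.17, because the two intronic bases are skipped by the exon lookup.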

All methods require and return objects of type hgvs.variant.SequenceVariant.

c_to_g(var_c, alt_ac, alt_aln_method=u'splign')

Given a parsed c. variant, return a g. variant on the specified reference sequence (alt_ac) using the specified alignment method (default is ‘splign’ from NCBI).

Parameters:
  • var_c (hgvs.variant.SequenceVariant) – a variant object
  • alt_ac (str) – a reference sequence accession (e.g., NC_000001.11)
  • alt_aln_method (str) – the alignment method; valid values depend on data source
Returns:

variant object (hgvs.variant.SequenceVariant)

Raises:

HGVSInvalidVariantError – if var_c is not of type ‘c’

c_to_n(var_c)

Given a parsed c. variant, return an n. variant on the same transcript. The conversion uses the ‘transcript’ alignment method, indicating a self alignment.

Parameters: var_c (hgvs.variant.SequenceVariant) – a variant object
Returns: variant object (hgvs.variant.SequenceVariant)
Raises: HGVSInvalidVariantError – if var_c is not of type ‘c’

c_to_p(var_c, pro_ac=None)

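Converts a c. SequenceVariant to a p. SequenceVariant on the specified protein accession.

Author: Rudy Rico

Parameters:
  • var_c (hgvs.variant.SequenceVariant) – a variant object
  • pro_ac (str) – a protein accession (default None)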
Return type:

hgvs.variant.SequenceVariant

g_to_c(var_g, tx_ac, alt_aln_method=u'splign')

Given a parsed g. variant, return a c. variant on the specified transcript using the specified alignment method (default is ‘splign’ from NCBI).

Parameters:
  • var_g (hgvs.variant.SequenceVariant) – a variant object
  • tx_ac (str) – a transcript accession (e.g., NM_012345.6 or ENST012345678)
  • alt_aln_method (str) – the alignment method; valid values depend on data source
Returns:

variant object (hgvs.variant.SequenceVariant) using CDS coordinates

Raises:

HGVSInvalidVariantError – if var_g is not of type ‘g’

g_to_n(var_g, tx_ac, alt_aln_method=u'splign')

Given a parsed g. variant, return a n. variant on the specified transcript using the specified alignment method (default is ‘splign’ from NCBI).

Parameters:
  • var_g (hgvs.variant.SequenceVariant) – a variant object
  • tx_ac (str) – a transcript accession (e.g., NM_012345.6 or ENST012345678)
  • alt_aln_method (str) – the alignment method; valid values depend on data source
Returns:

variant object (hgvs.variant.SequenceVariant) using transcript (n.) coordinates

Raises:

HGVSInvalidVariantError – if var_g is not of type ‘g’

n_to_c(var_n)

Given a parsed n. variant, return a c. variant on the same transcript. The conversion uses the ‘transcript’ alignment method, indicating a self alignment.

Parameters: var_n (hgvs.variant.SequenceVariant) – a variant object
Returns: variant object (hgvs.variant.SequenceVariant)
Raises: HGVSInvalidVariantError – if var_n is not of type ‘n’

n_to_g(var_n, alt_ac, alt_aln_method=u'splign')

Given a parsed n. variant, return a g. variant on the specified reference sequence (alt_ac) using the specified alignment method (default is ‘splign’ from NCBI).

Parameters:
  • var_n (hgvs.variant.SequenceVariant) – a variant object
  • alt_ac (str) – a reference sequence accession (e.g., NC_000001.11)
  • alt_aln_method (str) – the alignment method; valid values depend on data source
Returns:

variant object (hgvs.variant.SequenceVariant)

Raises:

HGVSInvalidVariantError – if var_n is not of type ‘n’

hgvs.intervalmapper

class hgvs.intervalmapper.CIGARElement(len, op)

Bases: object

represents elements of a CIGAR string and provides methods for determining the number of ref and tgt bases consumed by the operation

len
op
ref_len

returns number of nt/aa consumed in reference sequence for this edit

tgt_len

returns number of nt/aa consumed in target sequence for this edit
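As an illustration of what "consumed" means here, the sketch below parses a CIGAR string and totals the bases each operation advances in the reference and in the target. It assumes that '=' and 'X' advance both sequences, that 'I' advances only the target, and that 'D' advances only the reference. The library's own assignment of operators may differ. CigarOp and parse_cigar are illustrative names, not part of hgvs.

```python
import re
from collections import namedtuple

REF_OPS = "=XD"  # assumed: operations that advance the reference
TGT_OPS = "=XI"  # assumed: operations that advance the target


class CigarOp(namedtuple("CigarOp", ["len", "op"])):
    """One CIGAR element, e.g. CigarOp(3, '=')."""
    __slots__ = ()

    @property
    def ref_len(self):
        return self.len if self.op in REF_OPS else 0

    @property
    def tgt_len(self):
        return self.len if self.op in TGT_OPS else 0


def parse_cigar(cigar):
    """Split a CIGAR string into CigarOp elements, rejecting unknown ops."""
    if not re.fullmatch(r"(?:\d+[=XID])+", cigar):
        raise ValueError("unsupported CIGAR string: %r" % cigar)
    return [CigarOp(int(n), op)
            for n, op in re.findall(r"(\d+)([=XID])", cigar)]


ops = parse_cigar("3=2I4=1D2=")
ref_total = sum(e.ref_len for e in ops)  # 3 + 0 + 4 + 1 + 2 = 10
tgt_total = sum(e.tgt_len for e in ops)  # 3 + 2 + 4 + 0 + 2 = 11
```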

class hgvs.intervalmapper.Interval(start_i, end_i)

Bases: object

Represents a segment of a sequence in interbase coordinates (0-based, right-open).

end_i
len
start_i
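Interbase coordinates count the boundaries between bases rather than the bases themselves, which is why a zero-width interval (an insertion point) is representable. The following sketch shows the conversion to and from 1-based inclusive coordinates. Interbase, from_one_based, to_one_based, and length are illustrative names, not library functions.

```python
from collections import namedtuple

# Stand-in for an interbase interval (0-based, right-open).
Interbase = namedtuple("Interbase", ["start_i", "end_i"])


def from_one_based(start, end):
    """1-based inclusive [start, end] -> interbase (start - 1, end)."""
    return Interbase(start - 1, end)


def to_one_based(iv):
    """Interbase -> 1-based inclusive; zero-width intervals have none."""
    if iv.end_i == iv.start_i:
        raise ValueError("zero-width interval has no 1-based inclusive form")
    return (iv.start_i + 1, iv.end_i)


def length(iv):
    """Length is a plain difference in interbase coordinates."""
    return iv.end_i - iv.start_i


exon = from_one_based(5, 14)         # Interbase(start_i=4, end_i=14)
insertion_point = Interbase(14, 14)  # between bases 14 and 15; length 0
```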
class hgvs.intervalmapper.IntervalMapper(interval_pairs)

Bases: object

Provides mapping between sequence coordinates according to an ordered set of IntervalPairs.

Parameters: interval_pairs (list of IntervalPair instances) – an ordered list of IntervalPair instances
Returns: an IntervalMapper instance

static from_cigar(cigar)

Parameters: cigar (str) – a Compact Idiosyncratic Gapped Alignment Report string
Returns: an IntervalMapper instance from the CIGAR string

interval_pairs
map_ref_to_tgt(start_i, end_i, max_extent=False)
map_tgt_to_ref(start_i, end_i, max_extent=False)
ref_intervals
ref_len
tgt_intervals
tgt_len
class hgvs.intervalmapper.IntervalPair(ref, tgt)

Bases: object

Represents a match, insertion, or deletion segment of an alignment. If a match, the lengths must be equal; if an insertion or deletion, the length of the ref or tgt must be zero respectively.

ref
tgt
hgvs.intervalmapper.cigar_to_intervalpairs(cigar)

For a given CIGAR string, return a list of (Interval, Interval) pairs. The length of the returned list will be equal to the number of CIGAR operations.
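A minimal sketch of this conversion, using plain (start, end) tuples in interbase coordinates instead of Interval objects, is shown below. It assumes that an insertion has a zero-length ref interval and a deletion a zero-length tgt interval, matching the IntervalPair description. The function names are illustrative, and the lookup handles single bases inside match segments only, which is far simpler than a real map_ref_to_tgt.

```python
import re


def cigar_to_pairs(cigar):
    """Return one ((ref_start, ref_end), (tgt_start, tgt_end)) per CIGAR op."""
    pairs, ref_pos, tgt_pos = [], 0, 0
    for count, op in re.findall(r"(\d+)([=XID])", cigar):
        n = int(count)
        ref_step = n if op in "=XD" else 0  # assumed: 'I' leaves ref in place
        tgt_step = n if op in "=XI" else 0  # assumed: 'D' leaves tgt in place
        pairs.append(((ref_pos, ref_pos + ref_step),
                      (tgt_pos, tgt_pos + tgt_step)))
        ref_pos += ref_step
        tgt_pos += tgt_step
    return pairs


def map_ref_base_to_tgt(pairs, ref_i):
    """Map a 0-based ref base index to tgt; only match segments qualify."""
    for (r_start, r_end), (t_start, t_end) in pairs:
        ref_len, tgt_len = r_end - r_start, t_end - t_start
        is_match = ref_len == tgt_len and ref_len > 0
        if is_match and r_start <= ref_i < r_end:
            return t_start + (ref_i - r_start)
    raise ValueError("ref base %d has no aligned target base" % ref_i)


pairs = cigar_to_pairs("3=2I4=1D2=")
# [((0, 3), (0, 3)), ((3, 3), (3, 5)), ((3, 7), (5, 9)),
#  ((7, 8), (9, 9)), ((8, 10), (9, 11))]
shifted = map_ref_base_to_tgt(pairs, 5)  # 7: moved right by the 2-base insertion
```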

hgvs.projector

class hgvs.projector.Projector(hdp, alt_ac, src_ac, dst_ac, src_alt_aln_method=u'splign', dst_alt_aln_method=u'splign')

Bases: object

The Projector class implements liftover between two transcripts via a common reference sequence.

Parameters:
  • hdp – HGVS Data Provider Interface-compliant instance (see hgvs.dataproviders.interface.Interface)
  • alt_ac – string representing the accession of the common reference sequence (e.g., NC_000003.11)
  • src_ac – string representing the source transcript accession (e.g., NM_000551.2)
  • dst_ac – string representing the destination transcript accession (e.g., NM_000551.3)
  • src_alt_aln_method – string representing the source transcript alignment method
  • dst_alt_aln_method – string representing the destination transcript alignment method

This class assumes (and verifies) that the transcripts are on the same strand. This assumption obviates some work in flipping sequence variants twice unnecessarily.

project_interval_backward(c_interval)

project c_interval on the destination transcript to the source transcript

Parameters: c_interval – an hgvs.interval.Interval object on the destination transcript
Returns: c_interval: an hgvs.interval.Interval object on the source transcript

project_interval_forward(c_interval)

project c_interval on the source transcript to the destination transcript

Parameters: c_interval – an hgvs.interval.Interval object on the source transcript
Returns: c_interval: an hgvs.interval.Interval object on the destination transcript

project_variant_backward(c_variant)

project c_variant on the destination transcript onto the source transcript

Parameters: c_variant – an hgvs.variant.SequenceVariant object on the destination transcript
Returns: c_variant: an hgvs.variant.SequenceVariant object on the source transcript

project_variant_forward(c_variant)

project c_variant on the source transcript onto the destination transcript

Parameters: c_variant – an hgvs.variant.SequenceVariant object on the source transcript
Returns: c_variant: an hgvs.variant.SequenceVariant object on the destination transcript
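Projection through a common reference can be pictured as two coordinate mappings composed back to back: source c. to g., then g. to destination c. The sketch below uses a hypothetical single-exon, plus-strand transcript model (OffsetTranscript) and made-up genomic positions. It shows the composition only, not how Projector is implemented, and it handles positive c. positions only.

```python
class OffsetTranscript(object):
    """Hypothetical transcript whose c.1 sits at genomic position g_of_c1."""

    def __init__(self, g_of_c1):
        self.g_of_c1 = g_of_c1

    def c_to_g(self, c_interval):
        start, end = c_interval
        return (start + self.g_of_c1 - 1, end + self.g_of_c1 - 1)

    def g_to_c(self, g_interval):
        start, end = g_interval
        c_start, c_end = start - self.g_of_c1 + 1, end - self.g_of_c1 + 1
        if c_start < 1:
            raise ValueError("interval lies upstream of c.1 (not modelled)")
        return (c_start, c_end)


class ToyProjector(object):
    """Compose src c -> g -> dst c, and the reverse."""

    def __init__(self, src, dst):
        self.src, self.dst = src, dst

    def project_interval_forward(self, c_interval):
        return self.dst.g_to_c(self.src.c_to_g(c_interval))

    def project_interval_backward(self, c_interval):
        return self.src.g_to_c(self.dst.c_to_g(c_interval))


# Made-up example: the destination CDS starts 30 bases further upstream.
projector = ToyProjector(OffsetTranscript(1000), OffsetTranscript(970))
forward = projector.project_interval_forward((10, 12))    # (40, 42)
backward = projector.project_interval_backward((40, 42))  # (10, 12)
```

Because both transcripts are assumed to lie on the same strand, no interval needs to be flipped in either direction.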

hgvs.transcriptmapper

class hgvs.transcriptmapper.TranscriptMapper(hdp, tx_ac, alt_ac, alt_aln_method)

Bases: object

Provides coordinate (not variant) mapping operations between genomic (g), rna (r), cds (c), and protein (p) coordinates. All coordinates are 1-based inclusive, per the HGVS recommendations. All methods take hgvs.location.Interval objects.

Parameters:
  • hdp – HGVS Data Provider Interface-compliant instance (see hgvs.dataproviders.interface.Interface)
  • tx_ac (str) – string representing transcript accession (e.g., NM_000551.2)
  • alt_ac (str) – string representing the reference sequence accession (e.g., NC_000003.11)
  • alt_aln_method (str) – string representing the alignment method; valid values depend on data source

c_to_g(c_interval)

convert a transcript CDS (c.) interval to a genomic (g.) interval

c_to_n(c_interval)

convert a transcript CDS (c.) interval to a transcript cDNA (n.) interval

g_to_c(g_interval)

convert a genomic (g.) interval to a transcript CDS (c.) interval

g_to_n(g_interval)

convert a genomic (g.) interval to a transcript cDNA (n.) interval

n_to_c(n_interval)

convert a transcript cDNA (n.) interval to a transcript CDS (c.) interval

n_to_g(n_interval)

convert a transcript cDNA (n.) interval to a genomic (g.) interval