Parsing and Formatting
hgvs.parser
Provides a parser for HGVS strings and for HGVS-related conceptual components, such as intronic-offset coordinates.
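An intronic-offset coordinate pairs an exonic base position with a signed offset into the adjacent intron. For example, 22+1 is the first intronic base after c.22, and 57-3 is the third base before c.57. The sketch below splits such a coordinate with a regular expression. OffsetPosition and parse_offset_position are hypothetical names and are not part of hgvs. The hgvs grammar also covers many cases this sketch ignores, such as "*"-prefixed positions and uncertain positions.

```python
import re
from typing import NamedTuple

# Hypothetical helper, not part of hgvs: it accepts only plain integer
# positions with an optional signed offset (e.g. "22", "22+1", "57-3").
_OFFSET_POS_RE = re.compile(r"^(?P<base>-?\d+)(?P<offset>[+-]\d+)?$")


class OffsetPosition(NamedTuple):
    """A c. coordinate split into an exonic base position and an intronic offset."""
    base: int
    offset: int  # 0 means there is no intronic offset


def parse_offset_position(text: str) -> OffsetPosition:
    """Split a coordinate such as '22+1' or '57-3' into (base, offset)."""
    m = _OFFSET_POS_RE.match(text)
    if m is None:
        raise ValueError(f"not an offset position: {text!r}")
    offset = int(m.group("offset")) if m.group("offset") else 0
    return OffsetPosition(base=int(m.group("base")), offset=offset)


print(parse_offset_position("22+1"))  # OffsetPosition(base=22, offset=1)
print(parse_offset_position("57-3"))  # OffsetPosition(base=57, offset=-3)
```

Keeping the base and offset as separate integers makes the offset explicit. An offset of 0 marks a position that lies in the exon itself.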
class hgvs.parser.Parser(grammar_fn='/home/docs/.cache/Python-Eggs/hgvs-1.4.0-py3.7.egg-tmp/hgvs/_data/hgvs.pymeta', expose_all_rules=False)
Bases: object
Provides comprehensive parsing of HGVS variant strings (i.e., variants represented according to the Human Genome Variation Society recommendations) into Python representations. The class wraps a Parsing Expression Grammar and exposes the rules of that grammar as methods (prefixed with parse_) that parse an input string according to the rule. Because rules are exposed this way, it is possible to parse both full variant representations and their components, like so:
>>> from hgvs.parser import Parser
>>> hp = Parser()
>>> v = hp.parse_hgvs_variant("NM_01234.5:c.22+1A>T")
>>> v
SequenceVariant(ac=NM_01234.5, type=c, posedit=22+1A>T, gene=None)
>>> v.posedit.pos
BaseOffsetInterval(start=22+1, end=22+1, uncertain=False)
>>> i = hp.parse_c_interval("22+1")
>>> i
BaseOffsetInterval(start=22+1, end=22+1, uncertain=False)
The parse_hgvs_variant and parse_c_interval methods correspond to the hgvs_variant and c_interval rules in the grammar, respectively.
As a convenience, the Parser provides the parse method as a shorthand for parse_hgvs_variant:
>>> v = hp.parse("NM_01234.5:c.22+1A>T")
>>> v
SequenceVariant(ac=NM_01234.5, type=c, posedit=22+1A>T, gene=None)
Because the methods are generated on-the-fly and depend on the grammar that is loaded at runtime, a full list of methods is not available in the documentation. However, the list of rules/methods is available via the rules instance variable.
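The mechanism described above can be sketched in plain Python. It generates one parse_ method per grammar rule at runtime and records the rule names in a rules attribute. RuleParser and the dictionary-based toy grammar are hypothetical. hgvs instead builds its rules from the grammar file named by grammar_fn. The parse method here only mimics the shorthand role of Parser.parse.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a compiled grammar: rule name -> parsing function.
RuleFn = Callable[[str], object]


class RuleParser:
    """Toy parser that exposes each grammar rule as a parse_<rule> method."""

    def __init__(self, grammar: Dict[str, RuleFn]) -> None:
        self._grammar = grammar
        self.rules: List[str] = sorted(grammar)  # names of the exposed rules
        for rule_name in self.rules:
            # The default argument freezes rule_name for this method; a plain
            # closure would see only the loop variable's final value.
            def parse_rule(text: str, _rule: str = rule_name) -> object:
                return self._grammar[_rule](text)

            setattr(self, "parse_" + rule_name, parse_rule)

    def parse(self, text: str) -> object:
        """Shorthand for the top-level rule, in the spirit of Parser.parse."""
        return getattr(self, "parse_hgvs_variant")(text)


toy = RuleParser({
    "hgvs_variant": lambda s: tuple(s.split(":", 1)),
    "c_interval": lambda s: s.strip(),
})
print(toy.rules)                          # ['c_interval', 'hgvs_variant']
print(toy.parse_c_interval(" 22+1 "))     # 22+1
print(toy.parse("NM_01234.5:c.22+1A>T"))  # ('NM_01234.5', 'c.22+1A>T')
```

Because the methods exist only after __init__ runs, static documentation tools cannot list them. This is why the rule names are kept in an inspectable attribute.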
A few notable methods are listed below:
parse_hgvs_variant() parses any valid HGVS string supported by the grammar.
>>> hp.parse_hgvs_variant("NM_01234.5:c.22+1A>T")
SequenceVariant(ac=NM_01234.5, type=c, posedit=22+1A>T, gene=None)
>>> hp.parse_hgvs_variant("NP_012345.6:p.Ala22Trp")
SequenceVariant(ac=NP_012345.6, type=p, posedit=Ala22Trp, gene=None)
The hgvs_variant rule tries the rule for each major class of HGVS variant in turn. For a slight gain in efficiency, those class-specific rules may be invoked directly:
>>> hp.parse_p_variant("NP_012345.6:p.Ala22Trp")
SequenceVariant(ac=NP_012345.6, type=p, posedit=Ala22Trp, gene=None)
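Trying each variant class in turn and keeping the first success corresponds to ordered choice in a Parsing Expression Grammar. It also explains why calling a class-specific rule directly saves work: the failed attempts on the other classes are skipped. The sketch below uses hypothetical toy parsers that only inspect the type prefix after the colon. The class ordering is an assumption and is not taken from the hgvs grammar.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each toy parser returns None when the text is not of its variant class.
ParseFn = Callable[[str], Optional[str]]


def type_prefix_parser(kind: str) -> ParseFn:
    """Hypothetical toy parser: accepts 'AC:<kind>.<posedit>' and returns posedit."""
    def parse_fn(text: str) -> Optional[str]:
        _, sep, rest = text.partition(":")
        prefix = kind + "."
        if sep and rest.startswith(prefix):
            return rest[len(prefix):]
        return None
    return parse_fn


def parse_first_match(text: str, alternatives: Sequence[Tuple[str, ParseFn]]) -> Tuple[str, str]:
    """Ordered choice: try each alternative in turn; the first success wins."""
    for name, parse_fn in alternatives:
        result = parse_fn(text)
        if result is not None:
            return name, result
    raise ValueError(f"no variant class matched {text!r}")


# Hypothetical ordering of the class-specific rules.
ALTERNATIVES = [(kind + "_variant", type_prefix_parser(kind)) for kind in "gmcnrp"]

print(parse_first_match("NP_012345.6:p.Ala22Trp", ALTERNATIVES))  # ('p_variant', 'Ala22Trp')
# Calling the p parser directly skips the five failed attempts above.
print(type_prefix_parser("p")("NP_012345.6:p.Ala22Trp"))          # Ala22Trp
```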
Similarly, components of the underlying structure may be parsed directly as well:
>>> hp.parse_c_posedit("22+1A>T")
PosEdit(pos=22+1, edit=A>T, uncertain=False)
>>> hp.parse_c_interval("22+1")
BaseOffsetInterval(start=22+1, end=22+1, uncertain=False)
parse(v)
Parse HGVS variant v, returning a SequenceVariant.
Parameters: v (str) – an HGVS-formatted variant as a string
Return type: SequenceVariant
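Formatting is the inverse of parsing: it renders a variant object back into an HGVS string. The following sketch shows that round trip on a hypothetical, much-simplified SimpleVariant class that keeps the posedit as an unparsed string. It is not the hgvs SequenceVariant. The split_hgvs helper and the accepted type letters (c, g, m, n, p, r) are assumptions made for illustration. Accessions such as NM_01234.5 contain a dot, so the string is split on the colon first.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of coordinate-type letters accepted by this sketch.
_TYPES = frozenset("cgmnpr")


@dataclass(frozen=True)
class SimpleVariant:
    """Hypothetical, simplified analogue of SequenceVariant (posedit kept as a string)."""
    ac: str
    type: str
    posedit: str
    gene: Optional[str] = None

    def __str__(self) -> str:
        # Formatting rebuilds "ac:type.posedit" from the parsed fields.
        return f"{self.ac}:{self.type}.{self.posedit}"


def split_hgvs(v: str) -> SimpleVariant:
    """Split 'AC:type.posedit' into fields (hypothetical helper)."""
    ac, sep, rest = v.partition(":")
    var_type, dot, posedit = rest.partition(".")
    if not (ac and sep and dot and posedit) or var_type not in _TYPES:
        raise ValueError(f"unsupported HGVS string: {v!r}")
    return SimpleVariant(ac=ac, type=var_type, posedit=posedit)


v = split_hgvs("NM_01234.5:c.22+1A>T")
print(v)                        # NM_01234.5:c.22+1A>T
print(v.ac, v.type, v.posedit)  # NM_01234.5 c 22+1A>T
```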