Parsing and Formatting
Provides a parser for HGVS strings and HGVS-related conceptual components, such as intronic-offset coordinates.
Provides comprehensive parsing of HGVS variant strings (i.e., variants represented according to the Human Genome Variation Society recommendations) into Python representations. The class wraps a Parsing Expression Grammar, exposing each rule of that grammar as a method (prefixed with parse_) that parses an input string according to the rule. Because the class exposes all rules, it can parse both full variant representations and their components, like so:
>>> hp = Parser()
>>> v = hp.parse_hgvs_variant("NM_01234.5:c.22+1A>T")
>>> v
SequenceVariant(ac=NM_01234.5, type=c, posedit=22+1A>T)
>>> v.posedit.pos
BaseOffsetInterval(start=22+1, end=22+1, uncertain=False)
>>> i = hp.parse_c_interval("22+1")
>>> i
BaseOffsetInterval(start=22+1, end=22+1, uncertain=False)
The parse_hgvs_variant and parse_c_interval methods correspond to the hgvs_variant and c_interval rules in the grammar, respectively.
Because the methods are generated on-the-fly and depend on the grammar that is loaded at runtime, a full list of methods is not available in the documentation. However, the list of rules/methods is available via the rules instance variable.
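As a rough illustration of how methods can be generated from a rule table at runtime, here is a minimal sketch. It is not the library's implementation: `ToyParser`, `TOY_RULES`, and the regular expressions are hypothetical stand-ins for the real grammar, and the only point is that one `parse_<rule>` method is bound per rule and that the rule names remain discoverable through a `rules` attribute.

```python
import re
from functools import partial

# Toy rule table: rule name -> regular expression. These are illustrative
# stand-ins, NOT the rules of the real HGVS grammar.
TOY_RULES = {
    "accession": r"[A-Z]{2}_\d+\.\d+",
    "base_offset": r"\d+(?:[+-]\d+)?",
}


class ToyParser:
    """Hypothetical parser exposing one parse_<rule> method per rule."""

    def __init__(self, rules=None):
        rules = TOY_RULES if rules is None else rules
        # Sorted rule names, kept so callers can discover what is available.
        self.rules = sorted(rules)
        for name, pattern in rules.items():
            # Bind a parse_<name> callable on the instance at construction time.
            method = partial(self._parse, name, re.compile(pattern))
            setattr(self, "parse_" + name, method)

    @staticmethod
    def _parse(name, regex, text):
        # Succeed only if the whole input matches the rule.
        if regex.fullmatch(text) is None:
            raise ValueError(f"{text!r} does not match rule {name!r}")
        return text


tp = ToyParser()
# The generated methods can be listed by introspection.
methods = [m for m in dir(tp) if m.startswith("parse_")]
```

Because the methods exist only after the object is constructed, static documentation cannot enumerate them, which is why inspecting the instance is the practical way to see what is available.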
A few notable methods are listed below:
parse_hgvs_variant() parses any valid HGVS string supported by the grammar.
>>> hp.parse_hgvs_variant("NM_01234.5:c.22+1A>T")
SequenceVariant(ac=NM_01234.5, type=c, posedit=22+1A>T)
>>> hp.parse_hgvs_variant("NP_012345.6:p.Ala22Trp")
SequenceVariant(ac=NP_012345.6, type=p, posedit=Ala22Trp)
The hgvs_variant rule tries each of the major classes of HGVS variants in turn until one succeeds. For a slight gain in efficiency, the rule for a specific class may be invoked directly:
>>> hp.parse_p_variant("NP_012345.6:p.Ala22Trp")
SequenceVariant(ac=NP_012345.6, type=p, posedit=Ala22Trp)
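The efficiency difference comes from the general rule having to try alternatives before it reaches the one that matches. The sketch below shows that try-in-order pattern with two deliberately simplified, hypothetical sub-rules. The function names, the regular expressions, and the order of alternatives are assumptions for illustration and do not reflect the real grammar.

```python
import re

# Hypothetical, heavily simplified sub-rules; the real grammar is far richer.
C_VARIANT = re.compile(r"[A-Z]{2}_\d+\.\d+:c\.\S+")
P_VARIANT = re.compile(r"[A-Z]{2}_\d+\.\d+:p\.\S+")


def parse_c_variant(text):
    """Accept only coding (c.) variants."""
    if C_VARIANT.fullmatch(text) is None:
        raise ValueError(f"not a c. variant: {text!r}")
    return ("c", text)


def parse_p_variant(text):
    """Accept only protein (p.) variants."""
    if P_VARIANT.fullmatch(text) is None:
        raise ValueError(f"not a p. variant: {text!r}")
    return ("p", text)


def parse_any_variant(text, trace=None):
    """Try each specific rule in order and return the first success.

    If a list is passed as `trace`, the name of every attempted rule is
    appended to it, which makes the extra work visible.
    """
    for rule in (parse_c_variant, parse_p_variant):
        if trace is not None:
            trace.append(rule.__name__)
        try:
            return rule(text)
        except ValueError:
            continue
    raise ValueError(f"no rule matched: {text!r}")


trace = []
result = parse_any_variant("NP_012345.6:p.Ala22Trp", trace)
```

Here the general entry point attempts the c. rule first and fails before succeeding on the p. rule, whereas calling `parse_p_variant` directly skips the failed attempt.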
Similarly, components of the underlying structure may be parsed directly as well:
>>> hp.parse_c_posedit("22+1A>T")
PosEdit(pos=22+1, edit=A>T, uncertain=False)
>>> hp.parse_c_interval("22+1")
BaseOffsetInterval(start=22+1, end=22+1, uncertain=False)
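To make the intronic-offset notation in these examples concrete, the following sketch splits a position such as 22+1 into its base and offset parts. It is a simplified, standalone illustration: `ToyBaseOffset` and `split_base_offset` are hypothetical names, and the pattern ignores features that a full grammar would need to handle, such as uncertainty markers or positions outside the coding sequence.

```python
import re
from typing import NamedTuple


class ToyBaseOffset(NamedTuple):
    """Hypothetical container for a base position and an intronic offset."""
    base: int
    offset: int


# A base number, optionally followed by a signed offset (e.g. "22+1", "100-3").
_POSITION = re.compile(r"(?P<base>\d+)(?P<offset>[+-]\d+)?")


def split_base_offset(text):
    """Split 'base' or 'base+offset' / 'base-offset' into integer parts."""
    m = _POSITION.fullmatch(text)
    if m is None:
        raise ValueError(f"not a base-offset position: {text!r}")
    # A missing offset means the position lies exactly on the base.
    return ToyBaseOffset(int(m["base"]), int(m["offset"] or 0))


pos = split_base_offset("22+1")
```

In this sketch, `pos.base` is 22 and `pos.offset` is 1, meaning one nucleotide past coding position 22.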