1.0.0 (2017-04-08)

Changes since 0.4.0 (2015-09-09).

This is a major release of the hgvs package that includes new features and behavior changes. Some client code may require minor modification. (Note: Previously, we had tentatively called this release 0.5.0. The magnitude of the changes and the desire to migrate to public API versioning led us to release it as 1.0.0.)

See Installing hgvs for installation instructions.

Major Features and Changes

This section highlights important behavior or interface changes relative to the 0.4.x series. Code modifications are likely for most of the features listed below.

★ EasyVariantMapper renamed to AssemblyMapper, now with GRCh38 and PAR support. EasyVariantMapper was renamed to AssemblyMapper to better reflect its role. AssemblyMapper defaults to GRCh38. Transcripts in pseudoautosomal regions have alignments to X and Y. Previously, EVM would refuse to guess which alignment to use and raise an exception. AssemblyMapper has a new argument, in_par_assume, which enables callers to prefer X or Y alignments.
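The effect of in_par_assume can be sketched without hgvs itself. The function below is a hypothetical stand-in, not the AssemblyMapper implementation: choose_alignment and its dictionary argument are names invented for illustration, and the sketch assumes that leaving in_par_assume unset keeps the older behavior of raising an exception for ambiguous alignments.

```python
def choose_alignment(alignments, in_par_assume=None):
    """Pick one genomic accession for a transcript.

    alignments maps a chromosome name ("X" or "Y") to an accession.
    Hypothetical helper for illustration; not part of hgvs.
    """
    if not alignments:
        raise ValueError("no alignments available")
    if len(alignments) == 1:
        # Unambiguous: only one alignment exists.
        return next(iter(alignments.values()))
    if in_par_assume is None:
        # Assumed default: refuse to guess, as EasyVariantMapper did.
        raise ValueError("multiple alignments; set in_par_assume to 'X' or 'Y'")
    if in_par_assume not in alignments:
        raise ValueError("no alignment on chromosome " + str(in_par_assume))
    return alignments[in_par_assume]


# A transcript in a pseudoautosomal region aligns to both X and Y (GRCh38).
par_alignments = {"X": "NC_000023.11", "Y": "NC_000024.10"}
chosen = choose_alignment(par_alignments, in_par_assume="X")
```

Making the preference an explicit argument keeps the ambiguity visible to the caller rather than silently picking one chromosome.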

★ VariantMapper validates variants before mapping. Several bug reports resulted from attempting to project invalid variants, such as variants with insertions between non-adjacent nucleotides. These generated exceptions or unexpected results. Intrinsic validation is now performed before mapping and normalization. Invalid variants raise HGVSInvalidVariantError (#399), which callers may wish to catch.
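As a rough illustration of one intrinsic check, the sketch below rejects an insertion whose flanking positions are not adjacent. It is not the hgvs validator: InvalidVariantError is a local stand-in for HGVSInvalidVariantError, validate_insertion is a hypothetical name, and positions are simplified to plain positive integers (real c. coordinates also carry offsets and datums).

```python
class InvalidVariantError(ValueError):
    """Local stand-in for hgvs's HGVSInvalidVariantError."""


def validate_insertion(start, end):
    """Require that an insertion sits between two adjacent positions.

    Hypothetical helper for illustration; not part of hgvs.
    """
    if end - start != 1:
        raise InvalidVariantError(
            "insertion must be between adjacent positions, got %d_%d" % (start, end)
        )
    return True


# Callers catch the exception instead of mapping an invalid variant.
try:
    validate_insertion(22, 24)   # e.g. c.22_24insT: positions not adjacent
    outcome = "valid"
except InvalidVariantError:
    outcome = "rejected"
```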

★ Fully local installations – no network access required. hgvs requires access to transcripts and sequences for most operations. By default, hgvs will use public remote resources at runtime, which incurs significant latency and, in principle, presents a minor privacy concern. While UTA has always been available for local installation, the more significant delay was in sequence lookup. A new package, SeqRepo, provides a local sequence database that is synchronized with UTA. When used together, these packages completely obviate network connectivity and improve speed by approximately 50x.

★ hgvs transitions to public API versioning conventions. By transitioning from major version 0 (“initial development”) to 1 (“public API”), we are indicating that the API is expected to be stable. In practice, this change will mean that x.y.z versions will clearly distinguish bug fix releases (increment z), backward-compatible feature additions (increment y), and API incompatible changes (increment x). See Semantic Versioning for more information.
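The x.y.z convention described above can be expressed as a small classifier. This is only a sketch for versions at or above 1.0.0; classify_bump is a hypothetical helper, not something provided by hgvs.

```python
def classify_bump(old, new):
    """Classify the change from version old to version new (both "x.y.z")."""
    old_parts = tuple(int(p) for p in old.split("."))
    new_parts = tuple(int(p) for p in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected versions of the form x.y.z")
    if new_parts <= old_parts:
        raise ValueError("new version must be greater than old version")
    if new_parts[0] != old_parts[0]:
        return "api-incompatible"   # x incremented
    if new_parts[1] != old_parts[1]:
        return "feature"            # y incremented, backward compatible
    return "bugfix"                 # only z incremented


kinds = [
    classify_bump("1.0.0", "1.0.1"),
    classify_bump("1.0.1", "1.1.0"),
    classify_bump("1.4.2", "2.0.0"),
]
```

A client that pins hgvs to a single major version would, under this convention, expect to receive only the "bugfix" and "feature" kinds of update.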

★ Changes in p. formatting to better conform to current varnomen recommendations. Inferred changes with unknown p. effects are now represented with p.? rather than p.(?) (#393). In addition, silent SNV mutations now include the amino acid, as in NP_000050.2:p.Lys2597= rather than NP_000050.2:p.(=) (#317). Both changes improve conformance with current varnomen recommendations.

★ By default, do not show reference sequence in dels and dups. For example, NM_000059.3:c.22_23delAAinsT would be shown as NM_000059.3:c.22_23delinsT. Users may configure max_ref_length (default 0) in order to change this behavior (#404).
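The formatting rule can be approximated with a few lines of plain Python. This is a sketch rather than the hgvs formatter: format_delins is a hypothetical function, and it assumes the reference is printed only when its length does not exceed max_ref_length.

```python
def format_delins(accession, start, end, ref, alt, max_ref_length=0):
    """Format a c. delins, dropping the reference when it is too long.

    Hypothetical helper for illustration; not part of hgvs.
    Assumption: ref is shown only if len(ref) <= max_ref_length.
    """
    shown_ref = ref if len(ref) <= max_ref_length else ""
    return "%s:c.%d_%ddel%sins%s" % (accession, start, end, shown_ref, alt)


# Default (max_ref_length=0): the deleted bases are omitted.
default_form = format_delins("NM_000059.3", 22, 23, "AA", "T")

# A larger limit restores the reference sequence in the output.
verbose_form = format_delins("NM_000059.3", 22, 23, "AA", "T", max_ref_length=2)
```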

★ BaseOffsetPosition datums now use enums, defined in hgvs.enums. For example, previous hgvs.location.SEQ_START must be replaced with hgvs.enums.Datum.SEQ_START (#396).

★ Unknown protein effects are now internally represented with `var.posedit=None`. This case is printed as NP_01234.5:p.? (#412).

Deprecations and Removals

  • #349: drop support for dupN [0dbe440]
  • #360: HGVSValidationError removed; use HGVSInvalidVariantError instead.

Bug Fixes

  • #284: validation fails for g variants [512b882]
  • #292: Fix bug in validator when validating m. variants and add tests [12d9b48]
  • #308: validation across CDS start and CDS end boundaries [ac066ee], [8c8db03]
  • #346: ensure that alignment starts at transcript position 0 [3af24b3], [9f29c87]
  • #381: fix bug attempting to normalize p. variants; add issue test (test_issues.py) [834bed9]
  • #393: fix inconsistent representation of unknown p. effect when inferred by c_to_p [3f1ac48]
  • #409: Fix bug in normalizer when normalizing ident variant that is written as delins [94607ecc30da]
  • #415: Limit assembly mapper to NC accessions [6056fd4414df]

New Features

  • #105: configurable formatting [c8b9fd1]
  • #249: Move intrinsic validation to models
  • #253: Add p. validation support [3d3b9da], [ba943ae]
  • #256: rename EasyVariantMapper to AssemblyMapper to better indicate functionality [d6697d6]
  • #258, #330, #342: ensure that start and end datums are compatible [247d8bf]
  • #260, #322: added tests to verify that exceptions are raised when mapping invalid variants [ac37ae0]
  • #270: add is_intronic method to BaseOffsetPosition [4e40866]
  • #282: preserve position in “identity” variants (e.g., norm(c.123A>A) => c.123= rather than c.=) [cc523ed]
  • #295: raise exception when validating deletion length for intronic variants [4575ed8]
  • #309: make errors more informative when coordinate is outside sequence bounds [d667d8b], [f4cd048]
  • #314: support parsing identity variants [9116c72]
  • #315: Validate accession type pairs [be90d50]
  • #316: provide generalized transcript-to-genome projections to handle coding and non-coding transcripts transparently [26006c7]
  • #317: Output silent p. variants according to HGVS recommendations [4babbb5]
  • #319: added PosEdit.length_change() method [c191567], [c71a48b], [c70fded]
  • #326: provide special handling for disambiguating alignments in pseudoautosomal regions [acc560e]
  • #330: datum-matching [e05674b], [461ccd7]
  • #336: add hgvs-shell as executable for exploration, debugging, bug submission [f6b6c3c]
  • #356: add position comparison operators [4f7f7e4]
  • #365: graded validation
  • #366: move validation to variantmapper
  • #372: rename hgvs/variant.py to hgvs/sequencevariant.py [2f69d65], [ad604fd]
  • #379: move replace_reference to variantmapper (from evm) [c0f4be1]
  • #386: reject discontiguous alignments [ea2527c]
  • #391: Attempt reconnection if db connection is lost [2aef5fac3a61]
  • #399: validators should raise only HGVSInvalidVariantError exceptions
  • #404: Implement max_ref_length in formatter and don’t show reference sequence by default

Other Changes

  • #276: raise error when user attempts to map to/from c. with non-coding transcript [aaa0ff5]
  • #363: update railroad diagram [3e23e10]

Internal and Developer Changes

  • #236: migrate from built-in seqfetcher to bioutils seqfetcher [5e9a951]
  • #237: Mock testing data; dropped unmaintained sqlite-based tests
  • #287: ensure that modules are also getting doctested (continues #287) [3cbe93a]
  • #347: Replace recordtype with attrs
  • #395: get ThreadedConnectionPool sizes from config [a2a216c]
  • #397: use hgvs.config for VariantMapper.__init__() [154cf5e]
  • #400: make hdp cache mode (for testing) settable via HGVS_CACHE_MODE environment variable [09303c7]
  • removed sqlite-based uta dataprovider; updated reqs in etc [e8c9d8d85d35]