Frequently Asked Questions

Alignments for my transcript are not available. What can I do?

The short answer is: not much.

In order to project a variant between genomic and transcript coordinates, hgvs needs a sequence alignment. Sequence alignments are obtained from the Universal Transcript Archive (UTA), a compendium of transcripts and their genome alignments from multiple sources. Data are loaded from snapshots; the loading process is currently semi-automated and run irregularly.

UTA loads only high-quality alignments exactly as provided by the data sources. If an alignment is not provided by a data source, or if it fails filters recommended by NCBI, it won’t be in UTA (with a small number of exceptions). Importantly, NCBI provides alignment data only for current transcripts against current assemblies; historical data are not available.

So, there are two common reasons that an alignment may not exist in UTA:

The transcript was obsoleted before UTA started in 2014, or existed only between UTA snapshots.
The transcript does not have any high-quality alignments.

If no alignment is available for a particular transcript-reference sequence pair and alignment method, an exception like the following will be raised:

HGVSDataNotAvailableError: No alignments for NM_000018.2 in GRCh37 using splign
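One way to handle this exception in calling code is sketched below. It assumes a mapper object that exposes `c_to_g` (for example, an `hgvs.assemblymapper.AssemblyMapper`). The helper name `c_to_g_or_none` is hypothetical, and the import is guarded only so that the sketch can be run without hgvs installed.

```python
try:
    from hgvs.exceptions import HGVSDataNotAvailableError
except ImportError:
    # Stand-in used only when hgvs is not installed, so the sketch still runs.
    class HGVSDataNotAvailableError(Exception):
        pass


def c_to_g_or_none(mapper, var_c):
    """Project a transcript (c.) variant to genomic (g.) coordinates.

    Returns None instead of raising when no alignment is available
    for the requested transcript, assembly, and alignment method.
    `c_to_g_or_none` is a hypothetical helper, not part of hgvs.
    """
    try:
        return mapper.c_to_g(var_c)
    except HGVSDataNotAvailableError:
        # e.g. "No alignments for NM_000018.2 in GRCh37 using splign"
        return None
```

With hgvs installed and a UTA connection available, `mapper` would typically be built as `hgvs.assemblymapper.AssemblyMapper(hdp, assembly_name="GRCh37", alt_aln_method="splign")`, where `hdp` comes from `hgvs.dataproviders.uta.connect()`.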

Currently, there is no way for users to provide their own alignments.

For example, UTA contains ten alignments for the NM_000314 family of transcripts for PTEN:

transcript	genome	method
NM_000314.4	AC_000142.1	splign
NM_000314.4	NC_000010.10	blat
NM_000314.4	NC_000010.10	splign
NM_000314.4	NC_018921.2	splign
NM_000314.4	NG_007466.2	splign
NM_000314.5	NC_000010.10	splign
NM_000314.6	NC_000010.10	blat
NM_000314.6	NC_000010.10	splign
NM_000314.6	NC_000010.11	splign
NM_000314.6	NW_013171807.1	splign

A variant can be projected using any of the transcript, genome, and method combinations listed above, and no others. For instance, NM_000314.5 can be projected only onto NC_000010.10 using splign.
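The available combinations can also be checked programmatically before attempting a projection. The sketch below assumes a data provider with a `get_tx_mapping_options(tx_ac)` method, as the UTA data provider in hgvs offers, and assumes that each returned row can be indexed by the keys `tx_ac`, `alt_ac`, and `alt_aln_method`. The helper names and the `StubProvider` class are hypothetical; the stub holds a few rows from the table above so that the example runs without a database connection.

```python
def available_alignments(hdp, tx_ac):
    """Return sorted (transcript, genome, method) tuples for one transcript."""
    rows = hdp.get_tx_mapping_options(tx_ac)
    return sorted((r["tx_ac"], r["alt_ac"], r["alt_aln_method"]) for r in rows)


def can_project(hdp, tx_ac, alt_ac, method):
    """True if an alignment exists for this exact combination."""
    return (tx_ac, alt_ac, method) in available_alignments(hdp, tx_ac)


class StubProvider:
    """Hypothetical stand-in for a UTA connection (subset of the PTEN table)."""

    _rows = [
        {"tx_ac": "NM_000314.5", "alt_ac": "NC_000010.10", "alt_aln_method": "splign"},
        {"tx_ac": "NM_000314.6", "alt_ac": "NC_000010.10", "alt_aln_method": "blat"},
        {"tx_ac": "NM_000314.6", "alt_ac": "NC_000010.10", "alt_aln_method": "splign"},
        {"tx_ac": "NM_000314.6", "alt_ac": "NC_000010.11", "alt_aln_method": "splign"},
    ]

    def get_tx_mapping_options(self, tx_ac):
        return [r for r in self._rows if r["tx_ac"] == tx_ac]


hdp = StubProvider()
print(can_project(hdp, "NM_000314.6", "NC_000010.11", "splign"))  # True
print(can_project(hdp, "NM_000314.5", "NC_000010.11", "splign"))  # False
```

Against a real UTA instance, `hdp` would instead come from `hgvs.dataproviders.uta.connect()`.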

Why do I get different results on the UCSC browser?

The UCSC Genome Browser uses alignments generated by BLAT, which can give different results from the official alignments generated by NCBI using splign. Although BLAT and splign typically agree, there are many small differences in ambiguous alignments and even some substantial differences in a small number of transcripts. In some cases, the differences might cause a variant to be interpreted as coding using a splign alignment and non-coding using a BLAT alignment, or vice versa. Furthermore, one typically doesn’t know which alignment set was used when a variant was published. (Yes, that’s a hot mess.)

Why do I get different results with Mutalyzer?

Some transcript-genome alignments contain indels. hgvs is careful to account for these indel discrepancies when projecting variants. In contrast, Mutalyzer does not account for such discrepancies. Therefore, the Mutalyzer results will be incorrect when projecting or validating a variant that is downstream of the first indel. For details and other examples, see https://www.ncbi.nlm.nih.gov/pubmed/30129167.
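The effect of an alignment indel on downstream positions can be illustrated with a toy model. The sketch below is not the hgvs or Mutalyzer implementation: it assumes a single exon, 0-based offsets, and a simplified CIGAR-like list of `(length, op)` pairs, where `M` consumes both sequences, `I` marks bases present only in the transcript, and `D` marks bases present only in the genome. All function names are hypothetical.

```python
def tx_to_genome(tx_pos, cigar, genome_start=0):
    """Map a transcript offset to a genome offset through a gapped alignment.

    Returns None when tx_pos falls inside a transcript-only insertion,
    which has no corresponding genome position.
    """
    t = 0              # transcript bases consumed so far
    g = genome_start   # genome position of the next aligned base
    for length, op in cigar:
        if op == "M":
            if tx_pos < t + length:
                return g + (tx_pos - t)
            t += length
            g += length
        elif op == "I":
            if tx_pos < t + length:
                return None
            t += length
        elif op == "D":
            g += length
        else:
            raise ValueError(f"unknown alignment op: {op!r}")
    raise IndexError("tx_pos lies beyond the end of the alignment")


def naive_tx_to_genome(tx_pos, genome_start=0):
    """Ignore indels entirely: a fixed offset from the alignment start."""
    return genome_start + tx_pos


# 10 aligned bases, 2 genome-only bases, then 10 more aligned bases.
cigar = [(10, "M"), (2, "D"), (10, "M")]
print(tx_to_genome(5, cigar, 100), naive_tx_to_genome(5, 100))    # 105 105
print(tx_to_genome(15, cigar, 100), naive_tx_to_genome(15, 100))  # 117 115
```

Upstream of the indel the two mappings agree; downstream, the naive mapping is off by the length of the indel.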