Skip to content

Cross check RefSeq and Ensembl annotation sources for HGNC Symbols

Notifications You must be signed in to change notification settings

barslmn/cross-symbol-checker

Repository files navigation

Cross-symbol checker

https://omics.sbs/static/bioscripts/img/crosssymbolchecker.svg

This tool maps aliases, previous or withdrawn symbols to current approved HGNC symbols. Checks for any entries that are not gene symbols and fixes capitalizations. Ensembl and RefSeq annotation files sometimes use previous or alias symbols. This tool also checks Ensembl and NCBI annotation files for different genome versions and shows which gene symbol is used. This way a more appropriate gene set can be used to avoid the false negatives in the variant discovery process.

How to install

Install from git

Clone this repository to install.

git clone https://github.com/barslmn/bioscripts

To get the most up to date annotation files run the ./get-data.sh. This process takes some time as it downloads various gene symbol annotation files from NCBI and Ensembl and HGNC. Check install section at cross-symbol-checker.org to see what is being downloaded.

Get container

Installing with docker.

docker pull barslmn/cross-symbol-checker

How to use

Main operation is performed at cross-symbol-checker.sh this file takes a single symbol as its input and returns the status of all of the approved, alias, previous or withdrawn symbols in all available annotation sources. Another script is check-geneset.sh which takes one or more gene and target assembly as its arguments. This script is a wrapper around cross-symbol-checker and parses its output.

Example use: Check single gene for all annotation sources:

cross-symbol-checker.sh SHFM6

Check multiple genes for given annotation source:

check-geneset.sh SHFM6 ADA2 -a GRCh37 -s RefSeq

Or check against a custom target:

check-geneset.sh SHFM6 ADA2 -a GRCh37 -s RefSeq -t my.target.bed.gz

Example use with docker: Check single gene for all annotation sources:

docker run barslmn:cross-symbol-checker /opt/cross-symbol-checker/cross-symbol-checker.sh SHFM6

Check multiple genes for given annotation source:

docker run barslmn:cross-symbol-checker /opt/cross-symbol-checker/check-geneset.sh SHFM6 ADA2 -a GRCh37 -s RefSeq

./cross-symbol-checker.sh --help
Usage: ./cross-symbol-checker.sh symbol

This script checks given symbol against every possible assembly

-c --no-cross-check
    Don't check annotation sources. Just check alternative gene symbols and exit.

-h --help
    Display this help message and exit.

-V
    Print current version and exit

Functionality of the script can be further altered with environment variables.
These are mainly used by check-geneset.sh.

CSC_SOURCES
    Limit which annotation sources to be used.

CSC_ASSEMBLIES
    Limit which assemblies sources to be used.

CSC_VERSIONS
    Limit which versions sources to be used.

CSC_TARGETS
    Custom target file.

CSC_LOGLVL
    Set log level. Default is INFO.

./check-geneset.sh --help
Usage: ./check-geneset.sh symbol1 symbol2 -a T2T -s RefSeq -v latest

-a --assembly
    Default assemblies are GRCh37, GRCh38 and T2T.
    You can use multiple assemblies by quoting them together like -a "GRCh37 GRCh38"

-h --help
    Display this help message and exit.

-o --only-target
    By default check-geneset.sh will run for every assembly. Use this option
    to check only against given target file.

-s --source
    Default assemblies are GRCh37, GRCh38 and T2T.
    You can use multiple assemblies by quoting them together like -a "GRCh37 GRCh38"

-t --target
    Custom bed file. If file name has the format: source.assembly.version.bed,
    columns in the output table will be filled accordingly.
    Custom file should look like this:
           chrom	start	end	symbol
           chr1	1266694	1270686	TAS1R3
           chr1	1270656	1284730	DVL1
           chr1	1288069	1297157	MXRA8

-v --version
    There is only one version for all of the assemblies which is latest.
    You can install older assemblies and specify them with this parameter.

-V
    Print current version and exit

About

Cross check RefSeq and Ensembl annotation sources for HGNC Symbols

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published