ALT fields with . + seq in vcf_to_dataframe are annotated as nan. #349

Open
gmgs-999 opened this issue Jan 29, 2021 · 0 comments


Hi, I'm Gabriel. I'm working on my thesis on structural variants (SVs) in VCF files, and I'm writing a script that converts SVs in BND (breakend) format into a short annotation. One type of insertion is a single-breakend INS, which is annotated like this:
REF: A ; ALT: .AGTA(etc..).
Example:
[Image: ALELO_EJEMPLO]
The ALT of the variant with ID gridss273b_534b starts with "." (SnpEff annotated this variant as INS).
In Python, I load the file with:
`vcf_1kg=allel.vcf_to_dataframe(ruta_vcf,fields=['*'],exclude_fields=['FILTER_NO_RP', 'FILTER_SINGLE_ASSEMBLY','FILTER_ASSEMBLY_TOO_FEW_READ', 'FILTER_NO_ASSEMBLY', 'FILTER_NO_SR','FILTER_INSUFFICIENT_SUPPORT', 'FILTER_ASSEMBLY_BIAS','FILTER_ASSEMBLY_ONLY', 'FILTER_ASSEMBLY_TOO_SHORT','FILTER_SMALL_EVENT', 'FILTER_REF', 'FILTER_SINGLE_SUPPORT','FILTER_LOW_QUAL','FILTER_SnpSift'],alt_number=1)`
(Note: ruta_vcf is the path to the VCF. The STRAND, INS_LEN and INS_SEQ fields were added to the DataFrame later in the script.)
[Image: ALELO_EJEMPLO_python]
The ALT field of this variant is set to nan. I understand that a leading "." is interpreted as a missing value, and that this is why these fields become nan. Can this be solved? I'm using allel 1.3.2 and Python 3.5.3.
Thank you.
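
To make the problem easier to reproduce, here is a minimal sketch of how the raw ALT strings could be read straight from the VCF text and checked for the single-breakend form, without going through `vcf_to_dataframe`. It is not part of scikit-allel: `read_raw_alt` and `is_single_breakend` are hypothetical helper names, the sample record is made up for illustration (only the ID and the leading-"." shape come from the report above), and the code assumes an uncompressed, tab-separated VCF.

```python
def read_raw_alt(lines):
    """Collect (CHROM, POS, ID, ALT) from an iterable of VCF text lines.

    Hypothetical helper: ALT is returned exactly as written in the file,
    so a value such as ".AGTA" is kept as a string instead of becoming nan.
    """
    records = []
    for line in lines:
        # Skip meta-information lines ("##...") and the "#CHROM" header.
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # ignore blank or truncated lines
        chrom, pos, variant_id, _ref, alt = fields[:5]
        records.append((chrom, int(pos), variant_id, alt))
    return records


def is_single_breakend(alt):
    """True for ALT strings like ".AGTA" or "AGTA.": a "." attached to bases.

    A lone "." is the ordinary missing value and is not counted.
    """
    return len(alt) > 1 and (alt.startswith(".") or alt.endswith("."))


# Illustrative records only; the sequence and coordinates are invented.
example_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\tgridss273b_534b\tA\t.AGTA\t.\tPASS\t.\n",
    "chr1\t200\tsnv_example\tG\tT\t.\tPASS\t.\n",
]

raw = read_raw_alt(example_lines)
single_breakends = [rec for rec in raw if is_single_breakend(rec[3])]
```

With a real file, the same function can be fed an open file handle, for example `with open(ruta_vcf) as fh: raw = read_raw_alt(fh)`. If the DataFrame rows are in file order and no records were filtered out (an assumption I have not verified for every option of `vcf_to_dataframe`), the raw ALT strings could then be copied over the nan values by position.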
