
Wrong Short Tandem Repeat size in vcf file #30

Open
pailloufat-stack opened this issue May 6, 2024 · 4 comments

Comments

@pailloufat-stack

Hi,

I'm working with 16 mouse samples. I'm looking at STR variants in 13 of them, which are heterozygous; the other 3 are homozygous wild-type. What I'm interested in are the differences in STR size among these 13 samples. STRs that have the same size in all 13 samples are not of interest to me.
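The selection criterion described above (keep a locus only if the carrier samples disagree on allele size) can be sketched as a small filter. This is a hypothetical helper, not part of TRGT: the names `non_ref_lengths` and `sizes_differ` and the input shape (sample name mapped to a GT string plus per-allele lengths, e.g. taken from TRGT's `AL` FORMAT field) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def non_ref_lengths(gt: str, lengths: List[int]) -> List[int]:
    """Lengths of the non-reference alleles in one sample.

    Assumes `lengths` has one entry per allele, in the same order as the
    alleles in the GT value (e.g. GT "0/6" with lengths [219, 133]).
    """
    alleles = gt.replace("|", "/").split("/")
    # Skip reference ("0") and missing (".") alleles.
    return [n for a, n in zip(alleles, lengths) if a not in ("0", ".")]


def sizes_differ(calls: Dict[str, Tuple[str, List[int]]]) -> bool:
    """Hypothetical filter: True if the carriers do not all share the same
    non-reference allele length. Homozygous wild-type (0/0) samples
    contribute nothing."""
    seen = set()
    for gt, lengths in calls.values():
        seen.update(non_ref_lengths(gt, lengths))
    return len(seen) > 1


# Example values: AL of three samples from the VCF record posted in this thread.
calls = {
    "sample1": ("0/6", [219, 133]),
    "sample2": ("0/1", [219, 125]),
    "sample3": ("0/5", [219, 105]),
}
print(sizes_differ(calls))  # True
```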

I ran TRGT and created a merged VCF file, which I modified slightly to extract the information I need (using the "MS" field).

I noticed some errors. For example, take this STR (I've reduced the number of samples to 3 for clarity). According to the VCF, there are three deletions in the STR, with three different sizes:

chr2 154720638 0/6;0(0-219),0(0-75) 0/1;0(0-219),0(0-63) 0/5;0(0-219),0(0-105)
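The condensed `GT;MS` strings above can be produced from a merged TRGT VCF by pairing each sample column with the FORMAT keys. A minimal sketch, assuming the FORMAT order `GT:ALLR:AP:MS:AL:MC:SD:AM` that appears in the full record for this locus posted later in the thread; the function name `gt_ms` is hypothetical, and a real pipeline would more likely use a VCF parsing library.

```python
def gt_ms(format_keys: str, sample_column: str) -> str:
    """Condense one sample column into "GT;MS".

    Assumes the column's values appear in the same order as `format_keys`.
    """
    fields = dict(zip(format_keys.split(":"), sample_column.split(":")))
    return f"{fields['GT']};{fields['MS']}"


FORMAT = "GT:ALLR:AP:MS:AL:MC:SD:AM"
column = "0/6:189-228,127-143:0.9125,0.551471:0(0-219),0(0-75):219,133:40,13:139,111:.,."
print(gt_ms(FORMAT, column))  # 0/6;0(0-219),0(0-75)
```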

When I look at the IGV track, however, I see a single deletion of the same size (146 bp) in all three samples, which is not what the VCF reports:

[Image: IGV track]

Based on the VCF, I should have 219-75 = 144 bp for sample 1, 219-63 = 156 bp for sample 2, and 219-105 = 114 bp for sample 3.
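The span differences implied by the `MS` values can be computed mechanically. A minimal sketch, assuming each `MS` entry has the form `motif(start-end)` with an exclusive end, so a segment covers `end - start` bases (an assumption about TRGT's convention), and that multiple segments in one allele do not overlap. Under that reading the differences are 144, 156 and 114 bp. Note that these are differences in repeat span; they may not equal the full allele-length change if an allele also contains non-repeat sequence. The name `motif_span_lengths` is hypothetical.

```python
import re
from typing import List

# One "motif(start-end)" segment of an MS entry, e.g. "0(0-219)".
SEGMENT = re.compile(r"(\d+)\((\d+)-(\d+)\)")


def motif_span_lengths(ms: str) -> List[int]:
    """Total bases covered by motif segments, one value per allele.

    Alleles are comma-separated; segment ends are assumed exclusive and
    segments are assumed non-overlapping, so their lengths are summed.
    """
    return [
        sum(int(end) - int(start) for _, start, end in SEGMENT.findall(allele))
        for allele in ms.split(",")
    ]


for ms in ["0(0-219),0(0-75)", "0(0-219),0(0-63)", "0(0-219),0(0-105)"]:
    ref_span, alt_span = motif_span_lengths(ms)
    print(ms, "->", ref_span - alt_span)  # 144, then 156, then 114
```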

Am I missing something?
Best

@hdashnow
Collaborator

hdashnow commented May 6, 2024

Do these mice carry a humanized HTT sequence or just mouse sequence? What is the TRGT definition for this locus?
What are the sequences of those inserted and deleted bases? What was the full allele sequence reported by TRGT?

@pailloufat-stack
Author

They carry only mouse sequence. The TRGT definition for this locus is (CCTCTG)n. Regarding the inserted sequences, are you referring to the 61 bp insertion?

Here is the full line from the original VCF (it's pretty hard to read; let me know if you want the file):

chr2 154720638 . CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTC CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGAGGTGCCACATTCACCTGGTGACCTTTTAGCTCAGGCTGTTCTCATGACTCCTGTCTTTATC,CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTC,CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGAGGTGCCACATTCACCTGGTGACCTTTTAGCTCAGGCTGTTCTCATGACTCCTGTCTTTATC,CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGAGGTGCCACATTCACCTGGTGACCTTTTAGCTCAGGCTGTTCTCATGACTCCTGTCTTTATC,CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTCTGCCTCTGCCTCTGGCTCTGGCTGTGCCTCTTTATC,CTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGCCTCTGAGGTGCCACATTCACCTGGTGACCTTTTAGCTCAGGCTGTTCTCATGACTCCTGTCTT.. AC=2,1,8,1,1,1;AN=32;END=154720856;MOTIFS=CCTCTG;SF=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15;STRUC=(CCTCTG)n;TRID=MOUSE_STR_936638 GT:ALLR:AP:MS:AL:MC:SD:AM 0/0:37-230,219-219:0.9125,0.9125:0(0-219),0(0-219):219,219:40,40:125,125:.,. 0/6:189-228,127-143:0.9125,0.551471:0(0-219),0(0-75):219,133:40,13:139,111:.,. 0/0:197-230,219-219:0.9125,0.9125:0(0-219),0(0-219):219,219:40,40:125,125:.,. 0/3:194-226,131-147:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:158,92:.,. 0/0:177-240,219-219:0.9125,0.9125:0(0-219),0(0-219):219,219:40,40:117,117:.,. 0/1:206-226,125-140:0.9125,0.492188:0(0-219),0(0-63):219,125:40,11:149,82:.,. 0/3:204-228,131-143:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:145,105:.,. 0/5:212-231,79-149:0.9125,0.868421:0(0-219),0(0-105):219,105:40,19:168,82:.,. 0/3:204-231,131-146:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:128,60:.,. 
0/3:203-230,131-144:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:109,49:.,. 0/3:198-225,130-173:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:150,100:.,. 1/2:125-150,205-233:0.492188,0.898374:0(0-63),0(0-221):125,221:11,41:102,148:.,. 0/3:205-238,129-137:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:79,28:.,. 0/3:195-228,131-143:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:137,90:.,. 0/3:205-230,131-138:0.9125,0.535714:0(0-219),0(0-75):219,137:40,13:92,76:.,. 0/4:201-241,131-143:0.9125,0.514925:0(0-219),0(0-69):219,131:40,12:132,43:.,.
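To compare the calls with what IGV shows, it may help to look at the whole-allele length (`AL`) as well as the motif spans. A sketch, assuming `AL` is the length of each allele sequence and that the reference allele length is `END - POS + 1` (219 bp here, which matches the `AL` reported for the reference alleles in this record). The helper names are hypothetical. Under these assumptions, the non-reference alleles of the three example samples would be 86, 94 and 114 bp shorter than the reference, and sample 12's alleles would be 94 bp shorter and 2 bp longer.

```python
from typing import Dict, List, Optional

FORMAT = "GT:ALLR:AP:MS:AL:MC:SD:AM"
POS, END = 154720638, 154720856  # POS and INFO/END of the record above
REF_LEN = END - POS + 1  # 219 bp (assumed: END is inclusive)


def parse_sample(column: str, fmt: str = FORMAT) -> Dict[str, str]:
    """Map FORMAT keys to the values of one sample column."""
    return dict(zip(fmt.split(":"), column.split(":")))


def allele_length_changes(column: str) -> List[Optional[int]]:
    """Per-allele length change relative to the reference (None if AL is '.')."""
    fields = parse_sample(column)
    return [None if al == "." else int(al) - REF_LEN
            for al in fields["AL"].split(",")]


columns = [
    "0/6:189-228,127-143:0.9125,0.551471:0(0-219),0(0-75):219,133:40,13:139,111:.,.",
    "0/1:206-226,125-140:0.9125,0.492188:0(0-219),0(0-63):219,125:40,11:149,82:.,.",
    "0/5:212-231,79-149:0.9125,0.868421:0(0-219),0(0-105):219,105:40,19:168,82:.,.",
    "1/2:125-150,205-233:0.492188,0.898374:0(0-63),0(0-221):125,221:11,41:102,148:.,.",
]
for col in columns:
    print(parse_sample(col)["GT"], allele_length_changes(col))
# 0/6 [0, -86]
# 0/1 [0, -94]
# 0/5 [0, -114]
# 1/2 [-94, 2]
```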

@egor-dolzhenko
Collaborator

Thank you for reporting this. Would you be open to sharing BAM slices containing this repeat for these three samples? If yes, please feel free to share them by email.

@pailloufat-stack
Author

pailloufat-stack commented May 6, 2024

I've just contacted you. Thanks!

In fact, I've noticed many discrepancies between the VCF calls and the BAM files. For example, sample 12 is called 1/2;0(0-63),0(0-221), which means it should carry two non-reference alleles, yet in the IGV track I still see the wild-type allele and the 146 bp deletion.
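Per the VCF specification, GT values are indices into the allele list: 0 is REF and k is the k-th ALT allele. A call of `1/2` therefore asserts that neither haplotype carries the reference allele, which is why it conflicts with an IGV view showing the wild-type allele. A minimal decoder, with hypothetical function names:

```python
from typing import List, Optional


def decode_gt(gt: str) -> List[Optional[int]]:
    """Split a GT value into allele indices (0 = REF, k = k-th ALT, None = missing)."""
    sep = "|" if "|" in gt else "/"
    return [None if a == "." else int(a) for a in gt.split(sep)]


def carries_reference(gt: str) -> bool:
    """True if at least one haplotype is called as the reference allele."""
    return 0 in decode_gt(gt)


for gt in ["0/6", "0/1", "1/2"]:
    status = "REF present" if carries_reference(gt) else "no REF allele"
    print(gt, decode_gt(gt), status)
# 0/6 [0, 6] REF present
# 0/1 [0, 1] REF present
# 1/2 [1, 2] no REF allele
```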
