Can you supply the instructions about how to use real-world data to train model #3

Open
Dinxin opened this issue Aug 7, 2018 · 18 comments

Comments

@Dinxin

Dinxin commented Aug 7, 2018

I read through the whole codebase and found that it only uses toy dummy data to train the model, so I don't really understand how the real data are used to train the GCN model. Could you provide the code, or instructions, for training the model on real-world data?

@ddofer

ddofer commented Aug 21, 2018

  • It's also not clear how to get predictions from the trained model on new data, e.g. a new pair of drugs. Do I put in SIDER codes? STITCH codes? Other codes? In what format?

@colinwxl

colinwxl commented Nov 2, 2018

I am also confused about how to apply the model to the real data.

@Msan1995

Msan1995 commented Nov 4, 2018

Please give instructions on how to use the actual dataset with the code. It is very difficult to understand what the variables in the dummy-data code represent.

@vidarmehr

I am trying to apply the code to the real datasets. As a first step, I checked whether I get the same network parameters (number of proteins, drugs, ...) as the paper. According to the paper, the number of proteins should be 19,085, but from the protein-protein network (bio-decagon-ppi) I get 19,081 proteins. Has anyone tried applying the code to the real dataset, and did you get the same number of proteins for the network? Thanks.
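
For anyone who wants to repeat this check, here is a minimal sketch of how the node and edge counts could be computed from a two-column edge list. It assumes the file is a CSV with one header row and the two node IDs in the first two columns; that layout is an assumption about bio-decagon-ppi, not something confirmed in this thread. The function name `count_nodes_and_edges` and the toy IDs are hypothetical.

```python
import csv
import io


def count_nodes_and_edges(csv_file, has_header=True):
    """Count distinct node IDs and distinct undirected edges in a two-column edge list."""
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row (assumed to be present)
    nodes = set()
    edges = set()
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        a, b = row[0].strip(), row[1].strip()
        nodes.update((a, b))
        # frozenset makes (a, b) and (b, a) count as the same undirected edge
        edges.add(frozenset((a, b)))
    return len(nodes), len(edges)


# Toy data standing in for the real file; the IDs are made up.
sample = io.StringIO("Gene 1,Gene 2\n1,2\n2,1\n2,3\n")
n_nodes, n_edges = count_nodes_and_edges(sample)
print(n_nodes, n_edges)
```

To run it on the real data, pass an open file handle (for example one obtained with `open(path, newline="")`) instead of the `io.StringIO` object. Whether edges should be deduplicated by direction is a design choice; the paper's counting convention is not stated here.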

@bbjy

bbjy commented Feb 26, 2019

I am also confused about how to apply the model to the real data. Has anyone solved the problem? Thanks.

@westzhicanchen

Same problem for me, not quite sure how to apply that.

@bbjy

bbjy commented Apr 3, 2019

@vidarmehr I also get 19,081 proteins from the protein-protein network (bio-decagon-ppi), and 1,317 side effects rather than the 1,318 mentioned in the paper. Do your other parameters (number of proteins, drugs, ...) match these? Thanks.

@chao1224

Any updates? Same issue here. We want to reproduce the paper's results.

@vidarmehr

@chao1224 I was not able to reproduce the paper's results, and I decided to stop working on Decagon for now.

@vidarmehr

> @vidarmehr I also get 19,081 proteins from the protein-protein network (bio-decagon-ppi), and 1,317 side effects rather than the 1,318 mentioned in the paper. Do your other parameters (number of proteins, drugs, ...) match these? Thanks.

Sorry for the delay; I just saw your comment. As I mentioned, I am not working on Decagon anymore. Here are the numbers from the paper next to the ones I got from the real datasets:
Number of proteins: 19,085 (paper) vs. 19,081 (PPI data)
Number of drugs: 645 (paper) vs. 645 (polypharmacy side effect data, combo)
Number of protein-protein edges: 715,612 (paper) vs. 715,612 (PPI data)
Number of drug-drug edges: 4,651,131 (paper) vs. 4,649,441 (polypharmacy side effect data, combo)
Number of drug-protein edges: 18,596 (paper) vs. 18,690 (drug-target protein data, targets)

@bbjy

bbjy commented Jul 2, 2019

@vidarmehr I got it. Thank you so much for your reply.

@chao1224

chao1224 commented Jul 2, 2019

Thanks for the reply @vidarmehr.

Just want to quickly clarify a number:

  1. In bio-decagon-targets.csv, there are 18,690 interactions.
  2. In bio-decagon-targets-all.csv, there are 131,034 interactions, of which 112,438 are invalid (not included in the STITCH list or the gene list). That leaves 131,034 - 112,438 = 18,596 valid interactions.
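
The filtering described in item 2 can be sketched roughly as follows. This is only an illustration: which lists define "valid" is an assumption here (for instance, drug IDs taken from the combo file and gene IDs taken from the PPI file), and the function name `filter_valid_interactions` and the toy IDs are hypothetical.

```python
def filter_valid_interactions(interactions, valid_drugs, valid_genes):
    """Keep only (drug, gene) pairs whose drug and gene are both in the allowed sets."""
    valid_drugs = set(valid_drugs)
    valid_genes = set(valid_genes)
    return [
        (drug, gene)
        for drug, gene in interactions
        if drug in valid_drugs and gene in valid_genes
    ]


# Toy data; the IDs are made up and only mimic the shape of the real files.
interactions = [("CID1", "10"), ("CID1", "99"), ("CID9", "10"), ("CID2", "20")]
kept = filter_valid_interactions(interactions, {"CID1", "CID2"}, {"10", "20"})

total = len(interactions)
invalid = total - len(kept)
print(total, invalid, len(kept))

# Arithmetic from the comment above: total minus invalid gives the valid count.
print(131_034 - 112_438)
```

On the real files, `interactions` would be the rows of bio-decagon-targets-all.csv, and the two sets would come from whichever drug and gene lists are taken as the reference.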

@rubjim

rubjim commented Nov 12, 2019

Are there any updates on this issue? I was also unable to reproduce the results in the paper. They say that they only focus on predicting the 964 polypharmacy side effects that each occurred in at least 500 drug combinations. However, the data they provide is the full TWOSIDES dataset. I don't know if they filter out some side effects in the code, but I couldn't find any evidence of this.

@Dinxin
Author

Dinxin commented Nov 26, 2019

@rubjim I can only get 963 side effect types that appear in more than 500 drug combinations. I find the Decagon dataset so confusing that we could not use it in our research work.
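
For reference, a minimal sketch of this kind of filter is shown below. It counts the distinct drug pairs per side effect and keeps the side effects that reach a threshold. The row layout (drug A, drug B, side effect ID) is an assumption about the combo file, and the function name `frequent_side_effects` and its `inclusive` flag are hypothetical. The flag only makes the "at least 500" versus "more than 500" choice explicit; nothing in this thread establishes that this choice explains the 963 versus 964 difference.

```python
from collections import defaultdict


def frequent_side_effects(rows, min_pairs=500, inclusive=True):
    """Return the side effects seen in enough distinct drug pairs.

    rows: iterable of (drug_a, drug_b, side_effect) tuples.
    inclusive=True keeps effects with count >= min_pairs ("at least");
    inclusive=False keeps effects with count > min_pairs ("more than").
    """
    pairs_by_effect = defaultdict(set)
    for drug_a, drug_b, effect in rows:
        # frozenset treats (A, B) and (B, A) as the same drug combination
        pairs_by_effect[effect].add(frozenset((drug_a, drug_b)))
    if inclusive:
        return {e for e, pairs in pairs_by_effect.items() if len(pairs) >= min_pairs}
    return {e for e, pairs in pairs_by_effect.items() if len(pairs) > min_pairs}


# Toy data with a threshold of 2 instead of 500; the IDs are made up.
rows = [
    ("A", "B", "C1"),
    ("B", "A", "C1"),  # same combination as the row above
    ("A", "C", "C1"),
    ("A", "B", "C2"),
]
at_least_two = frequent_side_effects(rows, min_pairs=2, inclusive=True)
more_than_two = frequent_side_effects(rows, min_pairs=2, inclusive=False)
print(sorted(at_least_two), sorted(more_than_two))
```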

@chimkens

Was anyone ever able to reproduce the results? Or at least get it running properly?

@rubjim

rubjim commented Dec 13, 2020

@Dinxin I agree with you; that's also what I get when I filter the side effects myself. However, they claim to predict 964 side effects, which doesn't match the actual numbers in the dataset. @christina-s-wang At least I wasn't able to do it.

@maryamag85

Does no one care about the people asking for help here? I am in the same spot.

@avi-pomicell

To use this code with real data and Python 3.6, try this fork:
https://github.com/DeepVivo/decagon
