
Option to use pre-trained embeddings as initializers #107

Open
mmcenta opened this issue Dec 9, 2019 · 2 comments

Comments

@mmcenta

mmcenta commented Dec 9, 2019

I don't know whether this feature already exists, but I needed to initialize the node embeddings with some pre-trained vectors and couldn't find a way to do it (I believe the gensim library supports this).
I will probably implement this feature for my own use. If there is interest, I can send a PR and we can figure it out.
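A minimal, framework-agnostic sketch of the requested option, assuming the pre-trained vectors arrive as a plain dict from node id to a list of floats. Nodes without a pre-trained vector fall back to small random values, and every supplied vector must match the model's embedding size. `init_node_embeddings` and its parameters are hypothetical names, not part of DeepWalk or gensim.

```python
import random
from typing import Dict, Hashable, List, Sequence


def init_node_embeddings(
    nodes: Sequence[Hashable],
    pretrained: Dict[Hashable, List[float]],
    dim: int,
    seed: int = 0,
) -> Dict[Hashable, List[float]]:
    """Build an initial vector for every node (hypothetical helper).

    Nodes found in `pretrained` copy their vector; all other nodes get
    small uniform random values in [-0.5/dim, 0.5/dim).
    """
    rng = random.Random(seed)
    table: Dict[Hashable, List[float]] = {}
    for node in nodes:
        vec = pretrained.get(node)
        if vec is not None:
            # Pre-trained vectors must have exactly the model's dimension.
            if len(vec) != dim:
                raise ValueError(
                    f"node {node!r}: pre-trained vector has {len(vec)} dims, expected {dim}"
                )
            table[node] = list(vec)  # copy so training never mutates the caller's data
        else:
            # Random fallback for nodes with no pre-trained vector.
            table[node] = [(rng.random() - 0.5) / dim for _ in range(dim)]
    return table
```

The resulting table would then be copied into whatever input-vector matrix the trainer uses before training starts.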

@GTmac
Collaborator

GTmac commented Dec 9, 2019

It is not supported in DeepWalk yet. Will be happy to take a look if you send a PR :-)

One thing I am not sure about is the context vectors. Since each word has two vectors (an embedding vector and a context vector), when you load pre-trained embeddings, how are the context vectors set? If they are still randomly initialized, would that make the pre-trained embeddings less powerful?
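To make the context-vector concern concrete, here is a sketch of one skip-gram negative-sampling update for a single (node, context) pair in plain Python. It assumes the negative-sampling formulation, where the score is the sigmoid of the dot product between the input vector `u` and the context vector `v`; it is an illustration, not DeepWalk's training code, and `sgns_step` is a hypothetical name. The update to `u` is proportional to `v`, so if context vectors start at zero the first step leaves the pre-trained vector untouched, while random context vectors would push it around with noise.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sgns_step(
    u: List[float], v: List[float], label: int, lr: float
) -> Tuple[List[float], List[float]]:
    """One SGD step of skip-gram with negative sampling (sketch).

    u: input ("embedding") vector of the center node
    v: output ("context") vector of the context or negative node
    label: 1 for an observed pair, 0 for a negative sample
    Both updates use the pre-step values of u and v.
    """
    score = sigmoid(sum(a * b for a, b in zip(u, v)))
    # (label - score) is the gradient of the log-likelihood w.r.t. u.v
    g = lr * (label - score)
    new_u = [a + g * b for a, b in zip(u, v)]
    new_v = [b + g * a for a, b in zip(u, v)]
    return new_u, new_v


# Pre-trained input vector, context vector initialized to zeros.
u0 = [0.2, -0.4, 0.1]
v0 = [0.0, 0.0, 0.0]
u1, v1 = sgns_step(u0, v0, label=1, lr=0.1)
print(u1)  # unchanged: the update to u is proportional to v, which is zero
print(v1)
```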

@mmcenta
Author

mmcenta commented Dec 9, 2019

I don't think I understand your question 😅

What I mean by using pre-trained embeddings is that the context vectors are initialized to a set of pre-defined embeddings given by the user. This means the user-provided vectors must have the same dimension as the model's embeddings!

Perhaps an example will help: I am currently training a model for link prediction on the French web, and I am testing an approach in which I insert text information into the graph and then use the DeepWalk embeddings as input for the classifier. One of the ideas is to initialize each node's embedding to a text embedding, of the same dimension, of the webpage corresponding to that node.
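The thread does not say how the page text embeddings are computed. One simple option, assumed here purely for illustration, is to average pre-trained word vectors over a page's tokens, which yields a vector of the same dimension as the word vectors. `page_embedding` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional


def page_embedding(
    tokens: List[str], word_vectors: Dict[str, List[float]], dim: int
) -> Optional[List[float]]:
    """Average the word vectors of a page's tokens (illustrative choice).

    Returns None when no token has a vector, so the caller can fall back
    to a random initialization for that node.
    """
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        if len(vec) != dim:
            raise ValueError(f"word {tok!r} has {len(vec)} dims, expected {dim}")
        total = [t + x for t, x in zip(total, vec)]
        count += 1
    if count == 0:
        return None
    return [t / count for t in total]


# Hypothetical example: two pages, 2-dimensional word vectors.
word_vectors = {"paris": [1.0, 0.0], "louvre": [0.0, 1.0]}
pages = {"node_1": ["paris", "louvre"], "node_2": ["unknown"]}
init = {node: page_embedding(toks, word_vectors, dim=2) for node, toks in pages.items()}
print(init)  # {'node_1': [0.5, 0.5], 'node_2': None}
```

Here `dim` would have to equal the DeepWalk embedding size for the vectors to be usable as initializers.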

As for the implementation, I am working on it right now. I am trying to understand how you deal with walks that don't fit in memory.
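For walks that do not fit in memory, a common pattern (assumed here; not necessarily how DeepWalk implements it) is to write the walks to disk and stream them back through a restartable iterable. A restartable iterable matters because word2vec trainers such as gensim's pass over the corpus more than once (a vocabulary scan, then the training epochs), so a one-shot generator would be exhausted after the first pass. All names below are hypothetical.

```python
import os
import random
import tempfile
from typing import Dict, Iterator, List


def random_walk(graph: Dict[str, List[str]], start: str, length: int,
                rng: random.Random) -> List[str]:
    """Uniform random walk of at most `length` nodes (stops at dead ends)."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def write_walks(graph: Dict[str, List[str]], path: str, walks_per_node: int,
                length: int, seed: int = 0) -> None:
    """Write one space-separated walk per line; node ids must contain no whitespace."""
    rng = random.Random(seed)
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(walks_per_node):
            for node in graph:
                f.write(" ".join(random_walk(graph, node, length, rng)) + "\n")


class WalkCorpus:
    """Restartable iterable: every `for` loop re-reads the file from the start."""

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self) -> Iterator[List[str]]:
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield line.split()


graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
path = os.path.join(tempfile.mkdtemp(), "walks.txt")
write_walks(graph, path, walks_per_node=2, length=5)
corpus = WalkCorpus(path)
print(sum(1 for _ in corpus), sum(1 for _ in corpus))  # 6 6: the corpus can be iterated twice
```

Only one walk is held in memory at a time, both while writing and while reading.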

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants