Can someone explain this line? #21

Open
teucer opened this issue Jul 9, 2018 · 4 comments

Comments

@teucer

teucer commented Jul 9, 2018

If my understanding is correct, this line finds the positions that hold the delimiter and keeps only those. How does this help with training?

clf_h = clf_h[flat == self.clf_token, :]

@rodgzilla
Contributor

When the information reaches the classification head, there is one vector of dimension n_embd associated with each position of each input. If you want a single prediction per input (as is the case for classification tasks), you have to select one of these vectors.

Because the transformer network is auto-regressive, only the rightmost position has attended to the whole input, so that is the value to select. It corresponds to clf_token, since the inputs are created like this:

x12 = [start] + x1[:max_len] + [delimiter] + x2[:max_len] + [clf_token]
x13 = [start] + x1[:max_len] + [delimiter] + x3[:max_len] + [clf_token]
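
The selection in the line from the question can be reproduced on a toy example. This is a minimal sketch using NumPy in place of the real tensors; the sizes, the token ids, and the dummy hidden states `h` are assumptions made up for illustration, and the position-embedding channel of the real input is left out.

```python
import numpy as np

# Hypothetical sizes and token ids, chosen only for illustration.
n_batch, n_ctx, n_embd = 2, 5, 4
start, delimiter, clf_token = 100, 101, 102

# Token ids for two inputs; each one ends with clf_token.
x = np.array([
    [start, 7, delimiter, 8, clf_token],
    [start, 3, delimiter, 9, clf_token],
])

# Dummy stand-in for the transformer output: one n_embd vector per position.
h = np.arange(n_batch * n_ctx * n_embd, dtype=np.float32)
h = h.reshape(n_batch, n_ctx, n_embd)

# Flatten positions, then keep only the rows whose token is clf_token.
clf_h = h.reshape(-1, n_embd)
flat = x.reshape(-1)
clf_h = clf_h[flat == clf_token, :]

print(clf_h.shape)  # one vector per input
```

Each input contains clf_token exactly once, so the boolean mask keeps exactly one row per input, and here that row is the hidden state at the last position.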

@teucer
Author

teucer commented Jul 10, 2018

@rodgzilla Thanks a lot for the explanation. It makes a lot of sense! Out of curiosity, why can't all the values be used?

@thomwolf
Member

Well, for a classifier we usually want a fixed-length representation of the sentence, so we can't really use a varying number of values. Starting from that, the last hidden state is the most logical summary of the sentence. But there are other possible options of course, feel free to try your ideas!
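
To make the fixed-length point concrete, here is a small sketch comparing the last-hidden-state selection with one possible alternative, a masked mean over the real tokens. The mean pooling is only an illustration of "other possible options", not something this repository does, and the token ids, padding id, and random hidden states are all assumptions.

```python
import numpy as np

n_embd = 3
clf_token, pad = 102, 0  # hypothetical ids

# Two inputs of different true length, padded to the same n_ctx.
x = np.array([
    [5, 6, clf_token, pad, pad],
    [5, 6, 7, 8, clf_token],
])
mask = (x != pad).astype(np.float32)  # 1.0 for real tokens, 0.0 for padding

rng = np.random.default_rng(0)
h = rng.standard_normal((2, 5, n_embd)).astype(np.float32)  # dummy hidden states

# Option discussed in this thread: the hidden state at clf_token.
last = h[x == clf_token]

# Alternative (assumption, not from the repository): mean over real tokens.
mean = (h * mask[:, :, None]).sum(axis=1) / mask.sum(axis=1, keepdims=True)

print(last.shape, mean.shape)  # both have one n_embd vector per input
```

Both options give one vector of size n_embd per input regardless of how many real tokens it has, which is what the classifier needs.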

@mehdimashayekhi

mehdimashayekhi commented Jul 19, 2018

In the original OpenAI code (https://github.com/openai/finetune-transformer-lm/blob/bd1cf7d678926041e6d19193cab7e5cd8ce2fce6/train.py#L191), in the model function in train.py, there is this line: clf_logits = clf(clf_h, 1, train=train). Why is ny 1? Shouldn't it be 2, since we have two classes? Is there a reason to use 1 and then later reshape the second dimension of the logits to 2? I'd really appreciate your help.
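
One plausible reading, consistent with the x12/x13 construction quoted earlier in this thread: each example is fed as two candidate sequences, the head gives each candidate a single score (hence ny = 1), and the reshape puts the two scores of the same example side by side so a softmax can compare them. The sketch below only illustrates that shape logic with NumPy; the sizes, the random weights, and the plain linear layer standing in for clf are assumptions.

```python
import numpy as np

n_batch, n_choices, n_embd = 3, 2, 4  # hypothetical sizes

rng = np.random.default_rng(0)
# One clf_token vector per candidate sequence; the two candidates of an
# example are assumed to sit in consecutive rows.
clf_h = rng.standard_normal((n_batch * n_choices, n_embd))

# Stand-in for clf(clf_h, 1): a linear layer with a single output unit.
w = rng.standard_normal((n_embd, 1))
b = np.zeros(1)
scores = clf_h @ w + b  # one score per candidate

# Group the two candidate scores of each example into one row.
clf_logits = scores.reshape(-1, n_choices)

# Softmax across the two candidates of each example.
shifted = clf_logits - clf_logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(scores.shape, clf_logits.shape)
```

Under this reading the "two classes" are the two candidates rather than two outputs of one sequence, so the same scoring weights are shared across both candidates.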
