Relabeling the buffer with updated reward - potential bug? #7

Open
vrn25 opened this issue Feb 22, 2023 · 0 comments

Hi,

@pokaxpoka
The function that relabels the data in the buffer with an updated reward function is relabel_with_predictor. It uses self.idx to compute total_iter. Once the replay buffer has filled to capacity, self.idx wraps around and starts again from 0, so total_iter only accounts for the samples up to the current write position. We would still want to relabel every sample in the buffer with the updated reward function, but the current code (line 72) does not allow this.
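To make the problem concrete, here is a small sketch of the batch-count arithmetic on its own. The sizes (capacity of 1000, write index of 150 after wrap-around) are made-up numbers for illustration, and batches_from_idx is a hypothetical helper that only mirrors the idx-based computation described above, not the repository's function.

```python
import math


def batches_from_idx(idx, batch_size):
    """Batch count derived from the write index, as described in this issue."""
    total_iter = int(idx / batch_size)
    if idx > batch_size * total_iter:
        total_iter += 1  # one extra batch for the partial remainder
    return total_iter


# Hypothetical sizes, chosen only for illustration.
capacity, batch_size = 1000, 200
idx_after_wrap = 150  # buffer filled once, then the write index wrapped around

# Upper bound on how many samples an idx-based loop can touch.
covered = batches_from_idx(idx_after_wrap, batch_size) * batch_size
# Batches needed to touch every slot of a full buffer.
needed = math.ceil(capacity / batch_size)

print(f"idx-based batches: {batches_from_idx(idx_after_wrap, batch_size)}")
print(f"samples covered at most: {covered} of {capacity}")
print(f"batches needed for the full buffer: {needed}")
```

With these numbers the idx-based count gives a single batch, while a full buffer needs five, so most of the stored samples would keep their old reward labels.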

Maybe this should work:

import math

def relabel_with_predictor(self, predictor):
    batch_size = 200
    if self.full:  # buffer has wrapped around, so every slot holds a sample
        total_iter = math.ceil(self.capacity / batch_size)  # line added
    else:
        total_iter = int(self.idx / batch_size)
        # round up to cover a final partial batch
        if self.idx > batch_size * total_iter:
            total_iter += 1
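
For reference, a self-contained sketch of the same idea on a toy buffer. Everything here is an assumption made for illustration: ToyBuffer is a hypothetical stand-in that only keeps the fields named above (idx, full, capacity), and predict_fn is a placeholder callable that maps a batch of concatenated observation-action rows to one reward per row. It is not the repository's buffer or predictor API. The point it shows is that the number of valid samples, not the write index, should bound both the batch count and the slice ends.

```python
import numpy as np


class ToyBuffer:
    """Hypothetical stand-in for the replay buffer (illustration only)."""

    def __init__(self, obs_dim, action_dim, capacity):
        self.capacity = capacity
        self.obses = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.rewards = np.zeros((capacity, 1), dtype=np.float32)
        self.idx = 0        # next write position, wraps around at capacity
        self.full = False   # becomes True once the buffer has wrapped

    def add(self, obs, action, reward):
        self.obses[self.idx] = obs
        self.actions[self.idx] = action
        self.rewards[self.idx] = reward
        self.idx = (self.idx + 1) % self.capacity
        self.full = self.full or self.idx == 0

    def relabel_with_predictor(self, predict_fn, batch_size=200):
        # Number of slots that actually hold samples.
        n_valid = self.capacity if self.full else self.idx
        for start in range(0, n_valid, batch_size):
            end = min(start + batch_size, n_valid)
            inputs = np.concatenate(
                [self.obses[start:end], self.actions[start:end]], axis=-1
            )
            # predict_fn is assumed to return an array of shape (end - start, 1).
            self.rewards[start:end] = predict_fn(inputs)


# Usage: overfill a small buffer so idx wraps, then relabel everything.
buf = ToyBuffer(obs_dim=3, action_dim=2, capacity=10)
for _ in range(13):
    buf.add(np.zeros(3), np.zeros(2), 0.0)
buf.relabel_with_predictor(lambda x: np.ones((len(x), 1)), batch_size=4)
```

Note that if the rest of the real function bounds its slices by self.idx, changing total_iter alone may not be enough; that part of the code is not shown here, so this is only a guess about what else would need the same treatment.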