Relabeling the buffer with updated reward - potential bug? #7

vrn25 · 2023-02-22T20:59:45Z

Hi,

@pokaxpoka
The function that relabels the data in the buffer with an updated reward function is relabel_with_predictor, and it uses self.idx to compute total_iter. Once the replay buffer is full, self.idx starts again from 0, because the buffer is cyclic. We would still want to relabel every sample in the buffer with the updated reward function, but the current code (line 72) does not do this. It computes total_iter as ceil(self.idx / batch_size), so only the batches starting at index 0 and covering up to self.idx are relabeled. The remaining transitions keep rewards from the previous reward function.
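
To make the effect concrete, here is a small standalone sketch. It is not code from the repository. It advances a cyclic write pointer the way a circular buffer typically does, and it assumes the common update `idx = (idx + 1) % capacity; full = full or idx == 0`. It then applies the current iteration-count formula. The helper names `simulate_adds` and `total_iter_original` are hypothetical.

```python
import math


def simulate_adds(capacity: int, n_adds: int) -> tuple:
    """Advance a cyclic write pointer (assumed update rule) n_adds times."""
    idx, full = 0, False
    for _ in range(n_adds):
        idx = (idx + 1) % capacity
        full = full or idx == 0  # becomes True the first time the pointer wraps
    return idx, full


def total_iter_original(idx: int, batch_size: int = 200) -> int:
    """Iteration count derived only from the write pointer (current behaviour)."""
    total_iter = int(idx / batch_size)
    if idx > batch_size * total_iter:
        total_iter += 1
    return total_iter


capacity, batch_size = 1000, 200
idx, full = simulate_adds(capacity, 1100)
iters = total_iter_original(idx, batch_size)
# Upper bound on how many stored transitions the loop can reach, starting at index 0
covered = min(iters * batch_size, capacity)
print(idx, full, iters, covered)  # 100 True 1 200
```

With a capacity of 1,000 and 1,100 insertions, the pointer has wrapped to 100. The loop therefore runs once, and at most 200 of the 1,000 stored transitions get new rewards.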

Maybe this should work:

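```python
import math

def relabel_with_predictor(self, predictor):
    batch_size = 200
    if self.full:  # the buffer has wrapped around, so all `capacity` slots hold data
        total_iter = math.ceil(self.capacity / batch_size)  # line added
    else:  # only slots [0, self.idx) have been written so far
        total_iter = int(self.idx / batch_size)
        # round up so that a final partial batch is also relabeled
        if self.idx > batch_size * total_iter:
            total_iter += 1
```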
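
Below is a self-contained sketch of the same idea. It uses a hypothetical `ToyReplayBuffer` whose field names (`idx`, `full`, `capacity`) mirror the issue. The array layout and the `predictor` callable are assumptions, not the repository's API. Here `predictor` maps a batch of concatenated observation-action rows to an `(n, 1)` reward array. The sketch derives a single count of valid transitions and uses it for both the loop and the slice bounds.

```python
import numpy as np


class ToyReplayBuffer:
    """Hypothetical minimal circular buffer; field names mirror the issue."""

    def __init__(self, capacity: int, obs_dim: int, action_dim: int):
        self.capacity = capacity
        self.obses = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.rewards = np.zeros((capacity, 1), dtype=np.float32)
        self.idx = 0
        self.full = False

    def add(self, obs, action, reward):
        self.obses[self.idx] = obs
        self.actions[self.idx] = action
        self.rewards[self.idx] = reward
        # Cyclic write pointer: wraps to 0 and marks the buffer as full
        self.idx = (self.idx + 1) % self.capacity
        self.full = self.full or self.idx == 0

    def relabel_with_predictor(self, predictor, batch_size: int = 200):
        # Number of valid transitions: the whole array once the pointer has wrapped
        n_valid = self.capacity if self.full else self.idx
        for start in range(0, n_valid, batch_size):
            end = min(start + batch_size, n_valid)  # last batch may be partial
            inputs = np.concatenate(
                [self.obses[start:end], self.actions[start:end]], axis=-1
            )
            self.rewards[start:end] = predictor(inputs)


buf = ToyReplayBuffer(capacity=1000, obs_dim=3, action_dim=1)
for t in range(1100):
    buf.add(np.full(3, t, dtype=np.float32), np.zeros(1, dtype=np.float32), 0.0)
buf.relabel_with_predictor(lambda x: np.ones((len(x), 1), dtype=np.float32))
print(buf.idx, buf.full, int(buf.rewards.sum()))  # 100 True 1000
```

Because one `n_valid` drives both the loop range and each batch's end index, the two stay consistent. Suppose the existing loop clamps each batch's end index to `self.idx`. In that case the clamp would need the same treatment once the buffer is full; otherwise the extra iterations would produce truncated or empty slices.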
