n_samples changes the shape of plot #15

Open
giangbang opened this issue Dec 8, 2022 · 1 comment
@giangbang

Hello @wookayin, I'm using expt in one of my projects and I recently noticed what is probably a bug when plotting the learning results of an RL agent. Here is an image of the plot: both panels actually use the same data; the only difference is the value of n_samples. There is a drop in performance on the left, but merely changing the sampling frequency eliminates it. I suspect the problem is that interpolation is carried out before the rolling step, and strange things can happen when the sampling frequency interacts with the frequency of the data, which can be the case in some rare situations.
[Figure: the same data plotted with two different n_samples values; the left plot shows a performance dip]

The code for reproducing the plot can be found here.

(I'm using the latest commit of expt.)

@wookayin
Owner

wookayin commented Dec 9, 2022

Hi @giangbang, thanks for using expt and for reporting the issue with full details on how to reproduce it.

I think your data fluctuates a lot: the success rate is either 0 or 1, with no averaging/smoothing applied to the raw data of this series:

[Figure: raw data of the series, with the success rate fluctuating between 0 and 1]

Therefore, depending on the sampling frequency, we can observe such "aliasing artifacts" when the correlation between the data and the axis (global_step) is high. If the sampling frequency is unlucky, it ends up sampling more "0" data points (hence the dip). As you correctly guessed, the main reason for this very noticeable difference is that the data is interpolated after re-sampling, and only then smoothed via df.rolling.

[Figure: plot illustrating the aliasing effect of re-sampling before smoothing]
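For illustration only (this is not the reproduction code linked above, and it uses plain pandas/numpy rather than expt's internals), here is a minimal sketch of how subsampling a rapidly fluctuating 0/1 series can alias when the stride lines up with the data's period; the data generation is made up for the example, and re-sampling is simplified to picking every k-th row rather than interpolating onto a new grid:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw log: a success rate that flips between 0 and 1
# in blocks, so it is highly correlated with the x-axis (global_step).
steps = np.arange(1000)
raw = pd.DataFrame({
    "global_step": steps,
    "success_rate": (steps // 10) % 2,  # 10 zeros, 10 ones, repeated
})

# Re-sample first, then smooth.  Here the stride (20) equals the period
# of the data, so every sampled point lands on a "0" block: aliasing.
n_samples = 50
stride = len(raw) // n_samples            # = 20
subsampled = raw.iloc[::stride]
smoothed = subsampled["success_rate"].rolling(10, min_periods=1).mean()

print(raw["success_rate"].mean())   # 0.5 -- the true average
print(smoothed.mean())              # 0.0 -- the aliased estimate
```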

My suggestion is to smooth your log data when writing the scalar data, to avoid the aliasing problem, or to pre-process the data via expt (which can be a bit more flexible); either way makes the subsampled data statistically stable and consistent:

[Figure: the same plot after pre-smoothing the data, without the aliasing dip]
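As a rough sketch of the pre-smoothing suggestion (again plain pandas on the toy data above, not expt's own pre-processing API), smoothing before subsampling makes each sampled point a local average, so the 0/1 aliasing disappears:

```python
# Continuing the toy example: smooth BEFORE subsampling.
presmoothed = raw.copy()
presmoothed["success_rate"] = (
    raw["success_rate"].rolling(50, min_periods=1).mean()
)
subsampled_stable = presmoothed.iloc[::stride]

# Each sampled point is now an average over up to 50 raw points, so the
# subsampled curve hovers around 0.5 instead of collapsing to 0.
print(subsampled_stable["success_rate"].mean())   # ~0.5
```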

One downside of this is that we smooth the data before subsampling, so we end up doing rolling-window smoothing twice; subsampling exists both to speed things up (by reducing the number of rows in the dataframe) and to "smooth" the curve for a better look. In your particular case you may want the data subsampled BUT not interpolated. For this, I am also going to add a more flexible API that allows subsampling the data without linear interpolation.
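Until such an API exists, a possible workaround (a hypothetical helper, not part of expt) is to reduce the raw dataframe yourself by keeping only every k-th logged row, which subsamples without inventing interpolated values; note that on raw 0/1 data this alone does not remove the aliasing unless the data is also smoothed first:

```python
def subsample_without_interpolation(df: pd.DataFrame, n_samples: int) -> pd.DataFrame:
    """Keep roughly n_samples existing rows instead of interpolating onto a new grid.

    Every returned row is a real logged data point; only the density of
    points is reduced, so no values are fabricated between logging steps.
    """
    stride = max(len(df) // n_samples, 1)
    return df.iloc[::stride].reset_index(drop=True)

# Example: hand the reduced frame to whatever plotting code you use.
reduced = subsample_without_interpolation(presmoothed, n_samples=50)
```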
