n_samples changes the shape of plot #15

Open
giangbang opened this issue Dec 8, 2022 · 1 comment
@giangbang

Hello @wookayin, I'm using expt in one of my projects and I recently noticed what is probably a bug when plotting the learning results of an RL agent. Here is an image of the plot: both panels actually use the same data; the only difference is the value of n_samples. There is a drop in performance on the left, but merely changing the sampling frequency eliminates it. I suspect the problem is that interpolation is carried out before the rolling step, and strange things can happen when the sampling frequency interacts with the frequency of the data, which can be the case in some rare situations.
[Figure: the same data plotted with two different n_samples values; the left plot shows a performance dip]

The code for reproducing the plot can be found here.

(I'm using the latest commit of expt.)

@wookayin
Owner

wookayin commented Dec 9, 2022

Hi @giangbang, thanks for using expt and for reporting the issue with full details on how to reproduce it.

I think your data fluctuates a lot: the success rate is either 0 or 1, with no averaging/smoothing applied to the raw data of this series:

[Figure: raw data of the series, with the success rate fluctuating between 0 and 1]

Therefore, depending on the sampling frequency, we can observe such "aliasing artifacts" when the correlation between the data and the axis (global_step) is high. If the sampling frequency is unlucky, it ends up sampling more "0" data points (hence the dip). As you correctly guessed, the main reason for this very noticeable difference is that the data is interpolated after re-sampling, and only then smoothed via df.rolling.

[Figure: plot illustrating the aliasing effect of re-sampling before smoothing]
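For illustration only (this is not the reproduction code linked above, and it uses plain pandas/numpy rather than expt's internals), here is a minimal sketch of how subsampling a rapidly fluctuating 0/1 series can alias when the stride lines up with the data's period; the data generation is made up for the example, and re-sampling is simplified to picking every k-th row rather than interpolating onto a new grid:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw log: a success rate that flips between 0 and 1
# in blocks, so it is highly correlated with the x-axis (global_step).
steps = np.arange(1000)
raw = pd.DataFrame({
    "global_step": steps,
    "success_rate": (steps // 10) % 2,  # 10 zeros, 10 ones, repeated
})

# Re-sample first, then smooth.  Here the stride (20) equals the period
# of the data, so every sampled point lands on a "0" block: aliasing.
n_samples = 50
stride = len(raw) // n_samples            # = 20
subsampled = raw.iloc[::stride]
smoothed = subsampled["success_rate"].rolling(10, min_periods=1).mean()

print(raw["success_rate"].mean())   # 0.5 -- the true average
print(smoothed.mean())              # 0.0 -- the aliased estimate
```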

My suggestion is to smooth your log data when writing the scalar data, to avoid the aliasing problem, or to pre-process the data via expt (which can be a bit more flexible); either way makes the subsampled data statistically stable and consistent:

[Figure: the same plot after pre-smoothing the data, without the aliasing dip]
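As a rough sketch of the pre-smoothing suggestion (again plain pandas on the toy data above, not expt's own pre-processing API), smoothing before subsampling makes each sampled point a local average, so the 0/1 aliasing disappears:

```python
# Continuing the toy example: smooth BEFORE subsampling.
presmoothed = raw.copy()
presmoothed["success_rate"] = (
    raw["success_rate"].rolling(50, min_periods=1).mean()
)
subsampled_stable = presmoothed.iloc[::stride]

# Each sampled point is now an average over up to 50 raw points, so the
# subsampled curve hovers around 0.5 instead of collapsing to 0.
print(subsampled_stable["success_rate"].mean())   # ~0.5
```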

One downside of this is that we smooth the data before subsampling, so we end up doing rolling-window smoothing twice; subsampling exists both to speed things up (by reducing the number of rows in the dataframe) and to "smooth" the curve for a better look. In your particular case you may want the data subsampled BUT not interpolated. For this, I am also going to add a more flexible API that allows subsampling the data without linear interpolation.
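Until such an API exists, a possible workaround (a hypothetical helper, not part of expt) is to reduce the raw dataframe yourself by keeping only every k-th logged row, which subsamples without inventing interpolated values; note that on raw 0/1 data this alone does not remove the aliasing unless the data is also smoothed first:

```python
def subsample_without_interpolation(df: pd.DataFrame, n_samples: int) -> pd.DataFrame:
    """Keep roughly n_samples existing rows instead of interpolating onto a new grid.

    Every returned row is a real logged data point; only the density of
    points is reduced, so no values are fabricated between logging steps.
    """
    stride = max(len(df) // n_samples, 1)
    return df.iloc[::stride].reset_index(drop=True)

# Example: hand the reduced frame to whatever plotting code you use.
reduced = subsample_without_interpolation(presmoothed, n_samples=50)
```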
