Temporal CV #62
A standalone temporal CV would definitely fit here, yes. There are different approaches to account for this; the most common is probably clustering (with k-means as the default approach). The latter is already doable: the k-means clustering can quickly be adopted by … In the end, you want to decluster observations that are close in time, because they are naturally highly correlated with each other. This is the same issue as with observations in space. And yes, we can port over …
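A minimal base-R sketch of the clustering approach described above, assuming the time stamps are available as a `Date` vector. `temporal_cluster_folds()` is a hypothetical helper, not an mlr3 function; it runs `stats::kmeans()` on the numeric time stamps and relabels the clusters so that fold 1 is the earliest block. Each cluster is then left out once as the test set.

```r
# Hypothetical helper (not part of mlr3): k-means clustering of time stamps,
# so that observations close in time end up in the same fold.
temporal_cluster_folds = function(time, folds = 5L, seed = 1L) {
  set.seed(seed)
  # one-dimensional k-means on the numeric time stamps
  cl = stats::kmeans(as.numeric(time), centers = folds, nstart = 10L)
  # relabel clusters by the order of their centers: fold 1 = earliest block
  match(cl$cluster, order(cl$centers[, 1]))
}

# Illustrative data: 100 distinct dates within one year
set.seed(42)
dates = as.Date("2021-01-01") + sort(sample(0:364, 100))
fold = temporal_cluster_folds(dates, folds = 5L)

# Leave-one-cluster-out: each cluster is used once as the test set
test_sets = lapply(1:5, function(i) which(fold == i))
train_sets = lapply(1:5, function(i) which(fold != i))
```

The resulting `train_sets`/`test_sets` lists have the shape expected by the `rsmp("custom")` resampling shown later in this thread.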
I think what we want here is not clustering, but basically splitting train/test such that … I think @mllg wanted this as well; have you already started something there?
Ah ok, this is also an interesting approach! In … I am not sure if … An argument supporting a percentage increase could be interesting?
Nope, nothing like this exists yet; I have never had such a dataset.
Datasets:
A resource on time-series cross-validation:
I believe this would also fit nicely in mlr3. Tasks already have the column role `"order"`, which could be used in something like `ResamplingOrderedCV` or `ResamplingOrderedHoldout`.
If we already have a dedicated package for spatial and temporal CV, I'd argue it should live there, simply because users might look for it there.
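A minimal sketch of what the proposed `ResamplingOrderedHoldout` and `ResamplingOrderedCV` could compute from an ordering column. Both functions are hypothetical base-R helpers (the Resampling class names above are only proposals); the holdout trains on the earliest fraction of rows, and the CV uses forward chaining: the ordered rows are cut into contiguous blocks and each block is tested with all earlier blocks as training data.

```r
# Hypothetical helper: train on the earliest `ratio` of rows, test on the rest
ordered_holdout = function(order_col, ratio = 2/3) {
  idx = order(order_col)                 # row ids sorted by time
  n_train = floor(ratio * length(idx))
  stopifnot(n_train >= 1, n_train < length(idx))
  list(train = idx[seq_len(n_train)], test = idx[-seq_len(n_train)])
}

# Hypothetical helper: forward-chaining CV over contiguous time blocks
ordered_cv = function(order_col, folds = 5L) {
  stopifnot(folds >= 2)
  idx = order(order_col)
  # assign sorted row ids to `folds` contiguous blocks of (nearly) equal size
  block = cut(seq_along(idx), breaks = folds, labels = FALSE)
  # block 1 is never a test set: it has no earlier data to train on
  lapply(2:folds, function(i) list(
    train = idx[block < i],
    test = idx[block == i]
  ))
}

# Illustrative ordering column (e.g. a time index) for 10 rows
time = c(5, 1, 4, 2, 3, 10, 8, 6, 9, 7)
ho = ordered_holdout(time)
cv = ordered_cv(time, folds = 5L)
```

In every split, all training rows precede all test rows in time, which is the key difference from plain random CV.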
Coming back to this after a while, I now have a different view on this:
From a user point of view, task and resampling stuff could then be done with one extension package (i.e. {mlr3spatiotempcv}). Thoughts?
Oliveira et al. (2021) could be an interesting read.
I would like to postpone the implementation until after the paper has been submitted. Including it before would require introducing and discussing a somewhat distinct field, which I would like to avoid right now.
I need this kind of method to use mlr3 for EHR-based machine learning, specifically the ability to define training/test/validation sets using date-based splits. Is it possible for me to provide the splits to mlr3 and use the existing framework? I wasn't able to see how to do that in the documentation so far. It seems like I will need to use tidymodels otherwise.
Hey @ck37, if you are able to compute the indices yourself, you can already do this with the custom resampling, see (https://mlr3.mlr-org.com/reference/mlr_resamplings_custom.html):

```r
library(mlr3)

task = tsk("penguins")
task$filter(1:10)

# Instantiate Resampling
custom = rsmp("custom")
train_sets = list(1:5, 5:10)
test_sets = list(5:10, 1:5)
custom$instantiate(task, train_sets, test_sets)

custom$train_set(1)
custom$test_set(1)
```
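For the date-based EHR use case, the indices can be computed from a date column and then passed to `rsmp("custom")` as in the example above. This is a sketch under assumptions: `date_splits()` and the cutoff dates are hypothetical, and since the custom resampling only holds train/test pairs, the validation rows are returned separately for use outside of it. The mlr3 part only runs if mlr3 is installed.

```r
# Hypothetical helper: train < cutoff_valid <= validation < cutoff_test <= test
date_splits = function(dates, cutoff_valid, cutoff_test) {
  list(
    train = which(dates < cutoff_valid),
    valid = which(dates >= cutoff_valid & dates < cutoff_test),
    test  = which(dates >= cutoff_test)
  )
}

# Illustrative visit dates spread over 2020
visit_date = as.Date("2020-01-01") + c(0, 40, 80, 120, 160, 200, 240, 280, 320, 360)
s = date_splits(visit_date, as.Date("2020-07-01"), as.Date("2020-10-01"))

if (requireNamespace("mlr3", quietly = TRUE)) {
  # toy regression task whose row ids 1..10 line up with visit_date
  df = data.frame(x = seq_along(visit_date), y = rnorm(length(visit_date)))
  task = mlr3::as_task_regr(df, target = "y", id = "visits")
  custom = mlr3::rsmp("custom")
  # one train/test pair built from the date-based split
  custom$instantiate(task, list(s$train), list(s$test))
}
```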
Ah ok, awesome, appreciate the help & fast response 🙏
I currently have a task with a column that is a date.
As the task is basically to predict future values, a cross-validation strategy that takes this into account would be required, similar to RollingWindowCV.
As this is a very common use case, we should perhaps think about implementing this.
… mlr3forecasting, but for forecasting tasks instead of regular Classif|Regr tasks.
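A rolling-origin split in the spirit of the RollingWindowCV mentioned in the issue can also be computed by hand and passed to `rsmp("custom")`. This is a hypothetical base-R helper, not an existing mlr3 function: rows are sorted by a date column, each training window of `window` rows is followed by a test set of the next `horizon` rows, and the origin moves forward by `step`. With `fixed_window = FALSE` the training window expands from the first row instead.

```r
# Hypothetical rolling-origin splitter over row ids sorted by a date column
rolling_window_splits = function(dates, window = 5L, horizon = 2L, step = 1L,
                                 fixed_window = TRUE) {
  idx = order(dates)
  n = length(idx)
  stopifnot(window >= 1, horizon >= 1, window + horizon <= n)
  # positions of the last training row; a full horizon must still fit after it
  ends = seq(from = window, to = n - horizon, by = step)
  lapply(ends, function(e) {
    # fixed window: slide the start along; expanding window: always start at 1
    start = if (fixed_window) e - window + 1L else 1L
    list(train = idx[start:e], test = idx[(e + 1L):(e + horizon)])
  })
}

# Illustrative daily dates
d = as.Date("2021-01-01") + 0:9
sliding = rolling_window_splits(d, window = 5L, horizon = 2L)
expanding = rolling_window_splits(d, window = 5L, horizon = 2L, fixed_window = FALSE)

# Lists in the shape expected by custom$instantiate(task, train_sets, test_sets)
train_sets = lapply(sliding, `[[`, "train")
test_sets = lapply(sliding, `[[`, "test")
```

The fixed window keeps the training size constant (useful when older data may be outdated), while the expanding window uses all past data for each fold.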