

[Data] Method to iterate over GroupedData #47744

Open
dominikgrygiel opened this issue Sep 19, 2024 · 0 comments
Labels
enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component)


@dominikgrygiel

Description

It would be great to have something like GroupedData.iter_groups to iterate over the groups after groupby operation.

This might be a duplicate of #42228; I feel like we are trying to achieve similar functionality, but with a different approach.

Use case

Similar to #42228, I would like to perform some extra operations on the grouped data and persist the results to disk or a database. Currently, I'm doing something like the following, but it feels "wrong":

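```python
import pandas as pd
# Assumes `ds` is an existing ray.data.Dataset.

def _do_something(group: pd.DataFrame) -> pd.DataFrame:
    # Do some operations, persist to disk/database, etc.

    return group.head(0)  # Not sure if that is actually helping or would returning `group` be better

# batch_format="pandas" passes each group as a pd.DataFrame (the default is a dict of NumPy arrays)
ds.groupby("column").map_groups(_do_something, batch_format="pandas").materialize()
```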
@dominikgrygiel dominikgrygiel added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Sep 19, 2024