Store min/max stats per column per partition #11083

Open
1 of 6 tasks
sopel39 opened this issue Sep 5, 2024 · 3 comments
Labels
proposal Iceberg Improvement Proposal (spec/major changes/etc)

Comments

@sopel39

sopel39 commented Sep 5, 2024

Proposed Change

At the moment, the partition statistics in the spec (https://iceberg.apache.org/spec/#partition-statistics) don't contain min/max stats per column. Because of that, engines (e.g. Trino: https://github.com/trinodb/trino/blob/master/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java#L158) need to read manifest files to compute min/max stats per column. Keeping min/max stats at the partition level would save the time spent enumerating manifest files during planning. This is especially important for high-concurrency workloads and large-scale tables.
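To make the saving concrete, here is a minimal Python sketch, assuming a hypothetical per-partition record that carries lower/upper bounds for each column. The current partition statistics spec has no such fields, and `PartitionColumnStats` and `merge_column_bounds` are illustrative names, not Iceberg APIs. Table-level bounds then come from folding one stats row per partition rather than one manifest entry per data file.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class PartitionColumnStats:
    """Hypothetical per-partition bounds for one column (not in the current spec)."""
    lower_bound: Optional[Any]
    upper_bound: Optional[Any]


def merge_column_bounds(
    partitions: Iterable[Dict[str, PartitionColumnStats]],
) -> Dict[str, Tuple[Optional[Any], Optional[Any]]]:
    """Fold per-partition bounds into table-level (lower, upper) per column.

    Touches one stats row per partition instead of one manifest entry per data file.
    A None bound (e.g. a column that is all-null in that partition) is skipped.
    """
    merged: Dict[str, Tuple[Optional[Any], Optional[Any]]] = {}
    for partition in partitions:
        for column, stats in partition.items():
            lo, hi = merged.get(column, (None, None))
            if stats.lower_bound is not None:
                lo = stats.lower_bound if lo is None else min(lo, stats.lower_bound)
            if stats.upper_bound is not None:
                hi = stats.upper_bound if hi is None else max(hi, stats.upper_bound)
            merged[column] = (lo, hi)
    return merged


if __name__ == "__main__":
    parts = [
        {"price": PartitionColumnStats(5, 40), "qty": PartitionColumnStats(1, 3)},
        {"price": PartitionColumnStats(2, 25), "qty": PartitionColumnStats(None, None)},
    ]
    print(merge_column_bounds(parts))  # {'price': (2, 40), 'qty': (1, 3)}
```

Since Iceberg column bounds may be truncated (for example for long strings), merged values like these are best treated as lower/upper bounds rather than exact min/max.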

Proposal document

No response

Specifications

  • Table
  • View
  • REST
  • Puffin
  • Encryption
  • Other
sopel39 added the proposal Iceberg Improvement Proposal (spec/major changes/etc) label Sep 5, 2024
@sopel39
Author

sopel39 commented Sep 5, 2024

This should probably be part of #8450.

cc @raunaqmorarka @lxynov

@guykhazma
Contributor

@sopel39 Adding to the above, we can also store null_counts. See the more detailed discussion here.
Null counts stored in the partition stats can be scaled at runtime (alternatively, on-the-fly collection can be used).
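The comment doesn't spell out what scaling means. One plausible reading, assumed here, is that when partition stats were computed on an older snapshot, the engine applies the stored null fraction to the partition's current row count. A minimal sketch, where `scale_null_count` is a hypothetical helper and not part of any Iceberg or Trino API:

```python
from typing import Optional


def scale_null_count(
    stats_null_count: int,
    stats_record_count: int,
    current_record_count: int,
) -> Optional[int]:
    """Hypothetical helper: extrapolate a partition's null count to its current row count.

    Assumes the null fraction observed when the stats were written still holds.
    Returns None when the stats saw no rows, since there is no fraction to scale.
    """
    if stats_record_count <= 0:
        return None
    # Multiply before dividing to limit floating-point rounding error.
    return round(stats_null_count * current_record_count / stats_record_count)


if __name__ == "__main__":
    # Stats saw 50 nulls in 1,000 rows; the partition now holds 1,500 rows.
    print(scale_null_count(50, 1_000, 1_500))  # 75
```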

@lxynov
Copy link
Contributor

lxynov commented Sep 5, 2024

+1 on this. Min/max values are needed by the CBO to estimate the selectivity of range filters.
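As an illustration of how a cost-based optimizer might use these bounds, here is the classic uniform-distribution estimate for a range predicate `lo <= col <= hi`: the overlap of the predicate range with `[min, max]`, divided by the width of `[min, max]`. This is a textbook heuristic, not necessarily what Trino's CBO implements, and `range_selectivity` is a hypothetical name.

```python
from typing import Optional


def range_selectivity(
    col_min: float,
    col_max: float,
    lo: Optional[float] = None,
    hi: Optional[float] = None,
) -> float:
    """Estimate the fraction of rows satisfying lo <= col <= hi.

    Assumes values are spread uniformly over [col_min, col_max]; None means unbounded.
    """
    if col_max < col_min:
        raise ValueError("col_max must be >= col_min")
    # Clip the predicate range to the column's known bounds.
    start = col_min if lo is None else max(lo, col_min)
    end = col_max if hi is None else min(hi, col_max)
    if end < start:
        return 0.0  # predicate range lies entirely outside [col_min, col_max]
    if col_max == col_min:
        return 1.0  # single-valued column, and the predicate covers that value
    return (end - start) / (col_max - col_min)


if __name__ == "__main__":
    print(range_selectivity(0, 100, lo=25))     # 0.75
    print(range_selectivity(0, 100, hi=10))     # 0.1
    print(range_selectivity(0, 100, 150, 200))  # 0.0
```

Skewed data breaks the uniformity assumption, and equality predicates usually need a different estimate (typically based on distinct-value counts), so this only covers ranges.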


3 participants