Dataframe API improvements #7455
Labels: enhancement (New feature or request), 🐍 Python API (Python logging API), ⛃ re_datastore (affects the datastore itself)
Notes from exploration:
ComponentSelector
Proposals
Start with a Python refinement, then back-propagate it into Rust if we like it.
Selections
The Python Dataset object will internally track a set of columns that will be used for all queries, along with an Arc<ChunkStore>.
Introduce new select_ variant APIs on the Dataset:
dataset.select_entities(expr: str) -> Dataset
dataset.select_components(components: Sequence[ComponentLike]) -> Dataset
dataset.select_columns(column_selectors: Sequence[ColumnSelector]) -> Dataset
Each of these has the potential to strictly filter/mutate the active set of descriptors relative to the previous step. That is, the first selection is made from the complete set, and each incremental selection only selects from the remaining set.
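The narrowing behaviour can be sketched with a toy model. This is a minimal illustration, not the proposed implementation: ColumnDescriptor is a hypothetical stand-in, the Arc<ChunkStore> is left out entirely, components are plain strings, and the entity expression is assumed to be a shell-style glob because the real expression syntax is not specified in these notes.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Sequence


@dataclass(frozen=True)
class ColumnDescriptor:
    """Hypothetical stand-in for a column descriptor."""

    entity_path: str
    component: str


@dataclass(frozen=True)
class Dataset:
    """Toy dataset that only tracks the active set of column descriptors."""

    columns: tuple[ColumnDescriptor, ...]

    def select_entities(self, expr: str) -> Dataset:
        # Assumption: `expr` is matched as a shell-style glob.
        kept = tuple(c for c in self.columns if fnmatchcase(c.entity_path, expr))
        return Dataset(kept)

    def select_components(self, components: Sequence[str]) -> Dataset:
        # Only columns still in the active set can be selected.
        wanted = set(components)
        return Dataset(tuple(c for c in self.columns if c.component in wanted))


full = Dataset(
    (
        ColumnDescriptor("/world/points", "Position3D"),
        ColumnDescriptor("/world/points", "Color"),
        ColumnDescriptor("/camera/image", "Color"),
    )
)

# Each step returns a new Dataset whose active set is a subset of the previous one.
points = full.select_entities("/world/*")
colors = points.select_components(["Color"])
```

Because every select_ call returns a new Dataset, the original selection stays usable, and `/camera/image` cannot reappear in `colors` once the entity selection has dropped it.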
LatestAtQuery and RangeQuery
Our TimeType ambiguity continues to torment us.
The most ergonomic is clearly an API that looks like:
LatestAtQuery(timeline: str, at: int | float)
RangeQuery(timeline: str, min: int | float, max: int | float)
The big challenge here is that sane-looking APIs are ambiguous without knowledge of the timeline.
Concretely:
LatestAtQuery(timeline, 2.0) needs to map to the TimeInt 2 if timeline is a Sequence, 2000000000 if timeline is Temporal and the user is thinking in seconds, and 2 if the timeline is Temporal and the user is thinking in nanos.
TODO: Still not sure what the right answer is here.
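The three cases above can be made concrete with a small conversion helper. This is only a sketch of the ambiguity, not a proposed answer to the TODO: the TimeType and TimeUnit enums and the to_time_int function are hypothetical names introduced for illustration, and the rule that temporal timelines require an explicit unit is an assumption.

```python
from enum import Enum


class TimeType(Enum):
    """Hypothetical stand-in for the kind of a timeline."""

    SEQUENCE = "sequence"
    TEMPORAL = "temporal"


class TimeUnit(Enum):
    """Hypothetical unit the user has in mind for a temporal value."""

    SECONDS = "seconds"
    NANOS = "nanos"


NANOS_PER_SECOND = 1_000_000_000


def to_time_int(value, time_type, unit=None):
    """Map a user-facing number to an integer time value."""
    if time_type is TimeType.SEQUENCE:
        # Sequence timelines are plain counters; no unit applies.
        return int(value)
    if unit is None:
        # Assumption: refuse to guess rather than silently pick a unit.
        raise ValueError("temporal timelines need an explicit unit")
    if unit is TimeUnit.SECONDS:
        return round(value * NANOS_PER_SECOND)
    return int(value)


as_sequence = to_time_int(2.0, TimeType.SEQUENCE)
as_seconds = to_time_int(2.0, TimeType.TEMPORAL, TimeUnit.SECONDS)
as_nanos = to_time_int(2.0, TimeType.TEMPORAL, TimeUnit.NANOS)
```

The same literal 2.0 yields 2, 2000000000, and 2 respectively, which is why a signature taking only `at: int | float` cannot be resolved without knowing both the timeline type and the intended unit.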
If we follow precedent from TimeRangeBoundary, this ends up looking something like:
Choice A:
Choice B, with some parameter-exploding, could be simplified down to:
Choice C, diverging from what we do in TimeRangeBoundary:
Queries
Since the selection is now carried with the Dataset, you can now execute a query directly without providing columns.
dataset.latest_at_query(latest_at: LatestAt)
dataset.range_query(range: Range, pov: ComponentSelector)
This means you can write a query like:
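A minimal sketch of what such a chained call might look like, using toy stand-ins rather than the real API. Everything here is an assumption made for illustration: rows are plain dicts keyed by timeline and column name, columns are identified by strings, and latest_at_query returns a dict holding the newest value at or before the query time for each selected column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatestAt:
    """Hypothetical query description: a timeline name and an integer time."""

    timeline: str
    at: int


@dataclass(frozen=True)
class Dataset:
    """Toy dataset carrying both its rows and the active column selection."""

    rows: tuple  # one dict per logged row
    columns: tuple  # names of the currently selected columns

    def select_columns(self, names):
        # Narrow the active selection; the rows themselves are untouched.
        return Dataset(self.rows, tuple(n for n in self.columns if n in names))

    def latest_at_query(self, latest_at):
        # No column argument: the query uses the selection carried by the Dataset.
        result = {}
        for name in self.columns:
            candidates = [
                r
                for r in self.rows
                if name in r
                and latest_at.timeline in r
                and r[latest_at.timeline] <= latest_at.at
            ]
            if candidates:
                newest = max(candidates, key=lambda r: r[latest_at.timeline])
                result[name] = newest[name]
        return result


rows = (
    {"frame": 0, "position": (0, 0), "color": "red"},
    {"frame": 5, "position": (1, 1)},
    {"frame": 9, "color": "blue"},
)
dataset = Dataset(rows, ("position", "color"))

everything = dataset.latest_at_query(LatestAt("frame", 7))
only_position = dataset.select_columns(["position"]).latest_at_query(LatestAt("frame", 7))
```

The point of the design is visible in the last two lines: the query call is identical in both cases, and only the preceding selection decides which columns come back.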
Column Naming
Selectors/Descriptors will be given a name.
This name will default to one of:
When specifying a component selector, users have the option to call .rename_as() to change the name of the component.
These names are also valid INPUT to a ColumnSelector.
For example:
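One way the renaming and lookup could fit together is sketched below. This is an illustration under stated assumptions, not the proposed API: the ComponentSelector fields, the column_name property, and the resolve helper are hypothetical, and the default name format used here (`entity_path:component`) is a placeholder, since the actual default naming rules are not given in this text.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class ComponentSelector:
    """Hypothetical selector for one component on one entity."""

    entity_path: str
    component: str
    name: Optional[str] = None

    def rename_as(self, name: str) -> ComponentSelector:
        # Return a renamed copy so the original selector stays unchanged.
        return replace(self, name=name)

    @property
    def column_name(self) -> str:
        if self.name is not None:
            return self.name
        # PLACEHOLDER default format; the real default is not specified here.
        return f"{self.entity_path}:{self.component}"


def resolve(name: str, selectors: Sequence[ComponentSelector]) -> ComponentSelector:
    """Look a selector up by its column name, mimicking name-as-input."""
    matches = [s for s in selectors if s.column_name == name]
    if len(matches) != 1:
        raise KeyError(f"expected exactly one column named {name!r}, found {len(matches)}")
    return matches[0]


position = ComponentSelector("/world/points", "Position3D").rename_as("pos")
color = ComponentSelector("/world/points", "Color")

# The assigned name can be fed back in to find the same selector.
found = resolve("pos", [position, color])
```

Raising on zero or multiple matches is a deliberate choice in this sketch: if names are accepted as input, two columns sharing a name would make the lookup ambiguous.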
POV
TODO: Needs more thought