Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix row slicing with sort_and_finalize throw on column slicing when t… #1838

Open
wants to merge 2 commits into
base: more-sort-and-finalize-fixes
Choose a base branch
from

Conversation

vasil-pashov
Copy link
Collaborator

…he column group is larger than the segment column size

Reference Issues/PRs

What does this implement or fix?

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

…he column group is larger than the segment column size
@vasil-pashov vasil-pashov marked this pull request as ready for review September 19, 2024 07:55
@@ -1456,6 +1456,14 @@ VersionedItem sort_merge_impl(
"Finalizing staged data is not allowed with empty staging area"
);

user_input::check<ErrorCode::E_INVALID_USER_ARGUMENT>(
write_options.dynamic_schema || pipeline_context->staged_descriptor_->field_count() < write_options.column_group_size,
"Sorting and finalizing staged data is not implemented in the case when column slicing would appear. The "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet implemented?

we do have a plan to solve this in future

"input DataFrame has {} fields which is more than the column group size ({}) set in the library options",
pipeline_context->staged_descriptor_->field_count(),
write_options.column_group_size
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be an error? It should just write segments wider than the lib config setting

class TestSlicing:
def test_long_append_segment(self, lmdb_library):
set_config_int('Merge.SegmentSize', 5)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a context manager to ensure it is reset at the end of the test:

def config_context(name, value):

assert set(lib.list_symbols()) == set([sym, sym_2])

class TestSlicing:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also have a test where the individual staged segments are smaller than the segment row count, but when they are compacted they are larger?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants