Sort and finalize can reorder the fields in a segment and crash with static schema #1795

Open
Tracked by #1679
vasil-pashov opened this issue Aug 27, 2024 · 0 comments · May be fixed by #1799

vasil-pashov commented Aug 27, 2024

Describe the bug

Sort and finalize uses merge_descriptors to generate the field descriptor for the newly added segment. Afterwards, during the merging phase, it creates an Aggregator and strips the fields from the merged descriptor, leaving the Aggregator to build the field collection. This later leads to a crash on write, because the segment's field descriptor differs from the one in the header.

In the example below, merge_descriptors orders the fields in the field descriptor by order of appearance: index - 0, a - 1, b - 2. The final sorted segment adds rows one by one in index order, so column b is encountered first and gets index 1, after which a gets index 2.
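
The mismatch can be illustrated without ArcticDB. The following is a minimal pure-Python sketch, not the actual implementation: `descriptor_order` and `row_by_row_order` are hypothetical stand-ins for what merge_descriptors and the Aggregator are described as doing above, and the `staged` structure is an assumed simplification of the two staged segments from the reproduction script.

```python
from datetime import date

# Hypothetical, simplified model of the two staged segments from the repro.
# Each segment lists its columns and its rows keyed by index value.
staged = [
    {"columns": ["a", "b"],
     "rows": {date(2024, 1, 2): {"a": 1.0, "b": 22250}}},
    {"columns": ["b"],
     "rows": {date(2024, 1, 3): {"b": -53979},
              date(2024, 1, 1): {"b": -53973}}},
]


def descriptor_order(segments):
    """Assumed model of merge_descriptors: fields in order of appearance
    across the staged segments."""
    fields = ["index"]
    for segment in segments:
        for column in segment["columns"]:
            if column not in fields:
                fields.append(column)
    return fields


def row_by_row_order(segments):
    """Assumed model of the Aggregator: fields are registered as they are
    first encountered while rows are added in sorted index order."""
    fields = ["index"]
    all_rows = [(ts, row) for segment in segments
                for ts, row in segment["rows"].items()]
    for _, row in sorted(all_rows, key=lambda item: item[0]):
        for column in row:
            if column not in fields:
                fields.append(column)
    return fields


merged = descriptor_order(staged)      # order expected by the header
aggregated = row_by_row_order(staged)  # order produced in the sorted segment
print(merged, aggregated, merged == aggregated)
```

Under these assumptions the row with the earliest index (2024-01-01) contains only b, so the row-by-row order places b before a, while the merged descriptor keeps a before b.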

Steps/Code to Reproduce

import numpy as np
import pandas as pd
import arcticdb

ac = arcticdb.Arctic("lmdb://test")
lib = ac.get_library("test", create_if_missing=True)

idx1 = pd.DatetimeIndex([
    pd.Timestamp("2024-01-02")
])
df1 = pd.DataFrame({
    "a": np.array([1], dtype="float"),
    "b": np.array([22250], dtype="int64")
}, index=idx1)

b = np.array([-53979, -53973], dtype="int64")

idx = pd.DatetimeIndex([
    pd.Timestamp("2024-01-03"),
    pd.Timestamp("2024-01-01")
])

df2 = pd.DataFrame({"b": b}, index=idx)

lib.write("sym", df1, staged=True)
lib.write("sym", df2, staged=True)
lib.sort_and_finalize_staged_data("sym")
lib.read("sym")

Expected Results

Sort and finalize should create the correct field descriptor and not throw.

OS, Python Version and ArcticDB Version

Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
OS: Windows-10-10.0.22631-SP0
ArcticDB: dev

Backend storage used

No response

Additional Context

No response

@vasil-pashov vasil-pashov added the bug Something isn't working label Aug 27, 2024
@vasil-pashov vasil-pashov self-assigned this Aug 27, 2024