_create_possibly_ragged_ndarray breaks if numpy 2.x.x is installed #47711

Open
viktor-haag opened this issue Sep 17, 2024 · 0 comments
Labels
bug: Something that is supposed to be working; but isn't
triage: Needs triage (eg: priority, bug/not-bug, and owning component)

Comments

@viktor-haag

What happened + What you expected to happen

When a map_batches function returns ragged ndarrays, _create_possibly_ragged_ndarray fails if NumPy 2.x is installed. The cause is an API change introduced with the NumPy 2.0 major release.

Relevant NumPy changelog entry:

Warnings and exceptions present in numpy.exceptions (e.g., numpy.exceptions.ComplexWarning, numpy.exceptions.VisibleDeprecationWarning) are no longer exposed in the main namespace.

Because _create_possibly_ragged_ndarray still accesses VisibleDeprecationWarning through the old path (np.VisibleDeprecationWarning), the call fails with an AttributeError:

AttributeError: module 'numpy' has no attribute 'VisibleDeprecationWarning'
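
The namespace change can be checked in isolation, without Ray. The following is a minimal sketch that assumes only that NumPy is installed; it reports where the class is reachable and says nothing about Ray internals. The variable names are illustrative.

```python
import numpy as np

major_version = int(np.__version__.split(".")[0])

# The top-level alias exists on NumPy 1.x and was removed in NumPy 2.0.
in_main_namespace = hasattr(np, "VisibleDeprecationWarning")

# numpy.exceptions was added in NumPy 1.25, so it may be missing on older 1.x releases.
in_exceptions_namespace = hasattr(
    getattr(np, "exceptions", None), "VisibleDeprecationWarning"
)

print(
    f"numpy {np.__version__}: "
    f"main namespace={in_main_namespace}, "
    f"numpy.exceptions={in_exceptions_namespace}"
)
```

If the first flag is False, any code that reads np.VisibleDeprecationWarning directly will raise the AttributeError shown above.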

Suggestion: This can be fixed easily by replacing

warnings.simplefilter("ignore", category=np.VisibleDeprecationWarning)

with

# NumPy 2.0 removed the top-level alias; the class is now only in numpy.exceptions.
major_version = int(np.__version__.split(".")[0])
if major_version > 1:
    visible_deprecation_warning = np.exceptions.VisibleDeprecationWarning
else:
    visible_deprecation_warning = np.VisibleDeprecationWarning
warnings.simplefilter("ignore", category=visible_deprecation_warning)
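
An alternative to parsing the version string is to probe for the attribute directly. The sketch below is only an illustration of that idea, not the fix adopted by Ray: resolve_visible_deprecation_warning is a hypothetical helper, and the two stand-in namespaces (numpy1_like, numpy2_like) and FakeWarning are invented here so the example runs without NumPy installed.

```python
import warnings
from types import SimpleNamespace


def resolve_visible_deprecation_warning(np_module):
    """Return VisibleDeprecationWarning from whichever namespace exposes it.

    Hypothetical helper for illustration; not part of Ray or NumPy.
    """
    # Prefer the numpy.exceptions location (the only one on NumPy 2.x).
    exceptions_ns = getattr(np_module, "exceptions", None)
    category = getattr(exceptions_ns, "VisibleDeprecationWarning", None)
    if category is None:
        # Fall back to the top-level alias used by older NumPy 1.x releases.
        category = np_module.VisibleDeprecationWarning
    return category


class FakeWarning(UserWarning):
    """Stand-in for the real warning class (assumption for this sketch)."""


# Stand-ins that mimic the two namespace layouts.
numpy1_like = SimpleNamespace(VisibleDeprecationWarning=FakeWarning)
numpy2_like = SimpleNamespace(
    exceptions=SimpleNamespace(VisibleDeprecationWarning=FakeWarning)
)

for module in (numpy1_like, numpy2_like):
    category = resolve_visible_deprecation_warning(module)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=category)
        warnings.warn("ragged input", FakeWarning)  # suppressed by the filter
```

With a real installation the helper would be called as resolve_visible_deprecation_warning(np) after import numpy as np. Probing the attribute avoids assumptions about the format of np.__version__, at the cost of a slightly longer lookup.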

Versions / Dependencies

ray==2.36.0
numpy==2.1.1

Reproduction script

import random
from typing import Dict

import numpy as np
import ray

context = ray.init()


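class DummyMapper:
    # map_batches passes a batch (column name -> ndarray), not a single row.
    def __call__(self, batch: Dict[str, np.ndarray]):
        batch_size = len(batch["id"])
        # Arrays of random length 1 to 10 make the "dummy_data" column ragged.
        dummy_data = [np.zeros(random.randint(1, 10)) for _ in range(batch_size)]
        return {"dummy_data": dummy_data, **batch}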


if __name__ == "__main__":
    samples = 10
    ds = ray.data.range(samples)
    ds.map_batches(DummyMapper, concurrency=1, batch_size=samples).write_json("./dummy_out")

Issue Severity

Low: It annoys or frustrates me.

viktor-haag added the bug and triage labels on Sep 17, 2024.
viktor-haag changed the title from "_create_possibly_ragged_ndarray breaks if if numpy 2.x.x is installed" to "_create_possibly_ragged_ndarray breaks if numpy 2.x.x is installed" on Sep 17, 2024.