When using async actors with map_batches() and Python 3.9, this error occurs:
RuntimeError: Task <Task pending name='Task-2' coro=<_generate_transform_fn_for_async_map_batches.<locals>.transform_fn.<locals>.process_batch() running at /home/ray/anaconda3/lib/python3.9/site-packages/ray/data/_internal/planner/plan_udf_map_op.py:327> cb=[as_completed.<locals>._on_completion() at /home/ray/anaconda3/lib/python3.9/asyncio/tasks.py:598]> got Future <Future pending> attached to a different loop
With Python 3.11, the error does not occur. This is likely because Python 3.11 introduced several improvements to asyncio; we need to change how we handle the event loop so that it also works with earlier versions of Python.
Currently, the workaround is to use Python 3.11+.
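For readers unfamiliar with the message in the traceback, here is a minimal sketch of the general asyncio rule behind it, using only the standard library. It is not taken from Ray's code and does not reproduce the Ray Data code path; the function name demonstrate_loop_mismatch is hypothetical. It only shows that a task running on one event loop cannot await a Future that was created on another loop.

```python
import asyncio


def demonstrate_loop_mismatch() -> str:
    """Await a Future owned by one loop from a task on another loop.

    Hypothetical helper for illustration only; not part of Ray.
    Returns the text of the RuntimeError raised by asyncio.
    """
    loop_a = asyncio.new_event_loop()
    loop_b = asyncio.new_event_loop()
    try:
        # This Future is bound to loop_a.
        fut = loop_a.create_future()

        async def waiter():
            # Awaiting a Future owned by a different loop is rejected
            # by the Task that drives this coroutine.
            await fut

        try:
            # The coroutine is wrapped in a Task on loop_b.
            loop_b.run_until_complete(waiter())
        except RuntimeError as exc:
            return str(exc)
        return "no error"
    finally:
        loop_a.close()
        loop_b.close()


if __name__ == "__main__":
    print(demonstrate_loop_mismatch())
```

This sketch raises the same kind of error on any recent Python version, so it does not explain why the Ray Data case differs between 3.9 and 3.11; it only clarifies what "attached to a different loop" means.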
Versions / Dependencies
Python 3.9, ray 2.35
Reproduction script
import asyncio

import numpy as np
import ray


async def task_yield(row):
    return row


class AsyncActor:
    def __init__(self):
        pass

    # Async generator UDF: yields one single-row batch per input row.
    async def __call__(self, batch):
        rows = [{"id": np.array([i])} for i in batch["id"]]
        tasks = [asyncio.create_task(task_yield(row)) for row in rows]
        for task in tasks:
            yield await task


n = 8
ds = ray.data.range(n, override_num_blocks=n)
ds = ds.map_batches(
    AsyncActor,
    batch_size=n,
    compute=ray.data.ActorPoolStrategy(size=1, max_tasks_in_flight_per_actor=n),
    concurrency=1,
    max_concurrency=n,
)

output = ds.take_all()
expected_output = [{"id": i} for i in range(n)]
# Because all tasks are submitted almost simultaneously,
# the output order may be different compared to the original input.
assert len(output) == len(expected_output), (len(output), len(expected_output))
Issue Severity
Medium: It is a significant difficulty but I can work around it.
scottjlee added the bug, P1, and data labels and removed the triage label on Sep 18, 2024
scottjlee changed the title from "[Data] asyncioRuntimeError when using async actors with map_batches" to "[Data] asyncio event loop mismatch when using async actors with map_batches" on Sep 18, 2024