Cloudpickle breaks local dataclass #386

froody · 2020-06-22T21:09:03Z

Consider the following test: the last assertEqual fails. It fails because the check `f._field_type is _FIELD` inside dataclasses.fields no longer holds. See https://github.com/python/cpython/blob/3.7/Lib/dataclasses.py#L1028

This is because f._field_type points to a different object than dataclasses._FIELD.

def testCloudpickle(self):
    import cloudpickle
    import dataclasses

    @dataclasses.dataclass
    class potato:
        drink: int

    v = potato(-42)
    ref = {"drink" : -42}

    self.assertEqual(ref, dataclasses.asdict(v))

    t2 = cloudpickle.loads(cloudpickle.dumps(potato))

    v2 = potato(-42)

    self.assertEqual(ref, dataclasses.asdict(v2))
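
The identity check described above can be reproduced without cloudpickle. The following is only a minimal sketch: it assumes that a by-value round trip of the private sentinel behaves like `copy.copy` (same name, different object), and it pokes at private `dataclasses` internals (`_FIELD`, `_field_type`) that may change between Python versions. The `Potato` class mirrors the one in the test.

```python
import copy
import dataclasses


@dataclasses.dataclass
class Potato:
    drink: int


# Assumption: copying the private sentinel stands in for a by-value pickle
# round trip. The copy carries the same name but is a distinct object.
sentinel_copy = copy.copy(dataclasses._FIELD)
same_name = sentinel_copy.name == dataclasses._FIELD.name
same_object = sentinel_copy is dataclasses._FIELD

# Swap the copy into the class's field metadata (private attribute).
for f in Potato.__dataclass_fields__.values():
    f._field_type = sentinel_copy

# dataclasses.fields() filters with an identity check against the original
# sentinel, so it no longer recognises any field, and asdict() follows suit.
fields_after = dataclasses.fields(Potato)
asdict_after = dataclasses.asdict(Potato(-42))
```

With the sentinel replaced, `fields_after` is empty and `asdict_after` loses the `drink` entry, which matches the failing assertion in the test.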


pierreglaser · 2020-06-23T08:43:41Z

Hi, thanks for the report. I see the problem. This may be worth a patch on CPython to define a custom reducer for the _FIELD class. I'm not sure yet about fixing this directly in cloudpickle, because that requires manipulating private dataclasses attributes that are likely to change without notice and break things in new Python versions. WDYT @ogrisel?

ogrisel · 2020-07-02T08:03:12Z

The problem is that such a reducer is probably useless for CPython itself: if the dataclass is defined in an importable module, the above problem does not happen. Since _FIELD is private, I see no reason to make it picklable, so the CPython devs might not be interested in adding a reducer. But maybe I am wrong.

I think we will have to live with a cloudpickle fix that depends on private API and rely on tests to make sure our code tracks the internal changes of the CPython standard library.

avikchaudhuri · 2020-07-24T06:02:36Z

@pierreglaser @ogrisel did you reach a resolution on a fix? We're using cloudpickle for a new project that relies pretty heavily on dataclasses for validation and not being able to use locally defined dataclasses is causing hard constraints on the design. Any update would be much appreciated! Thank you.

tsiq-bertram · 2020-08-25T11:12:29Z

+1 on this issue. Is there any known good workaround for this?

jseppanen · 2020-10-30T14:30:30Z

One simple workaround is to avoid asdict, but write an equivalent method instead, for example:

@dataclass
class potato:
    drink: int

    def asdict(self):
        return {k: getattr(self, k) for k in self.__dataclass_fields__}

mickare · 2021-02-15T19:19:35Z

This is still a major issue.

So I have some questions:

Why not pickle the fields via dataclasses.fields(cls) and reconstruct the dataclass via dataclasses.make_dataclass(...)?
Why not check if the dataclass already exists? (Better: Why overwrite the existing dataclass?)

In my case this bug destroys the original dataclasses when a ray worker returns a dataclass result.
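
The first question above could look roughly like this. This is a hedged sketch of the idea, not what cloudpickle does: `describe_dataclass` and `rebuild_dataclass` are hypothetical helper names, and the sketch only carries field names, types, and defaults, so methods, base classes, and decorator options such as `frozen` are lost.

```python
import dataclasses


def describe_dataclass(cls):
    """Return (class name, field specs) as plain data (hypothetical helper)."""
    specs = []
    for f in dataclasses.fields(cls):
        kwargs = {}
        if f.default is not dataclasses.MISSING:
            kwargs["default"] = f.default
        if f.default_factory is not dataclasses.MISSING:
            kwargs["default_factory"] = f.default_factory
        specs.append((f.name, f.type, kwargs))
    return cls.__name__, specs


def rebuild_dataclass(name, specs):
    """Recreate a dataclass from the specs via make_dataclass (hypothetical helper)."""
    return dataclasses.make_dataclass(
        name,
        [(n, t, dataclasses.field(**kw)) for n, t, kw in specs],
    )


@dataclasses.dataclass
class Point:
    x: int
    y: int = 0
    tags: list = dataclasses.field(default_factory=list)


Rebuilt = rebuild_dataclass(*describe_dataclass(Point))
rebuilt_names = [f.name for f in dataclasses.fields(Rebuilt)]
rebuilt_dict = dataclasses.asdict(Rebuilt(1))
```

Because the rebuilt class is created by `make_dataclass` on the receiving side, its fields carry the local sentinels and `dataclasses.fields` works on it. The cost is that `Rebuilt` is a new class, not the original `Point`.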

jakubkwiatkowski · 2021-04-07T10:19:08Z

I haven't tested it extensively, but in my case this workaround works. As @jseppanen suggested, I've replaced asdict with an equivalent function.

import copy
import dataclasses


def as_dict(obj, *, dict_factory=dict):
    # Public-API check for "is a dataclass instance" (not a dataclass type).
    if not dataclasses.is_dataclass(obj) or isinstance(obj, type):
        raise TypeError("as_dict() should be called on dataclass instances")
    return as_dict_inner(obj, dict_factory)


def as_dict_inner(obj, dict_factory=dict):
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        # Walk the instance __dict__ instead of dataclasses.fields(), which
        # is the call that breaks after a cloudpickle round trip.
        result = []
        for f in obj.__dict__:
            value = as_dict_inner(getattr(obj, f), dict_factory)
            result.append((f, value))
        return dict_factory(result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        # namedtuple: rebuild with positional arguments.
        return type(obj)(*[as_dict_inner(v, dict_factory) for v in obj])
    elif isinstance(obj, (list, tuple)):
        return type(obj)(as_dict_inner(v, dict_factory) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((as_dict_inner(k, dict_factory),
                          as_dict_inner(v, dict_factory))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)

omry · 2021-05-18T03:45:18Z

I am not 100% sure it's the same issue, but deserializing a dataclass currently breaks the existing dataclass if it's defined in the same process (in the __main__ module).

import cloudpickle
from dataclasses import dataclass
import dataclasses


@dataclass
class Test:
    dim: int = 1

print(dataclasses.fields(Test))
_unused_deserialized_class = cloudpickle.loads(cloudpickle.dumps(Test))
print(dataclasses.fields(Test))

Output:

(Field(name='dim',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x7f2ecc1629d0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD),)
()

yodahuang · 2021-09-09T07:22:35Z

In the meantime, an alternative to calling dataclasses.fields is to use the __dataclass_fields__ attribute, though that returns a dict instead of a tuple.
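
One caveat with reading `__dataclass_fields__` directly, shown as a small sketch: the dict also contains pseudo-fields such as `ClassVar` entries, which `dataclasses.fields` filters out. The `Config` class and its attribute names are made up for illustration.

```python
import dataclasses
from typing import ClassVar


@dataclasses.dataclass
class Config:
    dim: int = 1
    version: ClassVar[int] = 1  # class-level attribute, not an instance field


# Public API: only real instance fields are returned.
via_fields = [f.name for f in dataclasses.fields(Config)]

# Raw attribute: a dict keyed by name that also lists the ClassVar entry.
via_attr = list(Config.__dataclass_fields__)
```

If the class uses `ClassVar` or `InitVar`, code that iterates `__dataclass_fields__` probably needs its own filtering to match what `dataclasses.fields` would have returned.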

salim7 · 2022-01-06T13:35:42Z

Here is another workaround, that works exactly like dataclasses.fields:

import dataclasses


def dataclass_fields(class_or_instance):
    """This function is based on dataclasses.fields(), but contains a workaround
    for https://github.com/cloudpipe/cloudpickle/issues/386
    """

    try:
        fields = getattr(class_or_instance, dataclasses._FIELDS)
    except AttributeError:
        raise TypeError('must be called with a dataclass type or instance') from None

    # Compare sentinels by name rather than identity, so copies still match.
    return tuple(f for f in fields.values() if f._field_type.name == dataclasses._FIELD.name)

SimiPixel · 2022-11-18T16:13:23Z

Recently discovered and posted a connected issue here.

Are there any workarounds that do not involve changing the way the dataclass is defined (I cannot change the source code) but can be applied "after the fact"?

rmorshea · 2023-08-18T23:26:20Z

Here's a solution using a subclass of CloudPickler:

import cloudpickle
import io
import dataclasses
from dataclasses import fields, dataclass, _FIELD_BASE


def _get_dataclass_field_sentinel(name):
    """Return a sentinel object for a dataclass field."""
    return getattr(dataclasses, name)


class PatchedCloudPickler(cloudpickle.CloudPickler):
    def reducer_override(self, obj):
        """Pickle dataclass field-type sentinels by name instead of by value."""
        if isinstance(obj, _FIELD_BASE):
            return _get_dataclass_field_sentinel, (obj.name,)
        return super().reducer_override(obj)


def dumps(value, protocol=None):
    with io.BytesIO() as file:
        PatchedCloudPickler(file, protocol).dump(value)
        return file.getvalue()


@dataclass
class InClass:
    a: int
    b: int


OutClass = cloudpickle.loads(dumps(InClass))
assert fields(OutClass)

If you need to, you can monkey-patch cloudpickle:

cloudpickle.cloudpickle_fast.CloudPickler = PatchedCloudPickler

If the isinstance check has negative performance implications, it might be faster to compare by identity instead (obj is _FIELD, obj is _FIELD_CLASSVAR, obj is _FIELD_INITVAR).
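
The identity-check variant mentioned above might look like the sketch below. To keep it runnable without cloudpickle it subclasses the standard library `pickle.Pickler` (which supports `reducer_override` on Python 3.8+), and it uses `pkgutil.resolve_name` (Python 3.9+) as the reconstructor. Both of those are substitutions made for this illustration, and `SentinelAwarePickler` is a hypothetical name. It still relies on private `dataclasses` names.

```python
import dataclasses
import io
import pickle
import pkgutil

# Private sentinels, looked up once at import time.
_SENTINELS = {
    name: getattr(dataclasses, name)
    for name in ("_FIELD", "_FIELD_CLASSVAR", "_FIELD_INITVAR")
}


class SentinelAwarePickler(pickle.Pickler):
    def reducer_override(self, obj):
        # Identity comparison against the known sentinels, no isinstance call.
        for name, sentinel in _SENTINELS.items():
            if obj is sentinel:
                # On load, resolve the sentinel by name in the local dataclasses module.
                return pkgutil.resolve_name, (f"dataclasses:{name}",)
        return NotImplemented  # fall back to the default pickling behaviour


def dumps(value, protocol=None):
    buffer = io.BytesIO()
    SentinelAwarePickler(buffer, protocol).dump(value)
    return buffer.getvalue()


# Plain pickle rebuilds the sentinel by value, producing a new object.
plain_copy = pickle.loads(pickle.dumps(dataclasses._FIELD))
# The patched pickler resolves it by name, returning the original object.
patched_copy = pickle.loads(dumps(dataclasses._FIELD))
```

Whether the identity loop is measurably faster than a single `isinstance` call is untested here; `reducer_override` runs for every object being pickled, so it is worth profiling before choosing one over the other.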

rmorshea · 2023-08-22T15:57:16Z

I posted a potential fix here: #513

Would be great to get some feedback on it.

jebacpogotowietylkobukmozemnieleczyc mentioned this issue Mar 6, 2021

fields() from DataClasses not working properly facebookincubator/submitit#1612

Closed

jgbos mentioned this issue May 18, 2021

[Bug] Recent release possibly mutating the structured config object facebookresearch/hydra#1621

Closed

2 tasks

rsokl mentioned this issue Aug 12, 2021

hydra_multirun and losing ability to create configs mit-ll-responsible-ai/hydra-zen#58

Closed

This was referenced Feb 3, 2022

Joblib Parallel breaks dataclasses.asdict joblib/joblib#1185

Closed

dataclass doesn't work with loky backend joblib/joblib#1212

Closed

rsokl mentioned this issue May 5, 2022

Cloudpickle and data classes do not play well together #424

Open

jgbos mentioned this issue May 6, 2022

Updates to launch mit-ll-responsible-ai/hydra-zen#272

Merged

debakarr mentioned this issue Jun 6, 2022

Equality doesn't work after deserialization of dataclass uqfoundation/dill#500

Closed

rsokl mentioned this issue Nov 8, 2022

Support pickling BuildsFoo mit-ll-responsible-ai/hydra-zen#333

Closed

rmorshea mentioned this issue Aug 22, 2023

handle dataclass field type sentinels #513

Merged

ogrisel closed this as completed in #513 Oct 9, 2023
