Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Recent release possibly mutating the structured config object #1621

Closed
2 tasks done
jgbos opened this issue May 17, 2021 · 12 comments
Closed
2 tasks done

[Bug] Recent release possibly mutating the structured config object #1621

jgbos opened this issue May 17, 2021 · 12 comments
Labels
bug Something isn't working

Comments

@jgbos
Copy link
Contributor

jgbos commented May 17, 2021

🐛 Bug

Description

Just playing around with Hydra and noticed the behavior below. It only happens for the submitit and joblib launchers and not the basic launcher (that's all I've tested on).

Checklist

  • I checked on the latest version of Hydra
  • I created a minimal repro (See this for tips).

To reproduce

First I define a sample structured config.

from dataclasses import dataclass

from hydra.core.config_store import ConfigStore


@dataclass
class Test:
    _target_: str = "torch.nn.Softmax"
    dim: int = 1


cs = ConfigStore.instance()
cs.store(name="test_config", node=Test)

Running the following command outputs:

>>> from omegaconf import OmegaConf
>>> OmegaConf.create(Test)
{'_target_': 'torch.nn.Softmax', 'dim': 1}

Now if I run a multirun experiment with either submitit or joblib launchers (all I have to work with)

from hydra._internal.hydra import Hydra
from hydra._internal.utils import create_config_search_path
from hydra.core.global_hydra import GlobalHydra
from hydra.utils import instantiate

search_path = create_config_search_path(None)
hydra = Hydra.create_main_hydra2(task_name="test", config_search_path=search_path)
hydra.multirun(
    config_name="test_config",
    task_function=instantiate,
    overrides=["hydra/launcher=submitit_local"],
)
GlobalHydra.instance().clear()

and then try to create the object again:

>>> OmegaConf.create(Test)
{}
>>> OmegaConf.create(Test)._metadata
ContainerMetadata(ref_type=typing.Any, object_type=<class '__main__.Test'>, optional=True, key=None, flags={}, flags_root=False, resolver_cache=defaultdict(<class 'dict'>, {}), key_type=typing.Any, element_type=typing.Any)

The only way to create the object again is to restart my notebook.

Expected Behavior

I would expect no issues with creating the object after a run. It seems some sort of mutation is happening on the object, but I'm not sure where this is happening.

System information

  • Hydra Version : 1.1.0.rc1
  • Python version : 3.8.5
  • Virtual environment type and version : conda
  • Operating system : Linux
@jgbos jgbos added the bug Something isn't working label May 17, 2021
@jieru-hu
Copy link
Contributor

jieru-hu commented May 17, 2021

hi @jgbos thanks trying out 1.1 rc.

Could you provide a minimal repro with the hydra.main decorator on your application and run multirun from the commandline? Directly calling Hydra's internal API is not encouraged. Thanks.

@jgbos
Copy link
Contributor Author

jgbos commented May 17, 2021

@jieru-hu understood that it's not encouraged, I was just been playing around with Hydra and noticed this behavior and thought it would be of interest to you. I wouldn't see this from a command line as I wouldn't need to reuse the configuration after running multirun. I was just trying to point out that something in Hydra is either mutating the structured config or there is some caching not being cleared. I'm guessing that this isn't a desired behavior of Hydra no matter how it's executed.

@Jasha10
Copy link
Collaborator

Jasha10 commented May 17, 2021

Here is a minimal repro that does not call Hydra internals.

# my_app.py
from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf

@dataclass
class Test:
    dim: int = 1

cs = ConfigStore.instance()
cs.store(name="test_config", node=Test)

@hydra.main(config_path=None, config_name="test_config")
def my_app(cfg):
    pass

if __name__ == "__main__":
    assert OmegaConf.create(Test) == {"dim": 1}  # passes
    my_app()
    assert OmegaConf.create(Test) == {"dim": 1}  # fails
$ python my_app.py --multirun hydra/launcher=submitit_local
[2021-05-17 13:31:03,056][HYDRA] Submitit 'local' sweep output dir : multirun/2021-05-17/13-31-02
[2021-05-17 13:31:03,058][HYDRA]        #0 :
Traceback (most recent call last):
  File "/home/jbss/tmp/my_app.py", line 22, in <module>
    assert OmegaConf.create(Test) == {"dim": 1}  # fails
AssertionError

@omry
Copy link
Collaborator

omry commented May 17, 2021

This happens in 1.0 as well.
@jgbos, do you have any reason to think this is a regression in 1.1?

Something is happening to the dataclass itself. After the run, dataclasses.fields(Test) returns an empty Tuple.

# my_app.py
import copy
from dataclasses import dataclass
import dataclasses

import hydra
from hydra.core.config_store import ConfigStore


@dataclass
class Test:
    dim: int = 1


cs = ConfigStore.instance()
cs.store(name="test_config", node=copy.deepcopy(Test))


@hydra.main(config_path=None, config_name="test_config")
def my_app(cfg):
    pass


if __name__ == "__main__":
    print("before", dataclasses.fields(Test))
    my_app()
    print("after", dataclasses.fields(Test))
/home/omry/miniconda3/envs/hydra38/bin/python /home/omry/dev/hydra/1.py -m hydra/launcher=joblib
before (Field(name='dim',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x7f7014ca2940>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD),)
[2021-05-17 12:48:16,551][HYDRA] Joblib.Parallel(n_jobs=-1,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 1 jobs
[2021-05-17 12:48:16,551][HYDRA] Launching jobs, sweep output dir : multirun/2021-05-17/12-48-16
[2021-05-17 12:48:16,551][HYDRA] 	#0 : 
after ()

@jgbos
Copy link
Contributor Author

jgbos commented May 17, 2021

Hmm, interesting. I thought I saw some behavior change from 1.1.0.dev5 to 1.1.0.dev7, but running the code on older versions does show this same behavior. Here is what I was doing:

  • Run hydra.multirun like I did above (not from command line) to submit slurm jobs in the notebook. This allows me to access the JobReturn.return_value right away in the notebook for plotting, etc.
  • I could run the submission with the configuration multiple times without an issue before 1.1.0.dev7, but after upgrading it no longer works

I'll see if I can reproduce my "successful" runs on older versions of Hydra doing slurm submissions.

@omry
Copy link
Collaborator

omry commented May 17, 2021

I am seeing the same behavior in Hydra 1.0.6 (not 1.1 at all). There is something really weird going on here but I don't think it's new.

@omry
Copy link
Collaborator

omry commented May 17, 2021

A workaround seems to be to place the config class in a different module (not the main one).
This feels like a bug in one of the underlying libraries. The one library shared between both submitit and joblib is cloudpickle. Maybe it's doing something nasty.

I think a good plan here is to try to eliminate layers to determine where this is coming from. This is a pretty hard one. I am not sure what could be causing it.

# cfg.py
from dataclasses import dataclass


@dataclass
class Test:
    dim: int = 1
# my_app.py
from cfg import Test
import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf


cs = ConfigStore.instance()
cs.store(name="test_config", node=Test)


@hydra.main(config_path=None, config_name="test_config")
def my_app(cfg):
    pass


if __name__ == "__main__":
    assert OmegaConf.create(Test) == {"dim": 1}  # passes
    my_app()
    assert OmegaConf.create(Test) == {"dim": 1}  # ALSO PASSES

@jgbos
Copy link
Contributor Author

jgbos commented May 18, 2021

I had a feeling it might be related to pickling last night given the launchers used. Just searched cloudpickle and found this issue:

cloudpipe/cloudpickle#386

Do you think this captures what's going on:

import cloudpickle
from dataclasses import dataclass
import dataclasses

@dataclass
class Test:
    dim: int = 1
        
print(dataclasses.fields(Test))
t2 = cloudpickle.loads(cloudpickle.dumps(Test))
print(dataclasses.fields(t2))

output:

(Field(name='dim',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x7f47977bdc40>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD),)
()

@Jasha10
Copy link
Collaborator

Jasha10 commented May 18, 2021

Do you think this captures what's going on:

Definitely looks related...

@omry
Copy link
Collaborator

omry commented May 18, 2021

It's worse than this:
Decoding the dataclass breaks the existing defined class:

import cloudpickle
from dataclasses import dataclass
import dataclasses


@dataclass
class Test:
    dim: int = 1


print(dataclasses.fields(Test))
_unused_deserialized_class = cloudpickle.loads(cloudpickle.dumps(Test))
print(dataclasses.fields(Test))

Output:

(Field(name='dim',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x7f2ecc1629d0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD),)
()

@jgbos
Copy link
Contributor Author

jgbos commented May 18, 2021

Just to add one more example, similar to @omry's example loading from a separate file:

import dataclasses
from dataclasses import dataclass

import cloudpickle


def build_class():
    @dataclass
    class Test:
        dim: int = 1

    return Test


test = build_class()

print(dataclasses.fields(test))
_unused_deserialized_class = cloudpickle.loads(cloudpickle.dumps(test))
print(dataclasses.fields(build_class()))

Output

(Field(name='dim',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x7f51a0113b20>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD),)
(Field(name='dim',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x7f51a0113b20>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD),)

@omry
Copy link
Collaborator

omry commented May 18, 2021

Closing as this is not a Hydra bug. feel free to continue the discussion here.

@omry omry closed this as completed May 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants