Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dictionary intersection #490

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

groutr
Copy link
Contributor

@groutr groutr commented Jun 24, 2020

An efficient way to intersect dictionaries based on their keys. The motivating case was to replace the below line with something nicer and more general.

rv = {i: (d1[i], d2[i]) for i in d1.keys() & d2.keys()}

This function is generalized to compute the intersection of more than two dictionaries and handle generic mappings.

@eriknw, I think other operations on dictionary views would be useful. However, those other operations don't seem to fit the calling conventions for the functions here because they are only well-defined for two dictionaries (operations like difference, symmetric difference, etc). Union would be similar to merge, but preserves all the values from each input dictionary.

@eriknw
Copy link
Member

eriknw commented Oct 28, 2021

This seems like a useful and intuitive function to have.

With the addition of https://www.python.org/dev/peps/pep-0584/ in Python 3.9, is this still useful? Perhaps.

Advantages:

  • may be able to merge multiple mappings more efficiently
  • accepts factory= argument like other functions in dicttoolz

I'm curious how efficient we can do this in Cython.

@eriknw
Copy link
Member

eriknw commented Oct 28, 2021

oops, this is intersect d & e (not supported by dicts), not union d | e. my bad. I'll play around with this.

@groutr
Copy link
Contributor Author

groutr commented Oct 28, 2021

Regarding how efficiently this can be done in cython, it breaks down to how efficiently 1) an intersection of the keys can be computed and 2) pulling each of those common keys from each dictionary. I don't think it could be more efficient than O(nm) where n is the number of dictionaries and m is the number of common keys. This is assuming that inserting and retrieving from the mapping is O(1).

dicts = dicts[0]
factory = _get_factory(merge, kwargs)

dict_keys = map(operator.methodcaller('keys'), sorted(dicts, key=len))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need to test without the sort. Sorting might be slowing things down here.

@eriknw
Copy link
Member

eriknw commented Oct 28, 2021

I wonder if this functionality would be better added to merge (and merge_with) by adding an optional keyword argument.

@groutr
Copy link
Contributor Author

groutr commented May 2, 2022

I wonder if this functionality would be better added to merge (and merge_with) by adding an optional keyword argument.

would that look something like merge_with(max, mappings, method='intersect', factory=dict)
The functionality could certainly be added to merge/merge_with. One thing I would want to think about is how complex the implementation might become with different merge methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants