Add dictionary intersection #490

groutr · 2020-06-24T15:07:20Z

This PR adds an efficient way to intersect dictionaries based on their keys. The motivating case was to replace the line below with something nicer and more general.

rv = {i: (d1[i], d2[i]) for i in d1.keys() & d2.keys()}

This function is generalized to compute the intersection of more than two dictionaries and handle generic mappings.
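To make the generalization concrete, here is a minimal sketch of a multi-dictionary intersection. It is not the code from this PR: the name intersect_sketch and its signature are assumptions chosen for illustration, and it relies only on keys views supporting the & operator.

```python
from functools import reduce
from operator import and_


def intersect_sketch(*dicts, factory=dict):
    """Keep only keys present in every mapping; values become tuples.

    Hypothetical helper for illustration, not the PR's implementation.
    """
    if not dicts:
        return factory()
    # Keys views behave like sets, so & yields the keys shared by all inputs.
    common = reduce(and_, (d.keys() for d in dicts))
    rv = factory()
    for key in common:
        # Values are collected in the order the mappings were passed.
        rv[key] = tuple(d[key] for d in dicts)
    return rv


d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"b": 20, "c": 30, "d": 40}
d3 = {"c": 300, "b": 200}
result = intersect_sketch(d1, d2, d3)
```

With two inputs this reduces to the one-liner above; with more inputs each value tuple simply gets one entry per mapping.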

@eriknw, I think other operations on dictionary views would be useful. However, those other operations don't seem to fit the calling conventions for the functions here because they are only well-defined for two dictionaries (operations like difference, symmetric difference, etc). Union would be similar to merge, but would preserve all the values from each input dictionary.

Unlike operator.itemgetter, itertoolz.get always returns a tuple when passed a list of keys, even if the list has only one element.
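The itemgetter inconsistency can be seen with the standard library alone. In the sketch below, get_many is a hypothetical stand-in that mimics the list behavior attributed to itertoolz.get; it is not the toolz function itself.

```python
from operator import itemgetter

d = {"a": 1, "b": 2}

# itemgetter changes its return shape with the number of keys.
single = itemgetter(*["a"])(d)        # one key: the bare value
several = itemgetter(*["a", "b"])(d)  # two keys: a tuple of values


def get_many(keys, mapping):
    """Hypothetical helper: always return a tuple for a list of keys."""
    return tuple(mapping[k] for k in keys)


uniform_single = get_many(["a"], d)
uniform_several = get_many(["a", "b"], d)
```

A uniform return shape matters here because the number of common keys is not known in advance and may be exactly one.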

eriknw · 2021-10-28T04:38:12Z

This seems like a useful and intuitive function to have.

With the addition of https://www.python.org/dev/peps/pep-0584/ in Python 3.9, is this still useful? Perhaps.

Advantages:

- may be able to merge multiple mappings more efficiently
- accepts factory= argument like other functions in dicttoolz

I'm curious how efficiently we can do this in Cython.

eriknw · 2021-10-28T15:14:51Z

Oops, this is intersection, d & e (not supported by dicts), not union, d | e. My bad. I'll play around with this.
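For reference, a short sketch of the distinction, assuming Python 3.9 or later for the | operator: dicts support union directly, while intersection is only available on their keys views.

```python
d = {"a": 1, "b": 2}
e = {"b": 20, "c": 30}

# PEP 584 union (Python 3.9+): the right-hand operand wins on shared keys.
union = d | e

# Intersection works on keys views, which behave like sets.
shared_keys = d.keys() & e.keys()

# Applying & to the dicts themselves raises TypeError.
try:
    d & e
    supports_and = True
except TypeError:
    supports_and = False
```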

groutr · 2021-10-28T18:11:01Z

Regarding how efficiently this can be done in Cython, it comes down to how efficiently we can 1) compute the intersection of the keys and 2) pull each of those common keys from each dictionary. I don't think it can be more efficient than O(nm), where n is the number of dictionaries and m is the number of common keys. This assumes that inserting into and retrieving from a mapping is O(1).
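The two steps can be sketched in pure Python with a counter on the value lookups, which makes the n * m term visible. The function name intersect_counting is an assumption for illustration, and the counter covers only step 2, not the cost of intersecting the keys.

```python
def intersect_counting(dicts):
    """Two-step intersection that also counts value lookups.

    Hypothetical helper for illustration only.
    """
    # Step 1: intersect the key sets.
    common = set(dicts[0])
    for d in dicts[1:]:
        common.intersection_update(d.keys())

    # Step 2: pull each common key from each dictionary.
    lookups = 0
    result = {}
    for key in common:
        values = []
        for d in dicts:
            values.append(d[key])
            lookups += 1
        result[key] = tuple(values)
    return result, lookups


dicts = [
    {"a": 1, "b": 2, "c": 3},
    {"b": 20, "c": 30, "d": 40},
    {"c": 300, "b": 200, "e": 500},
]
result, lookups = intersect_counting(dicts)
```

Here n = 3 dictionaries and m = 2 common keys, so step 2 performs 6 lookups.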

groutr · 2021-10-28T18:11:47Z

toolz/dicttoolz.py

+        dicts = dicts[0]
+    factory = _get_factory(merge, kwargs)
+
+    dict_keys = map(operator.methodcaller('keys'), sorted(dicts, key=len))


I think I need to test without the sort. Sorting might be slowing things down here.
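One way to check this is a small timing harness like the sketch below. It assumes the keys views are combined with reduce and the & operator, which the diff excerpt above does not show, so treat it as an approximation of the real code path. The input sizes are arbitrary, and no timing result is claimed here.

```python
import timeit
from functools import reduce
from operator import and_, methodcaller


def common_keys_sorted(dicts):
    # Smallest mapping first, as in the diff above.
    dict_keys = map(methodcaller("keys"), sorted(dicts, key=len))
    return set(reduce(and_, dict_keys))


def common_keys_unsorted(dicts):
    # Same intersection, taking the mappings in the order given.
    return set(reduce(and_, map(methodcaller("keys"), dicts)))


# Arbitrary test data: three dicts of very different sizes.
dicts = [dict.fromkeys(range(n)) for n in (5000, 50, 500)]

same = common_keys_sorted(dicts) == common_keys_unsorted(dicts)
t_sorted = timeit.timeit(lambda: common_keys_sorted(dicts), number=200)
t_unsorted = timeit.timeit(lambda: common_keys_unsorted(dicts), number=200)
```

Both variants must return the same key set; only t_sorted and t_unsorted differ, and which one wins is likely to depend on the number and sizes of the inputs.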

eriknw · 2021-10-28T18:18:51Z

I wonder if this functionality would be better added to merge (and merge_with) by adding an optional keyword argument.

groutr · 2022-05-02T22:04:54Z

> I wonder if this functionality would be better added to merge (and merge_with) by adding an optional keyword argument.

Would that look something like merge_with(max, mappings, method='intersect', factory=dict)?
The functionality could certainly be added to merge/merge_with. One thing I would want to think about is how complex the implementation might become with different merge methods.
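As a rough sketch of what such a keyword might involve, the function below selects the key set according to a method argument and then applies func to the values found. Everything here is hypothetical: merge_with_method, the method parameter, and its 'union' and 'intersect' values are not part of toolz.

```python
def merge_with_method(func, mappings, method="union", factory=dict):
    """Hypothetical merge_with variant with a key-selection method."""
    mappings = list(mappings)
    if method == "union":
        # Every key that appears in any mapping.
        keys = set().union(*mappings)
    elif method == "intersect":
        # Only keys that appear in every mapping.
        keys = set(mappings[0]).intersection(*mappings[1:]) if mappings else set()
    else:
        raise ValueError(f"unknown method: {method!r}")

    rv = factory()
    for key in keys:
        # func receives the list of values found for this key.
        rv[key] = func([m[key] for m in mappings if key in m])
    return rv


mappings = [{"a": 1, "b": 2}, {"b": 5, "c": 7}]
merged_union = merge_with_method(max, mappings)
merged_intersect = merge_with_method(max, mappings, method="intersect")
```

Only the key-selection branch differs between methods in this sketch, which suggests the added complexity could stay contained, though a real implementation would also need to fit the existing calling conventions.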

groutr added 4 commits June 24, 2020 09:36

Add dictionary intersection.

48ad367

Add intersect to test suite.

bc3611c

Use itertoolz.get instead of operator.itemgetter

d6b95f2

itertoolz.get will always return an iterable if passed a list unlike itemgetter.

Add intersect exception for curried namespace.

7a624b3
