== operator, and it will work.
dicts (dict diff)
dicts that have floating-point numbers as values
Yeah! You could use the == operator, of course!
>>> a = {
'number': 1,
'list': ['one', 'two']
}
>>> b = {
'list': ['one', 'two'],
'number': 1
}
>>> a == b
True
When some value differs, the result is False, but can we tell where they differ?
>>> a = {
'number': 1,
'list': ['one', 'two']
}
>>> b = {
'list': ['one', 'two'],
'number': 2
}
>>> a == b
False
Hmm... Just False doesn't tell us much.
What about the strs inside the list? Let's say that we want to ignore their case.
>>> a = {
'number': 1,
'list': ['ONE', 'two']
}
>>> b = {
'list': ['one', 'two'],
'number': 1
}
>>> a == b
False
What if the number is a float, and we consider two floats to be the same when their first 3 decimal digits are equal? Put another way, we only want to check whether the 3 digits after the decimal point match.
>>> a = {
'number': 1,
'list': ['one', 'two']
}
>>> b = {
'list': ['one', 'two'],
'number': 1.00001
}
>>> a == b
False
What if we want to exclude the list key->value pair from the check? Unless we create a new dictionary without it, there's no built-in way to do that.
Can it get any worse?
What if one of the values is a numpy array?
>>> a = {
'number': 1,
'list': ['one', 'two'],
'array': np.ones(3)
}
>>> b = {
'list': ['one', 'two'],
'number': 1,
'array': np.ones(3)
}
>>> a == b
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-eeadcaeab874> in <module>
----> 1 a == b
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Damn it, what can we do then?
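One option is to write the comparison by hand. The following is only a minimal sketch in plain Python: changed_keys is a hypothetical helper (not part of any library), it assumes string keys, and it does not handle numpy arrays or nested dicts.

```python
def changed_keys(a, b, ignore_case=False, decimals=None):
    """Return the sorted keys whose values differ between two flat dicts."""
    def normalize(value):
        # Optionally lower-case strings, including strings inside lists.
        if ignore_case and isinstance(value, str):
            return value.lower()
        if ignore_case and isinstance(value, list):
            return [normalize(item) for item in value]
        # Optionally round numbers so that tiny differences are ignored.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if decimals is not None and is_number:
            return round(float(value), decimals)
        return value

    missing = object()  # sentinel for keys present in only one dict
    return sorted(
        key for key in a.keys() | b.keys()
        if normalize(a.get(key, missing)) != normalize(b.get(key, missing))
    )


a = {'number': 1, 'list': ['ONE', 'two']}
b = {'list': ['one', 'two'], 'number': 1.00001}

print(changed_keys(a, b))                                # ['list', 'number']
print(changed_keys(a, b, ignore_case=True, decimals=3))  # []
```

Even this small version needs special cases for every type it supports, which is why hand-rolled comparisons tend to grow quickly.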
Since dicts cannot perform advanced comparisons, there are only two ways of achieving that. You can either implement the functionality yourself or use a third-party library. At some point in your life you have probably heard about not reinventing the wheel, so we won't reinvent it in this tutorial. We'll use deepdiff, from zepworks. deepdiff can pick up the difference between dictionaries, iterables, strings and other objects. It accomplishes that by searching for changes recursively.
deepdiff is not the only kid on the block; there's also Dictdiffer, developed by the folks at CERN. Dictdiffer is also cool but lacks a lot of the features that make deepdiff so interesting. In any case, I encourage you to look at both and determine which one works best for you.
Let's go back to comparing dicts. Consider the following code snippet, but this time using deepdiff.
In [1]: from deepdiff import DeepDiff
In [2]: a = {
...: 'number': 1,
...: 'list': ['one', 'two']
...: }
In [3]: b = {
...: 'list': ['one', 'two'],
...: 'number': 2
...: }
In [4]: diff = DeepDiff(a, b)
In [5]: diff
Out[5]: {'values_changed': {"root['number']": {'new_value': 2, 'old_value': 1}}}
It tells us that the key 'number' had value 1 but the new dict, b, has a new value, 2.
If we want to treat "one" the same as "ONE", we can set ignore_string_case=True.
In [10]: a = {
...: 'number': 1,
...: 'list': ['ONE', 'two']
...: }
...:
In [11]: b = {
...: 'list': ['one', 'two'],
...: 'number': 1
...: }
In [12]: diff = DeepDiff(a, b, ignore_string_case=True)
In [13]: diff
Out[13]: {}
In [14]: diff = DeepDiff(a, b)
In [15]: diff
Out[15]:
{'values_changed': {"root['list'][0]": {'new_value': 'one',
'old_value': 'ONE'}}}
Earlier we had a float number for which we only wanted to check whether the first 3 digits after the decimal point were equal. With DeepDiff it's possible to pass the exact number of digits AFTER the decimal point through significant_digits. Also, since floats differ from ints, we might want to ignore type comparison as well. We can solve that by setting ignore_numeric_type_changes=True.
In [16]: a = {
...: 'number': 1,
...: 'list': ['one', 'two']
...: }
In [17]: b = {
...: 'list': ['one', 'two'],
...: 'number': 1.00001
...: }
In [18]: diff = DeepDiff(a, b)
In [19]: diff
Out[19]:
{'type_changes': {"root['number']": {'old_type': int,
'new_type': float,
'old_value': 1,
'new_value': 1.00001}}}
In [24]: diff = DeepDiff(a, b, significant_digits=3, ignore_numeric_type_changes=True)
In [25]: diff
Out[25]: {}
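To get a feel for what "digits after the decimal point" means here, the sketch below compares numbers by their fixed-point text. As far as I understand the DeepDiff documentation, this is roughly what significant_digits does with its default notation, but treat that as an assumption; equal_after_decimal is a hypothetical helper written for illustration only.

```python
def equal_after_decimal(x, y, digits=3):
    """Compare two numbers by their fixed-point text with `digits` decimals."""
    return f"{x:.{digits}f}" == f"{y:.{digits}f}"


print(equal_after_decimal(1, 1.00001))            # True: both format as '1.000'
print(equal_after_decimal(1, 1.001))              # False: '1.000' vs '1.001'
print(equal_after_decimal(1, 1.00001, digits=5))  # False: '1.00000' vs '1.00001'
```

Note that this counts decimals, not significant figures in the scientific sense, which matches the "AFTER the decimal point" wording above.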
When we tried to compare two dicts with a numpy array in them, we failed miserably. Fortunately, DeepDiff has our backs here. It supports numpy objects by default!
In [27]: import numpy as np
In [28]: a = {
...: 'number': 1,
...: 'list': ['one', 'two'],
...: 'array': np.ones(3)
...: }
In [29]: b = {
...: 'list': ['one', 'two'],
...: 'number': 1,
...: 'array': np.ones(3)
...: }
In [30]: diff = DeepDiff(a, b)
In [31]: diff
Out[31]: {}
What if the arrays are different?
In [28]: a = {
...: 'number': 1,
...: 'list': ['one', 'two'],
...: 'array': np.ones(3)
...: }
In [32]: b = {
...: 'list': ['one', 'two'],
...: 'number': 1,
...: 'array': np.array([1, 2, 3])
...: }
In [33]: diff = DeepDiff(a, b)
In [34]: diff
Out[34]:
{'type_changes': {"root['array']": {'old_type': numpy.float64,
'new_type': numpy.int64,
'old_value': array([1., 1., 1.]),
'new_value': array([1, 2, 3])}}}
Another common use case is comparing datetime objects. This kind of object has the following signature:
class datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0)
If we have a dict with datetime objects, DeepDiff allows us to compare only certain parts of them. For instance, if we only care about year, month, and day, then we can truncate the rest.
In [1]: import datetime
In [2]: from deepdiff import DeepDiff
In [3]: a = {
'list': ['one', 'two'],
'number': 1,
'date': datetime.datetime(2020, 6, 17, 22, 45, 34, 513371)
}
In [4]: b = {
'list': ['one', 'two'],
'number': 1,
'date': datetime.datetime(2020, 6, 17, 12, 12, 51, 115791)
}
In [5]: diff = DeepDiff(a, b, truncate_datetime='day')
In [6]: diff
Out[6]: {}
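Conceptually, truncating to the day means zeroing out everything smaller than the day before comparing. Here is a minimal sketch of that idea using only the standard library; truncate_to_day is a hypothetical helper, not DeepDiff's implementation.

```python
import datetime


def truncate_to_day(dt):
    """Return a copy of `dt` with the time-of-day fields reset to zero."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


first = datetime.datetime(2020, 6, 17, 22, 45, 34, 513371)
second = datetime.datetime(2020, 6, 17, 12, 12, 51, 115791)

print(first == second)                                    # False
print(truncate_to_day(first) == truncate_to_day(second))  # True
```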
It's common to use dicts to store string values. Having a better way of contrasting them can help us a lot! In this section I'm going to show you another lovely feature, the str diff.
In [13]: from pprint import pprint
In [17]: b = {
...: 'number': 1,
...: 'text': 'hi,\n my awesome world!'
...: }
In [18]: a = {
...: 'number': 1,
...: 'text': 'hello, my\n dear\n world!'
...: }
In [20]: ddiff = DeepDiff(a, b, verbose_level=2)
In [21]: pprint(ddiff, indent=2)
{ 'values_changed': { "root['text']": { 'diff': '--- \n'
'+++ \n'
'@@ -1,3 +1,2 @@\n'
'-hello, my\n'
'- dear\n'
'- world!\n'
'+hi,\n'
'+ my awesome world!',
'new_value': 'hi,\n my awesome world!',
'old_value': 'hello, my\n'
' dear\n'
' world!'}}}
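The 'diff' entry above is in the unified diff format. If you only need that part, the standard library's difflib can produce a similar result on its own. This is a small sketch using the same two strings; it is not a claim about how DeepDiff builds its output.

```python
import difflib

old_text = 'hello, my\n dear\n world!'
new_text = 'hi,\n my awesome world!'

# Compare line by line; lineterm='' avoids adding extra newlines.
diff_lines = list(
    difflib.unified_diff(old_text.splitlines(), new_text.splitlines(), lineterm='')
)

print('\n'.join(diff_lines))
```

Lines starting with '-' exist only in the old text, and lines starting with '+' exist only in the new one.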
Suppose we want to exclude the text field from the comparison.
In [17]: b = {
...: 'number': 1,
...: 'text': 'hi,\n my awesome world!'
...: }
In [18]: a = {
...: 'number': 1,
...: 'text': 'hello, my\n dear\n world!'
...: }
In [26]: ddiff = DeepDiff(a, b, verbose_level=2, exclude_paths=["root['text']"])
...:
In [27]: ddiff
Out[27]: {}
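For comparison, the plain-Python route mentioned earlier means building filtered copies of both dicts by hand. A minimal sketch, where without_keys is a hypothetical helper that only looks at top-level keys:

```python
def without_keys(d, excluded):
    """Return a shallow copy of `d` without the keys listed in `excluded`."""
    return {key: value for key, value in d.items() if key not in excluded}


a = {'number': 1, 'text': 'hello, my\n dear\n world!'}
b = {'number': 1, 'text': 'hi,\n my awesome world!'}

print(a == b)                                                  # False
print(without_keys(a, {'text'}) == without_keys(b, {'text'}))  # True
```

This works for top-level keys, but it gets awkward for nested paths, which is where a path-based exclusion is more convenient.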
DeepDiff also allows you to pass a regular expression. Check this out: https://zepworks.com/deepdiff/current/exclude_paths.html#exclude-regex-paths.
Comparing dicts is a common use case since they can be used to store almost any kind of data. As a result, having a proper tool to ease this effort is indispensable. DeepDiff has many features and can do reasonably advanced comparisons. If you ever need to compare dicts, go check it out.