Diffprivlib v0.3 #18

naoise-h · 2020-05-26T16:42:45Z

This PR updates Diffprivlib to version 0.3. The update includes a number of new additions, as well as various fixes to existing functionality. This version of diffprivlib supports Python 3.5 through 3.8.

The updates are summarised as follows.

Added

BudgetAccountant class to keep track of privacy budget spent in a script (and associated notebook).
Budget class to allow easy comparison (with <, >, etc) between privacy budgets of the form (epsilon, delta).
count_nonzero, sum and nansum functions to calculate a differentially private count and sum on an array or list.
GaussianDiscrete mechanism, the discrete analogue to the popular Gaussian mechanism.
clip_to_bounds and clip_to_norm to clip input data to the given bounds/norm; used in tools and models as appropriate.
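
To make the accountant and budget comparison ideas concrete, here is a minimal pure-Python sketch. The names `SketchBudget`, `SketchAccountant` and `SketchBudgetError` are hypothetical stand-ins, not the library's classes, and the sketch assumes naive sequential composition (summing epsilons and deltas), which may differ from the composition the library applies.

```python
class SketchBudget(tuple):
    """Hypothetical (epsilon, delta) pair with component-wise comparison."""

    def __new__(cls, epsilon, delta):
        if epsilon < 0 or not 0 <= delta <= 1:
            raise ValueError("need epsilon >= 0 and 0 <= delta <= 1")
        return super().__new__(cls, (epsilon, delta))

    # One budget is >= another only if BOTH components are >=,
    # so some pairs of budgets are simply not comparable.
    def __ge__(self, other):
        return self[0] >= other[0] and self[1] >= other[1]

    def __le__(self, other):
        return self[0] <= other[0] and self[1] <= other[1]

    def __gt__(self, other):
        return self >= other and tuple(self) != tuple(other)

    def __lt__(self, other):
        return self <= other and tuple(self) != tuple(other)


class SketchBudgetError(ValueError):
    """Raised when a spend would exceed the total budget."""


class SketchAccountant:
    """Hypothetical accountant using naive sequential composition."""

    def __init__(self, epsilon=float("inf"), delta=1.0):
        self.total = SketchBudget(epsilon, delta)
        self.spent = []  # list of (epsilon, delta) spends, in order

    def spent_so_far(self):
        eps = sum(e for e, _ in self.spent)
        delta = min(1.0, sum(d for _, d in self.spent))
        return SketchBudget(eps, delta)

    def spend(self, epsilon, delta=0.0):
        eps_spent, delta_spent = self.spent_so_far()
        proposed = SketchBudget(eps_spent + epsilon,
                                min(1.0, delta_spent + delta))
        if not proposed <= self.total:
            raise SketchBudgetError("spend would exceed the total budget")
        self.spent.append((epsilon, delta))
        return self


acc = SketchAccountant(epsilon=1.0, delta=0.0)
acc.spend(0.5)
acc.spend(0.25)
print(acc.spent_so_far())  # total spend recorded so far
```

Because the comparison is component-wise, a budget with larger epsilon but smaller delta is neither greater nor smaller than the other; the sketch returns `False` for both `>=` and `<=` in that case.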

Changed

Breaking:

The form/syntax of the bounds parameter passed to tools and models has changed; it is now specified as a tuple of the form (min, max). min and max can be scalars or 1-dimensional arrays.
Bounds can typically be converted to the new form with new_bounds = ([l for l, _ in bounds], [u for _, u in bounds]).
All functions (other than histogram functions) that previously required a range parameter now require bounds instead (e.g. models.LinearRegression, models.StandardScaler, tools.mean, etc.).
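
The conversion above can be sketched in plain Python as follows. The sample values and the `clip_row` helper are hypothetical and for illustration only; `clip_row` is not the library's `clip_to_bounds`, it only shows how a `(min, max)` tuple with scalar or per-feature entries might be interpreted.

```python
# Old-style bounds: one (min, max) pair per feature.
old_bounds = [(0, 10), (1, 5), (-2, 2)]

# New-style bounds: a single (min, max) tuple of per-feature sequences.
new_bounds = ([lower for lower, _ in old_bounds],
              [upper for _, upper in old_bounds])


def clip_row(row, bounds):
    """Hypothetical helper: clip one row of features to (min, max) bounds.

    Each of min and max may be a scalar (applied to every feature)
    or a sequence with one entry per feature.
    """
    lower, upper = bounds
    n = len(row)
    lowers = list(lower) if hasattr(lower, "__len__") else [lower] * n
    uppers = list(upper) if hasattr(upper, "__len__") else [upper] * n
    if len(lowers) != n or len(uppers) != n:
        raise ValueError("bounds must be scalars or match the row length")
    return [min(max(x, lo), hi) for x, lo, hi in zip(row, lowers, uppers)]


print(new_bounds)
print(clip_row([11, 0, 0], new_bounds))  # per-feature bounds
print(clip_row([-3, 7], (0, 5)))         # scalar bounds
```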

Non-breaking:

Diffprivlib now requires scikit-learn version 0.22 or later.
Geometric mechanism now has default sensitivity=1. This reflects the typical use of the geometric mechanism on count queries with sensitivity 1.
All mechanisms now support zero sensitivity.
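
To illustrate what a default sensitivity of 1 and support for zero sensitivity mean in practice, here is a minimal sketch of a two-sided geometric mechanism. The function `sketch_geometric` is hypothetical and is not the library's sampling routine; it assumes noise drawn as the difference of two geometric variables with ratio exp(-epsilon / sensitivity), and treats zero sensitivity as "no noise needed".

```python
import math
import random


def sketch_geometric(value, epsilon, sensitivity=1, rng=random):
    """Hypothetical two-sided geometric mechanism for integer queries."""
    if not isinstance(value, int):
        raise TypeError("value must be an integer")
    if epsilon <= 0 or sensitivity < 0:
        raise ValueError("need epsilon > 0 and sensitivity >= 0")

    # With zero sensitivity the query output does not depend on any
    # individual, so the value can be released unchanged.
    if sensitivity == 0:
        return value

    scale = epsilon / sensitivity

    def one_sided():
        # Inverse-CDF sample of a geometric variable on {0, 1, 2, ...}
        # with P(G >= k) = exp(-scale * k).
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return math.floor(-math.log(u) / scale)

    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return value + one_sided() - one_sided()


print(sketch_geometric(42, epsilon=1.0))                 # noisy count
print(sketch_geometric(42, epsilon=1.0, sensitivity=0))  # unchanged
```

Passing a seeded `random.Random` instance as `rng` makes the sketch reproducible, which is convenient for testing but should not be done when privacy matters.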

Fixed

The publicly-exposed class counts in models.GaussianNB now satisfy differential privacy. The class_count_ attribute is therefore noisy, and care must be taken in relying on these values for testing or other purposes.
mean, std and var tools, and their NaN equivalents, no longer require numpy array inputs, and can take all array-like inputs (e.g. scalars, lists and tuples).
Sensitivity calculation when randomising scalar-valued var output.

Note: Although backward compatibility is broken by this release, I propose not incrementing the major version number to reflect the library's overall beta development status.

lgtm-com · 2020-05-26T16:56:07Z

This pull request introduces 1 alert when merging a0bd2fa into 17cb421 - view on LGTM.com

new alerts:

1 for Unnecessary 'else' clause in loop

marcosimioni

See my comments please; I hope they make sense. I'll approve in the meantime, and you can go ahead with Stefano's review once you're happy with mine. Cheers!

diffprivlib/mechanisms/geometric.py

diffprivlib/models/naive_bayes.py

diffprivlib/models/pca.py

diffprivlib/validation.py

marcosimioni · 2020-05-29T16:14:24Z

@naoise-h I left you a bunch of comments, and approved. Before merging, please ask @stefano81 for his review. Thanks!!

stefano81

Please, check the comments.

diffprivlib/mechanisms/transforms/roundedinteger.py

diffprivlib/mechanisms/vector.py

diffprivlib/models/k_means.py

diffprivlib/models/standard_scaler.py

marcosimioni

LGTM

stefano81

LGTM

naoise-h added 30 commits January 16, 2020 13:24

Removing support for Python 3.4

3b55c63

Switching to Python 3.8 for pylint checks

2e45d95

Updating notebooks to use score method

08763e8

Adding tests

2ecbac5

Adding coverage check to travis

48b3ef9

Adding Discrete Gaussian mechanism

382cc34

Fixing test

3909adc

Adding sensitivity and safe sampling of exp bernoulli

f94f20a

Test nits

a9644dd

Test nits

4acfc1b

Adding first prototype of BudgetAccountant and tests

abc82ec

Adding tests to accountant

0a9ce49

Make check_spend throw BudgetError if budget exceeded

51d3cb5

Changing test filename to match existing

02e4d3e

Adding change_accountant test, typo

90ebc93

Moving check_accountant to utils

880b5cc

Adding accountant to histograms

a354c73

Adding missing docstring

c3cdfba

Fixing LGTM alert, adding accountant test

68d820b

Removing redundant epsilon and delta check

ab02680

Consolidating slack checks

051a0f5

Refactoring BudgetAccountant and its implementation

210a0b4

Adding type-checker to check method

da1ead2

Adding a default accountant

2995d48

Adding min_epsilon to mitigate against floating point rounding errors

f622afe

Adding tests for BudgetAccountant and check_epsilon_delta

19fe28d

Adding support for with() statement in BudgetAccountant

bdfc3c4

Adding accounting to tools/utils

7cf2ffa

Adding accounting to models

da84136

Bypassing total() call in check() with infinite budget

4849c38

naoise-h added 2 commits May 26, 2020 14:58

Updating status to beta and other minor changes

7d51e17

Minor changes to 30seconds notebook and readme

a0bd2fa

naoise-h added 3 commits May 26, 2020 18:04

Replacing single-char variable names

6c82544

LGTM fix

6cf7f4b

Removing support for out and ddof in tools; fix line length

b64842a

naoise-h marked this pull request as ready for review May 26, 2020 20:13

Removing unused parameters

6d9c5f1

naoise-h requested review from marcosimioni and stefano81 May 27, 2020 07:58

naoise-h added 2 commits May 27, 2020 09:40

Updating LogisticRegression notebook

2e2f139

Making accountant attributes private

4c3d159

marcosimioni previously approved these changes May 29, 2020

naoise-h added 2 commits June 2, 2020 11:56

Switching to NotImplementedError for bias/variance

aa25286

Minor fixes from Marco's review

c91a29b

- Naive Bayes error message consistency
- Correct reading of bounds from data in PCA
- None no longer permitted as input to check_bounds

naoise-h dismissed marcosimioni’s stale review via c91a29b June 2, 2020 11:18

Adding count_nonzero to tools

413dbcf

stefano81 requested changes Jun 3, 2020

naoise-h added 3 commits June 10, 2020 15:16

Minor changes from Stefano's review

c7bd4f9

- Parentheses in Vector mechanism to enhance readability
- Explaining use of new accountant in StandardScaler

Fixing Laplace variance bug for non-zero delta

d070798

Adding exploration notebook

33961ab

naoise-h requested a review from marcosimioni June 19, 2020 15:29

naoise-h added 3 commits June 22, 2020 11:22

Minor notebook change

370e193

Adding keepdims to count_nonzero in line with Numpy 1.19

6e1e6c6

Fix logistic_regression_path documentation, thanks Marco

5ce81a9

marcosimioni approved these changes Jun 22, 2020

stefano81 approved these changes Jun 22, 2020

naoise-h merged commit 731da4a into master Jun 26, 2020

naoise-h deleted the dev branch January 29, 2021 17:41
