Diffprivlib v0.3 #18

Merged
merged 99 commits into from
Jun 26, 2020

naoise-h (Member) commented May 26, 2020

This PR updates diffprivlib to version 0.3. The update includes a number of new additions, as well as various fixes to existing functionality. This version of diffprivlib supports Python 3.5 through 3.8.

The updates are summarised as follows.

Added

  • BudgetAccountant class to keep track of privacy budget spent in a script (and associated notebook).
  • Budget class to allow easy comparison (with <, >, etc) between privacy budgets of the form (epsilon, delta).
  • count_nonzero, sum and nansum functions to calculate a differentially private count and sum on an array or list.
  • GaussianDiscrete mechanism, the discrete analogue to the popular Gaussian mechanism.
  • clip_to_bounds and clip_to_norm to clip input data to the given bounds/norm; used in tools and models as appropriate.
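
As a rough illustration of the budget-tracking idea in the list above, the sketch below shows how an (epsilon, delta) pair can be compared component-wise and how spending can be checked against a ceiling. The names `SketchBudget` and `SketchAccountant` are hypothetical and are not the library's classes. The sketch also assumes naive composition, where epsilons and deltas simply add up, which may differ from how the real BudgetAccountant accounts for spend.

```python
class SketchBudget(tuple):
    """Hypothetical (epsilon, delta) pair with component-wise comparison."""

    def __new__(cls, epsilon, delta=0.0):
        if epsilon < 0 or not 0 <= delta <= 1:
            raise ValueError("epsilon must be >= 0 and delta must lie in [0, 1]")
        return super().__new__(cls, (epsilon, delta))

    # A budget is "smaller" only if BOTH components are no larger, so some
    # pairs are incomparable (neither < nor > holds).
    def __le__(self, other):
        return self[0] <= other[0] and self[1] <= other[1]

    def __ge__(self, other):
        return self[0] >= other[0] and self[1] >= other[1]

    def __lt__(self, other):
        return self <= other and tuple(self) != tuple(other)

    def __gt__(self, other):
        return self >= other and tuple(self) != tuple(other)


class SketchAccountant:
    """Hypothetical tracker; assumes naive composition (costs simply add)."""

    def __init__(self, epsilon, delta=0.0):
        self.limit = SketchBudget(epsilon, delta)
        self.spent = []

    def total(self):
        # Sum of everything spent so far, as a comparable budget.
        return SketchBudget(sum(e for e, _ in self.spent),
                            sum(d for _, d in self.spent))

    def spend(self, epsilon, delta=0.0):
        cost = SketchBudget(epsilon, delta)
        total = self.total()
        # Refuse any spend that would push either component past the limit.
        if total[0] + cost[0] > self.limit[0] or total[1] + cost[1] > self.limit[1]:
            raise ValueError("privacy budget would be exceeded")
        self.spent.append(cost)
        return self.total()


accountant = SketchAccountant(epsilon=1.0)
accountant.spend(0.5)
accountant.spend(0.25)
```

After these two calls the tracked total is still strictly below the limit, while a further spend of 0.5 would be rejected.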

Changed

Breaking:

  • The form/syntax of the bounds parameter passed to tools and models has changed; it is now specified as a tuple of the form (min, max). min and max can be scalars or 1-dimensional arrays.
    Bounds can typically be converted to the new form with new_bounds = ([l for l, _ in bounds], [u for _, u in bounds]).
  • All functions (other than histogram functions) that previously required a range parameter now require bounds instead (e.g. models.LinearRegression, models.StandardScaler, tools.mean, etc.).
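
The conversion expression quoted above can be wrapped in a small helper. This is a minimal sketch that assumes the old form was a list of per-feature (min, max) pairs, as the expression implies; `convert_bounds` is a hypothetical name and not part of the library.

```python
def convert_bounds(old_bounds):
    """Turn [(min, max), ...] per-feature pairs into the (mins, maxs) tuple form."""
    # Same expression as in the release notes, applied to the argument.
    return ([l for l, _ in old_bounds], [u for _, u in old_bounds])


# Three features, each with its own (min, max) pair in the old style.
old_bounds = [(0, 10), (-1, 1), (18, 65)]
new_bounds = convert_bounds(old_bounds)
# new_bounds is now ([0, -1, 18], [10, 1, 65])
```

When every feature shares the same limits, the new form also accepts scalars, so a tuple such as `(0, 10)` would be enough.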

Non-breaking:

  • Diffprivlib now requires scikit-learn version 0.22 or later.
  • Geometric mechanism now has default sensitivity=1. This reflects the typical use of the geometric mechanism on count queries with sensitivity 1.
  • All mechanisms now support zero sensitivity.
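
To illustrate why sensitivity 1 is a natural default for counts, and what zero sensitivity means in practice, here is a minimal sketch of two-sided geometric noise. It is not the library's implementation: the function names are hypothetical, and the noise is drawn as the difference of two geometric variables, which is one standard way to sample this distribution.

```python
import math
import random


def two_sided_geometric_noise(epsilon, sensitivity=1, rng=random):
    """Draw integer noise with P(k) proportional to exp(-epsilon * |k| / sensitivity)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be strictly positive")
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    if sensitivity == 0:
        # The query cannot change between neighbouring datasets: no noise needed.
        return 0

    log_alpha = -epsilon / sensitivity  # log of alpha = exp(-epsilon / sensitivity)

    def geometric():
        # Inverse-CDF sample of the number of failures before the first success.
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return int(math.floor(math.log(u) / log_alpha))

    # The difference of two i.i.d. geometric draws is two-sided geometric.
    return geometric() - geometric()


def randomise_count(count, epsilon, sensitivity=1, rng=random):
    """Add two-sided geometric noise to an integer count."""
    return count + two_sided_geometric_noise(epsilon, sensitivity, rng)


noisy = randomise_count(42, epsilon=1.0, rng=random.Random(0))
```

A count query changes by at most 1 when a single record is added or removed, which is why sensitivity 1 covers the common case.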

Fixed

  • The publicly-exposed class counts in models.GaussianNB now satisfy differential privacy. The class_count_ attribute is therefore noisy, and care must be taken in relying on these values for testing or other purposes.
  • mean, std and var tools, and their NaN equivalents, no longer require numpy array inputs, and can take all array-like inputs (e.g. scalars, lists and tuples).
  • Sensitivity calculation when randomising scalar-valued var output.

Note: Although backward compatibility is broken by this release, I propose not incrementing the major version number to reflect the library's overall beta development status.

lgtm-com bot commented May 26, 2020

This pull request introduces 1 alert when merging a0bd2fa into 17cb421 - view on LGTM.com

new alerts:

  • 1 for Unnecessary 'else' clause in loop

naoise-h marked this pull request as ready for review May 26, 2020 20:13
marcosimioni previously approved these changes May 29, 2020

marcosimioni left a comment

Please see my comments; I hope they make sense. I'll approve in the meantime, and you can go ahead with Stefano's review once you're happy with mine. Cheers!

diffprivlib/mechanisms/geometric.py (outdated, resolved)
diffprivlib/mechanisms/geometric.py (outdated, resolved)
diffprivlib/models/naive_bayes.py (resolved)
diffprivlib/models/naive_bayes.py (resolved)
diffprivlib/models/pca.py (outdated, resolved)
diffprivlib/validation.py (outdated, resolved)
diffprivlib/validation.py (resolved)
@marcosimioni

@naoise-h I left you a bunch of comments, and approved. But please ask @stefano81 for his review before merging. Thanks!!

- Naive Bayes error message consistency
- Correct reading of bounds from data in PCA
- None no longer permitted as input to check_bounds
stefano81 (Member) left a comment

Please, check the comments.

diffprivlib/mechanisms/vector.py (outdated, resolved)
diffprivlib/models/k_means.py (resolved)
diffprivlib/models/k_means.py (resolved)
diffprivlib/models/standard_scaler.py (outdated, resolved)
diffprivlib/models/standard_scaler.py (resolved)
diffprivlib/models/standard_scaler.py (resolved)
diffprivlib/models/standard_scaler.py (resolved)
- Parentheses in Vector mechanism to enhance readability
- Explaining use of new accountant in StandardScaler

marcosimioni left a comment

LGTM

stefano81 (Member) left a comment

LGTM

naoise-h merged commit 731da4a into master Jun 26, 2020
naoise-h deleted the dev branch January 29, 2021 17:41

3 participants