Fix documentation for new syntax on GAN synthesizers (#479)
* Syntax for GAN calls

Co-authored-by: Joshua <[email protected]>
joshua-oss committed Jul 27, 2022
1 parent 4fcfa00 commit 17030ab
Showing 2 changed files with 39 additions and 66 deletions.
60 changes: 3 additions & 57 deletions synth/README.md
@@ -6,6 +6,7 @@

 Differentially private synthesizers for tabular data. Package includes:
 * MWEM
+* MST
 * QUAIL
 * DP-CTGAN
 * PATE-CTGAN
@@ -19,66 +20,11 @@ pip install smartnoise-synth

 ## Using

-### MWEM
-
-```python
-import snsynth
-import pandas as pd
-import numpy as np
-
-pums = pd.read_csv(pums_csv_path, index_col=None) # in datasets/
-pums = pums.drop(['income'], axis=1)
-nf = pums.to_numpy().astype(int)
-
-synth = snsynth.MWEMSynthesizer(epsilon=1.0, split_factor=nf.shape[1])
-synth.fit(nf)
-
-sample = synth.sample(10)
-print(sample)
-```
-### DP-CTGAN
-
-```python
-import snsynth
-import pandas as pd
-import numpy as np
-
-from snsynth.pytorch.nn import DPCTGAN
-from snsynth.pytorch import PytorchDPSynthesizer
-
-pums = pd.read_csv(pums_csv_path, index_col=None) # in datasets/
-pums = pums.drop(['income'], axis=1)
-
-synth = PytorchDPSynthesizer(1.0, DPCTGAN(), None)
-synth.fit(pums, categorical_columns=pums.columns)
-
-sample = synth.sample(10) # synthesize 10 rows
-print(sample)
-```
-
-### PATE-CTGAN
-
-```python
-import snsynth
-import pandas as pd
-import numpy as np
-
-from snsynth.pytorch.nn import PATECTGAN
-from snsynth.pytorch import PytorchDPSynthesizer
-
-pums = pd.read_csv(pums_csv_path, index_col=None) # in datasets/
-pums = pums.drop(['income'], axis=1)
-
-synth = PytorchDPSynthesizer(1.0, PATECTGAN(regularization='dragan'), None)
-synth.fit(pums, categorical_columns=pums.columns)
-
-sample = synth.sample(10) # synthesize 10 rows
-print(sample)
-```
+Please see the [SmartNoise synthesizers documentation](https://docs.smartnoise.org/synth/index.html) for usage examples.

 ## Note on Inputs

-MWEM, DP-CTGAN, and PATE-CTGAN require columns to be categorical. If you have columns with continuous values, you should discretize them before fitting. Take care to discretize in a way that does not reveal information about the distribution of the data.
+MWEM and MST require columns to be categorical. If you have columns with continuous values, you should discretize them before fitting. Take care to discretize in a way that does not reveal information about the distribution of the data.

 ## Communication

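The revised "Note on Inputs" in the README change above asks for continuous columns to be discretized without revealing the data's distribution. One way to read that is to fix the bin edges in advance from public knowledge rather than computing them (for example, as quantiles) from the private data. The following is a minimal sketch under that assumption, using only the Python standard library; the `AGE_EDGES` values, the `to_bin` helper, and the sample ages are hypothetical and are not part of `snsynth`.

```python
from bisect import bisect_right

# Hypothetical bin edges chosen in advance from public knowledge (not from the data).
# Bin i covers the half-open interval [AGE_EDGES[i], AGE_EDGES[i + 1]).
AGE_EDGES = [0, 18, 30, 45, 60, 75, 120]


def to_bin(value, edges):
    """Return the index of the bin containing value, clamping out-of-range values."""
    index = bisect_right(edges, value) - 1
    # Clamp so values below the first edge or at/above the last edge
    # fall into the first or last bin instead of producing an invalid index.
    return min(max(index, 0), len(edges) - 2)


# Illustrative values only; real data would come from the private table.
ages = [23, 37, 41, 58, 64, 79]
binned = [to_bin(age, AGE_EDGES) for age in ages]
print(binned)
```

With a pandas DataFrame, the same helper could be applied to a column with `Series.map`, for example `df["age"].map(lambda v: to_bin(v, AGE_EDGES))`, where `df` and the `age` column are placeholders. Deriving the edges from the data itself (minimum, maximum, or quantiles) would use information that the privacy budget does not account for, which is the situation the note warns against.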
45 changes: 36 additions & 9 deletions synth/docs/source/index.rst
@@ -20,7 +20,7 @@ Getting Started
 MWEM
 ----

-Multiplicative Weights Exponential Mechanism.
+Multiplicative Weights Exponential Mechanism. From "`A Simple and Practical Algorithm for Differentially Private Data Release <https://www.cs.huji.ac.il/~katrina//papers/mwem-nips.pdf>`_".

 .. code-block:: python
@@ -41,49 +41,76 @@ Multiplicative Weights Exponential Mechanism.
 DP-CTGAN
 --------

-Conditional tabular GAN with differentially private stochastic gradient descent.
+Conditional tabular GAN with differentially private stochastic gradient descent. From "`Modeling Tabular data using Conditional GAN <https://arxiv.org/abs/1907.00503>`_".

 .. code-block:: python
-imprt snsynth
+import snsynth
 import pandas as pd
 import numpy as np
 from snsynth.pytorch.nn import DPCTGAN
 from snsynth.pytorch import PytorchDPSynthesizer
+from snsynth.preprocessors.data_transformer import BaseTransformer
 pums = pd.read_csv(pums_csv_path, index_col=None) # in datasets/
 pums = pums.drop(['income'], axis=1)
-synth = PytorchDPSynthesizer(1.0, DPCTGAN(), None)
-synth.fit(pums, categorical_columns=pums.columns)
+synth = PytorchDPSynthesizer(1.0, DPCTGAN())
+synth.fit(pums, categorical_columns=list(pums.columns), transformer=BaseTransformer)
 sample = synth.sample(10) # synthesize 10 rows
 print(sample)
 PATE-CTGAN
 ----------

-Conditional tabular GAN using Private Aggregation of Teacher Ensembles.
+Conditional tabular GAN using Private Aggregation of Teacher Ensembles. From "`PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees <https://openreview.net/pdf?id=S1zk9iRqF7>`_" and "`Modeling Tabular data using Conditional GAN <https://arxiv.org/abs/1907.00503>`_".

 .. code-block:: python
-imprt snsynth
+import snsynth
 import pandas as pd
 import numpy as np
 from snsynth.pytorch.nn import PATECTGAN
 from snsynth.pytorch import PytorchDPSynthesizer
+from snsynth.preprocessors.data_transformer import BaseTransformer
+pums = pd.read_csv(pums_csv_path, index_col=None) # in datasets/
+pums = pums.drop(['income'], axis=1)
+synth = PytorchDPSynthesizer(1.0, PATECTGAN(regularization='dragan'))
+synth.fit(pums, categorical_columns=list(pums.columns), transformer=BaseTransformer)
+sample = synth.sample(10) # synthesize 10 rows
+print(sample)
+MST
+---
+
+MST achieves state of the art results for marginals over categorical data, and does well even with small source data. From McKenna et al. "`Winning the NIST Contest: A scalable and general approach to differentially private synthetic data <https://arxiv.org/abs/2108.04978>`_"
+
+.. code-block:: python
+import snsynth
+import pandas as pd
+import numpy as np
 pums = pd.read_csv(pums_csv_path, index_col=None) # in datasets/
 pums = pums.drop(['income'], axis=1)
-synth = PytorchDPSynthesizer(1.0, PATECTGAN(regularization='dragan'), None)
-synth.fit(pums, categorical_columns=pums.columns)
+Domains = {"pums": "samples/mst_sample/pums-domain.json"} # in samples/mst_sample
+synth = snsynth.MSTSynthesizer(domains_dict=Domains, domain="pums", epsilon=1.0)
+synth.fit(pums)
 sample = synth.sample(10) # synthesize 10 rows
 print(sample)
+For more, see the `sample notebook <https://github.com/opendp/smartnoise-sdk/tree/main/synth/samples/mst_sample>`_


 This is version |version| of the guides, last built on |today|.

 .. |opendp-logo| image:: _static/images/opendp-logo.png