initial implementation
parkerhancock committed Nov 9, 2023
1 parent 0bf192e commit 1b9522a
Showing 86 changed files with 57,898 additions and 146,022 deletions.
3 changes: 3 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
[![PyPI](https://img.shields.io/pypi/v/patent-client?color=blue)](https://pypi.org/project/patent-client)
[![PyPI - Python Versions](https://img.shields.io/pypi/pyversions/patent-client)](https://pypi.org/project/patent-client)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/patent-client?color=blue)](https://pypi.org/project/patent-client)
[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev)

# Summary

Expand All @@ -17,6 +18,7 @@ A powerful library for accessing intellectual property, featuring:
- 🐼 **Pandas Integration:** Results are easily castable to [Pandas Dataframes and Series][PANDAS].
- 🚀 **Performance:** Fetched data is retrieved using the [httpx][httpx] library with native HTTP/2 and asyncio support, and cached using the [hishel][hishel] library for super-fast queries, and [yankee][yankee] for data extraction.
- 🌐 **Async/Await Support:** All API's (optionally!) support the async/await syntax.
- 🔮 **Pydantic v2 Support:** All models retrieved are [Pydantic v2 models][pydantic] with all the goodness that comes with them!

Docs, including a fulsome Getting Started and User Guide are available on [Read the Docs](http://patent-client.readthedocs.io). The Examples folder includes examples of using `patent_client` for
many common IP tasks
Expand Down Expand Up @@ -52,6 +54,7 @@ many common IP tasks
[PTAB]: https://developer.uspto.gov/api-catalog/ptab-api-v2
[USPTO]: http://developer.uspto.gov
[GD]: https://globaldossier.uspto.gov
[pydantic]: https://docs.pydantic.dev/latest/


## Installation
Expand Down
107,339 changes: 53,891 additions & 53,448 deletions cassettes/README.md.yaml

Large diffs are not rendered by default.

73,870 changes: 0 additions & 73,870 deletions cassettes/README/README.md.yaml

This file was deleted.

1 change: 1 addition & 0 deletions docs/changelog.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
```{include} ../CHANGELOG.md
90 changes: 90 additions & 0 deletions docs/developer/cassettes/overview.md.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -647,4 +647,94 @@ interactions:
- Access-Control-Request-Headers
http_version: HTTP/2
status_code: 200
- request:
body: ''
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
host:
- developer.uspto.gov
user-agent:
- Mozilla/5.0 Python Patent Clientbot/3.2.10 ([email protected])
method: GET
uri: https://developer.uspto.gov/ptab-api/documents?proceedingNumber=IPR2017-00001&recordStartNumber=0&recordTotalQuantity=1&sortOrderCategory=
response:
content: '{"aggregationData":{},"results":[{"filingPartyCategory":"PATENT OWN","documentNumber":"50","documentName":"IPR2017-00001NOAFWD.pdf","documentCategory":"Paper","documentTitleText":"Patent
Owner Notice of Appeal","proceedingNumber":"IPR2017-00001","documentFilingDate":"05-16-2018","documentTypeName":"Notice
of Appeal","proceedingTypeCategory":"AIA Trial","subproceedingTypeCategory":"IPR","documentIdentifier":"169664317","objectUuId":"workspace://SpacesStore/706413f4-6b98-4788-adbf-db7d101ae208;1.0","respondentTechnologyCenterNumber":"2600","respondentPatentOwnerName":"Petite
et al","respondentPartyName":"SIPCO, LLC","respondentGroupArtUnitNumber":"2612","respondentCounselName":"Thomas
Meagher","respondentGrantDate":"12-23-2008","respondentPatentNumber":"7468661","respondentApplicationNumberText":"11395685","petitionerPartyName":"Emerson
Electric Co.","petitionerCounselName":"Steven Pepe","additionalRespondentPartyDataBag":[]}],"recordTotalQuantity":109}'
headers:
access-control-allow-credentials:
- 'true'
access-control-allow-headers:
- accept, authorization, content-type, x-requested-with
access-control-allow-methods:
- GET, POST, OPTIONS, PUT
access-control-allow-origin:
- '*'
access-control-max-age:
- '1'
content-type:
- application/json
date:
- Thu, 09 Nov 2023 20:00:47 GMT
strict-transport-security:
- max-age=31536000;
vary:
- Origin
- Access-Control-Request-Method
- Access-Control-Request-Headers
http_version: HTTP/2
status_code: 200
- request:
body: ''
headers:
accept:
- '*/*'
accept-encoding:
- gzip, deflate
connection:
- keep-alive
host:
- developer.uspto.gov
user-agent:
- Mozilla/5.0 Python Patent Clientbot/3.2.10 ([email protected])
method: GET
uri: https://developer.uspto.gov/ptab-api/proceedings?proceedingNumber=IPR2017-00001&recordStartNumber=0&recordTotalQuantity=1&sortOrderCategory=
response:
content: '{"aggregationData":{},"results":[{"institutionDecisionDate":"03-27-2017","proceedingFilingDate":"10-01-2016","accordedFilingDate":"10-01-2016","proceedingStatusCategory":"FWD
Entered","proceedingNumber":"IPR2017-00001","proceedingLastModifiedDate":"06-13-2022","proceedingTypeCategory":"AIA
Trial","subproceedingTypeCategory":"IPR","respondentTechnologyCenterNumber":"2600","respondentPatentOwnerName":"Petite
et al","respondentPartyName":"SIPCO, LLC","respondentGroupArtUnitNumber":"2612","respondentCounselName":"Thomas
Meagher","respondentGrantDate":"12-23-2008","respondentPatentNumber":"7468661","respondentApplicationNumberText":"11395685","petitionerPartyName":"Emerson
Electric Co.","petitionerCounselName":"Steven Pepe","decisionDate":"03-16-2018","additionalRespondentPartyDataBag":[]}],"recordTotalQuantity":1}'
headers:
access-control-allow-credentials:
- 'true'
access-control-allow-headers:
- accept, authorization, content-type, x-requested-with
access-control-allow-methods:
- GET, POST, OPTIONS, PUT
access-control-allow-origin:
- '*'
access-control-max-age:
- '1'
content-type:
- application/json
date:
- Thu, 09 Nov 2023 20:00:47 GMT
strict-transport-security:
- max-age=31536000;
vary:
- Origin
- Access-Control-Request-Method
- Access-Control-Request-Headers
http_version: HTTP/2
status_code: 200
version: 1
145 changes: 116 additions & 29 deletions docs/developer/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,47 +3,134 @@
The goal of this project is to provide easy-to-use access to public patent data through a simple API.
The general idea is to implement a subset of the
[Django QuerySet API](https://docs.djangoproject.com/en/2.1/ref/models/querysets/) functionality for accessing
the various sources of patent data.
the various sources of patent data. This is a form of the [Active Record](https://en.wikipedia.org/wiki/Active_record_pattern)
pattern. To achieve this, the "record" is a Pydantic model, and it has a "manager" that is located at `.objects`.

To facilitate this, two base classes are provided as scaffolding for adding new APIs - *Manager* and *Model* (both located in the patent_client.util module). The
Django ORM implements its functionality using three classes - a Model class that models a single record in a database,
a Manager class that provides a generic way of accessing that data, and a QuerySet that allows for sorting / filtering of data.
Here, we omit the separate QuerySet and Manager, and instead use a single Manager class that handles both QuerySet and Manager
functions.

## Managers
## Basic Structure

The basic structure of a Patent Client API wrapper looks like this:

- `some_api`
- `__init__.py`
- `api.py`
- `model.py`
- `manager.py`

The `model.py` file should contain [Pydantic v2](https://docs.pydantic.dev/latest/) models representing the output of the API, using `patent_client.util.pydantic_utils.BaseModel`
instead of `pydantic.BaseModel`. The `api.py` file should contain at least one class with methods that call various API functions. The actual structure of the `api.py` file does not matter, but each method should return instances of the Pydantic models defined in `model.py`. The `manager.py` file contains a subclass of `patent_client.util.base.manager.Manager` that wraps the API classes in `api.py` and implements the manager protocol described below.

Other files can also be included in the API folder to support other functions. Common ones include:

When filtering, ordering, or values methods are called on a Manager, it returns a new Manager object with a combination of the arguments to the old manager and the new arguments. In this way, any given Manager is *immutable*. A base, blank manager (that would return all records), is attached to searchable models as Model.objects. Most searches will begin with a call to Model.objects.filter, which adds filtering criteria to the manager. Like Django managers, they support .order_by, .limit, and .offset (and, in fact, slicing just passes the start and end on to those methods). Managers also support .values and .values_list. Unlike Django, managers also support additional convenience functions, including:
- `session.py` - if any extensive customization of the base `PatentClientSession` is necessary, it goes here.
- `convert.py` - if any data conversion is necessary from the API output to the Pydantic input, put that here.
- `query.py` - if complex logic is necessary to convert input to the manager's `.filter` method, put that here.

Each of these is discussed in more detail below.

## API & PatentClientSession

The `api.py` file should use an instance of `patent_client.session.PatentClientSession` to access the methods of the API using only `async` methods. `PatentClientSession` is a subclass of `hishel.AsyncCacheClient`, which is itself a subclass of `httpx.AsyncClient`. Patent Client uses this exclusively over the more popular
`requests` library because (1) an increasing number of API's require the use of HTTP/2, which is not supported by requests, and (2) httpx has support for `asyncio`. That said, if you're coming from a `requests` background, fear not! The httpx interface is nearly (but not entirely) identical to `requests`.

## Models

> - Manager.to_list - converts a manager to a list of models
> - Manager.to_pandas - converts a manager to a Pandas dataframe (if pandas is available), with all model attributes as columns
Models are Pydantic Models that subclass `patent_client.util.pydantic_util.BaseModel`. This special version of `BaseModel` automatically
detects the corresponding manager (discussed below) and adds some convenience functions. When used:

Managers also support addition operations. For example, to create an application list with all applications naming two assignees, you could do this:
- `Model.objects` holds a manager that would retrieve every `Model` in the data source.
- The `Model` supports a `.to_dict()` method to convert it to a dictionary, and a `to_pandas()` method to convert it to a Pandas series.

Models can use any Pydantic features, such as [computed fields](https://docs.pydantic.dev/2.0/usage/computed_fields/) for additional properties.
Models may also include:

- Relationships - that traverse a relationship to a related model.
- Downloaders - that download some sort of content related to the model.

### Relationships

You can create properties of a Model that link to another model using `patent_client.util.base.related.get_model`. With `get_model`, you can dynamically retrieve
another model, and then use an active record call on that model. `get_model` is preferred over importing the model directly to reduce the risk of circular imports.

Example:

```python
>> apps = USApplication.objects.filter(first_named_applicant='Company A') + USApplication.objects.filter(first_named_applicant='Company B')
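from patent_client.util.base.related import get_model
from patent_client.util.pydantic_util import BaseModel

class USApplication(BaseModel):
    patent_number: str
    ...
    @property
    def patent(self):
        # Look the related model up lazily to avoid a circular import.
        return get_model("patent_client.Patent").objects.get(self.patent_number)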
```

## Models
In that example, if you have a `USApplication` instance, you can get the corresponding patent at `USApplication.patent`.

Models are special dataclasses, with some additional functionality baked-in. Fields are present as attributes on the Model. Additionally:
### Downloaders

- The Model.objects holds a manager that would retrieve every Model in the data source
- The Model supports a .to_dict() method to convert it to a dictionary, and a to_pandas() method to convert it to a Pandas series.
Some models have downloads related to them, like Assignment PDF's or Patent and Publication documents. Downloaders should:

Models can also have custom functions and properties attached to them. These vary from model to model, but consist of:
- Be initially implemented as an asynchronous `.adownload` method that uses the `session.adownload` method on the related session.
- Have a companion `.download` method that simply aliases `.adownload` using `patent_client.util.asyncio_util.run_sync`
- Return a `pathlib.Path` object to the downloaded file.
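
The three requirements above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `FakeSession`, `Assignment`, and the local `run_sync` are hypothetical stand-ins (the real `run_sync` lives in `patent_client.util.asyncio_util` and may behave differently), and no network access happens.

```python
import asyncio
import tempfile
from pathlib import Path


class FakeSession:
    """Hypothetical stand-in for the real session; writes bytes instead of fetching."""

    async def adownload(self, url: str, path: Path) -> Path:
        await asyncio.sleep(0)  # placeholder for real network I/O
        path.write_bytes(f"content of {url}".encode())
        return path


def run_sync(coro):
    """Simplified stand-in for run_sync; assumes no event loop is already running."""
    return asyncio.run(coro)


class Assignment:
    """Hypothetical model with one downloadable document."""

    def __init__(self, url: str, session: FakeSession):
        self.url = url
        self.session = session

    async def adownload(self, directory: Path) -> Path:
        # The async method does the real work and returns a pathlib.Path.
        return await self.session.adownload(self.url, directory / "assignment.pdf")

    def download(self, directory: Path) -> Path:
        # The sync companion only wraps the async method.
        return run_sync(self.adownload(directory))


with tempfile.TemporaryDirectory() as tmp:
    saved = Assignment("https://example.com/a.pdf", FakeSession()).download(Path(tmp))
    saved_name = saved.name
    saved_exists = saved.exists()
```

Keeping all logic in `.adownload` means the synchronous and asynchronous paths cannot drift apart.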

## Managers

- Transformer methods - that calculate some property based on one or more Fields
- Relationships - that traverse a relationship to a related model
- Downloaders - that download some sort of content related to the model
When filtering, ordering, or values methods are called on a Manager, it returns a new Manager object with a combination of the arguments to the old manager and the new arguments. In this way, any given Manager is *immutable*. The key data in the Manager is in a `ManagerConfig` object stored at `Manager.config`.
Managers require subclassing `patent_client.util.base.Manager` and defining these methods:

Downloaders always return a tempfile.NamedTemporaryFile with the downloaded file contained therein.
`Manager._aget_results`

## Schemas
This method should return an `AsyncIterator` across the model results, based on the contents of the `ManagerConfig` object at `Manager.config`.

Each data source also has a module called a "Schema," which is a deserialization layer that converts raw data obtained by the manager into
models. In general, the data sources accessed by patent_client are either JSON or XML documents. Both use the Marshmallow library to apply
formatting corrections, renaming conventions, etc.
`Manager.alen`

This method should be an async method that returns the number of results to be retrieved by the manager, based on the contents of the `ManagerConfig` object.
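
As a rough sketch of this protocol, the example below shows an immutable manager whose `filter` and `limit` calls return new managers, plus a subclass implementing `_aget_results` and `alen`. Everything here is a simplified assumption for illustration: the real `ManagerConfig` and `Manager` have more fields and methods, and `InMemoryManager` and `RECORDS` are hypothetical.

```python
import asyncio
from dataclasses import dataclass, field, replace
from typing import AsyncIterator, Optional


@dataclass(frozen=True)
class ManagerConfig:
    """Simplified, hypothetical config: filter criteria and an optional limit."""
    filter: dict = field(default_factory=dict)
    limit: Optional[int] = None


class Manager:
    def __init__(self, config: Optional[ManagerConfig] = None):
        self.config = config or ManagerConfig()

    def filter(self, **kwargs) -> "Manager":
        # Never mutate self; build a new manager from the merged criteria.
        merged = {**self.config.filter, **kwargs}
        return type(self)(replace(self.config, filter=merged))

    def limit(self, n: int) -> "Manager":
        return type(self)(replace(self.config, limit=n))

    def _aget_results(self) -> AsyncIterator[dict]:
        raise NotImplementedError

    async def alen(self) -> int:
        raise NotImplementedError


RECORDS = [
    {"applicant": "Google", "title": "A"},
    {"applicant": "Apple", "title": "B"},
    {"applicant": "Google", "title": "C"},
]


class InMemoryManager(Manager):
    """Hypothetical manager that reads from RECORDS instead of a remote API."""

    def _matching(self) -> list:
        rows = [r for r in RECORDS
                if all(r.get(k) == v for k, v in self.config.filter.items())]
        return rows[: self.config.limit]  # a limit of None keeps every row

    async def _aget_results(self) -> AsyncIterator[dict]:
        for row in self._matching():
            yield row

    async def alen(self) -> int:
        return len(self._matching())


async def main():
    base = InMemoryManager()
    google = base.filter(applicant="Google")
    titles = [row["title"] async for row in google._aget_results()]
    first_only = [row["title"] async for row in google.limit(1)._aget_results()]
    return base.config.filter, titles, first_only, await google.alen()


base_filter, titles, first_only, count = asyncio.run(main())
```

The base manager's config is unchanged after `filter` is called, which is what makes it safe to share `Model.objects` across callers.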


### Manager Discovery
A base, blank manager (that would return all records) is attached to searchable models as `Model.objects`. This is done automatically when
a model is defined in a `model.py` module and there is a corresponding manager in a `manager.py` module. For example:

`model.py`
```python
from patent_client.util.pydantic_util import BaseModel
class Model(BaseModel):
    # Some fields
    ...
```
`manager.py`
```python
from patent_client.util.base import Manager
class ModelManager(Manager):
    # an implementation
    ...
```

No additional configuration is needed. If the API is particularly complex, such that `model` and `manager` are packages and not modules, this still works as long as the manager is
listed in the `__init__.py` of the `manager` package. For example, this layout:
`model/submodel.py`
```python
from patent_client.util.pydantic_util import BaseModel
class SubModel(BaseModel):
    # Some fields
    ...
```
`manager/submanager.py`
```python
from patent_client.util.base import Manager
class SubModelManager(Manager):
    # an implementation
    ...
```
does not work *unless* you also have this:
`manager/__init__.py`
```python
from .submanager import SubModelManager
```

Alternatively, you can expressly define the location of a manager with a string at `__manager__`:
`model.py`
```python
from patent_client.util.pydantic_util import BaseModel
class Model(BaseModel):
    __manager__ = "patent_client.manager.ModelManager"
```
```
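
The diff does not show how the `__manager__` string is resolved, so the following is only a plausible sketch of dotted-path resolution using the standard library. `resolve_dotted_path` is a hypothetical helper, and `collections.OrderedDict` is used purely as an importable stand-in target.

```python
import importlib


def resolve_dotted_path(path: str):
    """Import 'package.module.Name' and return the object called Name."""
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


class Model:
    # Stand-in target; a real model would name its own manager class here.
    __manager__ = "collections.OrderedDict"


manager_cls = resolve_dotted_path(Model.__manager__)
```

Resolving the string lazily, rather than importing the manager at module load, is one way to avoid circular imports between `model.py` and `manager.py`.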

## Relationships

Expand Down Expand Up @@ -74,10 +161,10 @@ Once these relationships are in place, we can move from one record to the other
>>> from patent_client import PtabProceeding
>>> a = PtabProceeding.objects.get('IPR2017-00001') # doctest +SKIP

>>> a.documents[0]
PtabDocument(document_category='Paper', document_type_name='Notice of Appeal', document_number=50, document_name='IPR2017-00001NOAFWD.pdf', document_filing_date=datetime.date(2018, 5, 16), title=None)
>>> a.documents[0] # doctest +ELLIPSIS
PtabDocument(...)

>>> a.documents[0].proceeding
PtabProceeding(subproceeding_type_category='IPR', proceeding_number='IPR2017-00001', proceeding_status_category='FWD Entered', proceeding_type_category='AIA Trial', respondent_party_name='SIPCO, LLC')
>>> a.documents[0].proceeding # doctest +ELLIPSIS
PtabProceeding(...)

```
13 changes: 0 additions & 13 deletions docs/examples/README.md

This file was deleted.

51 changes: 16 additions & 35 deletions docs/getting_started.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,43 +20,11 @@ register a new account (Free up to 4GB of data / month, which is usually more th

**Step 2:** Log in to EPO OPS, and click on "My Apps," add a new app, and write down the corresponding API *Consumer Key* and *Consumer Key Secret*.

**Step 3:** Import patent_client from anywhere. E.g.
**Step 3:** Set your environment variables as:

```console
$ python
Python 3.6.5 (default, Jul 12 2018, 11:37:09)
>>> import patent_client

```

This will set up an empty settings file, located at **~/.patent_client_config.yaml**. The config file is a YAML file containing settings for the project.

**Step 4:** Edit the config file to contain your user key and secret. E.g.

```yaml
DEFAULT:
BASE_DIR: ~/.patent_client
LOG_FILE: patent_client.log
LOG_LEVEL: INFO

EPO:
API_KEY: <Key Here>
API_SECRET: <Secret Here>
ITC:
USERNAME:
PASSWORD:

```

**Step 5:** PROFIT! Every time you import a model that requires EPO OPS access, you will automatically be logged on using that key and secret.

### Environment Variables (Less Recommended)

Alternatively, you can set the environment variables as:

```console
PATENT_CLIENT__EPO_API_KEY="<Consumer Key Here>"
PATENT_CLIENT__EPO_SECRET="<Consumer Key Secret Here>"
PATENT_CLIENT_EPO_API_KEY="<Consumer Key Here>"
PATENT_CLIENT_EPO_SECRET="<Consumer Key Secret Here>"
```

## Basic Use
Expand Down Expand Up @@ -187,4 +155,17 @@ Managers also behave like Django QuerySets, and support [values](https://docs.dj
'INTELLIGENT ASSISTANT',
'STYLUS FIRMWARE UPDATES'
]
```
### Async/Await

Patent Client also has optional `async/await` support for all methods that trigger I/O to an API endpoint.
To use the asyncio methods, simply use `async for` for iterators, and call any methods with an `a` prefix:

```python
apps = list()
async for app in USApplication.objects.filter(first_named_applicant="Google"):
    apps.append(app)

app = await USApplication.objects.aget("16123456")

```
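
Note that `async for` and `await` are only valid inside a coroutine, so a plain script needs an entry point such as `asyncio.run`. The sketch below shows that shape with a hypothetical `FakeManager` standing in for `USApplication.objects`; it returns canned records and makes no network calls.

```python
import asyncio


class FakeManager:
    """Hypothetical stand-in for USApplication.objects; yields canned records."""

    def __init__(self, records):
        self._records = records

    def __aiter__(self):
        return self._iterate()

    async def _iterate(self):
        for record in self._records:
            await asyncio.sleep(0)  # placeholder for real network I/O
            yield record

    async def aget(self, appl_id: str) -> dict:
        for record in self._records:
            if record["appl_id"] == appl_id:
                return record
        raise KeyError(appl_id)


async def main():
    objects = FakeManager([
        {"appl_id": "16123456", "title": "First"},
        {"appl_id": "16123457", "title": "Second"},
    ])
    apps = [app async for app in objects]  # async iteration
    app = await objects.aget("16123456")   # "a"-prefixed method
    return apps, app


apps, app = asyncio.run(main())
```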
2 changes: 2 additions & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,9 @@ getting_started
user_guide
examples
api
migration_guide
developer
changelog
```

Expand Down
6 changes: 6 additions & 0 deletions docs/migration_guide.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# v3 to v4 Migration Guide

If you are migrating from v3 to v4, there are a few things to be aware of:

1. The .json configuration method is no longer supported. You must use environment variables to change patent_client settings.
2. Model fields may have moved around. This new version has less data normalization across the API's than v3.