Merge pull request #117 from ExpediaGroup/feast_merges_jul_18
fix: Merging changes from open source feast until Jul 18, 2024
EXPEbdodla committed Jul 18, 2024
2 parents 498119c + f1b1ced commit 99f332a
Showing 107 changed files with 3,229 additions and 2,118 deletions.
16 changes: 14 additions & 2 deletions Makefile
@@ -176,7 +176,7 @@ test-python-universal-mssql:
sdk/python/tests


# To use Athena as an offline store, you need to create an Athena database and an S3 bucket on AWS.
# To use Athena as an offline store, you need to create an Athena database and an S3 bucket on AWS.
# https://docs.aws.amazon.com/athena/latest/ug/getting-started.html
# Modify environment variables ATHENA_REGION, ATHENA_DATA_SOURCE, ATHENA_DATABASE, ATHENA_WORKGROUP or
# ATHENA_S3_BUCKET_NAME according to your needs. If tests fail with the pytest -n 8 option, change the number to 1.
@@ -203,7 +203,7 @@ test-python-universal-athena:
not s3_registry and \
not test_snowflake" \
sdk/python/tests

test-python-universal-postgres-offline:
PYTHONPATH='.' \
FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.offline_stores.contrib.postgres_repo_configuration \
@@ -221,6 +221,7 @@ test-python-universal-postgres-offline:
not test_push_features_to_offline_store and \
not gcs_registry and \
not s3_registry and \
not test_snowflake and \
not test_universal_types" \
sdk/python/tests

@@ -343,6 +344,17 @@ test-python-universal-cassandra-no-cloud-providers:
not test_snowflake" \
sdk/python/tests

test-python-universal-singlestore-online:
	PYTHONPATH='.' \
		FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.online_stores.contrib.singlestore_repo_configuration \
		PYTEST_PLUGINS=sdk.python.tests.integration.feature_repos.universal.online_store.singlestore \
		python -m pytest -n 8 --integration \
			-k "not test_universal_cli and \
			not gcs_registry and \
			not s3_registry and \
			not test_snowflake" \
		sdk/python/tests
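
The `-k` argument in the target above is a single pytest keyword expression that deselects tests by name. As a rough illustration only, the same expression can be assembled from a list of names. The helper `build_k_expression` is hypothetical and is not part of Feast or of this commit.

```python
def build_k_expression(excluded):
    """Join test-name fragments into a pytest -k expression that deselects each one."""
    return " and ".join(f"not {name}" for name in excluded)


# Names copied from the singlestore target's -k filter.
excluded = ["test_universal_cli", "gcs_registry", "s3_registry", "test_snowflake"]
k_expression = build_k_expression(excluded)
print(k_expression)
```

Keeping the exclusions in a list makes it easier to see that the postgres-offline target in this commit differs from its previous version by one extra entry, `test_snowflake`.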

test-python-universal:
python -m pytest -n 8 --integration sdk/python/tests

1 change: 1 addition & 0 deletions community/maintainers.md
@@ -16,6 +16,7 @@ In alphabetical order
| Shuchu Han | `shuchu` | [email protected] | Independent |
| Willem Pienaar | `woop` | [email protected] | Cleric |
| Zhiling Chen | `zhilingc` | [email protected] | GetGround |
| Tornike Gurgenidze | `tokoko` | [email protected] | Bank of Georgia |

## Emeritus Maintainers

8 changes: 4 additions & 4 deletions docs/README.md
@@ -39,10 +39,10 @@ Feast is likely **not** the right tool if you
### Feast does not _fully_ solve

* **reproducible model training / model backtesting / experiment management**: Feast captures feature and model metadata, but does not version-control datasets / labels or manage train / test splits. Other tools like [DVC](https://dvc.org/), [MLflow](https://www.mlflow.org/), and [Kubeflow](https://www.kubeflow.org/) are better suited for this.
* **batch + streaming feature engineering**: Feast primarily processes already transformed feature values (though it offers experimental light-weight transformations). Users usually integrate Feast with upstream systems (e.g. existing ETL/ELT pipelines). [Tecton](http://tecton.ai/) is a more fully featured feature platform which addresses these needs.
* **native streaming feature integration:** Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines. [Tecton](http://tecton.ai/) is a more fully featured feature platform which orchestrates end to end streaming pipelines.
* **feature sharing**: Feast has experimental functionality to enable discovery and cataloguing of feature metadata with a [Feast web UI (alpha)](https://docs.feast.dev/reference/alpha-web-ui). Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py). [Tecton](http://tecton.ai/) also more robustly addresses these needs.
* **lineage:** Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py). [Tecton](http://tecton.ai/) captures more end-to-end lineage by also managing feature transformations.
* **batch + streaming feature engineering**: Feast primarily processes already transformed feature values but is investing in supporting batch and streaming transformations.
* **native streaming feature integration:** Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines.
* **feature sharing**: Feast has experimental functionality to enable discovery and cataloguing of feature metadata with a [Feast web UI (alpha)](https://docs.feast.dev/reference/alpha-web-ui). Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py).
* **lineage:** Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py).
* **data quality / drift detection**: Feast has experimental integrations with [Great Expectations](https://greatexpectations.io/), but is not purpose built to solve data drift / data quality issues. This requires more sophisticated monitoring across data pipelines, served feature values, labels, and model versions.

## Example use cases
8 changes: 5 additions & 3 deletions docs/SUMMARY.md
@@ -76,7 +76,7 @@
* [Azure Synapse + Azure SQL (contrib)](reference/data-sources/mssql.md)
* [Offline stores](reference/offline-stores/README.md)
* [Overview](reference/offline-stores/overview.md)
* [File](reference/offline-stores/file.md)
* [Dask](reference/offline-stores/dask.md)
* [Snowflake](reference/offline-stores/snowflake.md)
* [BigQuery](reference/offline-stores/bigquery.md)
* [Redshift](reference/offline-stores/redshift.md)
@@ -103,6 +103,7 @@
* [Rockset (contrib)](reference/online-stores/rockset.md)
* [Hazelcast (contrib)](reference/online-stores/hazelcast.md)
* [ScyllaDB (contrib)](reference/online-stores/scylladb.md)
* [SingleStore (contrib)](reference/online-stores/singlestore.md)
* [Providers](reference/providers/README.md)
* [Local](reference/providers/local.md)
* [Google Cloud Platform](reference/providers/google-cloud-platform.md)
@@ -118,9 +119,10 @@
* [Feature servers](reference/feature-servers/README.md)
* [Python feature server](reference/feature-servers/python-feature-server.md)
* [\[Alpha\] Go feature server](reference/feature-servers/go-feature-server.md)
* [Offline Feature Server](reference/feature-servers/offline-feature-server)
* [Offline Feature Server](reference/feature-servers/offline-feature-server.md)
* [\[Beta\] Web UI](reference/alpha-web-ui.md)
* [\[Alpha\] On demand feature view](reference/alpha-on-demand-feature-view.md)
* [\[Beta\] On demand feature view](reference/beta-on-demand-feature-view.md)
* [\[Alpha\] Vector Database](reference/alpha-vector-database.md)
* [\[Alpha\] Data quality monitoring](reference/dqm.md)
* [Feast CLI reference](reference/feast-cli-commands.md)
* [Python API reference](http://rtd.feast.dev)
1 change: 0 additions & 1 deletion docs/reference/feature-servers/README.md
@@ -8,7 +8,6 @@ Feast users can choose to retrieve features from a feature server, as opposed to

{% content-ref url="go-feature-server.md" %}
[go-feature-server.md](go-feature-server.md)
=======
{% endcontent-ref %}

{% content-ref url="offline-feature-server.md" %}
@@ -1,9 +1,8 @@
# File offline store
# Dask offline store

## Description

The file offline store provides support for reading [FileSources](../data-sources/file.md).
It uses Dask as the compute engine.
The Dask offline store provides support for reading [FileSources](../data-sources/file.md).

{% hint style="warning" %}
All data is downloaded and joined using Python and therefore may not scale to production workloads.
@@ -17,28 +16,28 @@ project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
    type: file
    type: dask
```
{% endcode %}
The full set of configuration options is available in [FileOfflineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.offline_stores.file.FileOfflineStoreConfig).
The full set of configuration options is available in [DaskOfflineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.offline_stores.dask.DaskOfflineStoreConfig).
## Functionality Matrix
The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
Below is a matrix indicating which functionality is supported by the file offline store.
Below is a matrix indicating which functionality is supported by the dask offline store.
| | File |
| | Dask |
| :-------------------------------- | :-- |
| `get_historical_features` (point-in-time correct join) | yes |
| `pull_latest_from_table_or_query` (retrieve latest feature values) | yes |
| `pull_all_from_table_or_query` (retrieve a saved dataset) | yes |
| `offline_write_batch` (persist dataframes to offline store) | yes |
| `write_logged_features` (persist logged features to offline store) | yes |

Below is a matrix indicating which functionality is supported by `FileRetrievalJob`.
Below is a matrix indicating which functionality is supported by `DaskRetrievalJob`.

| | File |
| | Dask |
| --------------------------------- | --- |
| export to dataframe | yes |
| export to arrow table | yes |
6 changes: 3 additions & 3 deletions docs/reference/offline-stores/overview.md
@@ -25,13 +25,13 @@ The first three of these methods all return a `RetrievalJob` specific to an offl

## Functionality Matrix

There are currently four core offline store implementations: `FileOfflineStore`, `BigQueryOfflineStore`, `SnowflakeOfflineStore`, and `RedshiftOfflineStore`.
There are currently four core offline store implementations: `DaskOfflineStore`, `BigQueryOfflineStore`, `SnowflakeOfflineStore`, and `RedshiftOfflineStore`.
There are several additional implementations contributed by the Feast community (`PostgreSQLOfflineStore`, `SparkOfflineStore`, and `TrinoOfflineStore`), which are not guaranteed to be stable or to match the functionality of the core implementations.
Details for each specific offline store, such as how to configure it in a `feature_store.yaml`, can be found [here](README.md).

Below is a matrix indicating which offline stores support which methods.

| | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
| :-------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| `get_historical_features` | yes | yes | yes | yes | yes | yes | yes |
| `pull_latest_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes |
@@ -42,7 +42,7 @@ Below is a matrix indicating which offline stores support which methods.

Below is a matrix indicating which `RetrievalJob`s support what functionality.

| | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB |
| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB |
| --------------------------------- | --- | --- | --- | --- | --- | --- | --- | --- |
| export to dataframe | yes | yes | yes | yes | yes | yes | yes | yes |
| export to arrow table | yes | yes | yes | yes | yes | yes | yes | yes |
3 changes: 3 additions & 0 deletions docs/reference/online-stores/README.md
@@ -64,4 +64,7 @@ Please see [Online Store](../../getting-started/architecture-and-components/onli

{% content-ref url="remote.md" %}
[remote.md](remote.md)

{% content-ref url="singlestore.md" %}
[singlestore.md](singlestore.md)
{% endcontent-ref %}
51 changes: 51 additions & 0 deletions docs/reference/online-stores/singlestore.md
@@ -0,0 +1,51 @@
# SingleStore online store (contrib)

## Description

The SingleStore online store provides support for materializing feature values into a SingleStore database for serving online features.

## Getting started
In order to use this online store, you'll need to run `pip install 'feast[singlestore]'`. You can get started by then running `feast init` and then setting the `feature_store.yaml` as described below.

## Example

{% code title="feature_store.yaml" %}
```yaml
project: my_feature_repo
registry: data/registry.db
provider: local
online_store:
    type: singlestore
    host: DB_HOST
    port: DB_PORT
    database: DB_NAME
    user: DB_USERNAME
    password: DB_PASSWORD
```
{% endcode %}

## Functionality Matrix

The set of functionality supported by online stores is described in detail [here](overview.md#functionality).
Below is a matrix indicating which functionality is supported by the SingleStore online store.

| | SingleStore |
| :-------------------------------------------------------- | :----------- |
| write feature values to the online store | yes |
| read feature values from the online store | yes |
| update infrastructure (e.g. tables) in the online store | yes |
| teardown infrastructure (e.g. tables) in the online store | yes |
| generate a plan of infrastructure changes | no |
| support for on-demand transforms | yes |
| readable by Python SDK | yes |
| readable by Java | no |
| readable by Go | no |
| support for entityless feature views | yes |
| support for concurrent writing to the same key | no |
| support for ttl (time to live) at retrieval | no |
| support for deleting expired data | no |
| collocated by feature view | yes |
| collocated by feature service | no |
| collocated by entity key | no |

To compare this set of functionality against other online stores, please see the full [functionality matrix](overview.md#functionality-matrix).
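
The `online_store` block in the example above can be sanity-checked before running `feast apply`. The following is a minimal sketch, not part of this commit or of Feast: the helper `missing_singlestore_keys` is hypothetical, and the key list is simply copied from the example. Whether Feast treats every one of these keys as mandatory is an assumption this commit does not confirm.

```python
# Keys shown in the feature_store.yaml example; assumed, not confirmed, to all be needed.
EXAMPLE_KEYS = ("type", "host", "port", "database", "user", "password")


def missing_singlestore_keys(online_store):
    """Return the example keys absent from an online_store mapping, in example order."""
    return [key for key in EXAMPLE_KEYS if key not in online_store]


# Placeholder values mirror the example; substitute real connection details.
example = {
    "type": "singlestore",
    "host": "DB_HOST",
    "port": "DB_PORT",
    "database": "DB_NAME",
    "user": "DB_USERNAME",
    "password": "DB_PASSWORD",
}
print(missing_singlestore_keys(example))
```

A mapping that only sets `type` would report the five connection keys as missing, which is a quick way to catch a half-filled configuration.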
2 changes: 1 addition & 1 deletion docs/tutorials/using-scalable-registry.md
@@ -49,7 +49,7 @@ When this happens, your database is likely using what is referred to as an
in `SQLAlchemy` terminology. See your database's documentation for examples on
how to set its scheme in the Database URL.

`Psycopg2`, which is the database library leveraged by the online and offline
`Psycopg`, which is the database library leveraged by the online and offline
stores, is not impacted by the need to speak a particular dialect, and so the
following only applies to the registry.
