Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Apply cache to load proto registry for performance #3702

Merged
merged 3 commits into from
Sep 12, 2023
Merged

feat: Apply cache to load proto registry for performance #3702

merged 3 commits into from
Sep 12, 2023

Conversation

phil-park
Copy link
Contributor

@phil-park phil-park commented Jul 28, 2023

What this PR does / why we need it:
If you use File Registry, the registry file becomes heavy because materialize records are accumulated while a long period.
Even if the registry cache is enabled, performance degraded because each online feature lookup traverses the registry object and searches the meta.

As long as the registry_proto object does not change, you must avoid multiple lookups through the function-level cache.

Which issue(s) this PR fixes:

Fixes #3649 #3597

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: phil-park
To complete the pull request process, please assign zhilingc after the PR has been reviewed.
You can assign the PR to them by writing /assign @zhilingc in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@phil-park phil-park changed the title Apply lru_cache to retrieve registry for performance feat: Apply lru_cache to retrieve regfeistry for performance Jul 28, 2023
@phil-park phil-park changed the title feat: Apply lru_cache to retrieve regfeistry for performance feat: Apply lru_cache to retrieve registry for performance Jul 28, 2023
@phil-park phil-park changed the title feat: Apply lru_cache to retrieve registry for performance feat: Apply cache to load proto registry for performance Jul 28, 2023
@phil-park
Copy link
Contributor Author

/assign @zhilingc

Comment on lines +39 to +40
else:
cache_value = func(registry_proto, project)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to update cache_key here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@achals I'm sorry, you're right. I added commit b8a373f

Comment on lines +27 to +29
def registry_proto_cache(func):
cache_key = None
cache_value = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this.. If you use the same annotation for multiple methods then the cache value needs have different shapes, such as List[FeatureService] or List[FeatureView] right? Is that going to work as expected here? Can we add tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context of the wrapper decorator is created whenever different argument functions (e.g list_feature_views, list_feature_services..) are called.
Therefore, when calling list_feature_views and when calling list_feature_services, different results are returned.
The operation of this cache is similar to Python's functools.lru_cache, but since the registryProto object is not hashable, it is implemented to hash using the object's id.

@achals
Copy link
Member

achals commented Aug 29, 2023

Okay, can you also sign off your commits? https://github.com/feast-dev/feast/pull/3702/checks?check_run_id=16298395815

@phil-park
Copy link
Contributor Author

I signed off the commit. Thank you for your review.

@phil-park
Copy link
Contributor Author

@achals Does this PR need further review?

Copy link
Collaborator

@felixwang9817 felixwang9817 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@felixwang9817 felixwang9817 merged commit 709c709 into feast-dev:master Sep 12, 2023
21 of 22 checks passed
james-crabtree-sp pushed a commit to sailpoint/feast that referenced this pull request Sep 14, 2023
)

* fix: Remove unused parameter

Signed-off-by: Jiwon Park <[email protected]>

* feat: Apply cache to load proto registry for performance

Signed-off-by: Jiwon Park <[email protected]>

* fix: Fix update key when cache missed

Signed-off-by: phil-park <[email protected]>

---------

Signed-off-by: Jiwon Park <[email protected]>
Signed-off-by: phil-park <[email protected]>
james-crabtree-sp pushed a commit to sailpoint/feast that referenced this pull request Sep 14, 2023
)

* fix: Remove unused parameter

Signed-off-by: Jiwon Park <[email protected]>

* feat: Apply cache to load proto registry for performance

Signed-off-by: Jiwon Park <[email protected]>

* fix: Fix update key when cache missed

Signed-off-by: phil-park <[email protected]>

---------

Signed-off-by: Jiwon Park <[email protected]>
Signed-off-by: phil-park <[email protected]>
JohnLemmonMedely pushed a commit to JohnLemmonMedely/feast that referenced this pull request Sep 15, 2023
)

* fix: Remove unused parameter

Signed-off-by: Jiwon Park <[email protected]>

* feat: Apply cache to load proto registry for performance

Signed-off-by: Jiwon Park <[email protected]>

* fix: Fix update key when cache missed

Signed-off-by: phil-park <[email protected]>

---------

Signed-off-by: Jiwon Park <[email protected]>
Signed-off-by: phil-park <[email protected]>
woop pushed a commit that referenced this pull request Jan 13, 2024
# [0.35.0](v0.34.0...v0.35.0) (2024-01-13)

### Bug Fixes

* Add async refresh to prevent synchronous refresh in main thread ([#3812](#3812)) ([9583ed6](9583ed6))
* Adopt connection pooling for HBase ([#3793](#3793)) ([b3852bf](b3852bf))
* Bytewax engine create configmap from object ([#3821](#3821)) ([25e9775](25e9775))
* Fix warnings from deprecated paths and update default log level ([#3757](#3757)) ([68a8737](68a8737))
* improve parsing bytewax job status ([5983f40](5983f40))
* make bytewax settings unexposed ([ae1bb8b](ae1bb8b))
* Make generated temp table name escaped ([#3797](#3797)) ([175d796](175d796))
* Pin numpy version to avoid spammy deprecation messages ([774ed33](774ed33))
* Redundant feature materialization and premature incremental materialization timestamp updates ([#3789](#3789)) ([417b16b](417b16b)), closes [#6](#6) [#7](#7)
* Resolve hbase hotspot issue when materializing ([#3790](#3790)) ([7376db8](7376db8))
* Set keepalives_idle None by default ([#3756](#3756)) ([8717e9b](8717e9b))
* Set upper bound for bigquery client due to its breaking changes ([2151c39](2151c39))
* UI project cannot handle fallback routes ([#3766](#3766)) ([96ece0f](96ece0f))
* update dependencies versions due to conflicts ([5dc0b24](5dc0b24))
* Update jackson and remove unnecessary logging ([#3809](#3809)) ([018d0ea](018d0ea))
* upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](052182b))

### Features

* Add get online feature rpc to gprc server ([#3815](#3815)) ([01db8cc](01db8cc))
* Add materialize and materialize-incremental rest endpoints ([#3761](#3761)) ([fa600fe](fa600fe)), closes [#3760](#3760)
* add redis sentinel support ([3387a15](3387a15))
* add redis sentinel support ([4337c89](4337c89))
* add redis sentinel support format lint ([aad8718](aad8718))
* Add support for `table_create_disposition` in bigquery job for offline store ([#3762](#3762)) ([6a728fe](6a728fe))
* Add support for in_cluster config and additional labels for bytewax materialization ([#3754](#3754)) ([2192e65](2192e65))
* Apply cache to load proto registry for performance ([#3702](#3702)) ([709c709](709c709))
* Make bytewax job write as mini-batches ([#3777](#3777)) ([9b0e5ce](9b0e5ce))
* Optimize bytewax pod resource with zero-copy ([9cf9d96](9cf9d96))
* Support GCS filesystem for bytewax engine ([#3774](#3774)) ([fb6b807](fb6b807))
tokoko pushed a commit to tokoko/feast that referenced this pull request Feb 6, 2024
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13)

### Bug Fixes

* Add async refresh to prevent synchronous refresh in main thread ([feast-dev#3812](feast-dev#3812)) ([9583ed6](feast-dev@9583ed6))
* Adopt connection pooling for HBase ([feast-dev#3793](feast-dev#3793)) ([b3852bf](feast-dev@b3852bf))
* Bytewax engine create configmap from object ([feast-dev#3821](feast-dev#3821)) ([25e9775](feast-dev@25e9775))
* Fix warnings from deprecated paths and update default log level ([feast-dev#3757](feast-dev#3757)) ([68a8737](feast-dev@68a8737))
* improve parsing bytewax job status ([5983f40](feast-dev@5983f40))
* make bytewax settings unexposed ([ae1bb8b](feast-dev@ae1bb8b))
* Make generated temp table name escaped ([feast-dev#3797](feast-dev#3797)) ([175d796](feast-dev@175d796))
* Pin numpy version to avoid spammy deprecation messages ([774ed33](feast-dev@774ed33))
* Redundant feature materialization and premature incremental materialization timestamp updates ([feast-dev#3789](feast-dev#3789)) ([417b16b](feast-dev@417b16b)), closes [feast-dev#6](feast-dev#6) [feast-dev#7](feast-dev#7)
* Resolve hbase hotspot issue when materializing ([feast-dev#3790](feast-dev#3790)) ([7376db8](feast-dev@7376db8))
* Set keepalives_idle None by default ([feast-dev#3756](feast-dev#3756)) ([8717e9b](feast-dev@8717e9b))
* Set upper bound for bigquery client due to its breaking changes ([2151c39](feast-dev@2151c39))
* UI project cannot handle fallback routes ([feast-dev#3766](feast-dev#3766)) ([96ece0f](feast-dev@96ece0f))
* update dependencies versions due to conflicts ([5dc0b24](feast-dev@5dc0b24))
* Update jackson and remove unnecessary logging ([feast-dev#3809](feast-dev#3809)) ([018d0ea](feast-dev@018d0ea))
* upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](feast-dev@052182b))

### Features

* Add get online feature rpc to gprc server ([feast-dev#3815](feast-dev#3815)) ([01db8cc](feast-dev@01db8cc))
* Add materialize and materialize-incremental rest endpoints ([feast-dev#3761](feast-dev#3761)) ([fa600fe](feast-dev@fa600fe)), closes [feast-dev#3760](feast-dev#3760)
* add redis sentinel support ([3387a15](feast-dev@3387a15))
* add redis sentinel support ([4337c89](feast-dev@4337c89))
* add redis sentinel support format lint ([aad8718](feast-dev@aad8718))
* Add support for `table_create_disposition` in bigquery job for offline store ([feast-dev#3762](feast-dev#3762)) ([6a728fe](feast-dev@6a728fe))
* Add support for in_cluster config and additional labels for bytewax materialization ([feast-dev#3754](feast-dev#3754)) ([2192e65](feast-dev@2192e65))
* Apply cache to load proto registry for performance ([feast-dev#3702](feast-dev#3702)) ([709c709](feast-dev@709c709))
* Make bytewax job write as mini-batches ([feast-dev#3777](feast-dev#3777)) ([9b0e5ce](feast-dev@9b0e5ce))
* Optimize bytewax pod resource with zero-copy ([9cf9d96](feast-dev@9cf9d96))
* Support GCS filesystem for bytewax engine ([feast-dev#3774](feast-dev#3774)) ([fb6b807](feast-dev@fb6b807))

Signed-off-by: tokoko <[email protected]>
zseta pushed a commit to zseta/feast that referenced this pull request Feb 7, 2024
)

* fix: Remove unused parameter

Signed-off-by: Jiwon Park <[email protected]>

* feat: Apply cache to load proto registry for performance

Signed-off-by: Jiwon Park <[email protected]>

* fix: Fix update key when cache missed

Signed-off-by: phil-park <[email protected]>

---------

Signed-off-by: Jiwon Park <[email protected]>
Signed-off-by: phil-park <[email protected]>
Signed-off-by: Attila Toth <[email protected]>
zseta pushed a commit to zseta/feast that referenced this pull request Feb 7, 2024
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13)

### Bug Fixes

* Add async refresh to prevent synchronous refresh in main thread ([feast-dev#3812](feast-dev#3812)) ([9583ed6](feast-dev@9583ed6))
* Adopt connection pooling for HBase ([feast-dev#3793](feast-dev#3793)) ([b3852bf](feast-dev@b3852bf))
* Bytewax engine create configmap from object ([feast-dev#3821](feast-dev#3821)) ([25e9775](feast-dev@25e9775))
* Fix warnings from deprecated paths and update default log level ([feast-dev#3757](feast-dev#3757)) ([68a8737](feast-dev@68a8737))
* improve parsing bytewax job status ([5983f40](feast-dev@5983f40))
* make bytewax settings unexposed ([ae1bb8b](feast-dev@ae1bb8b))
* Make generated temp table name escaped ([feast-dev#3797](feast-dev#3797)) ([175d796](feast-dev@175d796))
* Pin numpy version to avoid spammy deprecation messages ([774ed33](feast-dev@774ed33))
* Redundant feature materialization and premature incremental materialization timestamp updates ([feast-dev#3789](feast-dev#3789)) ([417b16b](feast-dev@417b16b)), closes [feast-dev#6](feast-dev#6) [feast-dev#7](feast-dev#7)
* Resolve hbase hotspot issue when materializing ([feast-dev#3790](feast-dev#3790)) ([7376db8](feast-dev@7376db8))
* Set keepalives_idle None by default ([feast-dev#3756](feast-dev#3756)) ([8717e9b](feast-dev@8717e9b))
* Set upper bound for bigquery client due to its breaking changes ([2151c39](feast-dev@2151c39))
* UI project cannot handle fallback routes ([feast-dev#3766](feast-dev#3766)) ([96ece0f](feast-dev@96ece0f))
* update dependencies versions due to conflicts ([5dc0b24](feast-dev@5dc0b24))
* Update jackson and remove unnecessary logging ([feast-dev#3809](feast-dev#3809)) ([018d0ea](feast-dev@018d0ea))
* upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](feast-dev@052182b))

### Features

* Add get online feature rpc to gprc server ([feast-dev#3815](feast-dev#3815)) ([01db8cc](feast-dev@01db8cc))
* Add materialize and materialize-incremental rest endpoints ([feast-dev#3761](feast-dev#3761)) ([fa600fe](feast-dev@fa600fe)), closes [feast-dev#3760](feast-dev#3760)
* add redis sentinel support ([3387a15](feast-dev@3387a15))
* add redis sentinel support ([4337c89](feast-dev@4337c89))
* add redis sentinel support format lint ([aad8718](feast-dev@aad8718))
* Add support for `table_create_disposition` in bigquery job for offline store ([feast-dev#3762](feast-dev#3762)) ([6a728fe](feast-dev@6a728fe))
* Add support for in_cluster config and additional labels for bytewax materialization ([feast-dev#3754](feast-dev#3754)) ([2192e65](feast-dev@2192e65))
* Apply cache to load proto registry for performance ([feast-dev#3702](feast-dev#3702)) ([709c709](feast-dev@709c709))
* Make bytewax job write as mini-batches ([feast-dev#3777](feast-dev#3777)) ([9b0e5ce](feast-dev@9b0e5ce))
* Optimize bytewax pod resource with zero-copy ([9cf9d96](feast-dev@9cf9d96))
* Support GCS filesystem for bytewax engine ([feast-dev#3774](feast-dev#3774)) ([fb6b807](feast-dev@fb6b807))

Signed-off-by: Attila Toth <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

performance issues of getting online features related to parsing protobuf data
4 participants