[pull] master from kserve:master #405

Open: wants to merge 67 commits into base: master

pull[bot]

@pull pull bot commented Sep 12, 2024

See Commits and Changes for more details.


Created by pull[bot]


calwoo and others added 30 commits June 21, 2024 06:18
* propagate trc bool across vllm init

Signed-off-by: Calvin Woo <[email protected]>
Signed-off-by: calvin d. woo <[email protected]>

* use args directly to avoid undefined var

Signed-off-by: Calvin Woo <[email protected]>
Signed-off-by: calvin d. woo <[email protected]>

* Remove trailing space

Signed-off-by: Dan Sun <[email protected]>
Signed-off-by: calvin d. woo <[email protected]>

* move params to newline

Signed-off-by: calvin d. woo <[email protected]>

---------

Signed-off-by: Calvin Woo <[email protected]>
Signed-off-by: calvin d. woo <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
The KServe Python SDK README.md uses relative URLs that work well on GitHub but return a 404 error when visited on PyPI.

This change updates the README.md to use absolute URLs that work well on both GitHub and PyPI.

Signed-off-by: kevinbazira <[email protected]>
check empty model final.

Signed-off-by: HAO <[email protected]>
Co-authored-by: koshino17 <[email protected]>
* Fix No model ready error in multi model serving

- Fixes the regression introduced by #3275

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Mark transformer model ready in init method

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Initial implementation of inference client

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add tests

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Use Inference client for e2e tests

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Upgrade pytest-asyncio to 0.23.4

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Fix mutable object initialization in default parameters

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Fix graph e2e tests

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Fix pmml test

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add explain, support dict response, use inference client for internal requests

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Fix inference graph test and grpc headers

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Remove v1 datamodels

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Introduce protocol in client config

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Support inference graph

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

remove logging configs

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Update default timeout to 60 seconds

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add retry config for grpc client

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Fix infer model_name parameter

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add tests for graph endpoint

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

debug

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

fix http client param mismatch

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

skip graph test

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

fix timeout in grpc client

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Fix url construction

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Fix explain

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* configure logger for e2e tests

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Rebase master

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Rebase master

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Fix grpc retry config

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

Increase request timeout

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Rebase

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Use fixtures for rest client

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
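The "Fix mutable object initialization in default parameters" item above refers to a general Python pitfall. A minimal sketch of that pitfall and the usual fix follows; the function names `build_headers_buggy` and `build_headers` and the header key are hypothetical and are not taken from the KServe inference client.

```python
from typing import Dict, Optional


def build_headers_buggy(extra: Dict[str, str] = {}) -> Dict[str, str]:
    # The default dict is created once, when the function is defined,
    # so every call that omits `extra` mutates the same shared object.
    extra["x-request-count"] = str(len(extra) + 1)
    return extra


def build_headers(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    # Using None as the sentinel and copying gives each call its own dict.
    headers = dict(extra) if extra is not None else {}
    headers["x-request-count"] = str(len(headers) + 1)
    return headers
```

Two calls to `build_headers_buggy()` return the very same dict object, while two calls to `build_headers()` return independent dicts.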
* Fix model name not properly parsed by inference graph

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Handle single string arg with excess whitespace

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Handle duplicate arguments

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
empty commit

Signed-off-by: Spolti <[email protected]>
Use add_generation_prompt for chat template

Signed-off-by: Dattu Sharma <[email protected]>
* Deduplicate the names for the additional domain names

Signed-off-by: Vincent Hou <[email protected]>

* Refactoring the functions

Signed-off-by: Vincent Hou <[email protected]>

---------

Signed-off-by: Vincent Hou <[email protected]>
virtual service case insensitive

Signed-off-by: Andrews Arokiam <[email protected]>
* Install packages needed for model load

Signed-off-by: Gavrish Prabhu <[email protected]>

* make all apt get into a single line

Signed-off-by: Gavrish Prabhu <[email protected]>

---------

Signed-off-by: Gavrish Prabhu <[email protected]>
…3789)

* Add readiness probe for mlserver in CI

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Increase memory limit for pmml test to prevent OOMKilled and read timeout error

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Fix logprobs

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix a scenario where stream completion fails if echo is true and logprobs is nil

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix a scenario where completion fails if the prompt is token_ids and echo is set to true

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Respect tokenizer revision

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add workaround for adding None to token_logprobs and top_logprobs

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
agent watcher unit test is always flaky so increase timeout to make it stable

Signed-off-by: jooho lee <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Add tests for vLLM

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* resolve comments

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Uncomment tests for fixed bugs

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
….3 (#3812)

* Upgrade serving runtime python version to 3.11 and debian to bookworm

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Upgrade poetry to 1.8.3

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Upgrade openjdk to 17 for pmml

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix 'AS' casing warning

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix pmml server

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Bump vLLM to 0.5.3.post1

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Update makefile

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* approx probability comparison

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Set multiprocessing method to spawn

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
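The "approx probability comparison" item above suggests comparing floating-point probabilities within a tolerance instead of exactly. A minimal sketch using the standard library's `math.isclose` follows; the helper name `logprobs_close` and the tolerance values are illustrative assumptions, and the actual tests may use a different mechanism such as `pytest.approx`.

```python
import math
from typing import Sequence


def logprobs_close(
    actual: Sequence[float],
    expected: Sequence[float],
    rel_tol: float = 1e-3,
    abs_tol: float = 1e-6,
) -> bool:
    """Return True if both sequences have the same length and every pair
    of values agrees within the given relative or absolute tolerance."""
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )
```

A tolerance-based check like this keeps tests stable when a dependency upgrade changes results in the last few decimal places.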
…se 'spawn' for multiprocessing (#3757)

* Refactor model server to let uvicorn handle multiple workers

- Refactored the ModelServer to let uvicorn handle multiple workers. This will remove the bottleneck of using 'fork' for multiprocessing

- Make FastAPI app instance easily accessible across the project so that users can easily add middlewares and custom exception handlers for custom models.

- Use uvloop event loop policy

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add middleware example

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add e2e test

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Remove nest_asyncio in art explainer

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Remove uvloop

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Rebase master

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix python tests

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* revert art explainer

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Remove monkeypatch

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Remove redundant future exception logging

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
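The refactor described above moves away from 'fork' for multiprocessing. As a rough illustration of the 'spawn' start method, the sketch below uses the standard library's `multiprocessing.get_context("spawn")`. The names `square` and `run_with_spawn` are hypothetical, and this is not the ModelServer code, which hands worker management to uvicorn.

```python
import multiprocessing as mp
from typing import List, Sequence


def square(x: int) -> int:
    # Must be a module-level function so spawned workers can import it.
    return x * x


def run_with_spawn(values: Sequence[int], processes: int = 2) -> List[int]:
    # "spawn" starts a fresh interpreter for each worker, whereas "fork"
    # copies the parent process, including any locks held at fork time.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, values)
```

Because each spawned worker re-imports the main module, a script should call `run_with_spawn([1, 2, 3])` from under an `if __name__ == "__main__":` guard.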
* Make ray serve an optional dependency

Signed-off-by: Curtis Maddalozzo <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Unify the log configuration using kserve logger (#3577)

* Configure logging for serving runtimes

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add pyyaml dependency

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* black format

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* fix pyproject.toml

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Rebase master

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* cleanup logger for e2e

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Modify logger format to include func name

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Log model download time.

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Rebase master

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Allow disabling logger configuration and deprecate logger related arg in model server

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Rebase master

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Resolve comments

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* pyyaml=^6.0.0 to fix build failure

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Remove logger related parameters from model server

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* import model_server

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix lint

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix linting

Signed-off-by: Curtis Maddalozzo <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Rebase, minor fixes and add e2e test

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Curtis Maddalozzo <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Co-authored-by: Curtis Maddalozzo <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
* Update aif example

chore:	Update aif explainer example.
	- Bump KServe to 0.13.0, which brings some library updates and fixes a few security alerts in this example.
	- update the scikit-learn package name

Signed-off-by: Spolti <[email protected]>

* move the local instructions to the README

Signed-off-by: Spolti <[email protected]>

* empty commit

Signed-off-by: Spolti <[email protected]>

---------

Signed-off-by: Spolti <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
These changes introduce the possibility to configure KServe with its own Istio local gateway, to partially decouple KServe from the Knative local gateway.

Typically, it is OK to re-use the already configured Knative local gateway for KServe (as long as the configs do not conflict). However, there are cases where having a dedicated local gateway for KServe is beneficial. Just to give some examples:
* To have the ability to use strict mTLS in Istio
* To reduce some pressure on the Knative local gateway by having a dedicated gateway deployment (it still would hit Knative gateway, but only once, rather than twice)
* To be able to configure TLS on cluster-local hostnames (Knative support is still experimental)

To have a dedicated Gateway in KServe, configuration similar to Knative's needs to be done. At the very least, and if not using a dedicated gateway deployment, a v1/Service and an Istio Gateway resource need to be created for KServe. These resources are then configured in _localGateway_ and _localGatewayService_. KServe still needs to rely on Knative routing for the KSVCs it creates. Thus, after handling an incoming request and resolving its target, KServe needs to forward the request to Knative. This is the reason for introducing a new `knativeLocalGatewayService` in the ConfigMap.

The removed `ingressService` seems to be unused. Apparently, it became unused when the v1alpha1 API of the InferenceServices was deprecated and removed.

Signed-off-by: Edgar Hernández <[email protected]>
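To make the three settings named above concrete, here is a small sketch that checks they are all present before serializing them as JSON. Only the key names `localGateway`, `localGatewayService` and `knativeLocalGatewayService` come from the description; every value, the `render_ingress_config` helper, and the assumption that the settings are stored as a JSON object are hypothetical.

```python
import json

# Key names come from the description above; the values below are placeholders.
REQUIRED_KEYS = (
    "localGateway",
    "localGatewayService",
    "knativeLocalGatewayService",
)


def render_ingress_config(config: dict) -> str:
    """Verify that the gateway-related keys are set, then serialize to JSON."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("missing gateway settings: " + ", ".join(missing))
    return json.dumps(config, indent=2, sort_keys=True)


example = {
    # Hypothetical dedicated Istio Gateway resource for KServe.
    "localGateway": "kserve/kserve-local-gateway",
    # Hypothetical v1/Service in front of that gateway.
    "localGatewayService": "kserve-local-gateway.istio-system.svc.cluster.local",
    # Hypothetical Knative gateway service that resolved requests are forwarded to.
    "knativeLocalGatewayService": "knative-local-gateway.istio-system.svc.cluster.local",
}
```

Calling `render_ingress_config(example)` returns the JSON text, and omitting any of the three keys raises `ValueError`.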
* Add support for Azure DNS zone endpoints

Signed-off-by: tjandy98 <[email protected]>

* Add test cases for Azure Blob and File Share URI pattern matching

Signed-off-by: tjandy98 <[email protected]>

* flake8

Signed-off-by: tjandy98 <[email protected]>

* black

Signed-off-by: tjandy98 <[email protected]>

---------

Signed-off-by: tjandy98 <[email protected]>
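As an illustration of the URI pattern matching mentioned above, the sketch below accepts both the standard Blob endpoint and a DNS zone endpoint. The host formats `<account>.blob.core.windows.net` and `<account>.z<NN>.blob.storage.azure.net` are assumptions about Azure's naming, and the regular expression and the `parse_blob_uri` helper are hypothetical, not the pattern used in the KServe storage module.

```python
import re
from typing import Optional, Tuple

# Assumed host formats (not copied from the KServe source):
#   standard endpoint:  <account>.blob.core.windows.net
#   DNS zone endpoint:  <account>.z<NN>.blob.storage.azure.net
AZURE_BLOB_RE = re.compile(
    r"^https://(?P<account>[a-z0-9]+)"
    r"\.(?:blob\.core\.windows\.net|z\d{1,2}\.blob\.storage\.azure\.net)"
    r"/(?P<container>[^/]+)(?:/(?P<prefix>.*))?$"
)


def parse_blob_uri(uri: str) -> Optional[Tuple[str, str, str]]:
    """Return (account, container, prefix), or None if the URI does not match."""
    match = AZURE_BLOB_RE.match(uri)
    if match is None:
        return None
    return match.group("account"), match.group("container"), match.group("prefix") or ""
```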
* Add logging request feature for vLLM

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add log request feature for huggingface

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
@pull pull bot added the ⤵️ pull and merge-conflict (Resolve conflicts manually) labels Sep 12, 2024
@openshift-merge-robot

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@terrytangyuan
Member

Rejecting this PR since it's 405.

* bump to vllm 0.6.0

Signed-off-by: yxia216 <[email protected]>

* lock

Signed-off-by: yxia216 <[email protected]>

---------

Signed-off-by: yxia216 <[email protected]>
@pull pull bot removed the needs-rebase label Sep 15, 2024
…on (#3885)

* Set the volume mount's readonly annotation based on the ISVC annotation

Signed-off-by: Hannah DeFazio <[email protected]>

* Add test case where readonly is unset, check values

Signed-off-by: Hannah DeFazio <[email protected]>

* Use StorageInitializerVolumeName constant

Signed-off-by: Hannah DeFazio <[email protected]>

* Set the readonly value for the storage-initializer

Signed-off-by: Hannah DeFazio <[email protected]>

* Add tests for direct pvc volume mount use case

Signed-off-by: Hannah DeFazio <[email protected]>

---------

Signed-off-by: Hannah DeFazio <[email protected]>
Co-authored-by: Spolti <[email protected]>
@pull pull bot removed the needs-rebase label Sep 15, 2024
* add /dev/shm volume to hfserver.

Signed-off-by: Lize Cai <[email protected]>

* update helm chart docs

Signed-off-by: Lize Cai <[email protected]>

* add flag to enable devshm.

Signed-off-by: Lize Cai <[email protected]>

---------

Signed-off-by: Lize Cai <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
greenmoon55 and others added 2 commits September 17, 2024 11:05
* new model cache cr

Signed-off-by: Jin Dong <[email protected]>

* update crd

Signed-off-by: Jin Dong <[email protected]>

* Fix generated python tests

Signed-off-by: Jin Dong <[email protected]>

* Fix test failure

Signed-off-by: Jin Dong <[email protected]>

* Make nodegroup a list field in model cache cr

Signed-off-by: Jin Dong <[email protected]>

* fix lint

Signed-off-by: Jin Dong <[email protected]>

* minor updates to model cache cr

Signed-off-by: Jin Dong <[email protected]>

* Add usecase field to cluster storage container

Signed-off-by: Jin Dong <[email protected]>

* Fix test failures

Signed-off-by: Jin Dong <[email protected]>

* Change variable name

Signed-off-by: Jin Dong <[email protected]>

* Fix lint

Signed-off-by: Jin Dong <[email protected]>

* Fix default storage container cr

Signed-off-by: Jin Dong <[email protected]>

* fix default.yaml

Signed-off-by: Jin Dong <[email protected]>

* Remove storagelimit field from node group

Signed-off-by: Jin Dong <[email protected]>

* Fix python code

Signed-off-by: Jin Dong <[email protected]>

* Change some fields

Signed-off-by: Jin Dong <[email protected]>

* Rename crd

Signed-off-by: Jin Dong <[email protected]>

* Fix lint error in python test files

Signed-off-by: Jin Dong <[email protected]>

* Rename CR

Signed-off-by: Jin Dong <[email protected]>

* Add status to local model node group

Signed-off-by: Jin Dong <[email protected]>

* Add missing node status

Signed-off-by: Jin Dong <[email protected]>

* Remove files related to ClusterLocalNodeGroup

Signed-off-by: Jin Dong <[email protected]>

* Add default value for workload type

Signed-off-by: Jin Dong <[email protected]>

* Fix StorageContainerSpec WorkloadType default value

Signed-off-by: Jin Dong <[email protected]>

* nodegroups -> nodegroup

Signed-off-by: Jin Dong <[email protected]>

* Add comments

Signed-off-by: Jin Dong <[email protected]>

* Add back storageLimit

Signed-off-by: Jin Dong <[email protected]>

* Update charts/kserve-crd/templates/serving.kserve.io_clusterstoragecontainers.yaml

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>
sivanantha321 and others added 4 commits September 18, 2024 22:21
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Initial TLS bundle support

Signed-off-by: Rui Vieira <[email protected]>

* Move CA bundle volume and mount path to constants

Signed-off-by: Rui Vieira <[email protected]>

* Rename loggerConfigTls to loggerTLSConfig

Signed-off-by: Rui Vieira <[email protected]>

* Rename TlsCertName to CertName

Signed-off-by: Rui Vieira <[email protected]>

* Add Logger option to skip TLS verification

Also:
- Fixed incorrect cert name argument name (`--log-tls-cert` is now `--logger-ca-cert-file`)

Signed-off-by: Rui Vieira <[email protected]>

* Correct case

Change CABundle and CACertfile to caBundle and caCertFile.

Signed-off-by: Rui Vieira <[email protected]>

* Fix linting errors

- Restore newline at the end of charts/kserve-resources/README.md
- Remove import of github.com/kserve/kserve/pkg/constants from `worker.go` and replace with local constant for the CA mount path
- `InsecureSkipVerify: logReq.TlsSkipVerify` was triggering gosec's G402 with "potential 'true' for `logReq.TlsSkipVerify`". Since this value is allowed to be true, this specific line was excluded from the checks and an explanatory comment added
- Remove import of `k8s.io/utils/ptr` and replace with a pointer `&optionalVolume`

Signed-off-by: Rui Vieira <[email protected]>

* Fix import sort order on `worker.go`

Signed-off-by: Rui Vieira <[email protected]>

---------

Signed-off-by: Rui Vieira <[email protected]>
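The commits above add a CA certificate file option and a skip-verification option to the logger, which is written in Go and sets `InsecureSkipVerify`. As a rough Python analogue rather than the KServe implementation, the sketch below shows how the same two options could shape a TLS client context with the standard `ssl` module; `build_tls_context` and its parameter names are hypothetical.

```python
import ssl
from typing import Optional


def build_tls_context(
    ca_cert_file: Optional[str] = None,
    skip_verify: bool = False,
) -> ssl.SSLContext:
    # Start from the secure defaults: certificates and hostnames are verified.
    ctx = ssl.create_default_context()
    if ca_cert_file:
        # Trust an additional CA bundle, for example one mounted into the pod.
        ctx.load_verify_locations(cafile=ca_cert_file)
    if skip_verify:
        # Hostname checking has to be disabled before verify_mode can be
        # set to CERT_NONE. This turns off all certificate validation.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Skipping verification is kept as an explicit opt-in, which matches the reasoning given above for excluding only that one line from the gosec G402 check.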
* Fix explainer not working with path based routing

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add test

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Add explainer e2e test for path based routing

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Rebase master

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
…3944)

Fix broken ingress test and update go mod

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>