
[RayCluster][Fix] Add expectations of RayCluster #2150

Open · Eikykun wants to merge 9 commits into master
Conversation

@Eikykun commented May 16, 2024

Why are these changes needed?

This PR attempts to address issues #715 and #1936 by adding expectation capabilities to ensure the pod is in the desired state during the next Reconcile following pod deletion/creation.
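As a minimal, hypothetical sketch of the idea (the names below are illustrative and are not the types introduced by this PR): record a pending create/delete right after the API call, and treat the cached Pod list as stale until the change has been observed.

```go
package expectations

import (
	"sync"
	"time"
)

// Expectations tracks Pods whose creation or deletion has been requested but
// has not yet shown up in the informer cache. All names are illustrative.
type Expectations struct {
	mu      sync.Mutex
	pending map[string]time.Time // "namespace/name" -> when the request was made
}

func New() *Expectations {
	return &Expectations{pending: map[string]time.Time{}}
}

// Expect is called right after a successful Create or Delete request.
func (e *Expectations) Expect(podKey string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.pending[podKey] = time.Now()
}

// Observe is called when the informer cache finally reflects the change.
func (e *Expectations) Observe(podKey string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	delete(e.pending, podKey)
}

// Satisfied reports whether the cached Pod list can be trusted again.
func (e *Expectations) Satisfied() bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return len(e.pending) == 0
}
```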

Similar solutions can be found in:

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kevin85421 (Member)

Hi @Eikykun, thank you for the PR! I will review it next week. Are you on Ray Slack? We can iterate more quickly there since this is a large PR. My Slack handle is "Kai-Hsun Chen (ray team)". Thanks!

@kevin85421 (Member)

I will review this PR tomorrow.

@kevin85421 (Member)

cc @rueian Would you mind giving this PR a review? I think I don't have enough time to review it today. Thanks!

Comment on lines 142 to 148
defer func() {
	if satisfied {
		ae.subjects.Delete(expectation)
	}
}()

satisfied, err = expectation.(*ActiveExpectation).isSatisfied()
Contributor

There are many read-after-write operations in the ActiveExpectations. Should we use a mutex to wrap these operations? For example, will the above ae.subjects.Delete(expectation) delete an unsatisfied expectation?

Author

The subjects field of ActiveExpectations is a ThreadSafeStore provided by k8s.io/client-go/tools/cache, so operations on ActiveExpectations.subjects are thread-safe. Each item stored in ActiveExpectations.subjects (an ActiveExpectation) also uses a ThreadSafeStore internally.
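For reference, a minimal sketch of the ThreadSafeStore usage described above (assumed standard client-go behavior, not code from this PR):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

func main() {
	// NewThreadSafeStore guards its internal map with a lock, so each
	// individual Add/Get/Delete is safe to call from many goroutines.
	store := cache.NewThreadSafeStore(cache.Indexers{}, cache.Indices{})

	store.Add("default/raycluster-head", "expectation item")
	if obj, ok := store.Get("default/raycluster-head"); ok {
		fmt.Println("found:", obj)
	}
	store.Delete("default/raycluster-head")
}
```

Note that this makes each individual call atomic; a read-then-write sequence spanning several calls still relies on higher-level coordination, which in this PR comes from the one-worker-per-request guarantee discussed below.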

Contributor

👍

return fmt.Errorf("fail to get active expectation item for %s when expecting: %s", key, err)
}

ae.recordTimestamp = time.Now()
Contributor

Should we use a mutex for updating the recordTimestamp?

Author

> Should we use a mutex for updating the recordTimestamp?

Thanks for your review. 😺

Whether ActiveExpectation's recordTimestamp needs a mutex depends on how ActiveExpectations is used. Currently it is only used within the controller's Reconcile func: multiple workers reconcile in parallel, but a given reconcile.Request is handled by only one worker at any given time.

As a result, only one goroutine handles the ActiveExpectations associated with a given RayCluster, so there are no concurrent reads and writes.

However, this could become an issue if ActiveExpectations were used externally, for example by an EventHandler, but we don't have that use case currently.
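As a rough illustration of that guarantee (assumed controller-runtime usage and hypothetical setup code, not part of this PR), the number of parallel workers is configured per controller, while the workqueue still hands each key to at most one worker at a time:

```go
package controllers

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

func setupRayClusterController(mgr ctrl.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&rayv1.RayCluster{}).
		// Several workers reconcile in parallel, but the underlying workqueue
		// never processes the same key in more than one worker at once.
		WithOptions(controller.Options{MaxConcurrentReconciles: 4}).
		Complete(r)
}
```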

Contributor

👍

@rueian (Contributor) commented May 30, 2024

Just wondering: if client-go's workqueue ensures that no more than one consumer can process an equivalent reconcile.Request at any given time, why don't we clear the related informer cache when needed?

@Eikykun (Author) commented Jun 3, 2024

> Just wondering: if client-go's workqueue ensures that no more than one consumer can process an equivalent reconcile.Request at any given time, why don't we clear the related informer cache when needed?

Apologies, I'm not quite clear about what "related informer cache" refers to.

@rueian (Contributor) commented Jun 8, 2024

> Just wondering: if client-go's workqueue ensures that no more than one consumer can process an equivalent reconcile.Request at any given time, why don't we clear the related informer cache when needed?
>
> Apologies, I'm not quite clear about what "related informer cache" refers to.

According to #715, the root cause is the stale informer cache, so I am wondering if the issue can be solved by fixing the cache, for example by triggering a manual Resync somehow.

@kevin85421 (Member)

I am reviewing this PR now. I will try to do a review iteration every 1 or 2 days.

@kevin85421 (Member) left a comment

I just reviewed a small part of this PR. I will try to do another iteration tomorrow.

ray-operator/controllers/ray/raycluster_controller.go (outdated review thread, resolved)
resource := ResourceInitializers[i.Kind]()
if err := i.Get(context.TODO(), types.NamespacedName{Namespace: namespace, Name: i.Name}, resource); err == nil {
	return true, nil
} else if errors.IsNotFound(err) && i.RecordTimestamp.Add(30*time.Second).Before(time.Now()) {
Member

What does this mean? Do you mean:

(1) The Pod is not found in the informer cache.
(2) KubeRay has already submitted a Create request to the K8s API server at t=RecordTimestamp. If the Create request was made more than 30 seconds ago, we assume it satisfies the expectation.

I can't understand (2). If we sent a request 30 seconds ago and the informer still hasn't received information about the Pod, there are two possibilities:

  • (a) There are delays between the K8s API server and the informer cache.
  • (b) The creation failed.

For case (a), it is OK for the function to say that the expectation is satisfied. However, for case (b), what will happen if the creation fails and we tell the KubeRay operator it is satisfied?

Author

Case (b) may not occur here, because the expectation is only recorded after the Pod has been created successfully:

if err := r.Create(ctx, &pod); err != nil {
	return err
}
rayClusterExpectation.ExpectCreateHeadPod(key, pod.Namespace, pod.Name)

@kevin85421 (Member)

Btw, @Eikykun would you mind rebasing with the master branch and resolving the conflict? Thanks!

@Eikykun (Author) commented Jun 12, 2024

> According to #715, the root cause is the stale informer cache, so I am wondering if the issue can be solved by fixing the cache, for example by triggering a manual Resync somehow.

Got it. From a problem-solving standpoint, if we didn't rely on an informer in the controller and instead queried the API server for Pods directly, the cache-consistency issue with etcd wouldn't occur. However, this approach would increase network traffic and hurt reconciliation efficiency.
As far as I understand, the Resync() method in DeltaFIFO is not intended to keep the cache consistent with etcd, but rather to prevent event loss through periodic reconciliation.
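For completeness, a sketch of that alternative (assumed controller-runtime usage, not part of this PR): a Manager exposes an uncached reader that bypasses the informer cache for a specific lookup, at the cost of extra API-server traffic.

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
)

// getPodUncached reads a Pod straight from the API server via the manager's
// uncached reader, skipping the (possibly stale) informer cache.
func getPodUncached(ctx context.Context, mgr ctrl.Manager, namespace, name string) (*corev1.Pod, error) {
	pod := &corev1.Pod{}
	if err := mgr.GetAPIReader().Get(ctx, types.NamespacedName{Namespace: namespace, Name: name}, pod); err != nil {
		return nil, err
	}
	return pod, nil
}
```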

@Eikykun (Author) commented Jun 12, 2024

> Btw, @Eikykun would you mind rebasing with the master branch and resolving the conflict? Thanks!

Thanks for your review. I will go through the PR issues and resolve the conflicts later.

@kevin85421 (Member)

@Eikykun would you mind installing pre-commit (https://github.com/ray-project/kuberay/blob/master/ray-operator/DEVELOPMENT.md) and fixing the linter issues? Thanks!

@kevin85421 (Member) left a comment

At a quick glance, it seems that we create an ActiveExpectationItem for each Pod creation, deletion, or update. I have some concerns about a scalability bottleneck caused by the memory usage. In the ReplicaSet controller's source code, it seems to track only the number of Pods expected to be created or deleted per ReplicaSet.

@kevin85421 (Member)

> At a quick glance, it seems that we create an ActiveExpectationItem for each Pod creation, deletion, or update. I have some concerns about a scalability bottleneck caused by the memory usage. In the ReplicaSet controller's source code, it seems to track only the number of Pods expected to be created or deleted per ReplicaSet.

Follow up for ^

@Eikykun (Author) commented Jun 18, 2024

> At a quick glance, it seems that we create an ActiveExpectationItem for each Pod creation, deletion, or update. I have some concerns about a scalability bottleneck caused by the memory usage. In the ReplicaSet controller's source code, it seems to track only the number of Pods expected to be created or deleted per ReplicaSet.

Sorry, I didn't have time to reply a few days ago.

An ActiveExpectationItem is removed once its expectation is fulfilled, so the memory usage depends on how many Pods being created or deleted have not yet been synchronized to the cache; it might not actually consume much memory. Also, ControllerExpectations caches each Pod's UID: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/controller_utils.go#L364
So I'm not quite sure which one is lighter, ActiveExpectationItem or ControllerExpectations.

I actually started with ControllerExpectations in RayCluster, but I'm not entirely sure why I switched to ActiveExpectationItem; perhaps ControllerExpectations was more complicated to use: it requires a PodEventHandler to drive the Observed logic, so RayCluster would need to implement that PodEventHandler logic separately.
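To make the comparison concrete, here is a rough, self-contained sketch of the counter-based pattern used by the ReplicaSet controller (illustrative only; the real ControllerExpectations API in k8s.io/kubernetes/pkg/controller differs in its details). It keeps two counters per owner instead of one item per Pod, which is also why it needs a PodEventHandler to decrement them when events arrive:

```go
package controllers

import "sync"

// counterExpectations stores only two counters per owner key instead of one
// entry per Pod, trading per-Pod detail for a smaller memory footprint.
type counterExpectations struct {
	mu   sync.Mutex
	adds map[string]int // pending creations per owner key
	dels map[string]int // pending deletions per owner key
}

func newCounterExpectations() *counterExpectations {
	return &counterExpectations{adds: map[string]int{}, dels: map[string]int{}}
}

// ExpectCreations is called before issuing n create requests for an owner.
func (e *counterExpectations) ExpectCreations(ownerKey string, n int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.adds[ownerKey] += n
}

// CreationObserved is called from the Pod event handler when an add event for
// one of the owner's Pods reaches the informer cache.
func (e *counterExpectations) CreationObserved(ownerKey string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.adds[ownerKey] > 0 {
		e.adds[ownerKey]--
	}
}

// Satisfied reports whether every previously requested change for the owner
// has been observed, i.e. the informer cache can be trusted again.
func (e *counterExpectations) Satisfied(ownerKey string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.adds[ownerKey] == 0 && e.dels[ownerKey] == 0
}
```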
