Allow force deletion of resources that have specific cleanup annotation #13869
Comments
I think this sounds great. Users occasionally (and occasionally very loudly) complain about not having an easy way to wipe away CRs like these when they no longer want the data. I think using an annotation is a good choice (compared to the CephCluster's cleanupPolicy). The only concern I have relates to a broader discussion about cleaning up the CSI OMAP artifacts associated with some of these CRs. If CSI's OMAP resources are not stored in the same RADOS namespaces and/or CephFS SubvolGroups that go along with each CSI tenant, those cleanup operations will have to be managed by some other process.
There is currently a PR in progress in the kubectl plugin to clean up the OMAP details. If the plugin can do it, I would expect Rook could as well, but we certainly need to understand that more...
@BlaineEXE One thing we can do is create a job that takes care of the entire cleanup process, have Rook reconcile that job for each delete operation on annotated resources, and only after successful completion continue with the existing cleanup procedure.
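To make that shape concrete, here is a minimal controller-runtime sketch (not Rook's actual code; the Job name, image, and CLI args are hypothetical placeholders) of an operator reconciling a cleanup Job and only reporting done once the Job succeeds:

```go
package cleanup

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	kerrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ensureCleanupJob creates (or finds) a Job that purges the Ceph resources
// backing the CR, and reports whether the Job has completed successfully.
// The operator would requeue until done is true, then remove the finalizer.
func ensureCleanupJob(ctx context.Context, c client.Client, ns, crName string) (done bool, err error) {
	jobName := fmt.Sprintf("cleanup-%s", crName) // hypothetical naming scheme

	job := &batchv1.Job{}
	err = c.Get(ctx, client.ObjectKey{Namespace: ns, Name: jobName}, job)
	if kerrors.IsNotFound(err) {
		// The Job does not exist yet: create it and requeue.
		job = &batchv1.Job{
			ObjectMeta: metav1.ObjectMeta{Name: jobName, Namespace: ns},
			Spec: batchv1.JobSpec{
				Template: corev1.PodTemplateSpec{
					Spec: corev1.PodSpec{
						RestartPolicy: corev1.RestartPolicyOnFailure,
						Containers: []corev1.Container{{
							Name:  "cleanup",
							Image: "rook/ceph:master", // assumption: reuse the operator image
							// Hypothetical CLI entry point, discussed later in this thread.
							Args: []string{"ceph", "clean", "subvolumegroup", crName},
						}},
					},
				},
			},
		}
		return false, c.Create(ctx, job)
	}
	if err != nil {
		return false, err
	}
	// Only report done once the Job's pod has succeeded.
	return job.Status.Succeeded > 0, nil
}
```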
An annotation can still be a safety issue: what if it is added by mistake? We don't want the Rook operator to act on it blindly, as that might be critical and delete user data. We can have a job like …
This is a workable suggestion, and I think it is always good to discuss alternate implementations for features. This helps us gain clarity about what we want our user workflows to look like and how we want to balance different factors. Personally, I think that what Travis proposes is likely to be better for end users for a few reasons that are somewhat interrelated.
When taking all of these considerations into account, I think the annotation method has better clarity and simplicity for users, which enables them to make fewer mistakes if/when they decide to use it.
Agreed on all those points, thanks @BlaineEXE. I'd propose we go with the approach to force deletion based on the annotation. The question then just becomes how to implement the forced cleanup for each type of resource. For subvolumegroups, we already have a PR in progress in the plugin for deleting stale subvolumes: rook/kubectl-rook-ceph#237. Finding stale subvolumes is very similar to this scenario of deleting all subvolumes in the subvolumegroup to be deleted. This should give us a base implementation, and we can look at sharing code in the operator for the force deletion scenario.
+1 for this.
I think we should extend the subvolumegroup CR to have an option to also create a radosnamespace which will be used by cephcsi to maintain omap data. With isolation now at both the subvolume level and the omap level, we don't have to worry about identifying and cleaning up the omap of a specific subvolume from a subvolumegroup (which is rather tedious). This approach and assumption need to be tested and verified to work before we extend the subvolumegroup CR.
Why not use the radosnamespace CR? https://github.com/rook/rook/blob/master/deploy/examples/radosnamespace.yaml |
You need the rbd.radosnamespace section specified in the entry created by the svg CR for cephcsi to use the radosnamespace.
While using a rados namespace sounds like the right approach, note that we will also need the ability to clean up existing clusters that don't have that change, so we shouldn't block the initial implementation of the cleanup on that feature. |
@travisn One suggestion would be to use the existing cleanupPolicy. That way, we can extend the existing cleanup policy to delete the required resources and don't have to deal with a new annotation. Moreover, I don't think (I may be wrong) that a user would want to delete only specific resources by adding the annotation. We would end up having a single option/user confirmation to clean up data. The current flow of cluster teardown looks roughly like below:
@sp98 This is the exact requirement: allow a customer to select a specific resource to force-delete, and to mark it for force deletion a second before actually deleting it. So for the majority of the time the customer's data is protected from deletion.
@sp98 This scenario came up for cleaning up the CephFilesystemSubvolumeGroup and CephRadosNamespace CRs. We want to force delete these resources instead of having the finalizer block the deletion. The entire cluster is not being cleaned up, just those resources. The same pattern could apply to a CephObjectStore that has buckets, a CephFilesystem with subvolumegroups, etc. We don't want to use a CRD setting to control the forced cleanup. The annotation seems best. Let's discuss.
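As a rough illustration of that pattern, here is a Go sketch of the check a reconciler could make; the annotation key comes from the issue text, while the helper name and placement are hypothetical:

```go
package cleanup

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Annotation key taken from the issue text.
const forceDeletionAnnotation = "ceph.rook.io/force-deletion"

// shouldForceDelete returns true only when the CR is being deleted AND the
// user has explicitly opted in to destroying the underlying Ceph resources.
// The annotation alone does nothing until a deletion is actually requested.
func shouldForceDelete(obj metav1.Object) bool {
	if obj.GetDeletionTimestamp().IsZero() {
		return false // not being deleted; never clean up a live resource
	}
	_, found := obj.GetAnnotations()[forceDeletionAnnotation]
	return found
}
```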
To capture our conversation on the implementation for forced deletion of the CephFilesystemSubvolumeGroup CR...
The force deletion job will call a CLI command in the Rook binary.
Also to consider is moving the omap cleanup code from the kubectl-rook-ceph repo (in progress with rook/kubectl-rook-ceph#237) into the core rook repo, then the kubectl-rook-ceph repo can reference the rook code for omap cleanup |
Can we use …? Currently, a few things are not considered in the cleanup code.
More details are covered in rook/kubectl-rook-ceph#251 and rook/kubectl-rook-ceph#211. Once those are addressed we are good; I will check if I am missing anything else.
This is proposed as a new command inside Rook (not kubectl-rook-ceph plugin) that could be called as a job.
If we are force deleting the subvolumegroup, I assume that means we should force delete all of the resources owned by the svg, including snapshots and other volumes. It might just take multiple PRs to get these changes done, if that makes sense.
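To illustrate the ordering that implies, here is a sketch of purging a single subvolume by shelling out to the stock ceph CLI from Go; snapshots are removed first, since `ceph fs subvolume rm` will generally refuse to delete a subvolume that still has snapshots. The helper name and error handling are illustrative only:

```go
package cleanup

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

type snapshot struct {
	Name string `json:"name"`
}

// purgeSubVolume deletes one subvolume and everything under it, in the only
// order Ceph will accept: snapshots first, then the subvolume itself.
func purgeSubVolume(fs, group, subvol string) error {
	// List the subvolume's snapshots in JSON for easy parsing.
	out, err := exec.Command("ceph", "fs", "subvolume", "snapshot", "ls",
		fs, subvol, "--group_name", group, "--format", "json").Output()
	if err != nil {
		return fmt.Errorf("failed to list snapshots of %q: %w", subvol, err)
	}
	var snaps []snapshot
	if err := json.Unmarshal(out, &snaps); err != nil {
		return err
	}
	// Delete every snapshot before touching the subvolume.
	for _, s := range snaps {
		if err := exec.Command("ceph", "fs", "subvolume", "snapshot", "rm",
			fs, subvol, s.Name, "--group_name", group).Run(); err != nil {
			return fmt.Errorf("failed to delete snapshot %q: %w", s.Name, err)
		}
	}
	// Now the subvolume itself can be removed.
	return exec.Command("ceph", "fs", "subvolume", "rm",
		fs, subvol, "--group_name", group).Run()
}
```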
If everything is in the rook codebase, why should we call a CLI command? Why can't the job just invoke the Go code itself?
A K8s job needs a CLI entry point.
Yes, the plan is also for force deleting rados namespaces, that comment just captured discussion specific to subvolumegroups. |
I understand, @travisn. In my eyes, the job and the operator can be coupled, since the job is an extension of the operator's reconciliation responsibilities and is just an implementation detail of the reconcile execution flow.
Agreed, it is an implementation detail. Rook internally has a number of CLI commands; for example, the OSD purge job calls such a CLI command. Internally, the Go packages are factored so the operator code is separate from the ceph and other commands.
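For a sense of what such an entry point could look like, here is a loose sketch following the OSD-purge pattern; the command name, flags, and wiring are all hypothetical, not the proposed design:

```go
package main

import (
	"log"

	"github.com/spf13/cobra"
)

func main() {
	var fsName, groupName string

	cmd := &cobra.Command{
		Use:   "clean-subvolumegroup",
		Short: "force delete all subvolumes in a CephFS subvolumegroup",
		RunE: func(cmd *cobra.Command, args []string) error {
			// The job's pod runs this binary; the operator only reconciles
			// the Job and watches for its completion.
			return purgeSubVolumeGroup(fsName, groupName)
		},
	}
	cmd.Flags().StringVar(&fsName, "filesystem", "", "CephFS filesystem name")
	cmd.Flags().StringVar(&groupName, "group", "", "subvolumegroup to purge")

	if err := cmd.Execute(); err != nil {
		log.Fatal(err)
	}
}

// purgeSubVolumeGroup would live in a shared Go package so that both the
// operator and kubectl-rook-ceph can reuse it (see the earlier comment about
// moving the omap cleanup code into the rook repo). Stubbed here.
func purgeSubVolumeGroup(fs, group string) error {
	// ... list subvolumes with `ceph fs subvolume ls` and purge each one ...
	return nil
}
```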
Is this a bug report or feature request?
What should the feature do:
Rook has always been designed to preserve the cluster and its data, adding finalizers to the critical resources and CRs so that the cluster will not be accidentally deleted. When a resource is deleted, the operator will refuse to remove the finalizer if underlying Ceph resources related to that CR still exist. This is good for preserving the data and the cluster, but it is also difficult to work with when it is desired to force-clean some resource types, or the entire cluster.
Let's consider force deleting a Rook resource and its underlying Ceph resources under two conditions:
- The CR has the annotation ceph.rook.io/force-deletion
- The CR is marked for deletion
Resources to consider cleaning up include the CephFilesystemSubVolumeGroup and CephRadosNamespace CRs; the same pattern could apply to other types such as a CephObjectStore with buckets or a CephFilesystem with subvolumegroups.
For example, if a CephFilesystemSubVolumeGroup CR has the annotation ceph.rook.io/force-deletion and the CR is marked for deletion, the operator will get the deletion event. Rook will proceed to delete all subvolumes belonging to the subvolumegroup. When that cleanup is completed in Ceph, Rook will remove the finalizer and allow the CR to be deleted. If there are many Ceph resources to clean up, we may want to consider either a goroutine or launching a K8s job for the long-running operation.
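Putting the pieces together, here is a sketch of that deletion path in Go, reusing the hypothetical helpers from the comments above; the finalizer name and the exact CR fields are assumptions, not the final implementation:

```go
package cleanup

import (
	"context"

	cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const finalizerName = "cephfilesystemsubvolumegroup.ceph.rook.io" // hypothetical

// reconcileDelete handles the deletion event for a subvolumegroup: purge the
// Ceph-side resources first, and only then release the finalizer so that
// Kubernetes can complete the delete.
func reconcileDelete(ctx context.Context, c client.Client, svg *cephv1.CephFilesystemSubVolumeGroup) error {
	if !shouldForceDelete(svg) {
		// No annotation: keep today's behavior and leave the finalizer in
		// place while underlying Ceph resources still exist.
		return nil
	}
	// Delete all subvolumes in the group. For many subvolumes this could run
	// as a K8s job instead, per the note above.
	if err := purgeSubVolumeGroup(svg.Spec.FilesystemName, svg.Name); err != nil {
		return err
	}
	// Cleanup completed in Ceph: remove the finalizer and update the CR.
	controllerutil.RemoveFinalizer(svg, finalizerName)
	return c.Update(ctx, svg)
}
```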
The cluster CR currently implements a different approach to force deletion of the entire Ceph cluster: a cleanup policy setting must be added to the cluster CR. If that setting is found, Rook will force delete the cluster and kick off a job on each node to clean up the mon data on disk and wipe the disks. This scenario is still valid and would still be supported.
What is the use case behind this feature:
Enable better design for cleaning up resources.