Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RHELC-1124] Refactor applock to have a more robust API. #979

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

jochapma
Copy link
Contributor

@jochapma jochapma commented Nov 2, 2023

[RHELC-1124] Refactor ApplicationLock to have a more consistent API

Jira Issues: RHELC-1124

Checklist

  • PR has been tested manually in a VM (either author or reviewer)
  • Jira issue has been made public if possible
  • [RHELC-] is part of the PR title
  • GitHub label has been added to help with Release notes
  • PR title explains the change from the user's point of view
  • Code and tests are documented properly
  • The commits are squashed to as few commits as possible (without losing data)
  • When merged: Jira issue has been updated to Release Pending if relevant

Copy link

codecov bot commented Nov 2, 2023

Codecov Report

Attention: Patch coverage is 83.78378% with 6 lines in your changes missing coverage. Please review.

Project coverage is 95.54%. Comparing base (05e0d66) to head (577ec86).
Report is 150 commits behind head on main.

Files with missing lines Patch % Lines
convert2rhel/applock.py 81.25% 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #979   +/-   ##
=======================================
  Coverage   95.54%   95.54%           
=======================================
  Files          53       53           
  Lines        4642     4648    +6     
  Branches      815      816    +1     
=======================================
+ Hits         4435     4441    +6     
- Misses        129      132    +3     
+ Partials       78       75    -3     
Flag Coverage Δ
centos-linux-7 90.71% <83.78%> (+0.01%) ⬆️
centos-linux-8 91.64% <83.78%> (+0.01%) ⬆️
centos-linux-9 91.68% <83.78%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jochapma jochapma changed the title Refactored applock to have a more robust API. [RHELC-1124] Refactor applock to have a more robust API. Nov 2, 2023
@jochapma jochapma added the tests-run-tier0 PR ready to run the essential test suit. Equivalent to `/packit test --labels tier0`. label Nov 2, 2023
@has-bot
Copy link
Member

has-bot commented Nov 2, 2023

/packit test --labels tier0


@oamg/conversions-qe please review results and provide ack.

:returns: the file contents as an integer, or None if it doesn't exist
:raises: ApplicationLockedError if the contents are corrupt
"""
if os.path.exists(self._pidfile):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: To avoid races, it's better to do a try: except: around the open() rather than check for file existence first.

Comment on lines 62 to 64
# Our process ID
self._pid = os.getpid()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Our process ID
self._pid = os.getpid()
# Our process ID. We save this when the lock is created so it will be consistent even if we check from inside a fork.
self._pid = os.getpid()


pid = self._read_pidfile()
if pid == self._pid:
return
if self._pid_exists(pid):
raise ApplicationLockedError("%s locked by process %d" % (self._pidfile, pid))
# The lock file was created by a process that has exited;
# remove it and try again.
loggerinst.info("Cleaning up lock held by exited process %d." % pid)
os.unlink(self._pidfile)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a race condition here too, where something else may have already unlinked the file so we need to catch the exception.

Also.... I think the race could also mean that self._pidfile might not contain the same pid as we checked with _pid_exists (meaning between us reading the pid and unlinking the file, another process has unlinked the original and created their new one in its place.) Is there some way to fix that with some combination of counting links to the lockfile, adding one new link to the file, and checking whether the count of links matches how many we think it should at each step of the way? (And if it doesn't match, having some sort of time.sleep(random) before retrying?) (I'm not sure if there is a way or not. I think there might be but a quick internet search didn't turn up a recipe so I might be overly optimisitc. If not, it's probably better to err on the side of telling the user to check whether the lock is stale rather than overwriting the lock).

@Venefilyn
Copy link
Member

@jochapma there's 3 merge commits in the PR. If you can, drop them and rebase with main instead

convert2rhel/applock.py Fixed Show fixed Hide fixed
convert2rhel/applock.py Fixed Show resolved Hide resolved
convert2rhel/applock.py Outdated Show resolved Hide resolved
convert2rhel/applock.py Outdated Show resolved Hide resolved
@jochapma jochapma added the bug-fix A bug has been fixed label May 7, 2024
@jochapma jochapma marked this pull request as ready for review May 7, 2024 14:02
Copy link
Member

@Venefilyn Venefilyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall seems fine, some exceptions being re-raised but not with the same exception. There is a file open that is never closed, which could be an issue

convert2rhel/applock.py Show resolved Hide resolved
convert2rhel/applock.py Fixed Show resolved Hide resolved
# In Python 3 this could be changed to FileNotFoundError.
if exc.errno == errno.ENOENT:
return None
raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, we should be raising exception again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to leave the file open and close it later explicitly because the file lock is tied to the file descriptor. The file lock is required to plug a race condition in which the pid file is overwritten by another process between the point where we read its contents and unlink it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bare raise statement reraises the current exception; @abadger requested that I use this idiom.

Comment on lines 182 to 186
except OSError as exc:
# In Python 3 this could be changed to FileNotFoundError.
if exc.errno == errno.ENOENT:
return
raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't covered by tests

"""Release the advisory file lock we hold on the PID file
and close the open file descriptor."""
if self._pidfile_fp:
fcntl.flock(self._pidfile_fp, fcntl.LOCK_UN)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? Comments would be nice

Copy link
Member

@Venefilyn Venefilyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smaller comments, will look at tests next. The flags didn't seen intuitive or something we see every day so having comments of what the flags does would be nice

convert2rhel/applock.py Outdated Show resolved Hide resolved
convert2rhel/applock.py Outdated Show resolved Hide resolved
convert2rhel/applock.py Outdated Show resolved Hide resolved
convert2rhel/applock.py Outdated Show resolved Hide resolved
convert2rhel/applock.py Outdated Show resolved Hide resolved
Comment on lines 175 to 187
with open(self._pidfile, "r") as filep:
fcntl.flock(filep, fcntl.LOCK_EX)
try:
file_contents = filep.read()
pid = int(file_contents.rstrip())
if pid == self._pid:
return
if self._pid_exists(pid):
raise ApplicationLockedError("%s locked by process %d" % (self._pidfile, pid))
# The lock file was created by a process that has exited;
# remove it and try again.
loggerinst.info("Cleaning up lock held by exited process %d." % pid)
self._safe_unlink()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not reuse the logic of is_locked()? Otherwise we might change it in the future on one part and miss the other etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unable to find a way to avoid a race condition in unlink without keeping a file pointer open across multiple methods, which all my reviewers didn't like. So I removed the automatic cleanup; this also has the benefit of simplifying everything considerably, a good idea since this is my last week.

What this does is add another "can't happen" case (if the lock file somehow doesn't get cleaned up), but that can be easily remedied manually.

In the course of doing that, I addressed the is_locked() reuse.

Co-authored-by: Freya Gustavsson <[email protected]>
jochapma and others added 5 commits August 28, 2024 09:34
Co-authored-by: Freya Gustavsson <[email protected]>
Co-authored-by: Freya Gustavsson <[email protected]>
Co-authored-by: Freya Gustavsson <[email protected]>
Co-authored-by: Freya Gustavsson <[email protected]>
@jochapma jochapma requested a review from a team as a code owner August 28, 2024 17:34
@jochapma
Copy link
Contributor Author

/packit build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug-fix A bug has been fixed tests-run-tier0 PR ready to run the essential test suit. Equivalent to `/packit test --labels tier0`.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants