
expose scheduled shutdown times #3110

Open
anarcat opened this issue Sep 4, 2024 · 3 comments

anarcat (Contributor) commented Sep 4, 2024

I've been struggling to port a monitoring check from Nagios to Prometheus. The check raises a flag if a shutdown is scheduled on a server, and it does so through this horrendous NRPE check:

command[dsa2_shutdown]=if /usr/lib/nagios/plugins/check_procs -w 1: -u root -C shutdown > /dev/null || /usr/lib/nagios/plugins/check_procs -w 1: -u root -a /lib/systemd/systemd-shutdownd > /dev/null || ( busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown 2> /dev/null | sed 's/[^"]*"//;s/".*//' | grep -v dry- | grep . ); then echo 'system-in-shutdown'; else echo 'no shutdown running' ; fi

I hope you can unsee this one day.
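For reference, the busctl branch of that check boils down to three steps: take the text between the first pair of double quotes (the shutdown kind), drop it if it contains `dry-`, and raise the flag if anything is left. A rough Python equivalent of just that branch (ignoring the check_procs tests), assuming a single well-formed `(st) "kind" usec` line of busctl output, might look like this; `nrpe_shutdown_flag` is a hypothetical name:

```python
import re


def nrpe_shutdown_flag(busctl_output: str) -> bool:
    """Rough equivalent of the NRPE check's busctl | sed | grep pipeline.

    Assumes one well-formed line such as '(st) "reboot" 1725477267406843'.
    """
    # sed 's/[^"]*"//;s/".*//' keeps the text between the first two quotes.
    match = re.search(r'"([^"]*)"', busctl_output)
    kind = match.group(1) if match else ""
    # grep -v dry- drops dry-run kinds; grep . requires a non-empty result.
    return bool(kind) and "dry-" not in kind
```

Note the `dry-` filter: a parser that only checks for a non-empty kind would also flag kinds containing `dry-` (presumably dry-run schedules).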

We can probably drop all the check_procs stuff and assume systemd (at least, that's what we're asserting), which turns this into something like:

busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
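If the installed busctl supports JSON output (`--json=short` in newer systemd releases; treat this as an assumption and check `busctl --help` on your version), the property can be parsed without splitting strings. The sketch below assumes the reply looks like `{"type":"(st)","data":["reboot",1725477267406843]}` and treats an empty kind, 0, or UINT64_MAX (the `18446744073709551615` returned when nothing is scheduled) as "no shutdown". The function and constant names are hypothetical:

```python
import json
import subprocess

# Assumption: this busctl supports --json=short (check `busctl --help`).
BUSCTL_CMD = [
    "busctl", "--json=short", "get-property",
    "org.freedesktop.login1", "/org/freedesktop/login1",
    "org.freedesktop.login1.Manager", "ScheduledShutdown",
]
# 2**64 - 1, reported as the timestamp when no shutdown is scheduled.
USEC_INFINITY = 2**64 - 1


def parse_json_property(text: str) -> tuple[str, float]:
    """Return (kind, Unix seconds), or ("", 0.0) if nothing is scheduled."""
    # Assumed reply shape: {"type": "(st)", "data": [kind, usec]}
    kind, usec = json.loads(text)["data"]
    if not kind or usec in (0, USEC_INFINITY):
        return "", 0.0
    return kind, usec / 1_000_000  # the timestamp is in microseconds


def read_scheduled_shutdown() -> tuple[str, float]:
    proc = subprocess.run(BUSCTL_CMD, check=True, capture_output=True, text=True)
    return parse_json_property(proc.stdout)
```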

In fact, I wrote a Python script that extracts a metric from that output nicely:

#!/usr/bin/python3

import logging
import shlex
from subprocess import CalledProcessError, PIPE, run


def test_parse_dbus():
    no_sched = '(st) "" 18446744073709551615'
    assert parse_dbus(no_sched) == ("", 0)
    sched_reboot = '(st) "reboot" 1725477267406843'
    assert parse_dbus(sched_reboot) == ("reboot", 1725477267.406843)
    sched_reboot_round = '(st) "reboot" 1725477267506843'
    assert parse_dbus(sched_reboot_round) == ("reboot", 1725477267.506843)
    # theoretical: i've seen the metric "0" with the label "suspend"
    # before adding this test. i couldn't reproduce by suspending my
    # laptop, so i'm not sure wtf happened there.
    sched_suspend = '(st) "suspend" 0'
    assert parse_dbus(sched_suspend) == ("", 0)
    garbage = '(st) "reboot" 1725477267506843 jfdklafjds'
    assert parse_dbus(garbage) == ("", 0)
    assert parse_dbus("(st) ...") == ("", 0)
    assert parse_dbus("") == ("", 0)


def parse_dbus(output: str) -> tuple[str, float]:
    logging.debug("parsing DBus output: %s", output)
    try:
        _, kind, timestamp_str = output.split(maxsplit=2)
    except ValueError as exc:
        logging.warning("could not parse DBus output: %r (%s)", output, exc)
        return "", 0
    kind = kind.replace('"', "")
    try:
        timestamp = int(timestamp_str) / 1000000
    except ValueError as exc:
        logging.warning(
            "could not parse DBus timestamp: %r (%s)",
            timestamp_str,
            exc,
        )
        return "", 0
    logging.debug("found kind %r, timestamp %r", kind, timestamp)
    if kind and timestamp:
        return kind, timestamp
    else:
        return "", 0


def main():
    cmd = shlex.split(
        "busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown"  # noqa: E501
    )
    try:
        proc = run(cmd, check=True, stdout=PIPE, encoding="ascii")
    except (CalledProcessError, FileNotFoundError) as exc:
        logging.warning("could not call command %r: %s", shlex.join(cmd), exc)
        kind, timestamp = "", 0
    else:
        kind, timestamp = parse_dbus(proc.stdout)
    print("# HELP node_shutdown_scheduled_timestamp_seconds time of the next scheduled shutdown, or zero")
    print("# TYPE node_shutdown_scheduled_timestamp_seconds gauge")
    if timestamp:
        print(
            # Prometheus label values must be double-quoted.
            'node_shutdown_scheduled_timestamp_seconds{kind="%s"} %s' % (kind, timestamp)
        )
    else:
        print("node_shutdown_scheduled_timestamp_seconds 0")


if __name__ == "__main__":
    main()
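In the Prometheus text exposition format, label values must be wrapped in double quotes, with backslash, double quote, and line feed escaped as `\\`, `\"` and `\n`. A minimal sketch of a sample formatter that does this (the helper names are hypothetical, not part of the script above):

```python
def escape_label_value(value: str) -> str:
    # Escape backslashes first so the escapes added below aren't doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample line, e.g. metric{kind="reboot"} 1725477267.406843."""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"
```

For example, `format_sample("node_shutdown_scheduled_timestamp_seconds", {"kind": kind}, timestamp)` could stand in for the `%`-formatting in `main()`.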

The problem is there's nowhere to call this thing from: shutdown(8) doesn't have any post hooks, and I don't think systemd fires any specific service when a shutdown is scheduled. logind does expose the schedule over D-Bus, though, namely the ScheduledShutdown property, which we can read with:

busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown

... which is essentially what we're doing above.
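Since nothing fires when a shutdown is scheduled, one fallback is to poll: run the script periodically (from cron or a systemd timer, for example) and drop its output into the directory that node exporter's textfile collector reads (`--collector.textfile.directory`). The collector only picks up `*.prom` files, so writing to a temporary name and renaming avoids scraping a half-written file. A sketch, with a hypothetical `write_textfile` helper:

```python
import os
import tempfile


def write_textfile(directory: str, name: str, content: str) -> str:
    """Atomically replace directory/name with content; return the final path."""
    # The temp file lives in the same directory, so os.replace() is a rename
    # on one filesystem (atomic on POSIX); its ".tmp" suffix is not "*.prom".
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
        # mkstemp creates the file 0600; make it readable by the exporter user.
        os.chmod(tmp_path, 0o644)
        final_path = os.path.join(directory, name)
        os.replace(tmp_path, final_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return final_path
```

The directory passed in must match whatever the exporter's `--collector.textfile.directory` flag points at on your hosts.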

But I figured a better place to do this would be in the node exporter itself, since it's already a daemon just sitting there.

SuperQ (Member) commented Sep 5, 2024

Getting that property should be reasonably easy to do in the systemd collector.

SuperQ (Member) commented Sep 5, 2024

I created #3111 as a draft. It doesn't work. I don't think the dbus API we have supports that generic call.
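The generic call here is presumably the standard `org.freedesktop.DBus.Properties.Get` method, which takes the interface and property names as two strings (signature `ss`) and returns a variant. For comparison, busctl can make that call directly. A sketch that builds the command and parses its reply, assuming busctl prints the variant as `v (st) "kind" usec` (the helper names are hypothetical):

```python
import shlex


def properties_get_cmd(service: str, path: str, interface: str, prop: str) -> list[str]:
    # org.freedesktop.DBus.Properties.Get(ss) -> v: the generic property read.
    return [
        "busctl", "call", service, path,
        "org.freedesktop.DBus.Properties", "Get", "ss", interface, prop,
    ]


def parse_variant_st(output: str) -> tuple[str, int]:
    """Parse a reply assumed to look like: v (st) "reboot" 1725477267406843."""
    fields = shlex.split(output)  # handles the quoted (possibly empty) string
    if len(fields) != 4 or fields[:2] != ["v", "(st)"]:
        raise ValueError(f"unexpected busctl reply: {output!r}")
    return fields[2], int(fields[3])
```

Usage: `properties_get_cmd("org.freedesktop.login1", "/org/freedesktop/login1", "org.freedesktop.login1.Manager", "ScheduledShutdown")`.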

anarcat (Contributor, Author) commented Sep 5, 2024

Awesome work, thanks! I've followed up there.
