cockpit: Implement PCP metrics channel in the Python bridge #20049

Open: jelly (Member) wants to merge 1 commit into main

@jelly jelly commented Feb 16, 2024

Current to-dos:

  • test failure on Arch exception
  • TF failures, create a test VM without python3-pcp
  • Ask the PCP project whether they want to upload a wheel.
  • tox
  • investigate coverage once more
  • Debian stable/testing lack python3-pcp; running test/image-prepare -q debian-testing fails in pytest.
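
One way to approach the missing python3-pcp cases above is to probe for the module before importing it. A minimal sketch, assuming the module is importable as `pcp` (as in the script later in this PR); `have_module` is a hypothetical helper, and the actual bridge or test suite may handle this differently:

```python
import importlib.util


def have_module(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


# Hypothetical guard: only touch PCP when the bindings are installed.
if have_module("pcp"):
    print("python3-pcp available")
else:
    print("python3-pcp missing, PCP tests would be skipped")
```

Inside a pytest module, `pytest.importorskip("pcp")` gives a similar effect by skipping the tests when the import fails.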

Follow-ups I'd suggest tackling:

  • Dropping the cockpit-pcp package
  • On-demand install on the metrics page: install pcp and python3-pcp instead of cockpit-pcp
  • Dropping the cockpit-pcp C code et al.
  • Deal with code which checks for the existence of cockpit-pcp (release-blocker)
  • Consider fetch group API https://gist.github.com/jelly/6f4d683fcd3f8797945f5f90bf0fe5d6

@jelly jelly added the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Feb 16, 2024
@jelly jelly force-pushed the cockpit-pcp-python branch 2 times, most recently from 2821ebd to 920768d Compare February 21, 2024 11:53

jelly commented Jun 12, 2024

Tox is meh :( There is no pcp wheel, so we would need to build it ourselves, and the PyPI version is outdated (5.0) 😞

@jelly jelly force-pushed the cockpit-pcp-python branch 3 times, most recently from 2f49acb to e303bd2 Compare July 5, 2024 14:51
@jelly jelly force-pushed the cockpit-pcp-python branch 2 times, most recently from 877b872 to 34af402 Compare July 23, 2024 18:09
@jelly jelly removed the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Sep 11, 2024
@jelly jelly force-pushed the cockpit-pcp-python branch 6 times, most recently from 4ebf17d to 31985ad Compare September 12, 2024 15:35

jelly commented Sep 12, 2024

It is green now \o/

@@ -0,0 +1,670 @@
# This file is part of Cockpit.
jelly (Member Author) commented:

Maybe rename the file to pcpmetrics.py?

martinpitt (Member) commented:

I checked /var/lib/pcp/config/pmlogconf/tools/cockpit - that's the only remaining file in cockpit-pcp; all enabled metrics are already present in a multitude of other configs in /etc/pcp/pmlogconf, so we can just drop this. It's also strange and against our principles to change the system config by installing cockpit packages. So independently of this PR we can already drop that.

Assuming that the cockpit-pcp binary and the manifest go away, we then just need to change our on-demand installation from cockpit-pcp to pcp to cover the remaining purpose of our rpm (pulling in the dependency).


jelly commented Sep 18, 2024

> I checked /var/lib/pcp/config/pmlogconf/tools/cockpit - that's the only remaining file in cockpit-pcp; all enabled metrics are already present in a multitude of other configs in /etc/pcp/pmlogconf, so we can just drop this. It's also strange and against our principles to change the system config by installing cockpit packages. So independently of this PR we can already drop that.
>
> Assuming that the cockpit-pcp binary and the manifest go away, we then just need to change our on-demand installation from cockpit-pcp to pcp to cover the remaining purpose of our rpm (pulling in the dependency).

See => #21020


jelly commented Sep 18, 2024

Looking again at the fetchGroup API: it looks useful, but it is so freaking broken. For some reason it returns no instances for the first metric... (why!)

#!/usr/bin/python

import cpmapi as c_api
from pcp import pmapi

from test.pytest.test_pcp import instances_archive

# archive_path = "/tmp/pytest-of-jelle/pytest-current/disk-archives0"
archive_path = "/var/log/pcp/pmlogger/t14s/20240723.index"
# archive_path = "/home/jelle/projects/cockpit/cockpit-pcp-python/testEnableNoData/archive3/20240719.09.40.index"
pmfg = pmapi.fetchgroup(c_api.PM_CONTEXT_ARCHIVE, archive_path)
context = pmfg.get_context()

# To work out:
# - omit instances
# - scale

kernel_all_load = pmfg.extend_indom("kernel.all.load", c_api.PM_TYPE_FLOAT)
mem_physmem = pmfg.extend_item("mem.physmem")
mem_avail = pmfg.extend_indom("mem.util.available", c_api.PM_TYPE_FLOAT)
swap_pagesout = pmfg.extend_indom("swap.pagesout", c_api.PM_TYPE_FLOAT)
network_interface_total_bytes = pmfg.extend_indom("network.interface.total.bytes")
t = pmfg.extend_timestamp()

for _ in range(5):
    pmfg.fetch()
    print("time: %s" % t())
    for icode, iname, value in kernel_all_load():
        print('kernel.all.load %s %s=%s' % (icode, iname, value()))
    print("mem.physmem: %s" % mem_physmem())
    print(mem_avail())
    for icode, iname, value in mem_avail():
        if iname is not None:
            print('mem.util.available %s %s=%s' % (icode, iname, value()))
    print("total.bytes %s" % network_interface_total_bytes())
    for icode, iname, value in network_interface_total_bytes():
        if iname is not None:
            print('network.interface.total.bytes %s %s=%s' % (icode, iname, value()))

Maybe something for a follow up, it seems nice

[jelle@t14s][~/projects/cockpit/cockpit-pcp-python]%python3 fetchgroup-pcp.py
time: 2024-07-23 16:05:32.873173
mem.physmem: 32620024
[]
total.bytes []
time: 2024-07-23 16:05:32.894293
kernel.all.load 1 1 minute=1.9800000190734863
kernel.all.load 5 5 minute=1.4700000286102295
kernel.all.load 15 15 minute=1.309999942779541
mem.physmem: 32620024
[(4294967295, None, <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f02c00>)]
total.bytes [(0, 'lo', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f02ca0>), (1, 'enp0s31f6', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f02e80>), (2, 'wlp0s20f3', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f02f20>), (3, 'enp0s20f0u2u6', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f02fc0>), (14, 'virbr0', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f03060>), (19, 'virbr1', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f03100>)]
Traceback (most recent call last):
  File "/home/jelle/projects/cockpit/cockpit-pcp-python/fetchgroup-pcp.py", line 38, in <module>
    print('network.interface.total.bytes %s %s=%s' % (icode, iname, value()))
                                                                    ^^^^^^^
  File "/usr/lib/python3.12/site-packages/pcp/pmapi.py", line 3102, in <lambda>
    (lambda i: (lambda: decode_one(self, i)))(i)))
                        ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/pcp/pmapi.py", line 3096, in decode_one
    raise pmErr(self.stss[i])
pcp.pmapi.pmErr: PM_ERR_VALUE Missing metric value(s)
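
The traceback above comes from calling `value()` on an instance whose value is missing in that sample. A minimal sketch of skipping such instances, using stand-in data rather than a real fetch group; `FakePmErr` and `decode_available` are hypothetical names, and with the real bindings one would presumably catch `pmapi.pmErr` instead. This only works around the symptom and does not explain why the first fetch returns no instances:

```python
class FakePmErr(Exception):
    """Stand-in for pcp.pmapi.pmErr so the sketch runs without PCP."""


def decode_available(instances, error=FakePmErr):
    """Return (icode, iname, value) for instances whose value decodes.

    `instances` mimics the shape printed above: tuples of
    (icode, iname, zero-argument callable returning the value).
    """
    decoded = []
    for icode, iname, getter in instances:
        if iname is None:
            continue  # placeholder entry without an instance name
        try:
            decoded.append((icode, iname, getter()))
        except error:
            continue  # value missing in this sample, skip it
    return decoded


def missing():
    # Simulates the failing decode seen in the traceback.
    raise FakePmErr("PM_ERR_VALUE Missing metric value(s)")


sample = [
    (0, "lo", lambda: 1024.0),
    (1, "enp0s31f6", missing),
    (4294967295, None, missing),
]
print(decode_available(sample))  # prints [(0, 'lo', 1024.0)]
```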

@jelly jelly marked this pull request as ready for review September 18, 2024 19:27

jelly commented Sep 18, 2024

@martinpitt @mvollmer an initial review would be appreciated. I am not super fond of the spaghetti soup; combining direct and archived metrics kinda makes it messy.

This is also the reason we can't easily combine the internalmetrics and the PCP metrics: the archive metrics are too different. Plus we have no Sampler() class.

@martinpitt martinpitt left a comment

Thanks @jelly, great work! I have a lot of little things, but honestly I'm not too worried about the code structure -- there is a lot of bit shifting, data type conversion, and plumbing, and I trust that all of this is necessary. But the overall structure is relatively easy to follow.

I'm really looking forward to landing this!

@martinpitt martinpitt changed the title Cockpit pcp python cockpit: Implement PCP metrics channel in the Python bridge Sep 19, 2024
@jelly jelly force-pushed the cockpit-pcp-python branch 6 times, most recently from b9bee81 to fdf20b7 Compare September 23, 2024 10:09
The Python bridge still used the separate `cockpit-pcp` bridge for
metrics gathering. To remove the full C bridge implementation, the
separate PCP bridge also has to be rewritten.

This rewrite is more or less a Python copy of the C implementation,
using the Python PCP module. The Python PCP module also offers a
"higher level" fetchGroup API, but preliminary testing found issues
with changing multi-instance values, and instances cannot be omitted
with a fetch group (although this can be done in our own code).

Co-Authored-By: Allison Karlitskaya <[email protected]>
Co-Authored-By: Tomas Matus <[email protected]>
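
The "omit instances in our own code" point from the commit message could look roughly like the following. This is a sketch only: `filter_instances` and its `omit`/`only` parameters are hypothetical names for illustration and not necessarily how pcp.py does it.

```python
from typing import Dict, Iterable, Optional


def filter_instances(values: Dict[str, float],
                     omit: Optional[Iterable[str]] = None,
                     only: Optional[Iterable[str]] = None) -> Dict[str, float]:
    """Drop unwanted instances from one sample of a multi-instance metric.

    `values` maps instance name to value. Names in `omit` are removed;
    if `only` is given, just those names are kept.
    """
    omit_set = set(omit or ())
    only_set = set(only) if only is not None else None
    return {
        name: value
        for name, value in values.items()
        if name not in omit_set and (only_set is None or name in only_set)
    }


# Made-up sample values, keyed by interface name.
sample = {"lo": 10.0, "enp0s31f6": 250.0, "virbr0": 0.0}
print(filter_instances(sample, omit=["lo"]))
```

Filtering after the fetch keeps the fetch path itself unchanged, at the cost of fetching values that are then thrown away.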

jelly commented Sep 23, 2024

This seems to go green! @martinpitt I re-added the manifest, as the JS code uses it to detect whether PCP is available, and if it's missing, all PCP tests simply fail.

A follow-up PR will take care of that.
