
Setup parca-agent for ci-runs #187

Closed

SimonRichardson (Member):

Start collecting profiling data for ci-runs (specifically looking at longer gating test runs). We want the data for 2 reasons:

  1. See whether there are any potential performance bottlenecks while jujud and jujud-controller are running.
  2. Use profile-guided optimisation (PGO) profiles from the runs and apply them to the binaries (see the sketch after this list).
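
As a rough illustration of point 2 (not something this PR does), a profile collected from these runs could later be fed into the Go toolchain's PGO support; the paths below are assumptions:

```sh
# Sketch only: build jujud with a CI-collected CPU profile.
# Requires a Go toolchain with PGO support (Go 1.20+); paths are placeholders.
go build -pgo=./profiles/jujud-cpu.pprof ./cmd/jujud
```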

This currently only applies to lxd deployments. For other providers we will need to install it via cloud-init or bake it into an image.

This is an experiment to see if it's useful or not.
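
For the non-lxd providers mentioned above, a minimal sketch of what the cloud-init or baked-image route might run is below; it simply mirrors the snap commands in this PR, and every value is a placeholder:

```sh
# Sketch of what a cloud-init runcmd section or an image build script might
# execute for non-lxd providers. Mirrors the lxd setup; values are placeholders.
sudo snap install parca-agent --classic
sudo snap set parca-agent metadata-external-labels="machine=ci-run-${BOOTSTRAP_PROVIDER}"
sudo snap set parca-agent remote-store-bearer-token="${PARCA_BEARER_TOKEN}"
```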

SimonRichardson self-assigned this on May 22, 2024.
```sh
# Hide the setting of the bearer token to the parca agent.
set +x
{
sudo snap set parca-agent remote-store-bearer-token="${{PARCA_BEARER_TOKEN}}"
```
SimonRichardson (author):

The $PARCA_BEARER_TOKEN is set up in the credentials.

A reviewer (Member):

Do you need to add an || true in case it fails? I also don't know whether this sort of thing ends up in bash history, etc.
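
For illustration, the suggestion might look roughly like this (a sketch only, assuming the token is exported as an environment variable; this is not what the PR currently does):

```sh
# Hypothetical shape of the reviewer's suggestion: tolerate failure and avoid
# echoing the secret via xtrace.
set +x
sudo snap set parca-agent remote-store-bearer-token="${PARCA_BEARER_TOKEN}" || true
set -x
```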

```sh
fi

sudo snap install parca-agent --classic
sudo snap set parca-agent metadata-external-labels="machine=ci-run-${{BOOTSTRAP_PROVIDER}}"
```
A reviewer (Member):

Just a thought: we could add some labels, like TEST_RUNNER_NAME, to make it easier to correlate with issues we eventually find?
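
For example, something along these lines (a sketch; TEST_RUNNER_NAME is an assumed variable, and it presumes the snap option accepts a comma-separated list of key=value pairs):

```sh
# Hypothetical extra label to correlate profiles with a specific runner.
sudo snap set parca-agent \
  metadata-external-labels="machine=ci-run-${BOOTSTRAP_PROVIDER},runner=${TEST_RUNNER_NAME}"
```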

SimonRichardson (author):

Maybe, but I think we just care about the provider/substrate rather than what test is running. Otherwise we might start optimising for a specific test and not Juju as a whole.

A reviewer (Member):

AIUI, this actually sets it up on the host whenever we have an lxd run, is that correct? So this isn't setting up parca-agent in lxd, it is setting up parca-agent on hosts that run lxd.
I have a slight concern about side effects if you are trying to run the CI suite locally. In general, we might want to have a CI step for Jenkins ephemeral bots to install and configure parca-agent, but I'm kind of against doing it as a side effect of running a test.
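
One way to avoid that side effect would be an explicit opt-in guard, sketched below; ENABLE_PARCA_AGENT is a hypothetical variable, not something this PR defines:

```sh
# Hypothetical opt-in so local runs of the CI suite don't install the agent.
if [ "${ENABLE_PARCA_AGENT:-false}" = "true" ]; then
    sudo snap install parca-agent --classic
fi
```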

nvinuesa (Member) left a comment:

LGTM thanks!

manadart (Member) left a comment:

I think this is going to be of limited use.

Test scenarios won't give a strong enough signal to inform profile-guided optimisation.

The controllers are so short-lived and so narrow in focus that we won't see the kind of issues (growth over time, for example) that profiling in-theatre shows.

SimonRichardson (author):

> The controllers are so short-lived and so narrow in focus

The problem is that we, as a Juju team, don't run long-running controllers. Most of our work is feature-driven, and tearing down controllers is a natural practice. I would say that even some information is better than no information.

I will mention that the tests are inherently backwards in the way they run. The concept was to keep a single controller per substrate for the lifetime of the gating tests; at the moment a controller is destroyed between each test. Making that change alone would ensure that we're recreating the lifetime of a controller, with models added and removed, although in a condensed time frame. That would at least alleviate the "narrow in focus" concern (which I do agree with).

I should have stated that this was intended to be a prototype for LXD, to ensure that we are surfacing the parca-agent information to Polar Signals.
