Add basic role and integration for Grafana Loki #1099

Merged: 5 commits into fc-24.05-dev from PL-132981-loki-basic-role, Sep 18, 2024


@sysvinit sysvinit commented Sep 10, 2024

This change introduces minimal, bare-bones platform integration for Grafana Loki, intended to serve as a basis for further development.

This includes:

  • A role which can be assigned to VMs, which installs and provisions an instance of Loki. Currently this is statically configured to use filesystem-backed storage.
  • Automatic configuration of promtail, a log-shipping client for Loki, when a Loki server is discovered in the same resource group. Currently only a single Loki server is supported. promtail is configured to read the system journal in JSON format, which preserves annotations and metadata that would be lost when reading only the plain journal messages. LogQL, the Loki query language, can decode JSON log messages with its json parser.
  • Initial integration with Grafana in the existing statshost role. Hosts that have the statshost-master role (a per-RG Grafana instance) and also have the loki role enabled get a data source added, which allows Grafana to read and query data from the local Loki instance.
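
To illustrate why the JSON journal format matters, here is a minimal sketch of decoding one journal entry and separating the message from its metadata, roughly what a JSON decoding step does on the query side. `MESSAGE`, `PRIORITY`, `_SYSTEMD_UNIT` and `_HOSTNAME` are standard journal fields; the values and the `decode_entry` helper are hypothetical and not part of this PR.

```python
import json

# Hypothetical journal entry in JSON export format (all values are invented).
raw_line = json.dumps({
    "MESSAGE": "Started nginx.service",
    "PRIORITY": "6",
    "_SYSTEMD_UNIT": "nginx.service",
    "_HOSTNAME": "vm01",
})


def decode_entry(line: str) -> dict:
    """Decode one JSON journal entry and split the message from its metadata."""
    entry = json.loads(line)
    message = entry.pop("MESSAGE", "")
    # Everything except MESSAGE is metadata that a plain-text reader would drop.
    return {"message": message, "metadata": entry}


decoded = decode_entry(raw_line)
print(decoded["message"])
print(decoded["metadata"]["_SYSTEMD_UNIT"])
```

With plain journal messages only the `message` part would reach Loki; the JSON format keeps fields such as the unit name available for filtering.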

@flyingcircusio/release-managers

Release process

Impact: internal.

Changelog: none. This role is still under development, and is not yet intended for public use.

PR release workflow (internal)

  • PR has internal ticket
  • internal issue ID (PL-…) part of branch name
  • internal issue ID mentioned in PR description text
  • ticket is on Platform agile board
  • ticket state set to Pull request ready
  • if ticket is more urgent than within the next few days, directly contact a member of the Platform team

Design notes

  • Provide a feature toggle if the change might need to be adjusted/reverted quickly depending on context. Consider whether the default should be on or off. Example: rate limiting.
    • The log retention time is intentionally configurable, as this may differ between use cases and applications.
  • All customer-facing features and (NixOS) options need to be discoverable from documentation. Add or update relevant documentation such that hosted and guided customers can understand it as well.
    • No documentation yet, this is a development preview which is subject to change.

Security implications

This is a basic first implementation. We're reviewing and considering overall security measures in PL-133025.

At this point we're limiting retention to 30 days by default (overridable per Loki instance). We are relying on the SRV firewall as usual and consider unauthenticated, cleartext submission acceptable for now.

To reduce the impact of an attacker gaining access to a non-Loki machine, we're limiting access to the Loki API to selected endpoints.
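
The endpoint restriction can be sketched as a simple path allowlist. This is an illustration only: the paths below are Loki's documented push and query endpoints, but the exact set allowed by this PR is an assumption, and `is_allowed` is a hypothetical helper, not the actual implementation.

```python
from urllib.parse import urlsplit

# Assumed allowlist: ingestion and query endpoints only.
# The real set configured by this PR may differ.
ALLOWED_PATHS = frozenset({
    "/loki/api/v1/push",
    "/loki/api/v1/query",
    "/loki/api/v1/query_range",
})


def is_allowed(request_target: str, from_localhost: bool = False) -> bool:
    """Return True if a request may reach the Loki API.

    Local clients (for example Grafana on the same host) are unrestricted;
    external machines may only use the allowlisted paths.
    """
    if from_localhost:
        return True
    # Strip the query string so only the path is compared.
    path = urlsplit(request_target).path
    return path in ALLOWED_PATHS


print(is_allowed("/loki/api/v1/push"))
print(is_allowed("/loki/api/v1/delete"))
```

Exact matching rather than prefix matching keeps the external surface small: an unlisted path that merely starts with an allowed one is still rejected.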

  • Security requirements tested? (EVIDENCE)
    • Integrations tested incrementally in a development environment to ensure correct functioning.
    • data separation is handled with regular dev/whq/rzob and also RG separation for customer involvement
    • attack surface area minimisation: no auth used, but covered by RG firewalling, service only on SRV
    • loki runs as separate user

Simple configuration with static filesystem-backed storage.

PL-132981
This automatically enables promtail when a Loki server is present in
the same resource group. Currently only a single Loki server is
supported.

PL-132981
If the loki role is also enabled on the same host as statshost-master,
then add a Loki data source to the Grafana configuration.

PL-132981
@sysvinit sysvinit marked this pull request as ready for review September 16, 2024 13:02

@ctheune ctheune left a comment


That looks like a great simple start.

Limit access to the Loki API from external machines to only the data
ingestion and querying endpoints. Configure Grafana to communicate
with the Loki API over localhost without restrictions.

PL-132981
Provide a minimal set of options for configuring an S3 endpoint used
to store log data, and for supplying the storage schedule entries
required to use the configured S3 endpoint.

PL-132891
@ctheune ctheune merged commit 5895757 into fc-24.05-dev Sep 18, 2024
2 checks passed
@ctheune ctheune deleted the PL-132981-loki-basic-role branch September 18, 2024 09:45