Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sliding Sync: Slight optimization when fetching state for the room (get_events_as_list(...)) #17718

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

MadLittleMods
Copy link
Collaborator

@MadLittleMods MadLittleMods commented Sep 16, 2024

Spawning from @kegsay pointing out that the Sliding Sync endpoint doesn't handle a large room with a lot of state well on initial sync (requesting all state via required_state: [ ["*","*"] ]) (it just takes forever).

After investigating further, the slow part is just get_events_as_list(...) fetching all of the current state ID's out for the room (which can be 100k+ events for rooms with a lot of membership). This is just a slow thing in Synapse in general and the same thing happens in Sync v2 or the /state endpoint.


The only idea I had to improve things was to use batch_iter to only try fetching a fixed amount at a time instead of working with large maps, lists, and sets. This doesn't seem to have much effect though.

There is already a batch_iter(event_ids, 200) in _fetch_event_rows(...) for when we actually have to touch the database and that's inside a queue to deduplicate work.

I did notice one slight optimization to use get_events_as_list(...) directly instead of get_events(...). get_events(...) just turns the result from get_events_as_list(...) into a dict and since we're just iterating over the events, we don't need the dict/map.

Dev notes

Run the new tests

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.databases.main.test_events_worker.GetEventsTestCase

SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.databases.main.test_events_worker.GetEventsTestCase

Enable Jaeger tracing in Twisted Trial tests:

Run Jaeger locally with Docker: https://www.jaegertracing.io/docs/1.6/getting-started/#all-in-one-docker-image

Add the following to tests/unittest.py in setUp(...):

        from synapse.logging.opentracing import init_tracer

        # Start the tracer
        init_tracer(self.hs)  # noqa

Add the following to your tests (in tests/rest/client/test_asdf.py for example):

    def default_config(self) -> JsonDict:
        config = super().default_config()
        config["opentracing"] = {
            "enabled": True,
            "jaeger_config": {
                "sampler": {"type": "const", "param": 1},
                "logging": False,
            },
        }
        return config

Add a sleep to the end of the test you're running so the traces have a chance to flow to Jaeger before the process exits.

import time
time.sleep(6)

If you're not tracing an endpoint/servlet (which will already have a logging context and tracing span), you will need to wrap your function under test in a logging context for the tracing to be happy. Also be sure to add @trace decorators to what you want to trace downstream.

def test_asdf(self) -> None:
    with LoggingContext("test") as _ctx:
        my_custom_function()

Run your tests.

Visit http://localhost:16686/ and choose the test master service and find the traces you're interesting in.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct
    (run the linters)

@MadLittleMods MadLittleMods changed the title Sliding Sync: Investigate get_events_as_list(...) taking a long time Sliding Sync: Investigate get_events_as_list(...) taking a long time for large rooms Sep 16, 2024
Comment on lines +446 to +450
events = await self.store.get_events_as_list(list(state_ids.values()))

state_map = {}
for key, event_id in state_ids.items():
event = event_map.get(event_id)
if event:
state_map[key] = event
for event in events:
state_map[(event.type, event.state_key)] = event
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One slight optimization to use get_events_as_list(...) directly instead of get_events(...). get_events(...) just turns the result from get_events_as_list(...) into a dict and since we're just iterating over the events, we don't need the dict/map.

def prepare(self, reactor: MemoryReactor, clock: Clock, hs: HomeServer) -> None:
self.store: EventsWorkerStore = hs.get_datastores().main

def test_get_lots_of_messages(self) -> None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test because I used it as a test bench and get consistent traces (previous iteration with tracing). The tracing stuff has been removed.

Feel free to nit to remove the whole thing.

@cancellable
async def get_unredacted_events_from_cache_or_db(
self,
event_ids: Iterable[str],
event_ids: Collection[str],
Copy link
Collaborator Author

@MadLittleMods MadLittleMods Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collection because we iterate over event_ids multiple times (even before I added in the tracing tag)

async def _get_events_from_external_cache(
self, events: Iterable[str], update_metrics: bool = True
self, events: Collection[str], update_metrics: bool = True
Copy link
Collaborator Author

@MadLittleMods MadLittleMods Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was a proper Iterable before and only used once but updated to be a Collection now that we also read from it for the tracing tag.

@MadLittleMods MadLittleMods changed the title Sliding Sync: Investigate get_events_as_list(...) taking a long time for large rooms Sliding Sync: Slight optimization when fetching state for the room (get_events_as_list(...)) Sep 17, 2024
@MadLittleMods MadLittleMods marked this pull request as ready for review September 17, 2024 00:38
@MadLittleMods MadLittleMods requested a review from a team as a code owner September 17, 2024 00:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant