Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent state events returned by series of incremental /sync calls #17430

Open
hughns opened this issue Jul 10, 2024 · 0 comments
Open

Inconsistent state events returned by series of incremental /sync calls #17430

hughns opened this issue Jul 10, 2024 · 0 comments

Comments

@hughns
Copy link
Member

hughns commented Jul 10, 2024

This issue is based on element-hq/element-call#2458 (comment)

Synapse has been observed sending inconsistent room state events for incremental syncs when looking at the timeline and state arrays for a particular room.

The HAR of the sync requests
https://call-unstable.ems.host/_matrix/client/v3/sync?filter=0&timeout=30000&set_presence=online&since=s1002136_1_12_1333_4049_1_426_5717_0_1
{
  "next_batch": "s1002138_1_12_1333_4050_1_426_5717_0_1",
        "timeline": {
          "events": [
            {
              "type": "org.matrix.msc3401.call.member",
              "sender": "@hughns2:call-unstable.ems.host",
              "state_key": "@hughns2:call-unstable.ems.host",
              "unsigned": {
                "replaces_state": "$91MdSs1OcBO5xguVL2zeNhBQlo3hAt_DyFaEZkW8SfQ",
              },
              "event_id": "$q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM"
            }

---------------

https://call-unstable.ems.host/_matrix/client/v3/sync?filter=0&timeout=30000&set_presence=online&since=s1002138_1_12_1333_4050_1_426_5717_0_1
{
  "next_batch": "s1002140_1_12_1334_4051_1_426_5717_0_1",
        "state": {
          "events": [
            {
              "type": "org.matrix.msc3401.call.member",
              "sender": "@hughns2:call-unstable.ems.host",
              "state_key": "@hughns2:call-unstable.ems.host",
              "unsigned": {
                "replaces_state": "$9mFjm_Ox58jeAvdSL8OWlSShagn65niqyLLAHX23QGs",
              },
              "event_id": "$91MdSs1OcBO5xguVL2zeNhBQlo3hAt_DyFaEZkW8SfQ"
            }
          ]
        },
        "ephemeral": {
          "events": [
            {
              "type": "m.receipt",
              "content": {
                "$q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM": {
                  "m.read": {
                    "@hughns2:call-unstable.ems.host": {
                      "ts": 1719851861973,
                      "thread_id": "main"
                    }
                  }


----------------

https://call-unstable.ems.host/_matrix/client/v3/sync?filter=0&timeout=30000&set_presence=online&since=s1002143_1_12_1335_4052_1_426_5717_0_1
{
  "next_batch": "s1002146_1_12_1335_4052_1_426_5717_0_1",
        "timeline": {
          "events": [
            {
              "type": "org.matrix.msc3401.call.member",
              "sender": "@hughns2:call-unstable.ems.host",
              "state_key": "@hughns2:call-unstable.ems.host",
              "unsigned": {
                "replaces_state": "$q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM",
                }
              "event_id": "$aemXSHbzKzePfTDzuX00mRFZJmJIJFdQ3I55EPllOUY"
            }

-----------------------------------

https://call-unstable.ems.host/_matrix/client/v3/sync?filter=0&timeout=30000&set_presence=online&since=s1002146_1_12_1335_4052_1_426_5717_0_1
{
  "next_batch": "s1002148_1_12_1335_4052_1_426_5717_0_1",
        "state": {
          "events": [
            {
              "type": "org.matrix.msc3401.call.member",
              "sender": "@hughns2:call-unstable.ems.host",
              "state_key": "@hughns2:call-unstable.ems.host",
              "origin_server_ts": 1719851861110,
              "unsigned": {
                "replaces_state": "$91MdSs1OcBO5xguVL2zeNhBQlo3hAt_DyFaEZkW8SfQ",
              },
              "event_id": "$q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM"
            }


----------------

https://call-unstable.ems.host/_matrix/client/v3/sync?filter=0&timeout=30000&set_presence=online&since=s1002148_1_12_1335_4052_1_426_5717_0_1
{
  "next_batch": "s1002149_1_12_1335_4052_1_426_5717_0_1",
        "timeline": {
          "events": [
            {
              "type": "org.matrix.msc3401.call.member",
              "sender": "@hughns2:call-unstable.ems.host",
              "state_key": "@hughns2:call-unstable.ems.host",
              "unsigned": {
                "replaces_state": "$aemXSHbzKzePfTDzuX00mRFZJmJIJFdQ3I55EPllOUY",
              },
              "event_id": "$8LDSegFL8fkNUaNI_Jc2pgXla17t2FUmsE6JusnYCyU"
            }
          ],

--------------------

This can be summarised as the following changes:

timeline: $91M -> $q5m
state:    $9mF -> $91M # but, $91M has already been replaced by $q5m 
timeline: $q5m -> $aem
state:    $91M -> $q5m # but, $q5m has already been replaced by $aem
timeline: $aem -> $8LD

Because this is an incremental sync, there is no need for the state array to contain anything as the timeline has all the events anyhow. By sending the out of date state array back to the client it confuses the client.

Looking at the DB we get
select stream_ordering, event_id, type, state_key, (select array_agg(prev_event_id) from event_edges where events.event_id = event_edges.event_id), (select state_group from event_to_state_groups as sg where events.event_id = sg.event_id)  from events where room_id = '!totKtECNNKyhzLOiNl:call-unstable.ems.host' and 1002136 < stream_ordering and stream_ordering <= 1002149 order by stream_ordering;

 stream_ordering |                   event_id                   |              type              |            state_key            |                                          array_agg                                          | state_group 
-----------------+----------------------------------------------+--------------------------------+---------------------------------+---------------------------------------------------------------------------------------------+-------------
         1002136 | $YPhOcZiyzEJyMHsrPFtJKxCljsZzQZOFjOE2k7_9am0 | org.matrix.msc3401.call.member | @hughns:call-unstable.ems.host  | {$89_MkkYqF-ViNH0qGkqmQyb5LPnslb4VqQarURn0iUA}                                              |       37615
         1002138 | $q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM | org.matrix.msc3401.call.member | @hughns2:call-unstable.ems.host | {$YPhOcZiyzEJyMHsrPFtJKxCljsZzQZOFjOE2k7_9am0}                                              |       37616
         1002139 | $tIhui6kBfzQDGV5FaZZJn_yKsPqeUwM22JeUIikdFRw | m.room.encrypted               |                                 | {$YPhOcZiyzEJyMHsrPFtJKxCljsZzQZOFjOE2k7_9am0}                                              |       37615
         1002140 | $bTqD6YwK-cwSv4-F6G7_BlWgItaUUssQ5bRFAET-ReA | m.room.encrypted               |                                 | {$q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM,$tIhui6kBfzQDGV5FaZZJn_yKsPqeUwM22JeUIikdFRw} |       37616
         1002146 | $aemXSHbzKzePfTDzuX00mRFZJmJIJFdQ3I55EPllOUY | org.matrix.msc3401.call.member | @hughns2:call-unstable.ems.host | {$bTqD6YwK-cwSv4-F6G7_BlWgItaUUssQ5bRFAET-ReA}                                              |       37617
         1002147 | $ugzpWM2F6VRnEcpnp0dYSzfXnKKRyTCHH-TQc9dcwcM | m.room.encrypted               |                                 | {$bTqD6YwK-cwSv4-F6G7_BlWgItaUUssQ5bRFAET-ReA}                                              |       37616
         1002148 | $xH5t5qKv0tQO7_wUdxoatp9pz-YsV_wnNSLxntLYuqg | m.room.encrypted               |                                 | {$aemXSHbzKzePfTDzuX00mRFZJmJIJFdQ3I55EPllOUY,$ugzpWM2F6VRnEcpnp0dYSzfXnKKRyTCHH-TQc9dcwcM} |       37617
         1002149 | $8LDSegFL8fkNUaNI_Jc2pgXla17t2FUmsE6JusnYCyU | org.matrix.msc3401.call.member | @hughns2:call-unstable.ems.host | {$xH5t5qKv0tQO7_wUdxoatp9pz-YsV_wnNSLxntLYuqg}                                              |       37618
(8 rows)

Which is roughly:

weird_sync dot

Where bottom is most recent and background colour is the state groups.

If you look at the DB output, which is ordered by stream_ordering you can see that the state keeps bouncing back and forth. This is likely confusing the algorithm for computing the state deltas to send to clients, due to poor handling of these cases in Synapse. The fix for this is probably to move to tracking state changes via current state, but that is not a small change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant