Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for DB migrations #225

Merged
merged 24 commits into from
Aug 4, 2023
Merged

Add support for DB migrations #225

merged 24 commits into from
Aug 4, 2023

Conversation

S7evinK
Copy link
Collaborator

@S7evinK S7evinK commented Jul 26, 2023

This doesn't magically resolve performance issues just yet, as we are still hitting the database quite often to do updates.
Setting a fillfactor may reduce the load on the database, since the indexes possibly don't need to be updated, if there is still space in the block. (For reference - https://www.cybertec-postgresql.com/en/hot-updates-in-postgresql-for-better-performance/)

As for the claim that DeepEqual uses fewer allocations, benchmark results from my machine:

$ POSTGRES_DB=syncv3_test go test -bench=. -count 5 -run=^# -benchmem
goos: linux
goarch: amd64
pkg: github.com/matrix-org/sliding-sync/state
cpu: 12th Gen Intel(R) Core(TM) i5-12500H
BenchmarkReflectDeepEqual-16            13089988               101.9 ns/op           192 B/op          2 allocs/op
BenchmarkReflectDeepEqual-16            11241507                97.74 ns/op          192 B/op          2 allocs/op
BenchmarkReflectDeepEqual-16            12605713                97.81 ns/op          192 B/op          2 allocs/op
BenchmarkReflectDeepEqual-16            11592146               104.6 ns/op           192 B/op          2 allocs/op
BenchmarkReflectDeepEqual-16            11399110               101.0 ns/op           192 B/op          2 allocs/op
BenchmarkJSONMarhsalBytesEqual-16         866895              1161 ns/op             928 B/op         20 allocs/op
BenchmarkJSONMarhsalBytesEqual-16         975664              1180 ns/op             928 B/op         20 allocs/op
BenchmarkJSONMarhsalBytesEqual-16        1000000              1182 ns/op             928 B/op         20 allocs/op
BenchmarkJSONMarhsalBytesEqual-16        1000000              1192 ns/op             928 B/op         20 allocs/op
BenchmarkJSONMarhsalBytesEqual-16        1024558              1137 ns/op             928 B/op         20 allocs/op
PASS

This PR also embeds https://github.com/pressly/goose into the syncv3 binary, allowing automatic database migrations (solving #229)
New migrations can be created using

$ syncv3 migrate create $migrationName sql
$ syncv3 migrate create $migrationName go

SQL migrations work out of the box, go migrations, once the first is added, need to import _ "github.com/matrix-org/sliding-sync/state/migrations" in v3.go.

the migrate command has all the sub-commands mentioned on the README of goose.

@S7evinK S7evinK marked this pull request as ready for review July 28, 2023 15:13
Copy link
Member

@kegsay kegsay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need migration tests like what we do with dendrite. For now it will suffice to just use the v0.99.4 tag and then the checkout, to make sure things start up fine.

args := flags.Args()

if len(args) < 2 {
flags.Usage()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to document usage in README or somewhere.

state/device_data_table.go Show resolved Hide resolved
// The update to the DB would be a no-op; don't bother with it.
// This helps reduce write usage and the contention on the unique index for
// the device_data table.
return nil
}

// re-marshal and write
data, err := json.Marshal(writeBack)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we were going to use JSONB to stop shuttling across MBs of data every time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can only reduce the data we send when swapping the device lists?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapping or appending surely?

@kegsay kegsay changed the title Set a fillfactor of 90%, use DeepEqual instead of marshal -> bytes.Equal Add support for DB migrations Aug 2, 2023
@S7evinK S7evinK requested a review from kegsay August 3, 2023 11:35
Copy link
Member

@kegsay kegsay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you!

return nil
}

func downJSONB(ctx context.Context, tx *sql.Tx) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably run this in CI too, but I won't block this PR on it.

Copy link
Member

@kegsay kegsay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please land this PR and do any further changes in a different PR.

@@ -119,6 +120,22 @@ func (t *DeviceDataTable) Upsert(dd *internal.DeviceData) (err error) {
}
tempDD.DeviceLists = tempDD.DeviceLists.Combine(dd.DeviceLists)

// we already got something in the database - update by just sending the new data
if len(row.Data) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't envisage this. I thought we'd do something more like:

if dd.OTKCounts != nil {
    txn.Exec(`UPDATE syncv3_device_data SET json_set(data, {otk}, $1) WHERE user_id = $2 AND device_id = $3`, dd.OTKCounts, userID, deviceID)
}

...and so on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only problem with this is, that we are hitting the database for every change in dd. So if we have changes to OTKCounts and FallbackKeyTypes, we'd hit the DB twice, instead of once. One more hit to change the DeviceLists.

Going to have a look at how we can possibly avoid the SELECT first. (but in a different PR)

state/device_data_table_test.go Outdated Show resolved Hide resolved
@S7evinK S7evinK merged commit 34f3711 into main Aug 4, 2023
7 checks passed
@S7evinK S7evinK deleted the s7evink/devicedata branch August 4, 2023 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants