You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We've had a multiple cases now where there's a migration that is backwards-compatible, meaning that the new version can run fine without the migration having been run, with the worst case being that some results are slightly wrong or minor degradation of functionality. Many of them are even forward-compatible as well, so the old version of lemmy can also deal with the new schema.
This migration purely updates many rows with a new value. Currently this migration runs during a full downtime of lemmy, resulting in 30min+ where the instance is unavailable. In addition, afterwards it needs to catch up with the federation resulting in high instance load and potentially slowness for a while.
Describe the solution you'd like.
The example migration above could be written in a way that it does not impact site operation, by processing the data in batches. For example:
CREATE OR REPLACE PROCEDURE fix_controversy_rank_2024_07_01 (start_id bigint, end_id bigint)
AS $$
DECLARE
batch_size int :=10 ^ 4;
BEGIN
LOOP
RAISE NOTICE 'Processing controversy rank starting from ID: %', start_id;
UPDATE
post_aggregates
SET
controversy_rank = CASE WHEN downvotes <=0OR upvotes <=0 THEN
0
ELSE
(upvotes + downvotes) ^ CASE WHEN upvotes > downvotes THEN
downvotes::float / upvotes::float
ELSE
upvotes::float / downvotes::float
END
END
WHERE
upvotes >0AND downvotes >0AND id BETWEEN start_id AND start_id + batch_size;
-- go to next batch
start_id := start_id + batch_size;
EXIT
WHEN start_id > end_id;
COMMIT; -- this commit is very important so that each batch happens in a separate transaction-- Exit when start_id exceeds end_id
END LOOP;
END;
$$
LANGUAGE plpgsql;
DO $$
DECLARE
end_id bigint;
BEGINSELECT
COALESCE(MAX(id), 1) INTO end_id
FROM post_aggregates;
RAISE NOTICE 'Processing batch starting from ID 1 to ID: %', end_id;
CALL fix_controversy_rank_2024_07_01 (1::bigint, end_id::bigint);
END
$$;
Then the issue becomes, how do we run it? The problem is that we will have some migrations that can run online, but some that need to be offline. The diesel migration runner can not make a distinction between these. In #4673, @dullbananas is trying to add a custom migration runner which might be useful for this.
We could put online migration into a separate directory or add something like -- lemmy-online-migration -- to the top of the files. Those files would be run separately and after server start, instead of before it.
Describe alternatives you've considered.
we can keep it as is, resulting in more and more downtimes as tables increase in size
we could remove some of those migrations and instead publish in the changelog something like "please run this query manually at your convenience"
Additional context
No response
The text was updated successfully, but these errors were encountered:
Migrations should create procedures in a separate schema called "run_after_migrations", and the lemmy server should spawn a task that runs and drops the procedures. It will need to be specified that these procedures must be idempotent because some procedures like this controversy rank fixer need to commit changes before the procedure is dropped. Migrations can update or delete an old procedure if it becomes incompatible with the current schema.
Requirements
Is your proposal related to a problem?
We've had a multiple cases now where there's a migration that is backwards-compatible, meaning that the new version can run fine without the migration having been run, with the worst case being that some results are slightly wrong or minor degradation of functionality. Many of them are even forward-compatible as well, so the old version of lemmy can also deal with the new schema.
The most recent example in 0.19.6: https://github.com/LemmyNet/lemmy/blob/4018049d10ef84968b31e03d385fdd6874e5782b/migrations/2024-07-01-014711_exponential_controversy/up.sql#L1-L16
This migration purely updates many rows with a new value. Currently this migration runs during a full downtime of lemmy, resulting in 30min+ where the instance is unavailable. In addition, afterwards it needs to catch up with the federation resulting in high instance load and potentially slowness for a while.
Describe the solution you'd like.
The example migration above could be written in a way that it does not impact site operation, by processing the data in batches. For example:
Then the issue becomes, how do we run it? The problem is that we will have some migrations that can run online, but some that need to be offline. The diesel migration runner can not make a distinction between these. In #4673, @dullbananas is trying to add a custom migration runner which might be useful for this.
We could put online migration into a separate directory or add something like
-- lemmy-online-migration --
to the top of the files. Those files would be run separately and after server start, instead of before it.Describe alternatives you've considered.
Additional context
No response
The text was updated successfully, but these errors were encountered: