Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix attnum mismatch in pg_statistic after drop column and database restore #90

Merged
merged 19 commits into from
Aug 17, 2024

Conversation

whitehawk
Copy link

@whitehawk whitehawk commented Jun 28, 2024

Fix attnum mismatch in pg_statistic after drop column and database restore

Problem:
In case a column (or columns) was dropped from a table, after backup and restore
cycle, second backup may contain pg_statistic values with a type not matching
the data (for ex. "array_in('{"test"}', '"timestamp"'::regtype::oid, -1)").

Root cause:
Records from pg_statistic were backed up with their actual values, including
attnums for the columns. But during restore, all tables are restored from
scratch. So, if a column is dropped in the table before backup, after the
restore attnum values in the pg_attribute table differ from the attnum
values that were before the backup (if the dropped column is not the one with
the highest attnum). But the restored values of pg_statistic had old attnums.
So, pg_statistics was not consistent with pg_attribute, and one of the side
effects was that on next backup the type for pg_statistic value was taken from
other column (and some pg_statistic were just lost).

Fix:
On the restore phase the column name is used to obtain actual attnum value from
the restored catalog and this value is now used when restoring statistics. The
'AttNumber' field is removed from the AttributeStatistic struct to prevent
its future use.

@whitehawk whitehawk changed the title DRAFT Adbdev 5861 Fix attnum mismatch in pg_statistic after drop column and database restore Jun 28, 2024
@whitehawk
Copy link
Author

NOTE: If one wonders about possible performance impact of the delta, here is some execution time measurement done. On a db with 2000 tables (with 120 columns each), the backup time with statistics is following:

with the fix:

real	1m7,734s
user	0m25,875s
sys	0m16,997s

without the fix:

real	1m0,412s
user	0m22,166s
sys	0m14,506s

which doesn't look dramatic.

The code for creation test tables is following:
CREATE OR REPLACE FUNCTION create_many_tables()
RETURNS VOID
AS $$
BEGIN

	for counter in 1..2000 loop
		EXECUTE FORMAT($fmt$DROP TABLE IF EXISTS test_table_%1$s;$fmt$,
			counter);
			
		EXECUTE FORMAT($fmt$CREATE TABLE test_table_%1$s 
			(
				field_block_A_1		int8,
				field_block_A_2		varchar,
				field_block_A_3		int8,
				field_block_A_4		int8,
				field_block_A_5		int8,
				field_block_A_6		varchar,
				field_block_A_7		timestamp,
				field_block_A_8		varchar,
				field_block_A_9		int8,
				field_block_A_10	numeric,
				field_block_A_11	int8,
				field_block_A_12	int4,
				field_block_A_13	numeric,
				field_block_A_14	int4,
				field_block_A_15	varchar,
				field_block_A_16	int4,
				field_block_A_17	int8,
				field_block_A_18	int4,
				field_block_A_19	varchar,
				field_block_A_20	int4,
				field_block_A_21	date,
				field_block_A_22	int4,
				field_block_A_23	int8,
				field_block_A_24	int4,
				field_block_A_25	varchar,
				field_block_A_26	int4,
				field_block_A_27	varchar,
				field_block_A_28	int4,
				field_block_A_29	numeric,
				field_block_A_30	int4,
				field_block_A_31	varchar,
				field_block_A_32	int4,
				field_block_A_33	varchar,
				field_block_A_34	int4,
				field_block_A_35	numeric,
				field_block_A_36	int4,
				field_block_A_37	varchar,
				field_block_A_38	int4,
				field_block_A_39	int4,
				field_block_A_40	int4,
				field_block_B_1		int8,
				field_block_B_2		varchar,
				field_block_B_3		int8,
				field_block_B_4		int8,
				field_block_B_5		int8,
				field_block_B_6		varchar,
				field_block_B_7		timestamp,
				field_block_B_8		varchar,
				field_block_B_9		int8,
				field_block_B_10	numeric,
				field_block_B_11	int8,
				field_block_B_12	int4,
				field_block_B_13	numeric,
				field_block_B_14	int4,
				field_block_B_15	varchar,
				field_block_B_16	int4,
				field_block_B_17	int8,
				field_block_B_18	int4,
				field_block_B_19	varchar,
				field_block_B_20	int4,
				field_block_B_21	date,
				field_block_B_22	int4,
				field_block_B_23	int8,
				field_block_B_24	int4,
				field_block_B_25	varchar,
				field_block_B_26	int4,
				field_block_B_27	varchar,
				field_block_B_28	int4,
				field_block_B_29	numeric,
				field_block_B_30	int4,
				field_block_B_31	varchar,
				field_block_B_32	int4,
				field_block_B_33	varchar,
				field_block_B_34	int4,
				field_block_B_35	numeric,
				field_block_B_36	int4,
				field_block_B_37	varchar,
				field_block_B_38	int4,
				field_block_B_39	int4,
				field_block_B_40	int4,
				field_block_C_1		int8,
				field_block_C_2		varchar,
				field_block_C_3		int8,
				field_block_C_4		int8,
				field_block_C_5		int8,
				field_block_C_6		varchar,
				field_block_C_7		timestamp,
				field_block_C_8		varchar,
				field_block_C_9		int8,
				field_block_C_10	numeric,
				field_block_C_11	int8,
				field_block_C_12	int4,
				field_block_C_13	numeric,
				field_block_C_14	int4,
				field_block_C_15	varchar,
				field_block_C_16	int4,
				field_block_C_17	int8,
				field_block_C_18	int4,
				field_block_C_19	varchar,
				field_block_C_20	int4,
				field_block_C_21	date,
				field_block_C_22	int4,
				field_block_C_23	int8,
				field_block_C_24	int4,
				field_block_C_25	varchar,
				field_block_C_26	int4,
				field_block_C_27	varchar,
				field_block_C_28	int4,
				field_block_C_29	numeric,
				field_block_C_30	int4,
				field_block_C_31	varchar,
				field_block_C_32	int4,
				field_block_C_33	varchar,
				field_block_C_34	int4,
				field_block_C_35	numeric,
				field_block_C_36	int4,
				field_block_C_37	varchar,
				field_block_C_38	int4,
				field_block_C_39	int4,
				field_block_C_40	int4
			) DISTRIBUTED BY (field_block_A_1);$fmt$,
			counter);
			
		EXECUTE FORMAT($fmt$INSERT INTO test_table_%1$s 
			VALUES
			(
			1, 'val_field_block_A_2', 3, 4, 5, 'val_field_block_A_6', '2021-01-01', 'val_field_block_A_8', 9, 10,
			11, 12, 13, 14, 'val_field_block_A_15', 16, 17, 18, 'val_field_block_A_19', 20, '2021-01-02', 22, 23,
			24, 'val_field_block_A_25', 26, 'val_field_block_A_27', 28, 29, 30, 'field_block_A_31', 32,
			'val_field_block_A_33', 34, 35, 36, 'val_field_block_A_37', 38, 39, 40,

			100, 'val_field_block_B_2', 103, 104, 105, 'val_field_block_B_6', '2021-02-01', 'val_field_block_B_8', 109, 110,
			111, 112, 113, 114, 'val_field_block_B_15', 116, 117, 118, 'val_field_block_B_19', 120, '2021-02-02', 122, 123,
			124, 'val_field_block_B_25', 126, 'val_field_block_B_27', 128, 129, 130, 'field_block_B_31', 132,
			'val_field_block_B_33', 134, 135, 136, 'val_field_block_B_37', 138, 139, 140,

			200, 'val_field_block_C_2', 203, 204, 205, 'val_field_block_C_6', '2021-03-01', 'val_field_block_C_8', 209, 210,
			211, 212, 213, 214, 'val_field_block_C_15', 216, 217, 218, 'val_field_block_C_19', 220, '2021-03-02', 222, 223,
			224, 'val_field_block_C_25', 226, 'val_field_block_C_27', 228, 229, 230, 'field_block_C_31', 232,
			'val_field_block_C_33', 234, 235, 236, 'val_field_block_C_37', 238, 239, 240
			),
			(
			1, 'val_field_block_A_2', 3, 4, 5, 'val_field_block_A_6', '2021-01-01', 'val_field_block_A_8', 9, 10,
			11, 12, 13, 14, 'val_field_block_A_15', 16, 17, 18, 'val_field_block_A_19', 20, '2021-01-02', 22, 23,
			24, 'val_field_block_A_25', 26, 'val_field_block_A_27', 28, 29, 30, 'field_block_A_31', 32,
			'val_field_block_A_33', 34, 35, 36, 'val_field_block_A_37', 38, 39, 40,

			100, 'val_field_block_B_2', 103, 104, 105, 'val_field_block_B_6', '2021-02-01', 'val_field_block_B_8', 109, 110,
			111, 112, 113, 114, 'val_field_block_B_15', 116, 117, 118, 'val_field_block_B_19', 120, '2021-02-02', 122, 123,
			124, 'val_field_block_B_25', 126, 'val_field_block_B_27', 128, 129, 130, 'field_block_B_31', 132,
			'val_field_block_B_33', 134, 135, 136, 'val_field_block_B_37', 138, 139, 140,

			200, 'val_field_block_C_2', 203, 204, 205, 'val_field_block_C_6', '2021-03-01', 'val_field_block_C_8', 209, 210,
			211, 212, 213, 214, 'val_field_block_C_15', 216, 217, 218, 'val_field_block_C_19', 220, '2021-03-02', 222, 223,
			224, 'val_field_block_C_25', 226, 'val_field_block_C_27', 228, 229, 230, 'field_block_C_31', 232,
			'val_field_block_C_33', 234, 235, 236, 'val_field_block_C_37', 238, 239, 240
			),
			(
			1, 'val_field_block_A_2', 3, 4, 5, 'val_field_block_A_6', '2021-01-01', 'val_field_block_A_8', 9, 10,
			11, 12, 13, 14, 'val_field_block_A_15', 16, 17, 18, 'val_field_block_A_19', 20, '2021-01-02', 22, 23,
			24, 'val_field_block_A_25', 26, 'val_field_block_A_27', 28, 29, 30, 'field_block_A_31', 32,
			'val_field_block_A_33', 34, 35, 36, 'val_field_block_A_37', 38, 39, 40,

			100, 'val_field_block_B_2', 103, 104, 105, 'val_field_block_B_6', '2021-02-01', 'val_field_block_B_8', 109, 110,
			111, 112, 113, 114, 'val_field_block_B_15', 116, 117, 118, 'val_field_block_B_19', 120, '2021-02-02', 122, 123,
			124, 'val_field_block_B_25', 126, 'val_field_block_B_27', 128, 129, 130, 'field_block_B_31', 132,
			'val_field_block_B_33', 134, 135, 136, 'val_field_block_B_37', 138, 139, 140,

			200, 'val_field_block_C_2', 203, 204, 205, 'val_field_block_C_6', '2021-03-01', 'val_field_block_C_8', 209, 210,
			211, 212, 213, 214, 'val_field_block_C_15', 216, 217, 218, 'val_field_block_C_19', 220, '2021-03-02', 222, 223,
			224, 'val_field_block_C_25', 226, 'val_field_block_C_27', 228, 229, 230, 'field_block_C_31', 232,
			'val_field_block_C_33', 234, 235, 236, 'val_field_block_C_37', 238, 239, 240
			)
			;$fmt$,
			counter);
	end loop;

END$$
LANGUAGE plpgsql VOLATILE
EXECUTE ON MASTER;

@whitehawk whitehawk marked this pull request as ready for review June 28, 2024 11:24
red1452
red1452 previously approved these changes Jul 25, 2024
@RekGRpth

This comment was marked as resolved.

restore/restore.go Outdated Show resolved Hide resolved
backup/statistics.go Outdated Show resolved Hide resolved
@swork12
Copy link

swork12 commented Aug 5, 2024

bender build

backup/statistics.go Outdated Show resolved Hide resolved
restore/restore.go Outdated Show resolved Hide resolved
backup/statistics.go Outdated Show resolved Hide resolved
backup/statistics.go Outdated Show resolved Hide resolved
backup/statistics.go Outdated Show resolved Hide resolved
restore/restore.go Outdated Show resolved Hide resolved
restore/restore.go Outdated Show resolved Hide resolved
@@ -415,6 +415,10 @@ func editStatementsRedirectSchema(statements []toc.StatementWithType, redirectSc
permissionsRE := regexp.MustCompile(fmt.Sprintf(`(?m)(^(?:REVOKE|GRANT) .+ ON .+?) (%s)((\..+)? (?:FROM|TO) .+)`, schemaMatch))
// This expression matches an ATTACH PARTITION statement and captures both the parent and child schema names
attachRE := regexp.MustCompile(fmt.Sprintf(`(ALTER TABLE(?: ONLY)?) (%[1]s)(\..+ ATTACH PARTITION) (%[1]s)(\..+)`, schemaMatch))
// This expression matches a '<schema>.<table>'::regclass::oid expression
regclassOidRE := regexp.MustCompile(fmt.Sprintf(`'(%s)((\.[^']+)'\:\:regclass\:\:oid)`, schemaMatch))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this regex does not catch table name with single quote

create schema qwe;
create schema rty;
CREATE TABLE qwe."fo'o"(i int);
INSERT INTO qwe."fo'o" VALUES (1), (2), (3);
gpbackup --backup-dir ~/gpbackup --debug --dbname gpadmin --with-stats --single-backup-dir --include-schema qwe
gprestore --timestamp 20240809084649 --debug --backup-dir ~/gpbackup --on-error-continue --redirect-db gpadmin --redirect-schema rty --include-schema qwe --with-stats
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Restore Command: [gprestore --timestamp 20240809084649 --debug --backup-dir /home/gpadmin/gpbackup --on-error-continue --redirect-db gpadmin --redirect-schema rty --include-schema qwe --with-stats]
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Restore Key = 20240809084649
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-gpbackup version = 1.30.5_arenadata15+dev.17.g597f3e4b
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-gprestore version = 1.30.5_arenadata15+dev.17.g597f3e4b
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Greenplum Database Version = 6.27.1_arenadata56+dev.36.g6ad72daf603 build dev
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Gathering information on backup directories
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Verifying backup directories exist
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Metadata will be restored from /home/gpadmin/gpbackup/backups/20240809/20240809084649/gpbackup_20240809084649_metadata.sql
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Restoring pre-data metadata
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: CREATE FUNCTION rty.f() RETURNS void AS
$$begin


DELETE FROM pg_statistic WHERE false;
end;$$
LANGUAGE plpgsql NO SQL; on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Pre-data objects restored:  10% (1/6)
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: ALTER FUNCTION rty.f() OWNER TO gpadmin; on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Pre-data objects restored:  30% (2/6)
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: CREATE TABLE rty."fo'o" (
	i integer
) DISTRIBUTED BY (i); on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Pre-data objects restored:  50% (3/6)
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: ALTER TABLE rty."fo'o" OWNER TO gpadmin; on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Pre-data objects restored:  60% (4/6)
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: CREATE VIEW rty.my AS  SELECT count(*) AS count
   FROM pg_statistic; on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Pre-data objects restored:  80% (5/6)
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: ALTER VIEW rty.my OWNER TO gpadmin; on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Pre-data objects restored:  100% (6/6)
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Pre-data metadata restore complete
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Verifying backup file count
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: SET client_encoding = 'UTF8'; on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: SET client_encoding = 'UTF8'; on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Restoring data for 1 tables from backup with timestamp: 20240809084649
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: SET client_encoding = 'UTF8'; on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: SET client_encoding = 'UTF8'; on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Reading from /home/gpadmin/gpbackup/backups/20240809/20240809084649/gpbackup_<SEGID>_20240809084649_16387.gz
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing "COPY rty."fo'o"(i) FROM PROGRAM 'cat /home/gpadmin/gpbackup/backups/20240809/20240809084649/gpbackup_<SEGID>_20240809084649_16387.gz | gzip -d -c' WITH CSV DELIMITER ',' ON SEGMENT;" on master
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Restored data to table rty."fo'o" from file (table 1 of 1)
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Data restore complete
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Restoring post-data metadata
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Post-data metadata restore complete
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Restoring query planner statistics from /home/gpadmin/gpbackup/backups/20240809/20240809084649/gpbackup_20240809084649_statistics.sql
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: UPDATE pg_class
SET
	relpages = 2::int,
	reltuples = 3.000000::real
WHERE oid = 'qwe."fo''o"'::regclass::oid; on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: DELETE FROM pg_statistic WHERE (starelid, staattnum) IN
	(SELECT attrelid, attnum FROM pg_attribute WHERE attrelid = 'qwe."fo''o"'::regclass::oid AND attname = 'i'); on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[DEBUG]:-Executing statement: INSERT INTO pg_statistic SELECT
	attrelid,
	attnum,
	false::boolean,
	0.000000::real,
	4::integer,
	-1.000000::real,
	2::smallint,
	3::smallint,
	0::smallint,
	0::smallint,
	0::smallint,
	97::oid,
	97::oid,
	0::oid,
	0::oid,
	0::oid,
	NULL::real[],
	'{"-0.5"}'::real[],
	NULL::real[],
	NULL::real[],
	NULL::real[],
	array_in('{"1","2","3"}', 'int4'::regtype::oid, -1),
	NULL,
	NULL,
	NULL,
	NULL
FROM pg_attribute WHERE attrelid = 'qwe."fo''o"'::regclass::oid AND attname = 'i'; on connection: 0
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Query planner statistics restore complete
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Found neither /usr/local/bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Email containing gprestore report /home/gpadmin/gpbackup/backups/20240809/20240809084649/gprestore_20240809084649_20240809084944_report will not be sent
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Beginning cleanup
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Cleanup complete
20240809:08:49:44 gprestore:gpadmin:gpdb6:060212-[INFO]:-Restore completed successfully

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@Stolb27 Stolb27 enabled auto-merge (squash) August 16, 2024 06:48
@Stolb27 Stolb27 merged commit 21ba1a1 into master Aug 17, 2024
2 checks passed
@Stolb27 Stolb27 deleted the ADBDEV-5861 branch August 17, 2024 01:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants