Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PGUpgrade failing #3999

Open
5 tasks
todeb opened this issue Sep 19, 2024 · 3 comments
Open
5 tasks

PGUpgrade failing #3999

todeb opened this issue Sep 19, 2024 · 3 comments

Comments

@todeb
Copy link

todeb commented Sep 19, 2024

Please ensure you do the following when reporting a bug:

  • Provide a concise description of what the bug is.
  • Provide information about your environment.
  • Provide clear steps to reproduce the bug.
  • Attach applicable logs. Please do not attach screenshots showing logs unless you are unable to copy and paste the log data.
  • Ensure any code / output examples are properly formatted for legibility.

Note that some logs needed to troubleshoot may be found in the /pgdata/<CLUSTERNAME>/pg_log directory on your Postgres instance.

An incomplete bug report can lead to delays in resolving the issue or the closing of a ticket, so please be as detailed as possible.

If you are looking for general support, please view the support page for where you can ask questions.

Thanks for reporting the issue, we're looking forward to helping you!

Overview

Pgupgrade fail when upgrading from postgres 13 to 16.
Images used:
crunchy-postgres:ubi8-13.8-1
crunchy-postgres:ubi8-16.3-1

Environment

Please provide the following details:

  • Platform: (Kubernetes, OpenShift, Rancher, GKE, EKS, AKS etc.) Kubernetes
  • Platform Version: (e.g. 1.20.3, 4.7.0) v1.25.16
  • PGO Image Tag: (e.g. ubi8-5.x.y-0) ubi8-5.6.0-0
  • Postgres Version (e.g. 15) 13 -> 16
  • Storage: (e.g. hostpath, nfs, or the name of your storage class) nfs

Steps to Reproduce

  1. create a cluster with: crunchy-postgres:ubi8-13.8-1
  2. follow doc for upgrading: https://access.crunchydata.com/documentation/postgres-operator/latest/guides/major-postgres-version-upgrade
  3. pgupgrade failed log:
Performing PostgreSQL upgrade from version "13" to "16" ...

Step 1: Making new pgdata directory...

Step 2: Initializing new pgdata directory...

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

fixing permissions on existing directory /pgdata/pg16 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
sh: line 1:    29 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    31 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=50 -c shared_buffers=500 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    33 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=40 -c shared_buffers=400 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    35 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=30 -c shared_buffers=300 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    37 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
selecting default max_connections ... 20
sh: line 1:    39 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=16384 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    41 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=8192 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    43 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=4096 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    45 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=3584 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    47 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=3072 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    49 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=2560 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    51 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=2048 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    53 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=1536 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    55 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=1000 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    57 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=900 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    59 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=800 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    61 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=700 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    63 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=600 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    65 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=500 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    67 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=400 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    69 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=300 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    71 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    73 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=100 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1:    75 Bus error               (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=50 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
selecting default shared_buffers ... 400kB
selecting default time zone ... UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok


Success. You can now start the database server using:

    /usr/pgsql-16/bin/pg_ctl -D /pgdata/pg16 -l logfile start

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.

Step 3: Setting the expected permissions on the old pgdata directory...

Step 4: Copying shared_preload_libraries setting to new postgresql.conf file...

upgrade: line 18: /usr/pgsql-13/bin/postgres: No such file or directory
Step 5: Running pg_upgrade check...


check for "/usr/pgsql-13/bin" failed: No such file or directory
Failure, exiting

real    0m0.039s
user    0m0.004s
sys     0m0.008s

New postgres 16 cannot be started log:

k logs pgsql-ngt-test-pgha1-54bb-0 -n database-services
Defaulted container "database" out of: database, replication-cert-copy, postgres-startup (init), nss-wrapper-init (init)
2024-09-19 10:52:08,477 INFO: No PostgreSQL configuration items changed, nothing to reload.
2024-09-19 10:52:08,629 WARNING: Postgresql is not running.
2024-09-19 10:52:08,629 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:08,637 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202307071
  Database system identifier: 7416304783840694348
  Database cluster state: shut down
  pg_control last modified: Thu Sep 19 10:50:25 2024
  Latest checkpoint location: 0/186DFB0
  Latest checkpoint's REDO location: 0/186DFB0
  Latest checkpoint's REDO WAL file: 000000010000000000000001
  Latest checkpoint's TimeLineID: 1
  Latest checkpoint's PrevTimeLineID: 1
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:730
  Latest checkpoint's NextOID: 13545
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 722
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 0
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Thu Sep 19 10:50:25 2024
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: replica
  wal_log_hints setting: off
  max_connections setting: 20
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 2a28ce363859f2cc621fe10e6b20e7038c6578ba4329e42f3d081523c1b6cd12

2024-09-19 10:52:08,640 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:08,803 INFO: starting as a secondary
2024-09-19 10:52:09,889 INFO: postmaster pid=105
/tmp/postgres:5432 - no response
2024-09-19 10:52:09.920 UTC [105] LOG:  redirecting log output to logging collector process
2024-09-19 10:52:09.920 UTC [105] HINT:  Future log output will appear in directory "log".
2024-09-19 10:52:10,906 ERROR: postmaster is not running
2024-09-19 10:52:18,514 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:18,641 INFO: failed to start postgres
2024-09-19 10:52:28,515 WARNING: Postgresql is not running.
2024-09-19 10:52:28,515 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:28,525 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202307071
  Database system identifier: 7416304783840694348
  Database cluster state: shut down
  pg_control last modified: Thu Sep 19 10:50:25 2024
  Latest checkpoint location: 0/186DFB0
  Latest checkpoint's REDO location: 0/186DFB0
  Latest checkpoint's REDO WAL file: 000000010000000000000001
  Latest checkpoint's TimeLineID: 1
  Latest checkpoint's PrevTimeLineID: 1
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:730
  Latest checkpoint's NextOID: 13545
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 722
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 0
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Thu Sep 19 10:50:25 2024
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: replica
  wal_log_hints setting: off
  max_connections setting: 20
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 2a28ce363859f2cc621fe10e6b20e7038c6578ba4329e42f3d081523c1b6cd12

2024-09-19 10:52:28,526 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:28,527 INFO: starting as a secondary
2024-09-19 10:52:29,565 INFO: postmaster pid=176
2024-09-19 10:52:29.573 UTC [176] LOG:  redirecting log output to logging collector process
2024-09-19 10:52:29.573 UTC [176] HINT:  Future log output will appear in directory "log".
/tmp/postgres:5432 - no response
2024-09-19 10:52:30,585 ERROR: postmaster is not running
2024-09-19 10:52:38,514 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:38,515 INFO: failed to start postgres

When running with PG13 image it also refuses now to start, but was healthy before upgrade:

Defaulted container "database" out of: database, replication-cert-copy, postgres-startup (init), nss-wrapper-init (init)
2024-09-19 10:54:23,103 INFO: No PostgreSQL configuration items changed, nothing to reload.
2024-09-19 10:54:23,118 CRITICAL: system ID mismatch, node pgsql-ngt-test-pgha1-54bb-0 belongs to a different cluster: 7416304783840694348 != 7174435655098040412

REPRO

above

EXPECTED

pgupgrade complete without issues

ACTUAL

pgupgrade failed, and postgres cluster not starting at all with new version and with the old one.

Logs

Above

Additional Information

na

@todeb
Copy link
Author

todeb commented Sep 19, 2024

for some additional info database with postgresql13 was initialized on host without hugepages, now it is running on host with hugepages enabled.
When i scheduled the pgupgrade on host without hugepages there are no bus errors anymore although still upgrade fail.

Performing PostgreSQL upgrade from version "13" to "16" ...

Step 1: Making new pgdata directory...

Step 2: Initializing new pgdata directory...

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

fixing permissions on existing directory /pgdata/pg16 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok


Success. You can now start the database server using:

    /usr/pgsql-16/bin/pg_ctl -D /pgdata/pg16 -l logfile start


Step 3: Setting the expected permissions on the old pgdata directory...

Step 4: Copying shared_preload_libraries setting to new postgresql.conf file...

upgrade: line 18: /usr/pgsql-13/bin/postgres: No such file or directory
Step 5: Running pg_upgrade check...


check for "/usr/pgsql-13/bin" failed: No such file or directory
Failure, exiting

real    0m0.070s
user    0m0.000s
sys     0m0.009s

@todeb
Copy link
Author

todeb commented Sep 19, 2024

even running clean cluster on host without hugepages and trying pgupgrade on it also running pgupgrade on non hugapes host. It just fail. Same log as above.

@todeb
Copy link
Author

todeb commented Sep 19, 2024

i was able to perform uprade using crunchy-upgrade:ubi8-5.5.0-0
but on 5.6.0 and 5.6.1 upgrade fails.

So I think it is still something that should be addressed. Also these bus errors when running upgrade on host with hugepages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant