No such file or directory for /pgdata on postgres-startup with NFS #3955

Open
ranchodeluxe opened this issue Jul 8, 2024 · 9 comments
ranchodeluxe commented Jul 8, 2024

Overview

This line of the postgres-startup container's install script reports "No such file or directory" when it tries to create the pg16 subdirectory under /pgdata.

However, the pod spec produced by the install mounts the NFS volume at /pgdata correctly, and AFAICT the install script should handle the rest:

    volumeMounts:
    ...
    - mountPath: /pgdata
      name: postgres-data
    volumes:
    ...
    - name: postgres-data
      persistentVolumeClaim:
        claimName: example-example-h42f-pgdata
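
For reference, the startup script in the pod definition handles the data directory with a three-way choice. The following is a minimal sketch of that choice only; the function name `choose_action` is hypothetical and the function has no side effects, so it can be pointed at any path to see which branch would apply.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the branch conditions of the postgres-startup script
# without creating, moving, or deleting anything.
# `choose_action` is a hypothetical name, not part of the operator.
choose_action() {
  local dir="$1"
  if [[ ! -e "${dir}" || -O "${dir}" ]]; then
    # Missing, or owned by the current user: the startup script runs
    # `install --directory --mode=0700` on the path.
    echo install
  elif [[ -w "${dir}" && -g "${dir}" ]]; then
    # Writable and setgid: the startup script recreates it with mode 0700.
    echo recreate
  else
    # Anything else ends in `halt Permissions!`.
    echo halt
  fi
}
```

Running `choose_action /pgdata/pg16` inside the container would show which path the script takes. In the case reported here the directory does not exist yet, so the `install` branch is the one that runs.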

Environment

  • Platform: EKS
  • Platform Version: v1.27.13
  • PGO Image Tag: ubi8-16.3-3.4-0
  • Postgres Version: 16
  • Storage: NFS
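
Because the storage is NFS, it may help to check separately whether the container user can create a directory on the mount and whether it can change that directory's mode. The sketch below is one way to do that; `probe_dir` and the probe directory name are hypothetical, and it assumes a shell is available in a container that mounts the same volume.

```shell
#!/usr/bin/env bash
# Sketch: try the create and chmod steps separately inside a parent directory.
# `probe_dir` and the `.probe.<pid>` name are hypothetical.
probe_dir() {
  local parent="$1" probe
  probe="${parent%/}/.probe.$$"
  if mkdir "${probe}" 2>/dev/null; then
    echo "mkdir: ok"
  else
    echo "mkdir: failed"
    return 1
  fi
  if chmod 0700 "${probe}" 2>/dev/null; then
    echo "chmod: ok"
  else
    echo "chmod: failed"
  fi
  # Clean up the probe directory; ignore errors so the report stays readable.
  rmdir "${probe}" 2>/dev/null || true
}
```

For example, `probe_dir /pgdata` run as the container's user would report which of the two steps the export rejects, if either.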

Steps to Reproduce

  1. Spin up an EKS cluster (Terraform here)
  2. Create an EFS storage class (like this one)
  3. Install this chart

EXPECTED

The postgres-startup container completes without error and installs everything into /pgdata/pg16 correctly.

ACTUAL

The postgres-startup container reports that /pgdata/pg16 does not exist when it tries to create it:

$ kubectl logs -f pod/example-example-h42f-0 -c postgres-startup

Initializing ...
::postgres-operator: uid::26
::postgres-operator: gid::26
::postgres-operator: postgres path::/usr/pgsql-16/bin/postgres
::postgres-operator: postgres version::postgres (PostgreSQL) 16.3
::postgres-operator: config directory::/pgdata/pg16
::postgres-operator: data directory::/pgdata/pg16
install: cannot change permissions of ‘/pgdata/pg16’: No such file or directory
stat: cannot statx '/pgdata/pg16': No such file or directory
drwxr-xr-x    0    0 /pgdata
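
The last two log lines come from the `permissions` helper in the startup script, which runs `stat` on the failing path and each of its ancestors. A small sketch of how that walk builds its argument list follows; `ancestors` is a hypothetical name, it only prints the paths instead of calling `stat`, and it assumes an absolute path.

```shell
#!/usr/bin/env bash
# Sketch: the same argument-prepending loop as the startup script's
# permissions() helper, printing the paths rather than passing them to stat.
# Assumes an absolute path; a path without any "/" would never shrink.
ancestors() {
  # Keep prepending the parent of the first argument until it is empty.
  while [[ -n "$1" ]]; do set -- "${1%/*}" "$@"; done
  shift  # drop the empty string produced by stripping the top-level component
  printf '%s\n' "$@"
}

ancestors /pgdata/pg16
```

This prints /pgdata and then /pgdata/pg16. In the log, `stat` succeeds on the first (`drwxr-xr-x 0 0 /pgdata`) and fails on the second. The first line also shows /pgdata owned by uid 0 and gid 0 with mode 0755, while the script runs as uid 26, so the ownership of the mount may be worth checking.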

The pod definition below has a volumeMount entry for /pgdata so that should definitely be created:

click to expand the pod definition creationTimestamp: "2024-07-08T01:20:33Z" generateName: example-example-h42f- labels: apps.kubernetes.io/pod-index: "0" controller-revision-hash: example-example-h42f-59955fdd4c postgres-operator.crunchydata.com/cluster: example postgres-operator.crunchydata.com/data: postgres postgres-operator.crunchydata.com/instance: example-example-h42f postgres-operator.crunchydata.com/instance-set: example postgres-operator.crunchydata.com/patroni: example-ha statefulset.kubernetes.io/pod-name: example-example-h42f-0 name: example-example-h42f-0 namespace: default ownerReferences: - apiVersion: apps/v1 blockOwnerDeletion: true controller: true kind: StatefulSet name: example-example-h42f uid: b8e14fdb-562e-4f1c-a9de-7db9ea4c965c resourceVersion: "2425768" uid: 22d5cce2-fa01-4fa1-8dfe-cbac22f041b2 spec: containers: - command: - patroni - /etc/patroni env: - name: PGDATA value: /pgdata/pg16 - name: PGHOST value: /tmp/postgres - name: PGPORT value: "5432" - name: KRB5_CONFIG value: /etc/postgres/krb5.conf - name: KRB5RCACHEDIR value: /tmp - name: PATRONI_NAME valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.name - name: PATRONI_KUBERNETES_POD_IP valueFrom: fieldRef: apiVersion: v1 fieldPath: status.podIP - name: PATRONI_KUBERNETES_PORTS value: | - name: postgres port: 5432 protocol: TCP - name: PATRONI_POSTGRESQL_CONNECT_ADDRESS value: $(PATRONI_NAME).example-pods:5432 - name: PATRONI_POSTGRESQL_LISTEN value: '*:5432' - name: PATRONI_POSTGRESQL_CONFIG_DIR value: /pgdata/pg16 - name: PATRONI_POSTGRESQL_DATA_DIR value: /pgdata/pg16 - name: PATRONI_RESTAPI_CONNECT_ADDRESS value: $(PATRONI_NAME).example-pods:8008 - name: PATRONI_RESTAPI_LISTEN value: '*:8008' - name: PATRONICTL_CONFIG_FILE value: /etc/patroni - name: LD_PRELOAD value: /usr/lib64/libnss_wrapper.so - name: NSS_WRAPPER_PASSWD value: /tmp/nss_wrapper/postgres/passwd - name: NSS_WRAPPER_GROUP value: /tmp/nss_wrapper/postgres/group image: 
registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-16.3-3.4-0 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 3 httpGet: path: /liveness port: 8008 scheme: HTTPS initialDelaySeconds: 3 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 5 name: database ports: - containerPort: 5432 name: postgres protocol: TCP readinessProbe: failureThreshold: 3 httpGet: path: /readiness port: 8008 scheme: HTTPS initialDelaySeconds: 3 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 5 resources: {} securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true runAsNonRoot: true terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /pgconf/tls name: cert-volume readOnly: true - mountPath: /pgdata name: postgres-data - mountPath: /etc/database-containerinfo name: database-containerinfo readOnly: true - mountPath: /etc/pgbackrest/conf.d name: pgbackrest-config readOnly: true - mountPath: /etc/patroni name: patroni-config readOnly: true - mountPath: /tmp name: tmp - mountPath: /dev/shm name: dshm - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: kube-api-access-xvwgp readOnly: true - command: - bash - -ceu - -- - |- monitor() { declare -r directory="/pgconf/tls" exec {fd}<> <(:) while read -r -t 5 -u "${fd}" || true; do if [ "${directory}" -nt "/proc/self/fd/${fd}" ] && install -D --mode=0600 -t "/tmp/replication" "${directory}"/{replication/tls.crt,replication/tls.key,replication/ca.crt} && pkill -HUP --exact --parent=1 postgres then exec {fd}>&- && exec {fd}<> <(:) stat --format='Loaded certificates dated %y' "${directory}" fi done }; export -f monitor; exec -a "$0" bash -ceu monitor - replication-cert-copy image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-16.3-3.4-0 imagePullPolicy: IfNotPresent name: replication-cert-copy resources: {} securityContext: allowPrivilegeEscalation: false 
capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true runAsNonRoot: true terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /pgconf/tls name: cert-volume readOnly: true - mountPath: /tmp name: tmp - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: kube-api-access-xvwgp readOnly: true - command: - pgbackrest - server env: - name: LD_PRELOAD value: /usr/lib64/libnss_wrapper.so - name: NSS_WRAPPER_PASSWD value: /tmp/nss_wrapper/postgres/passwd - name: NSS_WRAPPER_GROUP value: /tmp/nss_wrapper/postgres/group image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.51-0 imagePullPolicy: IfNotPresent livenessProbe: exec: command: - pgbackrest - server-ping failureThreshold: 3 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 1 name: pgbackrest resources: {} securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true runAsNonRoot: true terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /etc/pgbackrest/server name: pgbackrest-server readOnly: true - mountPath: /pgdata name: postgres-data - mountPath: /etc/pgbackrest/conf.d name: pgbackrest-config readOnly: true - mountPath: /tmp name: tmp - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: kube-api-access-xvwgp readOnly: true - command: - bash - -ceu - -- - |- monitor() { exec {fd}<> <(:) until read -r -t 5 -u "${fd}"; do if [ "${filename}" -nt "/proc/self/fd/${fd}" ] && pkill -HUP --exact --parent=0 pgbackrest then exec {fd}>&- && exec {fd}<> <(:) stat --dereference --format='Loaded configuration dated %y' "${filename}" elif { [ "${directory}" -nt "/proc/self/fd/${fd}" ] || [ "${authority}" -nt "/proc/self/fd/${fd}" ] } && pkill -HUP --exact --parent=0 pgbackrest then exec {fd}>&- && exec {fd}<> <(:) stat --format='Loaded certificates dated %y' "${directory}" fi done }; export 
directory="$1" authority="$2" filename="$3"; export -f monitor; exec -a "$0" bash -ceu monitor - pgbackrest-config - /etc/pgbackrest/server - /etc/pgbackrest/conf.d/~postgres-operator/tls-ca.crt - /etc/pgbackrest/conf.d/~postgres-operator_server.conf image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.51-0 imagePullPolicy: IfNotPresent name: pgbackrest-config resources: {} securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true runAsNonRoot: true terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /etc/pgbackrest/server name: pgbackrest-server readOnly: true - mountPath: /etc/pgbackrest/conf.d name: pgbackrest-config readOnly: true - mountPath: /tmp name: tmp - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: kube-api-access-xvwgp readOnly: true dnsPolicy: ClusterFirst enableServiceLinks: false hostname: example-example-h42f-0 initContainers: - command: - bash - -ceu - -- - |- declare -r expected_major_version="$1" pgwal_directory="$2" pgbrLog_directory="$3" permissions() { while [[ -n "$1" ]]; do set "${1%/*}" "$@"; done; shift; stat -Lc '%A %4u %4g %n' "$@"; } halt() { local rc=$?; >&2 echo "$@"; exit "${rc/#0/1}"; } results() { printf '::postgres-operator: %s::%s\n' "$@"; } recreate() ( local tmp; tmp=$(mktemp -d -p "${1%/*}"); GLOBIGNORE='.:..'; set -x chmod "$2" "${tmp}"; mv "$1"/* "${tmp}"; rmdir "$1"; mv "${tmp}" "$1" ) safelink() ( local desired="$1" name="$2" current current=$(realpath "${name}") if [ "${current}" = "${desired}" ]; then return; fi set -x; mv --no-target-directory "${current}" "${desired}" ln --no-dereference --force --symbolic "${desired}" "${name}" ) echo Initializing ... 
results 'uid' "$(id -u)" 'gid' "$(id -G)" results 'postgres path' "$(command -v postgres)" results 'postgres version' "${postgres_version:=$(postgres --version)}" [[ "${postgres_version}" =~ ") ${expected_major_version}"($|[^0-9]) ]] || halt Expected PostgreSQL version "${expected_major_version}" results 'config directory' "${PGDATA:?}" postgres_data_directory=$([ -d "${PGDATA}" ] && postgres -C data_directory || echo "${PGDATA}") results 'data directory' "${postgres_data_directory}" [[ "${postgres_data_directory}" == "${PGDATA}" ]] || halt Expected matching config and data directories bootstrap_dir="${postgres_data_directory}_bootstrap" [ -d "${bootstrap_dir}" ] && results 'bootstrap directory' "${bootstrap_dir}" [ -d "${bootstrap_dir}" ] && postgres_data_directory="${bootstrap_dir}" if [[ ! -e "${postgres_data_directory}" || -O "${postgres_data_directory}" ]]; then install --directory --mode=0700 "${postgres_data_directory}" elif [[ -w "${postgres_data_directory}" && -g "${postgres_data_directory}" ]]; then recreate "${postgres_data_directory}" '0700' else (halt Permissions!); fi || halt "$(permissions "${postgres_data_directory}" ||:)" results 'pgBackRest log directory' "${pgbrLog_directory}" install --directory --mode=0775 "${pgbrLog_directory}" || halt "$(permissions "${pgbrLog_directory}" ||:)" install -D --mode=0600 -t "/tmp/replication" "/pgconf/tls/replication"/{tls.crt,tls.key,ca.crt}
  [ -f "${postgres_data_directory}/PG_VERSION" ] || exit 0
  results 'data version' "${postgres_data_version:=$(< "${postgres_data_directory}/PG_VERSION")}"
  [[ "${postgres_data_version}" == "${expected_major_version}" ]] ||
  halt Expected PostgreSQL data version "${expected_major_version}"
  [[ ! -f "${postgres_data_directory}/postgresql.conf" ]] &&
  touch "${postgres_data_directory}/postgresql.conf"
  safelink "${pgwal_directory}" "${postgres_data_directory}/pg_wal"
  results 'wal directory' "$(realpath "${postgres_data_directory}/pg_wal")"
  rm -f "${postgres_data_directory}/recovery.signal"
- startup
- "16"
- /pgdata/pg16_wal
- /pgdata/pgbackrest/log
env:
- name: PGDATA
  value: /pgdata/pg16
- name: PGHOST
  value: /tmp/postgres
- name: PGPORT
  value: "5432"
- name: KRB5_CONFIG
  value: /etc/postgres/krb5.conf
- name: KRB5RCACHEDIR
  value: /tmp
image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-16.3-3.4-0
imagePullPolicy: IfNotPresent
name: postgres-startup
resources: {}
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  privileged: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /pgconf/tls
  name: cert-volume
  readOnly: true
- mountPath: /pgdata
  name: postgres-data
- mountPath: /tmp
  name: tmp
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  name: kube-api-access-xvwgp
  readOnly: true
  • command:
    • bash
    • -c
    • "export NSS_WRAPPER_SUBDIR=postgres CRUNCHY_NSS_USERNAME=postgres CRUNCHY_NSS_USER_DESC="postgres"
      \n# Define nss_wrapper directory and passwd & group files that will be utilized
      by nss_wrapper. The\n# nss_wrapper_env.sh script (which also sets these vars)
      isn't sourced here since the nss_wrapper\n# has not yet been setup, and we therefore
      don't yet want the nss_wrapper vars in the environment.\nmkdir -p /tmp/nss_wrapper\nchmod
      g+rwx /tmp/nss_wrapper\n\nNSS_WRAPPER_DIR="/tmp/nss_wrapper/${NSS_WRAPPER_SUBDIR}"\nNSS_WRAPPER_PASSWD="${NSS_WRAPPER_DIR}/passwd"\nNSS_WRAPPER_GROUP="${NSS_WRAPPER_DIR}/group"\n\n#
      create the nss_wrapper directory\nmkdir -p "${NSS_WRAPPER_DIR}"\n\n# grab
      the current user ID and group ID\nUSER_ID=$(id -u)\nexport USER_ID\nGROUP_ID=$(id
      -g)\nexport GROUP_ID\n\n# get copies of the passwd and group files\n[[ -f "${NSS_WRAPPER_PASSWD}"
      ]] || cp "/etc/passwd" "${NSS_WRAPPER_PASSWD}"\n[[ -f "${NSS_WRAPPER_GROUP}"
      ]] || cp "/etc/group" "${NSS_WRAPPER_GROUP}"\n\n# if the username is missing
      from the passwd file, then add it\nif [[ ! $(cat "${NSS_WRAPPER_PASSWD}")
      =~ ${CRUNCHY_NSS_USERNAME}:x:${USER_ID} ]]; then\n echo "nss_wrapper: adding
      user"\n passwd_tmp="${NSS_WRAPPER_DIR}/passwd_tmp"\n cp "${NSS_WRAPPER_PASSWD}"
      "${passwd_tmp}"\n sed -i "/${CRUNCHY_NSS_USERNAME}:x:/d" "${passwd_tmp}"\n
      \ # needed for OCP 4.x because crio updates /etc/passwd with an entry for
      USER_ID\n sed -i "/${USER_ID}:x:/d" "${passwd_tmp}"\n printf '${CRUNCHY_NSS_USERNAME}:x:${USER_ID}:${GROUP_ID}:${CRUNCHY_NSS_USER_DESC}:${HOME}:/bin/bash\n'

      "${passwd_tmp}"\n envsubst < "${passwd_tmp}" > "${NSS_WRAPPER_PASSWD}"\n
      \ rm "${passwd_tmp}"\nelse\n echo "nss_wrapper: user exists"\nfi\n\n#
      if the username (which will be the same as the group name) is missing from group
      file, then add it\nif [[ ! $(cat "${NSS_WRAPPER_GROUP}") =~ ${CRUNCHY_NSS_USERNAME}:x:${USER_ID}
      ]]; then\n echo "nss_wrapper: adding group"\n group_tmp="${NSS_WRAPPER_DIR}/group_tmp"\n
      \ cp "${NSS_WRAPPER_GROUP}" "${group_tmp}"\n sed -i "/${CRUNCHY_NSS_USERNAME}:x:/d"
      "${group_tmp}"\n printf '${CRUNCHY_NSS_USERNAME}:x:${USER_ID}:${CRUNCHY_NSS_USERNAME}\n'
      "${group_tmp}"\n envsubst < "${group_tmp}" > "${NSS_WRAPPER_GROUP}"\n
      \ rm "${group_tmp}"\nelse\n echo "nss_wrapper: group exists"\nfi\n\n#
      export the nss_wrapper env vars\n# define nss_wrapper directory and passwd &
      group files that will be utilized by nss_wrapper\nNSS_WRAPPER_DIR="/tmp/nss_wrapper/${NSS_WRAPPER_SUBDIR}"\nNSS_WRAPPER_PASSWD="${NSS_WRAPPER_DIR}/passwd"\nNSS_WRAPPER_GROUP="${NSS_WRAPPER_DIR}/group"\n\nexport
      LD_PRELOAD=/usr/lib64/libnss_wrapper.so\nexport NSS_WRAPPER_PASSWD="${NSS_WRAPPER_PASSWD}"\nexport
      NSS_WRAPPER_GROUP="${NSS_WRAPPER_GROUP}"\n\necho "nss_wrapper: environment
      configured"\n"
    image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-16.3-3.4-0
    imagePullPolicy: IfNotPresent
    name: nss-wrapper-init
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: tmp
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xvwgp
      readOnly: true
  nodeName: ip-172-31-59-120.us-west-2.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 26
    fsGroupChangePolicy: OnRootMismatch
  serviceAccount: example-instance
  serviceAccountName: example-instance
  shareProcessNamespace: true
  subdomain: example-pods
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  topologySpreadConstraints:
  - labelSelector:
      matchExpressions:
      - key: postgres-operator.crunchydata.com/data
        operator: In
        values:
        - postgres
        - pgbackrest
      matchLabels:
        postgres-operator.crunchydata.com/cluster: example
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  - labelSelector:
      matchExpressions:
      - key: postgres-operator.crunchydata.com/data
        operator: In
        values:
        - postgres
        - pgbackrest
      matchLabels:
        postgres-operator.crunchydata.com/cluster: example
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - name: cert-volume
    projected:
      defaultMode: 384
      sources:
      - secret:
          items:
          - key: tls.crt
            path: tls.crt
          - key: tls.key
            path: tls.key
          - key: ca.crt
            path: ca.crt
          name: example-cluster-cert
      - secret:
          items:
          - key: tls.crt
            path: replication/tls.crt
          - key: tls.key
            path: replication/tls.key
          - key: ca.crt
            path: replication/ca.crt
          name: example-replication-cert
  - name: postgres-data
    persistentVolumeClaim:
      claimName: example-example-h42f-pgdata
  - downwardAPI:
      defaultMode: 420
      items:
      - path: cpu_limit

Logs

See above in "Actual"

Additional Information

The EKS EFS CSI Driver by default creates Access Points that restrict reads/writes to a specific UID and GID. But I'm not creating access points, so containers using the EFS mounts can chown|chmod to their heart's content.

@ranchodeluxe
Author

ranchodeluxe commented Jul 8, 2024

Well that was anticlimactic 😞

  1. The install executable's default error when it hits permission issues gives us the dreaded "No such file or directory", which is VERY misleading.

  2. The postgres-startup pod's fsGroup pod security setting doesn't help us here b/c of "fsGroup securityContext does not apply to nfs mount" (kubernetes/examples#260), since the NFS mount is mounted as root 😞

I see I'm reliving past debugging with this issue. Is there no workaround offered by PGO for this? Meaning, is there nowhere in our values to plumb through something that can set the gid for that NFS mount?

@andrewlecuyer
Collaborator

Hi @ranchodeluxe, have you tried setting PostgresCluster.spec.supplementalGroups? You can find this setting in the following section of the API reference.

https://access.crunchydata.com/documentation/postgres-operator/latest/references/crd/5.6.x/postgrescluster#postgresclusterspec

As described in the docs, this setting is often used to access shared file systems, such as NFS:

A list of group IDs applied to the process of a container. These can be useful when accessing shared file systems with constrained permissions. More info: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context

@ranchodeluxe
Author

ranchodeluxe commented Jul 8, 2024

Thanks @andrewlecuyer. Yeah, saw that but if the NFS is mounted as root it doesn't really help b/c we can't use 0:

spec.supplementalGroups[0] in body should be greater than or equal to 1

@ranchodeluxe
Author

ranchodeluxe commented Jul 8, 2024

I'm not seeing options but they probably exist for me to tell PGO where the PGDATA dir should be? It seems spec.dataSource is only used to create jobs for "moving" data. If I can set PGDATA, then I can spin up a pod that mounts the NFS and bootstraps a subdir with the proper perms for USER 26, like I do on other projects that do not use PGO.

@ranchodeluxe
Author

ranchodeluxe commented Jul 8, 2024

I'm not seeing options but they probably exist for me to tell PGO where the PGDATA dir should be?

Given that I know the containers are by default mounting the NFS PVC to /pgdata and looking for dirs pg16 and pg16_wal, I assumed I could create an existing PVC and a pod where I've bootstrapped those dirs to have the right perms, and then force my instances to use that PVC via dataVolumeClaimSpec.dataSource. But even that seems not to work, as it tries to create a new claim.

https://access.crunchydata.com/documentation/postgres-operator/latest/references/crd/5.6.x/postgrescluster#postgresclusterspecinstancesindexdatavolumeclaimspecdatasource
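For reference, the bootstrap step described above could look roughly like the one-off pod below. This is only a sketch under assumptions: the pod name and busybox image are hypothetical, it assumes the NFS export allows root writes and that the cluster permits a root container, and the claim name is copied from the pod spec earlier in this issue. Nothing in this thread confirms that PGO will then reuse such a pre-populated claim.

```yaml
# Hypothetical one-off pod that pre-creates the Postgres directories on the PVC.
apiVersion: v1
kind: Pod
metadata:
  name: pgdata-bootstrap          # hypothetical name
  namespace: default
spec:
  restartPolicy: Never
  containers:
  - name: bootstrap
    image: busybox:1.36           # assumption: any image with mkdir/chown/chmod works
    securityContext:
      runAsUser: 0                # root, needed to chown on a root-owned mount
    command:
    - sh
    - -c
    - |
      set -e
      # Directory names and UID/GID 26 are the ones mentioned in this issue.
      mkdir -p /pgdata/pg16 /pgdata/pg16_wal
      chown -R 26:26 /pgdata/pg16 /pgdata/pg16_wal
      chmod 700 /pgdata/pg16 /pgdata/pg16_wal
      ls -ln /pgdata
    volumeMounts:
    - mountPath: /pgdata
      name: postgres-data
  volumes:
  - name: postgres-data
    persistentVolumeClaim:
      claimName: example-example-h42f-pgdata   # claim name from the pod spec above
```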

@ranchodeluxe
Author

ranchodeluxe commented Jul 8, 2024

As @andrewlecuyer suggested, on AWS EFS the only solution right now (which isn't a solution for me for reasons I will mention later) is to use EFS Access Points and give them an explicit uid: 26 and gid: 26 and then everything works.
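A minimal sketch of what the Access Point approach could look like as an EFS CSI StorageClass. The class name and file system ID are placeholders, and `directoryPerms` is an assumed value; only `uid: 26` and `gid: 26` come from this thread.

```yaml
# Sketch of a dynamic-provisioning StorageClass for the AWS EFS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-ap-postgres                 # hypothetical name
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap              # create an EFS Access Point per volume
  fileSystemId: fs-0123456789abcdef0    # placeholder, use your own file system ID
  directoryPerms: "700"                 # assumption, adjust as needed
  uid: "26"                             # explicit owner for the access point root
  gid: "26"
```

The instance's `dataVolumeClaimSpec.storageClassName` would then point at this class.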

Unfortunately, I'm trying to provision PGO into an RKE cluster that has been set up for me and where I don't have control over which uid:gid the NFS mounts have.

If anyone has a workaround for the above, let me know. dataVolumeClaimSpec.dataSource should be the easiest way to make this happen, but I feel like there's a bug there that I will grok later on when I have time.

@andrewlecuyer
Collaborator

@ranchodeluxe, in an effort to reproduce and better understand this, can you provide a copy (e.g. via kubectl get sc -o yaml) of the exact storage class you are using in your RKE environment? I am especially curious about any settings/parameters for uid and/or gid (e.g. I'm assuming parameters such as uid and gid simply aren't set with the storage class you're testing with in EKS?).

Also, what version of the EFS CSI storage driver are you using?

@ranchodeluxe
Author

ranchodeluxe commented Jul 11, 2024

Sorry for the late reply @andrewlecuyer

the exact storage class you are using in your RKE environment

Unfortunately I can't b/c I don't have access to the cluster. I only deploy things via ArgoCD into RKE. I do know the NFS export has *(rw,no_root_squash,no_subtree_check) so it "should" work but there are other problems preventing me from trying it out yet 😓

I'm assuming parameter such as uid and gid simply aren't set with the storage class you're testing with in EKS

Yes, it is linked above but here it is again. I don't like creating Access Points in EFS b/c they cause all sorts of issues. So the goal in EKS would be to not set any uid:gid and use the static option talked about here (as this ticket was trying to do).
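For readers unfamiliar with the static option, a statically provisioned EFS volume is roughly a hand-written PersistentVolume like the sketch below. The names, capacity, and file system ID are placeholders rather than values from this cluster. Because no Access Point is involved, no uid/gid is enforced and the mount keeps whatever ownership the file system root has.

```yaml
# Sketch of a statically provisioned EFS PersistentVolume (no Access Point).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pgdata-pv                   # hypothetical name
spec:
  capacity:
    storage: 5Gi                        # required by the API; EFS does not enforce it
  volumeMode: Filesystem
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-static          # hypothetical class name the claim must match
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0  # placeholder file system ID
```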

@tjmoore4
Contributor

Hi @ranchodeluxe. After taking a look at the thread above, I'm curious if setting the supplemental group value to nobody (65534) might be an option in this case. Since you mentioned the NFS is mounted as root, but your directory permissions seem like they should be 777, maybe that would be sufficient.
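As a rough illustration of that suggestion, the relevant part of the PostgresCluster manifest might look like the fragment below. The cluster name is taken from the pod labels in this issue; other required fields are omitted, and whether group 65534 actually grants write access depends on the export's permissions.

```yaml
# Fragment only: other required PostgresCluster fields are omitted.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: example
spec:
  supplementalGroups:
  - 65534        # "nobody"; the CRD rejects values below 1, so 0 (root) is not allowed
```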
