I’m running PostgreSQL on Kubernetes and my Postgres pod is returning the following error:
waiting for server to start....LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "pg_log".
stopped waiting
pg_ctl: could not start server
Examine the log output.
To dig deeper, I launched a debug pod from my deployment so I could read the Postgres logs:
oc debug deployment postgres --as-user=1000160000
I added --as-user so that the debug pod runs as the same user as PostgreSQL and has access to the pg_log directory.
To get the user running PostgreSQL, run:
oc get pod psql-xxxx -o yaml | grep runAsUser
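If the grep returns several matches, or you only want the numeric ID for scripting, a small helper along these lines might do. This is a minimal sketch: `extract_run_as_user` is a hypothetical name, and it assumes the first `runAsUser:` entry in the pod YAML is the one PostgreSQL runs as.

```shell
# Hypothetical helper: read pod YAML on stdin and print the first runAsUser value.
extract_run_as_user() {
    awk '$1 == "runAsUser:" { print $2; exit }'
}

# Usage (psql-xxxx is the placeholder pod name from above):
#   oc get pod psql-xxxx -o yaml | extract_run_as_user
```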
Once the debug pod is launched, look for the PostgreSQL data directory and you’ll find the pg_log directory in there. Looking at the relevant log file, I could see:
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 29) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
LOG: database system is shut down
LOG: database system was shut down at 2021-12-30 01:09:26 UTC
LOG: invalid resource manager ID 48 at 4C/85350F10
LOG: invalid primary checkpoint record
LOG: invalid resource manager ID in secondary checkpoint record
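To find the relevant log file from inside the debug pod, something like the following may help. It is only a sketch: `latest_pg_log` and `show_fatal_lines` are hypothetical helper names, and it assumes the logs live in a pg_log directory directly under the data directory you pass in.

```shell
# Hypothetical helper: print the most recently modified file in <data_dir>/pg_log.
latest_pg_log() {
    data_dir="$1"
    ls -t "$data_dir"/pg_log/* 2>/dev/null | head -n 1
}

# Hypothetical helper: show only the PANIC and FATAL lines of a log file.
show_fatal_lines() {
    grep -E 'PANIC|FATAL' "$1"
}

# Usage, with the same placeholder path used in this post:
#   show_fatal_lines "$(latest_pg_log /path/to/pg/data/directory)"
```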
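The checkpoint records are invalid, so Postgres cannot replay its transaction log and refuses to start. To get past this, reset the transaction log from the debug pod launched earlier. Run the following from the debug pod’s terminal (on PostgreSQL 10 and later, the tool is named pg_resetwal):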
/usr/bin/pg_resetxlog -f /path/to/pg/data/directory
Transaction log reset
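Because resetting the transaction log can leave the database with inconsistent data from partially committed transactions, it may be worth copying the data directory before running the command above. The following is a minimal sketch, assuming the volume has enough free space for a second copy; `backup_data_dir` is a hypothetical helper, not part of PostgreSQL.

```shell
# Hypothetical helper: copy a data directory to "<dir>.bak" and print the copy's path.
# Refuses to overwrite an existing backup.
backup_data_dir() {
    src="${1%/}"
    dest="$src.bak"
    if [ -e "$dest" ]; then
        echo "backup already exists: $dest" >&2
        return 1
    fi
    cp -a "$src" "$dest" && printf '%s\n' "$dest"
}

# Usage, with the same placeholder path used in this post:
#   backup_data_dir /path/to/pg/data/directory
```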
Exit the debug pod.
Then terminate the existing PostgreSQL pod and start a new one by scaling the deployment down and back up. I’m only running one replica:
oc scale deployment postgres --replicas=0
oc scale deployment postgres --replicas=1
At this point, the Postgres pod should reach the ‘Running’ state.
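The restart can also be wrapped in a small function that waits for the rollout to finish. This is a sketch rather than a definitive procedure: `restart_deployment` is a hypothetical name, and it assumes a single-replica deployment like the one in this post.

```shell
# Hypothetical helper: scale a deployment to zero and back to one replica,
# then wait until the rollout reports completion.
restart_deployment() {
    name="$1"
    oc scale deployment "$name" --replicas=0 &&
    oc scale deployment "$name" --replicas=1 &&
    oc rollout status "deployment/$name"
}

# Usage:
#   restart_deployment postgres
```

Chaining the commands with && stops the sequence if any step fails, so the function does not wait on a rollout that was never triggered.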