Fix PostgreSQL error PANIC: could not locate a valid checkpoint record

Issue

I'm running PostgreSQL on Kubernetes and my Postgres pod is returning the following error:

waiting for server to start....LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "pg_log".
 stopped waiting
pg_ctl: could not start server
Examine the log output.

To dig deeper, I ran a debug pod against my deployment so I could read the Postgres logs:

oc debug deployment/postgres --as-user=1000160000

I added --as-user to make sure I have access to the pg_log directory.

To get the user running PostgreSQL, run:

oc get pod psql-xxxx -o yaml | grep runAsUser
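To feed that UID straight into --as-user, the value can be pulled out of the pod YAML instead of copied by hand. This is a minimal sketch, assuming the first runAsUser entry in the YAML belongs to the Postgres container; first_run_as_user is a hypothetical helper name, and psql-xxxx is the same placeholder pod name used above.

```shell
# Print the first runAsUser value found in pod YAML read from stdin.
# Assumption: the first match is the UID the Postgres container runs as.
# Matching on the exact key skips look-alike keys such as "f:runAsUser:".
first_run_as_user() {
  awk '$1 == "runAsUser:" { print $2; exit }'
}

# Usage sketch (needs cluster access, so it is left commented out):
# uid=$(oc get pod psql-xxxx -o yaml | first_run_as_user)
# oc debug deployment/postgres --as-user="$uid"
```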

Once the debug pod is launched, look for the PostgreSQL data directory and you'll find the pg_log directory in there. Looking at the relevant log file, I could see:

PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 29) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
LOG:  database system is shut down
LOG:  database system was shut down at 2021-12-30 01:09:26 UTC
LOG:  invalid resource manager ID 48 at 4C/85350F10
LOG:  invalid primary checkpoint record
LOG:  invalid resource manager ID in secondary checkpoint record
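When pg_log holds several files, filtering for the most severe lines saves some scrolling. The following is a sketch, assuming the log line format shown above, where the severity (PANIC or FATAL) is followed by a colon; severe_lines is a hypothetical helper name.

```shell
# Print only PANIC and FATAL lines from the given log files, each prefixed
# with its file name and line number. Returns non-zero if nothing matches.
severe_lines() {
  grep -HnE '(PANIC|FATAL):' "$@"
}

# Usage sketch, run inside the debug pod:
# severe_lines /path/to/pg/data/directory/pg_log/*
```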

Solution

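On the Postgres debug pod launched in the previous section, reset the Postgres transaction log. Resetting the log discards write-ahead log records that were never applied to the data files, so recently committed transactions can be lost; use it only when the server cannot start any other way. From PostgreSQL 10 onward the tool is named pg_resetwal instead of pg_resetxlog. Run the following from the debug pod's terminal: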

/usr/bin/pg_resetxlog -f /path/to/pg/data/directory

Output:

Transaction log reset
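Because the tool was renamed in PostgreSQL 10, the right binary depends on the version that created the data directory. A minimal sketch that reads the PG_VERSION file in the data directory to choose the name; reset_tool_for is a hypothetical helper, and the path is the same placeholder used above.

```shell
# Print the name of the log reset tool matching the data directory's
# PostgreSQL major version, as recorded in its PG_VERSION file.
reset_tool_for() {
  datadir=$1
  # PG_VERSION holds e.g. "9.6" or "14"; keep only the part before the dot.
  major=$(cut -d. -f1 "$datadir/PG_VERSION") || return 1
  if [ "$major" -ge 10 ]; then
    echo pg_resetwal
  else
    echo pg_resetxlog
  fi
}

# Usage sketch, run inside the debug pod:
# tool=$(reset_tool_for /path/to/pg/data/directory)
# "$tool" -n /path/to/pg/data/directory   # -n only prints values, changes nothing
# "$tool" -f /path/to/pg/data/directory   # the actual reset
```

Running with -n first shows the control values the tool would use without modifying anything, which is a cheap check before the forced reset.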

Exit the debug pod.

Terminate the existing pod and start a new one for the PostgreSQL database. I'm only running one replica, so scaling the deployment down to 0 and back up to 1 is enough:

oc scale deployment postgres --replicas=0
oc scale deployment postgres --replicas=1

The Postgres pod should reach the 'Running' state at this point.
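To confirm this without watching the pod list by hand, a small polling loop can wait for the deployment to report a ready replica. This is a sketch only: wait_until is a hypothetical helper, the 2-second interval and 120-second timeout are arbitrary choices, and the check relies on the standard status.readyReplicas field of a Deployment (ready is a slightly stricter condition than Running).

```shell
# Re-run a command every 2 seconds until it succeeds or the timeout
# (in seconds, first argument) is reached. Returns 0 on success, 1 on timeout.
wait_until() {
  timeout=$1
  shift
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 2
    elapsed=$((elapsed + 2))
  done
  return 0
}

# Usage sketch (needs cluster access, so it is left commented out):
# wait_until 120 sh -c '[ "$(oc get deployment postgres -o jsonpath="{.status.readyReplicas}")" = "1" ]'
```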
