Fix OpenShift error: Wait for all control plane pods to become ready

Issue

While installing OpenShift with openshift-ansible, I get the following error when running the playbooks:

FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).

The playbook eventually fails because the control plane (the API server) never comes online.

Solution

This happens when one of the control plane containers fails to start. The steps below find out why and fix it.

Step 1: Check container status on the OpenShift master node

Log in to your master node and check the status of all containers:

docker ps -a

Output:

CONTAINER ID        IMAGE                                  COMMAND                  CREATED              STATUS                            PORTS               NAMES
af79d92e8610        06b92f3b4d95                           "/bin/bash -c '#!/..."   About a minute ago   Exited (255) About a minute ago                       k8s_api_master-api-cluster.example.com_kube-system_56ffd2d31011e4ab6e2f

The STATUS column shows that the API server container has exited with code 255 and is not running.
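On a busy master the full docker ps -a listing can be long. The sketch below narrows it down to exited containers only. The helper name exited_containers and the "|" separator are my own choices for illustration, not something openshift-ansible provides.

```shell
# Keep only containers whose status starts with "Exited".
# Expects one "ID|STATUS|NAME" record per line on stdin and
# prints the container ID followed by the container name.
exited_containers() {
  awk -F'|' '$2 ~ /^Exited/ { print $1, $3 }'
}

# Example (run on the master node):
#   docker ps -a --format '{{.ID}}|{{.Status}}|{{.Names}}' | exited_containers
```

If you do not need the name column, docker ps -a --filter status=exited gives a similar result without the helper.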

Step 2: Check container logs

Replace the container ID below with the ID of your own exited container:

docker logs af79d92e8610

You will see one of the following messages:

failed to create listener: failed to listen on 0.0.0.0:8443: listen tcp4 0.0.0.0:8443: listen: address already in use

listen tcp4 0.0.0.0:8444: bind: address already in use

dial tcp 127.0.0.1:2379: connect: connection refused

The "address already in use" messages show that another process is already listening on the port, so the container cannot bind to it. In this example the port is 8443, the API server port.
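Container logs can be noisy, so it may help to pull out just the ports that are reported as taken. This is a minimal sketch: ports_in_use is a hypothetical helper name, and it only recognises IPv4 addresses in the form shown in the messages above.

```shell
# Print each port reported as "address already in use", one per line.
# Reads log text on stdin. Only matches IPv4 "a.b.c.d:port" addresses.
ports_in_use() {
  grep 'address already in use' \
    | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+' \
    | sed 's/.*://' \
    | sort -un
}

# Example (docker logs may write to stderr, hence 2>&1):
#   docker logs af79d92e8610 2>&1 | ports_in_use
```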

Step 3: Check what is running on the API server port

To check which ports are occupied, run netstat:

netstat -ntpl

Output:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    

...

tcp        0      0 0.0.0.0:8443             0.0.0.0:*               LISTEN      1072/haproxy

The output shows that port 8443 is occupied by HAProxy. HAProxy may have been left running by a previous OpenShift installation attempt that was not cleaned up properly.
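Instead of scanning the whole netstat listing by eye, you can filter it for a single port. The sketch below assumes the column layout shown in the output above. listener_on_port is a hypothetical helper name. Run netstat as root, otherwise the PID/Program name column may only show "-".

```shell
# Print the "PID/Program name" of every listener on the given port.
# Reads `netstat -ntpl` output on stdin; $1 is the port number.
listener_on_port() {
  awk -v port="$1" '$6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }'
}

# Example:
#   netstat -ntpl | listener_on_port 8443
```

An empty result means nothing is listening on that port.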

Step 4: Kill the process running on the API server port

I need to stop HAProxy to make port 8443 available for the API server container:

systemctl stop haproxy

Verify that the process has stopped and the port is no longer in use:

systemctl status haproxy

Then run netstat again to check whether port 8443 is still occupied:

netstat -ntpl

No process should be using port 8443.

Step 5: Restart the API server container

We can delete the exited API server container so that it is created again. I am going to delete all of my exited containers:

docker rm $(docker ps -a -f status=exited -q)

The API server container should be running shortly.

Check the status of the API server container:

docker ps -a

The STATUS column for the container should now show Up.
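The container can take a little while to come back, so rather than re-running docker ps -a by hand you could poll for it. This is a sketch under assumptions: retry and api_container_running are hypothetical helper names, and the k8s_api name filter is taken from the container name in the sample output of Step 1, so adjust it if your container is named differently.

```shell
# Run a command until it succeeds.
# $1 = maximum number of attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$retry_n" -ge "$retry_max" ]; then
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Succeeds when at least one running container has "k8s_api" in its name.
api_container_running() {
  [ -n "$(docker ps --filter status=running --filter name=k8s_api -q)" ]
}

# Example: check every 2 seconds, give up after 30 attempts.
#   retry 30 2 api_container_running && echo "API server container is up"
```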

Step 6: Retry the OpenShift installation playbooks

At this point, the control plane pod should be ready and you can resume the OpenShift installation without seeing this error.

Conclusion

The issue happens because the control plane port 8443 is being used by another process, so the control plane containers cannot start. We fixed it by stopping the process that was blocking port 8443 and deleting the exited control plane containers so that they are created again.
