Issue
While installing OpenShift with openshift-ansible, I get the following error when running the playbooks:
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
The playbook eventually fails because the control plane (the API server) never comes online.
Solution
This happens when the control plane container fails to start. The container logs show why.
Step 1: Check container status on the OpenShift master node
Log in to your master node and check the status of all containers:
docker ps -a
Output:
CONTAINER ID   IMAGE          COMMAND                  CREATED              STATUS                            PORTS   NAMES
af79d92e8610   06b92f3b4d95   "/bin/bash -c '#!/..."   About a minute ago   Exited (255) About a minute ago           k8s_api_master-api-cluster.example.com_kube-system_56ffd2d31011e4ab6e2f
As we can see, the API server container has exited with status 255 and is not running.
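On a master that runs many containers, the full listing can be long. Here is a minimal sketch that narrows it down to exited containers. The helper name list_exited_containers is made up for this example, and the default name filter k8s_api is taken from the NAMES column above; adjust it if your container names differ.

```shell
# Hypothetical helper: list exited containers whose name contains a pattern.
# Assumption: the API server container name contains "k8s_api", as in the
# output above.
list_exited_containers() {
  local pattern="${1:-k8s_api}"
  # --filter and --format are standard "docker ps" options.
  docker ps -a \
    --filter "status=exited" \
    --filter "name=${pattern}" \
    --format '{{.ID}}\t{{.Status}}\t{{.Names}}'
}
```

Call it as list_exited_containers to use the default pattern, or pass another pattern as the first argument.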
Step 2: Check container logs
Replace the container ID below with your own:
docker logs af79d92e8610
You will see one or more of the following messages:
failed to create listener: failed to listen on 0.0.0.0:8443: listen tcp4 0.0.0.0:8443: listen: address already in use
listen tcp4 0.0.0.0:8444: bind: address already in use
dial tcp 127.0.0.1:2379: connect: connection refused
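The first two messages mean that a port the control plane needs (8443 for the API server, or 8444) is already in use, so the container cannot bind to it. The third means that nothing is accepting connections on port 2379, the etcd client port. The rest of this article deals with the conflict on port 8443.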
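To pull the conflicting ports out of a long log, a small filter can help. This is only a sketch: ports_in_use_from_log is a hypothetical helper name, and it assumes the log lines look like the messages above (an IPv4 address and port followed by "address already in use").

```shell
# Hypothetical helper: read log text on stdin and print each port that
# appears in an "address already in use" line, one per line, without
# duplicates.
ports_in_use_from_log() {
  grep 'address already in use' \
    | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+' \
    | sed 's/.*://' \
    | sort -un
}
```

Usage, with stderr merged in because docker logs replays the container's stderr on stderr:

docker logs af79d92e8610 2>&1 | ports_in_use_from_log

Fed with the three messages above, it prints 8443 and 8444 and ignores the etcd line.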
Step 3: Check what is running on the API server port
To check which ports are occupied, run netstat:
netstat -ntpl
Output:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
...
tcp        0      0 0.0.0.0:8443            0.0.0.0:*               LISTEN      1072/haproxy
We can see from the output that port 8443 is occupied by HAProxy. HAProxy may have been left running by a previous OpenShift installation attempt that was not cleaned up properly.
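When many services are listening, it is easier to filter the netstat output for the one port you care about. A minimal sketch, assuming the column layout shown above (local address in column 4, state in column 6, PID/program in column 7); listener_on_port is a hypothetical helper name.

```shell
# Hypothetical helper: read "netstat -ntpl" output on stdin and print the
# PID/Program column for every socket listening on the given port.
listener_on_port() {
  local port="$1"
  # Match the port only at the end of the local address, so that 8443
  # does not also match 18443.
  awk -v port="$port" '$6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }'
}
```

Usage:

netstat -ntpl | listener_on_port 8443

With the output above, this prints 1072/haproxy. Run netstat as root, otherwise the PID/Program column is only filled in for your own processes.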
Step 4: Stop the process running on the API server port
I need to stop HAProxy to make port 8443 available for the API server container:
systemctl stop haproxy
Verify that the process has stopped and the port is no longer in use:
systemctl status haproxy
Run netstat again to check whether port 8443 is still occupied:
netstat -ntpl
No process should be using port 8443.
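The same check can be turned into a yes/no test, which is handy in a script. This is a sketch under the same column-layout assumption as the netstat output above; port_is_free is a hypothetical helper name.

```shell
# Hypothetical helper: read netstat output on stdin and succeed (exit 0)
# only if nothing is listening on the given port.
port_is_free() {
  local port="$1"
  # awk exits 0 when a listener is found; "!" inverts that result.
  ! awk -v port="$port" '
    $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}
```

Usage:

if netstat -ntpl | port_is_free 8443; then echo "port 8443 is free"; else echo "port 8443 is still in use"; fi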
Step 5: Restart the API server container
We can delete the exited API server container so that it is created again. I am going to delete all my exited containers:
docker rm $(docker ps -a -f status=exited -q)
The API server container should be running shortly.
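If you would rather not remove every exited container on the host, a narrower variant is sketched below. The helper name remove_exited_containers is hypothetical, and the k8s_api pattern comes from the container name seen in Step 1. It also skips docker rm when nothing matches, because docker rm fails when it is given no container IDs.

```shell
# Hypothetical helper: remove only exited containers whose name contains
# the given pattern (default: k8s_api, an assumption based on Step 1).
remove_exited_containers() {
  local pattern="${1:-k8s_api}"
  local ids
  ids="$(docker ps -a -q --filter "status=exited" --filter "name=${pattern}")"
  if [ -z "$ids" ]; then
    echo "nothing to remove for pattern '${pattern}'"
    return 0
  fi
  # Word splitting is intended here: one argument per container ID.
  # shellcheck disable=SC2086
  docker rm $ids
}
```

Usage: remove_exited_containers k8s_api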
Check the status of the API server container:
docker ps -a
The container status should be Up.
Step 6: Retry the OpenShift installation playbooks
At this point, the control plane pod should be ready and you can resume the OpenShift installation without seeing this error.
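Before rerunning the playbooks, it can be worth waiting until the API server actually answers, since the container may take a moment to come up. The retry helper below is a generic sketch, not part of openshift-ansible. The health check in the usage line is an assumption as well: it presumes the API server listens on localhost:8443 and serves /healthz without authentication.

```shell
# Hypothetical helper: retry a command until it succeeds or the attempts
# run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1"
  local delay="$2"
  local i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage (assumed URL, adjust to your cluster):

retry 30 5 curl -ksf https://localhost:8443/healthz

This tries up to 30 times, 5 seconds apart, and returns as soon as curl succeeds.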
Conclusion
The issue happens because the control plane port 8443 is being used by another process, so the control plane containers cannot start. We fixed it by stopping the process that was blocking port 8443 and letting the control plane containers be recreated.