After a failed attempt at upgrading AWX (I ran ansible-playbook install.yml from the Tower instance that was being upgraded), I was left with a broken AWX installation.
I hoped AWX would come back after a reboot of the Docker host, but no dice.
A quick look at the Docker containers:
[root@tower ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fa88d2ac67d9 ansible/awx:14.0.0 "tini -- /usr/bin/la…" 4 hours ago Up 6 minutes 8052/tcp awx_task
a4ded729f4a5 ansible/awx:14.0.0 "tini -- /bin/sh -c …" 4 hours ago Up 6 minutes 0.0.0.0:80->8052/tcp awx_web
0e3b3ebe8f85 postgres:10 "docker-entrypoint.s…" 4 hours ago Restarting (1) 17 seconds ago awx_postgres
f9921bb23334 redis "docker-entrypoint.s…" 4 months ago Up 6 minutes 6379/tcp awx_redis
Looks like the PostgreSQL container kept restarting for some reason. Checking its log:
[root@tower ~]# docker logs --tail 50 --follow --timestamps awx_postgres
2020-08-20T14:34:17.657201305Z The files belonging to this database system will be owned by user "postgres".
2020-08-20T14:34:17.657268491Z This user must also own the server process.
2020-08-20T14:34:17.657276380Z
2020-08-20T14:34:17.657338660Z The database cluster will be initialized with locale "en_US.utf8".
2020-08-20T14:34:17.657377119Z The default database encoding has accordingly been set to "UTF8".
2020-08-20T14:34:17.657385751Z The default text search configuration will be set to "english".
2020-08-20T14:34:17.657393058Z
2020-08-20T14:34:17.657399739Z Data page checksums are disabled.
2020-08-20T14:34:17.657459779Z initdb: directory "/var/lib/postgresql/data" exists but is not empty
2020-08-20T14:34:17.657469711Z If you want to create a new database system, either remove or empty
2020-08-20T14:34:17.657476954Z the directory "/var/lib/postgresql/data" or run initdb
2020-08-20T14:34:17.657483835Z with an argument other than "/var/lib/postgresql/data".
2020-08-20T14:34:17.657513335Z
PostgreSQL complains that the data directory is not empty (it shouldn’t be empty, since I’m only upgrading, right?). I then tried a few things, each followed by rerunning the installer:
- Killing the PostgreSQL container with docker kill
- Restoring a backup of the data dir
- Emptying the data dir
None of these helped, and I was still greeted by the same error. Maybe pulling the PostgreSQL image again would help? I wasn’t sure, so I tried removing the old image first:
[root@tower ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ansible/awx 14.0.0 57b4b78d908d 13 days ago 1.31GB
postgres 10 1fb929c54703 4 months ago 200MB
redis latest 4cdbec704e47 4 months ago 98.2MB
ansible/awx_task 10.0.0 a968a1c4d9fd 4 months ago 2GB
ansible/awx_web 10.0.0 2cc33f01ffa7 4 months ago 1.96GB
memcached alpine a8da907c7f84 4 months ago 9.22MB
[root@tower ~]# docker rmi 1fb929c54703
Error response from daemon: conflict: unable to delete 1fb929c54703 (must be forced) - image is being used by stopped container 0e3b3ebe8f85
Sorry for my ignorance, but I wasn’t aware that stopped containers are a thing. Deleting the stopped container and rerunning install.yml made it work again:
[root@tower ~]# docker ps --filter "status=exited"
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fa88d2ac67d9 ansible/awx:14.0.0 "tini -- /usr/bin/la…" 4 hours ago Exited (137) 4 minutes ago awx_task
a4ded729f4a5 ansible/awx:14.0.0 "tini -- /bin/sh -c …" 4 hours ago Exited (137) 3 minutes ago awx_web
0e3b3ebe8f85 postgres:10 "docker-entrypoint.s…" 4 hours ago Exited (1) 4 minutes ago awx_postgres
382cc2164085 memcached:alpine "docker-entrypoint.s…" 4 months ago Exited (137) 3 hours ago awx_memcached
f9921bb23334 redis "docker-entrypoint.s…" 4 months ago Exited (137) 3 minutes ago awx_redis
[root@tower ~]# docker rm awx_postgres
awx_postgres
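The takeaway generalizes: plain docker ps only shows running containers, so a stopped one can go unnoticed while it still references an old image. Below is a minimal sketch of a helper that lists containers which are exited or stuck restarting. The function name stale_containers is hypothetical, and it assumes input in the name<TAB>status form that docker ps -a --format '{{.Names}}\t{{.Status}}' produces.

```shell
#!/bin/sh
# Hypothetical helper (not part of AWX or Docker): reads "name<TAB>status"
# lines on stdin and prints the names of containers that are not running
# normally, i.e. whose status starts with "Exited" or "Restarting".
stale_containers() {
    awk -F '\t' '$2 ~ /^(Exited|Restarting)/ { print $1 }'
}

# Example usage against a live Docker host (commented out so that sourcing
# this file has no side effects):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | stale_containers
#   docker rm awx_postgres          # remove the stale container, as above
#   ansible-playbook install.yml    # rerun the installer to recreate it
```

Whether removing a container is safe depends on where its data lives; here the PostgreSQL data directory sat outside the container, so only the container definition was discarded.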