ZFS L2 ARC Bad Checksums

root@freenas:~ # zfs-stats -L
------------------------------------------------------------------------
ZFS Subsystem Report				Mon Aug 24 02:26:48 2020
------------------------------------------------------------------------
L2 ARC Summary:
	Low Memory Aborts:			17
	Bad Checksums:				5	[W]
	R/W Clashes:				3
	Free on Write:				43768

L2 ARC Size:
	Current Size: (Adaptive)		28928.13M
	Header Size:			0.07%	22.78M

L2 ARC Evicts:
	Lock Retries:				119
	Upon Reading:				85

L2 ARC Read/Write Activity:
	Bytes Written:				212515.69M
	Bytes Read:				862460.77M

L2 ARC Breakdown:
	Access Total:				21894550
	Hit Ratio:			33.57%	7351255
	Miss Ratio:			66.42%	14543295
	Feeds:					324164

	WRITES:
	  Sent Total:			100.00%	110713
------------------------------------------------------------------------

Does [W] mean warning? zpool status reports all disks as nominal, and the SMART status of the cache device is also nominal. I will keep observing to see whether the “bad checksums” count is a cause for concern…

Update 2020-09-01: No more bad checksums noted for L2ARC
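
To keep an eye on the counter without reading the whole report each time, here is a minimal sketch that pulls the "Bad Checksums" value out of zfs-stats -L output. The function name l2arc_bad_checksums is my own, and it assumes a POSIX sh (the default root shell on FreeNAS may be csh, so run it under sh) and the report layout shown above.

```shell
# Hypothetical helper: read `zfs-stats -L` output on stdin and print
# the "Bad Checksums" count. Assumes the line looks like
#   "<tab>Bad Checksums:<tabs>5<tab>[W]"
# so the count is the third whitespace-separated field.
l2arc_bad_checksums() {
    awk '/Bad Checksums:/ { print $3; exit }'
}
```

Usage would be something like zfs-stats -L | l2arc_bad_checksums, run from cron and appended to a log with a timestamp. If the number stops growing, as it did here, the earlier errors were probably transient.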

Recovering from a failed AWX upgrade caused by a PostgreSQL error

After a failed attempt to upgrade AWX (I ran ansible-playbook install.yml from the Tower instance that was being upgraded), I was left with a broken AWX installation.

I prayed for luck and hoped that AWX would come back after a reboot of the Docker host, but no dice.

A quick look at the Docker instances:

[root@tower ~]# docker ps
CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS                          PORTS                  NAMES
fa88d2ac67d9        ansible/awx:14.0.0   "tini -- /usr/bin/la…"   4 hours ago         Up 6 minutes                    8052/tcp               awx_task
a4ded729f4a5        ansible/awx:14.0.0   "tini -- /bin/sh -c …"   4 hours ago         Up 6 minutes                    0.0.0.0:80->8052/tcp   awx_web
0e3b3ebe8f85        postgres:10          "docker-entrypoint.s…"   4 hours ago         Restarting (1) 17 seconds ago                          awx_postgres
f9921bb23334        redis                "docker-entrypoint.s…"   4 months ago        Up 6 minutes                    6379/tcp               awx_redis

Looks like the PostgreSQL container kept restarting for some reason. Checking its log:

[root@tower ~]# docker logs --tail 50 --follow --timestamps awx_postgres
2020-08-20T14:34:17.657201305Z The files belonging to this database system will be owned by user "postgres".
2020-08-20T14:34:17.657268491Z This user must also own the server process.
2020-08-20T14:34:17.657276380Z 
2020-08-20T14:34:17.657338660Z The database cluster will be initialized with locale "en_US.utf8".
2020-08-20T14:34:17.657377119Z The default database encoding has accordingly been set to "UTF8".
2020-08-20T14:34:17.657385751Z The default text search configuration will be set to "english".
2020-08-20T14:34:17.657393058Z 
2020-08-20T14:34:17.657399739Z Data page checksums are disabled.
2020-08-20T14:34:17.657459779Z initdb: directory "/var/lib/postgresql/data" exists but is not empty
2020-08-20T14:34:17.657469711Z If you want to create a new database system, either remove or empty
2020-08-20T14:34:17.657476954Z the directory "/var/lib/postgresql/data" or run initdb
2020-08-20T14:34:17.657483835Z with an argument other than "/var/lib/postgresql/data".
2020-08-20T14:34:17.657513335Z
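
A small sketch for spotting this failure mode. Both function names are made up for illustration. The first one just greps a log for the initdb message above. The second rests on an assumption about the official postgres image: as far as I understand its entrypoint, it only runs initdb when $PGDATA has no PG_VERSION file, so a non-empty directory without that file would produce exactly this error.

```shell
# Hypothetical helper: return 0 if the log on stdin contains
# initdb's "exists but is not empty" error.
has_initdb_nonempty_error() {
    grep -q 'initdb: directory .* exists but is not empty'
}

# Hypothetical helper: return 0 if the given directory looks like an
# initialized PostgreSQL data directory (non-empty PG_VERSION file).
# Assumption: this mirrors the check the postgres image entrypoint uses.
pgdata_initialized() {
    [ -s "$1/PG_VERSION" ]
}
```

For example, docker logs awx_postgres 2>&1 | has_initdb_nonempty_error confirms the symptom (the 2>&1 matters because docker logs replays the container's stderr on stderr), and pgdata_initialized pointed at the host directory that backs the container's data path shows whether PostgreSQL would treat it as an existing cluster.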

PostgreSQL complains that the data directory is not empty (of course it shouldn’t be empty, since I’m only upgrading, right?). I then tried a few things, each followed by rerunning the installer:

  1. docker kill-ing the PostgreSQL instance
  2. Restoring to a backup of the data dir
  3. Emptying the data dir

None of these helped and I was still greeted by the same error above. Maybe pulling the PostgreSQL image again will help? IDK

[root@tower ~]# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
ansible/awx         14.0.0              57b4b78d908d        13 days ago         1.31GB
postgres            10                  1fb929c54703        4 months ago        200MB
redis               latest              4cdbec704e47        4 months ago        98.2MB
ansible/awx_task    10.0.0              a968a1c4d9fd        4 months ago        2GB
ansible/awx_web     10.0.0              2cc33f01ffa7        4 months ago        1.96GB
memcached           alpine              a8da907c7f84        4 months ago        9.22MB
[root@tower ~]# docker rmi 1fb929c54703
Error response from daemon: conflict: unable to delete 1fb929c54703 (must be forced) - image is being used by stopped container 0e3b3ebe8f85

Pardon my ignorance, but I wasn’t aware that stopped containers were a thing. Deleting the stopped container and rerunning install.yml made it work again:

[root@tower ~]# docker ps --filter "status=exited"
CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS                       PORTS               NAMES
fa88d2ac67d9        ansible/awx:14.0.0   "tini -- /usr/bin/la…"   4 hours ago         Exited (137) 4 minutes ago                       awx_task
a4ded729f4a5        ansible/awx:14.0.0   "tini -- /bin/sh -c …"   4 hours ago         Exited (137) 3 minutes ago                       awx_web
0e3b3ebe8f85        postgres:10          "docker-entrypoint.s…"   4 hours ago         Exited (1) 4 minutes ago                         awx_postgres
382cc2164085        memcached:alpine     "docker-entrypoint.s…"   4 months ago        Exited (137) 3 hours ago                         awx_memcached
f9921bb23334        redis                "docker-entrypoint.s…"   4 months ago        Exited (137) 3 minutes ago                       awx_redis
[root@tower ~]# docker rm awx_postgres
awx_postgres
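
For next time, the fix can be wrapped in a small guard so a running container is never removed by accident. This is only a sketch: remove_stopped_container is a name I made up, and it relies on docker inspect reporting .State.Running as "true" or "false".

```shell
# Hypothetical helper: remove a container by name, but only if it
# exists and is not currently running.
remove_stopped_container() {
    name="$1"
    # docker inspect exits non-zero when the container does not exist.
    running=$(docker inspect --format '{{.State.Running}}' "$name" 2>/dev/null) || {
        echo "no such container: $name" >&2
        return 1
    }
    if [ "$running" = "true" ]; then
        echo "refusing to remove running container: $name" >&2
        return 1
    fi
    docker rm "$name"
}
```

Calling remove_stopped_container awx_postgres and then rerunning ansible-playbook install.yml reproduces what worked above. If the container is still up or stuck in a restart loop, stop it first with docker stop.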