Today, I came across a bug where a Celery worker wasn’t gracefully shutting down, which was causing odd “Connection Refused” errors from requests made within the task the worker was running. The worker was also shutting down before it could report errors to Rollbar/Sentry, so the team didn’t know there was anything to address.
This was happening because the entrypoint had a script that effectively did this:

echo "starting celery worker"
celery -A tasks worker

Since the shell script, not celery, runs as the container’s main process, the SIGTERM sent at shutdown goes to the shell, which doesn’t forward it to the worker; the worker then gets killed abruptly once the grace period expires, mid-task. The usual fix is to exec the worker so it replaces the shell and receives the signal directly:

echo "starting celery worker"
exec celery -A tasks worker
I was recently troubleshooting some packet loss with ping on Linux, and I noticed that by default ping won’t explicitly show lost packets:
$ ping x.x.x.x
64 bytes from x.x.x.x: icmp_seq=8 ttl=52 time=18.1 ms
64 bytes from x.x.x.x: icmp_seq=11 ttl=52 time=21.2 ms
(notice the jump from icmp_seq=8 to icmp_seq=11; packets 9 and 10 were lost)
To fix this, you can run ping with -O:
$ ping -O x.x.x.x
64 bytes from x.x.x.x: icmp_seq=11 ttl=52 time=19.0 ms
no answer yet for icmp_seq=12
no answer yet for icmp_seq=13
64 bytes from x.x.x.x: icmp_seq=14 ttl=52 time=18.9 ms
Relatedly, if you’re also using traceroute, adding -I (traceroute -I) makes it send ICMP ECHO probes instead of UDP ones, matching what ping sends.
If you want better statistics about packet loss, you can also use mtr, a tool that combines traceroute and ping. On Ubuntu, you can install it with sudo apt-get install mtr.
If you have an old computer with spare hard drives, you can turn it into a NAS (network-attached storage) to share files with the other computers on your network. There are several popular DIY NAS options:
I ended up going with OpenMediaVault because it’s Debian-based and very popular. To set it up, I followed this incredibly thorough guide: link
I had a few issues that caused it to stop working after restarting.
The first issue was related to the disk being encrypted: OpenMediaVault started in emergency mode because it couldn’t access the encrypted disk. Emergency mode unfortunately doesn’t allow SSH access, which means you need to plug a monitor and keyboard into the machine. To fix it, make sure nofail is in the options section of the lines of /etc/fstab starting with /dev/disk. You also need to add nonempty and remove x-systemd.requires=<disk> from the options section of the lines starting with /srv/dev-disk-by-uuid (for mergerfs). I’ve had to redo the x-systemd.requires removal every time I apply settings from the web UI. More information about this: 1 2 3 4 5 6 7
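For illustration, the relevant fstab entries looked roughly like this (hypothetical device names, UUIDs, and mount points; the exact options OpenMediaVault generates will differ):

```
# data disk: nofail lets boot continue if the encrypted disk isn't available yet
/dev/disk/by-uuid/aaaa-bbbb  /srv/dev-disk-by-uuid-aaaa-bbbb  ext4  defaults,nofail  0  2

# mergerfs pool: add nonempty, and drop any x-systemd.requires=<disk> option
/srv/dev-disk-by-uuid-aaaa-bbbb:/srv/dev-disk-by-uuid-cccc-dddd  /srv/mergerfs/pool  fuse.mergerfs  defaults,allow_other,nonempty  0  0
```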
The second issue involved losing the network adapter configuration after restarting. I had to run omv-firstaid and configure the network adapter with a static IP to resolve this. It may have something to do with an empty configuration from the web UI overwriting the existing working configuration.
I recently learned that Docker bypasses ufw (Uncomplicated Firewall) rules by default: because Docker writes its own iptables rules, published container ports are reachable even when ufw is set to block them.
The fix involved adding this to /etc/docker/daemon.json:
{
  "iptables": false,
  "ip6tables": false
}
Then I restarted the docker daemon with sudo systemctl restart docker.
I recently had a lot of trouble getting my Raspberry Pi Zero W 2 to connect to WiFi through my Ubiquiti Unifi AP AC Lite after a firmware upgrade to 5.43.46. It’s a 2.4 GHz-only device, and in fact none of my 2.4 GHz-only devices would connect.
Apparently something I did changed a setting called PMF (Protected Management Frames) to “Required”, and the Pi Zero W 2 doesn’t support it. In theory, this setting helps prevent clients from being disconnected by forged de-auth packets from an attacker. However, not all devices support it.
To fix the issue, you need to find the PMF setting under Settings -> Wifi -> <your network> -> Advanced -> Security and change it to “Optional”.
I was recently able to make a minimal example that reproduced a Celery memory leak. The memory leak would happen on the main Celery worker process that’s forked to make child processes, which makes the leak especially bad. Issue #4843 has been around for 3+ years and has 140+ comments, so this one has been causing a lot of problems for Celery users for a while.
The memory leak has been causing a lot of issues at my work too, and I was able to get some help resolving the issue during a work hackathon. My coworker Michael Lazar was able to find the root cause of the issue and make a pull request to fix it in py-amqp (a celery dependency when using RabbitMQ as a broker). The code with the issue was 10 years old!
The problem occurs when socket.shutdown fails on an OSError and doesn’t proceed to socket.close to clean up the socket and allow garbage collection to release the memory used for it. The OSError on shutdown can occur when the remote side of the connection closes the connection first.
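In sketch form (with a hypothetical close_transport helper; the real code is py-amqp’s transport cleanup), the buggy pattern looks like this:

```python
import socket

def close_transport(sock):
    # Buggy pattern: if the remote side closed the connection first,
    # shutdown() raises OSError here, so close() below never runs and
    # the socket is never cleaned up.
    sock.shutdown(socket.SHUT_RDWR)
    sock.close()
```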
The fixed example (with separate try/except blocks):
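Sketched with a hypothetical close_transport helper (the actual fix landed in py-amqp’s transport code):

```python
import socket

def close_transport(sock):
    # Fixed pattern: shutdown and close each get their own try/except,
    # so close() always runs even when shutdown() raises because the
    # remote side already closed the connection.
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass
    try:
        sock.close()
    except OSError:
        pass
```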
I also found another way to reduce memory usage of Connections in py-amqp and librabbitmq by changing how active channel IDs are stored.
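The idea, roughly (hypothetical names and sizes; the real change is in the Connection handling of py-amqp and librabbitmq): instead of preallocating an array holding every possible channel ID for each connection, track only the IDs actually in use and find a free one on demand.

```python
from array import array

CHANNEL_MAX = 65535  # AMQP's default maximum channel number

# Preallocating every possible ID costs ~128 KiB per connection
# (65535 entries x 2 bytes each), even if only one channel is ever opened.
preallocated_ids = array('H', range(CHANNEL_MAX, 0, -1))

# Tracking only the IDs in use is tiny for typical connections.
used_channel_ids = set()

def allocate_channel_id():
    """Return the lowest free channel ID and mark it as used."""
    for candidate in range(1, CHANNEL_MAX + 1):
        if candidate not in used_channel_ids:
            used_channel_ids.add(candidate)
            return candidate
    raise RuntimeError('no free channel IDs')
```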
Update 12/20: Hacker News user js2 pointed out that Python will automatically close the socket when all the references to the socket are gone.
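A small demonstration of that (CPython-specific, since it relies on reference counting closing the descriptor promptly):

```python
import os
import socket

s = socket.socket()
fd = s.fileno()
os.fstat(fd)   # the descriptor is valid while the socket object is alive

del s          # last reference dropped; CPython closes the underlying fd

try:
    os.fstat(fd)
    still_open = True
except OSError:  # EBADF: the descriptor was closed automatically
    still_open = False
```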
Update 12/23: I got a pull request merged into Kombu with the same memory usage reduction fix I made to py-amqp and librabbitmq. I also opened another pull request to Kombu that should fix a memory leak issue when using Celery with Redis.
Update 12/25: I wrote a section for the Celery docs about optimizing memory usage. I also fixed another leak in Celery that happens when connection errors occur on a prefork worker.