Fixing Memory Leaks In Popular Python Libraries
I was recently able to make a minimal example that reproduced a Celery memory leak. The memory leak would happen on the main Celery worker process that’s forked to make child processes, which makes the leak especially bad. Issue #4843 has been around for 3+ years and has 140+ comments, so this one has been causing a lot of problems for Celery users for a while.
The memory leak has been causing a lot of issues at my work too, and I was able to get some help resolving it during a work hackathon. My coworker Michael Lazar found the root cause and made a pull request to fix it in py-amqp (a Celery dependency when using RabbitMQ as a broker). The code with the issue was 10 years old!
Here’s what the bug looks like:
try:
    sock.shutdown(socket.SHUT_RDWR)
    sock.close()
except OSError:
    pass
The problem occurs when socket.shutdown raises an OSError: execution jumps straight to the except block, so socket.close never runs to clean up the socket and allow garbage collection to release the memory used for it. The OSError on shutdown can occur when the remote side of the connection closes the connection first.
The fixed example (with separate try/except blocks):
try:
    sock.shutdown(socket.SHUT_RDWR)
except OSError:
    pass
try:
    sock.close()
except OSError:
    pass
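To see the difference between the two patterns without a broker, here is a minimal sketch. It is not the py-amqp code: the helper names close_buggy, close_fixed and demo are made up for illustration, and an unconnected TCP socket stands in for "the remote side closed first", on the assumption that shutdown raises an OSError (ENOTCONN) for it, which is what I'd expect on Linux and macOS.

```python
import socket


def close_buggy(sock):
    # Original pattern: if shutdown() raises, close() is skipped.
    try:
        sock.shutdown(socket.SHUT_RDWR)
        sock.close()
    except OSError:
        pass


def close_fixed(sock):
    # Fixed pattern: close() runs even when shutdown() raises.
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass
    try:
        sock.close()
    except OSError:
        pass


def demo():
    # Assumption: shutdown() on a never-connected TCP socket raises OSError,
    # standing in for a peer that already closed the connection.
    leaked = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    close_buggy(leaked)
    still_open = leaked.fileno() != -1  # fileno() is -1 only once closed
    leaked.close()  # tidy up the socket the buggy pattern left open

    cleaned = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    close_fixed(cleaned)
    closed = cleaned.fileno() == -1
    return still_open, closed


if __name__ == "__main__":
    print(demo())
```

If the assumption holds, the first value shows the buggy pattern leaving the socket open and the second shows the fixed pattern closing it.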
I was able to make the same fix to a few other popular Python libraries too:
I also found another way to reduce memory usage of Connections in py-amqp and librabbitmq by changing how active channel IDs are stored.
Update 12/20: Hacker News user js2 pointed out that Python will automatically close the socket when all the references to the socket are gone.
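js2's point can be checked with a small sketch. This assumes CPython (where dropping the last reference finalizes the object right away) and a POSIX system (where os.fstat can probe a raw file descriptor); fd_is_open and demo_implicit_close are illustrative names of mine, not part of any library.

```python
import gc
import os
import socket
import warnings


def fd_is_open(fd):
    # os.fstat raises OSError (EBADF) for a descriptor that is not open.
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


def demo_implicit_close():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    fd = sock.fileno()
    open_before = fd_is_open(fd)
    with warnings.catch_warnings():
        # CPython emits a ResourceWarning when it closes an unclosed socket.
        warnings.simplefilter("ignore", ResourceWarning)
        del sock      # drop the only reference
        gc.collect()  # not needed on CPython here, but harmless
    return open_before, fd_is_open(fd)


if __name__ == "__main__":
    print(demo_implicit_close())
```

So the socket only stays open for as long as something still holds a reference to it; the explicit close() releases the descriptor without waiting for those references to go away.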
Update 12/23: I got a pull request merged into Kombu with the same memory usage reduction fix I made to py-amqp and librabbitmq. I also opened another pull request to Kombu that should fix a memory leak issue when using Celery with Redis.
Update 12/25: I wrote a section for the Celery docs about optimizing memory usage. I also fixed another leak in Celery that happens when connection errors occur on a prefork worker.