This is going to be a very geeky post, but the bug in question just bit me and I am not aware of anyone else having written about it. Worse, the buggy code is actually recommended in the official Python documentation, which claims that the library call:
pid = os.spawnlp(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg")
Can be replaced with:
pid = subprocess.Popen(["/bin/mycmd", "myarg"]).pid
It can’t. Not unless you want your child processes to mysteriously disappear on you without calls to os.wait() reflecting they’ve completed, that is.
The problem is that the suggested code immediately creates an unreferenced subprocess.Popen object, and this class declares a destructor (i.e. a __del__ method) which automatically reaps exited child processes at GC time. So the code in question creates a race condition as to which code will call os.wait() first: yours, or the destructor.
Arguably, subprocess.Popen should have an option to disable this feature (which is actually the correct behavior if you’re going to hang on to the Popen object and use it to manage the child process). Until such a time the workaround is to do something like:
class PopenNoDel(subprocess.Popen): """ A Popen object that never gratuitously reaps dead children. """ def __del__(self, **kwargs): pass pid = PopenNoDel(["/bin/mycmd", "myarg"]).pid
Thankfully, it didn’t cause me much lost time. I had thought of the recommended code myself, then rejected the idea because of worries about gratuitous process reaping at GC time, and only changed my mind about the idea when the Python manual itself endorsed it. So the cause was fresh in my mind when my child processes started mysteriously vanishing.
Keywords: os.spawn, subprocess.Popen, wait, reap, garbage collector, subprocess, disappear, bug.