Software Quality (or Lack Thereof)

Published at 16:22 on 8 July 2015

For my paid work, I maintain a program which runs for a long time (essentially, indefinitely) making millions of socket calls per day and doing extensive amounts of text parsing (it’s a web crawler).

What impresses me is how often problems in my code are not really problems in my code at all: they’re problems in some library that my code calls. Once it was even a problem in the system libraries; socket I/O would work fine for a day or two, then, anywhere from two days to a week in, socket calls would simply and mysteriously hang. Another recurring source of headaches has been the LXML library, which has caused me all sorts of trouble with memory leaks, infinite loops, and runaway recursion.
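
For what it’s worth, the usual defence against that kind of hang (not something the post describes doing) is to put a deadline on every socket operation, so that a stalled call fails loudly instead of blocking forever. A minimal Python sketch, assuming a standard-library-based fetch; the 30-second timeout and the fetch helper are purely illustrative:

```python
import socket
import urllib.request

# Assumed defensive measure: give every socket operation a deadline so that a
# call which would otherwise hang indefinitely raises socket.timeout instead
# of silently stalling the whole crawler. The 30-second value is illustrative.
socket.setdefaulttimeout(30)

def fetch(url):
    """Fetch a page; a hung connection now surfaces as an exception to log."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except socket.timeout:
        # The "mysteriously hanging" socket call would show up here once a
        # timeout is in place, rather than freezing the process.
        return None
```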

This is all in the open source (Linux) world, which underscores a general lack of thorough testing there. I consider it unacceptable that a program making about 2 million socket calls per day fails, due to a library bug, after about 10 million calls on average; that works out to a failure roughly every five days. One should be able to make an indefinite number of system calls (absent system quotas and limits, of course).

But apparently I’m somewhat unusual in having high standards like that. LXML has a (totally undeserved, in my opinion) reputation for robustness, and that faulty system library made it into a major CentOS release.

Or maybe I’m being unreasonable in expecting that a program which runs for an hour without issues should also run for a day, a week, or a month without being cut down by memory leaks in the code it calls. (I assume a slow memory or other resource leak was behind the socket-call hangs; the behaviour is a classic symptom of one.)
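
A rough way to check that kind of diagnosis on a long-running process is to log its resident set size over time and look for a curve that climbs steadily under a constant workload. A minimal sketch, assuming a Linux host (it reads /proc) and a Python process; the hourly sampling interval is arbitrary:

```python
import time

def current_rss_kb():
    """Current resident set size in kB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None

# Illustrative use: sample memory once an hour from the long-running process.
# Steady growth over days, with a constant workload, points to a slow leak in
# the process or in a library it loads rather than to the workload itself.
if __name__ == "__main__":
    while True:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), current_rss_kb(), "kB")
        time.sleep(3600)
```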
