So Many HTML Parsers Suck

Published at 11:42 on 24 December 2019

Why? They ram a document tree down your throat, that’s why. So you’re stuck writing code that:

  • Consumes more memory, since you must load the entire document into memory at once, and
  • Makes modifying the content tricky, since traversing a document tree you are modifying is a potential minefield (the alternative is to create an entirely new document tree from the old one, which doubles the already sometimes obscene memory footprint), and
  • Consumes more processor time, because multiple tree traversals are typically necessary.

Slow, bloated, error-prone: in short, document trees just plain suck. Yes, sometimes they are necessary. That just means they should be available as an alternative. They should never be the only way you can parse HTML.

Yet, with all too many HTML parsers, they are the only way. And that’s why so many HTML parsers suck.
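
To be concrete about what the alternative looks like: an event-driven parser hands you each tag as it goes by and never builds a tree at all. Here is a minimal sketch using Python's standard html.parser module. The LinkCollector class and the collect_links function are names I made up for this illustration, and a real job would likely need more care with malformed markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical example: record href values as <a> tags stream past."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(chunks):
    """Feed the document piece by piece; only the links found are retained."""
    parser = LinkCollector()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.links
```

Because the parser is fed one chunk at a time, the chunks can come straight from a file or a socket, and memory use depends on what you choose to keep rather than on the size of the document.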

The Shoes Start Dropping

Published at 22:09 on 16 December 2019

Today, Boeing announced that they will “temporarily” stop production of the 737 Max.

Note that I put “temporarily” in quotes. I predicted last April that the only lasting fix for the 737 Max will involve the scrap aluminum recycling industry, and I am sticking by that prediction. It may take an ill-considered recertification of that aircraft, followed by the loss of more lives, to seal its fate, however.

Why Do My Pictures Show up Sideways (And How Do I Fix Them)?

Published at 11:09 on 12 December 2019

The Root Cause

The root cause of the problem is that there’s a (relatively) new feature in image files from digital cameras which not all software supports. So an image can look just fine when you preview it (because that program supports the feature), yet when you upload it to the Web, suddenly it appears sideways (because many web browsers don’t)!

The Details

Modern cameras contain sensors that tell their on-board computers which way the camera is being held. When it captures an image, the camera records which way it was oriented (portrait or landscape) in the resulting file, but it always writes the image data itself in landscape (larger dimension horizontal) format.

It is considered the responsibility of any program that displays images to read the orientation information and use it to display the image properly, by rotating things if needed. Unfortunately, many web browsers in particular don’t read the orientation information; they simply assume that the horizontal dimension will always be horizontal (because, prior to the new feature, it was).

The Workaround

The workaround is to rotate the file if needed, so that the horizontal dimension of the image data is always the dimension that should display horizontally.

To do this, I use the free image-manipulation program GIMP. It can read the orientation information, and if it encounters a portrait-mode file, it will always ask on opening it whether the image should be automatically rotated. Always answer no to this question! (This automatic rotation is the very feature you want the image to display properly without, after all.)

The result will, of course, be a file that displays sideways. Use the rotation options under the Image… Transform menu to fix the orientation. Then use File… Export As to re-save the result as a new file. The result will be a file that always displays correctly.
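
If you would rather script this than click through menus, the logic behind the workaround is small. The sketch below assumes the orientation information is the EXIF Orientation tag, whose values 1, 3, 6 and 8 mean the stored pixels need 0, 180, 90 and 270 degrees of clockwise rotation to display upright; the mirrored values are left out. It works on a bare grid of pixel values rather than a real image file, and the names upright and rotate_90_clockwise are made up for illustration. A real script would lean on an image library to read and write the files.

```python
# Clockwise rotation (in degrees) needed to display the stored pixels upright,
# keyed by EXIF Orientation value. Mirrored orientations (2, 4, 5, 7) are
# deliberately omitted from this sketch.
CLOCKWISE_DEGREES = {1: 0, 3: 180, 6: 90, 8: 270}


def rotate_90_clockwise(pixels):
    """Rotate a grid (a list of rows) a quarter turn clockwise."""
    # Reversing the row order and then transposing is a clockwise quarter turn.
    return [list(row) for row in zip(*pixels[::-1])]


def upright(pixels, orientation):
    """Return the grid rotated so it displays correctly with no metadata."""
    quarter_turns = CLOCKWISE_DEGREES[orientation] // 90
    for _ in range(quarter_turns):
        pixels = rotate_90_clockwise(pixels)
    return pixels


# A 2-row by 3-column "landscape" grid tagged as needing a clockwise quarter turn.
stored = [[1, 2, 3],
          [4, 5, 6]]
print(upright(stored, 6))  # prints [[4, 1], [5, 2], [6, 3]]
```

Once the pixels themselves have been rotated, the saved file should carry no rotation instruction (or the "normal" value, 1), so that software which does honor the tag does not rotate the image a second time.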

Corbyn is Toast

Published at 13:18 on 11 December 2019

I may be wrong (and I hope I am), but I see absolutely no evidence that Labour will prevail in the coming general election in the UK. The polls show that Labour has lost ground compared to how they polled prior to the previous election.

Yes, the pollsters botched the prediction of that one, and badly. It is, however, reasonable to assume that they have learned from their mistakes and adjusted their techniques. Remember, Labour is polling slightly worse than in the previous election, and Labour still lost that previous election. (The surprise in 2017 was that Labour barely lost an election that it was expected to lose by a landslide.)

All in all, it really doesn’t look like Jeremy Corbyn will manage to pull a rabbit out of his hat this time.

A Belated Post-Thanksgiving Check-In

Published at 11:00 on 8 December 2019

Not much to report recently save the somewhat frustrating experience I had on Thanksgiving. I was visiting some old friends in Seattle, and one of them, who works as a hydrologist, was having no end of trouble analyzing a batch of huge data files. The root of her troubles was that the software she was using was attempting to load each entire file into memory before operating on it.

That was highly frustrating for me to observe, because:

  1. All indications are that it was probably unnecessary to load the entire file into memory (i.e. it was possible to process it on a record-by-record basis).
  2. If so, I could easily correct the above problem.
  3. That a lack of computer expertise at her workplace is adversely impacting this one project suggests it's unlikely to be the only such project; odds are this is merely the tip of an iceberg.
  4. I don’t work there, therefore I am not allowed to address such problems.
  5. I’ve been unable to convince anyone who does work there and who has the authority to hire me (either as a contractor, or as an employee) to so much as meet with me.
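
For what it's worth, the record-by-record approach from the first item above is not exotic. The sketch below is purely illustrative: I never saw the actual file format, so the "timestamp,value" layout and the running_mean function are assumptions invented for the example. The point is only that an aggregate can be computed while holding one record in memory at a time.

```python
import io


def running_mean(lines):
    """Average the second comma-separated field, one record at a time."""
    count = 0
    total = 0.0
    for line in lines:  # iterating a file object yields one line per step
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _timestamp, value = line.split(",")
        count += 1
        total += float(value)
    return total / count if count else None


# Stand-in for a huge file; with a real one, pass the object returned by open().
sample = io.StringIO("2019-11-28T00:00,1.5\n2019-11-28T01:00,2.5\n")
print(running_mean(sample))  # prints 2.0
```

Memory use here is set by the longest single line, not by the size of the file.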