Mr. Penumbra's 24 hour book store

My normal readers should note that what follows is incredibly nerdy; part book review and part cryptanalysis, and no pretty pictures. You have been warned. I read "Mr. Penumbra's 24 Hour Book Store":http://www.robinsloan.com/penumbra/ on the plane to SF, and it was a wonderful book. I devoured it, really; it's got great pacing and mystery and all the things I like in a book. The characters are engaging, and I found myself rooting for the protagonist and the quest. Except... One of the plot points of the book is a code that remains unbroken for ~500 years. It's called a substitution cypher in the book, and without revealing too many details, I just couldn't believe that any substitution cypher would remain unbroken for that long. It was a bit of a let down. So, I did what any programmer would do: I built the thing. My somewhat incomplete implementation of cypher is "here":https://github.com/mattmills/penumbra-code. Having built it, it's a bit more clever than I thought. Because there are only 26 letters (less in latin, but I don't have any latin text to work from), and because there are a lot of glyphs (I stopped myself at 88 latin symbols that might be in a typeset), you can actually make english look like noise with this code, more or less. Once you know what you're up against, though, it is breakable. I haven't coded it yet, but there's at least a brute force method, and maybe something more clever is possible. It sort of depends on how much noise there is in the signal; minimally, you need 80 glyphs to make a flat distribution over a large text, but with a larger character set you could have random noise inserted to confuse the signal. The more glyphs in the typeface, the more noisy you can make it. If you had, say, 200 glyphs, and only 80 of them meaningful, then that would confuse codebreakers pretty well. If anyone has suggestions, I'd love to hear them. Pull requests too. I have a transatlantic flight on Tuesday, when I plan on getting this made into a gem and maybe even hacking together a little sinatra-based web form that will do encoding and decoding.

Posted by Matt on 2012-10-21 14:35:18 +0200