I was dinking around with mutating word lists to make password dictionaries this morning, and ran into a little hitch. Using John the Ripper, you can generate word lists using its mutation rules. It definitely works, as follows:

063014 09:38 :) josh $ wc -l words.txt
29016 words.txt
063014 09:38 :) josh $ john --wordlist=words.txt --rules --stdout > foo.txt
words: 1427525  time: 0:00:00:00 DONE (Fri Jul  4 04:47:53 2014)  w/s: 4758K  current: Zygoting
063014 09:38 :) josh $ wc -l foo.txt
1427525 foo.txt

So an expansion from 29k words to 1.4M words was fine… but there were some obvious ones missing. For example, the famous XKCD base password was missing:

063014 09:38 :) josh $ grep -i troubador words.txt
063014 09:38 :) josh $ grep Tr0ub4dor foo.txt
063014 09:38 :( (1) josh $

Now, I believe that it’s not too hard to modify John’s rules and configure your own. But I never take that path. After all, a day without coding is like a day without sunshine. So I wrote up a few scripts to do my own l33t mutations of dictionary words, and started with the code bumming. I did some timed runs, decided I wasn’t happy with the speed, etc., and then I thought maybe my code was a little long…

I had started with Ruby, but when I decided I wanted it faster I turned to one of my favorite languages, Haskell. It compiles to machine code, and should be about an order of magnitude faster than Ruby. And once I started optimizing, I started pulling out all of the weird little tricks you get with Haskell, and ended up with the following:

import Data.Char
import Data.List

leets = ["oO0*","aA4@","lL1!","eE3" ,"tT7" ,"iI1!","sS5"]

leetchar c []     = nub [c, toUpper c, toLower c]
leetchar c (l:ls) = if c `elem` l then l else leetchar c ls

leet []     = return []
leet (x:xs) = do
  p <- leetchar x leets
  n <- leet xs
  return $ p : n

main = do
  cs <- getContents
  mapM_ putStrLn (concatMap leet (lines cs))

Which works something like this:

$ echo password | ./leetspeak | head -n 10

And I think this was one of the most beautiful programs I have ever written. I’ve used the list monad before, but there’s something special about writing the imperative form of a program several times, then writing a nice functional one, and then having that dawning realization that what I’m trying to optimize is monadic in shape.

You see, the challenge in this program is not necessarily generating the different options for the characters. That’s easy – what’s a bit harder is generating all of the possible combinations of those options. A typical imperative solution keeps some sort of list of the words that have been generated so far and recursively works through each character in the word. This was the general pattern of my first solutions, but the code was so long.

The list monad in Haskell essentially abstracts away all of the boilerplate of that recursive function, and turns the “cartesian product” part of this calculation into a very concise function. My “leet” function from above:

leet []     = return []
leet (x:xs) = do
  p <- leetchar x leets
  n <- leet xs
  return $ p : n

The “bind” operator (the left arrow “<-“ in the code above) basically makes the rest of the function into a loop, without having to specify all the loop details. And by making it recursive, I don’t even have to think about how long the word is (so no keeping track of indices and lengths!). By the time you get to the “return $ p : n” line, the program is at the bottom of a nested loop.

And what’s cool about the whole program (being as short as it is) is that GHC, when it compiles it, makes a very performant little resulting program. It takes me about half a minute to generate a gigabyte of mutated passwords. And due to lazy evaluation, it never takes more than a couple megabytes of memory (which is mostly Haskell space, not data space) (note that I’m also mutating case on all characters as well).

$ ./leetspeak < words.txt > dictionary.txt
$ wc -l dictionary.txt
 102395717 dictionary.txt

It’s not significant to “security”, but it’s at least a little reminder that there is always a best tool for any given job, and when you stumble onto the “right” solution it’s very satisfying.