Passphrase Generation Revisited

One of my hobbies is to try out scripts in multiple programming languages, so I found myself dusting off the passphrase generation script. Mostly it’s an exercise in figuring out how languages do things differently. C++ loves iterators. Racket loves lists. Common-lisp makes some things easy and some things really hard. Python loves list comprehension. I’m still figuring out go. So I write the script, and then try to tweak it to find out which parts of the language are best optimized. Some lessons on this round.

The problem

  1. read a literary text file
  2. create a list of unique words
  3. randomly select 7 words from the list
  4. count the number of unique words.

Lesson 1: If you just want it done, language doesn’t matter.

Timing differences among various implementations ranged from 0.35 seconds (python) to 4 seconds (c++ no optimization). Not that big of a deal except as an object lesson in techniques.

Lesson 2: Python is very good.

My python implementation consistently was the fastest of the bunch. Some of that is due to use of highly optimized c-based functions and list comprehensions. Here is the passphrase creation script (Pastebin.com).

Lesson 3: Imperative design is sometimes faster in racket.

Declaring a variable and updating it with an anti-functional for loop is sometimes faster for this sort of thing, but not by a lot. Racket came in the middle of the pack.


(define (gen-wordlist)
  (define results (mutable-set))
  (for*
      ([line lines]
       [word (regexp-match* word-re line)])
    (let ((lcword (string-downcase word)))
      (when (not (set-member? results lcword))
        (set-add! results lcword))))
  results)
  

Lesson 4: C++ isn’t necessarily faster.

This one surprised me a bit. My C++ implementation read from stdin, and it was much slower than I expected (5 seconds). Using optimization flags -O2 brought it down to python speed on linux. Likely I could get that down faster, but I’m a novice at C++.

Lesson 5: Go concurrency isn’t as easy as it claims to be, and isn’t necessarily faster.

A single-threaded go script performed on par with racket. Trying to multi-thread it took half the night and ended up with a slower program. Again, I’m a novice at go.

Lesson 6: Using common lisp outside of the REPL is harder than it needs to be.

About half of the job was figuring out how to get the script to run outside of SLIME. Problem 1 was ensuring that the cl-ppcre regular expression library was loaded (sbcl used for the following):


;; load quicklisp and cl-ppcre
(eval-when (:compile-toplevel :load-toplevel :execute)
  (load "~/quicklisp/setup.lisp")
  (ql:quickload :cl-ppcre))

Problem 2, properly handle relative filenames from the command line:


(merge-pathnames ; expand relative paths. 
    (car (last sb-ext:*posix-argv*))
    *default-pathname-defaults*))

Problem 3, make it executable by saveing an executable core image

Problem 4, exit on error rather than launch an interactive debugger.


(defun lines (filename)
  "Get the contents of filename as a list of lines."
  (handler-case ;catch errors and quit if the filename is wrong.
      (with-open-file (in filename)
	(loop for line = (read-line in nil)
	   while line collect line))
    (error (err)
      (format t "~a~%" err)
      (sb-ext:exit))))

Problem 5, the sbcl RNG needs to be manually seeded with random data.

(setf *random-state* (seed-random-state t))

Conclusions (such as they are)

Generally speaking, I tend to lean on Racket and branch out into other languages from there. I think python generally is faster and more broadly available.

Advertisements