Passphrase Generation Revisited

One of my hobbies is to try out scripts in multiple programming languages, so I found myself dusting off the passphrase generation script. Mostly it’s an exercise in figuring out how languages do things differently. C++ loves iterators. Racket loves lists. Common-lisp makes some things easy and some things really hard. Python loves list comprehension. I’m still figuring out go. So I write the script, and then try to tweak it to find out which parts of the language are best optimized. Some lessons on this round.

The problem

  1. read a literary text file
  2. create a list of unique words
  3. randomly select 7 words from the list
  4. count the number of unique words.

Lesson 1: If you just want it done, language doesn’t matter.

Timing differences among various implementations ranged from 0.35 seconds (python) to 4 seconds (c++ no optimization). Not that big of a deal except as an object lesson in techniques.

Lesson 2: Python is very good.

My python implementation consistently was the fastest of the bunch. Some of that is due to use of highly optimized c-based functions and list comprehensions. Here is the passphrase creation script (Pastebin.com).

Lesson 3: Imperative design is sometimes faster in racket.

Declaring a variable and updating it with an anti-functional for loop is sometimes faster for this sort of thing, but not by a lot. Racket came in the middle of the pack.


(define (gen-wordlist)
  (define results (mutable-set))
  (for*
      ([line lines]
       [word (regexp-match* word-re line)])
    (let ((lcword (string-downcase word)))
      (when (not (set-member? results lcword))
        (set-add! results lcword))))
  results)
  

Lesson 4: C++ isn’t necessarily faster.

This one surprised me a bit. My C++ implementation read from stdin, and it was much slower than I expected (5 seconds). Using optimization flags -O2 brought it down to python speed on linux. Likely I could get that down faster, but I’m a novice at C++.

Lesson 5: Go concurrency isn’t as easy as it claims to be, and isn’t necessarily faster.

A single-threaded go script performed on par with racket. Trying to multi-thread it took half the night and ended up with a slower program. Again, I’m a novice at go.

Lesson 6: Using common lisp outside of the REPL is harder than it needs to be.

About half of the job was figuring out how to get the script to run outside of SLIME. Problem 1 was ensuring that the cl-ppcre regular expression library was loaded (sbcl used for the following):


;; load quicklisp and cl-ppcre
(eval-when (:compile-toplevel :load-toplevel :execute)
  (load "~/quicklisp/setup.lisp")
  (ql:quickload :cl-ppcre))

Problem 2, properly handle relative filenames from the command line:


(merge-pathnames ; expand relative paths. 
    (car (last sb-ext:*posix-argv*))
    *default-pathname-defaults*))

Problem 3, make it executable by saveing an executable core image

Problem 4, exit on error rather than launch an interactive debugger.


(defun lines (filename)
  "Get the contents of filename as a list of lines."
  (handler-case ;catch errors and quit if the filename is wrong.
      (with-open-file (in filename)
	(loop for line = (read-line in nil)
	   while line collect line))
    (error (err)
      (format t "~a~%" err)
      (sb-ext:exit))))

Problem 5, the sbcl RNG needs to be manually seeded with random data.

(setf *random-state* (seed-random-state t))

Conclusions (such as they are)

Generally speaking, I tend to lean on Racket and branch out into other languages from there. I think python generally is faster and more broadly available.

Largest Remainder

Largest Possible Remainder problem in racket:

;;lr = largest remainder so far lrd = divisor with largest remainder
(define (largest-remainder n d (lr 0) (lrd 0)) 
  (cond
    [(= d 0) (values lr lrd)] ;;catch cases like n = 6, d = 3
    [(> lr d) (values lr lrd)] ;;stop when d is < largest remainder
    [(> (modulo n d) lr)
     (largest-remainder n (sub1 d) (modulo n d) d)] ;;save new largest remainder
    [else 
     (largest-remainder n (sub1 d) lr lrd)])) ;;iterate with d-1

Common Lisp translation:

(defun largest-remainder (n d &optional (lr 0) (lrd 0))
  (cond
    ((<= d 0) (values lr lrd))
    ((< d lr) (values lr lrd))
    (t (let ((r (rem n d)))
	 (if (> r lr)
	     (largest-remainder n (1- d) r d)
	     (largest-remainder n (1- d) lr lrd))))))

Iterative in python.


def largest_remainder(n,d):
    lr = 0
    lrd = 0
    while (d >= lr) and (d > 0):
        if (n % d > lr):
            lr = n % d
            lrd = d
        d -= 1
    return (lr,lrd)

print largest_remainder(20,10)
print largest_remainder(6,3)

Limited Median Problem

Another Programming Praxis problem. The problem is nicely constrained by limiting it to 8-bit numbers.

#lang racket

(define (limited-median xs)
  (define xsv (make-vector 256 0))
  (for ([x xs])
    (vector-set! xsv x (+ 1 (vector-ref xsv x))))
  (define half-limit (/ (length xs) 2))
  (define (median low high low-count high-count)
    (cond
      [(< low-count half-limit) 
       (median (add1 low) high (+ low-count (vector-ref xsv low)) high-count)]
      [(< high-count half-limit)
       (median low (sub1 high) low-count (+ high-count (vector-ref xsv high)))]
      [else
       (/ (+ low high) 2)]))
  (median 0 255 0 0))

Edit: C++

#include <iostream>
#include <array>
#include <vector>
//#include <cstdio>
//#include <cmath>
#include <stdlib.h>     /* srand, rand */
#include <time.h>       /* time */

//linux: clang++ -std=c++11 -g -Wall limited_median.cpp -o limited_median

float limited_median (std::vector<int> numbers){
  std::array<int,256> counting_array;
  counting_array.fill(0);
  int count = numbers.size();
  for (int const& i : numbers) { counting_array[i]++; }
  int right = 255;
  int right_count = 0;
  int left = 0;
  int left_count = 0;
  int half_limit = count / 2;
  while (left_count < half_limit)
    {
      left_count += counting_array[left];
      left ++;
    }
  while (right_count < half_limit)
    {
      right_count += counting_array[right];
      right --;
    }
  return (left + right)/ 2.0;
  
}



int main(int argc, char* argv[]){
  std::vector<int> test1 {2,4,5,7,3,6,1};
  std::vector<int> test2 {5,2,1,6,3,4};
  
  std::cout << limited_median(test1) << "\n";
  std::cout << limited_median(test2) << "\n";

  
  //test with 5000 random numbers  
  srand(time(NULL));
  std::vector<int> test3;
  for (int i = 0; i < 5000; i++)
    {
      test3.push_back((rand() % 256));
    }
  std::cout << limited_median(test3) << "\n";
  return 0;
}

(divisors z) → (Listof Natural)
z : Integer
Returns a list of all positive divisors of the integer z. The divisors appear in ascending order.

Another reason why Racket rocks, because optimizing trial-division for large numbers.

It makes solving the Fermat Exponent problem easy:

http://pastebin.com/embed_js.php?i=4MR6D39a

Apparently, there was a period of time in Beethoven’s transition from classical formalist to bombastic romanticist (possibly accompanied by severe hearing loss and disillusionment with Napoleon) when musicians said, (translated from 19th-century German), “What the heck, Beethoven! This isn’t music! My kids could write this!”

Elsewhere: