Sunday, 30 December 2007

New feed

I've created a new feed that contains only Lisp-related posts. You can find the new feed here.

Compiling Common Lisp

When a language is pre-compiled (like C or C++) there is a step of compiling and linking the source code into a form that can be executed directly on the target hardware. The assembling of source into an executable form is usually handled by a build system like SCons or Make.

The CPython implementation of the Python language uses an interpreter. There is no explicit compile step: the source is compiled to bytecode implicitly when it is run.

Common Lisp implementations are somewhere in between: there is a compile step, but you can still use them interactively like an interpreter. For example, SBCL creates files called FASLs, which seems to stand for "FASt Loading". Their use should be self-explanatory. The formats are implementation-dependent, meaning that they are not portable between implementations (unlike .class files for Java, for example). I couldn't figure out exactly what SBCL stores in its fasls. Since SBCL compiles to native code, they presumably hold machine code plus whatever the loader needs, rather than bytecode.
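As a rough illustration of the compile-then-load cycle, the sketch below writes a one-function source file, compiles it with compile-file, and loads the resulting FASL. The scratch path under /tmp/ and the name fasl-demo-square are made up for this example; compile-file and load are standard Common Lisp.

```lisp
;; Hypothetical scratch file; assumes a writable /tmp/ directory.
(defparameter *demo-source* #p"/tmp/fasl-demo.lisp")

;; Write a tiny source file to compile.
(with-open-file (out *demo-source* :direction :output :if-exists :supersede)
  (write-line "(defun fasl-demo-square (x) (* x x))" out))

;; COMPILE-FILE returns the pathname of the compiled file as its first value.
;; The file type (.fasl on SBCL) is chosen by the implementation.
(defparameter *demo-fasl* (compile-file *demo-source*))

;; Loading the FASL defines the function without recompiling the source.
(load *demo-fasl*)

(fasl-demo-square 7)
;; => 49
```

Doing this by hand for every file, in the right order, is exactly the chore that a system definition tool takes over.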

While developing a library, it is useful to have an interpreter-like environment. However, when you are using a library, you don't want to load and recompile the source every single time. What you want to do is compile the library's files into FASL format and have your implementation load them when necessary.

That is where ASDF comes in. ASDF, which stands for "Another System Definition Facility", is a way to define your projects so that they can be compiled and loaded along with whatever they depend on, in the right order. The rest of this post covers what I did to get a project and a testing system set up. Installing ASDF is out of scope here, but if you use SBCL, you already have it.

First, the file system layout that I tend to use:

sohail@dev:~$ cd project/
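sohail@dev:~/project$ find . -not -iname "*.fasl" -and -not -iname "*~"
.
./project.asd
./project-test.asd
./src
./src/package.lisp
./src/code.lisp
./test
./test/package.lisp
./test/tests.lisp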

ASDF looks for .asd (a system definition?) files in whatever paths are defined in the list asdf:*central-registry*. So the first thing you want to do is add the above path to the list. The way I did it (which is not optimal) is I modified ~/.sbclrc to include the following lines:

(require :asdf)
(push #p"/home/sohail/project/" asdf:*central-registry*)

Open up src/package.lisp in Emacs and enter the following:

(defpackage project
  (:use common-lisp)
  (:export some-function))

This uses the standard defpackage macro to define a Common Lisp package. A package is a namespace that maps names to symbols. The above exports a symbol called some-function. Next, open up src/code.lisp and enter the following:

(in-package #:project)

(defun some-function (a)
  (format *standard-output* "a is: ~A" a))

This uses the standard in-package macro to set the current package to be project. If you are using Slime, you can try loading code.lisp (C-c C-l) and if you haven't loaded package.lisp, you should get a message telling you that project does not designate a package. To make it work, load package.lisp and then code.lisp. In a large project, it would be impossible to remember which order to load things in.
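To see what "maps names to symbols" means in practice, here is a small sketch using only standard functions. The packages demo-project and demo-other and the helper symbol-status are invented for this illustration; find-symbol, intern and defpackage are standard.

```lisp
(defpackage #:demo-project
  (:use #:common-lisp)
  (:export #:some-function))

(defpackage #:demo-other
  (:use #:common-lisp))

;; An internal (non-exported) symbol, created at run time.
(intern "INTERNAL-HELPER" :demo-project)

(defun symbol-status (name package)
  "Look NAME up in PACKAGE and report how the symbol is accessible there."
  (nth-value 1 (find-symbol name package)))

(symbol-status "SOME-FUNCTION" :demo-project)    ; => :EXTERNAL
(symbol-status "INTERNAL-HELPER" :demo-project)  ; => :INTERNAL
(symbol-status "CAR" :demo-project)              ; => :INHERITED
(symbol-status "NO-SUCH-NAME" :demo-project)     ; => NIL

;; The same name in two packages gives two distinct symbols.
(eq (intern "SOME-FUNCTION" :demo-project)
    (intern "SOME-FUNCTION" :demo-other))        ; => NIL
```

Only external symbols can be written with a single colon, as in project:some-function. Internal ones need the double colon, which is a hint that you are reaching into someone else's implementation.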

The next thing to do is to make this loadable via ASDF. To do this, open up project.asd and enter the following:

;;; Define a package to define our system definition
(defpackage #:project-asd
  (:use :cl :asdf)) ;; Use ASDF (obviously!)

;;; All our definitions
(in-package project-asd)

;;; Our system is called project
(defsystem #:project
  :name "My project!"
  :components ((:module "src"
                :components ((:file "package")
                             (:file "code"
                              :depends-on ("package"))))))

We document that code.lisp depends on package.lisp. Another way to write the defsystem could have been:

(defsystem #:project
  :name "My project!"
  :components ((:file "src/package")
               (:file "src/code" :depends-on ("src/package"))))

I prefer using a module because if you have multiple modules that have dependencies, the dependencies are easier to define. For example, you might have a "model" module that depends on the "database" module.
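To make "in the right order" concrete, here is a toy sketch of how a load order can be derived from :depends-on style declarations. This is not ASDF's actual algorithm, just a depth-first walk under the assumption that there are no circular dependencies; load-order and its input format are invented for this example.

```lisp
(defun load-order (components)
  "COMPONENTS is a list of (NAME . DEPENDENCY-NAMES), all strings.
Returns the names ordered so that every name comes after its dependencies.
Assumes the dependency graph has no cycles."
  (let ((done '()))
    (labels ((visit (name)
               (unless (member name done :test #'string=)
                 ;; Handle everything this component depends on first.
                 (dolist (dependency (cdr (assoc name components :test #'string=)))
                   (visit dependency))
                 (push name done))))
      (dolist (component components)
        (visit (car component))))
    (reverse done)))

(load-order '(("code" "package") ("package")))
;; => ("package" "code")

(load-order '(("model" "database") ("database") ("views" "model")))
;; => ("database" "model" "views")
```

The order in which the components are listed does not matter; only the declared dependencies do.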

Now go to the REPL and type (asdf:oos 'asdf:load-op :project). Assuming you set up asdf:*central-registry* as above, you should get output like this:

; compiling file "/home/sohail/project/src/package.lisp" (written 30 DEC 2007 05:23:41 PM):
; compiling (DEFPACKAGE PROJECT ...)

; /home/sohail/project/src/package.fasl written
; compilation finished in 0:00:00
; compiling file "/home/sohail/project/src/code.lisp" (written 30 DEC 2007 05:23:37 PM):
; compiling (IN-PACKAGE #:PROJECT)
; compiling (DEFUN SOME-FUNCTION ...)

; /home/sohail/project/src/code.fasl written
; compilation finished in 0:00:00

Even though we told ASDF to "load" the system, since we hadn't compiled the files, ASDF compiled them for us, in the right order. Type (project:some-function 5) if you want to convince yourself that it worked!

Now we want to add a system to test our code. To do this, open up project-test.asd and enter the following:

(defpackage #:project-test-asd
  (:use :cl :asdf))

(in-package project-test-asd)

(defsystem #:project-test
  :name "Tests for my project!"
  :depends-on (#:project)
  :components ((:module "test"
                :components ((:file "package")
                             (:file "tests"
                              :depends-on ("package"))))))

The only new thing here is the use of the :depends-on keyword argument to defsystem. Here, we are telling ASDF that before loading/compiling project-test, the project system must have done so successfully.

Mechanically, we add the following to test/package.lisp:

(defpackage #:project-test
  (:use :cl) ;; Could also use the project package but I like to qualify symbols
  (:export run-tests))

And the following into test/tests.lisp:

(in-package #:project-test)

(defmacro assert-string-equal (form1 form2)
  `(if (string= ,form1 ,form2)
       (print "Passed!")
       (print "Failed!")))

(defun run-tests ()
  (let (output)
    ;; Capture whatever some-function prints into a string stream
    (let ((*standard-output* (make-string-output-stream)))
      (project:some-function 5)
      (setq output (get-output-stream-string *standard-output*)))
    (assert-string-equal "a is: 5" output)))

Now save all those files, go back to the REPL and type:

(asdf:oos 'asdf:load-op :project-test)
(project-test:run-tests)

You should see "Passed!" output.
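As an aside, capturing printed output for a test can be done a little more compactly with the standard with-output-to-string macro. The sketch below is a variation, not what the test file above does; some-function is repeated here as a stand-in for project:some-function so the snippet runs on its own, and output-of is a made-up helper name.

```lisp
;; Stand-in for project:some-function from this post.
(defun some-function (a)
  (format *standard-output* "a is: ~A" a))

(defun output-of (thunk)
  "Call THUNK and return everything it wrote to *standard-output* as a string."
  (with-output-to-string (*standard-output*)
    (funcall thunk)))

(output-of (lambda () (some-function 5)))
;; => "a is: 5"
```

Because *standard-output* is a special variable, the rebinding is only in effect while the thunk runs.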

Before you start writing your own test framework, take a look at FiveAM. You can also see my thoughts about FiveAM.

Happy Lisping!

Thursday, 27 December 2007

My thoughts about FiveAM - Common Lisp testing framework

I spent all day today adding some tests as I needed to do some refactoring. I wanted to choose a CL testing framework and came across Phil Gregory's great post: Common Lisp Testing Frameworks. I read through his review and as you can tell by the title, I chose to go with FiveAM. I won't go over what Phil covered in his post except to say that FiveAM has what I expect in a testing framework and more.

You can jump to the bottom line.

Simple tests are defined simply:

(5am:test my-test-case
  (5am:is (= 2 (1+ 1))))

FiveAM has test dependencies:

(5am:test (my-other-test-case :depends-on my-test-case)
  (5am:is (= 3 (1+ (1+ 1)))))

FiveAM allows you to group tests using test suites:

(def-suite arith-tests :description "Arithmetic tests")
(in-suite arith-tests)
(... above tests ...)

However, the grouping is only useful for selecting which tests to run. You can't (for example) make one set of tests dependent on another set of tests. This makes the feature only useful for organizational purposes. It isn't a deal-killer especially since you can write a function to work around this limitation like the one below:

(defun add-test-dependency (a b)
  "Make test-suite a depend on test-suite b by making every test in a depend
on every test in b"
  (let ((suite-a (safe-get-test a))
        (suite-b (safe-get-test b))
        suite-a-tests suite-b-tests)
    ;; Collect the names of all tests in suite b
    (maphash #'(lambda (sym obj)
                 (declare (ignore obj))
                 (push sym suite-b-tests))
             (5am::tests suite-b))
    (maphash #'(lambda (sym obj)
                 (declare (ignore sym))
                 (push obj suite-a-tests))
             (5am::tests suite-a))
    (loop for test-name in suite-a-tests
          do (let* ((test (safe-get-test test-name))
                    (depends-on (5am::depends-on test)))
               ;; Keep any existing dependency and add every test in b
               (let ((new-depends-on
                      (if depends-on
                          `(and ,depends-on ,@suite-b-tests)
                          `(and ,@suite-b-tests))))
                 (print test)
                 (print new-depends-on)
                 (setf (5am::depends-on test) new-depends-on))))))

In hindsight, a smarter way to do this would be to introduce a pseudo-test in b that depended on every test in b and then every test in a would depend on this pseudo-test. Ah well.

But my favourite feature is the fact that FiveAM lets you generate samples from a distribution of inputs and feed them into your functions to test. Here is one stupid example:

(test encode-password
  "Passwords are encoded as ALGO$SALT$HASHED-PASSWORD. This code tests
that the structure is correct and that the components of the encoding
pass sanity checks. In the case of encode-password the salt is randomly
generated."
  (for-all ((raw-password (gen-string
                           :length (gen-integer :min 5 :max 10)
                           :elements (gen-character :code-limit (char-code #\~)
                                                    :alphanumericp #'alphanumericp
                                                    :code (gen-integer :min (char-code #\Space)
                                                                       :max (char-code #\~))))))
    (let ((encoded-pw (myapp::encode-password raw-password)))
      (validate-encoded-password encoded-pw))))

The for-all macro takes a list of generators and iterates through a set of samples for all the represented values. This is done through the use of generators (gen-string and friends.) In this case, I am iterating through a distribution of strings that generates a string between 5 and 10 characters long the contents of which are in the "interesting" ASCII character range. The body of the for-all macro is dedicated to encoding the password and validating that the encoding is sane. Although it isn't important, validate-encoded-password looks like:

(defun validate-encoded-password (encoded-pw)
  (destructuring-bind (algo salt hash)
      (split-sequence:split-sequence #\$ encoded-pw)
    (is (string= "md5" algo))
    (is (= 5 (length salt)))
    (is (= 32 (length hash)))))
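If the for-all idea is unfamiliar, the following sketch shows the general shape of it in plain Common Lisp: a generator produces random inputs and a property is checked against each one. This is not how FiveAM implements for-all; random-printable-string and check-for-all are names invented for this illustration, and the character range assumes an ASCII-compatible character encoding.

```lisp
(defun random-printable-string (min-length max-length)
  "Return a random string of MIN-LENGTH to MAX-LENGTH characters drawn from
the printable ASCII range, Space through ~."
  (let* ((size (+ min-length (random (1+ (- max-length min-length)))))
         (low (char-code #\Space))
         (high (char-code #\~))
         (result (make-string size)))
    (dotimes (i size result)
      (setf (char result i)
            (code-char (+ low (random (1+ (- high low)))))))))

(defun check-for-all (generator property &key (samples 100))
  "Call GENERATOR SAMPLES times and test PROPERTY on each generated value.
Returns T if every sample passes, otherwise NIL and the first failing input."
  (dotimes (i samples t)
    (declare (ignorable i))
    (let ((input (funcall generator)))
      (unless (funcall property input)
        (return (values nil input))))))

;; Every generated string is 5 to 10 printable characters long.
(check-for-all (lambda () (random-printable-string 5 10))
               (lambda (s)
                 (and (<= 5 (length s) 10)
                      (every #'graphic-char-p s))))
```

A real framework adds reporting and smarter generators on top, but the core loop is no more than this.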

The Bottom Line

5am is very suitable for testing in CL but test groups should really have the ability to take part in dependencies.

Sunday, 23 December 2007

(Ab)using Hunchentoot's dispatch mechanism to implement authentication

With any application in general, it is important to ensure that the user is allowed to use the system. This is known as authorization. The first step to authorizing a user is to authenticate the user or ensure the user is who they say they are. Once the user is authenticated, then the application can decide what operations/views the user is authorized to use. The de facto standard way of authenticating is by forcing the user to input a user name and password. <rant>I personally hate this.</rant>

With a stateful application, such as a desktop application, authentication is pretty straightforward: just authenticate at application launch.

Unfortunately, HTTP is stateless (keep-alives aside.) Continuation-based frameworks such as Weblocks totally remove the problem by allowing you to write your app as if it were stateful. It is quite beautiful. Continuation-based frameworks have their own uses but if they don't fit your needs, then you need a different approach.

The usual way to implement authentication for a web application is to check if the client has been authenticated on each page request. Obviously, this is quite annoying if you have to do it yourself. Frameworks like ASP.NET handle this for you (\o/ frameworks) by some careful editing of XML files. As I understand it, you tell ASP.NET what resources are protected and how to authenticate when protected resources are accessed. You fill in the authentication blanks with some helper classes that MS wrote for you. ASP.NET then generates a random session ID and sets a cookie which is used in subsequent requests to the web application. Nothing special. This is wide open to MITM attacks if you don't use SSL or sufficiently secure your session ID.

Hunchentoot does none of the above but it has all the ingredients to make it possible. The idea is that we want to intercept every request and check if a protected resource is being accessed. If a protected resource is being accessed, then we need to either force authentication or pass the request along if the session is authenticated. One way to do this is to insert the appropriate code in every page using macros. Another way is to use Hunchentoot's dispatch table. Personally, I'm partial to the method covered here because it doesn't require you to remember to secure your pages. Another benefit with this method is that you can also protect non-function resources such as when you serve static content.

When Hunchentoot receives a request, it iterates through hunchentoot:*dispatch-table* executing each dispatch function. Each of these functions is meant to return a function that will serve the request when applicable. Therefore, Hunchentoot executes the first such function returned while iterating. Hopefully the solution that I am thinking of is clear: insert a dispatch function that gets called before all others that checks for an authenticated session and redirects to a login page if one is not found. Here is an example of such a dispatch function:

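(defun check-login-dispatcher (request)
  ;; Assumption: anything under /public (such as the login page itself)
  ;; is reachable without a session
  (unless (or hunchentoot:*session*
              (starts-with (tbnl:script-name request) "/public"))
    #'oopsie-need-to-login))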

Where oopsie-need-to-login looks like:

(defun oopsie-need-to-login ()
  (cl-who:with-html-output-to-string (s)
    (:p "Before you can continue, you must log in. For now,
just " (:a :href "/public/auto-login" "click here"))))

And public/auto-login looks like:

(defun auto-login ()
  ;; Starting a session is what marks the client as logged in
  (hunchentoot:start-session)
  (hunchentoot:redirect "/"))

In real life, you would obviously not automatically log someone in and instead have the regular username / password form.
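The "first dispatcher that returns a handler wins" behaviour is easy to model without a web server. The sketch below is a toy stand-in, not Hunchentoot's code: requests are plain property lists, and every name starting with toy- (plus starts-with-p) is invented for this illustration.

```lisp
(defun starts-with-p (string prefix)
  "True when STRING begins with PREFIX."
  (let ((diverge (mismatch prefix string)))
    (or (null diverge) (= diverge (length prefix)))))

(defun toy-check-login-dispatcher (request)
  ;; Claim the request only when there is no session and the path is protected.
  (unless (or (getf request :session)
              (starts-with-p (getf request :path) "/public/"))
    (lambda () "Please log in")))

(defun toy-secret-dispatcher (request)
  (when (string= (getf request :path) "/secret")
    (lambda () "Secret page")))

;; The login check goes first so it sees every request before the others.
(defparameter *toy-dispatch-table*
  (list #'toy-check-login-dispatcher #'toy-secret-dispatcher))

(defun toy-serve (request)
  "Run the handler returned by the first dispatcher that claims REQUEST."
  (let ((handler (some (lambda (dispatcher) (funcall dispatcher request))
                       *toy-dispatch-table*)))
    (if handler
        (funcall handler)
        "Default handler")))

(toy-serve '(:path "/secret"))                     ; => "Please log in"
(toy-serve '(:path "/secret" :session t))          ; => "Secret page"
(toy-serve '(:path "/public/auto-login"))          ; => "Default handler"
```

Note that the pages themselves know nothing about authentication, which is the whole appeal of doing the check in the dispatch table.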

Further, to prevent session hijacking with Hunchentoot, you need to do at least the following:

  • Use SSL (HTTPS)

  • ;; See Hunchentoot documentation for the meaning of these variables
    (setf hunchentoot:*use-remote-addr-for-sessions* t)
    (setf hunchentoot:*use-user-agent-for-sessions* t)

  • You need to also redefine hunchentoot::get-next-session-id because it uses sequential session IDs which leaves you open to guessing attacks. Imagine an attacker logs in just before you and knows that the next session is N+1. Not fun.

The above method for authenticating a user is secure so long as the above three are implemented. I haven't done the third yet so I couldn't say what to do there. I think you need to generate a (theoretically) truly random number somewhere.

Update: You do *not* need to redefine hunchentoot::get-next-session-id. That advice was based on a bad assumption that all the information going into the session id string was deterministic. On reading the code more, there are two elements of randomness inserted into the session string:

  • The session start time

  • A random string generated once per server

I think the above two are sufficient to make it secure for some value of secure. I believe the secrecy of the random string is important to the security. But I am no security expert!

Friday, 21 December 2007

Filter search results using regular expressions

Filter search results using regular expressions.

So Google has had this AJAX API out for some time. For the longest time, I've wanted to be able to apply regular expressions to the results. I figured using the Google API would be the best way to do it. That is the page linked to above. View the JavaScript; it is pretty straightforward (and likely buggy!)

The API itself is pretty good. The only thing that bugs me is that I could only manage to get 8 results on which to apply my regular expression. Ideally, I would have liked to do something like this:

  1. Get query + filter string from user

  2. Submit query to Google using AJAX

  3. As the results come in, filter them using the filter string in step 1

  4. Present to user as results pass filter

So for the above page to be even slightly useful, I would have to be able to go through at least 100 relevant results.

Of course I understand that Google has limited the search for business reasons. But this is a very good way to totally make the feature useless, in my opinion.

Oh well, at least I learned something today.

Tuesday, 18 December 2007

RESTful handlers with Hunchentoot

I am not quite sure what REST is, but I know that following REST practices gives you URLs like /resources/123. Pretty, isn't it?

Django has a URL dispatcher which allows you to specify regular expressions that match incoming URLs and call a specific handler. If you wanted function resource_page to handle requests to URLs similar to the above, you would specify /resources/\d* as the regular expression. The slashes at the beginning and end of the regular expression are optional. Effectively, you are writing /?resources/\d*/?. This would match:

  • /resources/

  • /resources/123

  • /resources/123/

I'm not entirely sure that it would match /resources// (try it to see!)

This would not match /resources/abcd and other similar URLs.

In Django, the handler is written as:

def resource_page(request, resourceid):
    # ....

And the dispatcher is registered like:

urlpatterns = patterns('',
    (r'/resources/\d*', resource_page),
)

A little thing I forgot to mention was that if you want the matches to be bound to function arguments, you have to make sure the regex remembers them. In this case, one would really want to write:

urlpatterns = patterns('',
    (r'/resources/(\d*)', resource_page),
)

As is my nature, I decided I want this functionality as a Hunchentoot handler. Hunchentoot works pretty much the same way except you only specify dispatch handlers. When a request comes in, Hunchentoot iterates through the dispatch handlers and if one of them returns a function, that function is called to handle the request otherwise the default handler is called.

So the solution is obvious (I think!): I want to write a handler that matches the requested URL against a regex, binds any matches to function parameters and returns that function.

What I want to write in my code is something like:

;;; I like to be explicit about the slashes myself
(create-regex-dispatcher "^/resources/(\\d*)" #'resource-page)

This is obviously very simple once you have a regex engine. Fortunately, not only has Edi Weitz made a billion other libraries including Hunchentoot, but he has also written one of the fastest regex libraries, surpassing even Perl. Crazy.

Anyway, here is the code:

(defun create-regex-dispatcher (regex page-function)
  "Just like tbnl:create-regex-dispatcher except it extracts the matched values
and passes them onto PAGE-FUNCTION as arguments. You want to be explicit about
where the slashes go.

For example, given:
(defun blog-page (pagenum)
... )

(push (create-regex-dispatcher \"^/blog/page(\\d+)\" #'blog-page)
      tbnl:*dispatch-table*)

When the url /blog/page5 is accessed, blog-page is called with pagenum
set to \"5\"."
  (let ((scanner (cl-ppcre:create-scanner regex)))
    (lambda (request)
      ;; scan-to-strings returns the whole match and a vector of registers
      (multiple-value-bind (whole-match matched-registers)
          (cl-ppcre:scan-to-strings scanner (tbnl:script-name request))
        (when whole-match
          (lambda ()
            (apply page-function (coerce matched-registers 'list))))))))

And a simple example for using it:

;;; If there is no match, id will be the empty string.
;;; Should be a better way to handle this, but OK for now.
(defun resource-function (&optional id)
  (tbnl:log-message* "resource-function called with id ~S" id)
  (cl-who:with-html-output-to-string (s)
    (if (or (null id)
            (= 0 (length id)))
        (cl-who:str "No resource")
        (cl-who:fmt "Resource ~A" id))))

(push (create-regex-dispatcher "^/resources/(\\d*)" #'resource-function)
      tbnl:*dispatch-table*)
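The part of the dispatcher that tends to confuse people is the closure returning another closure, plus the apply over the captured pieces. Here is a regex-free sketch of the same shape that runs without Hunchentoot or CL-PPCRE. It matches on a plain prefix and treats the rest of the path as the single "register"; create-toy-dispatcher, show-resource and *resource-dispatcher* are names invented for this illustration.

```lisp
(defun create-toy-dispatcher (prefix page-function)
  "Return a dispatcher: a function of a path that returns a handler closure
when the path starts with PREFIX, and NIL otherwise."
  (lambda (path)
    (let ((diverge (mismatch prefix path)))
      (when (or (null diverge) (= diverge (length prefix)))
        ;; Pretend the remainder of the path is the one captured register.
        (let ((registers (vector (subseq path (length prefix)))))
          (lambda ()
            (apply page-function (coerce registers 'list))))))))

(defun show-resource (id)
  (format nil "Resource ~A" id))

(defparameter *resource-dispatcher*
  (create-toy-dispatcher "/resources/" #'show-resource))

;; The dispatcher returns a handler, and calling the handler produces the page.
(funcall (funcall *resource-dispatcher* "/resources/123"))
;; => "Resource 123"

(funcall *resource-dispatcher* "/blog/")
;; => NIL
```

The outer lambda decides whether the request is ours; the inner lambda is what eventually gets called to build the page.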

Monday, 17 December 2007

Python decorators in Lisp, Part 3

In the last post, I showed a way to implement Python's decorator syntax in Lisp which actually seemed to work for more than just myself!

What I did not show is how you can use this in regular source files. As mentioned previously, one way to add new syntax into Lisp is to tell the reader (via *readtable*) to call a reader macro when it encounters a particular pair of characters. So the answer to using this syntax in regular source files is to locally enable it by rebinding *readtable*. As always, it helps to write out what you would like to do:


(defun my-function (x) ... )

(defun my-other-function (x) ... )

(defun yaf[yet-another-function] (x) ... )


Quite simply, enable-py-decorator-syntax copies the current readtable and sets the dispatch function. It also assigns the original readtable to a variable so it can be reset. Conversely, disable-py-decorator-syntax does the opposite: it sets the current readtable to the original readtable and sets the auxiliary variable to nil. Without further ado, the code for these functions:

(defparameter *original-readtable* nil)

(defmacro enable-py-decorator-syntax ()
  "Turns on the decorator syntax"
  '(eval-when (:compile-toplevel :load-toplevel :execute)
     (%enable-py-decorator-syntax)))

(defun %enable-py-decorator-syntax ()
  (unless *original-readtable*
    (setf *original-readtable* *readtable*
          *readtable* (copy-readtable))
    (set-dispatch-macro-character #\# #\@ #'|#@-reader|)))

(defmacro disable-py-decorator-syntax ()
  "Turns off the decorator syntax"
  '(eval-when (:compile-toplevel :load-toplevel :execute)
     (%disable-py-decorator-syntax)))

(defun %disable-py-decorator-syntax ()
  (when *original-readtable*
    (setf *readtable* *original-readtable*
          *original-readtable* nil)))

Thanks to popeix for submitting the original post to reddit (mod it up!) The code for enabling and disabling the syntax was based on syntax.lisp.

Sunday, 16 December 2007

Python decorators in Lisp, Part 2

So in this earlier post, I suggested that I was envious of Python's decorator syntax and wondered if it was possible to do in Lisp. The answer was most undoubtedly yes, and it took the following form:

CL-USER> #@(lambda (fn) (lambda (&rest args) (print "in-lambda")(apply fn args)))
#@(synchronized "with-this-lock")
(defun this-function () (print "this-function"))
CL-USER> (this-function)
"Obtaining lock with-this-lock"
"Releasing lock with-this-lock"

The Lisp solution is more flexible, although that flexibility (being able to use lambda functions) is probably unwarranted.

The fundamental component of program compilation or interpretation is the Lisp reader. It is responsible for parsing textual representations of objects and producing the objects themselves. So when an object has a non-readable representation, that means it cannot be reconstructed in this manner. For more information on the algorithm, see the relevant ultra hyperlinked hyperspec.

The Lisp reader reads one character at a time from the input stream. Big surprise. The interesting part that makes the above possible is that you can redefine what the reader does when it encounters certain characters. This dispatch information is stored in what is known as a readtable. The current readtable, the readtable being used for dispatch when reading, is stored in the dynamic variable *readtable*. So, to modify the readtable for a subset of code, all you need to do is rebind this variable within that block of code.

The hook into the Lisp reader that I used is set-dispatch-macro-character. Among other parameters, this function takes in two characters and a function to call when the reader encounters these characters. For some reason, I decided that I wanted #@ to be the dispatch pair for the decorator implementation. I suppose I could just as easily have used set-macro-character and dispatched on @. I leave that as an exercise to the reader (if you are still reading!)
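Before getting to the real thing, here is about the smallest possible use of that hook. It rebinds *readtable* to a fresh copy of the standard readtable, installs a function for #@ that just wraps the next form in a list, and reads from a string. The name read-with-hash-at and the marker symbol hash-at are only for this illustration; the global readtable is left untouched because the change is made on the rebound copy.

```lisp
(defun read-with-hash-at (string)
  "Read one form from STRING with #@ defined only for the duration of the call."
  (let ((*readtable* (copy-readtable nil)))
    (set-dispatch-macro-character
     #\# #\@
     (lambda (stream sub-char arg)
       (declare (ignore sub-char arg))
       ;; Read the form following #@ and tag it.
       (list 'hash-at (read stream t nil t))))
    (read-from-string string)))

(read-with-hash-at "#@(foo 1)")
;; => (HASH-AT (FOO 1))
```

The reader function receives the stream, the sub-character (here @) and an optional numeric argument, and whatever it returns becomes the object that was "read".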

So just like when dealing with macros, it helps to write out what code you want generated. In this case, given the input:

#@(another-decorator 5)
#@(lambda (fn) (lambda (&rest args) (apply fn args)))
(defun some-function (x)
(print x))

For better or worse, I would like to generate something close to the following:

(let ((some-function
       (funcall (another-decorator 5)
                ((lambda (fn)
                   (lambda (&rest args) (apply fn args)))
                 (lambda (x) (print x))))))
  (defun some-function (x)
    (funcall some-function x)))

That is, essentially just keep creating decorator functions and call them in the order they are listed until you get to the decorated function.
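The nesting itself can be tried at run time with ordinary functions, before any reader tricks get involved. In this sketch apply-decorators, doubling and incrementing are made-up names; the first decorator in the list ends up outermost, which mirrors how stacked decorators behave in Python.

```lisp
(defun apply-decorators (decorators function)
  "Wrap FUNCTION in each of DECORATORS, the first decorator being outermost."
  (reduce #'funcall decorators :from-end t :initial-value function))

(defun doubling (fn)
  "Decorator: double whatever FN returns."
  (lambda (&rest args) (* 2 (apply fn args))))

(defun incrementing (fn)
  "Decorator: add one to whatever FN returns."
  (lambda (&rest args) (1+ (apply fn args))))

;; doubling wraps incrementing wraps +, so (1 + 2 + 1) * 2
(funcall (apply-decorators (list #'doubling #'incrementing) #'+) 1 2)
;; => 8

;; incrementing wraps doubling wraps +, so ((1 + 2) * 2) + 1
(funcall (apply-decorators (list #'incrementing #'doubling) #'+) 1 2)
;; => 7
```

An empty list of decorators simply returns the original function.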

To get going, I wrote a small function that rebound the readtable to a local copy and set the dispatch function to use:

(defun test-readtable-thing ()
  (let ((*readtable* (copy-readtable nil)))
    (set-dispatch-macro-character #\# #\@ #'|#@-reader|)
    ;; Read one form, decorators included, from standard input
    (read)))

What this will do is set the read function to call |#@-reader| when #@ is encountered. So now it might help to come up with some algorithm for how the |#@-reader| reader would do its work:

  1. Parse all the decorator representations (symbol, lambda, function call)

  2. Parse the decorated function

  3. Generate a new function that is created by successive application of each decorator function

Simple enough eh? Except when you have more than one decorator, the reader will call your dispatch function recursively. So we must disable that by temporarily rebinding the dispatch character to a simpler function. After this little tricksy bit, the rest is pretty mechanical. So without further ado, the actual code:

(defun |#@-reader-aux| (s c n)
  (declare (ignore c n))
  "Reads the function and returns a list with the
first element being hash-at and the second element being
the actual object following #@"
  (list 'hash-at (read s t (values) t)))

(defun |#@-reader| (s c n)
  (declare (ignore c n))
  (let* ((first-decorator (read s t (values) t))
         (decorators (list first-decorator))
         (*readtable* (copy-readtable nil)))
    ;; On the first #@ encountered, reset the readtable to use the
    ;; aux function which does not recur.
    (set-dispatch-macro-character #\# #\@ #'|#@-reader-aux|)
    (let* ((decorated-function
            (loop do
                 ;; it is a decorator if it is a list
                 ;; form with the first element being
                 ;; hash-at
                 (let ((x (read s t (values) t)))
                   (if (and (listp x)
                            (equal (first x) 'hash-at))
                       (if (symbolp (second x))
                           (push `(lambda (fn) (,(second x) fn)) decorators)
                           (push (second x) decorators))
                       (return x)))))
           (function-name (second decorated-function))
           (function-args (third decorated-function))
           (function-body (cdddr decorated-function))
           (lambda-function
            `(lambda ,function-args ,@function-body)))
      `(let ((the-function
              ,(reduce #'(lambda (a b)
                           `(funcall ,a ,b))
                       (reverse decorators) :from-end t :initial-value lambda-function)))
         (defun ,function-name ,function-args
           (funcall the-function ,@function-args))))))

(defun test-readtable-thing ()
  (let ((*readtable* (copy-readtable nil)))
    (set-dispatch-macro-character #\# #\@ #'|#@-reader|)
    ;; Read one form, decorators included, from standard input
    (read)))

Cut and paste into your REPL and have fun with it! If you don't have a REPL, install SBCL for your platform and give it a run. Let me know if it actually works for you, if you try it! :-)

Edit: If you want to play with this as is, the easiest way is to type (test-readtable-thing) into the REPL and use (eval *) to evaluate the output once you take a look at what it generated. You can also use (eval (test-readtable-thing)). I will write a post that shows how to enable it for normal source code soon.

Edit: The code for enabling the syntax in source files is here

Python decorators in Lisp, Part 1

In some version of Python the community reached a consensus that decorators were a useful addition to the language. Decorators were implemented to encapsulate function transformation which usually took the following form:

def synchronized(lock):
    """Return a decorator that ensures the decorated function is only called when
    holding lock"""
    def decorator(fn):
        def the_fn(*args, **kwargs):
            lock.acquire()
            try:
                return fn(*args, **kwargs)
            finally:
                lock.release()
        return the_fn
    return decorator

def foo(bar):
    zonk(bar)
foo = synchronized(lock)(foo)

In the above example, the function foo is modified to ensure it holds a lock first before calling zonk(bar). Unfortunately, this transformation has very poor placement in terms of readability. If foo was very long, it would get lost in the noise as it is at the end. So a new syntax was proposed:

def synchronized(lock):
    # as before

@synchronized(lock)
def foo(bar):
    zonk(bar)

That placement is a lot better and once you understand what decorators are for, it is a lot more readable than the alternative.

I was reading through some Django code the other day (Reviewboard) and noticed that they were very heavy on usage of decorators. It is quite a handy tool it seems. So I got envious. I wondered why Lisp did not have this functionality. Is it not possible? Do you need to meet for months to put this into the Common Lisp standard? Thankfully, the answer is no. I will cover how I arrived at this in Part 2, perhaps later today, but here is the equivalent Lisp code:

(defun synchronized (lock)
  (lambda (fn)
    (lambda (&rest args)
      (with-lock lock
        (apply fn args)))))

#@(synchronized lock)
(defun foo (bar)
  (zonk bar))

I should also mention that it is more flexible than the Python equivalent:

#@(lambda (fn) (lambda (&rest args) (print "lalalalalal") (apply fn args)))
(defun foo (bar)
  (zonk bar))

Not that there is actually any use for it besides "I should also mention" purposes.

Pretty neat eh?

An interesting quote from PEP 318:

It's not yet certain that class decorators will be incorporated into the language at a future point. Guido expressed skepticism about the concept, but various people have made some strong arguments [28] (search for PEP 318 -- posting draft) on their behalf in python-dev. It's exceedingly unlikely that class decorators will be in Python 2.4.

Thank goodness Guido will not stand in my way ;-)

Edit: I am fully aware that this is not a Lisp idiom. I just wanted to see if it could be done, more than anything.

Edit: I have added part 2 here.

Saturday, 15 December 2007

Programmer's shopping list

Every red-blooded programmer must splurge now and then on some technical books that have nothing to do with their current work. For me, this is typically vacation and Christmas time. While I am regularly purchasing on-topic business and technical books during the year, this year, I am spoiling myself on:

A bit schizophrenic, I admit, but I love Lisp and Rails seems to be getting used everywhere nowadays. I can't say I disagree with the value proposition of Rails, as I see it:

You know how you always do the same damn thing for every single web application? Well this framework does it all for you in a way that works for everyone.

Nothing like solving non-problems to get you thinking about your real problems!

Someone wise once told me that it is good to get into the habit of purchasing books regularly. Unfortunately, I seem to stick to non-fiction, or at least factual books. I would like to get into more political or historical readings, but don't know any good ones!

Friday, 14 December 2007

Easy way to give a donation

This year, BC Hydro is proud to support the BC Children's Hospital in their quest to change the lives of kids in need. Just like with energy conservation, every little bit makes a difference. Each time our holiday card is viewed, we will make a donation to BC Children's Hospital. To view and hear the e-card, visit the following link:

Don't forget to turn the volume up!

I'm too good for that!

This is something that has interested me for the longest time: why do so-called experienced developers claim that their time shouldn't be spent working on the build system?

The best developers I know, know their build systems inside out. The best among these have written their own. Remember though, correlation is not causation. Anecdotal correlation is probably the worst kind though :-)

To me, the build system is like the conductor in an orchestra. It defines the possibilities and the boundaries. It directs what you can do. Deviation from this breaks the harmony and you get a sub-optimal result.

Why anyone would not like to direct the development process in this way is beyond me. I guess the guys whose time should be better spent elsewhere are probably the same guys who complain a lot about things in general.

Companies that invest in a suitable system for builds, even if they write their own, will be much better off in the long run. Of course, you should only write your own once you are making some money!

Wednesday, 12 December 2007

SCons: Extra warnings on Windows

The next checkpoint release of SCons will contain a change that warns users about the unreliability of -j on Windows if the Python win32 extensions are not installed.

The problem is that in a parallel build, if you create/read/modify a file in a Python action, and a command-line action is spawned, the command-line action inherits Python's open file handles and can keep them open, which may cause subsequent failures.

I followed the discussion on the mailing lists and thought it was quite cool how they solved the problem.

Monday, 10 December 2007

Use the source, Luke

The difference between proprietary software and commercial software is subtle. Proprietary software is essentially software that is usually for sale but which the user is not allowed to reverse engineer or modify for any honest purpose. These restrictions are usually laid out in pages of legalese that you either click-through and never read, or you read and die before you finish. Commercial software is software that is usually for sale.

Why do people consciously choose to close their software to the users in this manner? I think there are three reasons:

  • The code really sucks.

  • Trade secrets.

  • Everyone else does it.

I feel that only the first is a legitimate reason. If the only thing preventing you from distributing your source code is a trade secret, then you are on thin ice anyway. Even so, I'm on the fence about trade secrets. If the code really does suck, then distributing the source could really harm your reputation, which isn't worth it. Providing good service and support is the best you can do for your customers in this instance, at least until you can rewrite it.

The trick is to notice that not all commercial software has to be proprietary. The restrictions do not have to be so onerous that you are afraid to look at an error message for fear of knowing what file it originated from. For my customers, I want them to feel that the software helps them rather than restricts them. Open source software is geared towards giving the user of the software the freedom to at least look at and modify the source.

The license of licenses, the GPL, goes a bit further. It lets you reuse the software for any purpose, provided you make your modifications available when you distribute them. This opens up the possibility that someone may compete with you using your own software. This has happened, for example, with Red Hat and CentOS (thank you for choosing unique names, it makes analyzing the trends so much easier!) Is that a problem for Red Hat? I'm not sure, but their revenues have doubled in the last couple of years. They surely aren't dying and they must be having a good time.

But Red Hat is a bit different, aren't they? They don't provide a single piece of software, they provide a union of TONS of software. What about the really small guys? I'll just use the term uISV, for Micro-ISV.

Compared to their huge, monolithic counterparts, I think uISVs are different in one very important way: they genuinely care about software and solving hard problems for their customers. And I think here is where having the source available for the users can be important. If one of your customers needs to port your software to the Xbox 360, but you have neither the expertise nor the economic inclination to do so, your customer should be allowed the right to do this. Even further, they should be encouraged to submit their changes back to you, perhaps through some discount-on-next-version incentive program, or simply because then they don't have to maintain their patches.

So my license would allow the customer to:

  • Use the software

  • Modify the software for their own purposes

  • Submit their modifications back to me, if they feel it is beneficial to them

I would specifically prohibit the redistribution of my software in source or binary form because I'm not sure what's in it for me.

I think the above would work for 99% of paying customers. It extends the software support spectrum just a little bit more, which makes it more useful for them, which gives you (possibly) happier customers.

See Up the tata without a tutu by Joel Spolsky for another discussion of this subject. I don't know where he gets these titles from.

Saturday, 8 December 2007

Is the programming language/technology important (anymore?)

A comment in an earlier post got me to thinking whether the technology used to develop an application is important anymore.

I believe that what really matters is not the length or conciseness of code, but the functionality made available to your users. Yet, correct and judicious application of a technology can make or break your product.

So yes, I do believe the technology behind an application is important. This about sums up my thoughts on the subject:

A million mediocre programmers, pair-programming at five-hundred thousand mediocre computers, using mediocre software will never produce anything that is not mediocre. However, a few excellent programmers, programming on their own machines, using open source software, will eventually produce Linux

Friday, 7 December 2007

Update: (Weblocks) Doing first-time setup for a web-app

Just some updates to the earlier post.

As I mentioned, if you want to do model validation with Weblocks, you currently have to write it yourself. But the choice of method to specialize was sub-optimal. Ideally you would want your model to be validated no matter how you were modifying it, say with a gridedit or a dataform. So instead of hijacking dataform-submit-action, I should really have created an around method for update-object-from-request. This is below:

(defmethod weblocks:update-object-from-request :around ((data create-login)
                                                        &rest args)
  (declare (ignore args))
  (multiple-value-bind (success failed-slots)
      (call-next-method)
    (if success
        (validate-new-login data)
        (values success failed-slots))))

validate-new-login will return multiple values in the same manner as the generic update-object-from-request: a boolean value indicating success and a list of errors, if failure was indicated. It looks like this:

(defun validate-new-login (new-login)
  "If the Weblocks validation succeeds, then all required values
are already there, so we only need to check the consistency. Returns
(values success errors) where success is t if there were no errors otherwise
success is nil and errors is an association list of (slot . error)"
  (let (errors)
    (validate-slot "password"
                   (equal (password new-login)
                          (verify-password new-login))
                   "Both passwords must match!")
    (values (null errors) errors)))

And validate-slot is an extremely unhygienic macro that looks like:

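(defmacro validate-slot (slot-name predicate error-message)
  ;; Unhygienic on purpose: pushes onto the caller's ERRORS variable.
  `(unless ,predicate
     (push (cons (attributize-name ,slot-name) ,error-message)
           errors)))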

Pretend you didn't see it.

Which brings me to my next update. Weblocks nicely renders all the errors for you. Unfortunately, I made that part really difficult. Previously, my use of continuations would always create a new dataform, which would then not show any errors that needed to be displayed. Since the continuations are stored in the widgets themselves, I should have just yielded the dataform widget itself and been done with it:

(defun setup-admin-password ()
  (make-instance 'dataform
                 :name "create-login"
                 :data (make-instance 'create-login)
                 :ui-state :form
                 :allow-close-p nil
                 :on-success (lambda (self)
                               (answer self))))

Have a good weekend!

Thursday, 6 December 2007

SVN: Mergin' ain't easy (but somebody gotta do it)

An update on Subversion 1.5. I haven't seen it advertised much, for obvious reasons, but it may be the case that Subversion 1.5 ships without merge tracking or with very limited merge tracking.

From a posting on Subversion merge tracking:

I think you're referring to the new merge tracking features of 1.5. By following Subversion-devel, I can tell you that it's not even clear at this point what kind of merge support should be expected in 1.5 and what will be deferred to 1.6 or even later.

Why this is the case is beyond me. Collabnet has had a merge tracking beta for quite some time now. My guess is that they haven't had enough feedback. Regardless, you can count on a quality implementation once the feature is actually released.

If you are currently using Subversion, I think the best choice is still doing whole-tree merges. Cherry-picking should only be used for release branches.

With a well-defined process, you barely notice the lack of merge tracking. Except when someone screws it up!

Cats: Is there anything they can't do?

Two cats + a laser pointer = hours of fun.

My favourite moves:

  • The zig-zagging cat

  • The wall climbing cat

  • Head-on colliding cats

Ah cats. Always good for a laugh.

Update: To host or not to host...

Mr Software Blog, Joel Spolsky, weighs in on the debate: Where there's muck, there's brass.

Wednesday, 5 December 2007

Weblocks: Doing first-time setup for a web-app

After you read this, there is an update here!

So one of the things you want to do when you start an application is configure it. Ideally you wouldn't configure anything, but sometimes, at the very least you need an administrator login to be set up. I will talk about how to do just that using Weblocks. I assume you have already installed Weblocks and have created an application using weblocks:make-application.

First, I figured I wanted to store application configuration somewhere. To start with, I decided to just use the simple associative-container that comes with cl-containers (use ASDF to install it.) Then I added a bunch of configuration-related functions that would encapsulate the storage somewhat:

(in-package #:myapp)

(defparameter *config*
  (make-container 'associative-container))

;; Configuration variables:
;; config-first-config-complete-p: Whether the first configuration has been completed or not

(defun load-config-from-file (filename)
  (declare (ignore filename))
  ;; just set some defaults for now
  (set-config-value 'config-first-config-complete-p nil))

(defun config-value (name)
  (item-at *config* name))

(defun set-config-value (name value)
  (setf (item-at *config* name) value))

When you called weblocks:make-application, that created the file myapp.lisp which contained the code that starts and stops your application. Insert a call to the function load-config-from-file there with some dummy argument for the filename for now (you will have to fill that in later - hint: use cl-store).

So now you can get and set arbitrary configuration values. The configuration key that I named above, namely config-first-config-complete-p is initially set to nil when the application starts for obvious reasons (hint: it is the topic of this post!)

Another file generated by make-application is init-session.lisp. If you are at all familiar with Weblocks, you know that the function in it, init-user-session, initializes the session for the connecting user. You are supposed to set up a bunch of widgets and let the client have at them.

This is where using continuations comes in really handy:

(defun init-user-session (comp)
  (with-flow (composite-widgets comp)
    (unless (config-value 'config-first-config-complete-p)
      (yield (list (first-time-setup))))
    (yield (list (homepage)))))

So unless the first-time configuration has been completed (which is determined by checking the configuration value at runtime), we yield the result of first-time-setup, which is obviously where the real magic happens.

I created another file, login.lisp, that I used to keep all the login logic. Right now, it only has the logic for creating a login but you can use your imagination. Anyway, the first-time-setup function looks like this:

(defun first-time-setup ()
  (let ((tree-comp (make-instance 'composite)))
    (with-flow (composite-widgets tree-comp)
      (yield #'setup-admin-password)
      (yield #'setup-done))
    tree-comp))

When yielding continuations in Weblocks, the continuation is stored in a widget. That is why we need to create the composite widget and use it with the with-flow macro.

When you create a login, the minimum pieces of information you need are usually the user name and the password. Typically, you also need to verify the password. We need to create a widget that will let us do this.

Weblocks comes with a widget called the dataform which nicely wraps up editing server-side data structures on the client. All you need to pass it is an instance of your class, and it generates the appropriate form. Quite nice, if you ask me.

So the data model that I used to store the login creation was unimaginatively called create-login. As you can see, it is a normal CLOS class and there is nothing suspicious about it:

(defclass create-login ()
  ((name :initarg :name
         :accessor name
         :initform nil)
   (password :initarg :password
             :type password ;; except this!
             :accessor password
             :initform nil)
   (verify-password :initarg :verify-password
                    :type password ;; and this!
                    :accessor verify-password
                    :initform nil)))

The reason that I gave an explicit type to the password slots of the class was because if we just let them be, then Weblocks renders the textbox representing the password as a text input, rather than a password input. We will need to use the type to override this behaviour.

I defined the password type using (deftype password () 'string).

When a class slot value is rendered to HTML, the function render-form-value is called. As mentioned before, we want to override this behaviour for the password type. We do this as follows:

(defslotmethod render-form-value ((obj create-login)
(slot-type (eql 'password))
&rest keys
&key (human-name slot-name)
;; Need to use attributize-name because thats what weblocks uses
;; as the key when reading the post parameters
(:input :name (attributize-name slot-name) :type "password")))

I love CLOS. Pay special attention to the call to attributize-name. It took me a while to figure that out!

So now, we need to actually create our widget that will let us add a login to our system. Actually, we are already done. The dataform does it for us:

(make-instance 'dataform :data login )

But what if the user just presses submit without actually entering any information? We should rap their knuckles for that, or at least give them a message. We can use the flash widget for that. Since this will be part of the adding-a-user action, we create a widget that contains a flash message:

(defwidget login-widget (dataform)
  ((login-message :initarg :login-message
                  :accessor login-message)))

Badly named, that should be create-login-widget but c'est la vie.

Way above, in first-time-setup, we yield the setup-admin-password continuation. That code looks like this:

(defun setup-admin-password (k)
  (let* ((widget (make-instance 'composite))
         (message (make-instance 'flash
                                 :name 'hi
                                 :messages
                                 (list "Hello! Welcome to myapp. Please create an administrator login")))
         (login (make-instance 'create-login))
         (get-password (make-instance 'login-widget
                                      :name 'create-login
                                      :data login
                                      :ui-state :form
                                      :allow-close-p nil
                                      :login-message message
                                      :on-success
                                      (lambda (&rest args)
                                        (declare (ignore args))
                                        (answer k)))))
    (setf (composite-widgets widget)
          (list message get-password))
    (render-widget widget)))

We need to call render-widget because the function becomes a widget when you yield it. A little subtlety that I only came across by trial and error (and help from the mailing list of course!) The key thing to note is that we only return from the continuation (i.e., call answer) if the form submits successfully, passing all validation.

By default, Weblocks does very limited validation of form submissions. For example, it can validate whether there are any missing slots that are required. But in this case, we need to make sure that, for example, the password and the verify-password slot values match exactly. This validation takes place when the form is submitted, and Weblocks calls the function dataform-submit-action. If you haven't guessed, we need to override this function and add our own validation:

(defmethod dataform-submit-action ((obj login-widget)
                                   (data create-login)
                                   &rest args)
  (multiple-value-bind (success failed-slots)
      (apply #'update-object-from-request data args)
    (check-login-and-flash-messages data (login-message obj))))

Quite simple. The function update-object-from-request updates the data model (i.e., the create-login instance) and returns t when everything succeeded, or (nil failed-slots) if something failed. For some reason, I ignored the fail case. Go figure. The check-login-and-flash-messages function then does the actual validation, adds a bunch of messages to the flash object (referenced via (login-message obj)), and returns t if everything was ok, nil otherwise.

If this function returns t, then Weblocks considers that the submission has succeeded and calls the on-success function, which we neatly set up to return from the continuation.

In real life, you would obviously add the actual user to some database, but that is essentially the meat of what I did. In the end, you get something like the following:

Let me know what you think.

If I had a million dollars...

Tax-free of course. Otherwise the title would be: If I had 12 dollars :-)

I would spend some non-trivial amount funding the development of Weblocks and then write some kick-ass apps with it.

In my very humble opinion, I think that Weblocks is the way web apps should be written. It enforces modern web design practices in that there is not much futzing with HTML (unless you are adding UI widgets) and CSS is where you do your layout. Think about it. This way, your apps are "skinnable" from the get-go.

I've had my frustrations with it of course, due to it being a very young framework. But wow, it is pretty damn good given the scarce resources that have been responsible for its development. Once Slava gets the object store integrated into it, watch out.

By comparison, I've been looking at Django and it makes me cringe, even though it is probably one of the best frameworks out there.

Anyway, I'd take half of the rest of the money over to my friend Asadullah in securities... (If you don't get the reference, go watch Office Space!)

Tuesday, 4 December 2007

Does Java cause self-delusion?

Does Java cause self-delusion?

I am only the messenger.

But the answer is yes.

Web browsing with Emacs

No seriously.

I'm tired of switching between Emacs and my web browser. So here is what I did:

$ cd /path/to/scratch/space
$ wget
$ cd emacs-w3m-1.4.4
$ cat > w3m-e23.el
;; Explanation: in Debian (at least), w3m-el tries to "require" a file
;; w3m-e23.el when run from inside GNU Emacs 23.x.x

(require 'w3m-e21)
(provide 'w3m-e23)
$ make install

And then, in my .emacs:

(when (locate-library "w3m-load") (require 'w3m-load))

Thanks to #emacs for the info.

Update: Disconnected Source Control

Apparently there are quite a few solutions to working offline with Subversion:

Yay for open source, I tell you that!

Monday, 3 December 2007

To host or not to host...

That is the question... I recently came across the post installable software on the 37signals blog which has got me thinking. If I were using some mission-critical application, would I trust my jack-of-all-trades IT to handle it, or someone who really understands the system? When the question is posed in that manner, obviously the latter.

So as far as I am concerned, it comes down to one thing: performance. Are you selling a web app? If so, host it and you are done. I think there is no intelligence involved in that decision.

But what if you are selling something in which a web interface is only one part? A silly example: game servers. With most multi-player online games, there is a server gaming component as well as a web component for administration. By the way, people do make some money running tournaments so it isn't a bunch of kids wasting their time.

So the question is, would I pay someone to host my game server that I charge for over the Internet? If the performance was sufficient, and it usually is, I think the answer is yes.

I don't think there are many arguments against hosting or perhaps I have blinded myself to them. The comforting feeling I get knowing that I can control what people see and their upgrade experiences makes me feel warm and fuzzy. Still, I can't help but think that someone needs to blog about "Installable Software" and why it is better than "Hosted Software".

It is interesting to note that Mr. Software Blog himself, Joel Spolsky, has started a hosted service for FogBugz.

New speakers

I just snagged a new set of Altec Lansing speakers for my computer. The bass in these bad boys is awesome.

What was interesting in this purchase was that the speakers were not set up so you could listen to them in the store. Apparently the other manufacturers pay the store to get them set up so you can listen to them. A bit of unauthorized rejigging of wires and I was previewing the sound.

Now for another few years of good quality sound.


How regexes work

Essential for any programmer: How regexes work. If I recall correctly, I did this in second year of university and thought it was awesome. Now you may think it is awesome.

Disconnected source control

Recently, the Linux kernel went from using a proprietary source control system to an open source, so-called distributed source control system. Of course, the initial version of the new system was written by none other than Linus Torvalds. The software is known as Git. I don't have much direct experience with distributed source control myself, but I am watching a few people use it regularly. I think Git definitely fills a huge niche for the open source model. It encourages forks and merging of the best forks. It is a natural evolution in open source development.

But a lot of us don't have a need to create a fork of any random software package. Most of the time, we work in project teams that are hand-picked, not random. In this case, we mostly need centralized source control and more and more, we use Subversion (or Perforce, if you're into that sort of thing.)

Increasingly, I am finding that I really need to check in my changes but I am nowhere near the Internet (really!) or my server. I am searching for a solution to the problem. Many people claim that I really want distributed source control. No, I don't think I do!

What I do want is disconnected operation. An "offline-mode", if you will. This is how I would do it, say for Subversion. From the user's perspective:

$ svn up $DIR --offline

This command would create a local repository that I could check into while I was "disconnected", rooted at $DIR. The first revision would correspond to the latest revision that I had of each file. Then checkins would create deltas from this revision, but they would be local.

Once I am connected, I would want to type:

$ svn resync $DIR

To send all my changes back to the centralized server.

What it seems I really want is SVK. I've heard some good things about it. I suppose I will check it out (no pun intended!)

Sunday, 2 December 2007

You lazy bastard, Part 1.

Or "Portable, thread-safe, lazily initialized singletons with Boost."

Portable, efficient, lazily initialized thread-safe singletons are something that is needed fairly often in the C++ wild. I don't intend to cover why you are insane for wanting this. I intend to cover a naive solution that should work, but doesn't, and another solution that should work, given my information. I believe this solution addresses three of the four characteristics; efficiency is the one that is dropped. I prefer correctness, and programmer time is important (and good programmers are expensive!)

To cover why you are insane for wanting these characteristics in the first place, I refer you to this and this.

Now, you may ask what makes me so special as to pretend to be an authority on this. Let me clarify: I don't pretend to be an authority. I am not. I have walked through this minefield a little bit and I will mark the mines for you as best I can. But if I may boast a little bit: I did find a bug in Open Solaris's pthread_once implementation (on x86) and another thread-safety issue in Boost Serialization. The reason I point these bugs out is not that the authors were deficient or incompetent. They are quite the opposite. I point these out because I feel that people don't realize what things are waiting to bite them in the butt when it comes to this area of software development. Indeed, it is easy to point the finger "haha, you made a bug," but we don't like those people around here. I have a shotgun and some bullets reserved for you if you insist on staying. Anyway, if our experts can make this mistake, you have made it and you don't even know about it.

If by now you haven't read through the above linked pages, I suggest you do so now. I don't have the writing capacity to not blather on like an idiot and repeat them to you. Still here? Go! Use the tabbed feature of your browser. If your browser doesn't have tabs, I have another box of bullets for you!

Done? Good.

So, quite often, we think we want a single instance of some object but we don't want the object to be constructed on program startup. The simplest way to ensure this is the following:

MyObject & seductive()
{
    static MyObject t;
    return t;
}

That is a very seductive pattern, and if you are not concerned about thread-safety (i.e., you don't have multiple threads), then you are done. You may go Google Britney Spears. This is also known as the Meyers singleton. Don't know which Meyers. Don't actually care :-)

Now, what happens when multiple threads enter seductive() at the same time? You guessed it, you can get multiple initializations and all sorts of bad stuff happening. If you didn't guess that, then you definitely didn't read this.

All right. What the fudge? What are we supposed to do?!! I say: drop optimal efficiency as a requirement. You are probably copying a huge vector somewhere anyway before writing it to a file. That opens up many new worlds for solving this problem. The most straightforward, using Boost Threads functionality (it won't compile, I'm sure):

static boost::mutex mtx;
MyObject & looks_sexy_but_isnt()
{
    boost::mutex::scoped_lock lock(mtx);
    static MyObject t;
    return t;
}

Even if it does compile, it won't work. "Hang on," you say. "That will work just fine. You're crazy. I'm going to go Google Britney Spears." You are wrong. That will not work. It will only appear to work kind of most of the time. Why? The reason is:

Boost Mutex is not statically initializable

What the heck does that mean? The details are a bit fuzzy in my head, but the bottom line is that the C++ standard does not guarantee that the mtx variable is initialized before main(). Essentially, if a datatype has a constructor, you are SOL. ES. OH. EL. I don't know why the heck they didn't make the mutex statically initializable, but I guess it starts with "Win" and ends with "dows". I know pthread mutexes are statically initializable by design.
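For contrast, here is a rough sketch of what "statically initializable" buys you on a POSIX-only build (so not the portable answer this post is after). The mutex is set up with the constant PTHREAD_MUTEX_INITIALIZER, so no constructor has to run before it can be locked. MyObject here is a hypothetical stand-in that counts its constructions, and locked_instance and check_single_instance are names made up for this illustration.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical stand-in for the post's MyObject; counts constructions.
struct MyObject {
    static int constructions;
    MyObject() { ++constructions; }
};
int MyObject::constructions = 0;

// Constant initializer: the mutex is usable before any constructor runs.
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

MyObject & locked_instance()
{
    pthread_mutex_lock(&mtx);
    static MyObject t;   // the first caller constructs it while holding the lock
    pthread_mutex_unlock(&mtx);
    return t;            // note: if MyObject() throws, the mutex stays locked
}

// Small self-check: two calls give the same object, built exactly once.
void check_single_instance()
{
    MyObject & a = locked_instance();
    MyObject & b = locked_instance();
    assert(&a == &b);
    assert(MyObject::constructions == 1);
}
```

Every call pays for a lock and unlock, which is the efficiency we agreed to give up.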

So what the fudge? We are still screwed. Yep. Pretty much.

So let us recap our problems:

  • If we use a mutex, it must be statically initializable

  • Double checked locking is broken except when the moon is blue and you are standing on your tippy toes

  • Popcorn gives me gas

If we decide to use Boost Threads (a fine choice, but sometimes limited), it has the first problem and there is no general, portable solution to the second. The third involves less popcorn.
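If you have never seen double checked locking, this is roughly its shape. The sketch assumes a C++11 compiler, which postdates this post: std::atomic and std::mutex are what make the two checks well defined here. With a plain pointer and no memory ordering (the classic form), the first check can see a non-null pointer before the object is fully constructed. Widget, dcl_instance and check_dcl_instance are hypothetical names for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Widget { int value = 42; };   // hypothetical payload type

static std::atomic<Widget *> g_instance{nullptr};
static std::mutex g_mutex;           // std::mutex has a constexpr constructor

Widget & dcl_instance()
{
    // First check, without the lock. The acquire load pairs with the
    // release store below, so a non-null pointer implies a built Widget.
    Widget * p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_mutex);
        // Second check, under the lock: another thread may have won.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Widget;          // intentionally never deleted
            g_instance.store(p, std::memory_order_release);
        }
    }
    return *p;
}

// Small self-check: repeated calls return the same object.
void check_dcl_instance()
{
    assert(&dcl_instance() == &dcl_instance());
    assert(dcl_instance().value == 42);
}
```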

Ah, but there is a silver lining. A couple of years back, I was at SD West when Scott Meyers was giving his "Double checked locking (DCL) is broken" talk. Most of the crowd had no flipping idea what DCL was (I sure didn't) but Meyers has this ability to communicate that I envy. So by the end of the talk, everyone knew what he was talking about and people were discussing ways to make it work. David Abrahams, of Boost fame, said: "Why don't you just use pthread_once?" And I thought: "Duh!" So the idea is not mine, but the implementation is! I present to you "Captain Sabraham's (or Dohail's) Singleton" :-)

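#include <boost/noncopyable.hpp>
#include <boost/thread/once.hpp>

template<typename T>
struct singleton : private boost::noncopyable
{
    static T & instance()
    {
        // Boost 1.35+ argument order; 1.34 and earlier take (function, flag).
        boost::call_once(once_, call_me_once);
        return instance_helper();
    }
private:
    static boost::once_flag once_;
    static void call_me_once()
    {
        instance_helper();   // forces construction of the static exactly once
    }

    static T & instance_helper()
    {
        static T t;
        return t;
    }
};
template<typename T>
boost::once_flag singleton<T>::once_ = BOOST_ONCE_INIT;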

boost::once_flag is statically initialized with BOOST_ONCE_INIT.
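The same idea can be sketched without Boost, assuming a C++11 compiler: std::once_flag has a constexpr constructor (so it is statically initialized) and std::call_once plays the role of boost::call_once. This is an illustration under that assumption, not the code from the talk. std_singleton, Counter and constructions_after_race are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

template <typename T>
class std_singleton {
public:
    static T & instance()
    {
        std::call_once(once_, &std_singleton::create);
        return *ptr_;
    }
    std_singleton() = delete;
private:
    static void create() { ptr_ = new T; }   // runs once; never deleted
    static std::once_flag once_;
    static T * ptr_;
};
template <typename T> std::once_flag std_singleton<T>::once_;
template <typename T> T * std_singleton<T>::ptr_ = nullptr;

// Hypothetical payload that counts how many times it gets constructed.
struct Counter {
    static std::atomic<int> constructions;
    Counter() { ++constructions; }
};
std::atomic<int> Counter::constructions{0};

// Hammer instance() from several threads and report the construction count.
int constructions_after_race(int n_threads)
{
    assert(n_threads > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back([] { std_singleton<Counter>::instance(); });
    for (auto & t : threads)
        t.join();
    return Counter::constructions.load();
}
```

However many threads race into instance(), the payload should be constructed a single time.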

Now we did drop efficiency, and that makes you very very sad, Mr Premature Optimizer. There is good news and bad news. The bad news is that the Boost Thread implementation of boost::call_once is slow on Windows. The good news is that it is fast on everything else.

Oh by the way, I lied. There is no part 2.

Disclaimer: This code may not work at all. You are free to not use it.