cool application of HOF


Date: Mon, 7 Oct 2002 12:30:29 -0500 (CDT)
From: Will Benton <willb@xxxxxxxxxxx>
Subject: cool application of HOF
Folks--

This is a cool note from comp.lang.ml; it came out of a question of
"what are higher-order functions good for" and presents a lexer
generator in under a page of code.  I thought it was neat, anyway.



best,
wb

-- 
Will Benton      | "The right method of philosophy would really be 
willb@xxxxxxx    |  this: to say nothing except what can be said...."


To: comp-lang-ml@xxxxxxxxxxxxxxxxxx
Subject: lexer generator in four functions
From: Thant Tessman <thant@xxxxxxx>
Date: Mon, 07 Oct 2002 08:08:53 -0600
Delivered-To: leaf+ml@xxxxxxxxxxxxxxxxx
Newsgroups: comp.lang.ml
Organization: XMission http://www.xmission.com/
Resent-Date: Mon, 07 Oct 2002 12:29:48 -0400
Resent-From: Leaf Eames Petersen <Leaf_Eames_Petersen@xxxxxxxxxxxxxxxxxxxx>
Resent-Message-Id: <200210071638.g97Gckt27213@xxxxxxxxxxxxxxxxxxxxx>
Resent-To: sml-redistribution@xxxxxxxxxx
Sender: news <news@xxxxxxxxxxxx>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1b) Gecko/20020721


Okay, I finally spent a couple of hours this weekend reassembling my 
favorite concise example of why higher-order functions are so cool. It 
is a lexer generator composed at its core of only four small functions. 
The programmer assembles these functions into regular-expression 
pattern-matching functions.

The idea is that a pattern-matcher function takes a list of streams and 
returns a new list of streams, advanced in every way the pattern allows. 
Note that the number of streams returned by the function typically won't 
match the number of streams passed in. If the pattern doesn't match at 
all, the empty list is returned. In this implementation, a stream is 
simply a tuple containing a list of the characters consumed so far by 
the pattern matcher and a list of the characters not yet consumed.
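
To make this concrete, here is a sketch of the types involved (the code 
below never actually declares these names; it just works on the bare 
tuples):

   type stream  = char list * char list   (* (consumed, not yet consumed) *)
   type matcher = stream list -> stream list

Note that the consumed characters pile up in reverse order, since each 
match conses a character onto the front of the list; the 'longest' 
function near the end reverses them before handing back a result.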

I've added a lot of explanatory comments, but it's worth noting that 
without them this thing fits on a single page.


(* The first function 'tok' builds a pattern matcher function
    that matches a single specified token (character). *)

   fun tok t =
     let fun f (s,h::r) = if h=t then SOME (h::s,r) else NONE
           | f _ = NONE
     in List.mapPartial f
     end
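
For instance (output worked out by hand, shown as you would see it at 
the prompt), the matcher for #"a" advances the stream whose next 
character is an 'a' and drops the one whose next character isn't:

- tok #"a" [([], [#"a",#"b"]), ([], [#"x",#"y"])];
val it = [([#"a"],[#"b"])] : (char list * char list) list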


(* This matches a sequence of patterns. *)

   fun seq rules streams =
       foldl (fn (f, s) => f s) streams rules
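
For example, matching an 'a' and then a 'b' (again worked out by hand; 
note the reversed consumed list):

- seq [tok #"a", tok #"b"] [([], [#"a",#"b",#"c"])];
val it = [([#"b",#"a"],[#"c"])] : (char list * char list) list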


(* This matches any of a list of patterns. It's analogous to
    a series of patterns separated by the '|' in traditional
    regular expressions. *)

   fun bar rules streams =
       List.concat (map (fn f => f streams) rules)
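
For example, given two input streams, each alternative advances the 
streams it can, and the results are concatenated (worked out by hand):

- bar [tok #"a", tok #"b"] [([], [#"a",#"x"]), ([], [#"b",#"y"])];
val it = [([#"a"],[#"x"]),([#"b"],[#"y"])] : (char list * char list) list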


(* Kleene closure. Analogous to '*' *)

   fun star rule streams =
       let fun loop streams =
           case (rule streams) of
                    [] => []
                  | results => results::(loop results)
       in List.concat (streams::(loop streams))
       end
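
For example, 'star' returns a stream for every prefix it can consume, 
including the empty one (worked out by hand; the prompt may wrap the 
output differently):

- star (tok #"a") [([], [#"a",#"a",#"b"])];
val it = [([],[#"a",#"a",#"b"]),([#"a"],[#"a",#"b"]),([#"a",#"a"],[#"b"])]
  : (char list * char list) list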


(* The rest of these are built from the previous four and
    are provided for convenience. *)


(* Positive closure. Analogous to '+' *)

   fun pos rule = seq [rule, star rule]


(* Optional pattern. Analogous to '?' *)

   fun opt rule = bar [rule, fn x => x]
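
For example, 'pos' requires at least one match while 'opt' allows zero 
or one (worked out by hand):

- pos (tok #"a") [([], [#"a",#"a",#"b"])];
val it = [([#"a"],[#"a",#"b"]),([#"a",#"a"],[#"b"])] : (char list * char list) list

- opt (tok #"-") [([], [#"-",#"1"])];
val it = [([#"-"],[#"1"]),([],[#"-",#"1"])] : (char list * char list) list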


(* Range of characters. Analogous to the dash within
    a character class '[]' *)

   fun range (a,b) =
       if b<a then range (b,a) else
       let fun loop i = if (i<=Char.ord b)
                        then (tok (Char.chr i)) :: loop (i+1)
                        else []
       in bar (loop (Char.ord a))
       end
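
For example (worked out by hand):

- range (#"0",#"2") [([], [#"1",#"x"])];
val it = [([#"1"],[#"x"])] : (char list * char list) list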


(* Matches a literal string specified by 's' *)

   fun lit s = seq (map tok (String.explode s))


(* Matches any of a set of characters. *)

   fun set s = bar (map tok (String.explode s))
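
For example (worked out by hand):

- lit "if" [([], [#"i",#"f",#" "])];
val it = [([#"f",#"i"],[#" "])] : (char list * char list) list

- set "+-" [([], [#"-",#"5"])];
val it = [([#"-"],[#"5"])] : (char list * char list) list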


(* The next two functions demonstrate the use of the functions
    above. This first one takes the streams produced by applying
    a pattern to a stream (or streams) and selects the longest
    match. *)

   fun longest streams =
       let
           fun loop [] = (0,([],[]))
             | loop ((eaten, food)::rest) =
               let
                   val (max,stream) = loop rest
                   val l = List.length eaten
               in
                   if l<max then (max,stream)
                   else (l, ((List.rev eaten), food))
                end
           val (_,stream) = loop streams
       in
           stream
       end
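
For example, given two result streams with one and two characters 
consumed, 'longest' picks the second and puts its consumed characters 
back in left-to-right order (worked out by hand):

- longest [([#"a"], [#"b",#"c"]), ([#"b",#"a"], [#"c"])];
val it = ([#"a",#"b"],[#"c"]) : char list * char list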


(* This takes a rule and a string, turns the string into
    a list of streams (containing one stream), applies the
    rule, and returns the longest match. *)

   fun lex rule s = longest (rule [([],String.explode s)])


(* A demonstration of the above. Here's a pattern to match
    floating point numbers. *)

(* "-"?(([0-9]+(\.[0-9]+)?)|(\.[0-9]+))([eE][+-]?[0-9]+)? *)

   val digit = range (#"0",#"9")
   val digits = (pos digit)
   val mantissa = seq [tok #".", digits]
   val exp = seq [set "eE", opt (set "+-"), digits]
   val real = (seq [opt (tok #"-"),
                    (seq [bar [seq [digits, opt mantissa], mantissa],
                          opt exp])])

With the above defined, you can do this:

- lex real ".45e-5";
val it = ([#".",#"4",#"5",#"e",#"-",#"5"],[]) : char list * char list

- lex real "hi there";
val it = ([],[]) : char list * char list

And so on...


I think a full-fledged lexer generator in a single page of code is a 
good example of the power of functional programming over imperative/OO 
approaches.

-thant




