Reiner Pope

July 19, 2009

Building first-class patterns, continued.

Filed under: pattern-combinators — Reiner Pope @ 1:50 am

This post is in Literate Haskell. The entire contents can be copied into a file called PatternMaybe.lhs and will be compilable by GHC.

The code from this post is available on Hackage, in the first-class-patterns library.

In my previous post I developed difference type-lists with non-recursive append. Using that framework, we can solve the efficiency problem of my first attempt at nicer types for pattern combinators. At the same time, I find the implementation here easier to understand than the original one.


> {-# LANGUAGE TypeFamilies, TypeOperators, Rank2Types #-}
> module PatternMaybe (
>     Tuple, zero, one, (<>),
>     Fun,
>     Pattern(..),
>     Clause, (|||), match, (->>),
>     var, pair, mk1, left, right, cst,
>     test1,
>  ) where

We need the D type and the Difference class from the previous post

> import Difference
> import Control.Monad(mplus)
> import Data.Maybe(maybe)
> import Control.Applicative

(Curried) Functions

We define a type family of curried functions from xs to r, where xs is a type list. This is the same family as CurryFunc from my first post, but by a different name.

> -- | Curried functions.
> type family   Fun xs r
> type instance Fun Nil     r = r
> type instance Fun (h:*:t) r = h -> Fun t r

(CPSed) Tuples

We need tuples to store the variables bound by a pattern. If we CPS-encode the standard tuple type you get the following type, Tuple', for tuples. We then use the Difference machinery (under the D type) we developed in the previous to give Tuple' an efficient append, which is the Tuple type.

> newtype Tuple' xs = Tuple' { runTuple' :: forall r. Fun xs r -> r }
> newtype Tuple xs = Tuple (D Tuple' xs)

The following functions manipulate Tuples:

> zero :: Tuple Nil
> zero = Tuple zeroD
> one :: a -> Tuple (a :*: Nil)
> one a = Tuple (mkOneD (\(Tuple' t) -> Tuple' (\k -> t (k a))))
> (<>) :: Tuple xs -> Tuple ys -> Tuple (xs :++: ys)
> (Tuple xs) <> (Tuple ys) = Tuple (xs `plusD` ys)
> runTuple :: Tuple xs -> Fun xs r -> r
> runTuple (Tuple t) = runTuple' (evalD (Tuple' id) t)

The first three are for building Tuples; the last is for using Tuples.

This representation of tuples as “differences” of CPS-encoded tuples is essentially the same as Morten Rhiger used in Section 3.2 of Type-safe pattern combinators.

The Pattern type

We can now define the Pattern type and Clause types. Unlike in Morten’s code (which I used in my first post), we handle failure using the Maybe monad.

> newtype Pattern vars a = Pattern { runPattern :: a -> Maybe (Tuple vars) }
> newtype Clause a r = Clause { runClause :: a -> Maybe r }

In the export list, the Pattern type exposes its implementation. This is safe: the Difference module defined in the previous post contains the trusted kernel and the only unsafe code.

The Clause type is exported abstract. The following three combinators are sufficient for all uses of Clause I am currently aware of; if new needs arise, a user can always define their own Clause type, because the underlying Pattern type is fully exposed.

> infix 2 ->>
> infixr 1 |||
> (->>) :: Pattern vars a -> Fun vars r -> Clause a r
> (Pattern p) ->> k = Clause (\a -> fmap (\f -> runTuple f k) (p a))
> (|||) :: Clause a r -> Clause a r -> Clause a r
> l ||| r = Clause (\a -> runClause l a `mplus` runClause r a)
> match :: a -> Clause a r -> r
> match a c = maybe (error "match") id $ runClause c a


I’ll define just a few combinators, to give an example of Pattern’s use.

> var :: Pattern (a :*: Nil) a
> var = Pattern (Just . one)
> pair :: Pattern as a -> Pattern bs b -> Pattern (as :++: bs) (a,b)
> pair (Pattern m) (Pattern n) = Pattern (\(a,b) -> (<>) <$> (m a) <*> (n b))
> -- | Useful for building one-argument pattern combinators
> mk1 :: (a -> Maybe b) -> (Pattern vars b -> Pattern vars a)
> mk1 g (Pattern p) = Pattern (\a -> g a >>= p)
> left :: Pattern vars a -> Pattern vars (Either a b)
> left = mk1 (either Just (const Nothing))
> right :: Pattern vars b -> Pattern vars (Either a b)
> right = mk1 (either (const Nothing) Just)
> cst :: (Eq a) => a -> Pattern Nil a
> cst x = Pattern (\a -> if a==x then Just zero else Nothing)

Now, we can put all of this together in a test example.

> test1 :: Either Int (Int, Int) -> Int
> test1 a = match a $
>       left (cst 4)         ->> 0
>   ||| left var             ->> id
>   ||| right (pair var var) ->> (+)
> ex2 :: Num a => Either a (a,a) -> a
> ex2 a = case a of
>        Left 4      -> 0
>        Left x      -> x
>        Right (x,y) -> x+y

If we compile this module with -O2 and look at the GHC Core output, we see that test1 is compiled to the following Core code:

PatternMaybe.$wtest1 :: Data.Either.Either
                          Int (Int, Int)
                        -> Int#
PatternMaybe.$wtest1 =
  \ (a :: Data.Either.Either
                 Int (Int, Int)) ->
    case a of a' {
      Data.Either.Left x ->
        case x of x' { I# x1 ->
        case x1 of x1' { __DEFAULT -> x1'; 4 -> 0 }
      Data.Either.Right y ->
        case y of y' { (a3, b) ->
        case a3 of a3' { I# x ->
        case b of b' { I# y ->
        +# x y

So, test1 has been compiled to a standard Haskell pattern match. So the overhead has been completely optimised away, as we wanted.

The first-class-patterns library on Hackage is based on this implementation, and has more combinators. Have a look there for more.


July 18, 2009

Building first-class patterns. The “Difference type-list”.

Filed under: pattern-combinators — Reiner Pope @ 9:54 am

This post is in Literate Haskell. The entire contents can be copied into a file called Difference.lhs and will be compilable by GHC.

In my previous post I used type families to simplify the types in Morten Rhiger’s “Type safe pattern combinators”. Unfortunately, in order to convince GHC that various types were in fact equal (such as (a :++: b) :++: c and a :++: (b :++: c)) we needed to (recursively) build type equality witnesses. In doing this, we prevented GHC optimising away all the overhead, since GHC will never inline a recursive function.

In this post I start to solve the efficiency problem by using unsafeCoerce. By carefully choosing the exports from the module, we can ensure that the API presented is never-the-less safe.

Type lists, again

These are the same as defined in the previous post, so I give no explanation of them.

> {-# LANGUAGE TypeFamilies, EmptyDataDecls, TypeOperators, ScopedTypeVariables, Rank2Types, GADTs, GeneralizedNewtypeDeriving #-}
> module Difference(
>     Nil,
>     (:*:),
>     (:++:),
>     Difference(..),
>     D,
>  ) where
> import Unsafe.Coerce
> data Nil
> data h :*: t
> infixr :*:
> type family (:++:) a b
> type instance Nil     :++: xs = xs
> type instance (h:*:t) :++: xs = h :*: (t :++: xs)
> data Proxy a
> proxy :: Proxy a
> proxy = undefined

Difference type-lists

It will be (also, has been) a common theme in building the pattern combinators to define a type inductively for lists. For instance, in the previous post, we defined Tuple xs and CurryFunc xs as types on lists, essentially as folds on the list structure: e.g. with Tuple, replace every (:*:) with a (,), and every Nil with a ().

We can efficiently (read: non-recursively) build objects of these types when we only build the types using Nil and (:*:); however, we have more difficulties when we want to efficiently construct such types using (:++:). For instance, suppose we wanted to construct a function

f :: Tuple xs -> Tuple ys -> Tuple (xs :++: ys)

With the definition of Tuple used in the previous post, we would find it difficult (impossible?) to define such a function non-recursively.

In the previous post, this problem was solved by the “difference list” technique, which we used for the DConv type. It will be convenient to use difference list technique again. Here we package the technique in a single convenient type, D, which adds efficient (non-recursive) concatenation to datatypes defined on type-lists.

We put the operations we will need in a type-class, Difference, which is defined below. The idea is that, if d is an instance of Difference, and t is some datatype defined inductively on lists, then d t is the “difference list converted” version of this type — that is, it contains the same data, but with an efficient append.

> class Difference d where
>     -- | constructs the empty `d t`.
>     zeroD :: d t Nil
>     -- | appends two `d t`s.
>     plusD :: d t xs -> d t ys -> d t (xs :++: ys)
>     -- | given a "cons" operation, constructs the singleton `d t`.
>     mkOneD :: (forall ys. t ys -> t (a :*: ys)) -> d t (a :*: Nil)
>     -- | given a "nil" value, "runs" the `d t`.
>     evalD :: t Nil -> d t xs -> t xs

It is interesting that zeroD and plusD can be defined without any knowledge of the underlying type: internally, they will be implemented just as id and (.). All the knowledge is contained within mkOneD and evalD.

As a demonstration of its use, we can define difference tuples. Note that the t mentioned in the typeclass above must have kind (* -> *). This means that t cannot be a type family. However, it can be a data family, so this is how we define tuples. First, we define tuples inductively:

> data family Tuple' xs
> data instance Tuple' Nil = TupleEmpty
> data instance Tuple' (h:*:t) = TupleCons h !(Tuple' t)

The zeroD and plusD functions need no work. However, we need a little work for the singleton and evaluation functions, mkOneD and evalD:

> singletonTuple :: Difference d => a -> d Tuple' (a :*: Nil)
> singletonTuple a = mkOneD (\t -> TupleCons a t)
> evalTuple :: Difference d => d Tuple' xs -> Tuple' xs
> evalTuple = evalD TupleEmpty

There we have it: tuples with efficient append.

Now we actually need to define an instance of Difference.


The basic idea of the implementation is just

type D t xs = forall ys. t ys -> t (xs :++: ys)

However, to implement plusD, we will need to convince GHC of the equality of the types (a :++: b) :++ c and a :++: (b :++: c). I give two instances of Difference, which differ in how they prove this equality: one uses GADTs and a List type-class (as in the previous post); the other uses unsafeCoerce. As opposed to the unsafeCoerce approach, the GADT approach is guaranteed to be safe, but is recursive. The unsafeCoerce is non-recursive but circumvents the type system; however, the existence of the GADT implementation should convince us that the use of unsafeCoerce here is actually a safe one.

The GADT instance

This is very similar to the previous post, so I don’t give much explanation.

> data GadtD t xs = List xs => GadtD (forall ys. t ys -> t (xs :++: ys))
> instance Difference GadtD where
>     zeroD = GadtD id
>     plusD (GadtD fx :: GadtD t xs) (GadtD fy :: GadtD t ys) = 
>         case closure (proxy :: Proxy xs) (proxy :: Proxy ys) of
>           ListD -> GadtD (\(zs :: t zs) -> 
>             case assoc (proxy :: Proxy xs) (proxy :: Proxy ys) (proxy :: Proxy zs) of
>               Equal -> fx (fy zs))
>     mkOneD f = GadtD f
>     evalD nil (GadtD f :: GadtD t xs) =
>         case rightIdent (proxy :: Proxy xs) of
>           Equal -> f nil
> class List a where
>     closure :: forall b. List b =>
>        Proxy a -> Proxy b ->
>         ListD (a :++: b)
>     assoc :: forall b c.
>        Proxy a -> Proxy b -> Proxy c ->
>         ((a :++: (b :++: c)) :==: ((a :++: b) :++: c))
>     rightIdent :: Proxy a ->
>         (a :++: Nil) :==: a
> instance List Nil where
>     closure _ _  = ListD
>     assoc _ _ _  = Equal
>     rightIdent _ = Equal
> instance List t => List (h :*: t) where
>     closure _ b  = case closure (proxy :: Proxy t) b of
>                      ListD -> ListD
>     assoc _ b c  = case assoc (proxy :: Proxy t) b c of
>                      Equal -> Equal
>     rightIdent _ = case rightIdent (proxy :: Proxy t) of
>                      Equal -> Equal
> data a :==: b where
>     Equal :: a :==: a
> data ListD a where
>     ListD :: List a => ListD a

The unsafeCoerce instance

This instance is a lightweight one. It is the same as the GADT one except that, whenever we needed an equality proof before, we use unsafeCoerce.

> newtype CoerceD t xs = CoerceD (forall ys. t ys -> t (xs :++: ys))
> instance Difference CoerceD where
>     zeroD = CoerceD id
>     plusD (CoerceD fx :: CoerceD t xs) (CoerceD fy :: CoerceD t ys) = 
>         CoerceD (\(zs :: t zs) ->
>             case assoc2 (proxy :: Proxy xs) (proxy :: Proxy ys) (proxy :: Proxy zs) of
>               Equal -> fx (fy zs))
>     mkOneD f = CoerceD f
>     evalD nil (CoerceD f :: CoerceD t xs) = 
>         case rightIdent2 (proxy :: Proxy xs) of
>           Equal -> f nil
> assoc2 :: Proxy a -> Proxy b -> Proxy c -> (a :++: (b :++: c)) :==: ((a :++: b) :++: c)
> assoc2 _ _ _ = unsafeCoerce Equal
> rightIdent2 :: Proxy a -> (a :++: Nil) :==: a
> rightIdent2 _ = unsafeCoerce Equal

The only equalities we prove with unsafeCoerce are a :++: (b :++: c) ~ (a :++: b) :++: c and a :++: Nil ~ a. These equalities are certainly true for all instances of the List class given in the GADT implementation. The GADT implementation also shows that, for any value of type d t xs constructible using the API of the Difference class, xs will be an instance of List. So the equalities used are true (safe) for any CoerceDs constructed using only the Difference interface. So our use of unsafeCoerce is safe if we export only the Difference class, and the name of the type CoerceD. If you look at the imports, this is what I have done.

Wrapping up

Finally, I also export the type D, which is just a newtype of CoerceD, with a derived instance of Difference:

> newtype D t xs = D (CoerceD t xs) deriving(Difference)

This exposes an instance of Difference to the outside world, but hides which one it is. This gives us a single place to change the implementation from CoerceD to GadtD if we so desire.

It turns out that this module encapsulates all the uses of GADTs that we will need.

I’ll stop here for now, but we will see in later posts how this type is used, and we will see the efficiency it allows.

Create a free website or blog at