Haskell

At the end of my previous learn-Haskell post I was just finished using some lifters for monads for a log parser. The next part of the homework was to make a binary tree to hold the log messages. I think I made something similar in ML as part of Programming Languages on Coursera, and the types for the tree nodes were already defined, so this was fairly straightforward.

The main part to get right was inserting log messages in the correct order:

insert :: LogMessage -> MessageTree -> MessageTree
insert (Unknown _) t = t
insert m Leaf = Node Leaf m Leaf
insert _ (Node _ (Unknown _) _) = error "Invalid Node: a node must not contain Unknown"
insert m@(LogMessage _ t _) (Node l m2@(LogMessage _ t2 _) r) = case compare t t2 of
  LT -> insertOnLeft
  EQ -> insertOnLeft
  GT -> Node l m2 (insert m r)
  where insertOnLeft = Node (insert m l) m2 r

I could have done a few things differently – the error condition could be ignored, but it feels much better to include it so that I have a total function (as far as I understand function totality). I could also have taken a different approach to the last case, using an if then else such as if (t <= t2) then Node (insert m l) m2 r else Node l m2 (insert m r). This may be more readable, now that I think about it, since compare t t2 isn’t quite as intuitive about what is less than what. I could also probably have done something with guards, which might have been nicer.

One thing I would have liked to do is leave off the t in the first pattern, giving insert (Unknown _) = id, but this seems to make it incompatible with the other patterns.

As usual, I am sure there are some abstractions I could use over this code to make it simpler, but I am happy enough with this for now.

I ended up with a nice little function to build an ordered tree out of an unordered list of log messages:

build :: [LogMessage] -> MessageTree
build = build' Leaf
  where build' = foldr insert

Looking at that now, I think I mustn’t have sat back and looked at it after I simplified the where clause, because I could as easily have written build = foldr insert Leaf, which is faster to comprehend.

Converting the ordered tree back to a list was straightforward – I’m not totally confident about the performance characteristics, but I doubt that is worth worrying about at this stage of my learning.

inOrder :: MessageTree -> [LogMessage]
inOrder = (`inOrder'` [])
  where
    inOrder' Leaf lst = lst
    inOrder' (Node l m r) lst = inOrder' l (m : inOrder' r lst)

I have been working through the week 2 homework exercises, and had a bit of a rocky start. I got it working without much trouble, but I had a real mess of code:

-- parses individual line from the log file
parseMessage :: String -> LogMessage
parseMessage line = case words line of
  "I" : time : message -> case maybeRead time :: (Maybe Int) of
                            Just t -> LogMessage Info t (unwords message)
                            _ -> Unknown line
  "W" : time : message -> case maybeRead time :: (Maybe Int) of
                            Just t -> LogMessage Warning t (unwords message)
                            _ -> Unknown line
  "E" : severity : time : message -> case maybeRead severity :: (Maybe Int) of
                            Just s -> case maybeRead time :: (Maybe Int) of
                                        Just t -> LogMessage (Error s) t (unwords message)
                                        _ -> Unknown line
  _ -> Unknown line

This is awful – so full of duplication and really not pleasant to read. I couldn’t leave it like that, so I set about at least separating out the code for determining the message type. I ended up with something at least moderately readable:

parseMessage line = let (messageType, rest) = maybeGetMessageTypeAndRemainder line in
  case messageType of
    Nothing -> Unknown line
    Just aType -> case rest of
           [] -> Unknown line
           time : message -> case maybeRead time :: (Maybe Int) of
             Just t -> LogMessage aType t (unwords message)
             _ -> Unknown line

-- partition the message type and remainder of the string
maybeGetMessageTypeAndRemainder :: String -> (Maybe MessageType, [String])
maybeGetMessageTypeAndRemainder line = case words line of
  "I" : rest -> (Just Info, rest)
  "W" : rest -> (Just Warning, rest)
  "E" : severity : rest  -> case maybeRead severity :: (Maybe Int) of
                              Just s -> (Just (Error s), rest)
                              _ -> (Nothing, words line)
  _ -> (Nothing, words line)

It still looked wrong though, with a number of cases mapping to the exact same code. It was at this point that my brain kicked into gear, and I had to laugh out loud when I realized that I was using the Maybe monad and completely ignoring one of the main uses of monads: that thing where you chain them together. I had forgotten what it was called (the back of my mind said ‘bind’, but it is a human mind so I did not trust it fully), but I knew what I wanted it to do, so I plugged this into Hoogle: m a -> (a -> m b) -> m b, and it obligingly reminded me that I’m looking for (>>=) :: Monad m => m a -> (a -> m b) -> m b. It was indeed ‘bind’.

After a short break to make some tea and let my brain process the problem a little with its rediscovered knowledge, I set about setting things right.

My approach would be to use Maybe tuples to pass through the parsed and unparsed sections of a log message. The tuples would increase in size until they were used at the end to construct a LogMessage… but wait! It occurred to me that I may even be able to get away with only using pairs, if I could partially apply the LogMessage constructor. I gave it a little try in ghci:

*Log> :t LogMessage
LogMessage :: MessageType -> TimeStamp -> String -> LogMessage
*Log> :t LogMessage Warning
LogMessage Warning :: TimeStamp -> String -> LogMessage
*Log> :t LogMessage Warning 5
LogMessage Warning 5 :: String -> LogMessage

Aha! So I would be able to build up a log message along the way.

Tried a bunch of stuff, got confused about bind (>>=), nutted it out and used (>=>) but had to add return to the final appendMessage.

At this point I asked on #haskell-beginners on freenode, and got a bit of advice:

jle> so the thing is, if you have m a jle> you can apply all sorts of functions to it, you just need the right lifter jle> if you have an (a -> b), you use fmap jle> if you have an m (a -> b), you use ap (or (<*>)) jle> if you have an (a -> m b), you use (=<<) , or (>>=) jle> fmap :: (a -> b) -> m a -> m b jle> (<*>) :: m (a -> b) -> m a -> m b jle> (=<<) :: (a -> m b) -> m a -> m b jle> so…there ya go

Then went on with a whole lot of information, too much to process for the moment. I was directed to watch the week 4 videos of a functional programming course, so I have added those to the list.

I persevered a bit more and ended up with something that seems about as concise as I can make it:

parseMessageType :: String -> Maybe (TimeStamp -> String -> LogMessage, [String])
parseMessageType line = case words line of
  "I" : rest -> Just (LogMessage Info, rest)
  "W" : rest -> Just (LogMessage Warning, rest)
  "E" : rest -> parseErrorSeverity rest
  _ -> Nothing

parseErrorSeverity :: [String] -> Maybe (TimeStamp -> String -> LogMessage, [String])
parseErrorSeverity (severity : rest) = case maybeRead severity :: (Maybe Int) of
                                  Just s -> Just (LogMessage (Error s), rest)
                                  _ -> Nothing
parseErrorSeverity _ = Nothing

parseTimeStamp :: (TimeStamp -> String -> LogMessage, [String]) -> Maybe (String -> LogMessage, [String])
parseTimeStamp (_, []) = Nothing
parseTimeStamp (logMessage, timeStamp : rest) = case maybeRead timeStamp :: (Maybe TimeStamp) of
                                    Just t -> Just (logMessage t, rest)
                                    _ -> Nothing

appendMessage :: (String -> LogMessage, [String]) -> LogMessage
appendMessage (logMessage, message) = logMessage . unwords $ message

parseValidMessage :: String -> Maybe LogMessage
parseValidMessage = fmap appendMessage . (parseMessageType >=> parseTimeStamp)

parseMessage :: String -> LogMessage
parseMessage = liftM2 fromMaybe Unknown parseValidMessage

parseLog :: String -> [LogMessage]
parseLog = map parseMessage . lines

I can probably simplify some of the earlier functions in wonderful ways of which I am not yet aware, but that will do for now. I can always look back at this later when I have learned more, and laugh at the verbosity of the above.

I did some reading and got through all the recommended chapters of Learn You A Haskell and Real World Haskell.

Playing around

I also explored several concepts, including composing functions that take multiple arguments. I did not find evidence of any built-in functions, so I ended up defining this:

composeDiadic :: (c -> d) -> (a -> b -> c) -> a -> b -> d
composeDiadic f g x y = f (g x y)

(.:) = composeDiadic

I wanted to make this because I was playing around with lists of lists, and found intersperse, concat, and a function that combines the two: intercalate. I wanted to rewrite intercalate as a little exercise, but it did not like concat . intersperse since (.) does not expect any function that takes 2 arguments.

Lecture 2

Algebraic data types are defined like so:

data MyType = Constructor1
            | Constructor2 Type1
            | Constructor3 TypeA TypeB
  deriving Show
  • add deriving Show after a data type definition to automatically make it able to be displayed as a string.
  • type constructors and data constructors inhabit different namespaces, so single-constructor types often use the same name for both.

Pattern Matching

  • Parens are needed for pattern matching around a data constructor that has one or more parameters.
  • Bind the whole value to a name using something like myFuncton name@(Constructor2 a) = <do something with name>

I decided to get my editor set up this evening, before I make a start on the homework exercises.

I tend to use Sublime Text 3, so I am adding Sublime Haskell to get it sorted.

The first instruction is to get some dependencies by running cabal install aeson haskell-src-exts haddock so I go ahead and do that. Most of it seems to work, but it fails when compiling haddock. I could have saved myself a bit of time by reading the instructions more thoroughly, but I was playing games at the same time. The problem was that the haddock version was too new for the version of ghc that I have, and I had to constrain it to an earlier version, like so: cabal install haddock --constraint=haddock==2.13.2.1.

I decided to go ahead and add some of the “optional, but useful” requirements as well:

cabal install ghc-mod
cabal install stylish-haskell
cabal install haskell-docs
cabal install hdevtools

They all seem to work without any problems.

I frequently forget how to bring up package control in Sublime Text. To do it:

  • Ctrl + Shift + P
  • type ‘Install’ to filter for ‘Package Control: Install Package’
  • press Enter and wait for the next prompt
  • type ‘SublimeHaskell’ and press Enter

That’s it, it seems to be ready to go. Entering Ctrl+Shift+P, haskell gives me a nice set of options.

I started on the first homework. The first exercise was simple enough, and I incidentally learned something about monadic map:

I wanted to have my program print a few different test values, so I made an array of print . This didn’t work so well – main is expected to be of type IO (), and I had ended up with [IO ()]. The solution was to use mapM_ with the list of display values.

  • mapM is a monadic map
  • mapM_ is similar but that does not collect its output

The second exercise was too much for my late-evening brain, so I turned in for the night.

At work the next day I played around on GHCi with ideas my brain had come up with overnight, and got it sorted:

Prelude> :type zip
zip :: [a] -> [b] -> [(a, b)]
Prelude> :type zipWith
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
Prelude> zipWith apply [1, 2, 3] cycle $ [id (*2)]

<interactive>:1:9: Not in scope: `apply'
Prelude> zipWith (f x -> f x) [1, 2, 3] cycle $ [id (*2)]

<interactive>:1:10:
    Pattern syntax in expression context: f x -> f x
Prelude> zipWith ((f x) -> f x) [1, 2, 3] cycle $ [id (*2)]

<interactive>:1:10:
    Pattern syntax in expression context: (f x) -> f x
Prelude> zipWith ($) [1, 2, 3] cycle $ [id (*2)]

<interactive>:1:23:
    Couldn't match expected type `[b0]' with actual type `[a0] -> [a0]'
    In the third argument of `zipWith', namely `cycle'
    In the expression: zipWith ($) [1, 2, 3] cycle
    In the expression: zipWith ($) [1, 2, 3] cycle $ [id (* 2)]
Prelude> zipWith ($) [1, 2, 3] (cycle $ [id (*2)])

<interactive>:1:20:
    No instance for (Num ((a0 -> a0) -> c0))
      arising from the literal `3'
    Possible fix:
      add an instance declaration for (Num ((a0 -> a0) -> c0))
    In the expression: 3
    In the second argument of `zipWith', namely `[1, 2, 3]'
    In the expression: zipWith ($) [1, 2, 3] (cycle $ [id (* 2)])
Prelude> zipWith ($) (cycle $ [id (*2)]) [1,2,3]
[2,4,6]
Prelude> zipWith ($) (cycle $ [id, (*2)]) [1,2,3]
[1,4,3]
Prelude> zipWith ($) (cycle $ [id, (*2)]) [1,2,3,4,5]
[1,4,3,8,5]
Prelude> zipWith ($) (cycle $ [id, (*2)]) [1,1,1,1,1]
[1,2,1,2,1]
Prelude> let doubleAlternate = zipWith ($) (cycle $ [id, (*2)])
Prelude> doubleAlternate [2,2,2,2,2,2,2,2,2,2]
[2,4,2,4,2,4,2,4,2,4]
Prelude> doubleAlternate [1,2,3,4]
[1,4,3,8]
Prelude> let doubleAlternateFromEnd = reverse . doubleAlternate . reverse
Prelude> doubleAlternateFromEnd [1,2,3,4]
[2,2,6,4]
Prelude>

Two things are apparent from this:

  1. I need to get more sleep.
  2. I would benefit from some review of things like:
    • operator precedence
    • what exactly $ means
    • paying attention to the order of parameters

I was bored for a moment and had a quick play with the next exercise:

Prelude> :type fold
foldl   foldl1  foldr   foldr1
Prelude> :type foldl
foldl :: (a -> b -> a) -> a -> [b] -> a
Prelude> :type foldl1
foldl1 :: (a -> a -> a) -> [a] -> a
Prelude> let sumDigits = foldl1 (+)
Prelude> sumDigits [5,15,2,8]
30
Prelude>

Putting it all together once I got home was a simple matter of stringing a set of functions together:

validate = (== 0) . (`mod` 10) . sumDigits . doubleEveryOther . toDigits

I would prefer all the functions to be strung together the other way, at least with how I think about things at the moment. I would need (a -> b) -> (b -> c) -> a -> c, but Hoogle just matches (.) and some things with Strategy b, which don’t help me. I could define something like (.>) = flip (.), but for all I know I’m stomping on an existing symbol.

Further investigation turned up >>>, which looks promising. I can import it with import Control.Category ((>>>)). The selective import is necessary, since Control.Category appears to have a function (.), which clashes with the (.) from prelude (which looks to be an alias of <<<). I presume from the package name that these operators are getting right into some category theory, so I might do some reading into that later on.

It is Sunday evening – yesterday was a LAN and I was up gaming until the wee hours. Now I am back home. I have had enough of games for now; I am sleepy, but it is too early for bed; I feel like doing something. Why not Haskell?

I will begin the journey of learning described in learnhaskell, a guide by Chris Allen.

Installation

This was pretty straightforward. The steps I followed are here for posterity:

  • downloaded and installed Haskell Platform for Windows

    • there was one little WTF in the process – after the installation was done, the wizard informed me that I did not have something called “GLUT”, and asked if I wanted to install it. No indication of what GLUT is. A little investigation indicated that it is probably something related to OpenGL. I decided to install it; I doubt I will use it any time soon, but I was feeling adventurous.
  • opened cmd and ran ghci, and played around to make sure it worked:

    C:\Users\Monkey>ghci
    GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Prelude> 1 + 1
    2
    Prelude> let x = 5 in x * x
    25
    Prelude> let f x = case x of 1 -> "hello"; _ -> "goodbye" in f 1
    "hello"
    Prelude> let f x = case x of 1 -> "hello"; _ -> "goodbye" in f 45
    "goodbye"
    Prelude>
    Leaving GHCi.
    
  • looked at learnhaskell and it said I should have cabal, so I ended up running cabal update then cabal install cabal cabal-install, which seems a bit odd but it said to do that in Getting The Haskell Cabal. It seemed to work as the output included Installed Cabal-1.20.0.1 and Installed cabal-install-1.20.0.3.

First Steps

learnhaskell is telling me to do the “Yorgey course” first. I am informed that this will, in addition to equipping me to write Haskell, help me to understand parser combinators. I am glad about this, because the Wikipedia page for Parser combinator does not seem to be helping much with understanding. Perhaps I just lack education about combinators in general, and about context-free grammars.

Much of the first lecture is familiar from previous study. Key points:

  • Integer is an integral type that is only limited by computer memory, whereas Int is just a system integer that can only be guaranteed to deal with values up to +/- 2^29 (may be bigger depending on machine).
  • backticks make prefix operators infix, parens make infix operators prefix:

    1 + 2 == (+) 1 2
    mod 2 1 == 2 `mod` 1
    (%) = mod
    2 `mod` 1 == 2 % 1
    
  • pattern matching is handy for functions. Expressions are checked in the order they are written:

    factorial :: Integer -> Integer
    factorial 0 = 1
    factorial 1 = 1
    factorial n = n + factorial (n - 1)
    
  • guards use pipe character:

    <functionName> <pattern>
      | <predicate expression> = <value>
      | <predicate expression> = <value>
      | <predicate expression> = <value>
    

    e.g.

    foo n
      | n < 0 = 0
      | n < 10 = n + 10
      | otherwise = n
    
  • function application has higher precedence than any infix operator, so f 3 n+1 7 parses as (f 3 n) + (1 7)

  • make lists with [1, 2, 3], or cons operator: 1 : 2 : 3 : []. They are singly linked lists (cons lists), with type [Integer].

  • String is just sugar for [Char], so ['h', 'i'] == "hi"

Other than a reassurance about error messages, that is it for the lecture. Next is the first homework. It looks exciting, with credit card number validation and towers of Hanoi, among other less illustrated problems, but will have to wait until after I have had some sleep.

The course also recommends Chapter 2 of Learn You A Haskell and two Real World Haskell chapters: Chapter 1 and Chapter 2, so I will have a read of those.