Bill Casarin

  • Archive
  • RSS

Iteratee Hello World

This entire blog post is in literate Haskell, you can clone it from the gist to try it out:

git clone git://gist.github.com/1571444.git iteratee-hello-world

This is the simplest example of an Iteratee that I could think of. It simply reads from an stdin/file Enumerator and spits it out to an stdout Iteratee. The stream is transformed to uppercase characters using an Enumeratee.

I put this together to figure out how to apply the Iteratee abstraction in everyday situations, hopefully this will serve as a jump off point for people looking to learn about Iteratees.

  • 1 month ago
  • Comments
  • Permalink
  • Share
    Tweet

Making IO functions safe in Haskell

Functions that perform IO in Haskell can bring along some nasty surprises. Functions that download a web page or read data from disk can throw unexpected Exceptions, or they may simply not return at all. To reason about these problems, we should expect that they may eventually happen and represent this in the return type of these functions.

In other words, what we’re looking for is a function that wraps (or “lifts”) existing functions to safer functions.

The first thing we need is a data type that can represent two possible errors: Timeouts and Exceptions:

Now we need a function that takes an arbitrary IO function (a -> IO b) and converts it into a function that handles exceptions and timeouts:

The try function is from Control.Exception and simply wraps a catch statement with an Either. The timeout function is from System.Timeout and simply adds a timeout to an IO action. I compose all of these together and convert the resulting mangled type consisting of Either’s and Maybe’s into our Error type.

And you’re done! Here’s an example of lifting an existing function that downloads webpages into a safer version:

safeDL 0 "http://google.com" would result in Left Timeout, any exceptions would result in Left (Exception SomeException) and a good result would be Right HttpResponse

Cheers!

  • 7 months ago
  • Comments
  • Permalink
  • Share
    Tweet

Speculatively constructing a type with Applicative

The Maybe type in Haskell is a useful construct for representing types that may or may not have a value. It also gains much of its expressive power from the typeclasses it implements.

One typeclass that Maybe implements is Applicative. Using Applicative we can construct types within Maybe. In other words, if any of the parameters in the constructor are Nothing, then construction of the type results to Nothing.

To see how this works, let’s look at a regular constructor

data SomeType = JustSomeType Int String

SomeType’s constructor is like so:

JustSomeType :: Int -> String -> SomeType

In the normal constructor, we are 100% sure that we have an Int and String before construction. What if we’re not sure if we have an Int or a String?

maybeSomeType :: Maybe Int -> Maybe String -> Maybe SomeType

I concise way to write this function is by taking advantage of Applicative to lift the constructor into Maybe:

maybeSomeType maybeInt maybeStr = JustSomeType <$> maybeInt <*> maybeStr

This function returns Nothing if either maybeInt or maybeStr is Nothing

What makes <$>, and <*> so cool is that they provide the tools necessary for working inside of things. You don’t need to worry about writing a whole new set of functions when you start working in a container, like IO, Either, Maybe, etc.

Monad’s first clicked for me when I realized that it was just a more powerful version of Functor/Applicative for working inside of containers. I’ll save you a Monad tutorial though, there’s enough of those everywhere.

    • #haskell
    • #applicative
    • #typeclass
  • 8 months ago
  • 37
  • Comments
  • Permalink
  • Share
    Tweet

Using Haskell’s QuickCheck to generate random test data

QuickCheck is a random testing library written in Haskell and is typically used for fuzz testing code. The main typeclass provided by QuickCheck is the Arbitrary typeclass. When you make one of your data types an instance of this typeclass (by implementing the arbitrary function) you can generate random samples of those data types

I recently needed a way to generate a large number of Serial numbers of a specific format, QuickCheck turned out to be perfect for this.

I started out by creating a new data type representing my serial number:

data Serial = Serial String Int

The String represents a random prefix (eg. “ABC”), and the Int represents some number which could represent the order number, etc. Implementing show allows me to easily convert this data type to a string:

instance Show Serial where
  show (Serial prefix number) = prefix ++ show number

Now to generate Serials we implement Arbitrary for our Serial type:

instance Arbitrary Serial where
  arbitrary = do
    prefix <- vectorOf 3 $ elements ['A'..'Z']
    number <- choose (10000, 99999)
    return $ Serial prefix number

The magic happens during the call to arbitrary. arbitrary returns the type Gen Serial which represents a computation that generates a random Serial. Gen is a Monad, so that allows us to use do notation to build our generating computation.

The first line in our do block is a call to elements, which chooses a single element out of a list of elements. The vectorOf 3 says to apply this generator 3 times and assign it to prefix. As you can probably guess by now this generates a random string of length 3 with random characters between ‘A’ and ‘Z’.

The second line in the do block is a call to choose which chooses a single value between a range of values.

Finally we return our serial with the generated prefix and number.

We can now generate a list of random serials by using unGen:

serialGen :: Int -> [Serial]
serialGen seed = unGen arbitrary (mkStdGen seed) 9999999

Some data from this function:

LWQ74236
IZK97057
MTT84566
FBL91704
OUX61740
GFX20409
SGO76263
SXE22215
JNH61151
ZQY93980

Naturally you can apply this method to any data type you can think of, the Gen Monad is a great DSL for generating random test data!

The full program:

    • #haskell
    • #dsl
    • #quickcheck
  • 8 months ago
  • 7
  • Comments
  • Permalink
  • Share
    Tweet

redo by Example: Git Submodule Dependencies

On a fresh checkout you can sometimes forget to type git submodule init and git submodule update, but luckily by using redo we don’t even have to think about it. The existence of the .git folder in submodules can just be added as a build rule/dependency.

Any part of your build system can use this rule to make sure the submodule exists before attempting to do anything with it.

All you need are these three .do files in your root .git directory:

default.git.do

exec >/dev/null
git submodule init $1 && git submodule update $1    

submodules.do

exec >&2
redo-ifchange $(git submodule status | awk '{ print $2"/.git" }')

all.do

redo-ifchange submodules

You can now use redo to update all your submodules as a part of your build. You can even add a submodule dependency as a build rule to make sure it exists before anything else is done:

redo-ifchange ../deps/somesubmodule/.git

Going further

While this is a pretty simple example, you could imagine updating default.git.do to make more advanced build rules. For example, calling redo-always and then redo-stamp with the current revision, dirtying the dependency when submodule update pulls new changes.

Have fun!

    • #redo
  • 10 months ago
  • 5
  • Comments
  • Permalink
  • Share
    Tweet

F# CSV Parsing with FParsec

I needed a .NET CSV parser for work, and since I don’t trust most .NET code I find online I wrote one in F# with FParsec!

This handles both quoted and unquoted cells. It even supports escaped characters (including commas)!

For example,

CODE, EN, FR
HOME_DESCRIPTION, Hello\, welcome to our site!, Bonjour\, bienvenue sur notre site

Will parse to:

[
  ["CODE", "EN", "FR"],
  ["HOME_DESCRIPTION", "Hello, welcome to our site!", "Bonjour, bienvenue sur notre site"]
]

You can even mix them if you want:

"CODE", "EN", "FR"
HOME_DESCRIPTION, Hello\, welcome to our site!, "Bonjour, bienvenue sur notre site"

This will parse to the same thing

It even handles cases where there are quotes inside unquoted cells!

Since we’re using FParsec, if the parse fails it will tell you exactly where and why. Here is an excerpt from one of my test cases that fails when there’s a comma missing:

Error in Ln: 4 Col: 19
SOME_CODE,"Herp\,"Derp
                  ^
Expecting: end of file, newline or ','

Have fun!

  • 10 months ago
  • Comments
  • Permalink
  • Share
    Tweet

Factoring code and why it matters

If there was a single concept that one might describe as fundamental in programming I would have to say that concept is: factoring. “Factoring? Surely you’re talking about re-factoring, right?”. Almost, not quite.

Factoring might seem obvious to experienced programmers, but to many others, it can be one of those concepts that drastically increases the quality of their code.

When most people think of factoring, they think of factoring in the mathematical sense: You have some number, and you’re trying to find two numbers that multiply together to produce that number. eg. 5 x 3 = 15. You can think of the number 15 as some composition of the numbers 5 and 3. In that sense the numbers 5 and 3 are more fundamental than the number 15, because you can make 15 out of 5’s and 3’s, but you can’t make 5’s using 15’s. You can continue this reasoning until you reach its most fundamental elements. Since 15 is a number, and number is information, the most fundamental elements of 15, 3, and 5 are bits. Any number you can think of can be represented as a series of yes/no questions. These bits form a fundamental basis for representing information.

Bringing this back to programming, you can think of the number 15 as like a class in object-oriented programming. The numbers 3 and 5 are like member functions. I might be stretching the analogy a bit far, but if you think about it, it’s really the same idea. The class (15) is a composition of its member functions (3 and 5).

The effect may be more subtle. Imagine the number 15 as a monolithic function, performing some logic that does some numerical calculation (3) and displays that information to a GUI (5). The monolithic function (15) is a composition of two independent functions (3) and (5).

In both cases, 3 and 5 are not fundamental components of the program. In the case of the class, member functions 3 and 5 are not accessible until you instantiate the class (15). In the case of the monolithic function, there is no way to do the numerical calculation (3) and display the result (5) without doing both by calling the function (15).

Building programs with unfactored code is like trying to do math using only multiples of the number 15, 23 and 42. No wonder it can seem difficult at times.

You’re #1 goal as a programmer is to identify these compositions and to refactor them into reusable components. Decomposition is another word for this, but its all really the same thing.

Abstraction

Abstraction is a term commonly heard in software development. The number 15 is an abstraction. It is some abstraction over numbers 3 and 5. I don’t have to think about the numbers 3 or 5 when dealing with 15. I can pass the number 15 instead of passing around a function that multiplies 3 and 5. Base 10 numbers are an abstraction. They’re an abstraction over bits to make our lives easier, who would want to do higher maths in only base 2?

In programming, a more familiar abstraction is a function. It abstracts a series of statements that performs some, well, function. A class is an abstraction that abstracts a series of related functions, especially when that class conforms to some interface.

Function -> Classes -> Monolithic Functions -> Monolithic/Specialized Classes -> Program

The further you go up the abstraction hierarchy, the less powerful they become for building programs (unless your program is a function). I have a feeling this is also why there seems to be a vocal subset of programmers who denounce OOP outright. You can do everything classes can do using generic functions, interfaces and abstract data types, with arguably the same amount of abstraction.

Unintentionally composing unrelated functionality into a single, monolithic function is a common problem that new programmers face. Much time is spent building sentences, rather than verbs and nouns.

Composability

All of this boils down to a relatively simple concept: build 5 and 3’s instead of just 15’s. Make these your fundamental basis for building your programs. A simple function that takes some input and produces an output. A function that doesn’t rely on any external state. Any dependencies are clearly visible through its parameters.

These fundamental verbs can be used to build classes and frameworks. Future code can leverage these functions to build new classes and even other functions. This idea naturally leads to composability, combining functions with other functions to build abstractions.

To improve composability, have these functions work on generic types and interfaces. This idea naturally leads into functional programming, which takes these all of these ideas to the ideal.

“I hate frameworks, they’re too restricting!”

It should be now clear why frameworks can feel restrictive, they’re forcing you to do math with only multiples of 15, 23 and 42. They don’t have to be this way, well designed frameworks will allow the programmer to compose different aspects together to form a slightly different framework. An example of this is the node.js connect middleware for snapping together modules to build a webserver, which is what the express framework is built on.

That’s all programming is about, really.

At some basic level this is all programming really comes down to. As long as your good at keeping concepts in your code independent of each other, your life and coworker’s lives will be a lot easier. Of course it can get complicated at times, such as managing dependencies between components (which is really a non issue once you start composing things instead of stringing together complicated hierarchies of dependencies, don’t get me started on dependency injection frameworks), balancing abstractions and performance, etc.

I guess I’ll stop rambling and conclude with a plug for the Clay programming language, which is some awesomely weird combination of C and Haskell and an implementation of some of the ideas I touched upon in this post.

  • 11 months ago
  • Comments
  • Permalink
  • Share
    Tweet

redo by Example: Multiple Targets

Here’s a simple example, convert multiple markdown documents to pdf:

all.do

DEPS=$(ls *.md | sed 's/\.md/\.pdf/')
redo-ifchange $DEPS

default.pdf.do

redo-ifchange $1.md
markdown2pdf $1.md -o $1_
mv $1_.pdf $3

This will convert all .md files in the current directory into .pdf files, and will only convert them again when the respective .md file changes (example requires pandoc and a latex installation)

Download redo | pandoc

    • #redo
  • 11 months ago
  • Comments
  • Permalink
  • Share
    Tweet

redo by Example: Web Dependencies

djb redo is a software build system designed to be a replacement for make. The goal of these ‘redo by example’ posts are to show the power of redo, mainly through evidence instead of me talking about it.

Goal

The goal of this example is to track a list of urls via the ‘Last-Modified’ HTTP header. When the ‘Last-Modified’ header changes, we redownload the file.

Purpose

I wrote this build script to keep track of a list of .mobi ebooks that occasionally get updated. You can imagine using this technique to download updated library packages or something similar.

Scripts

urls.txt

http://ikeran.org/rationality.mobi
http://ikeran.org/rationality.epub

all.do

DEPS=$(cat urls.txt | grep -o "\w\+\.\w\+$")
redo-ifchange $DEPS

The first line builds a list of dependencies by parsing the filenames out of the urls.

Second line: redo-ifchange can be read as: “Rebuild if any of the dependencies have changed”. Since $DEPS contains “rationality.mobi rationality.epub”, both of those files get added as build dependencies.

default.do

URL=$(grep $1$2 urls.txt)
redo-ifchange $1$2.stamp
curl $URL > $3

default.do is the catch-all build rule. It is executed when there are no other more specific build rules for a given target. Since I have not defined a default.mobi.do or a default.epub.do, the targets rationality.mobi and rationality.epub both get handled by this build rule.

$1 contains rationality for both targets.

$2 contains .epub for the first target and .mobi for the second target.

$3 is the file name you use to write out your built file.

First line: I parse the url for the given target out of the list of urls.

Second line: I tell redo to rebuild the current target only if the rationality.epub.stamp dependency has changed.

Third line: Download the target.

default.stamp.do

This is where the magic happens.

redo-always
URL=$(grep $1 urls.txt)
curl -silent -I ${URL} | grep "Last-Modified" | redo-stamp

$1 now contains either rationality.epub or rationality.mobi.

$2 contains .stamp.

The first line tells redo to always run the build rule, we need this or otherwise the curl ‘Last-Modified’ check would get skipped.

Second line: parse the url for either rationality.epub or rationality.mobi out of the list of urls

Third line: Only download the header and parse out the Last-Modified header. The key to this entire script the redo-stamp commad.

redo-stamp tells redo to forget about this dependency if the stamp data hasn’t changed from last time. The Documentation has a better explanation on how this works.

Conclusion

Simply type redo and all of the files will be downloaded. Type redo again and they will only re-download when the header changes. All with only 3 build rules, 9 lines total. Are you excited yet?

You can download the scripts in this example from this gist:

git clone git://gist.github.com/855783.git redo-web-deps

More to come!

    • #redo
  • 11 months ago
  • Comments
  • Permalink
  • Share
    Tweet

Brian Greene’s layman explanation of String Theory’s 10 dimensions

I’d love to explain in purely nontechnical terms how this comes about, but I can’t, and I’ve never encountered anyone who can. I made an attempt in The Elegant Universe, but that treatment only describes, in general terms, how the number of dimensions affects aspects of string vibrations, and doesn’t explain where the specific number ten comes from.

So, in one slightly technical line, here’s the mathematical skinny.

There’s an equation in string theory that has a contribution of the form (D - 10) times (Trouble), where D represents the number of spacetime dimensions and Trouble is a mathematical expression resulting in troublesome physical phenomena, such as the violation of energy conservation mentioned above. As to why the equation takes this precise form, I can’t offer any intuitive, nontechnical explanation. But if you do the calculation, that’s where the math leads.

Now, the simple but key observation is that if the number of spacetime dimensions is ten, not the four we expect, the contribution becomes 0 times Trouble. And since 0 times anything is 0, in a universe with ten spacetime dimensions the trouble gets wiped away. That’s how the math plays out. Really. And that’s why string theorists argue for a universe with more than four spacetime dimensions.

Not the most satisfying answer, but if its a physics concept that Greene can’t convey intuitively, that’s interesting in its own right.

Quoted from Brian Greene’s latest book, The Hidden Reality

  • 1 year ago
  • 3
  • Comments
  • Permalink
  • Share
    Tweet
← Newer • Older →
Page 1 of 2

About

Personal blog of Bill Casarin. I write code.

Twitter

loading tweets…

  • RSS
  • Random
  • Archive
  • Mobile

Effector Theme by Carlo Franco.

Powered by Tumblr