Silk

Silk raises $1.6MM in a seed round led by NEA

By: Salar al Khafaji August 1st 2012

Today we are very excited to announce the completion of a $1.6 million seed funding round led by New Enterprise Associates (NEA) with participation from existing Silk investor Atomico and three additional investors. The raised amount will be used to expand our team and to keep improving our product and infrastructure.

We are thrilled to partner with NEA, the biggest US venture capital firm with committed capital of $13 billion, which happened to announce its new $2.6 billion fund last week. It’s also a great confirmation that our existing investor Atomico (the fund co-founded by Skype co-founder Niklas Zennström) followed up on its investment from last year by participating in this round as well.

Additional backers in the round are Anil Hansjee, a well-known angel and advisor to many companies, including Fon, who previously served as Google’s head of corporate development EMEA; Jens Christensen, best known for his role as CEO of Ellerdale (acquired by Flipboard); and Philippe Cases, an angel investor and entrepreneur, currently CEO of Spoke Software.

What are we working on?

For those who don’t know us: Silk is a place to create, share and find structured information. Share those special places in Madrid, the video games you are playing or the height of your children as they grow. Public transport companies can use Silk to find patterns in their delays. Human rights organizations can keep track of violations by governments. Investors use Silk to publish their investment portfolio.

We’ve learned that Silk comes in handy for information that:

  • has some structure,
  • is valuable to share in real time, but
  • should also remain available to be used later

Example

A current example would be a Silk site about Olympic medals. You may want to keep track of all Olympic medals as they are won, or perhaps filter on your country or on a sport you are interested in. The information should also remain available to generate overviews from, such as a map of the world indicating the number of medals each country has won or simply a list of medals won by The Netherlands.

We are very excited to be working with both Atomico and NEA, as this will help us to further build on our vision. If you are interested in using Silk, sign up here. Organizations who want to use Silk professionally, feel free to get in touch.

And finally, yes, we are hiring!

silk blog

Multiple IP addresses on Amazon EC2

By: Jurian July 9th 2012

Last week, Amazon announced support for multiple IPs for instances inside a VPC. This can be useful in many situations where a single IP for a machine is not enough. One example is SSL endpoints, which can only select a certificate based on the IP address a connection arrives on, not on the hostname. (Announcement here)

Another reason to use multiple IPs is if you want to access a service that is limited to a number of requests per IP. Having more IPs will give you a higher limit, but creating new instances just for extra IPs can get expensive quickly.

While Amazon has updated their instance IP documentation, using multiple IP addresses may still be a bit confusing. I hope this post explains the basics of Elastic IPs and how to use multiple IPs on a single instance.

Pricing

Amazon changed their IP pricing structure with this update. Prior to the update, used Elastic IPs would cost nothing, and unused IPs would cost around $10 / month. This monthly fee was introduced to discourage retaining unused IPs: IP addresses get scarcer every year, and it will be interesting to see what Amazon will do once it cannot acquire new IP blocks.

With this update, Amazon has updated their pricing as follows: all IPs cost $0.005 / hour (around $3.65 / month), unless the IP is assigned as the primary IP of the first network interface of an instance, in which case it’s free.
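The monthly figure follows directly from the hourly rate. A quick check in JavaScript, assuming a 365-day year averaged over 12 months (the function name is only illustrative):

```javascript
// Convert an hourly price into an average monthly price.
// Assumes 24 hours a day and a 365-day year spread evenly over 12 months.
function monthlyCost(hourlyRate) {
  return (hourlyRate * 24 * 365) / 12;
}

console.log(monthlyCost(0.005).toFixed(2)); // "3.65"
```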

Limits

When you start with your first VPC, you are limited to a maximum of 5 IPs, probably because of the IP scarcity. However, it’s easy to request a higher limit from AWS, and it shouldn’t be a problem to raise your limit to, for example, 50 IPs. You might want to do that before following the rest of this guide. Amazon published a list of interfaces and IP limits per instance.

How Elastic IPs work

By default, an instance in the VPC will only get an IP from a private subnet within the VPC, usually somewhere in the 10.0.0.0/8 range. This means that while those instances can access other instances within that same subnet, they won’t be able to connect to the wider internet.

One way to give every machine access is to set up a NAT machine through which all machines route their traffic. We won’t go any deeper into this option.

The alternative is to give every machine its own world-routable Elastic IP address. Amazon will route traffic for the Elastic IP to your internal IP. This makes the setup simpler from a network topology perspective.

Routing in the VPC works through routing tables which are set up for each subnet. You can view your routing table in the VPC tab of the AWS console. By default, requests within your subnet are routed locally, and requests outside of that range are routed through an ‘internet gateway’.

For example, let’s say that you have an instance with IP 10.0.0.1. It has an Elastic IP associated with it, let’s say 1.2.3.4. Now whenever you access other internal machines on the 10.0.0.0/8 subnet, your request will originate from your private IP and nothing will happen to the traffic.

When you access a machine outside of your subnet, the packets are routed through the internet gateway, a VPC device with a name like igw-9d7534f2. The gateway (which isn’t an actual EC2 instance, just an opaque AWS system) accepts the request and then looks up the internal IP from which the request originates, and checks if this IP is associated with any Elastic IP. If so, the request is rewritten to appear as if the origin is the Elastic IP, and then sent out over the internet. When a packet returns to the gateway with an Elastic IP as destination, the gateway checks if the Elastic IP is associated with an internal IP, and if so rewrites the packet to that IP. The packet is then forwarded into the private subnet.

This is how the one-to-one NAT of Elastic IPs works. The advantage of this approach is that machines within the subnet need no knowledge of their Elastic IPs. This makes it easy to switch Elastic IPs of a running instance, since the only thing that needs to be updated is the mapping table on the internet gateway. The external IP can change many times; since the packets are rewritten to use internal IPs before they arrive at the instance, the instance will never know this.
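The rewriting described above can be modelled in a few lines of JavaScript. This is only a toy sketch: the real gateway is opaque, and the table layout, function names and packet shape here are assumptions made for illustration.

```javascript
// Toy model of the one-to-one NAT performed by the internet gateway.
// Addresses follow the example in the text; everything else is illustrative.
const elasticByPrivate = new Map([["10.0.0.1", "1.2.3.4"]]);

// Outbound: rewrite the private source address to its Elastic IP.
// Returns null when no Elastic IP is associated (modelled as a dropped packet).
function translateOutbound(packet) {
  const publicIp = elasticByPrivate.get(packet.src);
  return publicIp ? { ...packet, src: publicIp } : null;
}

// Inbound: rewrite the Elastic IP destination back to the private address.
function translateInbound(packet) {
  for (const [privateIp, publicIp] of elasticByPrivate) {
    if (publicIp === packet.dst) return { ...packet, dst: privateIp };
  }
  return null;
}

// Re-associating an Elastic IP only touches the mapping table;
// the instance keeps using its private address throughout.
function associate(privateIp, publicIp) {
  elasticByPrivate.set(privateIp, publicIp);
}
```

Note that nothing on the instance side appears in this model: switching the Elastic IP is a single update to the table.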

Multiple Network Interfaces

Before we dive into the new system Amazon introduced, let’s review what was available previously.

In the past, Amazon added an option to add multiple network interfaces to a VPC instance. These interfaces appear as separate ethernet cards on your machine, and will have a separate internal IP. Separate interfaces also means you need to set up your routing between these interfaces correctly: if you send out a packet over the wrong interface, the packet will simply be dropped.

These network interfaces were introduced to give you the option of connecting different subnets together: you can have one interface in each subnet, and then route traffic between them as you like. Since each subnet has its own IP range, it is easy to route traffic to the right network interface.

For each internal IP, it is possible to associate an external Elastic IP. While this is possible, you get into a bit of a tricky routing situation now. Both internal IPs (on eth0 and eth1) have an associated external IP, and should be able to make requests to the wider internet, by having the internet gateway translate their internal IP to the Elastic IP. However, in order to do this correctly, the requests have to be sent out over the correct interface. Amazon itself admits that this is not simple:

I should note that attaching two public ENIs to the same instance is not the right way to create an EC2 instance with two public IP addresses. There’s no way to ensure that packets arriving via a particular ENI will leave through it without setting up some specialized routing. We are aware that a lot of people would like to have multiple IP addresses for a single EC2 instance and we plan to address this use case in 2012.

By default, Linux has a routing table which routes all traffic outside of your local subnet to a single interface. You can look this up with route:

# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.0.0.1        0.0.0.0         UG    0      0        0 eth0
default         10.0.0.1        0.0.0.0         UG    100    0        0 eth0
10.0.0.0        *               255.255.255.0   U     0      0        0 eth0
10.0.0.0        *               255.255.255.0   U     0      0        0 eth1

Here, all internet-wide traffic will go out over interface eth0. Even packets with a source ip of the eth1 interface will go out over eth0, and then will be silently discarded instead of being NAT’d to the right elastic IP. While it’s possible to do source-based routing to fix this, this isn’t trivial to set up (note that you need to run dhclient after adding another interface to a machine):

# ifconfig | grep eth\\\|inet\ 
eth0      Link encap:Ethernet  HWaddr 02:86:10:77:7f:fe  
          inet addr:10.0.0.76  Bcast:10.0.0.255  Mask:255.255.255.0
eth1      Link encap:Ethernet  HWaddr 02:86:10:65:fd:31  
          inet addr:10.0.0.226  Bcast:10.0.0.255  Mask:255.255.255.0

# curl --interface 10.0.0.76 ifconfig.me
107.23.13.138
# curl --interface 10.0.0.226 ifconfig.me
<< TIMEOUT >>>

# ip rule add from 10.0.0.226 table 2
# ip route add default via 10.0.0.1 dev eth1 table 2
# ip route flush cache

# curl --interface 10.0.0.226 ifconfig.me
107.23.13.127
# curl --interface 10.0.0.76 ifconfig.me
107.23.13.138

If you are OK with being a bit more limited in how many IP addresses you can use, it’s easier to assign multiple addresses to the same interface. The rest of this blog post assumes you will only use multiple IPs on a single interface, but if you really need to, the above should be a good starting point for adding more interfaces.

Multiple IP Addresses

What Amazon announced last week is support for multiple internal IPs on the same interface. This way we can avoid some of the routing nasties from above: all packets will be sent over eth0, regardless of which IP you have. Because of the one-to-one mapping of public and private addresses, you first need to add some new private addresses to one of your instances.

Amazon’s API has been updated to support assigning secondary private IPs, but for this example it’s easier to just go to the AWS Console. In the EC2 tab, go to Instances. Find the instance you want to update, right-click and choose “Manage Private IP Addresses”.

You’ll now see that the instance has two private IPs in your subnet (since an interface is locked to a specific subnet, all IPs on that interface have to fall within that same subnet). Nice! However, if you now check your instance:

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 02:86:10:7b:e4:f5  
          inet addr:10.0.0.34  Bcast:10.0.0.255  Mask:255.255.255.0

It won’t have the new IP yet. Let’s try retrieving it over DHCP!

root@ip-10-0-0-34:/# dhclient -d eth0
Listening on LPF/eth0/02:86:10:7b:e4:f5
Sending on   LPF/eth0/02:86:10:7b:e4:f5
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
DHCPREQUEST of 10.0.0.34 on eth0 to 255.255.255.255 port 67
DHCPOFFER of 10.0.0.34 from 10.0.0.1
DHCPACK of 10.0.0.34 from 10.0.0.1

Nope, we only get our primary private IP! With DHCP you can only get a single address. In the future, perhaps Amazon will add support for multiple IPs using some client identifier or another mechanism, but for now you’ll have to add the address manually. Hopefully Ubuntu’s cloud-init will add support for this in the future, so we don’t have to do it ourselves.

Amazon suggests creating a new virtual interface and then bringing that up. While that will work, it means you store your interface state on your hard drive, and any changes you make on AWS you will have to propagate to these files.

For testing, we can do something simpler:

MAC_ADDR=$(ifconfig eth0 | sed -n 's/.*HWaddr \([a-f0-9:]*\).*/\1/p')
IP=($(curl http://169.254.169.254/latest/meta-data/network/interfaces/macs/$MAC_ADDR/local-ipv4s))
for ip in ${IP[@]:1}; do
    echo "Adding IP: $ip"
    ip addr add dev eth0 $ip/24
done

This checks the EC2 instance metadata for all internal IPs associated with eth0 and adds all the secondary IPs.
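The core of that loop, skipping the primary address and adding the rest, can also be sketched in JavaScript. This sketch assumes, as the shell snippet does, that the metadata response lists the primary address first; the function name and parameters are illustrative, and the function only builds the commands rather than running them.

```javascript
// Given the metadata response body (whitespace-separated private IPs,
// primary first), build the commands that add each secondary address.
// The device name and /24 prefix mirror the shell snippet above.
function secondaryIpCommands(body, device = "eth0", prefixLength = 24) {
  const ips = body.split(/\s+/).filter((ip) => ip.length > 0);
  return ips
    .slice(1) // drop the primary address, which DHCP already configured
    .map((ip) => `ip addr add dev ${device} ${ip}/${prefixLength}`);
}

console.log(secondaryIpCommands("10.0.0.34\n10.0.0.58\n"));
// [ 'ip addr add dev eth0 10.0.0.58/24' ]
```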

You can now associate another Elastic IP with the secondary IP in the AWS Console.

And now you have multiple external IPs!

root@ip-10-0-0-34:/# curl --interface 10.0.0.58 ifconfig.me
107.23.13.127
root@ip-10-0-0-34:/# curl --interface 10.0.0.34 ifconfig.me
107.23.10.202

While this works for testing, it’s not persistent across reboots. You can create an upstart or cloud-init script to do that for you. Even then, IP addresses aren’t automatically added on the instance when you add them through EC2. I’m not sure if there’s a good way to do that. Finally, this script also won’t remove any local addresses you may have removed in the meantime.

However, it should be a good place to start!

engineering, amazon

Today we open up Silk

By: Jurian May 10th 2012

Today is a good day: we just launched the Silk editor. Everyone can now create their own Silk site.

Over the past few months Silk has been in private beta. Only the first 10,000 users could sign up and create a Silk site. It has been both exciting and insightful to see the first groups of users creating Silk sites. It teaches us what aspects of Silk are appreciated and where we can improve.

We have seen interest from professional publishers and data journalists, but also from businesses and personal users. During the private beta, we allowed users to use the Silk web editor to structure existing content and to build entirely new websites. Many users tried our importer tools to turn existing data sets into structured Silk sites. If you are curious by now, you may sign up here.

A structured web

If you have been following us, you know that Silk sites are websites that can be understood by both humans and computers. This enables an entirely new kind of web search and new powerful ways to visualize information.

The Silk sites that have been created so far demonstrate how Silk impacts the way information is consumed. For example, the Countries of the world site contains information about the various countries in the world, just like Wikipedia does. The site contains text like any other website, but allows you to interact with the content more deeply. It is like a database that anyone can use. Want to know which countries have a life expectancy below 60, for example? It is easy to get the answer in a table, chart or map.

Now that we have launched, more new Silk sites will be created each day. You can create your own site here. Making information available through Silk will make the web a better place, one Silk site at a time.

engineering, silk blog

Erik’s talk at Functional Programming eXchange 2012

By: Jurian April 2nd 2012

On the 16th of March, Skills Matter organized the third Functional Programming eXchange, an annual functional programming conference: a day of talks, open-space discussions and brainstorming on functional programming. The third speaker on the agenda was our own Erik Hesselink.

Erik spoke about the uses of and experiences with functional programming at Silk. Our backend uses Haskell to provide an API. It provides functionality from simple account creation to querying a custom graph database. Haskell’s types and abstraction power have enabled us to build complex functionality with very few errors. However, laziness and performance are tricky to understand and get under control. 

Silk’s frontend is written in Javascript. With first-class functions, functional programming techniques can readily be used here. For example, we use reactive values to statically model the dependency between application state and the user interface. However, runtimes are not optimized for this, and performance in the browser can be limited.

View Erik’s talk here (46 minutes) or read about the Functional Programming Exchange 2012. 

haskell, engineering, javascript

A RESTful API with automatically generated bindings

By: Jurian February 9th 2012

Silk is mainly built on two languages: Haskell and Javascript. We use Haskell for our back-end to get a stable, type-checked core and Javascript to bring the content to our users. While both these languages are high-level programming languages, the communication via the web forces us to write much lower-level HTTP requests and marshaling code. At Silk, we make an effort to avoid writing low-level code manually. An earlier result of this is our generic XML/JSON picklers, as described in a previous post. Since then we have not been idle and have taken the code automation one step further: we now generate our documentation and Javascript wrappers directly from our own Haskell web-EDSL.

The benefits of this approach are obvious. Instead of having to update documentation and javascript wrappers on every web API change, these parts are always kept up to date and correct. This saves work and avoids errors.

The heart of this system is our Haskell web-API-EDSL. This system allows the programmer to define API resources in a declarative way together with the associated functionality. The EDSL forces the resources to be constructed in a RESTful way. The EDSL constructs an abstract representation of web resources, which we can then interpret to generate the following functionality:

  • We construct the actual web-API using Happstack. Because we kept the API abstract, we could easily switch out Happstack for another framework.
  • We generate documentation which gives an overview of what entry points the API has and what parameters/methods entry points can be called with.
  • We generate a javascript wrapper which simplifies access to the API resources.

All this is best illustrated with an example. We could for example define a User API resource in the following way:

user :: Resource Root WithUser User
user = mkResource 
  { identifier     = "user"
  , multiGetBy     = [ ("search",  search)  ]
  , singleGet      = [ ("current", current) ]
  , singleGetBy    = [ ("email",   byEmail) ]
  , singleCreate   = Just create
  , singleDelete   = Just delete
  }

We can bundle such resources into an API:

myApi = 
  api ---/ user --/ permissions
                --/ subscriptions
      ---/ site

This also shows that resources can be nested. We use a monad stack to allow child resources to access their parents’ context information. We could now request the permissions of a certain user by requesting the following URL:

/user/email/e@mail.com/permissions

This would list all permissions for the user with email address e@mail.com. A sneak peek of what the documentation looks like for such URLs can be found here. All information on this page is automatically generated, including the examples and schemas for XML and JSON.

We could also access this resource using our Javascript API as follows:

silkApi.User.byEmail("e@mail.com").Permissions.list()
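How a chained call like this could map onto the nested URL can be sketched by hand. The following is not Silk’s generated code: the object layout, the `request` stub and the `childResources` helper are assumptions for illustration, and the stub only reports the request it would make.

```javascript
// Hypothetical hand-written wrapper; the real bindings are generated.
// `request` is a stub that returns the method and URL instead of calling out.
function request(method, url) {
  return { method, url };
}

// Child resources nested under a user, matching `user --/ permissions`
// and `user --/ subscriptions` in the API definition.
function childResources(baseUrl) {
  return {
    Permissions: { list: () => request("GET", `${baseUrl}/permissions`) },
    Subscriptions: { list: () => request("GET", `${baseUrl}/subscriptions`) },
  };
}

const api = {
  User: {
    // Mirrors the `singleGetBy = [("email", byEmail)]` entry of the resource.
    byEmail: (email) => childResources(`/user/email/${email}`),
  },
};

console.log(api.User.byEmail("e@mail.com").Permissions.list().url);
// /user/email/e@mail.com/permissions
```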

We are currently in the process of constructing similar wrappers for Ruby and Haskell. We are also working on finalizing the base EDSL and we plan to open source it in the near future.

haskell, engineering, javascript, ruby

Testing Javascript

By: Jurian February 23rd 2010

At typLAB, we primarily use two programming languages: Javascript and Haskell. Haskell has a compiler, strong static typing, and is pure (no state or side-effects) by default. Javascript, on the other hand, is interpreted, dynamically typed, and, when using objects, uses lots of state (since objects are an encapsulation of state). This makes testing Javascript more important: otherwise, a typo in a variable name might go undetected until someone uses our product.

To test our code, we need a couple of things: we need to write tests, we need a framework to run our tests in, and we need to automatically run this framework regularly. The number of Javascript test frameworks is large these days, and we started by reviewing a few. We finally settled on QUnit. It provides a few simple functions to write tests, and a clean and useful page to present the results.

We then integrated QUnit into our work flow. We use a simple, home grown module system to write our Javascript. It uses keywords like Module, Class and Static to easily build modular code. We defined a new annotation, Test, to define a test in a module. All these tests are automatically collected. A separate test runner module imports the modules to test and calls a single function to run all tests in a module.

This provided us with a simple way to write tests. But if this was all, people would forget to run the tests, the tests would start failing without anyone seeing it, and the effort to write the tests would be wasted. To prevent this from happening, we wanted to integrate the running of tests in our work flow. Luckily, John Resig has written TestSwarm.

Traditionally, integrating Javascript tests in a build process or as a commit hook has been difficult, since Javascript generally depends on the browser to run. Different browsers have different quirks that might make your Javascript behave differently, and your code might also depend on the DOM, which is only available in a browser. However, you don’t want your developer machine to start many different browsers on each build or commit, and your server might not even have a GUI.

TestSwarm is a solution to this problem. It provides a central server, and new tests to be run can be pushed to this server. Clients willing to run tests can also connect to it, and are given tests to run. Results are reported back to the server. This allows TestSwarm to build reports, showing test results per browser for each test run. In our case, our server pushes a new test run to the server on each commit set that is pushed. We have a few browsers open pointing to our TestSwarm instance, which run these tests. This means that testing doesn’t get in our way, but if something breaks, we can easily see which commit broke it and on which browsers.

The final piece in our testing puzzle takes its inspiration from the Haskell world. QuickCheck is a tool that performs randomized testing. You supply a function testing a certain property of your code, and it generates random inputs of increasing size. For example, if you have written a function reversing a list, you can state the property (function) that given a list (the input argument which is randomly generated), reversing it twice produces the original list.
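The reverse-twice property can be written down in a few lines. The helper below is a minimal sketch in the spirit of QuickCheck, not the API of the real library: `randomList`, `forAllLists` and the fixed number of runs are all assumptions made for illustration.

```javascript
// Minimal property-based test in the spirit of QuickCheck (not its real API).
// Generates random integer arrays of increasing length and checks a property.
function randomList(size) {
  return Array.from({ length: size }, () => Math.floor(Math.random() * 100));
}

// Runs the property on lists of length 0, 1, 2, ... and stops at the
// first failure, returning the input that broke the property.
function forAllLists(property, runs = 50) {
  for (let size = 0; size < runs; size++) {
    const input = randomList(size);
    if (!property(input)) return { ok: false, counterexample: input };
  }
  return { ok: true };
}

const reverse = (xs) => xs.slice().reverse();
const sameElements = (a, b) =>
  a.length === b.length && a.every((x, i) => x === b[i]);

// Property: reversing a list twice yields the original list.
const result = forAllLists((xs) => sameElements(reverse(reverse(xs)), xs));
console.log(result.ok); // true
```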

A Javascript version of QuickCheck exists, and we had a good use case. We use so-called reactive lists, that is, lists which can be connected to keep each other updated. For example, we can create two of these lists, and connect one to the other saying it is always the reverse of the first. When elements are inserted into the first, it sends an event to the second, which processes it, and generates an appropriate insert for the reversed list. The interesting thing is that we have an easily testable static property. Regardless of the mutations on the original list, at any point in time, the reverse of its values must be the same as the values in the output list.
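To make the static property concrete, here is a small sketch of a reactive list with a connected reversed list. The class and method names are assumptions for illustration and not the library described in the post; only insertion is modelled.

```javascript
// Illustrative reactive list; names are assumptions, not the post's library.
class ReactiveList {
  constructor() {
    this.values = [];
    this.listeners = [];
  }

  // Insert a value and notify every connected list.
  insert(index, value) {
    this.values.splice(index, 0, value);
    this.listeners.forEach((listener) => listener(index, value));
  }

  // Returns a new list that is kept equal to the reverse of this one.
  reversed() {
    const out = new ReactiveList();
    this.values.forEach((v) => out.values.unshift(v));
    this.listeners.push((index, value) => {
      // When notified, `out` still has the old length n, and an element at
      // source position `index` belongs at position n - index in the reverse.
      out.insert(out.values.length - index, value);
    });
    return out;
  }
}

const source = new ReactiveList();
const mirror = source.reversed();
source.insert(0, "a");
source.insert(1, "b");
source.insert(1, "c");
console.log(source.values, mirror.values);
```

After any sequence of inserts, `mirror.values` should equal `source.values` reversed, which is exactly the kind of property a randomized test can check.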

QuickCheck has quickly proven its worth in testing these reactive list functions. Since it generates random values of increasing size, it quickly and reliably finds all the corner cases where bugs like to hide. We’ve written a small helper function around QuickCheck to integrate it with QUnit, so that we can write code like this:

function splitTest ()
{
 var l = new R.List();
 var splitfun = function (e) { return e === 1; }
 var s = l.split(splitfun);
 quickcheck([arbListOp(arbRange(0, 5))],
   function (c, op)
   {
     op(l);
     c.assert(QUnit.equiv(A.split(splitfun, l.unR()), s.unR()));
   });
}

This creates two lists: an input list l, and an output list s that is the result of splitting the input list at elements equal to 1. We then call the quickcheck function, which is our helper function. It takes a list of generators as a first argument. In Haskell, these generators can be derived from the types, but here, we have to supply them. In this case, we generate operations on reactive lists. This generator for list operations takes a generator as an argument, which it uses to generate list elements to insert. In this case we generate random integers between 0 and 4 to make sure we get at least a few 1’s.

The second argument to the quickcheck function is the test function to run. It gets a QuickCheck object as its first argument, followed by the output from all the generators. In this case it gets a function representing an operation on a reactive list. We perform this operation on the input list by calling the function. We then check if the values in the input list (the function unR performs a deep conversion from reactive to normal lists) split on 1’s are equal to the values in the output list. Here, A.split is a static splitting function, and QUnit.equiv performs a structural equality check on arrays.

Thanks to the availability of these three open source tools, setting all this up only took a few days, and has provided us with a testing setup that works, but doesn’t get in our way. In the future, we will probably also start testing DOM and user interface related code; hopefully, that will work

engineering, javascript

Reinventing XSLT in pure Javascript

By: Sebastiaan Visser December 10th 2009

In this post we will explore some boundaries of functional programming in Javascript and show how easy it is to implement a set of combinators that can express functions similar to queries in XPath and similar to transformations in XSLT. We call the result a combinator library because we implement a few primitive queries and transformations and allow combining these into bigger ones using some basic composition functions. As we will show, all functions will follow more or less the same structure.

This post is really about Javascript, which will be the target language of this library. But most of the techniques and underlying thoughts actually come from a statically typed functional programming background. While reading this post it might be interesting to continuously keep in mind the types of the functions, which makes it much easier to understand what is going on and how this framework might be extended with more interesting transformations.

For the Haskell programmer: the framework we are about to describe here is very similar to the list arrows of the Haskell XML Toolkit. Looking at the documentation of this package might give some additional insight.

Some functions we will see are selection functions that can be used to select parts of a document, others can be seen as filtering functions that exclude parts of the output, and some are creation functions that introduce new structure in the output. In this post we simply call all of these functions transformations. The resulting framework is a hybrid comparable to XPath and XSLT.

Primitive transformations

Let’s first look at the structure of the transformations performed in XPath and XSLT. Both languages can be seen as ways of describing a function that takes one input, the context node, and produces a list of outputs, a node set. Primitive examples of such transformation functions are: getChildren (/* in XPath), getParent (.. in XPath) and getID (@id in XPath). Implementing these three functions in Javascript is very easy:

function getChildren (ctx) toArray(ctx.childNodes);
function getParent   (ctx) [ctx.parentNode];
function getID       (ctx) [ctx.getAttribute("id")];

For ease of reading, most functions in this post are written down as Javascript 1.8 lambdas. This syntax, which omits the braces and the return statement of functions, is currently only supported by Firefox’s SpiderMonkey engine. Note that all three functions have exactly the same type: they take one context node as input and return an array of nodes (a node set) as output. There is a good reason to have a consistent type signature for these primitive transformations: this allows us to easily compose multiple primitive operations into more advanced transformations.
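For engines without this shorthand, the same three primitives can be written with ordinary function bodies. The `toArray` helper is not defined in the post; the version below is an assumption about what it does, namely turning an array-like node list into a real array.

```javascript
// Assumed helper: convert an array-like object (e.g. a NodeList) to an array.
function toArray(arrayLike) {
  return Array.prototype.slice.call(arrayLike);
}

// Each primitive takes one context node and returns an array of results.
function getChildren(ctx) {
  return toArray(ctx.childNodes);
}

function getParent(ctx) {
  return [ctx.parentNode];
}

function getID(ctx) {
  return [ctx.getAttribute("id")];
}
```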

Sequential composition

To illustrate composition, let’s see how we would like to write down a transformation that selects all the grandchildren of a context node. Ideally we would like to sequentially compose two invocations of getChildren into one. By sequential composition we mean applying some transformation to the result of some earlier transformation. Something like this:

var getGrandChildren = seq(getChildren, getChildren);

But what would the seq function look like? To make it easier to come up with the correct implementation of the seq function, it might help to write down the types of all the functions involved and work from there. We first write down the Haskell-style type signature for our getChildren function, which takes one context node and produces a list of child nodes:

 getChildren :: Node -> [Node] 

Because we want the result of the expression seq(getChildren, getChildren) itself to be a transformation with the same type again, the type of seq must be:

 seq :: (Node -> [Node]) -> (Node -> [Node]) -> (Node -> [Node]) 

Which means: take two transformations as input and produce one transformation as output. By only looking at the type we can already deduce the following skeleton of the seq function:

function seq (tr0, tr1) function (ctx) /* [some nodes] */ ;

Note that this function is higher-order: it takes two functions and returns a new function. Now we only have to fill in the gap, which we can do by looking at the desired semantics. In the case of the getGrandChildren function we want to apply the first getChildren transformation to the context node and then apply the second transformation (getChildren again) over all the results of the first transformation and group together the results. Translating this to code would become something like: apply tr0 to ctx, then map tr1 over all these results and concat the output.

function seq(tr0, tr1) function (ctx) concat(tr0(ctx).map(tr1));
function concat (xs) [].concat.apply([], xs); // stand-alone concat

We have now worked out the skeleton of the function by looking at the derived type and come up with an implementation by looking at the desired semantics. The function is now both type correct and works as expected!
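The same combinator in standard function syntax, tried out on a small tree, might look as follows. To keep the sketch runnable outside a browser, the tree is built from plain objects with a `children` array instead of DOM nodes; that representation and the sample data are assumptions for illustration.

```javascript
// Flatten an array of arrays by one level.
function concat(xs) {
  return [].concat.apply([], xs);
}

// Sequential composition: apply tr0, then tr1 to each result, and flatten.
function seq(tr0, tr1) {
  return function (ctx) {
    return concat(tr0(ctx).map(tr1));
  };
}

// Plain-object stand-in for DOM nodes (illustrative only).
function getChildren(ctx) {
  return ctx.children;
}

const tree = {
  name: "root",
  children: [
    { name: "a", children: [{ name: "a1", children: [] }, { name: "a2", children: [] }] },
    { name: "b", children: [{ name: "b1", children: [] }] },
  ],
};

const getGrandChildren = seq(getChildren, getChildren);
console.log(getGrandChildren(tree).map((n) => n.name)); // [ 'a1', 'a2', 'b1' ]
```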

For the Haskell programmer: we just worked out Kleisli composition for the list monad. This composition internally uses the monadic bind of the list monad instance, which happens to be the concatMap function with its arguments flipped. The bind has type [a] -> (a -> [b]) -> [b], which (in our context) shows exactly how we apply a transformation (a -> [b]) to the results of another transformation ([a]) and come up with the result of the composition ([b]).

To illustrate the generic behavior of our sequential composition we use it to define some other useful transformations.

getSiblings    = seq(getParent, getChildren);
getGrandParent = seq(getParent, getParent);

Alternative composition

Now that we have defined sequential composition, we can also try to define another form of composition which just sums up the results of two transformations working over the same context node. For example, when we want to create a transformation that selects both the grandparent and the grandchildren of a context node, we might want to write down something like this:

getGrands = alt(getGrandParent, getGrandChildren);

We call this composition function alt because it combines two alternative transformation paths into one. The type signature of the alt function is similar to that of seq: it takes two transformations as input and is itself again a transformation. The semantics of this transformation combinator are really simple: it applies the two input transformations to the same context node and groups the results.

function alt(tr0, tr1) function (ctx) concat([tr0(ctx), tr1(ctx)]);

For the Haskell programmer: we just worked out the <+> function of the ArrowPlus instance for the list arrow. This function is defined in terms of the append function for lists, with type [a] -> [a] -> [a]. The semantics are very similar to that of the Alternative and MonadPlus type classes. Where the seq function is the algebraic and, product, sequence, /, times, etc, the alt function is the algebraic or, sum, alternative, |, plus, etc.
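To see alt next to seq, here is a small sketch in standard function syntax. The toy tree of plain objects (built with a hypothetical node helper) stands in for a real document; only seq, alt, concat and the two primitives mirror the definitions in this post.

```javascript
// Hypothetical toy tree: plain objects instead of DOM nodes.
function node(name, children) {
  var n = { name: name, parent: null, children: children || [] };
  n.children.forEach(function (c) { c.parent = n; });
  return n;
}

function getChildren(ctx) { return ctx.children; }
function getParent(ctx) { return ctx.parent ? [ctx.parent] : []; }
function concat(xs) { return [].concat.apply([], xs); }

// Sequential composition: feed every result of tr0 into tr1.
function seq(tr0, tr1) {
  return function (ctx) { return concat(tr0(ctx).map(tr1)); };
}

// Alternative composition: run both on the same node, append the results.
function alt(tr0, tr1) {
  return function (ctx) { return concat([tr0(ctx), tr1(ctx)]); };
}

var getGrandParent = seq(getParent, getParent);
var getGrandChildren = seq(getChildren, getChildren);
var getGrands = alt(getGrandParent, getGrandChildren);

// root > between > mid > child > leaf
var leaf = node("leaf");
var mid = node("mid", [node("child", [leaf])]);
var root = node("root", [node("between", [mid])]);

var grandNames = getGrands(mid).map(function (n) { return n.name; });
console.log(grandNames.join(", ")); // root, leaf
```

The results of the first transformation always come before those of the second, so the order of the arguments to alt determines the order of the output.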

Deep recursion, filtering and creating

Now that we have two basic transformation combinators, sequential and alternative composition, combining them gives us some powerful new ones. For example, we can write the deep function, which applies a transformation at arbitrary depth in a document.

function deep (tr) alt(tr, seq(getChildren, lazy(deep, tr)));
function lazy (f, a) function (n) f(a)(n); // postponed function application

The deep function applies a transformation and groups the results with a recursive deep invocation for all the child nodes of the current context. Because JavaScript is, unfortunately, a strict language, we have to explicitly delay the recursion in order to prevent infinite loops. Before we test the deep transformation combinator, we define three other useful primitive transformations.

function self (ctx) [ctx];

The self transformation is a bit special: it is the identity transformation, which outputs a singleton list containing the context node itself. It can be compared to the dot (.) in XPath.

function isElem (name) function (ctx) ctx.nodeName == name ? [ctx] : [];

The isElem function is a filter transformation that includes or excludes a context node based on its nodeName. When the node name matches, the node is returned as a singleton list; when there is no match, an empty list is returned.

function mkElem (name) function (tr)
 function (ctx) [createElemWithChildren(name, tr(ctx))];

The third function, mkElem, is neither a selection nor a filtering function. It is a creation function that takes the output node set of a transformation and surrounds it with a new element. This element is returned as a singleton node set again.
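The following sketch exercises deep, lazy, self and isElem together, in standard function syntax and on a hypothetical tree of plain objects rather than real DOM nodes. The toy nodes use lowercase names; in an HTML document the browser reports nodeName in uppercase ("H1"), so a comparison against a real page would probably need to account for that.

```javascript
// Hypothetical toy nodes: only nodeName and children are modelled.
function node(name, children) {
  return { nodeName: name, children: children || [] };
}

function getChildren(ctx) { return ctx.children; }
function concat(xs) { return [].concat.apply([], xs); }
function seq(tr0, tr1) {
  return function (ctx) { return concat(tr0(ctx).map(tr1)); };
}
function alt(tr0, tr1) {
  return function (ctx) { return concat([tr0(ctx), tr1(ctx)]); };
}

// Postponed application: f(a) is only evaluated once a node arrives.
function lazy(f, a) {
  return function (n) { return f(a)(n); };
}

// Writing seq(getChildren, deep(tr)) here instead would call deep again
// while still building the combinator, and never terminate.
function deep(tr) {
  return alt(tr, seq(getChildren, lazy(deep, tr)));
}

function self(ctx) { return [ctx]; }
function isElem(name) {
  return function (ctx) { return ctx.nodeName == name ? [ctx] : []; };
}

var doc = node("body", [
  node("h1"),
  node("div", [node("h2"), node("p")]),
  node("h1")
]);

// deep(self) visits every node, parents before children.
var allNames = deep(self)(doc).map(function (n) { return n.nodeName; });
console.log(allNames.join(", ")); // body, h1, div, h2, p, h1

var h1Count = deep(isElem("h1"))(doc).length;
console.log(h1Count); // 2
```

The recursion stops on its own at the leaves: mapping over an empty child list never invokes the postponed deep call.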

Using the set of primitives and combinators to create a transformation that produces a simple table of contents from an input document is now very easy. The following transformation shows how to select all H1 and H2 elements from the entire document. All headers will be surrounded by a LI and inserted into an unordered list UL.

var isHeader = alt(isElem("h1"), isElem("h2"));
var toc = mkElem("ul")(seq(deep(isHeader), mkElem("li")(self)));

var myToc = toc(document.body);

Now toc is a true JavaScript function that computes a table of contents from an input document.
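As a rough end-to-end check, the table of contents can be run outside a browser. In this sketch createElemWithChildren is a hypothetical stand-in that just builds a plain object, and the input document is a toy tree as well. Everything is written in standard function syntax.

```javascript
// Hypothetical toy nodes and element constructor (no real DOM involved).
function node(name, children) {
  return { nodeName: name, children: children || [] };
}
function createElemWithChildren(name, children) {
  return { nodeName: name, children: children };
}

function getChildren(ctx) { return ctx.children; }
function concat(xs) { return [].concat.apply([], xs); }
function seq(tr0, tr1) {
  return function (ctx) { return concat(tr0(ctx).map(tr1)); };
}
function alt(tr0, tr1) {
  return function (ctx) { return concat([tr0(ctx), tr1(ctx)]); };
}
function lazy(f, a) { return function (n) { return f(a)(n); }; }
function deep(tr) { return alt(tr, seq(getChildren, lazy(deep, tr))); }
function self(ctx) { return [ctx]; }
function isElem(name) {
  return function (ctx) { return ctx.nodeName == name ? [ctx] : []; };
}

// Wrap the output of tr in a single new element.
function mkElem(name) {
  return function (tr) {
    return function (ctx) { return [createElemWithChildren(name, tr(ctx))]; };
  };
}

var isHeader = alt(isElem("h1"), isElem("h2"));
var toc = mkElem("ul")(seq(deep(isHeader), mkElem("li")(self)));

var doc = node("body", [
  node("h1"),
  node("div", [node("h2"), node("p")]),
  node("h1")
]);

var myToc = toc(doc);
var items = myToc[0].children.map(function (li) {
  return li.nodeName + ">" + li.children[0].nodeName;
});
console.log(items.join(", ")); // li>h1, li>h2, li>h1
```

Note that toc returns a singleton list, like every other transformation, so the new UL element is myToc[0].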

Comparison to XPath

Here is a (far from complete) table showing a comparison between XPath queries and our transformation functions. With the very few combinators defined, we can already rebuild a powerful set of XPath queries. Some primitives shown below, like hasAttr and precedingSiblings, have not been defined in this post. Defining them is very easy. Please try it for yourself and see how far you can get with other exotic XPath axes.

XPath                          JavaScript
.                              self
//div                          deep(isElem("div"))
/*/p                           seq(getChildren, isElem("p"))
(//div|//p)                    alt(deep(isElem("div")), deep(isElem("p")))
//p/../..                      deep(seq(seq(isElem("p"), getParent), getParent))
/p[@href]/em                   seq(seq(isElem("p"), hasAttr("href")), seq(getChildren, isElem("em")))
//table/preceding-sibling::h2  deep(seq(isElem("table"), seq(precedingSiblings, isElem("h2"))))
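One possible way to define the two missing primitives is sketched below. This is not the definition used at typLAB, only an illustration on a hypothetical toy tree in which each node carries an attrs object and a parent reference. The code uses standard function syntax.

```javascript
// Hypothetical toy nodes with attributes and a parent pointer.
function node(name, attrs, children) {
  var n = { nodeName: name, attrs: attrs || {}, parent: null, children: children || [] };
  n.children.forEach(function (c) { c.parent = n; });
  return n;
}

function getChildren(ctx) { return ctx.children; }
function concat(xs) { return [].concat.apply([], xs); }
function seq(tr0, tr1) {
  return function (ctx) { return concat(tr0(ctx).map(tr1)); };
}
function alt(tr0, tr1) {
  return function (ctx) { return concat([tr0(ctx), tr1(ctx)]); };
}
function lazy(f, a) { return function (n) { return f(a)(n); }; }
function deep(tr) { return alt(tr, seq(getChildren, lazy(deep, tr))); }
function isElem(name) {
  return function (ctx) { return ctx.nodeName == name ? [ctx] : []; };
}

// Filter: keep the node only if it carries the given attribute.
function hasAttr(attr) {
  return function (ctx) {
    return Object.prototype.hasOwnProperty.call(ctx.attrs, attr) ? [ctx] : [];
  };
}

// Axis: all siblings that come before the context node, in document order.
function precedingSiblings(ctx) {
  if (!ctx.parent) return [];
  var siblings = ctx.parent.children;
  return siblings.slice(0, siblings.indexOf(ctx));
}

var doc = node("body", {}, [
  node("h2", { id: "intro" }),
  node("p", { href: "#" }, [node("em", { id: "stress" })]),
  node("table"),
  node("h2", { id: "after" })
]);

// //table/preceding-sibling::h2
var beforeTable = deep(seq(isElem("table"), seq(precedingSiblings, isElem("h2"))))(doc);
var beforeIds = beforeTable.map(function (n) { return n.attrs.id; });
console.log(beforeIds.join(", ")); // intro

// p[@href]/em, evaluated with the p element as context node
var emInLink = seq(seq(isElem("p"), hasAttr("href")), seq(getChildren, isElem("em")));
var emIds = emInLink(doc.children[1]).map(function (n) { return n.attrs.id; });
console.log(emIds.join(", ")); // stress
```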

Conclusion and goal

So we have seen how easy it is to create a powerful XML query and transformation language with a bare minimum of code. The trick is to define some primitive transformation functions and only two powerful ways to compose transformations. By using functions from one context node to a list of outputs, selecting, filtering and creating elements all become possible. Using the composition functions you can easily build more advanced and high-level transformations like the deep function that traverses an entire document tree. The comparison between XPath and the selection primitives is quite clear. Adding an element creation function shows that the query language can quite easily become a true transformation language in which we can add new structure to the output.

While building such a library yourself is fun, it might feel a bit useless at first sight. Why reinvent XPath or XSLT while most browsers have built-in support for these tools? There are two main reasons to perform transformations this way.

The first reason is that it is now very easy to add extra power to the library by plugging in new JavaScript functions. Using XSLT as a programming language, which unfortunately happens a lot in practice, almost always ends with you shooting yourself in the foot with large and unmanageable recursive templates that obscure what is really going on.

The second reason is more subtle. Here at typLAB we have shown that it is possible to replace the primitive transformations and the two composition functions with their functional reactive counterparts. This enables us to incrementally rebuild the output of a transformation when only small parts of the input document change. This incremental reactivity is an extremely powerful paradigm that allows for very fast live queries over semantic and structured documents.

As you can see, this framework is very simple. Try playing with it yourself!

engineering, javascript

Writing a generic XML pickler

By: Jurian November 10th 2009

In a previous post, Sebas explained that we use so-called XML picklers to convert Haskell data types to XML. Since these picklers have a regular structure, we don’t write them by hand, but derive them automatically using generic programming techniques. In this post, I’ll explain how our generic XML pickler works. The code shown here has been made available on hackage. We use the regular library to represent data types generically. It allows you to build a structure for your type that is isomorphic to your type, but built up from standard building blocks, called functors. This is possible because Haskell data types have a standard structure: a choice of one of several constructors (this is often called a sum), each having a number of fields (this is often called a product).

As an example, we will represent a simple user data type:

data User = User
         { name  :: String
         , email :: String
         , admin :: Bool
         }

The generic representation of this type is called a pattern functor. For the user data type, it will look like this:

type instance PF User = C User_User_
 (   S User_User_name_  (K String)
 :*: S User_User_email_ (K String)
 :*: S User_User_admin_ (K Bool)
 )

Here we see a few of the functors that are the building blocks of our generic representation: The C type marks constructors, the S type marks record labels, K marks the types that make up the fields of our record, and (:*:) is the product mentioned earlier, and is very similar to the (,) used for constructing tuples. There are three more functors in regular: I, which marks recursive positions in a type; (:+:), which sums together different constructors; and U, for constructors without fields.

To write a generic function, we define a type class which contains this generic function, and give an instance for each of these functors. If we then also define conversion functions from our data type to the generic representation and back, we can apply the function to our data type. Note that for regular, these conversion functions, and the representation given above, can all be generated using Template Haskell.

Since we want to write a generic XML pickler for use with the HXT library, we’ll define a type class containing such a pickler:

class GXmlPickler f where
 gxpicklef :: PU a -> PU (f a)

This says that we can give a generic pickler for one of the functors f, containing a’s at the recursive positions, if we have a pickler for a’s. I’ll now show the instances for all the functors, and explain them one by one.

The first is the instance for I, marking recursive positions. Since we get a pickler for the recursive positions, all we have to do is wrap it in the I constructor when unpickling, and remove that constructor when pickling. HXT provides the xpWrap function to transform a pickler like this.

instance GXmlPickler I where
 gxpicklef = xpWrap (I, unI)

Next, we’ll tackle K. Here, we require that the type of the field has its own pickler, and use that. We again use xpWrap to add and remove the constructor of the functor. We provide a special instance for String, since it doesn’t have a standard pickler. We choose to use the pickler that allows empty strings.

instance XmlPickler a => GXmlPickler (K a) where
 gxpicklef _ = (K, unK) `xpWrap` xpickle

instance GXmlPickler (K String) where
 gxpicklef _ = (K, unK) `xpWrap` xpText0

U works similarly. We use the pickler for (), and use xpWrap to go from a () to a U and back.

instance GXmlPickler U where
 gxpicklef _ = (const U, const ()) `xpWrap` xpUnit

The case for (:+:) is interesting. Remember that (:+:) represents a choice between two functors. If we have picklers for both of these, we can define a pickler for the sum as follows: during conversion to XML, we can pattern match on L or R (the constructors of (:+:)) and choose the left or right pickler appropriately. During conversion from XML, we try the first pickler. If it fails (an unpickler returns a Maybe value) we try the second. Since HXT doesn’t seem to have a combinator that follows this logic, let’s define one ourselves:

xpEither :: PU (f r) -> PU (g r) -> PU ((f :+: g) r)
xpEither (PU fl tl sa) (PU fr tr sb) = PU
 (\(x, st) -> case x of
                L y -> fl (y, st)
                R y -> fr (y, st))
 (\x -> case tl x of
          (Nothing, _) -> lmap (fmap R) (tr x)
          r            -> lmap (fmap L) r)
 (sa `scAlt` sb)
 where lmap f (a, b) = (f a, b)

Note that we take apart the PU type, and create a pretty printer, a parser and a schema separately. We can use this function in the GXmlPickler instance by supplying it with two picklers, created by recursive calls to gxpicklef.

instance (GXmlPickler f, GXmlPickler g) => GXmlPickler (f :+: g) where
 gxpicklef f = xpEither (gxpicklef f) (gxpicklef f)

The instance for (:*:) is easy again. Since (:*:) combines two functors, we can require that these two have a pickler. We use xpPair to combine these into a pickler for a pair, and then use xpWrap again to convert it into a pickler for (:*:).

instance (GXmlPickler f, GXmlPickler g) => GXmlPickler (f :*: g) where
 gxpicklef f = (uncurry (:*:), \(a :*: b) -> (a, b))
               `xpWrap`
               (gxpicklef f `xpPair` gxpicklef f)

Note that so far, we haven’t created any XML tags ourselves. We’ve only combined picklers. Now we come to the instances for constructors and record selectors. Here, we’ll use the constructor or selector name (converted to lowercase) to generate and parse XML tags. This is done with the xpElem combinator.

instance (Constructor c, GXmlPickler f) => GXmlPickler (C c f) where
 gxpicklef f = xpElem (map toLower $ conName (undefined :: C c f r))
                 ((C, unC) `xpWrap` gxpicklef f)

instance (Selector s, GXmlPickler f) => GXmlPickler (S s f) where
 gxpicklef f = xpElem (map toLower $ selName (undefined :: S s f r))
                 ((S, unS) `xpWrap` gxpicklef f)

We now have picklers for all generic representations built from the standard functors in regular. We still need a top level function that works on our real data types, though. Regular provides the two functions to and from to convert between data types and their generic representation. We can use these together with xpWrap to change our generic pickler into a pickler for real data types. Another matter we have not addressed is the first argument to gxpicklef, which is a pickler for the recursive positions. Here we pass the top level function, since the recursive position contains our original data type again.

gxpickle :: (Regular a, GXmlPickler (PF a)) => PU a
gxpickle = (to, from) `xpWrap` gxpicklef gxpickle

And that’s all for the generic XML pickler. Using it is very simple. To use it for our user data type above, we import Generics.Regular, Generics.Regular.TH, Generics.Regular.XmlPickler and Text.XML.HXT.Arrow.Pickle. We can then derive the Regular instance for User (note that we need a little bit of extra code, since Template Haskell cannot generate type family instances yet), and make an XmlPickler instance:

$(deriveAll ''User "PFUser")
type instance PF User = PFUser

instance XmlPickler User where
 xpickle = gxpickle

And that’s it! The package is available as regular-xmlpickler on hackage, and the source can also be found on github. The packages regular and HXT can also be found on hackage.

engineering, xml, haskell

How I learned to stop worrying and love web development again

By: Salar al Khafaji October 28th 2009

Why we don’t support Internet Explorer

We’ve already talked about some of the technology choices we’re making as a company. And while our choices on the back-end can hardly be labeled as mainstream, the most difficult choice we actually had to make was related to the client-side, as it directly affects our users. Obviously, Javascript on the client is a given, and we love it. However, as most web developers know, the differences between browsers are enormous and developing for all of them is almost impossible (try using Netscape 4.7 today, if you’re curious). Still, current conventional wisdom dictates that you should support recent versions of Internet Explorer (usually, this means Internet Explorer 6 and newer; however, support for Internet Explorer 6 is slowly declining, as it’s being dropped by major sites like YouTube and Orkut and by products like Basecamp), Firefox and the WebKit-based browsers (basically, Safari and Google Chrome). We, however, have decided to drop Internet Explorer support entirely. (Note that what we’re discussing here is browser support for our application. The content that resides inside the application will always be available to any web browser, whether it’s a text-based browser with no Javascript support, a low capability mobile browser or Internet Explorer 6. This is based on the principle of graceful degradation.)

In general, the trade-off you face when choosing which platform to develop for is between development time and the larger potential customer base that’s associated with the platform. Looking at the web right now, Internet Explorer still leads the market by a large margin (Wikipedia has a nice summary of various sources with browser usage statistics). So even if it seems annoying that you have to work around some CSS bugs or still write Internet Explorer specific event handling code for a web site, the payoff in user reach will usually still be worth it. Modern Javascript libraries such as jQuery or Mootools lower cross-browser development time even more by abstracting away lots of differences between browsers, tipping the equation even more in favor of Internet Explorer support.

So why did we choose to ignore the most used browser on the planet? It’s because we decided that in our case, the development costs would simply not be worth it. Obviously, this assessment is very specific to the type of application we’re building. We didn’t just base this on a hunch; we actually have quite some experience in this field: most of us have worked on products like Xopus before. Xopus is an awesome, browser-based WYSIWYG XML editor. It consists of more than 120,000 lines of client-side Javascript code. A non-trivial part of that is code that completely works around standard Internet Explorer behavior because of its bugginess or complete lack of support. This isn’t about your father’s unsupported CSS selectors or the lack of addEventListener. We’re talking about stuff like having to write your own cursor because contentEditable becomes basically useless when working on complex documents. The number of bugs related to contentEditable, text ranges, drag and drop and the Document Object Model in general is staggering. Most JavaScript projects, including some popular libraries, don’t even deal with these advanced aspects of the browser at all.

Now obviously, the other browsers aren’t all free of bugs. Which brings us to the second problem with Internet Explorer: lack of real progress and transparency. Even if you consider relatively easy and popular features (such as support for addEventListener), it’s hard to understand why they haven’t been implemented yet and whether there is any timeline at all to implement them. That makes the probability of low visibility improvements and bug fixes in the rendering or selection code practically zero. The contrast with the open development of Mozilla and WebKit is huge: there, almost everything is publicly discussed, with a focus on constant improvement of the rendering engines and a pretty clear timeline.

The current state of the web is actually very exciting right now, if we ignore Internet Explorer for a moment. Thanks to HTML5 there is a lot of progress allowing us to make almost desktop class applications, with support for things like drag and drop from the desktop, background processes and offline support (To be fair, Internet Explorer 8 does provide some offline support.). All of this should greatly improve the user experience with web applications and bridge the gap with desktop applications. No amount of code or smart engineering will allow us to bring that level of experience to browsers that lack these features.

Does this mean that no complex web application should support Internet Explorer? Obviously, it depends on many factors. There are large differences between applications that are targeted at tech-savvy users (where Internet Explorer is quickly becoming a minority browser) and those targeted at large organizations (where Internet Explorer 6 is still widely used). We’re a small team and we have to prioritize our development goals aggressively. Large teams with lots of resources are in a different situation altogether. That being said, we were pleased to see that the Google Wave team also chose to drop Internet Explorer support, and to see Google develop the Chrome Frame plug-in.

Finally, although this might seem less important than the above considerations from a business perspective, there is the loss of friction and return of enjoyment in developing web applications again. Not developing for Internet Explorer means that we can do amazing things with CSS, use new Javascript features in our codebase, and in general rediscover the excitement of the possibilities of the web. And that makes us love web development again.

engineering

Mutation events: what happens?

By: Jurian October 8th 2009

Since typLAB is all about exploring new ways of creating and consuming online content, we figured our software might want to keep track of what’s happening inside a document. All modern browsers have support for W3C’s mutation events. Safari, Chrome, Firefox and Opera all do them. But not all do all of them. Notably, WebKit fails to fire DOMAttrModified events when an attribute is changed. It does, however, fire the DOMSubtreeModified event after an attribute is modified. So at least that gives us something to work with until the good folks at WebKit squash the bug.

Here is how we fixed the lack of DOMAttrModified. First we need to detect whether the fix is needed:

var attrModifiedWorks = false;
var listener = function(){ attrModifiedWorks = true; };
document.documentElement.addEventListener("DOMAttrModified", listener, false);
document.documentElement.setAttribute("___TEST___", true);
document.documentElement.removeAttribute("___TEST___");
document.documentElement.removeEventListener("DOMAttrModified", listener, false);

The code is straightforward. Add an attribute and have a listener register the firing of the subsequent DOMAttrModified event. If the event is not fired our repair code kicks in:

if (!attrModifiedWorks) 
{

Next we store and override HTMLElement.setAttribute:

HTMLElement.prototype.__setAttribute = HTMLElement.prototype.setAttribute

HTMLElement.prototype.setAttribute = function(attrName, newVal)
{
  var prevVal = this.getAttribute(attrName);
  this.__setAttribute(attrName, newVal);
  newVal = this.getAttribute(attrName);
  if (newVal != prevVal)
  {
    var evt = document.createEvent("MutationEvent");
    evt.initMutationEvent(
      "DOMAttrModified",
      true,
      false,
      this,
      prevVal || "",
      newVal || "",
      attrName,
      (prevVal == null) ? evt.ADDITION : evt.MODIFICATION
    );
    this.dispatchEvent(evt);
  }
}

The new code fetches the current value of the attribute, soon to become the previous value. It then proceeds to set the attribute using the original setAttribute method that we stored. We don’t know whether that method does fancy stuff to the new attribute value, so, just to be sure, we fetch the new value by calling getAttribute once again. If and only if the new and previous value differ we proceed to dispatch the appropriately initialised mutation event. This covers added and modified attributes. But it won’t help us with removed attributes. For those we can override the removeAttribute method of HTMLElement:

HTMLElement.prototype.__removeAttribute = HTMLElement.prototype.removeAttribute;
HTMLElement.prototype.removeAttribute = function(attrName)
{
  var prevVal = this.getAttribute(attrName);
  this.__removeAttribute(attrName);
  var evt = document.createEvent("MutationEvent");
  evt.initMutationEvent(
    "DOMAttrModified",
    true,
    false,
    this,
    prevVal,
    "",
    attrName,
    evt.REMOVAL
  );
  this.dispatchEvent(evt);
}

} // closes if (!attrModifiedWorks)

This concludes our fix for the lack of DOMAttrModified in WebKit. Is this fix perfect? Nope. Some known issues:

  • a DOMSubtreeModified event is fired before instead of after the (artificial) DOMAttrModified event
  • assigning a value to an attribute will not trigger our setAttribute method. Most noticeably assigning a value to a className or id attribute will not result in the appropriate DOMAttrModified event
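The store-and-override pattern, and the second known issue, can be tried outside a browser with a small sketch. FakeElement below is a hypothetical stand-in for HTMLElement: attributes live in a plain object and events are delivered to a list of callbacks. None of this is DOM API, and the event objects are simplified.

```javascript
// Hypothetical element: attributes in a plain object, listeners in an array.
function FakeElement() {
  this.attrs = {};
  this.listeners = [];
}
FakeElement.prototype.getAttribute = function (name) {
  return Object.prototype.hasOwnProperty.call(this.attrs, name) ? this.attrs[name] : null;
};
FakeElement.prototype.setAttribute = function (name, value) {
  this.attrs[name] = String(value);
};
FakeElement.prototype.dispatch = function (evt) {
  this.listeners.forEach(function (listener) { listener(evt); });
};

// Same pattern as the fix: keep the original method, then wrap it.
FakeElement.prototype.__setAttribute = FakeElement.prototype.setAttribute;
FakeElement.prototype.setAttribute = function (name, value) {
  var prevVal = this.getAttribute(name);
  this.__setAttribute(name, value);
  var newVal = this.getAttribute(name);
  if (newVal != prevVal) {
    this.dispatch({
      type: "DOMAttrModified",
      attrName: name,
      prevValue: prevVal || "",
      newValue: newVal || "",
      attrChange: prevVal == null ? "ADDITION" : "MODIFICATION"
    });
  }
};

var el = new FakeElement();
var seen = [];
el.listeners.push(function (evt) { seen.push(evt.attrChange + ":" + evt.attrName); });

el.setAttribute("title", "a"); // attribute is new: ADDITION
el.setAttribute("title", "a"); // value unchanged: no event
el.setAttribute("title", "b"); // value changed: MODIFICATION
el.attrs.title = "c";          // bypasses the wrapper, much like el.id = "x" in the DOM

console.log(seen.join(", ")); // ADDITION:title, MODIFICATION:title
```

The last assignment changes the stored value without going through setAttribute, so no event is produced. That is the same gap as the className and id case listed above.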

We’re open to suggestions. But best would be if some WebKit developer would fix the bug so we can throw this code away.

engineering, mutation, webkit

This is Silk’s Engineering blog

Silk is a platform to create, share and find information, such as your favorite places or recipes, information related to a project, your investment portfolio or stats about the countries of the world.
