Hannes Mehnert on MirageOS and OCaml: “Functional programming is about better code maintenance and program understanding”
Our backend engineer, Pavel Argentov, traveled to Marrakech, Morocco to attend the ninth MirageOS retreat, which was held from March 13-19, 2020. The goal of the event is to bring both experienced and brand new MirageOS users together to collaborate and sync various MirageOS subprojects, start new ones, and help each other fix bugs.
MirageOS is a library operating system that constructs unikernels for secure, high-performance network applications across a variety of cloud computing and mobile platforms. The code can be developed on Linux or Mac OS X and then compiled into a fully standalone, specialized unikernel that runs under a Xen or KVM hypervisor.
At the event, Pavel spoke with Hannes Mehnert, the co-author of MirageOS and host of the event, about his work with MirageOS and OCaml. Hannes described his contributions to MirageOS and why he joined the project. He also explained the benefits of functional programming and why he was initially drawn to it. In addition, he broke down the potential and limitations of MirageOS and OCaml and shared some information on new developments and what’s to come. We’ve included the full transcript of the interview below, so you can get the latest info straight from the source.
Pavel: I think we should start by speaking about OCaml. How and why did you start working with OCaml?
Hannes: Six years ago, when I had just finished my Ph.D. in formal verification of software, I was used to taking some random, already-developed software, applying some specifications to it, and then writing some proofs that the program was actually correct. That turned out to be rather complex and work-intensive, due to the ubiquitous use of shared mutable state. For quite a long time, I've been very interested in systems programming, which usually means using C and writing your operating system in it. But given my semantics background, I was more hoping to use a high-level language for writing operating systems. So, after finishing my Ph.D., I stumbled upon MirageOS, together with my friend David Kaloper.
MirageOS is written in OCaml, which is a multi-paradigm language that has a module system and is used for functional programming. That means that you can avoid shared mutable state and actually verify the programs on the operating systems. When I came to MirageOS around six years ago, it was already working to some extent, and my first contribution was the TLS stack and cryptographic algorithms.
Pavel: How is MirageOS used, and what can we get out of it?
Hannes: MirageOS started as a research project. We had a prototype and an idea on how to use different styles of programming for operating systems. My background is also very deep in security, and that was my main motivation for contributing to MirageOS and trying to get it into production. From a security perspective, here you have less mutable state and you can run an HTTPS or web server with TLS. And you have much less code, which means fewer bugs and less resource usage, because if you don't have to run that much code, you don't waste so many CPU cycles and so much memory.
Pavel: Let’s talk about TLS. Very often you might hit the limitations of the hardware, and everything will be slow because the crypto algorithms are slow. How does OCaml solve this problem, and does it solve the problem of speed at all? Does OCaml allow you to make the code fast?
Hannes: Yes, OCaml itself has a very fast runtime. It has a garbage collector (a memory manager) that collects very quickly. The question is basically whether or not OCaml allows you to write a decent enough interface to pass the arguments properly and not waste too much CPU time. It turns out that it is fast enough. I'm happy to use a reasonable programming language, instead of a low-level micro assembler.
And the other side of TLS is handshakes. That is asymmetric cryptography, and in order to make it fast, we use GMP, the GNU Multiple Precision Arithmetic Library. In OCaml, we just have bindings for that, but that is the exception. Usually we try not to write bindings and not to use too much C code. The most complex parts of decryption and encryption are still in OCaml, not in C.
Pavel: Haskell programmers and programmers in other high-level languages are concerned about the performance of the garbage collector, saying it slows things down. In Haskell, they can't write any kind of “soft real-time applications”. Do you think OCaml can do that? Is OCaml’s garbage collector fast enough for use cases which require speed?
Hannes: Yeah, I think so. Haskell has a completely different runtime; it has lazy evaluation by default. OCaml is strict, so we just do the computation as we go along. The garbage collector is very well-tuned for workloads, it’s really fast, and I believe that, in OCaml, “soft real-time applications” are doable.
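Editor's note: the strict-versus-lazy distinction mentioned above can be observed in a few lines of OCaml. This is only a minimal sketch. The names `log` and `traced` are hypothetical helpers introduced for illustration, and the example says nothing about garbage collector performance; it only shows that OCaml evaluates an expression where it is written unless it is explicitly wrapped in `lazy`.

```ocaml
(* Hypothetical helper state: records the order in which values are computed. *)
let log : string list ref = ref []

(* Hypothetical helper: notes that [name] was evaluated, then returns [value]. *)
let traced name value =
  log := name :: !log;
  value

(* Strict (the OCaml default): computed immediately, at this binding. *)
let eager = traced "eager" (1 + 2)

(* Lazy (opt-in): the computation is suspended until it is forced. *)
let delayed = lazy (traced "delayed" (3 + 4))

(* At this point only "eager" has run. *)
let log_before_force = !log

(* Forcing the suspension runs it now. *)
let total = eager + Lazy.force delayed

(* Most recent entry first: "delayed" ran after "eager". *)
let log_after_force = !log
```

In Haskell the default is the other way around: every binding behaves like `delayed` unless strictness is requested.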
Pavel: As far as I know, the “unikernel” as a concept isn't unique to OCaml anymore. What was the history of unikernels? Was the name of the idea different when it started? How did people come to the idea of unikernels at all?
Hannes: I think it all started at the University of Cambridge, from the theoretical papers about the so-called Exokernel. People needed an instrument, a system which would be task-focused, less resource consuming, easily written, and easily adaptable.
Pavel: OK. As far as I know, MirageOS uses the Lwt library. Is Lwt performant enough to handle a reasonable load, for example a DNS server that has to respond quickly to many requests at once? Does it work fast enough?
Hannes: I think it works reasonably well. A good application example for MirageOS is the Firewall, which is integrated into Qubes OS. Qubes OS is an operating system which uses Xen. The goal of Qubes OS is, for example, to have your mail application separated from the PDF renderer. So if you receive an email with a malicious PDF, once you view it, it shouldn't be able to access all of your mail. Instead, you save the PDF and push it to a different virtual machine. And that different virtual machine has the code to run the PDF renderer.
So, that PDF is only opened and rendered in an isolated environment. MirageOS fits in here pretty well because it has a much smaller memory footprint. We can just set up the Firewall as one of the components inside of one of the virtual machines inside of the Qubes OS environment and receive packets from other virtual machines, which have access to the network. The MirageOS unikernel works as a router which routes the packets.
Pavel: You said something about MirageOS memory consumption. How much memory can it really have? What are the lower and upper limits? I've heard that MirageOS can’t be configured for more than 1GB of memory. Are there really such limitations?
Hannes: Well, at the moment, yes. The minimal amount of memory the OCaml runtime and MirageOS unikernels need is 10 megabytes, and the upper limit, at the moment, is 1GB of memory. But that can be easily tuned if there is demand for more memory. My DNS services, for example, require around 14-24 megabytes of memory. That's not millions of records, but more like hundreds of records. And the web services I run usually have between 32 and 128 megabytes of memory. And that is sufficient to store the data.
Pavel: Have you worked with the Irmin data store? As far as I know, it's kind of like Git, and it's the only data store written in OCaml for MirageOS.
Hannes: Yeah. Irmin is a branchable, immutable store. I usually don't use Irmin directly, but I use Irmin via the Git implementation, which uses it in the background. For example, my DNS server stores its zone file in a remote Git repository. It just fetches the repository, clones it into memory, and then serves data from there. In 2019, Irmin had a major release, Irmin 2.0.
Pavel: Well, let's switch a bit to the format of the gathering. Could you tell us a couple of things about what the MirageOS retreat is? How did you come up with this idea?
Hannes: I got a lot of inspiration from different conferences, and also from the OpenBSD hackathons. The basic idea is to gather a nice group of people. You are in a nice location, where you have nice weather, food, sunshine, and you can actually enjoy the environment. It's crucial to me that the people stay together all day and communicate with each other. There's no strict schedule. There's a daily round of updates on who did what, who’s interested in what, and who's stuck at what specific point. Other people may jump in and may have a solution for them. Random people start discussing problems and solutions, while other people are just busy writing some code.
On one hand, I try to get people here who are long established in the community and have some experience and some ideas about the different libraries and the ecosystem, to discuss fundamental changes in the ecosystem while here. But also, I always appreciate having some new people here, to have new ideas and people who we can actually integrate into the group and get them to program some OCaml and some MirageOS, in order to grow the community. It's not exclusively for people who already know MirageOS or have written in OCaml for several years, it's open to everybody who's willing to take a trip to Marrakech.
Pavel: That’s great! Do you think functional programming affects the programmer’s way of thinking? When I first started writing OCaml code, I started to understand that there are types which can be transformed. And this caused me to think first of the types and the meaning of data I work with. I know that functional programming in Europe is a part of the programming scholarship at the basic level. As far as I know, most students in Russia learn how to program starting with imperative techniques, and they almost never get out of that.
Hannes: Yeah. I think a lot about types and apply quite a lot of type-driven development before writing actual code. So, when I write programs in a functional language, first I think about what the types should look like. Once I get the types in the right shape, all the implementation becomes much easier. For me, functional programming is also about code maintenance and localized program understanding. And I think it's much easier to understand my code five years later when it’s written in a functional language, where I don't overuse syntactic sugar and features, than it is when the code was developed in an imperative language and has hundreds of lines in a function. I try to keep the functions rather short and understandable. Yes, functional programming shapes the way your brain thinks about the program.
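Editor's note: as a rough illustration of the "types first" workflow described above, here is a small OCaml sketch. Everything in it is hypothetical and made up for this example (the `record_type` and `answer` types, the toy `zone`, and the `lookup` and `describe` functions); it is not taken from any MirageOS DNS library. The point is only that once the possible outcomes are spelled out as a type, the functions mostly follow its shape and stay short.

```ocaml
(* Step 1: write down the types before any implementation (hypothetical). *)
type record_type = A | AAAA | MX

type answer =
  | Found of string list   (* one or more values for the name and type *)
  | No_data                (* the name exists, but not with this record type *)
  | No_such_name           (* the name is unknown *)

(* Toy zone: an association list from (name, record type) to values. *)
let zone =
  [ (("example.org", A), [ "192.0.2.1" ]);
    (("example.org", MX), [ "mail.example.org" ]) ]

(* Step 2: the implementation has to produce one of the three cases. *)
let lookup (name : string) (rtype : record_type) : answer =
  match List.assoc_opt (name, rtype) zone with
  | Some values -> Found values
  | None ->
    if List.exists (fun ((n, _), _) -> n = name) zone then No_data
    else No_such_name

(* The compiler warns if a case of [answer] is left unhandled here. *)
let describe (a : answer) : string =
  match a with
  | Found values -> String.concat ", " values
  | No_data -> "no data"
  | No_such_name -> "no such name"
```

For instance, `describe (lookup "example.org" AAAA)` falls into the `No_data` case, because the name is present in the toy zone but has no AAAA entry.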
Pavel: I see that monads are making their way into different languages. We have them in Ruby and in C++. Is it just a way of implementing some academic knowledge in day-to-day programming?
Hannes: I think it is a viable instrument, but it is very hard to comprehend if you haven't discovered monads yourself. Trying to explain monads to a new imperative programmer is very hard. We still use monads in MirageOS and in OCaml, but hopefully, with the multicore branch becoming part of the OCaml runtime at some point this year, we will get over that.
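Editor's note: for readers who have not met monads yet, the sketch below shows the idea with OCaml's standard `Result` type rather than with Lwt, so that it runs without extra libraries (it assumes OCaml 4.08 or later for `Result.bind` and the `let*` syntax). The functions `parse_port` and `parse_endpoint` are hypothetical examples. Lwt promises are chained with a bind of the same general shape, but this is not MirageOS code.

```ocaml
(* Bind for Result: continue with the value on Ok, stop at the first Error. *)
let ( let* ) = Result.bind

(* Hypothetical parser: accepts decimal port numbers from 1 to 65535. *)
let parse_port (s : string) : (int, string) result =
  match int_of_string_opt s with
  | Some p when p > 0 && p < 65536 -> Ok p
  | Some _ -> Error "port out of range"
  | None -> Error "not a number"

(* Each [let*] step runs only if the previous one succeeded, so the
   error handling does not have to be repeated after every call. *)
let parse_endpoint (host : string) (port : string)
  : (string * int, string) result =
  let* p = parse_port port in
  if host = "" then Error "empty host" else Ok (host, p)
```

Calling `parse_endpoint "example.org" "443"` gives `Ok ("example.org", 443)`, while `parse_endpoint "example.org" "https"` stops at the first step with `Error "not a number"`.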
Pavel: Let's talk a bit about open-source. Everything we have been speaking about is open-source. There is a point of view that tech only succeeds when it has enough money pumped into it. While open-source consumes our efforts and our time, it doesn't really bring in money. When you are evangelizing some new tech in an open community, you sooner or later reach the idea of an open-source collaboration. How important is open-source, in your opinion?
Hannes: I think open-source is a crucial factor. Most of the stuff we do is actually developing libraries, OCaml libraries, which are then used in MirageOS unikernels. And everybody should be able to freely mix and match them together. When I write a TLS stack or a DNS implementation, I have a strong incentive to open-source all that, because then other people can reuse it. I enjoy writing software, and it makes me happy if anyone is using that software, be it an individual or a company using it for profit. That's fine with me.
In MirageOS, most of the software is under a BSD license, so everybody can use it and do whatever they want with it. I think it’s very important to have a license. Everybody can understand the GPL, but there are tons of pages of text, while BSD has two or three paragraphs, and it is usually written in 25 lines of text. And if you also want to convince an industry to use some of your software, it’s better if you use a permissive license. You’ll have a much easier time convincing them, because, if you use a GPL license, it may be a bit harder to convince lawyers that it's a good idea. In MirageOS, for example, we have code contributions from IBM Research, and we managed to convince them to use a very permissive license, which hasn't been easy because lawyers usually want to stick to trademark.
Pavel: I've read that you're working for a company which sells unikernel development. What is it like working on a tech which isn't selling, let’s say, established, well-known imperative programming?
Hannes: I work at a nonprofit company called Robur. We work on grants, donations, and commercial contracts to enhance the MirageOS ecosystem and to develop unikernels.
Over the last year, we've gotten some funding from the public. From Germany and the European Union, we got some grants to develop certain applications, like OpenVPN Gateway, and at the moment we are getting funding from the European Union to work on a DNSmasq, which is one of the crucial components in everybody's network. And that’s pretty wonderful.
Pavel: How fast does MirageOS develop over time? Is it developing fast and growing new features?
Hannes: The development is always quite slow, but we also do quite a lot of work. We try to get rid of our technical debt and adapt to modern build systems, which sometimes takes more time than in other projects. In terms of features, it is mainly about new libraries being developed. We talked briefly about the Irmin data store, and its 2.0 release was a major milestone, which was only reached last year. There is also an upcoming TLS 1.3 stack. As for MirageOS, we're now heading towards a 4.0 version, and it will definitely improve the development experience quite radically by getting rid of the old “ocamlbuild” and replacing it with a new build system called “dune”, which features incremental builds.
Pavel: Well, let's conclude our talk with an encouraging statement for developers who might learn MirageOS, embrace OCaml, and stop fearing functional programming as a theoretical mind-eater. How would you encourage people?
Hannes: The good thing about FP is the level of control you have over rather complex code. In functional programming, if you spot a high-level bug, you can debug it down to the lowest level and fix it within a single weekend, while doing that on common operating systems is just impossible, due to the size of the codebase and the libraries involved.
You have control over the entire stack. It is full-stack development, from the level of the network device up to the business logic and the real application that runs.