Monday, April 23, 2018

Dependencies with code generators got a lot smoother with Meson 0.46.0

Most dependencies are libraries. Almost all build systems can find dependency libraries from the system using e.g. pkg-config. Some can build dependencies from source. Some, like Meson, can do both and toggle between them transparently. Library dependencies might not be a fully solved problem but we as a community have a fairly good grasp on how to make them work.

However, there are some dependencies for which this is not enough. A fairly common case is a dependency that ships some sort of source code generator. Examples of this include Protocol Buffers, Qt's moc, and glib-mkenums and the other tools that come with Glib. The common solution is to look up these binaries from PATH. This works for dependencies that are already installed on the system but fails quite badly when the dependencies are built as subprojects. Bootstrapping is also a bit trickier, because you may need to write custom code in the project that provides the executables.

Version 0.46.0, which shipped yesterday, has new functionality that makes this use case noticeably simpler. In Meson you look up the code generators you want to run with the find_program command, like this:

mkenums_exe = find_program('glib-mkenums')

This will find the executable from the system. However, if you have built Glib as a subproject, it can issue the following statements (this is not in Glib master yet AFAIK, so it does not actually work; it is more of an illustrative example):

internal_mkenums_exe = <command to generate the mkenum script>
meson.override_find_program('glib-mkenums', internal_mkenums_exe)

After this, calling find_program('glib-mkenums') no longer goes to the system but instead returns the internal program. Meson's internal helper modules have also been updated to always find the programs they use with find_program. This means that all projects using Glib functionality can be built without needing a system-wide install of Glib. Even more importantly, this requires zero changes in existing projects. It will just work out of the box. You can even use Glib helper code when building Glib itself.

This is especially convenient when you need a newer version of a dependency than your distro provides, and on platforms such as Windows where "distro dependencies" do not exist in the first place.

As an example of what is possible, Nirbheek has managed to bootstrap GStreamer on Windows using nothing but Visual Studio, Python 3, Ninja and Meson. The main limitation currently is that the overriding executable may not be a build target (i.e. something you build from source with a compiler) because the result of find_program may be used during the configuration phase, before any source code compilation has taken place. We hope to remove this limitation in a future release.

Sunday, April 8, 2018

Cookie purging the simple way

Getting rid of cookies (especially tracking and ad cookies) consistently is a good thing. However, it turns out to be a bit tricky, because you don't want to get rid of the session cookies for sites you care about. Basically, what you want to achieve is this:

  1. Store all cookies as normal
  2. Maintain a whitelist of servers that are allowed to store persistent cookies (usually for sites such as Github, Reddit, Twitter and the like)
  3. At regular intervals (preferably every time the browser is closed), delete all cookies that are not whitelisted.

There are browser extensions that do this, but they are often bizarrely complex, and even those that aren't are inconvenient to use, as they require installing plugins, clicking through menus and so on. Firefox is supposed to have built-in functionality for this as well, but after reading through the instructions online I could not figure out how to set it up so that it actually works.

Thus, as an experiment, I wrote a Python script to do this; it is available in this Github repo (a sketch of the core logic is shown after the steps below). Using it is simple:

  1. Write a whitelist file consisting of one hostname per line (all subdomains of the specified host are also permitted).
  2. Shut down Firefox.
  3. Run the script.
  4. Start Firefox.
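For reference, here is a minimal sketch of the core logic, not the actual script from the repo. It assumes Firefox stores its cookies in a cookies.sqlite database containing a moz_cookies table with id and host columns; treat those names as assumptions, and only run it while Firefox is closed.

#!/usr/bin/env python3
# Minimal sketch of the cookie purging idea, not the actual script from the
# repo. Assumes cookies live in cookies.sqlite in a moz_cookies table with
# "id" and "host" columns; run it only while Firefox is closed.

import sqlite3
import sys

def load_whitelist(fname):
    with open(fname, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]

def is_whitelisted(host, whitelist):
    host = host.lstrip('.')  # cookie hosts are often stored as ".example.com"
    for allowed in whitelist:
        if host == allowed or host.endswith('.' + allowed):
            return True
    return False

def purge(cookie_db, whitelist_file):
    whitelist = load_whitelist(whitelist_file)
    conn = sqlite3.connect(cookie_db)
    rows = conn.execute('SELECT id, host FROM moz_cookies').fetchall()
    doomed = [cookie_id for (cookie_id, host) in rows
              if not is_whitelisted(host, whitelist)]
    conn.executemany('DELETE FROM moz_cookies WHERE id = ?',
                     [(cookie_id,) for cookie_id in doomed])
    conn.commit()
    conn.close()
    print('Deleted', len(doomed), 'cookies.')

if __name__ == '__main__':
    purge(sys.argv[1], sys.argv[2])

You would invoke it roughly as purge_cookies.py <path to cookies.sqlite> <whitelist file>; the script in the repo is the one to actually use.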

Wednesday, April 4, 2018

Comparing Meson with Bazel on a Raspberry Pi 3

In this experiment we compile Google's Abseil C++ libraries with Bazel and also with Meson as a simple comparison of how they behave.

Apple and orange warning!

Please do not use this text to exclaim that one of these build systems is better, more performant, etc. than the other. That's not really what this is for. The two build systems build the code completely differently, into different artifacts and with different targets. Consider this more of a rough outline.

The Meson conversion

The code was converted with a simple script, after which some dependency declarations were fixed by hand to make everything work. For complicated reasons there's no Git repo; instead you can download the whole shebang as a zip file from this location. It's not against current trunk but against a random commit from some time ago when I started.

The original Bazel build does a bunch of complicated things with shared libraries and the like. The Meson one simply builds a static library for each of the Abseil modules. This is not particularly efficient but I just wanted to get something out without spending days replicating the build setup.

The build setup is complete enough to build all unit tests apart from ones that seem to require magic compiler flags, because they give compilation errors about missing timespec definitions.

Memory usage

When doing the actual compilation, Meson is not resident in memory. Only Ninja is, and it takes 2-5 MB of memory in total.

The Bazel master process takes roughly 90 MB when compiling. That is almost 10% of total system memory.

CPU utilization

Based on top/htop eyeballing, Ninja keeps all cores pegged almost all of the time. The time command reported CPU usage of 328%. It was necessary to manually specify -j 4 to Ninja (the default value is 6) because otherwise the system would hard freeze under load, most likely due to memory running out.

Weirdly, Bazel had a really hard time keeping the cores busy. It was common to have 1-3 cores idle (that is, not even waiting for IO) during the build. It is not known what causes this. Perhaps a lot of time is spent doing the file copies and symlinks that Bazel needs for its hermetic builds. But even so, maximal usage of resources is one of Bazel's claimed strong points, yet in this particular case that does not seem to be happening. It is possible that the behaviour has been tuned for data centers with tens of cores and fast SSDs, and because of that does not scale down to an ARM processor with an SD card for storage.

Total compile time

Meson used 6 minutes whereas Bazel used 17 minutes. But note the text above! Do not use this as any sort of "real" perf measurement, because the setups were different. That being said, if you consider that Ninja got up to 3x better CPU utilization, the numbers are in the same rough neighborhood as far as total CPU usage is concerned.

Ninja reports doing 190 build steps whereas Bazel reports a number on the order of 400-500. Many of these seem to deal with file copying and the like, which the Meson setup does not do at all. No effort was spent on examining what these steps do and how (or whether) they could be replicated in Meson.

Which one is better/which one should I use/which subreddit should I post this to?

That's not what this post was about. Do your own tests and draw your own conclusions.

Thursday, March 29, 2018

And now for something completely different

If anyone has been wondering why merge requests have not been reviewed fast enough recently, here is one of the reasons.

Wednesday, March 7, 2018

Stop trying to guess display language based on keyboard layout

A very common setup among non-English speaking computer power users is to have the display language in English but a country-specific keyboard layout. For example, I'm using the Finnish keyboard layout because I need to be able to easily type our special letters ö and ä. If your computer comes from someone else (such as your employer), you may not even have the option of a non-Finnish keyboard layout.

All of this has worked flawlessly for tens of years. However, recently I have noticed that many programs on multiple platforms seem to alter their display language based on the keyboard layout, which is just plain wrong. The display language should be chosen based on the display language setting and nothing else.

I first noticed this in the output of ls, which one would imagine to have reached stability ages ago.


Here we see that ls has chosen to print months in Finnish. Why? I have no idea. This was weird on its own, but then it spread to other operating systems as well. For no reason at all the existing Gimp install switched its display language to Finnish.


Let me reiterate: no setting was changed and the version of Gimp was exactly the same. One day it just decided to change its language to Finnish.

Then the issue spread to Windows.


VLC on Windows has chosen on my behalf to show its menus in Finnish in a completely English Windows 7 install. The only things it could use for language detection are geolocation and keyboard settings, and both of those are terrible ideas. The OS has a language. It is very clearly specified. All other applications obey it; VLC should too.

The real kicker here is that Gimp on Windows displays English text correctly, as does VLC on macOS.

The newest case is the new Gnome-ified Ubuntu, whose lock screen stubbornly displays dates in the wrong language. It also does not inflect the words correctly and uses that weird American month/day date ordering, which is wrong for Finnish.


What is causing this?

I don't know. But whoever is behind this: please stop doing that.

Sunday, March 4, 2018

Compiling Cargo crates natively with Meson

Recently we have been having discussions about how Rust and Meson should work together, especially for mixed-language projects. One thing which multiple people have told me (over a time span of several years, actually) is that Rust is Special in that everyone uses crates for everything. Thus there is no point in having any sort of Rust support; the only true way is to blindly call Cargo and have it do everything exactly the way it wants to.

This seems like a reasonable recommendation, so I did what every reasonable person would do and accepted it as is.

[Image: David Addison wearing cardboard x-ray goggles, with the caption "This is me being completely reasonable".]

But then curiosity takes hold of you and you start to wonder. Is that really the case?

Converting Cargo manifests to Meson projects

The basic setup of a Cargo project is fairly straightforward. The file format is mostly declarative, and most crates are simple libraries that have a few source files and a bunch of other crates they link against. Meson has the same primitives, so could they be converted automatically?

It turns out that for simple examples you can. Here is a sample repo that downloads the itoa crate from Github, converts the Cargo build definition into a Meson project and builds it as a subproject (with Meson, not Cargo) of the main project that uses it. This prototype turned out to require 71 lines of Python.
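To give a rough idea of what such a conversion involves, here is a minimal sketch, not the actual prototype from the repo. It reads the [package] and [dependencies] sections of a Cargo.toml and prints a corresponding meson.build that builds the crate as a static library and exposes it with declare_dependency. It assumes Python 3.11+ for tomllib, assumes every dependency is available as a Meson subproject exposing a <name>_dep variable, and ignores features, build.rs and everything else.

#!/usr/bin/env python3
# Minimal sketch of a Cargo.toml -> meson.build converter, not the actual
# prototype from the repo. Handles only the trivial case: a library crate
# with plain crate dependencies.

import sys
import tomllib  # Python 3.11+

MESON_TEMPLATE = '''project('{name}', 'rust', version: '{version}')

{dep_lines}

{libname}_lib = static_library('{libname}', 'src/lib.rs',
  dependencies: [{dep_vars}])

{libname}_dep = declare_dependency(link_with: {libname}_lib)
'''

def convert(cargo_toml):
    with open(cargo_toml, 'rb') as f:
        manifest = tomllib.load(f)
    package = manifest['package']
    deps = sorted(manifest.get('dependencies', {}).keys())
    libname = package['name'].replace('-', '_')
    # Each dependency is assumed to be available as a Meson subproject
    # that exposes a <name>_dep variable.
    dep_lines = '\n'.join(
        "{0}_dep = subproject('{1}').get_variable('{0}_dep')".format(
            d.replace('-', '_'), d) for d in deps)
    dep_vars = ', '.join(d.replace('-', '_') + '_dep' for d in deps)
    return MESON_TEMPLATE.format(name=package['name'],
                                 version=package['version'],
                                 libname=libname,
                                 dep_lines=dep_lines,
                                 dep_vars=dep_vars)

if __name__ == '__main__':
    print(convert(sys.argv[1]))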

What about dependencies other than itoa?

The script currently only works for itoa, because crates.io does not seem to provide a web API for queries, and the entire site is rendered with JavaScript so you can't even do web scraping easily. To get this working properly, the only thing you'd need is a function that maps a (crate name, version) pair to its git repo.

What about dependencies of dependencies?

They can easily be grabbed from the Cargo.toml file. Combined with the above, they could be downloaded and converted in the same fashion.
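As a sketch of how that recursion might look, assuming a hypothetical crate_to_repo() lookup (the missing piece mentioned above) and the convert() function from the earlier sketch:

import subprocess
import tomllib  # Python 3.11+

def fetch_and_convert(crate, version_req, done=None):
    # Recursively clone each crate into subprojects/ and write a generated
    # meson.build next to its Cargo.toml. crate_to_repo() is hypothetical:
    # it would map a (crate name, version requirement) pair to a git URL,
    # which is exactly the missing piece discussed above. convert() is the
    # Cargo.toml -> meson.build function from the earlier sketch.
    if done is None:
        done = set()
    if crate in done:
        return
    done.add(crate)
    srcdir = 'subprojects/' + crate
    subprocess.run(['git', 'clone', '--depth=1',
                    crate_to_repo(crate, version_req), srcdir], check=True)
    with open(srcdir + '/meson.build', 'w', encoding='utf-8') as f:
        f.write(convert(srcdir + '/Cargo.toml'))
    with open(srcdir + '/Cargo.toml', 'rb') as f:
        manifest = tomllib.load(f)
    # Version requirements are passed through as-is; resolving them
    # properly is left out of this sketch.
    for dep, req in manifest.get('dependencies', {}).items():
        fetch_and_convert(dep, req, done)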

What doesn't work?

A lot. Unit tests are neither built nor run, but they could be added fairly easily. This would require adding compile options so the actual source could be built with the unit test flags. That is some amount of work, but Meson already supports a similar feature for D, so it should not be too hard to add. Similarly, docs are not generated.

What about build.rs?

Cargo provides a fairly simple project model, and everything more complex is meant to be handled by writing a build.rs program that does whatever else is necessary. This suffers from the same disadvantages as every Turing-complete build system ever has, and these scripts are in general impossible to convert automatically.

However, based on the documentation, the common case seems to be calling into external build tools to build dependency libraries in other languages. In a build system that builds both parts at the same time, it would be possible to create a better UX for this (though again, that is obviously not something you can convert automatically).

Could this actually work in practice with real world projects?

It might. It might not. Ignoring the previous section, no immediate showstopper has presented itself thus far. One might appear in the future. But you never know until you try.

Wednesday, February 28, 2018

On the unoptimalities of language specific build systems

A fairly big recent trend has been the emergence of new programming languages that are meant to be compiled into machine code. The silent (and sometimes not so silent) goal of these languages has been to replace C and C++ as the dominant systems programming language.

All of these languages come with their own build system and dependency management optimised for that particular language. This makes sense, as having a good developer experience is important, and not having 20-30 years of legacy to carry with you means you can design and develop slick systems relatively easily. But, as always, there is a downside. Perhaps the main issue comes up pretty quickly when you try to combine said code with projects in other languages.

A common approach is for the programming language in question to bundle up all its dependencies as source in a big clump. Then the advocates will say that "it's simple, just call our build system from yours and it gets built". This seems simple, but it uses the weaseliest of all weasel words: just. Whenever someone tells you to "just" do something, what they are almost always doing is trying to trivialise away the hardest part of the entire operation. So it is here as well.

When could it work?

There is one case where this approach works without problems: when the dependency builds into a single library with a C interface and also ships the header and a pkg-config file needed to use it. This case is indistinguishable from a plain C library, so it will work exactly the same. The dependency can be provided as a system package, built as a dependency in a Flatpak manifest, or handled by any other similar mechanism.

Unfortunately this setup breaks down the second you want to do anything else. The most common requirement is to build all dependencies from source in a single build step. This is necessary on any platform that does not have the concept of a "system" package manager. Many people also want to do this on Linux systems in order to, for example, build their project's trunk against their dependencies' trunks. This is where things fall down.

The myth of the build dir

Most people probably haven't thought much about the build directory of their builds. The most common conception is that the build system just (there's that word again) compiles source code into object files and then into targets, and that the installation step merely copies the files out to the staging directory. This is not true in the slightest.

Build systems need to do a whole lot of stuff to make things work directly from the build tree. Every build system does it slightly (and sometimes massively) differently. More importantly, the way each build system does it is not stable. They are allowed to, and will, change the way the build tree is laid out at any time. Nothing inside the build tree is stable: not file formats, not directory layouts, nothing.

The problem with building source code with two different build systems in a single build is that eventually they need to work together. Libraries need to be linked. Sources need to be generated. Executables need to be run. That means joining together two completely unstable elements. The simplest problem in this space is file layout. Every build system expects a certain layout for the files it manages, and it is usually very different from that of other build systems. Thus, in order for this to work, there would need to be a way to tell every build system to adapt to a different system's file layout when run as its subtask.

This is a big thing to ask, because it requires a lot of menial work that build systems have traditionally (ever, actually) been unwilling to do. Guessing the subtask's layout and hoping that it does not change might work for a while, but then it breaks for the slightest of reasons. The problems only get harder from there.

N^2 manual work algorithms are awesome!

Even if this worked (and it does not), the next problem comes from scaling up. You can only "just call" from one build system to another if someone has taken the time to make one understand the other. This is simple for two build systems: you need to write two integrations, one in each direction. But suppose we live in a world where many of the common C libraries in use today have been replaced by implementations in other languages. If you were doing cross-platform mobile development, you could have C, C++, Java, D, Rust, Go and Swift in the same project.

Seven languages means seven different build systems, and possibly more, since C and C++ commonly have more than one dominant build system. This means reading and understanding seven different build system syntaxes and mental models. If you want to combine those freely, it means writing 7 x 7 = 49 different build system integrations which must, lest it be forgotten, combine the unstable innards of all of these. And then it gets worse.

Since every language has its own package manager and dependency downloader, you now have up to seven package managers in your project. Actually no, that is a lie.

The tangled web of lies and deceit dependencies

When talking about dependencies between projects in different languages, most people usually mean a dependency graph like this.

That is, there is a dependency in one language and a second project in a different language that uses it. For this simple case most things are feasible. But let's see what happens when we add just one more project.

Here we have project 1, written in language 1. It has a dependency on project 2, written in language 2. However, project 2 has an internal dependency on project 3, which is also written in language 1. The question now becomes: how should this be built?

Since languages 1 and 2 each use their own build tool and package manager, the two edgemost projects don't know that they are being built as part of the same project. Language 2 completely hides its dependency, as it should. The two projects need to work independently, which means that each one of them must determine its dependencies in isolation. If they download their dependencies at configuration time, then every build setup accesses the dependency provider twice. Doing more dependency resolutions than you have languages in your project seems suboptimal.

The other approach to this is usually called vendoring. In it, each project is used only as a source tarball that embeds all of its own dependencies as source code. This seems like a working solution, but it is not really. Many modern languages go the NPM route, where it is considered good practice to have many small dependencies. It is not uncommon for medium or even small projects to have 50+ dependencies. This leads to problems such as this:


Here project 1 depends on two different projects that are both implemented in language 1. Just like above, these two projects don't know of each other, because their dependency chain goes via language 2, which hides it. Both of these projects have their internal dependencies embedded, so they can be built from scratch without problems.

The problem here is that, due to basic popularity and probability, the two projects embed many of the same dependencies. The embedded copies might have the same versions or they might differ. If both copies end up in the top-level executable, you get, depending on your toolchain and the phase of the moon, either a working binary or the nastiest of linker bugs to fix.

Even if this yields a working program, there is a big downside: compilation takes up to twice as long, because you have to compile the same dependencies twice in different but isolated parts of the build tree. As a rough approximation this means that adding a dependency to a dependency graph like this goes from being O(1) more work to being O(N) more work, because dependency graphs cannot be deduplicated if a dependency in a different language sits between them. It is left as an exercise to the reader to visualize what this would look like in a huge project such as Chromium.

The simple solution

There is a simple solution to this problem, and it is very popular among language zealots: reduce the number of languages to one by claiming that in the future everything will be written in their own favourite language. It does not matter what the growth rate of complexity is if it is only ever evaluated at the value 1.

The reduction of programming languages to one is expected to happen any minute now, immediately after Mr. Godot brings us the news of Eastasia's surrender.

The real problem

All of this boils down to the fact that language-specific build systems are two opposing things at the same time. They are both a very comfortable gilded cage and an extremely isolating silo. They foster and promote cooperation within their own group but make cooperation between groups a lot harder.

One of the things we learn from history is that people who have opposed cooperation have, ultimately, lost to those who have promoted it. Maybe we should heed the teachings of history and start working towards better, more encompassing dependency management.