Saturday, 24 May 2014

Moving from SCons to CMake

TL;DR: I'm happy with CMake + Ninja, even though the language is ugly and the speedup is not that impressive.

I'm not ashamed to admit it: I stayed away from CMake because the language seemed inelegant.


Elegance and beauty can be defined in many ways but context is apparently important. Maybe this is why the most beautiful language in the world is relegated to being used as an extension language for my editor rather than to run the world.

I am currently in the middle of a project moving from SCons to CMake and I am pleasantly surprised at the quick progress. The SCons project has about 3KLOC of SCons build scripts that do everything from generating code for language bindings to building 20-something external C and C++ libraries. My experience with this larger project has also convinced me to move over my JIRA time tracker because, of all the other things I should be doing, the most important is clearly migrating a build system for no particular reason at all.

SCons served the larger project well when it started. Python was an easy language to learn, if not master, and so most people could read the code and make small modifications when they needed to. As the project grew and things got more complicated to manage, we started mimicking other build systems to add features that SCons did not have. The most important feature we cribbed was the "usage requirement" feature from Boost.Build. In a nutshell, it allows you to attach "arbitrary" requirements to a target that you are consuming. So if you use SomeFunkyDll.dll in your project, you may want to require that library clients define SOME_FUNKY_DLL when they consume it. Automatically doing this in SCons was a big pain in the rear. I don't need to tell you how we did it because it is not relevant, but I wanted to give you an idea of the types of things we found important.
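For contrast, CMake expresses this idea directly as usage requirements on targets. A minimal sketch using the hypothetical SomeFunkyDll example (requires a CMake new enough to know INTERFACE properties, 2.8.11+):

```cmake
# Hypothetical library target mirroring the SomeFunkyDll example.
add_library(SomeFunkyDll SHARED some_funky.cpp)

# INTERFACE requirements propagate to consumers automatically:
# anything that links SomeFunkyDll also gets -DSOME_FUNKY_DLL and
# the include directory, with no extra work on the consumer's side.
target_compile_definitions(SomeFunkyDll INTERFACE SOME_FUNKY_DLL)
target_include_directories(SomeFunkyDll INTERFACE ${CMAKE_CURRENT_SOURCE_DIR}/include)

# Consumer: the usage requirements come along with the link.
add_executable(client main.cpp)
target_link_libraries(client PRIVATE SomeFunkyDll)
```

This is exactly the behavior we had to hand-build in SCons.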

Another problem appeared once we passed some critical mass of source code: the build slowed down significantly. In the old days, SCons could be slow as molasses because it valued correctness over everything else. We tried everything in the "go fast" recommendation list, but at best a no-op build still took a long time, and this was resulting in too many sword fights.

Eventually, we got to the point where working on this project felt inefficient, so we looked for alternatives. I had always had CMake in the back of my mind, as I had actually written my own makefile generator for another project. The architectural separation between the build specification and the actual build actions allows optimizations at different levels, not unlike a compiler, and it let me add many more features than I could have if I had also had to develop the build action logic.

I won't be talking about the larger project yet (maybe ever) but I will talk about my experience moving over Worklog Assistant's build system. Just so that there is a frame of reference:

  • Number of targets in build (including third-party files): ~1K
  • Lines of app code (excluding third-party files): ~20K

That is, it's not a huge project.

A post on build systems can go on forever, but the performance of the build for each of these activities is what actually matters to me when I'm doing my daily work:

  • Clean build
  • Modifying a core header file (aka a file that nearly everything depends on)
  • Modifying any cpp file in the build
  • Modifying an important, but not core header file
  • Building when nothing has changed. For some reason this seems to be important when comparing build systems; I don't personally care about this metric.

Each of these has some relative importance, and for me, the activities where build performance matters most are modifying non-core source and header files. The rest have minimal importance.

The CMake language, other weirdness and leaky abstractions

My initial aversion to CMake was likely due to the fact that it was clear to me there would be weird conventions and things that worked somewhat differently depending on the backend chosen. I was not wrong in this assumption, but I was wrong about how much they would bother me: not much. Don't get me wrong, it's frustrating not completely understanding why changing an add_custom_target to an add_custom_command makes things work "better" with generated files, but by and large, there are few such design bugs.
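The distinction, as I understand it (file and script names below are hypothetical): add_custom_command declares how to produce a file, complete with dependency information, while add_custom_target runs its commands unconditionally and has no tracked output.

```cmake
# add_custom_command tells the backend how generated.cpp is produced,
# so it is re-run only when its dependencies change.
add_custom_command(
  OUTPUT  ${CMAKE_CURRENT_BINARY_DIR}/generated.cpp
  COMMAND python ${CMAKE_CURRENT_SOURCE_DIR}/gen.py
          ${CMAKE_CURRENT_BINARY_DIR}/generated.cpp
  DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/gen.py)

# The generated file can then be consumed like any other source;
# CMake wires up the dependency between the target and the command.
add_executable(app main.cpp ${CMAKE_CURRENT_BINARY_DIR}/generated.cpp)

# An add_custom_target, by contrast, always runs and produces no
# tracked output, which is why swapping one for the other changes
# how well generated files behave.
```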

The biggest reaction any decent developer probably has when being introduced to CMake is to the language. It just looks bad. However, while the language is not efficient by any means, nor is it beautiful, it is predictable and regular, which makes it easy to learn. After the initial shock of the aesthetics of the language, you settle into it pretty easily. Thought: perhaps the language being intellectually unappealing encourages the developer to spend as little time in it as possible, making him/her quite efficient. Did they learn this from Java? Hmm.

Different build systems

I have not launched Visual Studio to do a build since perhaps 2008, though there are many reasons to do so: comfort and familiarity for one, Incredibuild for another. I think this is the main reason why CMake will eclipse all other C++ build systems: it is a 1-to-N solution. That is, one team can maintain a single build specification that fulfills its requirements while supporting the particular tool preferences of N other teams. This is not possible with most other build systems.
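Concretely, the same CMakeLists.txt can be turned into project files for each team's preferred tool (generator names vary slightly between CMake versions, and the source path is illustrative):

```shell
cmake -G "Visual Studio 12" path/to/source   # one team builds in the IDE
cmake -G Ninja path/to/source                # another team uses Ninja
cmake -G "Unix Makefiles" path/to/source     # a third sticks with make
```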

Third-party library support

What surprised me the most was how often there was already a CMake module available for $MY_FAVOURITE_LIBRARY. This allowed me to treat the building of third-party code as a black box: either add a call to add_subdirectory, or add the appropriate path to CMAKE_MODULE_PATH and call find_package. CMake would just use the right compiler and the right flags to avoid build incompatibilities. Previously, I'd have written build scripts for the third-party code myself. Win.
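Both routes look roughly like this (the zlib checkout location and the SomeFunkyLib find module are hypothetical):

```cmake
# Option 1: build the third-party code in-tree. Its own CMakeLists.txt
# is a black box from our point of view.
add_subdirectory(third_party/zlib)

# Option 2: use a find module. FindSomeFunkyLib.cmake would be a
# project-local module, picked up by extending CMAKE_MODULE_PATH.
list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake/modules")
find_package(SomeFunkyLib REQUIRED)

include_directories(${SomeFunkyLib_INCLUDE_DIRS})
add_executable(app main.cpp)
target_link_libraries(app ${SomeFunkyLib_LIBRARIES})
```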

A highly unscientific test with lots of numbers to look official

Here is a nice chart:
All tests are run with the C++ compiler that comes with Visual Studio 2013 SP1.
  • scons -j8: A fairly well-optimized SCons build using many of the tips from the SCons wiki, run with 8 parallel jobs. Source lists are built dynamically, using a recursive glob + wildcard filter.
  • cmake + ninja 1: An unoptimized CMake (2.8.12.2) build using the Ninja stable release from September 2013. Source lists are built with a preprocessing step and some elisp to make maintenance of the preprocessed file easier. See gist.
  • cmake + ninja 2: Same as cmake + ninja 1, but configured with CMAKE_LINK_DEPENDS_NO_SHARED=ON
  • cmake 3.0 + ninja: Same as cmake + ninja 1, but using cmake 3.0
  • cmake + ninja latest: Same as cmake + ninja 1, but using Ninja from git
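For reference, cmake + ninja 2 differs from cmake + ninja 1 by a single cache variable; the configure line would be something like:

```shell
# CMAKE_LINK_DEPENDS_NO_SHARED=ON stops CMake from relinking targets
# whenever a shared library they depend on is rebuilt.
cmake -G Ninja -DCMAKE_LINK_DEPENDS_NO_SHARED=ON path/to/source
```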
This looks pretty OK! CMake and Ninja beat SCons. But this is only part of the story. Here is the actual data:

Build times are shown per configuration, with the % speedup over scons -j8 in parentheses:

| Activity | Importance | scons -j8 | cmake + ninja 1 | cmake + ninja 2 | cmake 3.0 + ninja | cmake + ninja latest |
|---|---|---|---|---|---|---|
| Clean build | 5% | 224 | 279 (-24.55%) | 264 (-17.86%) | 266 (-18.75%) | 261 (-16.52%) |
| Touch core header file and rebuild | 5% | 165 | 178 (-7.88%) | 174 (-5.45%) | 180 (-9.09%) | 175 (-6.06%) |
| Touch a source file and rebuild | 50% | 46 | 35 (+23.91%) | 35 (+23.91%) | 35 (+23.91%) | 35 (+23.91%) |
| Touch an arbitrary non-core header file and rebuild | 40% | 60 | 68 (-13.33%) | 65.5 (-9.17%) | 66 (-10.00%) | 65 (-8.33%) |
| Do-nothing build | 0% | 10 | 0.5 (+95.00%) | 0.5 (+95.00%) | 0.2 (+98.00%) | 0.44 (+95.60%) |
| Weighted average speedup | 100% | n/a | +5.00% | +7.12% | +6.56% | +7.49% |
The Ninja build sees a significant improvement over SCons only when modifying source files; it has a terrible result when it comes to touching header files. I have my own theories as to why, but running ninja -d stats showed that loading the dependency file was the biggest slowdown.

The weighted average is computed using the "importance" column. I think touching source files should perhaps have an even higher importance than I've given it above. Do I really edit header files as often as source files? I don't have the answer to that, so as a proxy I ran a simple script counting how many header files vs. source files were touched in a 6-month period and applied the ratio as a weight. My gut feeling is that I probably spend more time editing source files than header files.

Conclusions

Not as clear-cut a win as I had hoped: a well-optimized SCons build can still be faster for some sizes of projects, and it is possibly more correct. However, the equivalent SCons build requires more maintenance to provide the same features as one in CMake. I found that compiler/toolchain support is much better with CMake than with SCons, and I write less code to do the same (complex) thing. Occasionally, CMake's abstractions leak, making it awkward to do simple things like generating files, which are easy with SCons. On the other hand, CMake makes it easy to add per-file compile options, which are difficult with SCons.
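Per-file options, for instance, are a one-liner in CMake (the file name and flag here are purely illustrative):

```cmake
# Compile just this one file with an extra flag; everything else in
# the target keeps its default options.
set_source_files_properties(src/hotpath.cpp
  PROPERTIES COMPILE_FLAGS "/O2")
```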

Since the performance for this particular example is roughly equal, I have to choose the system that makes me feel more productive. And by that metric, CMake + Ninja make me "feel" more productive only because Ninja starts building immediately.

What matters even more: the SCons community has always been small. The CMake community, on the other hand, is huge in comparison and only growing. I expect performance to improve and more backend options to become available.

About 4 or 5 years of SCons experience went into the build for this project, which means a lot of little performance tricks were applied; as my experience with CMake is nowhere near as long, it isn't a fair comparison. I should be ashamed of expecting a free speedup, but the speedup I did see for an important use case makes me feel it was worth it.

On the upside, I no longer have to wait for SCons to start building something after hitting compile.

Have you transitioned to/from CMake? I'd be curious to hear about your experiences.


Monday, 12 May 2014

So I was right about the vector, wrong about the players

You will cringe hard when you read how I wrote about C# many years ago, but I had honestly expected this to happen to C#, not Java. Maybe having C# as an ISO standard actually had some benefits.

Will there be a Heartbleed-like crisis where these ideas will cause a reckoning for the industry? As I understand it, the licensing for implementing the x86/x86_64 ISAs reached monopoly level years ago, actively shutting out competition. The main hardware innovation is now coming from mobile where x86 can't yet compete. PC sales have been leveling out ever since the introduction of iStuff. I don't think the three are correlated, but it is interesting.

PS: If you still want to be friends, do not read the rest of the content on that first link. Oh man.