Gathering Code Coverage for Sandboxed Processes // Eric Woo

Motivation

While gathering code coverage for youtube-unthrottle, I encountered an interesting catch-22: tests exercising sandboxing features based on seccomp-bpf and Landlock could not write coverage data to disk because of the sandbox restrictions themselves.

Code Coverage Approach #1: gcc

The default behavior of gcc- and gcov-based code coverage is to open and write *.gcda coverage files on process exit.

My first thought, upon encountering a failure to write these *.gcda files, was to investigate whether gcc supports options or APIs for customizing this behavior: writing the coverage data elsewhere, opening the coverage files earlier in process startup, or similar.

Searching on this topic led me to gcc’s documentation on freestanding environments, which described how to work with systems that lack a filesystem to store the resulting coverage data (seemingly equivalent to a sandboxed program that cannot make filesystem-related syscalls).

However, I was discouraged by the following:

custom linker script requirement
explicit assumption of GNU linker in particular, with an the implied possibility of incompatibility with other linkers like gold, lld, and mold
circular-seeming test procedure, involving a no-op *.gcda output from one run being passed to a second run via stdin
unclear encoding/decoding requirements for in-memory coverage data

Code Coverage Approach #2: clang

Given the above, I was not confident that a gcc-based solution would be maintainable in the longer term. At the very least, it would take some time and experimentation for me to build a working understanding of what each step in the gcc-based approach actually does (not obvious to me, even after some re-reading).

Put another way, although gcc and gcov made up most of my prior experience with C/C++ code coverage (plus some work with Coverity), the freestanding environment requirements would be putting me in new territory regardless of past experience with the tools more generally. As a result, gcc and gcov had no practical familiarity advantage.

Putting it all together, research into other approaches seemed wise.

Searching for alternatives led to clang’s equivalent support for freestanding environments.

Happily, clang’s code coverage procedure for freestanding environments seemed more straightforward than gcc’s, simple enough to be excerpted here:

The first step is to export __llvm_profile_runtime […] to disable the default static initializers. Instead of calling the *_file() APIs […], use the following to save the profile directly to a buffer under your control:

Forward-declare uint64_t __llvm_profile_get_size_for_buffer(void) and call it to determine the size of the profile. You’ll need to allocate a buffer of this size.

Forward-declare int __llvm_profile_write_buffer(char *Buffer) and call it to copy the current counters to Buffer, which is expected to already be allocated and big enough for the profile.

Following these instructions and then writing the buffered data to a file opened early in process startup (before any sandboxing has occurred) produced a working solution in short order.

Results

Coverage hooks: coverage_open(), coverage_write_and_close()

Example test integration: ./tests/sandbox/seccomp.c

Interactive usage instructions: README.md

CI integration: ./scripts/coverage.sh:21-41, .gitlab-ci.yml:61-66