Test Infrastructure: preferably not Google Test!

Introduction

We recently discussed how tests of Beman projects should be implemented. There was a post (by Louis Dionne) somewhere I can’t find arguing that tests based on assert() could potentially be used by standard library implementations: using the tests from corresponding Beman projects seems really attractive, even if they end up being only part of the tests actually used. The tests for these libraries can’t really depend on testing frameworks which themselves depend on the standard library, especially if they have a link-time component: standard libraries are frequently tested in multiple configurations which aren’t necessarily binary compatible. As a result, something like Google Test is pretty much out of the question.

On the other hand, I’d rather have people contributing projects and testing with Google Test than either not having the corresponding projects contributed or not having them tested - both would be worse outcomes. Thus, the approach to testing will certainly be a guideline from which deviations are allowed. It is still worth outlining what is desirable to have from tests and what approaches can be used to ideally achieve all of it.

Desirable Features of Tests

The primary objective of tests is to improve confidence that the software tested is of good quality. There are various ways to achieve that, and depending on how it is done, different features can be achieved. Here are the features I consider worth aiming for. This list is probably not complete and I guess I’ll remember an important feature milliseconds after pressing the “Create Topic” button. It could be a starting point, though.

CMake Integration

It seems the direction for Beman projects is to build with CMake: whatever is used for testing should be able to integrate with CMake. That is:

  • Tests are rebuilt whenever the test or anything it depends on changes.
  • Tests are run as part of the build.
  • A test failure results in a build failure.

Testing Different Implementations

The components in Beman projects are supposed to live in a namespace like Beman::Something26. A future standard library implementation would likely live in namespace std or namespace std::something but certainly not in namespace Beman::Something26. If tests are implemented in terms of Beman::Something26 they’ll need to be rewritten on some level. This rewrite may be mostly mechanical but it may be easier to test projects using an indirection coming from a test configuration file:

// file: beman-test-config.hpp
namespace tn = Beman::Something26;

The actual tests would then use tn (or whatever name is deemed appropriate) to refer to the entities under test, using explicit qualification or a using directive. Anybody wanting to test their implementation of the same interface/specification in a different namespace just provides a different configuration header. A project-specific configuration header could declare similar aliases for packages it depends on and use corresponding qualification where necessary to also test with different combinations of dependencies:

namespace on = Beman::Optional26;

For dependencies a similar approach may even be reasonable in the actual implementation: an implementation initially using Beman::Optional26 can then easily be migrated to use the standard component once it becomes available.
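For illustration, a test written against such an alias might look like this (a minimal sketch: the optional-like component and its interface are assumptions for the example, not taken from an actual project):

// file: optional-test.cpp
#include "beman-test-config.hpp" // defines: namespace tn = Beman::Something26;
#include <cassert>

int main() {
    tn::optional<int> o(42); // resolves to the configured implementation
    assert(o.has_value());
    assert(*o == 42);
}

Retargeting the test at another implementation is then just a matter of swapping the configuration header.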

Assert Only

When testing an actual standard library it shouldn’t depend on, well, a standard library. Using different build configurations immediately causes grief if there is an object-file component, as the build configurations may use different ABIs. Instead, a really basic test mechanism using, essentially, just <cassert>'s assert(condition) as the basis is preferable. However, ideally even that can be replaced: reporting test failures may require more than just bailing out at a specific place and relying on whatever information the assert() macro provides or a debugger, e.g., when testing on a free-standing platform (see below). If there is a configuration, it could also provide some tools which are a bit more fancy (note that the condition may contain top-level commas, e.g., from template argument lists, so the macro below takes variadic arguments and re-parenthesizes them):

// file: beman-test-config.hpp
#if BEMAN_HAS_TEST_OVERRIDE
#    include "local-test-config.hpp"
#else
#    include <cassert>
#    define BEMAN_ASSERT(id, ...)  assert((__VA_ARGS__))
#endif
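
A test would then look something like this (a minimal sketch assuming the configuration header above; the id string and the pair template are made up for illustration):

// file: some-test.cpp
#include "beman-test-config.hpp"

template <typename T, typename U>
struct pair { T first; U second; };

int main() {
    // the commas inside the template argument list and the braced
    // initializer exercise the variadic macro
    BEMAN_ASSERT("pair.second", pair<int, int>{1, 2}.second == 2);
}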

Free-Standing

At least some parts of the standard C++ library will target free-standing environments: I want my library to also run on an ESP32. I currently don’t have much experience, but it seems the workflow for embedded systems often is:

  1. Connect the device using a USB cable to a computer, possibly holding down a [tiny] button on the device, and transfer the program to a drive appearing in the file system.
  2. Disconnect the device and connect it to a power source to run the program. Maybe it can communicate results back using USB, Wifi, or Bluetooth but it may also just blink a LED fast or slow to indicate success or failure.

Testing with such a process probably means that abort()ing on an assertion failure is really annoying and you may want to get information about multiple failures, at least. Also, running a program per test case is pretty much out of the question: if there are many test cases you’d ideally run all of them. The implication is that it is probably desirable to build all tests into one executable and run all of them. On the other hand, on a hosted environment having many small programs may be easier to manage.

To support both use cases I could imagine some kind of test driver (lit?) which just overrides the name of the entry point appropriately and provides a main() calling into the entry point on a hosted environment. Maybe tests could even be written to be viable Google Tests without actually requiring the use of Google Test.
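
One way to implement such an entry-point override is a macro defaulting to main (a minimal sketch; BEMAN_TEST_MAIN is a made-up name):

// file: some-test.cpp
#ifndef BEMAN_TEST_MAIN
#    define BEMAN_TEST_MAIN main // hosted default: the test is its own program
#endif

int BEMAN_TEST_MAIN() {
    // ... test code, e.g., using BEMAN_ASSERT(...) ...
    return 0;
}

For a free-standing build the driver could compile each test with -DBEMAN_TEST_MAIN=some_unique_name and link all tests together with one main() calling each entry point in turn.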

Component-Level Coverage

A common approach for checking coverage is to run all tests of a project collecting coverage information, e.g., using gcov, and then reporting the aggregated results. Doing so is certainly better than not considering coverage at all, but this kind of coverage report includes “accidental” coverage: any component used while testing another component may receive additional coverage. However, the use of that code isn’t necessarily tested at all! Thus, it is desirable to somehow declare which component is actually tested by a given test and only aggregate coverage of a component from running its associated tests.

An approach to achieve that is implementing components in component-specific files and considering tests in files matching these names to be tests of the component. A test driver may then need to carefully aggregate the coverage report. The result is hopefully higher confidence in the test results, assuming suitable overall coverage is achieved.

Standard I/O Channels

Tests are ideally just evaluating code and don’t depend on anything external. However, some component constantly writing to standard output or standard error because some write(1, "I'm here\n", 9); statement wasn’t removed is still a problem (I’m sure nobody would ever do that, of course!). Testing should ideally be able to verify a program’s side effects. In a previous life I used dejagnu to test my implementation of standard library components: it does allow full control over the standard input, output, and error channels (I’m not recommending dejagnu - in the nearly 30 years since I did that we have [hopefully] moved on).

Shared Source of Test Infrastructure

Ideally, the bulk of the testing infrastructure isn’t duplicated in every project. Instead, there should be one central place with hopefully easy-to-follow instructions for using tests in a Beman (and possibly other) project. Simply using Google Test achieves some of the objectives but certainly not all.

What To Use?

After listing desirable features, I admit that I don’t have a ready solution addressing all I want. I think I know how most of these things can be implemented and it doesn’t seem hard. For past implementations of standard library facilities I did run my own testing infrastructure, but I wouldn’t want to advertise any of those!

Should we create a Beman testing framework, or is there something that reasonably covers the features we agree are desirable?


Louis Dionne made this comment describing Lit with clang-verify, which I believe he was advocating for at C++Now.


Having standard workflows for testing specific projects is the important part to me. One should not need to peruse README files to understand how (or even if!) a specific project provides regression testing.

The specific test driver technology is less important to me. I could see googletest working well in some cases and something like lit working better for others, perhaps even in the same project.

More important is what volunteers we get to set up automation workflows to support testing. And what volunteers we get to write and maintain the tests themselves.

Note that cmake actually doesn’t give you any of this. Neither ctest nor the test target that cmake generates in the build system depends on the builds. Workflow automation outside cmake needs to be set up.

The object file problem is why GoogleTest recommends vendoring it into your project, and discourages packaging it. Bloomberg does so, but we also control ABI strictly within a package distro, so we can get away with it.

Optional26 uses gtest because it’s what I’m most familiar with, it’s wired into my scratch template, and it mostly worked. However, testing static_assert renderings of “Mandates” is a major problem for googletest, and probably for most testing frameworks. lit is probably the best in this space, but it’s very much unlike most xUnit-style frameworks.

I’m not entirely convinced that testing standard library level components should be in terms of even more primitive components. It’s a philosophical argument I’ve been having with Lakos for the last couple decades, and unlikely to be resolved any time soon.

re: indirection for tests -
I’ve done it with macros, but that’s awful: Compiler Explorer (near the top of the code). But it did make comparison a lot easier. Abusing #include to copy/paste tests might work, too, now that Compiler Explorer supports multi-file projects.
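
For example, the #include variant could textually reuse a single test body per implementation (a sketch; the file name and namespaces are made up):

// file: run-optional-tests.cpp
namespace test_beman {
namespace tn = Beman::Optional26;
#include "optional-tests.inc" // test body written entirely in terms of tn
} // namespace test_beman

namespace test_std {
namespace tn = std;
#include "optional-tests.inc" // the same body, retargeted at std
} // namespace test_std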

Last release of dejagnu was 3 years ago. LLVM has some more modern tools here, but the space is … underserved?

Considering that we’re going to want to build a matrix of compilers, sanitizers, debug/release mode flags, fuzzers(?), we probably will have to rebuild any linked infrastructure every time. So we’re either vendoring into the build tree via git submodule or subtree, or just a commit, or doing a FetchContent of some kind, which is still per-build but a bit more hidden. We won’t be able to rely on any binary packaging system.


Attempting to port Beman.Optional26 to use lit would be a great initial project for a new contributor.


Are there any time constraints on this? I’d like to start contributing, but I only have a limited amount of time I can spend on it.

I’m working on a writeup on Lit that concludes that its adoption probably isn’t ideal for us. I hope to have that done today, and it could help inform the next steps we’d take.


It seems to be the policy of at least libc++ and libstdc++, so whatever its merits as a philosophical discussion, the reality is that the actual std::lib tests that might benefit from Beman tests do not (and will not) use gtest. That’s not likely to change soon either.


As someone who has used lit before, and wished it had better integration with CMake/CTest for familiarity and things like compile_commands.json integration, I was wondering if the main functionality could be implemented in a bit of plain CMake code with CTest as the test runner. This is the result: https://github.com/jiixyj/cmake-assert-tests. It supports .pass, .compile.pass, .compile.fail and .verify tests for now.

It supports both “merged” (with the code in src/) and “separated” (top-level tests/ folder) test placement. With a folder structure like this:

.
├── src
│   ├── CMakeLists.txt
│   ├── main.cpp
│   ├── testlib
│   │   ├── add.cpp
│   │   ├── add.h
│   │   ├── add.test
│   │   │   ├── add.pass.cpp
│   │   │   ├── constexpr_add.pass.cpp
│   │   │   ├── constructor.compile.fail.cpp
│   │   │   ├── constructor.compile.pass.cpp
│   │   │   └── constructor.verify.cpp
│   │   ├── fibonacci.h
│   │   └── fibonacci.test
│   │       └── fibonacci.pass.cpp
│   ├── testlib.h
│   └── testlib.test
│       └── include.compile.pass.cpp
└── tests
    ├── CMakeLists.txt
    └── testlib
        └── add
            └── add.pass.cpp

…you get those tests registered with CTest:

  Test #1: testlib:include.compile.pass
  Test #2: testlib.add:add.pass
  Test #3: testlib.add:constexpr_add.pass
  Test #4: testlib.add:constructor.compile.fail
  Test #5: testlib.add:constructor.compile.pass
  Test #6: testlib.add:constructor.verify
  Test #7: testlib.fibonacci:fibonacci.pass
  Test #8: testlib/add/add.pass

It’s just a PoC/experiment for now, but maybe something like this could fill the niche of “tests without a test framework” but without a “heavy” test runner. Of course, the danger is that it will over time morph into an ad hoc, informally-specified, bug-ridden, slow implementation of half of lit…

I am unsure if that is the best idea. Builds and rebuilds should be fast. We probably do not want to pay for the unit test run every time. Also, if some test is already failing (because of another bug), it might be hard to prove that our unrelated change compiles fine.

What we typically do is incorporate a test run into the CI runs and also into building a project or creating a package with package managers. See the following for implementation details:
mp-units/conanfile.py at ed9e67537bbfa5395975e6c9403c92eb7429edd0 · mpusz/mp-units · GitHub.

With the above, conan create and conan build will run unit tests and header standalone tests right after the CMake build. But when we rebuild the project in the IDE as a result of incremental feature development, we do not want to pay for tests to run every single time. Most IDEs allow running popular test frameworks as a separate step inside their GUI.

Of course, we can also run CTest to run all of the tests from the command line.

As I already mentioned header standalone tests, I would strongly recommend those for Beman projects as well. More can be found in the CMake documentation.

A static_assert() is a perfect test framework for compile-time-enabled libraries. If the project compiles, it is correct :-)

I have thousands of static_asserts in my unit test files in mp-units. With them, I can test for observable runtime behavior and verify that the library generates proper types at compile time. I also get undefined behavior verification for free (constant evaluation rejects UB).

Another thing to test in a modern library is negative tests for constraints. We need to throw some types that do not satisfy the concepts at the interface and ensure that the code does not compile. It is trivial to do with static_assert as well.
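
For instance (a minimal sketch with a made-up concept and type):

template <typename T>
concept Addable = requires(T a, T b) { a + b; };

struct NotAddable {};

static_assert(Addable<int>);         // positive: the constraint is satisfied
static_assert(!Addable<NotAddable>); // negative: the constraint must reject the type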

Using alternative runtime frameworks for such libraries may not catch all the issues. For example, recently we had to remove inplace_vector from the plenary session because adding constexpr to some functions made the feature unimplementable. This would not have been caught with GTest.

Of course, this does not mean that we do not need runtime unit testing at all. Some libraries will not be fully compile-time enabled. Even for mp-units, I have to test std::format at runtime as we still can’t format text at compile time. Also, some math functions are not constexpr until C++26.


Hello @dietmarkuehl! I am probably late to the party, but speaking of testing frameworks for C++20 and later, I would also consider boost-ext-ut: it has CMake support, is header-only and easy to install, is macro-free, and supports various testing approaches (TDD, BDD, etc.). If you want, you can adapt it to any well-known testing framework using macros. This option needs further evaluation since I am not entirely sure it fulfils the “embedded” requirements. On the other hand, is there any Beman library or pending implementation of a C++ proposal/paper that has to be implemented and tested on an embedded device first?


I like this option – we’ve adopted it at my current day job and it works great. For simple things it’s simple – and if you want more complex things, it supports those too. Macro-free testing is also much easier to understand.

Things that are marked as // freestanding or in a header marked similarly are targeted to support embedded platforms. But nothing here will be tested for those platforms first. And yes, several libraries here will be marked for freestanding support.

CMake comes with a sibling tool called CTest. While CTest is usually just used to launch the tests that are defined with add_test in a CMake project, it can also be used to script the steps of a build pipeline, and it has support for coverage and dynamic analysis.

With CTest, it is possible to define a build pipeline that can build and test any CMake project. With CMake, it is possible to define a project in such a way, that it can be built and tested with any CTest pipeline. This is a powerful combination for defining a build matrix where a set of projects is built with a set of pipelines (example).

In CMake/CTest, unlike in other build systems, build and test are two separate steps. This has disadvantages but also advantages. What the advantages and disadvantages are is a separate topic. Just note that in CMake, they are separate.

There sure are people naively advocating ways to bend CMake to their will. They recommend defining a target called check or even avoiding add_test entirely and defining all tests as custom targets. Don’t fall into this trap. The bait is a little convenience for yourself (“neat, I just have to type one single command”), but the negative consequences are confused clients with a standard cmake background (“ugh, this project has no tests at all”) and, worst of all, it breaks the integration for sanitizers and coverage analysis.


@purpleKarrot I’m in agreement that the real requirement is a standard workflow across Beman libraries that includes using best practices by wiring up CTest properly.

The Beman Standard, under [TOPLEVEL.CMAKE], does require some specifics. Specifically: “There must be a CMakeLists.txt file at the repository’s root that builds and tests (via CTest) the library.”

Does that work for you? If not, do you care to contribute an issue, Discourse topic, or PR to fill in more detail?

As to specific frameworks, I’m of the mind that you don’t really need any framework in many simple cases. Just have a directory structure that provides a pile of Compiler Explorer-style examples wired up to CTest. Possibly we should add something like that to exemplar to demonstrate? It can even include good-enough compilation failure tests.

For more complex cases, I’m flexible, but I’m of the mind that we’ll want some huge upside to displacing GoogleTest and I haven’t heard any major upside so far. It has warts and might be showing its age, but it’s by far the most familiar test framework, it’s available in all dependency management scenarios already [1]. But the real killer feature is that there really isn’t a strong second-best option for googlemock. I do wonder why not, but that’s the way it is. A lot of the competitors to googletest proper are better in different ways, but the second you’re ready for a stub, fake, mock, or something like that, you’re pulling in googletest and googlemock anyway. So the question to answer to displace googletest is “if I’m pulling in googlemock anyway, why wouldn’t I just use googletest?”.

So someone get out there and invent a better mocking framework. Googletest will have its days numbered when that’s a reality, I expect.

[1] No, being header-only doesn’t cut it. It must be a package in vcpkg, Conan, debian, etc. Copying headers around isn’t really dependency management outside of specific casual or niche scenarios. Let me know if we want to fork a whole thread to elaborate on that.

We just started working on the scope library and I’m going to suggest using boost.ut. It requires C++20, but that’s fine. It doesn’t require macros, which is nice. If you scroll to the bottom there’s a bunch of benchmarks against GoogleTest, and it fares well. It doesn’t have mocking, but I barely ever want that. Note that it can emulate catch2 and other test framework styles.


My main concern would be anything that eliminates packageability for any given project. The main issue to watch for would be projects that are not packaged, or not packaged consistently. It looks like there are relatively recent boost.ext.ut packages:

There’s a small concern that the package names aren’t consistent, but that’s probably something that’s possible to work around.

Beyond that and having testing wired up with a consistent workflow (i.e., CTest), I don’t mind giving individual libraries leeway to select different testing approaches.

That’s perfect. It does not need more language than that in the standard.

However, in guidelines / exemplar, I would mention the following:

  1. Use a Testing Framework (any Testing Framework)

    It is recommended to split the unit tests across many different small source files. While it is certainly possible to give each source file its own main function and build it into its own binary executable, building multiple source files into a single executable can save a considerable amount of time and memory, especially when building static executables with large libraries and/or a large number of test files.

    One approach to create a test driver for multiple small tests is using the CMake command create_test_sourcelist.

    Another, and maybe more straightforward, approach is to use a unit testing framework.

  2. … even for things that can be tested at compile time!

    Just using static_assert is tempting. But the output of unit test frameworks is often more descriptive. Consider:

    main.cpp:12:42: error: static assertion failed
       static_assert(std::alignment_of_v<Foo> == std::alignment_of_v<Bar>);
                     ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
    

    vs.

    Expected equality of these values:
      std::alignment_of_v<Foo>
        Which is: 4
      std::alignment_of_v<Bar>
        Which is: 8
    

    At the time of this writing, I just learned that both gcc and clang now produce a note like “expression evaluates to ‘4 == 8’”. This looks rather new to me.
    This guideline could be dropped when all tested compilers report this info.

  3. Fine grained registration of unit tests

    When multiple unit tests are built into a single binary executable, they should still all be registered with ctest individually, so that ctest can schedule them to run in parallel. For GoogleTest, this can be automated with gtest_discover_tests.
    For other testing frameworks, an analogous approach should be used.

These are excellent guidelines. I added this to our Beman Contributors’ Guide task.

I’m unclear on what create_test_sourcelist brings to the table. The docs describe how it generates a main file for you, but I expect working indirectly through a parameterized template would be less compelling than either hand rolling a main file or using the equivalent feature from a test framework.

But I do like the other suggestions from @purpleKarrot.

I would add the need for compilation failure tests. The latest versions of the big three compilers support a standard called SARIF for communicating warnings and errors in a structured JSON format. If someone is excited about CMake, testing, etc., a Beman infrastructure project to allow specifying fine-grained C++ source files that should compile, warn, or error would be very interesting.
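
As a sketch of what such fine-grained source files could look like, here is the clang -verify style used by Lit-based suites, where an annotation states the diagnostic expected on that line (the file name is made up):

// file: some-mandate.verify.cpp
// compile with: clang++ -fsyntax-only -Xclang -verify some-mandate.verify.cpp
void test() {
    static_assert(sizeof(int) == 0, "impossible"); // expected-error {{impossible}}
}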

Note that Louis Dionne proposed the LLVM Lit framework for these purposes. Initially we were concerned about dependency management. Probably we still are, but that was before we adopted pre-commit, etc., so we could revisit that idea.

Otherwise, I don’t believe it would be hard to use pure CMake to design a directory structure to organize sample C++ source files that build and run (or don’t) as expected.