Scott Wolchok

C++ Performance Trap #2: Unnecessary std::function

Posted Jan 14, 2021

std::function does what it says on the tin: it holds any callable object. For example, it can hold lambdas with or without captures. It has a couple of key drawbacks: it is much larger than a plain pointer, and constructing and destroying it is not free.

Function pointers, on the other hand, are much more limited: they can only point to free-standing functions (including lambdas with no captures, which convert to function pointers!), and cannot store any additional state. Their key advantage compared to std::function is that they are much smaller (8 bytes on typical 64-bit platforms, like other pointers). If you have a choice between the two, function pointers will generally be more efficient.
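Here is a minimal sketch of that difference. The names `callCount`, `bump`, and `demo` are made up for illustration, and the size check is an assumption about mainstream standard libraries rather than something the standard guarantees.

```cpp
#include <cassert>
#include <functional>

int callCount = 0;  // hypothetical counter, only here to observe the calls

void bump() { ++callCount; }

using FnPtr = void (*)();

// On mainstream implementations std::function is several pointers wide.
// The standard does not guarantee this; it is an assumption of the sketch.
static_assert(sizeof(std::function<void()>) > sizeof(FnPtr),
              "assumes std::function is larger than a plain pointer");

int demo() {
  callCount = 0;

  FnPtr fromFunction = &bump;                  // free function
  FnPtr fromLambda = [] { callCount += 10; };  // captureless lambda converts implicitly

  int n = 5;
  // FnPtr bad = [n] { callCount += n; };      // would not compile: the lambda captures n
  std::function<void()> withState = [n] { callCount += n; };

  fromFunction();
  fromLambda();
  withState();
  assert(callCount == 16);
  return callCount;
}
```

The commented-out line is the interesting one: as soon as the lambda captures something, a plain function pointer can no longer hold it.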

I want to clear up one possible misconception: the type of a lambda object is not std::function! Instead, it has a “unique unnamed non-union non-aggregate class type, known as closure type”. The upshot of this is that creating a std::function from a lambda is not free: std::function has a non-trivial constructor and destructor. For example:

#include <cstdio>
#include <functional>

// Use noinline to prevent the example from being optimized away.
__attribute__((noinline))
void callFunction(std::function<void()> f) {
  f();
}

void executePrint(const char *s) {
  callFunction([s]() {
    std::puts(s);
  });
}

The generated code for this example contains constructor and destructor calls for the intermediate std::function.

In contrast, if we template over the function type, like this:

#include <cstdio>

// Use noinline to prevent the example from being optimized away.
template<typename Func>
__attribute__((noinline))
void callFunction(Func f) {
  f();
}

void executePrint(const char *s) {
  callFunction([s]() {
    std::printf("hello %s\n", s);
  });
}

the whole thing is just a simple call to printf (and a tail call to callFunction since we marked it noinline).
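To see why no std::function is involved in the template version, it may help to check what Func is deduced as. The sketch below is only illustrative (`callAndReturn`, `demoClosureTypes`, and the lambdas are made-up names): each lambda expression has its own closure type, none of which is std::function, so the template is instantiated once per lambda.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Calls f directly; Func is deduced as whatever type the argument has.
template <typename Func>
int callAndReturn(Func f) {
  return f();
}

int demoClosureTypes() {
  int base = 40;
  auto addOne = [base] { return base + 1; };
  auto addTwo = [base] { return base + 2; };

  // Two lambdas, even with the same signature, have distinct closure types.
  static_assert(!std::is_same<decltype(addOne), decltype(addTwo)>::value,
                "each lambda expression has a unique type");
  // Neither of them is a std::function.
  static_assert(!std::is_same<decltype(addOne), std::function<int()>>::value,
                "a closure type is not std::function");

  // These two calls instantiate callAndReturn twice, once per closure type.
  assert(callAndReturn(addOne) == 41);
  assert(callAndReturn(addTwo) == 42);
  return callAndReturn(addOne) + callAndReturn(addTwo);
}
```

Because the compiler sees the concrete closure type at each call site, it can inline the lambda body. The price is one instantiation of the template per distinct callable.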

Of course, std::function serves a purpose, and we can’t just always use the template technique. For example, if we wanted to store the function for later use as a callback, std::function might be our most reasonable choice. We might also want to avoid emitting a specialized version of callFunction for each callable it is passed, particularly if callFunction is too large to inline.
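As a rough sketch of the callback case (the `Button` class and its members are hypothetical, not from any library): once callables of different types need to live in the same member, a type-erased wrapper such as std::function is a natural fit, because a template parameter would bake one specific closure type into the class.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class that keeps callbacks around for later use.
class Button {
 public:
  // Any callable taking a const std::string& can be registered.
  void onClick(std::function<void(const std::string&)> cb) {
    callbacks_.push_back(std::move(cb));
  }

  // Invokes every stored callback in registration order.
  void click(const std::string& label) const {
    for (const auto& cb : callbacks_) cb(label);
  }

 private:
  std::vector<std::function<void(const std::string&)>> callbacks_;
};

int demoCallbacks() {
  Button button;
  int clicks = 0;
  std::size_t totalLength = 0;

  // Two lambdas with different closure types, stored in the same vector.
  button.onClick([&clicks](const std::string&) { ++clicks; });
  button.onClick([&totalLength](const std::string& s) { totalLength += s.size(); });

  button.click("ok");
  button.click("cancel");

  assert(clicks == 2);
  assert(totalLength == 8);  // 2 + 6 characters
  return clicks;
}
```

Here the construction cost of std::function is paid once at registration time, which is usually an acceptable trade for being able to store arbitrary callables.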