Copy elision is a C++ compiler optimization that, as its name suggests, eliminates extra copy and move operations. It is similar to the classical copy propagation optimization, but specifically performed on C++ objects that may have non-trivial copy and move constructors. In this post, I’ll walk through an example where an obvious optimization you might expect from your compiler doesn’t actually happen in practice.
Let’s say that you have a long function call that returns an object, and you want to immediately pass that object to another function, like this:
#include <string>
#include <string_view>
// Some type that is expensive to copy, non-trivial to destroy, and cheap but
// not free to move.
struct Widget {
  std::string s;
};
void consume(Widget w);
Widget doSomeVeryComplicatedThingWithSeveralArguments(
    int arg1, std::string_view arg2);
void someFunction() {
  consume(doSomeVeryComplicatedThingWithSeveralArguments(123, "hello"));
}
As we can see from the generated assembly, all is well:
someFunction(): # @someFunction()
pushq %rbx
subq $32, %rsp
movq %rsp, %rbx
movl $5, %edx
movl $.L.str, %ecx
movq %rbx, %rdi
movl $123, %esi
callq doSomeVeryComplicatedThingWithSeveralArguments(int, std::basic_string_view<char, std::char_traits<char> >)
movq %rbx, %rdi
callq consume(Widget)
movq (%rsp), %rdi
leaq 16(%rsp), %rax
cmpq %rax, %rdi
je .LBB0_2
callq operator delete(void*)
.LBB0_2:
addq $32, %rsp
popq %rbx
retq
.L.str:
.asciz "hello"
Our temporary Widget returned from doSomeVeryComplicatedThingWithSeveralArguments is constructed in the stack space that someFunction allocated for it, and then a pointer to that stack space is passed straight to consume, just as we would expect from our earlier look at parameter passing.
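One way to confirm this without reading assembly is to count special member function calls. The sketch below assumes C++17 or later, where initializing an object from a prvalue is guaranteed to be elided. Everything in it is a hypothetical stand-in: TracedWidget plays the role of Widget, makeWidget plays doSomeVeryComplicatedThingWithSeveralArguments, consume returns the string's length only so the call has a checkable result, and the global counters are purely illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical instrumentation: each special member function bumps a counter.
int g_copies = 0;
int g_moves = 0;
int g_destroyed = 0;

// Hypothetical stand-in for Widget that records copies, moves, and destructions.
struct TracedWidget {
  std::string s;
  explicit TracedWidget(std::string str) : s(std::move(str)) {}
  TracedWidget(const TracedWidget& other) : s(other.s) { ++g_copies; }
  TracedWidget(TracedWidget&& other) noexcept : s(std::move(other.s)) { ++g_moves; }
  ~TracedWidget() { ++g_destroyed; }
};

// Hypothetical stand-in for doSomeVeryComplicatedThingWithSeveralArguments.
// Returning a prvalue means the result is built directly in the caller's slot.
TracedWidget makeWidget(int arg1, std::string_view arg2) {
  return TracedWidget(std::string(arg2) + std::to_string(arg1));
}

// Hypothetical consume(); returns the length so the test has something to check.
std::size_t consume(TracedWidget w) { return w.s.size(); }

// Mirrors someFunction(): the prvalue initializes consume's parameter in place.
std::size_t oneLiner() { return consume(makeWidget(123, "hello")); }
```

Calling oneLiner() once should leave g_copies and g_moves at zero and g_destroyed at one: exactly one object is created and destroyed.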
Now, imagine that you decide that your single line in someFunction is too long, or that you want to give a meaningful name to the result of doSomeVeryComplicatedThingWithSeveralArguments, so you change the code:
void someFunctionV2() {
  auto complicatedThingResult =
      doSomeVeryComplicatedThingWithSeveralArguments(123, "hello");
  consume(complicatedThingResult);
}
Naturally, things go straight off the rails:
someFunctionV2(): # @someFunctionV2()
pushq %r15
pushq %r14
pushq %r12
pushq %rbx
subq $72, %rsp
leaq 40(%rsp), %rdi
movl $5, %edx
movl $.L.str, %ecx
movl $123, %esi
callq doSomeVeryComplicatedThingWithSeveralArguments(int, std::basic_string_view<char, std::char_traits<char> >)
leaq 24(%rsp), %r12
movq %r12, 8(%rsp)
movq 40(%rsp), %r14
movq 48(%rsp), %rbx
movq %r12, %r15
cmpq $16, %rbx
jb .LBB1_4
testq %rbx, %rbx
js .LBB1_13
movq %rbx, %rdi
incq %rdi
js .LBB1_14
callq operator new(unsigned long)
movq %rax, %r15
movq %rax, 8(%rsp)
movq %rbx, 24(%rsp)
.LBB1_4:
testq %rbx, %rbx
je .LBB1_8
cmpq $1, %rbx
jne .LBB1_7
movb (%r14), %al
movb %al, (%r15)
jmp .LBB1_8
.LBB1_7:
movq %r15, %rdi
movq %r14, %rsi
movq %rbx, %rdx
callq memcpy
.LBB1_8:
movq %rbx, 16(%rsp)
movb $0, (%r15,%rbx)
leaq 8(%rsp), %rdi
callq consume(Widget)
movq 8(%rsp), %rdi
cmpq %r12, %rdi
je .LBB1_10
callq operator delete(void*)
.LBB1_10:
movq 40(%rsp), %rdi
leaq 56(%rsp), %rax
cmpq %rax, %rdi
je .LBB1_12
callq operator delete(void*)
.LBB1_12:
addq $72, %rsp
popq %rbx
popq %r12
popq %r14
popq %r15
retq
.LBB1_13:
movl $.L.str.2, %edi
callq std::__throw_length_error(char const*)
.LBB1_14:
callq std::__throw_bad_alloc()
.L.str:
.asciz "hello"
.L.str.2:
.asciz "basic_string::_M_create"
Now we take our perfectly good Widget, complicatedThingResult, and copy it into a new temporary Widget to serve as the first argument to consume. When we’re done, we have to destroy two Widgets: both complicatedThingResult and the unnamed temporary Widget we passed to consume. You might expect that the compiler would optimize someFunctionV2() to be just like someFunction, but it won’t.
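The same kind of hypothetical instrumentation makes the extra copy visible. As before, this is a sketch assuming C++17 or later, and TracedWidget, makeWidget, this consume, and the counters are illustrative names rather than the original code. Note that initializing complicatedThingResult itself is still elided, because it is initialized from a prvalue; the copy comes from passing a named lvalue by value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical instrumentation counters.
int g_copies = 0;
int g_moves = 0;
int g_destroyed = 0;

// Hypothetical stand-in for Widget that records copies, moves, and destructions.
struct TracedWidget {
  std::string s;
  explicit TracedWidget(std::string str) : s(std::move(str)) {}
  TracedWidget(const TracedWidget& other) : s(other.s) { ++g_copies; }
  TracedWidget(TracedWidget&& other) noexcept : s(std::move(other.s)) { ++g_moves; }
  ~TracedWidget() { ++g_destroyed; }
};

// Hypothetical stand-in for doSomeVeryComplicatedThingWithSeveralArguments.
TracedWidget makeWidget(int arg1, std::string_view arg2) {
  return TracedWidget(std::string(arg2) + std::to_string(arg1));
}

// Hypothetical consume(); returns the length so the result can be checked.
std::size_t consume(TracedWidget w) { return w.s.size(); }

// Mirrors someFunctionV2(): the named variable is an lvalue, so the by-value
// parameter must be copy constructed from it.
std::size_t namedThenCopied() {
  auto complicatedThingResult = makeWidget(123, "hello");  // elided: prvalue init
  return consume(complicatedThingResult);                   // copy constructor runs
}
```

After one call to namedThenCopied(), g_copies should be 1, g_moves 0, and g_destroyed 2, one destruction for complicatedThingResult and one for consume's parameter.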
The problem, of course, is that we forgot to std::move complicatedThingResult:
void someFunctionV3() {
  auto complicatedThingResult =
      doSomeVeryComplicatedThingWithSeveralArguments(123, "hello");
  consume(std::move(complicatedThingResult));
}
and now, the generated assembly should look just like our original example… wait, what?
someFunctionV3(): # @someFunctionV3()
pushq %r14
pushq %rbx
subq $72, %rsp
leaq 8(%rsp), %rdi
movl $5, %edx
movl $.L.str, %ecx
movl $123, %esi
callq doSomeVeryComplicatedThingWithSeveralArguments(int, std::basic_string_view<char, std::char_traits<char> >)
leaq 56(%rsp), %r14
movq %r14, 40(%rsp)
movq 8(%rsp), %rax
leaq 24(%rsp), %rbx
cmpq %rbx, %rax
je .LBB1_1
movq %rax, 40(%rsp)
movq 24(%rsp), %rax
movq %rax, 56(%rsp)
jmp .LBB1_3
.LBB1_1:
movups (%rax), %xmm0
movups %xmm0, (%r14)
.LBB1_3:
movq 16(%rsp), %rax
movq %rax, 48(%rsp)
movq %rbx, 8(%rsp)
movq $0, 16(%rsp)
movb $0, 24(%rsp)
leaq 40(%rsp), %rdi
callq consume(Widget)
movq 40(%rsp), %rdi
cmpq %r14, %rdi
je .LBB1_5
callq operator delete(void*)
.LBB1_5:
movq 8(%rsp), %rdi
cmpq %rbx, %rdi
je .LBB1_7
callq operator delete(void*)
.LBB1_7:
addq $72, %rsp
popq %rbx
popq %r14
retq
.L.str:
.asciz "hello"
We still have two Widgets; it’s just that the temporary argument to consume is now move constructed. Our first version, someFunction, is still smaller and faster!
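Counting once more shows the trade that std::move makes. This sketch assumes C++17 or later and uses hypothetical names (TracedWidget, makeWidget, this consume, and the global counters) that are not from the original code. It includes both the moved version and the one-line version so the two can be compared side by side: the copy becomes a move, but there are still two objects to destroy instead of one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical instrumentation counters.
int g_copies = 0;
int g_moves = 0;
int g_destroyed = 0;

// Hypothetical stand-in for Widget that records copies, moves, and destructions.
struct TracedWidget {
  std::string s;
  explicit TracedWidget(std::string str) : s(std::move(str)) {}
  TracedWidget(const TracedWidget& other) : s(other.s) { ++g_copies; }
  TracedWidget(TracedWidget&& other) noexcept : s(std::move(other.s)) { ++g_moves; }
  ~TracedWidget() { ++g_destroyed; }
};

// Hypothetical stand-in for doSomeVeryComplicatedThingWithSeveralArguments.
TracedWidget makeWidget(int arg1, std::string_view arg2) {
  return TracedWidget(std::string(arg2) + std::to_string(arg1));
}

// Hypothetical consume(); returns the length so the result can be checked.
std::size_t consume(TracedWidget w) { return w.s.size(); }

// Mirrors someFunctionV3(): std::move turns the copy into a move, but the
// moved-from named object still exists and must still be destroyed.
std::size_t namedThenMoved() {
  auto complicatedThingResult = makeWidget(123, "hello");
  return consume(std::move(complicatedThingResult));
}

// Mirrors someFunction(): one object, no copy, no move.
std::size_t oneLiner() { return consume(makeWidget(123, "hello")); }
```

One call to namedThenMoved() should give zero copies, one move, and two destructions, while oneLiner() gives zero copies, zero moves, and a single destruction.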
The fundamental problem with copy elision is that it is only allowed in a specific list of circumstances. (Briefly, RVO and initializing from a prvalue are required, NRVO is allowed, and some other cases with exceptions and coroutines are also allowed. Nothing else.) There is a philosophical reason for this: you wrote a copy constructor for your class that could do anything, and you expect it to run whenever objects of your class are copied according to the rules of C++. If compilers were to unpredictably remove copies, and thus remove pairs of copy/move constructor & destructor calls, they might break your code.
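The difference between "required" and "allowed" can be seen with a type that cannot be copied or moved at all. In this sketch, assuming C++17 or later, the hypothetical Pinned type has both constructors deleted, yet it can still be returned as a prvalue and passed by value, because those elisions are mandatory and need no copy or move constructor. Returning a named local (NRVO) is merely permitted, so the language still insists on a usable copy or move constructor there, which is why the commented-out function would fail to compile.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type that can never be copied or moved.
struct Pinned {
  int value;
  explicit Pinned(int v) : value(v) {}
  Pinned(const Pinned&) = delete;
  Pinned(Pinned&&) = delete;
};

static_assert(!std::is_copy_constructible_v<Pinned>);
static_assert(!std::is_move_constructible_v<Pinned>);

// Returning a prvalue: elision is required, so no copy/move constructor is used.
Pinned makePinned(int v) { return Pinned(v); }

// Initializing a by-value parameter from a prvalue is also required to be elided.
int readPinned(Pinned p) { return p.value; }

int useIt() { return readPinned(makePinned(42)); }

// NRVO is only allowed, so a named return value still needs an accessible
// copy or move constructor. Uncommenting this line is a compile error:
// Pinned makeNamed(int v) { Pinned p(v); return p; }
```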
Specifically, there is simply nothing on the list of allowed circumstances for copy elision that applies to the examples we saw here. That list doesn’t include things like “the last time I use a variable before it goes out of scope” or “passing a variable to a function by value when I haven’t done anything else with it and it looks obviously safe”. Maybe it will in the future, but not in C++20 or before!