C++ unspecified behavior explained by practical example
In today's post, I'd like to talk about unspecified behavior in C++. While there are other sorts of behavior in our language, I'll stick with this one today and may cover the others some other time.
C++ reference lists unspecified behavior as
the behavior of the program varies between implementations, and the conforming implementation is not required to document the effects of each behavior.
I'd like to use a practical example that I've seen done wrong in various code reviews and during my training classes: comparing pointers for ordering.
In the standard, you can find two sections relevant to such a comparison, [expr.eq] and [expr.rel]. The first deals with equality and inequality comparisons. You're allowed to do that, assuming you have two compatible pointers. Things get more interesting with ordering comparisons, which is what [expr.rel] deals with. Here are the two relevant paragraphs:
The result of comparing unequal pointers to objects is defined in terms of a partial order consistent with the following rules:
(4.1) If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
(4.2) If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member is required to compare greater provided neither member is a subobject of zero size and their class is not a union.
(4.3) Otherwise, neither pointer is required to compare greater than the other.
(5) If two operands p and q compare equal ([expr.eq]), p<=q and p>=q both yield true and p<q and p>q both yield false. Otherwise, if a pointer to object p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all yield true and p<=q, p<q, q>=p, and q>p all yield false. Otherwise, the result of each of the operators is unspecified.
Let's dive into the wording. § 4.1 is very important. This paragraph tells us that comparing two pointers that point to elements of the same array is an allowed operation. It further says that the pointer to the element with the higher subscript compares greater. In code:
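A minimal sketch of the § 4.1 rule might look like this. The array name `values`, the pointer names, and the helper function are assumptions made up for illustration:

```cpp
#include <cassert>

// Hypothetical helper: returns true when the pointer to the element
// with the higher subscript compares greater.
bool higherSubscriptComparesGreater()
{
    int values[4]{};  // one array, so § 4.1 applies

    int* pFirst = &values[0];
    int* pThird = &values[2];

    // Both pointers refer to elements of the same array: the ordering is well-defined.
    assert(pFirst != pThird);

    return (pThird > pFirst) && !(pThird < pFirst);
}
```

Calling `higherSubscriptComparesGreater()` should yield `true` on every conforming implementation, because the standard requires this ordering.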
The next paragraph, § 4.2, talks about class data members. Since we and the compiler know the layout, pointers to different data members of the same object are comparable, and the pointer to the later declared member is the greater one. Once again, in code:
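As a sketch of § 4.2, assuming a hypothetical struct `Person` with two public `int` members (neither the type nor the member names come from the standard):

```cpp
#include <cassert>

// Hypothetical type for illustration: two non-static data members, not a union.
struct Person {
    int age;     // declared first
    int height;  // declared later
};

// Returns true when the pointer to the later declared member compares greater.
bool laterMemberComparesGreater()
{
    Person person{};

    int* pAge    = &person.age;
    int* pHeight = &person.height;

    // Both pointers refer to members of the same object: the ordering is well-defined.
    assert(pAge != pHeight);

    return pHeight > pAge;
}
```

Here `laterMemberComparesGreater()` should return `true`, since `height` is declared after `age` in the same object.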
Very interesting is then § 4.3, which says that in any other case, neither pointer is required to compare greater. I will show you what that means in a moment. Let's first see in code what such a scenario looks like:
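A sketch of such an unrelated comparison could look as follows. The type `int` for `alice` and `bob`, the struct `Comparison`, and the function name are assumptions for illustration:

```cpp
#include <cassert>

// Hypothetical result type: what the two kinds of comparison produced.
struct Comparison {
    bool equal;  // well-defined: two distinct objects never compare equal
    bool less;   // unspecified: may be true or false
};

Comparison compareUnrelated()
{
    int alice{};
    int bob{};

    int* pAlice = &alice;
    int* pBob   = &bob;

    // alice and bob are neither elements of one array nor members of one object,
    // so § 4.3 applies and the result of pAlice < pBob is unspecified.
    return {pAlice == pBob, pAlice < pBob};
}
```

Only the `equal` field has a guaranteed value (`false`). Whatever ends up in `less` is not something your code should rely on.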
With § 4.3, it is unclear which of the two pointers, pAlice and pBob, is the greater one. This is when § 5 comes into play. This section talks about when you get true and false but only for a supported comparison. Since the one above with alice and bob is not supported, no pointer is required to compare greater. The last sentence of § 5 is the crucial one: Otherwise, the result of each of the operators is unspecified.
See, there is the unspecified.
As a summary, at this point: comparing pointers for ordering is unspecified behavior unless both pointers point into the same array or to members of the same object.
Now, why is that? Have a look at this code snippet:
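A sketch of what such a snippet might look like, assuming `int` objects and a hypothetical helper `observeOrderings` that reports what this particular platform happens to produce:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical result type: the orderings observed on this platform.
struct Orderings {
    bool stackAmyGreater;
    bool heapAmyGreater;
};

Orderings observeOrderings()
{
    // First part: objects with automatic storage duration ("stack").
    int fred{};
    int amy{};
    int* stackFred = &fred;
    int* stackAmy  = &amy;

    // Second part: dynamically allocated objects ("heap").
    int* heapFred = new int{};
    int* heapAmy  = new int{};
    assert(heapFred != heapAmy);

    // Both comparisons are unspecified; the values may differ between platforms.
    Orderings result{stackAmy > stackFred, heapAmy > heapFred};

    std::printf("stack: %s is greater\n", result.stackAmyGreater ? "stackAmy" : "stackFred");
    std::printf("heap:  %s is greater\n", result.heapAmyGreater ? "heapAmy" : "heapFred");

    delete heapAmy;
    delete heapFred;
    return result;
}
```

The printed lines depend on your compiler, allocator, and operating system, which is exactly the point.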
In the first part of this snippet, you see the stack variables fred and amy with their pointer versions stackFred and stackAmy. In the second part, you will find the heap versions heapFred and heapAmy.
In this code, stackAmy compares greater than stackFred, but heapFred compares greater than heapAmy, at least on Linux.
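Why is that? Usually, the stack and the heap grow in opposite directions. Which direction each one takes, and where the compiler or the allocator places a given object, depends on the platform and the implementation.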
With C++, you look directly into the hardware implementation with no abstraction, remember? Managed languages hide this detail since you don't get a pointer in the first place.
I guess it makes total sense now that a comparison like stackAmy with heapFred makes absolutely no sense.
Here is the key takeaway from today: As a rule of thumb, comparing two pointers for equality is a safe and well-defined operation. Comparing them for ordering is only safe if the two pointers point into the same object (an array or a class/struct). Since, in most cases, we can no longer tell where pointers come from once we receive them via function parameters, try to stick with equality comparisons.
Andreas