Coroutines in C++ Insights

A longer time ago #92 was opened, requesting to support coroutines in C++ Insights. In the meantime, the coroutines TS got merged into what will be C++20. Clang 9 is available now having coroutines support enabled with -std=c++2a. It looks like it is time to do something about it. So, let's do something about it.

Coroutine Resources

As time has gone by, I learned more things about coroutines and finally Adi Shavit asked me at NDC {TechTown} for some code snippet which would illustrated how we can create dangling references with coroutines. An issue that was at least raised by Arthur O’Dwyer in his blog post C++2a Coroutines and dangling references. This gave me motivation to take another step to implement the transformation in C++ Insights. Because now I have an idea what may be interesting to people when it comes to coroutines.

As resources, I used a CppCon 2016: Gor Nishanov “C++ Coroutines: Under the covers" by Gor Nishanov, one of if not the main driver behind coroutines. There he explains in detail how coroutines look internally.

Looking at another blog post by Lewiss Baker C++ Coroutines: Understanding the promise type he summarizes the steps to create a coroutine like this:

  1. Allocate a coroutine frame using operator new (optional).
  2. Copy any function parameters to the coroutine frame.
  3. Call the constructor for the promise object of type, P.
  4. Call the promise.get_return_object() method to obtain the result to return to the caller when the coroutine first suspends. Save the result as a local variable.
  5. Call the promise.initial_suspend() method and co_await the result.
  6. When the co_await promise.initial_suspend() expression resumes (either immediately or asynchronously), then the coroutine starts executing the coroutine body statements that you wrote.

Plus there is the latest C++ Standard N4830 which specifies coroutines. So, enough resources, let's get started.

Clang's implementation

The first step is to show the coroutine as it is. Here things looked easy at the first glance. Clang comes with a couple of new statements:

  • CoroutineBodyStmt is created by the compiler, whenever it finds a co_... statement in a functions body. It is the root of any other coroutine statement.
  • CoroutineSuspendExpr abstracts both co_yield and co_await. There are also two additional expressions CoawaitExpr and CoyieldExpr. But CoroutineSuspendExpr is the base class and those sufficient for now.
  • CoreturnStmt is created whenever there is a co_return statement.

With this three expressions I can work. Typically, it goes as follows:

  • Add a new overload for InsertArgfor the statement.
  • For CoroutineSuspendExpr check there it is a yield or await and reenter the corresponding keyword plus pass the expression attached to it to InsertArg to fill it. For example: co_yield i + 1; Here after inserting co_yield the expression is passed to InsertArg which does the rest of the job. The same goes for CoreturnStmt.

Handling the CoroutineBodyStmt comes with opening a scope and inserting the data there. Done... or not.

First observation, the expressions attached to the CoroutineSuspendExpr give something like __promise...(i+1). It reveals already parts of the internal implementation. All right, could be done by looking ahead into the children of the expression and filter some parts out. Then, we have identical code. Excellent.

But wait, does this help in some way? No. It does not show any problems with references.

Do a transformation showing the internals of a coroutine

All right, lets have a deeper look. There are more nodes attached to a CoroutineBodyStmt. There is for example a promise declaration, some functions called:

  • getParamMoves
  • getAllocate
  • getReturnStmtOnAllocFailure
  • getResultDecl
  • getExceptionHandler
  • getInitSuspendStmt

That looks helpful. Together with the post from Lewiss Baker and the video from Gor Nishanov it looks like I just need to insert these result of these functions via InsertArg in the right place and I'm done. No, that looks horrible. How is this coroutine gone suspend and resume? And what are these getParamMoves. As Gor and others explain, one approach can be to split a coroutine internally into two functions.

One that has the same name and signature as the one written by a user. This is something like a setup function. It allocates the coroutine frame, requests the return object and then calls the coroutine to the first point, the initial suspend part. This is where the second, newly, created function comes into place. It has some unknown name to the user and contains the coroutine body. The signature can look like this:

1
void __FuncNameStateMachine(COROUTINE_FRAME_TYPE* __f);

Here FuncName is the name of the original function.

There is an allocation function, but it returns void*. Plus where do all the parameters you pass to the original function go? They are moved to the coroutine frame. Ah well, that's where getParamMoves comes in. But wait! There is no structure or anything these parameters refer to. At least not in the AST.

First obstacle

That is bad, because it means I have to make something up! For now, let's do it. The struct shall be named struct __FuncNameFrame where FuncName again is the original function name. Then, I needed to create all the variables with names in to. In addition, the promise type needs to go there as well to survive between suspends.

Second obstacle

To test the transformation I used to following code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
generator seq(const int& start) {
  for (int i = start;; ++i) {
    co_yield i+1;
  }
}

int main() {
  auto s = seq(3);

  for(auto&& i : s ) {}
}

The implementation of generator is not relevant at the moment. And yes, it is not the most sense full code, but it is sufficient to see a transformation and some parts of the AST.

As the initial goal was to see what happens with references to temporaries I chose this version. The int is just to avoid bloating the AST with for example what std::string drags in.

Using this example as a base will give a coroutine frame like this:

1
2
3
4
5
struct __seqFrame
{
  std::experimental::__coroutine_traits_sfinae<generator, void>::promise_type __promise;
  const int & start;
};

The first member is the promise type. The second comes from the parameter. Remember, I created this struct by hand. Just using the types and names provided by the promise type and the result of getParamMoves. Do you immediately spot the problem? It is hard to assign a value to the member const int & start as it is const. Ok, one solution is to also make up a constructor. Sounds solvable. Still keep in mind that I'm drifting away from what the compiler does. It is hand crafted.

Let's pause here for a moment and look at parts of the rest of the coroutine. Specifically the for-loop. There is the variable i and the suspend point. To preserve the value of i between suspensions that variable also needs to be place in the coroutine frame.

Oh boy, that implies to the declaration of the variable can no longer be in the for-loop. And another oh dear, each access to i needs to be redirected to the variable in the frame. Considering the function signature from above void __FuncNameStateMachine(COROUTINE_FRAME_TYPE* __f); every i becomes __f->i. Totally made up code by me. Way away from the AST.

Fine, let's life with it for now and be not that precise about construction of i, just say it is fine that the head of the for-loop looks like this:

1
for( __f->i = __f->start; ; ++__f->i)

I redirected the access to i as well as the one to start. What can happen in the body of the for-loop?

More obstacles

Consider the following modified version of the coroutine (yes the struct there is senseless in this context, just think of something where it is useful):

1
2
3
4
5
6
7
8
9
generator seq(const int& start) {
  for (int i = start;; ++i) {
    struct S { int t; char c; };

    S s;

    co_yield i;
  }
}

Look at struct S. This is introduced within the body of the for-loop and within the body of a coroutine. It has to be placed int the coroutine frame. To make the frame definition available in both functions (the first and the made up one), it is declared before the coroutine. Do you already see the problem? The type S is not known outside of seq or more precise outside of the for-loop inside seq. One option is, to collect all record definitions in the coroutine body and move them into the coroutine frame. That makes them more visible as they in reality are. Once again I'm in my own land, as this is not what the AST shows. However, that way this code part would compile. That is a pity because either way it is somewhat wrong. To have compiling code in C++ Insights I chose this approach. This makes to resulting transformation look like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
struct __seqFrame
{
  std::experimental::__coroutine_traits_sfinae<generator<int>, void>::promise_type __promise;
  int suspend_index;
  struct S
  {
    int t;
    char c;
    // inline S() noexcept = default;
    // inline constexpr S(const S &) = default;
    // inline constexpr S(S &&) = default;
  };

  const int & start;
  int i;
  S s;
};

With this comes the fact, that to get code that compiles, all access to the type S within the coroutine body new need to be prefixed with the namespace of the frame. In this case __seqFrame::.

Going back to how the variables are constructed in the coroutine frame, lets have the constructor solution in mind and take a closer look at S s; in the for-loops body. Including s in the constructor of __seqFrame would be wrong. Technically, it is constructed and destructed during each iteration. It may not make a difference in this simple example, but I will be in an appropriate one.

Next, what if S s; instead would be const S s;? Say it also takes a parameter which comes from the for-loop. Well, then of course it cannot be initialized after the constructor of struct frame has run. But initializing it outside of the for-loop would be wrong, as this is not the order which takes place. A dead end for now.

There is more

Whenever void __seqStateMachine(__seqFrame* __f); is called, it needs to now where to resume. For that, one approach is to work with labels and goto. This then requires a switch at the beginning of the function to jump to the appropriate label. This is the reason why in the coroutine frame above you can see a member int suspend_index;. This is to store the resume point. Then, each suspend expression needs to create the label and set the index appropriately. The for-loop will look like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
for( __f->i = __f->start; ; ++__f->i)
{
  if(not __f->__promise.yield_value(__f->i).await_ready())
  {
    __f->__promise.yield_value(__f->i).await_suspend(
            std::experimental::coroutine_handle<void>(
                std::experimental::coroutine_handle<generator::promise_type>::from_address(
                    __builtin_coro_frame())));
    __f->suspend_index = 2;
    return;
  }

  __resume_seq_2:
  __f->__promise.yield_value(__f->i).await_resume();
}

Once again, this is hand-crafted code. With parts from the AST, but mostly hand-crafted.

Gor pointed something out to me, I wasn't aware. Clang has a OpaqueValueExpr. It looks like a way of saying, hey this expression here appears multiple times. Make a temporary, store the result and refer to that temporary. This saves subsequent calls. In the code above you can see such a pattern with __promise.yield_value(__f->i). It appears three times. A better version is, to add an element of this type to __seqFrame, initialize it and refer to it subsequently. This changes to code above to something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
for( __f->i = __f->start; ; ++__f->i)
{
  __f->__promise_N_X = __f->__promise.yield_value(__f->i);

  if(not __f->__promise_N_X.await_ready())
  {
    __f->__promise_N_X.await_suspend(
            std::experimental::coroutine_handle<void>(
                std::experimental::coroutine_handle<generator::promise_type>::from_address(
                    __builtin_coro_frame())));
    __f->suspend_index = 2;
    return;
  }

  __resume_seq_2:
  __f->__promise_N_X.await_resume();
}

Probably a bit more correct, but it comes with more handcrafting. Thank you Gor for that tip.

After I finished that, I looked around a little what might be possible and stumbled over this:

1
2
3
4
5
generator seq(const int& start) {
  for (int i = start;; ++i) {
    (void)(co_yield i);
  }
}

You can cast a co_yield expression to void with a C-style cast. Isn't C++ wonderful? What does it mean to my transformation? Sadly, as I have to make up the if(not __f...) part this entire part lands in a static_cast<void> make it look a bit like this:

1
static_cast<void>(if(not __f->__promise_N_X.await_ready()) ... )

Not really code that would compile. Another filter is required to suppress the static_cast here. I bet there is more such oddity lying around.

There is even more

So far, I only talked about co_yield and co_wait but co_return has its additional obstacles. For example, a co_return statement can contain a co_await. If so, it needs to go before the return. This means, that there is some kind of forward-looking involved.

Why is it so hard?

Probably because in the implementation of Clang the heavy lifting is done in the back end. The front end, which C++ Insights uses, does only add some kind of annotation. Basically the code after the transformation as I described it, is more or less what the back end does. But it does more. It can do optimizations. It has the power to construct even the const variables correctly and so on. I think that the Clang implementation is great. However, sadly it is impossible to peak behind it in a stable way as it is possible with other features.

Library support

Aside from all the issues doing the transformation, there is something else. It looks to me, that as of now, only libc++ implemented the coroutines header in experimental. The website of C++ Insights uses libstdc++ as library. May it as it be, this problem is solvable and it comes with a nice side effect. I will add an option to the website for selecting libc++ instead of libstdc++ for the transformation. If the coroutine support is selected, for now that will enable using libc++. The nice side effect is, that you can see a couple of implementation differences. In case, you use libc++ for your project you can now get the matching insights.

What should a transformation show?

In general, I'm not sure, please tell me what you like to see. In Clang most of the lifting is done in the back end. That makes it hard to do the transformation and is the reason why there are so many obstacles.

One thing that a transformation could do, is to visualize the life-time of objects. Something like dangling references as Arthur O’Dwyer pointed out in his blog post C++2a Coroutines and dangling references.

For a full picture, the current implementation transform this code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
generator seq(const int& start) {
  for (int i = start;; ++i) {
    co_yield i+1;
  }
}

int main() {
  auto s = seq(3);

  for(auto&& i : s ) {}
}

into this:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
struct __seqFrame
{
    std::experimental::__coroutine_traits_sfinae<generator, void>::promise_type __promise;
    int                                                                         suspend_index;
    void*                                                                       instruction_pointer;
    stdx::suspend_always                                                        __promise_3_11;
    const int&                                                                  start;
    int                                                                         i;
    stdx::suspend_always                                                        __promise_5_5;
    stdx::suspend_always                                                        __promise_3_11_1;
};

generator seq(const int& start)
{
    __seqFrame* __f = reinterpret_cast<__seqFrame*>(operator new(__builtin_coro_size(), std::nothrow));

    if(nullptr == __f) {
        return generator::promise_type::get_return_object_on_allocation_failure();
    }

    __f->suspend_index = 0;
    __f->start         = std::forward<decltype(start)>(start);

    new(&__f->__promise) std::experimental::__coroutine_traits_sfinae<generator, void>::promise_type{};

    generator __coro_gro = __f->__promise.get_return_object() /* NRVO variable */;

    void __seqStateMachine(__seqFrame*);
    __seqStateMachine(__f);

    return __coro_gro;
}

void __seqStateMachine(__seqFrame* __f)
{
    try {
        switch(__f->suspend_index) {
            case 1: goto __resume_seq_1;
            case 2: goto __resume_seq_2;
            case 3: goto __resume_seq_3;
        }

        __f->__promise_3_11 = __f->__promise.initial_suspend();
        if(not __f->__promise_3_11.await_ready()) {
            __f->__promise_3_11.await_suspend(std::experimental::coroutine_handle<void>(
                std::experimental::coroutine_handle<generator::promise_type>::from_address(__builtin_coro_frame())));
            __f->suspend_index = 1;
            return;
        }

    __resume_seq_1:
        __f->__promise_3_11.await_resume();

        for(__f->i = __f->start;; ++__f->i) {

            __f->__promise_5_5 = __f->__promise.yield_value(__f->i + 1);
            if(not __f->__promise_5_5.await_ready()) {
                __f->__promise_5_5.await_suspend(std::experimental::coroutine_handle<void>(
                    std::experimental::coroutine_handle<generator::promise_type>::from_address(
                        __builtin_coro_frame())));
                __f->suspend_index = 2;
                return;
            }

        __resume_seq_2:
            __f->__promise_5_5.await_resume();
            ;
        }

        goto __final_suspend;

    } catch(...) {
        __f->__promise.unhandled_exception();
    }

__final_suspend:

    __f->__promise_3_11_1 = __f->__promise.final_suspend();
    if(not __f->__promise_3_11_1.await_ready()) {
        __f->__promise_3_11_1.await_suspend(std::experimental::coroutine_handle<void>(
            std::experimental::coroutine_handle<generator::promise_type>::from_address(__builtin_coro_frame())));
        __f->suspend_index = 3;
        return;
    }

__resume_seq_3:
    __f->__promise_3_11_1.await_resume();
}

int main()
{
    generator s = seq(3);
    {
        generator&          __range1 = s;
        generator::iterator __begin1 = __range1.begin();
        generator::iterator __end1   = __range1.end();
        for(; __begin1.operator!=(__end1); __begin1.operator++()) {
            const int& i = __begin1.operator*();
        }
    }
}

Conclusion

All these obstacles are the reason why I decided to hide coroutine transformations by default. My current plan is, that a user can active them with the switch show-coroutine-transformation, being aware that it is a questionable transformation. Is this a good idea?

If you happen to have more knowledge about this topic or strong feeling about how it should be, please let me know. You can also participate in the Twitter poll here.

I will release support for coroutines after Clang 9 is available for the platforms C++ Insights runs on (currently waiting for macOS). First it the binary will be released and after that, I will add the support for the website and with that bringing support for libc++.

Andreas