Friday, April 30, 2021

Taking a first stab at C++20 coroutines

Philosophers' Syndrome: mistaking a failure of the imagination for an insight into necessity.

--Daniel C. Dennett, Consciousness Explained

There is no doubt that the C++20 coroutine framework gives the impression of being a rather complex piece of machinery.

If you are just starting out with C++20 coroutines, I would suggest that you read this blog together with a number of other blogs targeting C++20 coroutines, as well as the specification and the C++ online cpp reference pages.

Here are a few references to articles, papers and blogs that I have found useful:

Before getting started, I apologies for that the C++ syntax highlighting no longer seems to be working in this blog. Tracking down the problem is too time consuming right now and I will try to fix it later.

Writing C++20 coroutines can initially be a duanting task

Before getting started, it might be worth trying to understand why the C++20 coroutine machinery appears to be rather complicated, just so to get an idea of what to watch out for:

  • The most simple patterns and examples are not supported out of the box by the framework. You have to write a bunch of filler code to get anything working
  • The framework has many moving parts and each part internally is constructed from chewing gum. In fact, it’s hard to just understand where the variability ends in the framework
  • The chewing gum aspect makes the framework incredible flexible at the cost of making even simple examples non-trivial to implement
  • The names given to the various types in the framework are confusing
  • To understand what can and cannot be done you literally have to understand everything – understanding bits and pieces just won’t cut it when you want to implement something non-trivial

No ... the framework is not really that complicated ... it takes a while to get used to ... thats's all.

A simple generator based on C++20 coroutines

The problem: we want to write a generator that generates n numbers as a sequence: 0, 1, 2, ... n-1.

That's not really spectacularly hard to achieve using standard programming techniques. However, if the generator function maintains a more complicated state (where to continue in the function, temporary variables, etc.) it can become quite hairy.

The solution: use C++20 coroutines to implement the generator.

The coroutine function

Below is the code for a simple generator, generating a sequence of integers as: 0, 1, 2, ... n-1:

#include <coroutine> 
#include <iostream>
#include <optional>


// filler code
…

template<typename T>
struct Gen{
  T value()const;
  optional<T>next();
};

Gen<int>genint(int n){
  for(int i=0;i<n;++i)co_yield i;
}

By filler code I mean customized boiler type code that we have to write so that our code works with the C++20 coroutine machinery.

We are missing the definition of Gen in the code. But had we had it we could have printed the sequence as:

int main(){ 
  auto gen=genint(5);
  while(auto val=gen.next()){
    cout<<val.value()<<endl;
  }
}

with the output:

0
1
2
3
4

Before writing the Gen class, let us take a look at the statement:

co_yield i;

Conceptually the statement says:

  • Save the current state of the coroutine genint
  • Save the value i somewhere so that the caller can retrieve it by calling value() (or next()) on the gen object
  • Suspend the coroutine and return to the caller

Exactly how we accomplish this we will see in the implementation of Gen. As you will see, this is where we enter the somewhat confusing and flexible machinery of the C++20 coroutines.

The coroutine function return object

The Gen class looks like:

// definition of val_promise_type
// (to be implemented later)
// …

// Gen implementation
template<typename T>
struct Gen{
  // some typedefs
  using value_t=T;
  using promise_type=val_promise_type<Gen>;

  // keep handle so we can get to promise
  coroutine_handle<promise_type>handle_;

  // ctor
  // (called from 'promise_type::get_return_object()')
  Gen(coroutine_handle<promise_type>handle):handle_(handle){}
  
  // dtor
  ~Gen(){handle_.destroy();}

  // get current value
  value_t value()const{
    return handle_.promise().value();
  }
  // get next item
  optional<value_t>next(){
    if(handle_.done())return std::nullopt;
    handle_.resume();
    return handle_.done()?optional<value_t>{}:value();
  }
}

To start with, the Gen class is using a type called promise_type. In fact, the C++20 framework requires (actually no … but for now let’s pretend it does) the Gen class to contain a type called promise_type. We’ll get to that later, but for now, let’s just assume we have that type defined.

Before continuing, notice that the class Gen< int> is the class the application code deals with when it wants data from a sequence. The code will not deal further with the coroutine function (genint) once it has gotten hold of the Gen<int> object.

Let’s take note of a few things in the code:

  • Gen implements a method value() to retrieve the last value that the coroutine co_yielded
  • Gen implements a method next() for stepping to the next value in the sequence

Notice that these are our methods – they are not part of the framework - so we could have named them anything we wanted. In fact, we could even have supplied an iterator interface inside the Gen class. In a more thought through design, we probably would have implemented an iterator interface inside the generator.

So far we have not touched much related to the C++20 coroutine framework. Now we will do just that:

  • Inside the value() method, we retrieve the yielded value from the promise object. When we implement the promise_type we will see how the value ends up there
  • Inside the next() method, we use something called a coroutine_handle to check if the coroutine is done. Then we use the same handle to resume the coroutine. Finally, we return the value – a non-null value if the coroutine is not done, and otherwise a null value.

From now on, I will call the object returned by a coroutine – in this case an instance of Gen – the return object. Some articles and blogs calls it a future. However, I think the term future is misleading and clashes with the C++ type std::future typically used in multi threaded programming.

The promise type

Finally we will look at what the promise_type is all about. I will give the implementation right here and I will explain it afterwards:

Before continuing, you might want to take a quick look this diagram: What is inside the runtime coroutine object?

template<typename CRET>
class val_promise_type{
public:
  // we (not the C++20 coroutine framework) require 'return_object' to have a 'value_t' typedef
  using value_t=typename CRET::value_t;

  CRET get_return_object(){return CRET(coroutine_handle<val_promise_type>::from_promise(*this));}
  void return_void(){}
  std::suspend_always yield_value(value_t val){val_=val;return {};}
  std::suspend_always initial_suspend(){return {};}
  std::suspend_always final_suspend(){return {};}
  value_t value(){return val_;}
private:
  value_t val_;
};

The promise_type is a class used by and required by the C++20 coroutine framework. This is what the promise_type functions mean:

  • get_return_object: called by the framework when the return object from the coroutine is created and returned to the caller
  • return_void: called when we do a co_return from the coroutine without specifying a return value. In our example, we are not doing this just yet!
  • yield_value: called by the framework when co_yield is called in the coroutine functions
  • initial_suspend: when the coroutine is first called, this method controls how the execution of the routine should proceed. In our promise_type we return an object suspend_always which means that the coroutine body will not execute on the first call
  • final_suspend: similar to initial_suspend except this method is called right before the coroutine complets execution

There is a lot more to say about what methods are called when and under what circumstance they are called, what the implications are and as well as how the compiler finds the promise_type. For right now I won’t get into this since it would muddle the generator example.

The complete code

We now have all the parts: the promise_type, the return object and the coroutine function. Let’s put all of it inside a single code snippet:

#include <iostream>
#include <vector>
#include <optional>
#include <queue>
#include <coroutine>
using namespace std;

// --- (begin) promise object
// promise used when we need to store a single value
template<typename CRET>
class val_promise_type{
public:
  // we require 'return_object' to have a 'value_t' typedef
  using value_t=typename CRET::value_t;

  // these functions are required by teh C++20 coroutine machinery
  CRET get_return_object(){return CRET(coroutine_handle<val_promise_type>::from_promise(*this));}
  void return_void(){}
  std::suspend_always yield_value(value_t val){val_=val;return {};}
  std::suspend_always initial_suspend(){return {};}
  std::suspend_always final_suspend(){return {};}
  
  // this is our own function (not required by the C++20 coroutine machinery
  value_t value(){return val_;}
private:
  value_t val_;
};
// --- (end) promise object

// -- (begin) couroutine return object for generator
template<typename T>
struct Gen{
  // some typedefs
  using value_t=T;
  using promise_type=val_promise_type<Gen>;

  // keep handle so we can get to promise
  coroutine_handle<promise_type>handle_;

  // ctor
  // (called from 'promise_type::get_return_object()')
  Gen(coroutine_handle<promise_type>handle):handle_(handle){}
  
  // dtor
  ~Gen(){handle_.destroy();}

  // get current value
  value_t value()const{
    return handle_.promise().value();
  }
  // get next item
  optional<value_t>next(){
    if(handle_.done())return std::nullopt;
    handle_.resume();
    return handle_.done()?optional<value_t>{}:value();
  }

};
// -- (end) coroutine return object

// --- (begin) generators

// simple integer sequence
Gen<int>genint(int n){
  for(int i=0;i<n;++i)co_yield i;
}
// we’ll add a few more generators here later
…

// --- (end) generators

// run generator and print result
void genprint(auto&gen){
  while(auto val=gen.next()){
    cout<<val.value()<<endl;
  }
}
// test program
int main(){
  cout<<"----- int sequence"<<endl;
  auto gen=genint(5);
  genprint(gen);
}

Using gcc (GCC) 10.2.0 you can compile and link the code with the following command:

g++  -std=c++20 -fcoroutines  -fno-exceptions -o test1 test1.cc

What just happened?

Before continuing, you might want to take a quick look this diagram: What is inside the runtime coroutine object?

Here are a few points that might help in understanding how the code works:

  • when the coroutine genint is called the framework creates (among other things) a promise_type object by calling the get_return_object on the promise object
  • in the get_return_object we construct a Gen<int> object which is then returned to the caller. Notice that we stuff a coroutine_handle object in the returned object
  • later we call next on the returned object which in turn calls resume on the coroutine_handle
  • the resume call tells the framework to resume execution of the coroutine where it last left off. In our case, if this is the first time it will enter the for-loop in the coroutine function
  • the coroutine function executes until it gets to the co_yield statement. This statement triggers the framework to execute the yield_value function in the promise object passing the value we gave to co_yield as parameter
  • in the yield_value function we stuff the value in the val_ attribute so it can later be retrieved
  • now we are back in the next function in the Gen object. There we check if the coroutine is done. If the coroutine is not done we retrieve the value we stuffed in the promise and return it to the caller
  • the caller (in our case genprint) prints the value to stdout

There are a number of rules for how the framework behaves depending on what we do and what we return from methods in the return object and methods in the promise object. But for now, we won’t delve into those rules until we have a better grasp on how the whole machinery works.

More generators

The bad thing is that we had to write a lot of filler code to get a simple generator to work. The good thing is that we can now write more generators without writing more filler code.

BFS graph traversal

We can now use the generator we wrote to traverse a graph in BFS order.

The graph is defined using an adjacency list representation – that is, if vertex 5 has edges pointing to vertex 7, 8 and 10 then g[5]={7, 8, 10}:

// simple BFS traversal of graph
using graph_t=vector<vector<int>>;
enum color_t{white=0,grey=1,black=2};

Gen<int>genbfs(graph_t const&g,int ustart){
  if(g.size()==0)co_return;
  vector<color_t>color(g.size(),white);
  queue<int>q;
  q.push(ustart);
  color[ustart]=grey;
  while(!q.empty()){
    int u=q.front();
    q.pop();
    co_yield(u);
    for(int v:g[u]){
      if(color[v]!=white)continue;
      q.push(v);
      color[v]=grey;
    }
    color[u]=black;
  }
}

An example of BFS traversal of a graph is the following code which can be copied and pasted into the main program:

cout<<"----- bfs graph"<<endl;
graph_t g{
    {1, 2},
    {4},
    {3},
    {4},
    {5},
    {}
  };
  auto bfsgen=genbfs(g,0);
  genprint(bfsgen);
  

Character reader + tokenizer

A more interesting example, where two coroutines cooperate, is an example where one coroutine reads data from a source and then feeds the data to a tokenizer:

// character reader + tokenizer
Gen<char>genreadchar(string const&str){
  for(char c:str)co_yield c;
}
Gen<string>gentok(Gen<char>&creader){
  string currtok;
  while(creader.next()){
    char c=creader.value();
    if(c==' '){
      if(currtok.length()){
        co_yield currtok;
        currtok.clear();
      }
    }else{
      currtok.push_back(c);
    }
  }
  if(currtok.length())co_yield currtok;
}

The code to run this from the main program is then:

  cout<<"----- tokenizer graph"<<endl;
  string inpstring="Hello again where is the cheese";
  auto creader=genreadchar(inpstring);
  auto tokreader=gentok(creader);
  genprint(tokreader);

Permutations

We can write a final generator that creates all permutations of a string. We’ll use the Knuth algorithm (Knuth, Volume 4A, page: 319) that generates permutations in lexicographical increasing order:

// generate permutations in lexicographical order
Gen<string>genperm(string s){
  int n=s.length();

  // sort in ascending order and return the result as the first permutation
  sort(begin(s),end(s));
  co_yield(s);

  if(n==1)co_return; 
  while(true){
    // find largest k such that s[k]<s[k+1]
    int k=n-2;
    for(;k>=0;--k){
      if(s[k]<s[k+1])break;
    }
    if(k<0)co_return;

    // find largest l>k such that s[k]<s[l]
    int l=n-1;
    for(;l>k;--l){
      if(s[k]<s[l])break;
    }
    swap(s[k],s[l]);
    reverse(begin(s)+k+1,end(s));

    // we now have the next permutation
    co_yield s;
  }
}

And the call from the main program is:

  // generate permutation of string in lexographical order
  string str="1234";
  auto permgen=genperm(str);
  genprint(permgen);

What is inside the runtime coroutine object?

I will start by drawing a picture that conceptually describes what is going on:

Converted document

An important interface to the C++20 coroutine machinery is the awaitable/awaiter interfaces. These interfaces are not part of this diagram. I excluded them since there are already enough moving parts in the diagram. I will describe them in the next blog entry.

An important observation is the triplet [bookkeeping, frame, promise]. The triplet is the bundle that represents a coroutine object. The bookkeeping part contains information about where to start when a coroutine is resumed. The frame contains temporary variables, variables in the coroutine and parameters to the coroutine. The promise is the part we have some control over. For example, the yield_value function is called when co_yield is invoked from the coroutine. In that function we have a hook to store the value so we later can retrieve it.

Using gcc10 in a Docker based environment

I wrote and tested the code within a Docker image to avoid polluting the machine with more versions of C++ compilers and C++ environments. However, if you already have a C++20 compliant compiler installed, you can safely skip this section. Below is a brief description of how my environment was configured.

Be aware that I'm not an expert in configuring Docker. However, my basic setup appears to work OK.

The setup contains two files:

  • Dockerfile
  • Makefile

The Dockerfile lays out the docker image and looks like this:

FROM gcc:latest

# setup init process
ENV TINI_VERSION v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "-g", "-p", "SIGTERM", "--"]

# install vim (does not come with gcc:lates
RUN apt-get update && apt-get install -y vim

# create/configure new user
ARG arg_user
ARG arg_group
ARG arg_uid
ARG arg_gid

RUN echo $arg_gid
RUN echo $arg_group

RUN groupadd -g $arg_gid $arg_group
RUN useradd -u $arg_uid -g $arg_gid -ms /bin/bash $arg_user
WORKDIR /home/$arg_user

COPY vimrc /home/$arg_user/.vimrc
RUN chown $arg_user .vimrc
RUN chgrp $arg_group .vimrc

COPY bashrc /home/$arg_user/.bashrc
RUN chown $arg_user .bashrc
RUN chgrp $arg_group .bashrc

# run container as $user_arg
USER $arg_user

# start bash
CMD /bin/bash

As you probably can infer from the Dockerfile that the user within the docker image will have the same user name and user id as the user building the image (not as the one running the container).

The Makefile is a quick hack that contains simple commands for building the Docker image as well as starting a container. Additionally it contains a few commands for removing the image, removing containers etc.:

IMAGE=hans-cpp20

all: usage

run:
        docker run -v `pwd`/src://home/$${USER}/src --name $(IMAGE) --rm  -it $(IMAGE) 

build:
        docker build --build-arg arg_user=$$(id -un) --build-arg arg_group=$$(id -gn) --build-arg arg_uid=$$(id -u) --build-arg arg_gid=$$(id -g)  -t $(IMAGE):latest .

rmimage:
        docker image ls | grep $(IMAGE) | awk '{print $$3}' | xargs -I{} docker image rm "{}"

lsimage:
        docker image ls | grep $(IMAGE) 

rmcont:
        docker container ls -a | grep $(IMAGE) | awk '{print $$1}' | xargs -I{} docker container rm "{}"

lscont: 
        docker container ls -a | grep $(IMAGE)

save:
        docker save -o $(IMAGE).tar hans-centos-test1

load:
        docker load -i $(IMAGE).tar

prune:
        docker image prune -f
        docker container prune -f

clean: rmcont rmimage
        rm -f junk.txt *.tar

usage:
        @echo "make all                 # print this message"
        @echo "make run                 # run docker image"
        @echo "make build               # build docker image"
        @echo "make rmimage             # remove image"
        @echo "make lsimage             # list image"
        @echo "make rmcont              # remove container(s)"
        @echo "make lscont              # list container(s)"
        @echo "make save                # save image to tar file"
        @echo "make load                # load image from a saved tar file"
        @echo "make prune               # prune stopped containers and dangling images"
        @echo "make clean               # remove container + image"

To build the Docker image run the following command:

make build

To run a container, execute the following command:

make run

To exit the container do:

^D

The setup maps the src directrory in the current directory to directory /home/<user>/src iside the Docker container. So anything you do inside or outside the container to this directory is reflected inside and outside Docker.

Next: handling exceptions and scheduling coroutines for execution in a thread pool

In the first followup blog entry I will show how exceptions are managed within the framework of C++20 coroutines. The examples in this blog did not deal with exceptions. In fact, on the compile line we explicitly specified that exceptions would not be supported.

In the blog entry following exception handling I will show how to move coroutine objects into a thread pool for scheduled execution. The code will use awaiter objects to control how a coroutine is suspended when the co_await operator is executed.