Wednesday, June 1, 2022

A few notes on the C++20 constraints machinery

Ruthless in purpose, insidious in method

-- Orphan Black TV series, season 3, episode 8

In this entry I will make a few comments on how the constraint machinery introduced in C++20 works. Most of the comments touch on issues which were not completely clear to me when trying out the new C++20 constraint related constructs for the first time. However, the content of this entry is fairly basic and should not introduce new ideas or clever solutions to problems.

I will not get into all the syntactic and semantic details of the C++20 constraint machinery. Instead I'll make an attempt to categorize various ideas so as to get a better understanding of how it all fits together.

Three types of basic constraint constructs

There are three (3) types of basic constraints used when constraining the interface of templates:

  • plain boolean expressions evaluated at compile time
  • concepts
  • requires expressions

Constraints with plain boolean expressions

Let's take a look at constraining a template parameter using a plain boolean expression:

template<typename T>
requires is_integral_v<T>
int foo(T t){
  return t+17;
}
foo(1);     // OK '1' is integral
foo(1.0);   // ERROR '1.0' is not integral

As expected, we can overload foo if we want to support floating point types:

template<typename T>
requires is_floating_point_v<T>
int foo(T t){
  return t+3.14;
}
foo(1.0);   // OK now

There is nothing new here as far as the boolean expression is concerned. Only the requires keyword, which ties the requirement to the template, is new.

Constraints with concepts

concept is a new keyword in C++20. A concept lets you assign a name to a collection of requirements. As mentioned earlier, a constraint comes in one of three forms (or as a combination of them), and a concept can mix multiple constraints. For example:

template<typename T>
requires integral<T>
int foo(T t){
  return t+17;
}

Here integral is a concept which evaluates to either true or false. We can simplify the code slightly by constraining T directly in the template (replace typename with the name of the concept):

template<integral T>
int foo(T t){
  return t+17;
}
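
For completeness, here is a sketch of how a concept mixing constraint types can be defined; this one combines a plain boolean expression with a requires expression (the name Summable is my own invention):

// a concept combining a boolean expression with a requires expression
template<typename T>
concept Summable=is_arithmetic_v<T>&&requires(T t){
  t+t;
};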

Constraints using requires expressions

The third type of constraint is a requires expression. A simple example is:

template<typename T>
requires requires(T t){
  t+1;
}
int foo(T t){
  return t+17;
}

Notice that requires is part of the requires expression. The first requires starts what is called a requires clause (see the later section on requires clauses). The second requires starts the requires expression. There are four (4) types of conditions that can be expressed inside a requires expression:
  • simple requirements (the example above expressed a simple requirement)
  • type requirements
  • compound requirements
  • nested requirements
A simple requirement checks that the expression it states is well-formed from a language point of view. It does not evaluate the expression. An example is the initial example from this section:

template<typename T>
requires requires(T t){
  t+1;
}
int foo(T t){
  return t+17;
}

A type requirement starts off with the keyword typename followed by a type. It checks that the type is valid. For example:

template<typename C>
requires requires{
  typename C::value_type;
}
void foo(C&c){
  ...
}

A compound requirement allows us to specify certain properties of an expression. For example, we can specify that the result of dereferencing a variable is convertible to an int:

template<typename IT>
requires requires(IT it){
  {*it}->convertible_to<int>;
}
int foo(IT it){
  return *it;
}

There are two things to notice here. First, to the right of the arrow we don't specify the type of the expression. Instead, we specify a concept that the type of the expression must satisfy.

Second, the concept convertible_to takes two template parameters. However, in the context in which it is used here, the first parameter is deduced as the type of the expression and cannot be explicitly specified in the parameter list to convertible_to.

This may look strange. However, it is no different from when we replace typename with the name of a concept and do not explicitly specify the parameter to the concept:

template<integral T>
void foo(T t){
   ...
}

integral takes one parameter. But because of the context the parameter is T and we cannot explicitly specify it.

A nested requirement nests a requires expression inside another requires expression:

template<typename IT>
requires requires{
  requires same_as<IT,int*>;
}
...
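
The inner requires keyword matters: a nested requirement is evaluated, whereas a simple requirement is only checked for being well-formed. A small sketch of the difference (foo1 and foo2 are made-up names):

template<typename IT>
requires requires{same_as<IT,int*>;}          // simple requirement: satisfied for any IT
void foo1(IT it){}                            // (the expression is merely well-formed)

template<typename IT>
requires requires{requires same_as<IT,int*>;} // nested requirement: actually evaluated
void foo2(IT it){}

foo1(1);   // OK even though 'int' is not 'int*'
foo2(1);   // ERROR: same_as<int,int*> is false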

Conjunctions and disjunctions

A constraint can be formed as the conjunction of two constraints. For example:

template<typename T,typename U>
requires integral<U>&&integral<T>
void foo(T t,U u){
  ...
} 
Here we require that both T and U are of integral type.

Disjunctions are similar with the exception that only one of the two participating constraints needs to be satisfied. Disjunctions are formed using the || operator.
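
For example (a small sketch):

template<typename T>
requires integral<T>||floating_point<T>
T twice(T t){
  return t+t;
}
twice(2);     // OK: satisfied through the integral branch
twice(2.0);   // OK: satisfied through the floating_point branch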

Requires clause

A requires clause is the clause that attaches constraints to template arguments or to a function declaration. As shown in the examples, it starts with the keyword requires and is followed by a constraint.

Notice that both a requires clause and a requires expression start with the requires keyword. Therefore, when introducing a requires expression as a constraint in a requires clause, we end up with two adjacent requires keywords:

template<typename T>
requires requires(T t){t+1;}
int foo(T t){
  return t+17;
}

constraints and auto

Constraints also work with auto declared parameters. An auto declared parameter is constrained by writing the name of the concept before the auto keyword.

As usual when comparing code using auto with template based code, we see that in the auto case we have to write more code. For example, to get the type of a parameter we must often use decltype in conjunction with decay, which makes the code more cryptic and less readable.

Let's now take a look and compare constraints on template parameters with constraints on code using auto declared parameters.

Using plain templates

Say we want to constrain a template function so that it is only called on collections supporting push_back of values. The syntax for a concept that expresses this is fairly straightforward.

template<typename C>
concept SupportPushback=requires(C c){
  typename C::value_type;
  c.push_back(declval<typename C::value_type>());
};
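
A quick sanity check of the concept (a sketch; assuming <vector> and <set> are included):

static_assert(SupportPushback<vector<int>>);   // vector has value_type and push_back
static_assert(!SupportPushback<set<int>>);     // set has value_type but no push_back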

We can use this concept in the implementation of a function template that adds values to a container:

// using plain templates
template<SupportPushback C,typename ...T>
void add(C&c,T const&...ts)
requires (convertible_to<T,typename C::value_type>&&...)
{
  (c.push_back(ts),...);
}

For safety we added a requires clause with a fold expression that ensures that the values we insert are convertible to the value type of the container.
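
A call might then look like this (a sketch):

vector<int>v;
add(v,1,2,3);       // OK: all values convertible to int
// add(v,"hello");  // ERROR: 'char const*' is not convertible to int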

Using auto parameters

We can rewrite this function template, this time using auto parameters:

// using auto
void add(SupportPushback auto&c,auto&&...ts)
requires (convertible_to<decay_t<decltype(ts)>,typename decay_t<decltype(c)>::value_type>&&...)
{                                           // note: decay_t is mandatory so we can check convertibility
  (c.push_back(std::forward<decltype(ts)>(ts)),...);
}

Clearly it becomes more verbose as well as slightly messier.

Remember that decay removes references as well as any top-level cv-qualifiers (it also performs array-to-pointer and function-to-pointer conversions). It allows us to cleanly compare two types without cv-qualifiers and references getting in the way.

Notice that we constrain the auto declared parameter by writing the name of the concept before auto. Since we don't have a name for the type of the container, nor for the type of the values we store in the container, we have to extract the types using decltype. To get convertible_to to work we also have to use decay_t so that the types can be compared properly; otherwise the code will not compile.

Constraints and constexpr

The C++20 constraint constructs (concepts, requires expressions and, obviously, plain compile time boolean expressions) all evaluate to bool at compile time. This means that they can be used in constexpr if statements as an alternative to function overloading.
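
The example below also uses a concept SupportInsert, the insert(...) counterpart of SupportPushback. It was not shown earlier, but a minimal sketch could look like:

// check that a collection supports insert(...) of values
template<typename C>
concept SupportInsert=requires(C c){
  typename C::value_type;
  c.insert(declval<typename C::value_type>());
};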

The following example shows how we can use concepts in conjunction with constexpr conditionals to map out the execution path during compilation:

template<typename C,typename T>
void add2collection(C&c,T const&val){
  // make sure the collection supports either push_back(...) or insert(...) of elements
  static_assert(SupportPushback<C>||SupportInsert<C>,"invalid collection type in 'add2collection'");
  
  // check if we have a push_back(...) or insert(...) collection
  if constexpr(SupportPushback<C>){
    c.push_back(val);
  }else if constexpr(SupportInsert<C>){
    c.insert(val);
  }
}
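
Used, for instance, like this (a sketch):

vector<int>v;
set<int>s;
add2collection(v,1);   // takes the push_back branch
add2collection(s,1);   // takes the insert branch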

Constraints on non-templates

Yes, it is possible (and sometimes also meaningful) to constrain functions that are not templates. Here I'll show a mini demo example that at first might look like cheating since it is based on a class template. However, the functions are plain member functions - not function templates:

...
template<typename T>
struct Kar{
  void foo()requires integral<T>{cout<<"integral"<<endl;};
  void foo()requires floating_point<T>{cout<<"floating_point"<<endl;};
};
...

  Kar<int>karInt;
  Kar<double>karFloat;
  karInt.foo();           // prints 'integral'
  karFloat.foo();         // prints 'floating_point'
  ...

Constraints and lambdas

Lambdas using template parameters as well as lambdas using auto parameters can be constrained. However, I have difficulty seeing many use cases where this might be useful. Clearly I might be missing something here. In any case, I'll give a small example where we actually do overload resolution across a set of lambda functions.

Say we want to apply a visitor to the value in a variant object. The following idiomatic construct, together with overloading using concepts, lets us in a fairly simple (if somewhat convoluted) double dispatch way seek out the correct lambda function to call.

First we'll use an idiomatic construct defining a class containing multiple overloaded operator() functions:

...
// class holding multiple overloaded operator() functions
template<typename ... Ts> struct overloaded : Ts... { using Ts::operator()...; };
...
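
As a side note: in C++17 this idiom needed an explicit deduction guide; in C++20, class template argument deduction for aggregates makes it unnecessary. The guide would have looked like:

template<typename ... Ts> overloaded(Ts...) -> overloaded<Ts...>;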

Next we'll create a set of lambda functions which we package inside a single function object:

   ...
  // callable function objects
  auto f_integral=[]<integral T>(T const&t){cout<<"integral"<<endl;};
  auto f_int=[](int t){cout<<"int"<<endl;};
  auto f_floating_point=[](floating_point auto const&t){cout<<"floating_point"<<endl;};
  auto f_string=[](string const&t){cout<<"string"<<endl;};
  auto f_push_back=[](SupportPushback auto const&t){cout<<"pushback"<<endl;};
  auto f_insert=[](SupportInsert auto const&t){cout<<"insert"<<endl;};

  // merge all callable objects into a single one having multiple operator()
  auto fvisit=overloaded{f_integral,f_int,f_floating_point,f_string,f_push_back,f_insert};
  ...

Finally we'll apply the function object using double dispatch to the variant object:

  ...
  // for a variable 'val' visit the correct function object
  variant<long,int,double,string,vector<int>,set<string>>w=set<string>{};
  visit(fvisit,w);    // will call fvisit.operator()(SupportInsert auto const&t)
  ...

The output is:

insert

Ordering of constraints

The C++ reference https://en.cppreference.com/w/cpp/language/constraints defines how to impose an ordering of constraints:

A constraint P is said to subsume constraint Q if it can be proven that P implies Q up to the identity of atomic constraints in P and Q.

I won't get into the details here; it suffices to say that the constraint P is narrower than the constraint Q if P subsumes Q. The benefit (or one of the benefits) I can see of ordering constraints is that if P subsumes Q, and both P and Q are satisfied, then P will be selected during overload resolution.

Say we need a wrapper class storing a pointer to some data. We want to have the option of either storing the data as a raw pointer or as a smart pointer. In the destructor we want to delete the data if the pointer is a raw pointer. In the case of a smart pointer we want to call reset on the (smart) pointer. Depending on the behaviour of the smart pointer, a reset may or may not delete the data held by the pointer.

Both raw and smart pointers act as pointers, so we will impose a constraint that the pointer in the wrapper class satisfies a concept I named IsPointer. A smart pointer additionally has the feature of resetting the held memory. Since the smart pointer is a pointer but has an additional constraint attached, we say that the smart pointer constraint subsumes the raw pointer constraint and is therefore the narrower constraint.

An example (that incidentally is also a better and more realistic example than my previous one of constraints applied to functions that are not function templates) will clarify.

Notice that in this implementation I've not made the effort, which probably should be made, to separate raw pointers from arrays. In real production code this should most likely be analysed and handled.

template<typename T>
concept IsPointer=requires(T t){*t;}; // pointer-like: both raw and smart pointers can be dereferenced

template<typename T>
concept IsResettable=requires(T t){t.reset();};

template<IsPointer DATA>
class DataWrapper{
public:
  // ctor for moving/copying data pointer
  DataWrapper(DATA&&data):data_(std::forward<DATA>(data)){}
  DataWrapper(DATA const&data):data_(data){}

  // destructors for data satisfying 'pointer' requirement as well as for 'pointer AND resettable' requirement
  ~DataWrapper()requires IsPointer<DATA>{delete data_;}
  ~DataWrapper()requires IsPointer<DATA> && IsResettable<DATA>{data_.reset();}
private:
  DATA data_;
};
...
  int*rp=new int(100);                  // store a raw pointer to an int and delete memory when (first) destructor is called
  DataWrapper d1(rp);
  auto sp=make_shared<int>(200);        // store a shared_ptr<int> and call reset on the shared_ptr<> when (second) destructor is called
  DataWrapper d2(sp);
  auto up=make_unique<int>(300);        // store a unique_ptr<int> and call reset on the unique_ptr<> when (second) destructor is called
  DataWrapper d3(std::move(up));
...

The destructors are where the interesting things happen. The constraint on the second destructor subsumes the constraint on the first destructor and is chosen by the compiler for smart pointers.

Finally, it is important to understand that for a constraint to subsume another constraint, the constraints must be expressed as concepts. Subsumption does not apply to constraints expressed as requires expressions or plain boolean expressions.
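
A small sketch of why (foo and bar are made-up names): two textually identical boolean expressions in different requires clauses are distinct atomic constraints, so neither subsumes the other, whereas constraints expressed through the same concept normalize to the same atomic constraint:

template<typename T>requires is_integral_v<T>
void foo(T t){}
template<typename T>requires is_integral_v<T>&&(sizeof(T)<=8)
void foo(T t){}
// foo(1);  // ERROR: ambiguous - the second overload does not subsume the first

template<typename T>requires integral<T>
void bar(T t){}
template<typename T>requires integral<T>&&(sizeof(T)<=8)
void bar(T t){}
bar(1);     // OK: the second overload subsumes the first and is selected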

Benefits of using constraints

Over time, I'm positive that the new constraint machinery in C++20 will flush out a number of design paradigms as well as idiomatic constructs combining various C++ features that are not obvious at this point in time. In general, right now with limited knowledge, I would say that the biggest benefit of concepts and constraints is overload resolution. Overload resolution with constraints gives us clear, scalpel-like precision when controlling what functions to call, without resorting to indirect constructs that happen to constrain template parameters through convoluted side effects.

C++20 concepts are sometimes touted as a panacea for better error messages. I do agree that better error messages are sometimes a benefit of concepts and constraints - but not always. Make enough errors in template based code and you will still generate hundreds, possibly thousands, of lines of error messages.

A soft benefit of concepts and constraints is that they give us a semi-formal way of making the intent of template parameters more explicit. The trick here is to not go down the path where we focus only on the syntax that is allowed, but more on the semantics of template parameters. Clearly, it is not possible to formally specify the semantics of template parameters, but via the syntax we can make an attempt.

For example, in a previous section where we specified that a collection is either a SupportPushback or a SupportInsert collection, we specified the constraints using syntax. However, the intent is to distinguish between the semantics of adding a (possibly duplicate) value at the end of a collection and adding a unique value to the collection. Possibly we should have used different names for the concepts to capture this. However, in C++ the semantics of push_back and insert on collections are fairly well agreed upon.

A mini library for easy streaming of various collections

When printing debug messages, it's often convenient to print the elements of a collection inline in a cout statement. Having files littered with ad-hoc code printing out elements of collections quickly becomes messy.

A library with a single function strcat(...) could be used as follows, where the first parameter is the separator to be used between the elements being printed:

  ...
  tuple<int,string,double>mytuple(1,"Hello",3.14);
  optional<int>myoptional(17);
  set<string>myset{"1","2","3"};
  cout<<"my tupple: ["<<strcat(", ",mytuple)<<"], myoptional: ["<<strcat(", ",myoptional)<<"], myset: ["<<strcat(", ",myset)<<"]"<<endl;
  ...

The output becomes:

my tuple: [1, Hello, 3.14], myoptional: [17], myset: [1, 2, 3]

There are of course many ways this can be implemented. Combining overloaded function templates with the C++20 constraint machinery turns out to be surprisingly straightforward.

We'll start with a support concept that checks if a list of types can be streamed:

// check if types in a type pack are all streamable
template <typename...Ts>
concept Streamable=requires(std::ostream&os,Ts...ts){
  {((os<<ts),...)}->std::same_as<std::ostream&>;
};
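
A quick check of the concept (a sketch; assumes <vector> and <string> are included):

static_assert(Streamable<int,std::string,double>);  // all have an operator<<
static_assert(!Streamable<std::vector<int>>);       // no operator<< for vector out of the box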

Next, we'll declare a support function making std::pair streamable:

// declaration: make 'pair' streamable
template<Streamable S1,Streamable S2>
std::ostream&operator<<(std::ostream&os,std::pair<S1,S2>const&p);

We'll treat the class std::optional as a collection having zero or one elements:

// merge a single (or no) value from an optional parameter into a string
// (note: the 'sep' parameter is never used since an 'optional' (collection) has at most one element)
template<Streamable S,Streamable T>
[[nodiscard]]std::string strcat(S const&sep,std::optional<T>const&p){
  std::stringstream ret;
  if(p)ret<<p.value();
  return ret.str();
}

In the implementation below, the calls to boost::lexical_cast should be updated to use 'normal' streaming to a std::stringstream for consistency (I have simply not done this due to time constraints). Additionally, the implementation should probably be extended to support a few more data structures.

The remainder of the implementation follows here:

// merge streamable values from a collection with separator in between elements
// (we don't want to use this function if C<T> is a string; instead we want the 'strcat(S, T, Ts...)' overload)
template<Streamable S,Streamable T,template<class>class C>
requires requires(S s,C<T>c,std::ostream&os){
  begin(c);
  os<<s;
  os<<*begin(c);
  begin(c)!=end(c);
}
[[nodiscard]]std::string strcat(S const&sep,C<T>const&c){
  std::stringstream ret;
  for(auto it=begin(c);it!=end(c);){
    ret<<*it;
    if(++it!=end(c))ret<<sep;
  }
  return ret.str();
}
// merge streamable values from a pack into a string with separator between values
template<Streamable S,Streamable T,Streamable...Ts>
[[nodiscard]]std::string strcat(S const&sep,T const&t,Ts const&...ts){
  std::stringstream ret;
  ret<<t,((ret<<sep<<ts),...);
  return ret.str();
}
// merge streamable tuple values into a string with separator between values
template<Streamable S,Streamable...Ts>
[[nodiscard]]std::string strcat(S const&sep,std::tuple<Ts...>const&tu){
  constexpr std::size_t LEN=sizeof...(Ts);               // get length of tuple
  std::stringstream ret;                                 // write tuple to a string stream
  std::string ssep=boost::lexical_cast<std::string>(sep);// convert sep to string
  std::string emptysep;                                  // get empty separator as a string
  [&]<std::size_t...Is>(std::index_sequence<Is...>){     // lambda template writing tuple elements
    ((ret<<std::get<Is>(tu)<<(Is<(LEN-1)?ssep:emptysep)),...); // don't write separator after last element
  }(std::make_index_sequence<LEN>());                    // call lambda and pass an index sequence as parameter so we can walk through tuple elements
  return ret.str();                                      // get string from string stream
}
// merge streamable pair value into a string with separator between values
template<Streamable S,Streamable T1,Streamable T2>
[[nodiscard]]std::string strcat(S const&sep,std::pair<T1,T2>const&p){
  std::stringstream ret;                                 // write pair to a string stream
  std::string ssep=boost::lexical_cast<std::string>(sep);// convert sep to string
  ret<<p.first<<ssep<<p.second;
  return ret.str();                                      // get string from string stream
}

Friday, December 10, 2021

C++ coroutine_traits: short notes

Understanding reality. It’s not about personal growth. It shouldn’t be. It’s trying to understand how reality works at a deep level. It’s not that we want the worlds to be there; it’s just the simplest, most austere way of understanding the data.

--Wired Magazine, interview with Sean Carroll: https://www.wired.com/story/sean-carroll-thinks-we-all-exist-on-multiple-worlds/

A slight digression

Before continuing with the next post, which will be about co_await, awaitable/awaiter interfaces and scheduling of coroutines across threads, I would like to implement a small variant of the code from the previous blog entry.

In this entry I take a small dig into decoupling the future object (coroutine return object) from the promise_type. Well, at least decoupling in the sense of not referencing the promise_type directly in the future class.

Looking up the promise_type using coroutine_traits

In the previous post I specified the promise type inside the future class as:

template<typename T>
struct Gen{
  using promise_type=val_promise_type<Gen>;
  ...
};

What if I don't want to explicitly name the promise_type inside the future? Can I do that? Yup, that is doable within the framework of C++20 coroutines.

A C++ traits class is essentially a lookup table, even if a rather convoluted one, where types are looked up using types as keys. A std::coroutine_traits specialization pinpoints a promise_type given the type of the future (coroutine return object) together with the parameter types of the coroutine.

In our example, the future type is Gen<T> and the coroutine has the following signature, where T is of type int:

Gen<int>genint(int n);

To look up the corresponding promise_type in std::coroutine_traits, the compilation machinery will use the following specialization that we write for std::coroutine_traits:

// traits for looking up the promise object from a Gen<T> type
template <typename T, typename... Args>
struct std::coroutine_traits<Gen<T>,Args...>{
  class promise_type{
    ...
  };
};

The specialization is wider than needed since it allows for any number of coroutine parameters of unspecified types. In our case, we could have written the trait simply as:

template <>
struct std::coroutine_traits<Gen<int>,int>{
  class promise_type{
    ...
  };
};

or like this if we want to force a single coroutine parameter whose type matches the generated value type T:

template <typename T>
struct std::coroutine_traits<Gen<T>,T>{
  class promise_type{
    ...
  };
};

From here on, I will stay with the last std::coroutine_traits specialization.

The complete code

To summarize, I'll show the complete code here. There are some noticeable differences from the code in the previous blog post that I will point out.

The first difference is that the promise_type is not explicitly mentioned in the future (Gen<T> class). Instead, the compilation machinery finds the promise_type by looking it up inside the std::coroutine_traits class that we defined.

A consequence of not knowing the promise_type inside the future is that we cannot retrieve the yield value through the promise. That is, we cannot get hold of the promise object by calling promise() on the coroutine handle, since the handle is the type-erased coroutine_handle<void>. Instead, in get_return_object, where the future is created, we pass a reference to the value stored inside the promise_type as a second parameter to the future constructor. Through that reference the future can access the value stored inside the promise object.

This may not look too elegant, but it does solve our problem of not knowing the structure of the promise_type from inside the future when we need to get hold of the value.

The C++ documentation states that coroutine_handle<void> erases the promise_type. It should be noted that a coroutine_handle<void> is still a perfectly good handle for controlling the coroutine. With the handle we can resume the coroutine, or we can check if the coroutine has completed (using done()) without knowing anything about the promise_type.
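
A small sketch of controlling a coroutine through a type-erased handle (illustration only; drive is a made-up name):

// run a suspended coroutine to completion without knowing its promise_type
void drive(coroutine_handle<void>h){
  while(!h.done())h();   // h() is shorthand for h.resume()
}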

#include <iostream>
#include <vector>
#include <optional>
#include <coroutine>
using namespace std;

// future that does not know the promise_type
// (promise_type is looked up by the compiler through the std::coroutine_traits<Gen<T>,T>)
template<typename T>
class Gen{
public:
  // ctor, dtor
  Gen(coroutine_handle<void>handle,T const&valref):handle_(handle),valref_(valref){}
  ~Gen(){handle_.destroy();}

  // get current value
  // (we have a reference to the last yield value in the promise object)
  T value()const{return valref_;}

  // get next item
  optional<T>next(){
    if(handle_.done())return std::nullopt;
    handle_();    // resume coroutine: handle_.resume <--> handle_()
    return handle_.done()?optional<T>{}:value();
  }
private:
  coroutine_handle<void>handle_; // keep handle so we can control coroutine (we don't know the promise_type - so void)
  T const&valref_;               // ref to value we store inside promise
};

// traits for looking up the promise object from a Gen<T> type
template <typename T>
struct std::coroutine_traits<Gen<T>,T>{
  // promise used when we need to store a single yield value
  class promise_type{
  public:
    Gen<T>get_return_object(){return Gen<T>(coroutine_handle<promise_type>::from_promise(*this),val_);}
    void return_void(){}
    std::suspend_always yield_value(T val){val_=val;return {};}
    std::suspend_always initial_suspend(){return {};}
    std::suspend_always final_suspend()noexcept{return {};}
    void unhandled_exception(){std::terminate();}
  private:
    T val_;
  };
};

// simple integer sequence
Gen<int>genint(int n){
  for(int i=0;i<n;++i)co_yield i;
}

// test program
int main(){
  auto gen=genint(5);
  while(auto val=gen.next())cout<<val.value()<<endl;
}

A quick conceptual snapshot

It's time to take a bird's-eye snapshot view of what we have done so far. It's difficult not to get lost in the technicalities of the C++20 coroutine machinations.

The four main constructs we have dealt with are the promise_type, the future object, the coroutine itself and the coroutine_handle. The relationships between them are roughly (in our example):

[diagram: the relationships between the promise_type, the future object, the coroutine and the coroutine_handle]

Before closing, it's worth mentioning that we could also have stored the value in the future object. Each time a new yield value was produced, we could have propagated it to the future so that a user could get hold of it. Of course, we would then have had to store a reference to the future inside the promise_type object.

Friday, April 30, 2021

Taking a first stab at C++20 coroutines

Philosophers' Syndrome: mistaking a failure of the imagination for an insight into necessity.

--Daniel C. Dennett, Consciousness Explained

There is no doubt that the C++20 coroutine framework gives the impression of being a rather complex piece of machinery.

If you are just starting out with C++20 coroutines, I would suggest that you read this blog together with a number of other blogs targeting C++20 coroutines, as well as the specification and the online cppreference pages.

Here are a few references to articles, papers and blogs that I have found useful:

Before getting started, I apologize that the C++ syntax highlighting no longer seems to be working in this blog. Tracking down the problem is too time consuming right now; I will try to fix it later.

Writing C++20 coroutines can initially be a daunting task

Before getting started, it might be worth trying to understand why the C++20 coroutine machinery appears to be rather complicated, just to get an idea of what to watch out for:

  • The simplest patterns and examples are not supported out of the box by the framework. You have to write a bunch of filler code to get anything working
  • The framework has many moving parts and each part internally is constructed from chewing gum. In fact, it’s hard to just understand where the variability ends in the framework
  • The chewing gum aspect makes the framework incredibly flexible at the cost of making even simple examples non-trivial to implement
  • The names given to the various types in the framework are confusing
  • To understand what can and cannot be done you literally have to understand everything – understanding bits and pieces just won’t cut it when you want to implement something non-trivial

No ... the framework is not really that complicated ... it takes a while to get used to ... that's all.

A simple generator based on C++20 coroutines

The problem: we want to write a generator that generates n numbers as a sequence: 0, 1, 2, ... n-1.

That's not really spectacularly hard to achieve using standard programming techniques. However, if the generator function maintains a more complicated state (where to continue in the function, temporary variables, etc.) it can become quite hairy.

The solution: use C++20 coroutines to implement the generator.

The coroutine function

Below is the code for a simple generator, generating a sequence of integers as: 0, 1, 2, ... n-1:

#include <coroutine> 
#include <iostream>
#include <optional>


// filler code
…

template<typename T>
struct Gen{
  T value()const;
  optional<T>next();
};

Gen<int>genint(int n){
  for(int i=0;i<n;++i)co_yield i;
}

By filler code I mean customized boilerplate code that we have to write so that our code works with the C++20 coroutine machinery.

We are missing the definition of Gen in the code. But had we had it, we could have printed the sequence as:

int main(){ 
  auto gen=genint(5);
  while(auto val=gen.next()){
    cout<<val.value()<<endl;
  }
}

with the output:

0
1
2
3
4

Before writing the Gen class, let us take a look at the statement:

co_yield i;

Conceptually the statement says:

  • Save the current state of the coroutine genint
  • Save the value i somewhere so that the caller can retrieve it by calling value() (or next()) on the gen object
  • Suspend the coroutine and return to the caller

Exactly how we accomplish this we will see in the implementation of Gen. As you will see, this is where we enter the somewhat confusing and flexible machinery of the C++20 coroutines.

The coroutine function return object

The Gen class looks like:

// definition of val_promise_type
// (to be implemented later)
// …

// Gen implementation
template<typename T>
struct Gen{
  // some typedefs
  using value_t=T;
  using promise_type=val_promise_type<Gen>;

  // keep handle so we can get to promise
  coroutine_handle<promise_type>handle_;

  // ctor
  // (called from 'promise_type::get_return_object()')
  Gen(coroutine_handle<promise_type>handle):handle_(handle){}
  
  // dtor
  ~Gen(){handle_.destroy();}

  // get current value
  value_t value()const{
    return handle_.promise().value();
  }
  // get next item
  optional<value_t>next(){
    if(handle_.done())return std::nullopt;
    handle_.resume();
    return handle_.done()?optional<value_t>{}:value();
  }
};

To start with, the Gen class uses a type called promise_type. In fact, the C++20 framework requires (actually no … but for now let’s pretend it does) the Gen class to contain a type called promise_type. We’ll get to that later, but for now, let’s just assume we have that type defined.

Before continuing, notice that the class Gen<int> is the class the application code deals with when it wants data from a sequence. The code will not deal further with the coroutine function (genint) once it has gotten hold of the Gen<int> object.

Let’s take note of a few things in the code:

  • Gen implements a method value() to retrieve the last value that the coroutine co_yielded
  • Gen implements a method next() for stepping to the next value in the sequence

Notice that these are our methods – they are not part of the framework – so we could have named them anything we wanted. In fact, we could even have supplied an iterator interface inside the Gen class. In a more thought-through design, we probably would have implemented an iterator interface inside the generator.

So far we have not touched much related to the C++20 coroutine framework. Now we will do just that:

  • Inside the value() method, we retrieve the yielded value from the promise object. When we implement the promise_type we will see how the value ends up there
  • Inside the next() method, we use something called a coroutine_handle to check if the coroutine is done. Then we use the same handle to resume the coroutine. Finally, we return the value – a non-null value if the coroutine is not done, and otherwise a null value.

From now on, I will call the object returned by a coroutine – in this case an instance of Gen – the return object. Some articles and blogs call it a future. However, I think the term future is misleading and clashes with the C++ type std::future typically used in multi-threaded programming.

The promise type

Finally we will look at what the promise_type is all about. I will give the implementation right here and I will explain it afterwards:

Before continuing, you might want to take a quick look at this diagram: What is inside the runtime coroutine object?

template<typename CRET>
class val_promise_type{
public:
  // we (not the C++20 coroutine framework) require 'return_object' to have a 'value_t' typedef
  using value_t=typename CRET::value_t;

  CRET get_return_object(){return CRET(coroutine_handle<val_promise_type>::from_promise(*this));}
  void return_void(){}
  std::suspend_always yield_value(value_t val){val_=val;return {};}
  std::suspend_always initial_suspend(){return {};}
  std::suspend_always final_suspend()noexcept{return {};}
  value_t value(){return val_;}
private:
  value_t val_;
};

The promise_type is a class used by and required by the C++20 coroutine framework. This is what the promise_type functions mean:

  • get_return_object: called by the framework when the return object from the coroutine is created and returned to the caller
  • return_void: called when we do a co_return from the coroutine without specifying a return value. In our example, we are not doing this just yet!
  • yield_value: called by the framework when co_yield is invoked in the coroutine function
  • initial_suspend: when the coroutine is first called, this method controls how the execution of the coroutine should proceed. In our promise_type we return an object of type suspend_always, which means that the coroutine body will not execute on the first call
  • final_suspend: similar to initial_suspend except this method is called right before the coroutine completes execution

There is a lot more to say about which methods are called when, under what circumstances they are called, what the implications are, as well as how the compiler finds the promise_type. For right now I won’t get into this since it would muddle the generator example.

The complete code

We now have all the parts: the promise_type, the return object and the coroutine function. Let’s put all of it inside a single code snippet:

#include <iostream>
#include <vector>
#include <optional>
#include <queue>
#include <coroutine>
using namespace std;

// --- (begin) promise object
// promise used when we need to store a single value
template<typename CRET>
class val_promise_type{
public:
  // we require 'return_object' to have a 'value_t' typedef
  using value_t=typename CRET::value_t;

  // these functions are required by the C++20 coroutine machinery
  CRET get_return_object(){return CRET(coroutine_handle<val_promise_type>::from_promise(*this));}
  void return_void(){}
  std::suspend_always yield_value(value_t val){val_=val;return {};}
  std::suspend_always initial_suspend(){return {};}
  std::suspend_always final_suspend()noexcept{return {};}
  
  // this is our own function (not required by the C++20 coroutine machinery)
  value_t value(){return val_;}
private:
  value_t val_;
};
// --- (end) promise object

// -- (begin) coroutine return object for generator
template<typename T>
struct Gen{
  // some typedefs
  using value_t=T;
  using promise_type=val_promise_type<Gen>;

  // keep handle so we can get to promise
  coroutine_handle<promise_type>handle_;

  // ctor
  // (called from 'promise_type::get_return_object()')
  Gen(coroutine_handle<promise_type>handle):handle_(handle){}
  
  // dtor
  ~Gen(){handle_.destroy();}

  // get current value
  value_t value()const{
    return handle_.promise().value();
  }
  // get next item
  optional<value_t>next(){
    if(handle_.done())return std::nullopt;
    handle_.resume();
    return handle_.done()?optional<value_t>{}:value();
  }

};
// -- (end) coroutine return object

// --- (begin) generators

// simple integer sequence
Gen<int>genint(int n){
  for(int i=0;i<n;++i)co_yield i;
}
// we’ll add a few more generators here later
…

// --- (end) generators

// run generator and print result
void genprint(auto&gen){
  while(auto val=gen.next()){
    cout<<val.value()<<endl;
  }
}
// test program
int main(){
  cout<<"----- int sequence"<<endl;
  auto gen=genint(5);
  genprint(gen);
}

Using gcc (GCC) 10.2.0 you can compile and link the code with the following command:

g++  -std=c++20 -fcoroutines  -fno-exceptions -o test1 test1.cc

What just happened?

Before continuing, you might want to take a quick look at this diagram: What is inside the runtime coroutine object?

Here are a few points that might help in understanding how the code works:

  • when the coroutine genint is called, the framework creates (among other things) a promise_type object and then calls get_return_object on the promise object
  • in get_return_object we construct a Gen<int> object which is then returned to the caller. Notice that we stuff a coroutine_handle object into the returned object
  • later we call next on the returned object which in turn calls resume on the coroutine_handle
  • the resume call tells the framework to resume execution of the coroutine where it last left off. In our case, if this is the first time it will enter the for-loop in the coroutine function
  • the coroutine function executes until it gets to the co_yield statement. This statement triggers the framework to execute the yield_value function in the promise object, passing the value we gave to co_yield as a parameter
  • in the yield_value function we stuff the value in the val_ attribute so it can later be retrieved
  • now we are back in the next function in the Gen object. There we check if the coroutine is done. If the coroutine is not done we retrieve the value we stuffed in the promise and return it to the caller
  • the caller (in our case genprint) prints the value to stdout

There are a number of rules for how the framework behaves depending on what we do and what we return from methods in the return object and methods in the promise object. But for now, we won’t delve into those rules until we have a better grasp on how the whole machinery works.

More generators

The bad thing is that we had to write a lot of filler code to get a simple generator to work. The good thing is that we can now write more generators without writing more filler code.

BFS graph traversal

We can now use the generator we wrote to traverse a graph in BFS order.

The graph is defined using an adjacency list representation – that is, if vertex 5 has edges pointing to vertices 7, 8 and 10, then g[5]={7, 8, 10}:

// simple BFS traversal of graph
using graph_t=vector<vector<int>>;
enum color_t{white=0,grey=1,black=2};

Gen<int>genbfs(graph_t const&g,int ustart){
  if(g.size()==0)co_return;
  vector<color_t>color(g.size(),white);
  queue<int>q;
  q.push(ustart);
  color[ustart]=grey;
  while(!q.empty()){
    int u=q.front();
    q.pop();
    co_yield(u);
    for(int v:g[u]){
      if(color[v]!=white)continue;
      q.push(v);
      color[v]=grey;
    }
    color[u]=black;
  }
}

An example of BFS traversal of a graph is the following code which can be copied and pasted into the main program:

cout<<"----- bfs graph"<<endl;
graph_t g{
    {1, 2},
    {4},
    {3},
    {4},
    {5},
    {}
  };
  auto bfsgen=genbfs(g,0);
  genprint(bfsgen);
  

Character reader + tokenizer

A more interesting case, where two coroutines cooperate, is one where a coroutine reads data from a source and feeds it to a tokenizer:

// character reader + tokenizer
Gen<char>genreadchar(string const&str){
  for(char c:str)co_yield c;
}
Gen<string>gentok(Gen<char>&creader){
  string currtok;
  while(creader.next()){
    char c=creader.value();
    if(c==' '){
      if(currtok.length()){
        co_yield currtok;
        currtok.clear();
      }
    }else{
      currtok.push_back(c);
    }
  }
  if(currtok.length())co_yield currtok;
}

The code to run this from the main program is then:

  cout<<"----- tokenizer graph"<<endl;
  string inpstring="Hello again where is the cheese";
  auto creader=genreadchar(inpstring);
  auto tokreader=gentok(creader);
  genprint(tokreader);

Permutations

We can write a final generator that creates all permutations of a string. We’ll use the Knuth algorithm (Knuth, Volume 4A, page 319) that generates permutations in lexicographically increasing order:

// generate permutations in lexicographical order
Gen<string>genperm(string s){
  int n=s.length();

  // sort in ascending order and return the result as the first permutation
  sort(begin(s),end(s));
  co_yield(s);

  if(n==1)co_return; 
  while(true){
    // find largest k such that s[k]<s[k+1]
    int k=n-2;
    for(;k>=0;--k){
      if(s[k]<s[k+1])break;
    }
    if(k<0)co_return;

    // find largest l>k such that s[k]<s[l]
    int l=n-1;
    for(;l>k;--l){
      if(s[k]<s[l])break;
    }
    swap(s[k],s[l]);
    reverse(begin(s)+k+1,end(s));

    // we now have the next permutation
    co_yield s;
  }
}

And the call from the main program is:

  // generate permutations of a string in lexicographical order
  string str="1234";
  auto permgen=genperm(str);
  genprint(permgen);

What is inside the runtime coroutine object?

I will start by drawing a picture that conceptually describes what is going on:

[diagram: the runtime coroutine object – the triplet of bookkeeping, frame and promise, together with the return object and the coroutine_handle]

An important part of the C++20 coroutine machinery is the awaitable/awaiter pair of interfaces. These interfaces are not part of this diagram. I excluded them since there are already enough moving parts in the diagram. I will describe them in the next blog entry.

An important observation is the triplet [bookkeeping, frame, promise]. The triplet is the bundle that represents a coroutine object. The bookkeeping part contains information about where to start when a coroutine is resumed. The frame contains temporary variables, variables in the coroutine and parameters to the coroutine. The promise is the part we have some control over. For example, the yield_value function is called when co_yield is invoked from the coroutine. In that function we have a hook to store the value so we later can retrieve it.

Using gcc10 in a Docker based environment

I wrote and tested the code within a Docker image to avoid polluting the machine with more versions of C++ compilers and C++ environments. However, if you already have a C++20 compliant compiler installed, you can safely skip this section. Below is a brief description of how my environment was configured.

Be aware that I'm not an expert in configuring Docker. However, my basic setup appears to work OK.

The setup contains two files:

  • Dockerfile
  • Makefile

The Dockerfile lays out the docker image and looks like this:

FROM gcc:latest

# setup init process
ENV TINI_VERSION v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "-g", "-p", "SIGTERM", "--"]

# install vim (does not come with gcc:latest)
RUN apt-get update && apt-get install -y vim

# create/configure new user
ARG arg_user
ARG arg_group
ARG arg_uid
ARG arg_gid

RUN echo $arg_gid
RUN echo $arg_group

RUN groupadd -g $arg_gid $arg_group
RUN useradd -u $arg_uid -g $arg_gid -ms /bin/bash $arg_user
WORKDIR /home/$arg_user

COPY vimrc /home/$arg_user/.vimrc
RUN chown $arg_user .vimrc
RUN chgrp $arg_group .vimrc

COPY bashrc /home/$arg_user/.bashrc
RUN chown $arg_user .bashrc
RUN chgrp $arg_group .bashrc

# run container as $user_arg
USER $arg_user

# start bash
CMD /bin/bash

As you can probably infer from the Dockerfile, the user within the Docker image will have the same user name and user id as the user building the image (not as the one running the container).

The Makefile is a quick hack that contains simple commands for building the Docker image as well as starting a container. Additionally it contains a few commands for removing the image, removing containers etc.:

IMAGE=hans-cpp20

all: usage

run:
        docker run -v `pwd`/src:/home/$${USER}/src --name $(IMAGE) --rm  -it $(IMAGE)

build:
        docker build --build-arg arg_user=$$(id -un) --build-arg arg_group=$$(id -gn) --build-arg arg_uid=$$(id -u) --build-arg arg_gid=$$(id -g)  -t $(IMAGE):latest .

rmimage:
        docker image ls | grep $(IMAGE) | awk '{print $$3}' | xargs -I{} docker image rm "{}"

lsimage:
        docker image ls | grep $(IMAGE) 

rmcont:
        docker container ls -a | grep $(IMAGE) | awk '{print $$1}' | xargs -I{} docker container rm "{}"

lscont: 
        docker container ls -a | grep $(IMAGE)

save:
        docker save -o $(IMAGE).tar $(IMAGE)

load:
        docker load -i $(IMAGE).tar

prune:
        docker image prune -f
        docker container prune -f

clean: rmcont rmimage
        rm -f junk.txt *.tar

usage:
        @echo "make all                 # print this message"
        @echo "make run                 # run docker image"
        @echo "make build               # build docker image"
        @echo "make rmimage             # remove image"
        @echo "make lsimage             # list image"
        @echo "make rmcont              # remove container(s)"
        @echo "make lscont              # list container(s)"
        @echo "make save                # save image to tar file"
        @echo "make load                # load image from a saved tar file"
        @echo "make prune               # prune stopped containers and dangling images"
        @echo "make clean               # remove container + image"

To build the Docker image run the following command:

make build

To run a container, execute the following command:

make run

To exit the container do:

^D

The setup maps the src directory in the current directory to the directory /home/<user>/src inside the Docker container. So anything you do to this directory, inside or outside the container, is reflected both inside and outside Docker.

Next: handling exceptions and scheduling coroutines for execution in a thread pool

In the first followup blog entry I will show how exceptions are managed within the framework of C++20 coroutines. The examples in this blog did not deal with exceptions. In fact, on the compile line we explicitly specified that exceptions would not be supported.

In the blog entry following exception handling I will show how to move coroutine objects into a thread pool for scheduled execution. The code will use awaiter objects to control how a coroutine is suspended when the co_await operator is executed.