Wednesday, April 10, 2013

Elegance in C++ parameter passing



You have disengaged from the planetary brain and no longer serve a useful purpose

-- The Outer Limits, last episode of 3rd season


There is no doubt that C++ metaprogramming is here to stay. Dismissing it as too complex or too convoluted will, over time, leave any C++ programmer's skills outdated and obsolete.

With the C++11 standard at our fingertips, many metaprogramming functions are readily available. I believe that the set of standard metafunctions in the C++11 library, together with the drift of C++ applications towards metaprogramming techniques, will lead to a host of new, clever and possibly insidious idioms.

In this post I'll show a metaprogramming hack I picked up on the web that is simple yet elegant. I did not invent this idiom, and unfortunately I cannot remember where I got it from, so regrettably I can't give credit to anyone.

A problem that is easily solved with metaprogramming is to transparently adjust the parameter type of a function. Why would anyone want to do that, you may ask? The reason is simple: integral types we want to pass by value, whereas more complex types we want to pass by const reference, unless of course the function is meant to modify them. Passing a small value directly lets the compiler hand it over in a register (on common calling conventions such as x86-64), whereas a reference is passed as an address, so the caller must keep the object in memory and the callee must load it through that address. Skipping that indirection can be noticeably faster, and metaprogramming techniques will help us get it automatically.

By using a few metafunctions from the C++ standard library it is surprisingly easy to manipulate types so that we automatically pass them the right way. The following example illustrates how it can be done:


Here it goes!

#include <iostream>
#include <string>        // std::string is used in main()
#include <type_traits>
using namespace std;

// type converter for passing parameters
template<typename T>
struct param{
  typedef typename conditional<is_integral<T>::value,
    T,                                                     // (1)
    typename add_lvalue_reference<const T>::type>::type    // (2)
    type;                                                  // (3)
};

// a sample function for testing the type converter  (4)
template<typename T>
void foo(typename param<T>::type t){
  cerr<<boolalpha<<"parameter type is reference: "<<is_reference<decltype(t)>::value<<endl;
}

int main(){  // (5)
  int  i=5;
  string s="Hello";

  foo<int>(i);
  foo<string>(s);
}

The output is:

parameter type is reference: false
parameter type is reference: true

The param metafunction is written in the standard way: the result is retrieved through the nested type member (3). Branch (1) is selected when T is an integral type; otherwise branch (2) is selected, which builds a const l-value reference to the template parameter.

That's really all there is to it. A small sample program (5) using the test function (4) shows that the type generator works correctly.
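The mapping described above can also be checked at compile time instead of by printing. The following is a minimal sketch of that idea; the alias param_t is a hypothetical shorthand of mine and not part of the listing above.

```cpp
#include <string>
#include <type_traits>

// Same trait as in the listing above, written with explicit std:: qualifiers.
template<typename T>
struct param{
  typedef typename std::conditional<std::is_integral<T>::value,
    T,
    typename std::add_lvalue_reference<const T>::type>::type
    type;
};

// Hypothetical convenience alias (an assumption, not in the original code).
template<typename T>
using param_t=typename param<T>::type;

// Integral types stay as they are, i.e. they are passed by value.
static_assert(std::is_same<param_t<int>,int>::value,
              "int should be passed by value");
static_assert(std::is_same<param_t<unsigned long long>,unsigned long long>::value,
              "unsigned long long should be passed by value");

// Everything else becomes a const l-value reference.
static_assert(std::is_same<param_t<std::string>,const std::string&>::value,
              "std::string should be passed by const reference");

// Note: double is not an integral type, so this trait passes it by reference too.
static_assert(std::is_same<param_t<double>,const double&>::value,
              "double is not integral, so it is passed by const reference");
```

If any of the assertions were wrong, the file would simply fail to compile. One more thing worth noticing: because T only appears inside param<T>::type, the compiler cannot deduce it from the argument, which is why the calls in main() have to spell out foo<int> and foo<string>.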

Now, in terms of performance, does it really make a difference? To test whether passing an integral type by value really is faster, I need a version of foo that the gcc optimizer cannot inline, which I get by prepending a gcc-specific attribute. The test harness is simple and somewhat artificial, but I want some indication of the kind of performance improvement I can expect.

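First I'll mark the foo() function so that the compiler does not inline it. It also has to call a function in a different translation unit, bar(), or gcc may still optimize the calls to foo() away, since calls to a function without side effects can be removed even when the function is not inlined: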

template<typename T>
__attribute__ ((noinline)) void foo(typename param<T>::type t){
  bar();
}

The function bar() is defined in a different translation unit:

void bar(){}


The test program is a simple main() that runs two tests. One calls foo_optimized(), which is the non-inlined foo() above under a more descriptive name and makes use of the param type conversion metafunction. The other calls foo_normal(), which always takes its parameter by const reference and is defined as:

template<typename T>
__attribute__ ((noinline)) void foo_normal(T const&t){
  bar();
}

Here is the test program, using the C++11 std::chrono library to measure time:

#include <chrono>   // needed in addition to the headers of the first listing

int main(){
  using Clock=chrono::high_resolution_clock;
  using TP=Clock::time_point;
  using DURATION=Clock::duration;
  using PARAM=unsigned long long;

  Clock clock;
  const size_t niter=1000000000;

  TP tpStart1{clock.now()};
  for(PARAM i=0;i<niter;++i)foo_normal<PARAM>(i);
  auto duration1=chrono::duration_cast<DURATION>(clock.now()-tpStart1);
  auto t1=chrono::duration_cast<chrono::duration<double,std::milli>>(duration1).count();

  TP tpStart2{clock.now()};
  for(PARAM i=0;i<niter;++i)foo_optimized<PARAM>(i);
  auto duration2=chrono::duration_cast<DURATION>(clock.now()-tpStart2);
  auto t2=chrono::duration_cast<chrono::duration<double,std::milli>>(duration2).count();

  cerr<<"milli sec time passing integer by reference: "<<t1<<endl;
  cerr<<"milli sec time passing integer by value: "<<t2<<endl;
  cerr<<"%time decrease: "<<100*(t1-t2)/t1<<"%"<<endl;
}

As you can see, most of the code is just time measurements. Normally this would be managed by some StopWatch class, but here I prefer to make it explicit.
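For reference, here is a minimal sketch of what such a StopWatch class could look like. The class name and its members are hypothetical, and I use std::chrono::steady_clock here rather than the high_resolution_clock of the test program, since a steady clock is guaranteed never to run backwards.

```cpp
#include <chrono>

// Hypothetical helper, not part of the original test program.
class StopWatch{
public:
  using Clock=std::chrono::steady_clock;   // assumption: monotonic clock preferred

  // Start timing at construction.
  StopWatch():start_(Clock::now()){}

  // Restart the measurement from now.
  void reset(){start_=Clock::now();}

  // Milliseconds elapsed since construction or the last reset().
  double elapsed_ms()const{
    return std::chrono::duration<double,std::milli>(Clock::now()-start_).count();
  }

private:
  Clock::time_point start_;
};
```

With a helper like this, each measurement in main() would shrink to constructing a StopWatch before the loop and reading elapsed_ms() after it.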

The printout when executed on a virtual Linux box and compiled with the -O3 flag is:

milli sec time passing integer by reference: 4984.19
milli sec time passing integer by value: 3388.31
%time decrease: 32.0189%

Now, a 32% decrease in execution time is not too bad I would say!
