Arbitrary binding pt3 – type safe version

In this installment I’m going to explain how to take the concepts we used in the last article and apply them, with a lot of template magic, to create a bind expression system that works with the static type system rather than RTTI. This version requires much more familiarity with templates and a lot more repetition, as you’ll soon see. I am going to use constructs from boost that match the constructs we used before, leveraging other people’s work to implement a binder expression system. In the next article I will show how boost’s bind system implements these things in a custom manner, as well as some additional things that library does and why. The implementation I’m about to show is not meant to replace boost::bind or std::bind, but it may help you understand them.

Addendum to pt2 – composition

First though I would like to create an addendum to the last article. In the last article we did not make it possible to compose bind expressions. This is a very important omission that I will now remedy. The test:

BOOST_AUTO_TEST_CASE(composition) {

	Integer i1{42};
	Integer i2{5};

	FUNPTR fun = reinterpret_cast<FUNPTR>(add_two);

	binder bound(fun, a1, new binder(fun, a1, a2, END_ARGS), END_ARGS);

	BOOST_CHECK(static_cast<Integer*>(bound(&i1,&i2,END_ARGS))->get() == 89);
}

Now all that is needed is an additional check in the lookup function followed by a simple, added function to the binder class:

Object* select_arg(std::vector<Object*> const& args, Object * key) {
	if (Placeholder * ph = dynamic_cast<Placeholder*>(key)) {
		return args[ph->get()];
	}
	if (binder * bound = dynamic_cast<binder*>(key))
	{
		return bound->eval(args);
	}
	return key;
}

Object * binder::eval(std::vector<Object*> const& args_b) const {
	return invokers[args.size()](fun, args, args_b);
}

As you can see, no big deal but it greatly improves the utility of the system.

Also, I was mistaken when I said that the standard for return type deduction was to have a result_of typedef in your functor. The actual typename is result_type just as the STL uses in unary and binary functions.
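To make the result_type convention concrete, here is a minimal sketch. The names adder and apply2 are made up for this illustration and are not part of the bind system; the point is only that a generic helper can ask the functor for its return type instead of naming it.

```cpp
#include <cassert>

// A functor that advertises its return type through a nested
// result_type typedef, the same convention the STL function
// object bases use.
struct adder
{
	typedef int result_type;

	result_type operator()(int i, int j) const { return i + j; }
};

// Hypothetical helper: it never spells out the return type itself,
// it reads it from F::result_type.
template < typename F, typename A1, typename A2 >
typename F::result_type apply2(F f, A1 a1, A2 a2)
{
	return f(a1, a2);
}
```

Calling apply2(adder(), 40, 2) goes through the functor and returns an int without the caller ever stating that type.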

Review

If you’ll remember from the last article, the bind system requires a few basic parts:

  1. Two lists of arguments, the first may contain placeholders or bind expressions.
  2. A translation function that will take an argument and either return it, evaluate it if it’s a bind, or return the item in the list if it’s a placeholder.
  3. A set of invokers that use the above to generate a final list of arguments and invoke the function
  4. An object to track the initial bind and provide a way to invoke it.

In the last article I used std::vector to implement argument lists; in this case I have chosen boost::fusion::vector to do the same. This structure guarantees constant time lookup of its elements, just as std::vector does, but does so through the type system rather than as an allocated block of memory. Each element in this structure has its static type attached to it and you access these values through templated functions, passing in the index to the item you want. I could have used boost::tuple, which would be more in line with what the now current standard has (std::tuple) but this construct is a cons list that requires linear lookup rather than constant. This translates not into runtime complexity but into compile time complexity, making your builds take longer. One frustrating part of working in C++, especially when you start working with templates and more so TMP, is that it takes a while to compile.
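As a rough illustration of what "each element has its static type attached" means, here is a sketch that uses std::tuple as a stand-in for boost::fusion::vector, purely so it needs nothing beyond a C++11 standard library. The names arg_list and third_of are hypothetical; with Fusion the access function would be boost::fusion::at_c<N> rather than std::get<N>.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>

// Stand-in for boost::fusion::vector<int, char, std::string>
// (an assumption made to keep the sketch dependency free).
typedef std::tuple<int, char, std::string> arg_list;

// The type stored at each index is known to the compiler...
static_assert(std::is_same<std::tuple_element<0, arg_list>::type, int>::value,
              "element 0 is an int");
static_assert(std::is_same<std::tuple_element<2, arg_list>::type, std::string>::value,
              "element 2 is a std::string");

// ...and access goes through a templated function that takes the
// index as a template argument, not as a runtime value.
inline std::string third_of(arg_list const& args)
{
	return std::get<2>(args);
}
```

Because the index is a template argument, asking for an element that does not exist is a compile error rather than a runtime one.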

Lookup and placeholders

Previously we needed a placeholder object that inherited from the base object in order to allow RTTI to decide what to do during lookup. This time we simply need to wrap an integral into a type. We could use something like boost::mpl::int_ but it’s more complex than we need (not by much) and it’s nice to have our own name to do pattern matching against. Thus we just make a template and stick some instances into an anonymous namespace:

template < int I >
struct placeholder {};

namespace {
placeholder<0> a1;
placeholder<1> a2;
placeholder<2> a3;
}
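Because the index lives in the type, an overload can recover it without any runtime check. The following is a minimal sketch of that idea; index_of is a hypothetical helper written for this illustration and is not used by the bind system itself.

```cpp
#include <cassert>

template < int I >
struct placeholder {};

// General case: the argument is not a placeholder.
template < typename T >
int index_of(T const&) { return -1; }

// More specialized case: the compiler deduces I from the type
// placeholder<I> and prefers this overload when it matches.
template < int I >
int index_of(placeholder<I> const&) { return I; }
```

index_of(placeholder<2>()) picks the second overload and yields 2, while index_of(13) falls back to the general case.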

When doing generic programming in C++, and most especially when doing TMP, you should be thinking in the functional paradigm more than any other. In functional programming there is the concept of “pattern matching”, which lets you choose what to do based on the type of the object passed into your function. In C++ this is done through template specialization, partial specialization and function overloading. What we want to do here is the same as before: if we get a placeholder as our index, we return the element of the argument list at that index. Anything else we simply return as-is (I’ll go over composition at the end). With that in mind, here’s the test code for our lookup function:

BOOST_AUTO_TEST_CASE(lookup_check)
{
	boost::fusion::vector<int, char, std::string> args(42, 'c', "Hello");

	BOOST_CHECK(lookup(args, 13) == 13);
	BOOST_CHECK(lookup(args, a3) == "Hello");
}

Next, our two overloads of lookup. The first covers the general case and the second specializes for placeholder types:

template < typename ARGS
         , typename IDX >
IDX lookup(ARGS&, IDX idx) { return idx; }

template < typename ARGS
         , int IDX >
typename boost::fusion::result_of::at
<
  ARGS
, boost::mpl::int_<IDX>
>::type lookup(ARGS & args, placeholder<IDX>)
{
	return boost::fusion::at_c<IDX>(args);
}

Note how the second version uses a template metafunction, or a type trait if you prefer (though it’s actually much more complicated than a simple trait usually is), to discover what the return type should be. Recall that we’re now using a type safe construct that leverages the template system rather than dynamic type information to do our thing. We can’t just specify any old return type; we need to ask the fusion sequence (a vector in our case) what the type at the given index will be. The result_of::at metafunction used here expects an MPL integral constant rather than a raw int, so we wrap the index in boost::mpl::int_. (Fusion also provides result_of::at_c, which takes the integer directly.)

If I were assuming that we could use C++11 constructs then this function could have been turned into something more simple, something like this:

template < typename ARGS
         , int IDX >
auto lookup(ARGS & args, placeholder<IDX>) -> decltype(boost::fusion::at_c<IDX>(args))
{
	return boost::fusion::at_c<IDX>(args);
}

Clearly this is simpler and easier to understand if you are familiar with the new constructs, but I want this code to be usable from C++03, so I wrote it with that in mind. Furthermore, the boost version works in C++03, and I figure it will be easier to explain how it does its thing if we focus on constructs that work in the old C++. This does show, though, that we have a lot to look forward to when everyone is coding in C++11 rather than C++03 and earlier.
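For comparison, the same return type question can be asked of a standard tuple. This is only a sketch under the assumption that std::tuple replaces boost::fusion::vector and that a C++11 library is available: std::tuple_element plays the role that boost::fusion::result_of::at plays above, and std::get replaces at_c.

```cpp
#include <cassert>
#include <string>
#include <tuple>

template < int I >
struct placeholder {};

// General case: anything that is not a placeholder is returned as-is.
template < typename ARGS
         , typename IDX >
IDX lookup(ARGS &, IDX idx) { return idx; }

// Placeholder case: ask the tuple type what lives at index IDX and
// return a reference to that element.
template < typename ARGS
         , int IDX >
typename std::tuple_element<IDX, ARGS>::type &
lookup(ARGS & args, placeholder<IDX>)
{
	return std::get<IDX>(args);
}
```

The overload set behaves like the Fusion one: a plain value passes straight through, and a placeholder selects from the argument list.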

The Invoker

The invoker in this case doesn’t work a whole lot differently than in the previous version. We’re still going to use the count of arguments in the first list to decide which invoker to use. With all the templating stuff going on though, I found it easier to approach this as a distinct feature that needs unit testing. Here’s the test:

int fun(int i, int j)
{
	return i + j;
}

BOOST_AUTO_TEST_CASE(invoke_check)
{
	boost::fusion::vector<int, placeholder<1> > args1(42, a2);
	boost::fusion::vector<double, int> args2(10.3, 8);

	BOOST_CHECK(invoke<int>(fun, args1, args2) == 50);
}

The code that implements it uses a templated struct parameterized by an integral representing the size of the argument list. The invoke function then performs this calculation and uses a static call function within that template. Again, we’re going to limit ourselves to three arguments for the sake of sanity:

template < int I >
struct invoker;

template <>
struct invoker<0>
{
	template < typename R
	         , typename F
	         , typename L1
	         , typename L2 >
	static R call(F f, L1 & l1, L2 & l2) { return f(); }
};

template <>
struct invoker<1>
{
	template < typename R
	         , typename F
	         , typename L1
	         , typename L2 >
	static R call(F f, L1 & l1, L2 & l2)
	{
		return f(lookup(l2, boost::fusion::at_c<0>(l1)));
	}
};

template <>
struct invoker<2>
{
	template < typename R
	         , typename F
	         , typename L1
	         , typename L2 >
	static R call(F f, L1 & l1, L2 & l2)
	{
		return f( lookup(l2, boost::fusion::at_c<0>(l1))
		        , lookup(l2, boost::fusion::at_c<1>(l1)) );
	}
};

template <>
struct invoker<3>
{
	template < typename R
	         , typename F
	         , typename L1
	         , typename L2 >
	static R call(F f, L1 & l1, L2 & l2)
	{
		return f( lookup(l2, boost::fusion::at_c<0>(l1))
		        , lookup(l2, boost::fusion::at_c<1>(l1))
		        , lookup(l2, boost::fusion::at_c<2>(l1)) );
	}
};

template < typename R
         , typename F
         , typename L1
         , typename L2 >
R invoke(F f, L1 & l1, L2 & l2)
{
	return invoker
		   <
		     boost::fusion::result_of::size<L1>::type::value
		   >::template call<R>(f, l1, l2);
}

Although the syntax is a lot less intuitive, if you compare this version to the version in the last article you will see that they essentially do exactly the same thing except this one does it through types and pattern matching while the other used dynamic dispatch and a simple jump table.
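To see the size based dispatch in isolation, here is a cut-down sketch that assumes std::tuple in place of boost::fusion::vector and handles only the two argument case. The function is named invoke_bound rather than invoke only to stay clear of std::invoke in newer standard libraries, and add is a hypothetical test function; neither name comes from the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

template < int I > struct placeholder {};

template < typename ARGS, typename IDX >
IDX lookup(ARGS &, IDX idx) { return idx; }

template < typename ARGS, int IDX >
typename std::tuple_element<IDX, ARGS>::type &
lookup(ARGS & args, placeholder<IDX>) { return std::get<IDX>(args); }

// Primary template is only declared; each supported size of the
// bound argument list gets its own specialization.
template < std::size_t N > struct invoker;

template <>
struct invoker<2>
{
	template < typename R, typename F, typename L1, typename L2 >
	static R call(F f, L1 & l1, L2 & l2)
	{
		// Translate each bound argument against the call-time list.
		return f( lookup(l2, std::get<0>(l1))
		        , lookup(l2, std::get<1>(l1)) );
	}
};

// The size of the bound list selects the invoker at compile time.
template < typename R, typename F, typename L1, typename L2 >
R invoke_bound(F f, L1 & l1, L2 & l2)
{
	return invoker< std::tuple_size<L1>::value >::template call<R>(f, l1, l2);
}

inline int add(int i, int j) { return i + j; }
```

With a bound list of (42, placeholder<1>) and a call-time list of (10.3, 8), the placeholder picks out the 8 and the 10.3 is simply ignored.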

Putting it together

While last time I talked about building the bound argument list before I talked about the invoked one, this time I have to reverse that order. The type of the binder object is going to be governed by the things you are attempting to bind. First I’ll show how such an object is constructed by hand and how it is invoked; then I will show how to use helper functions to deduce the binder template parameters and build the thing. With the helper functions you can create bind expressions without having to spell out their types unless you want a variable of that type (in C++11 you can use auto; in C++03 I recommend using boost::function even though it introduces indirection).

First things first, the test for creating and invoking a binder:

BOOST_AUTO_TEST_CASE(binder_t_check)
{
	boost::fusion::vector<int, placeholder<1> > args1(42, a2);
	binder_t<int, int(*)(int,int), boost::fusion::vector<int, placeholder<1> > > bound(fun, args1);

	BOOST_CHECK(bound(10.3, 8) == 50);
}

The binder itself is not difficult to create, just repetitive. First of all, we simply store the function (or functor) and the bound arguments as member variables:

template < typename R, typename F, typename L1 >
struct binder_t
{
	...

	binder_t(F f, L1 l1) : f(f), l1(l1) {}

private:
	F f;
	L1 l1;
};

The next part we need is the functor operator and we need as many of these as we allow arguments, from zero to some maximum. Again, I chose three here for my own sanity (boost allows 10):

template < typename R, typename F, typename L1 >
struct binder_t
{
	R operator() ()
	{
		// no parentheses: "boost::fusion::vector<> l2();" would declare a function
		boost::fusion::vector<> l2;
		return invoke<R>(f, l1, l2);
	}
	template < typename A1 >
	R operator()(A1 a1)
	{
		boost::fusion::vector<A1> l2(a1);
		return invoke<R>(f, l1, l2);
	}
	template < typename A1
	         , typename A2 >
	R operator()(A1 a1, A2 a2)
	{
		boost::fusion::vector<A1,A2> l2(a1,a2);
		return invoke<R>(f, l1, l2);
	}
	template < typename A1
	         , typename A2
	         , typename A3 >
	R operator()(A1 a1, A2 a2, A3 a3)
	{
		boost::fusion::vector<A1,A2,A3> l2(a1,a2,a3);
		return invoke<R>(f, l1, l2);
	}
	...
};

In each case here we simply construct the argument list and use it to call invoke. This is similar to the variadic function iteration we did in the last article except that iteration is not necessary (the overload figures it out) and it’s all done by different functions. You could use the boost preprocessor library to reduce some of this repetition (at least from the programmer’s perspective) but this makes the code harder to understand and neither I nor boost used this technique. I leave it as an exercise for the reader (or you can see how I apply it in a possible future article about self returning function wrappers). This is far from the last time you’ll see repetition being required in this system. Although it makes calling code less repetitious and simpler, the templated bind system contains several header files filled with many repeating constructs (and in boost each one is repeated 10 times).

The next thing we need to do is build one of these binder things. We could leave it like it is, and it works OK, but what a hassle! A helper function is in order. We start with a unit test that checks functionality with functors (defined to have a result_type typedef), functions, and member functions. The boost system actually adds a few more options here, but rather than write all of that myself I will describe how they did it in the next article. Here is that test:

struct functor
{
	typedef int result_type;

	result_type operator()(int i, int j)
	{
		return i + j;
	}
};

struct something
{
	int add_to_42(int i) { return 42 + i; }
};

BOOST_AUTO_TEST_CASE(helper_check)
{
	BOOST_CHECK(bind(fun, 42, a2)(10.3, 8) == 50);
	BOOST_CHECK(bind(functor(), 42, a2)(10.3, 8) == 50);
	BOOST_CHECK(bind(&something::add_to_42, something(), a2)(10.3, 8) == 50);
}

Here two more test constructs were added, functor and something, and fun from above is reused. This isn’t a particularly great unit test since it fails to address a great many permutations of the interface, instead using the same values and number of arguments, but it’s good enough for this exercise (if this were used in a real product we should have better tests).

The functor version of this bind function is quite easy but we still need four definitions for our rather limited interface:

template < typename F >
binder_t< typename F::result_type
        , F
        , boost::fusion::vector<> >
bind(F f)
{
	typedef typename F::result_type R;
	typedef boost::fusion::vector<> L1;
	return binder_t<R, F, L1>(f, L1());
}

template < typename F, typename A1 >
binder_t< typename F::result_type
        , F
        , boost::fusion::vector<A1> >
bind(F f, A1 a1)
{
	typedef typename F::result_type R;
	typedef boost::fusion::vector<A1> L1;
	return binder_t<R, F, L1>(f, L1(a1));
}

template < typename F, typename A1, typename A2 >
binder_t< typename F::result_type
        , F
        , boost::fusion::vector<A1,A2> >
bind(F f,A1 a1, A2 a2)
{
	typedef typename F::result_type R;
	typedef boost::fusion::vector<A1,A2> L1;
	return binder_t<R, F, L1>(f, L1(a1,a2));
}

template < typename F, typename A1, typename A2, typename A3 >
binder_t< typename F::result_type
        , F
        , boost::fusion::vector<A1,A2,A3> >
bind(F f, A1 a1, A2 a2, A3 a3)
{
	typedef typename F::result_type R;
	typedef boost::fusion::vector<A1,A2,A3> L1;
	return binder_t<R, F, L1>(f, L1(a1,a2,a3));
}

Again, we’re basically implementing a statically variant variadic function through function overloads.

For the function and member function versions I’m just going to show the two argument variation (the one we tested). The interesting thing here is that we need to use almost twice as many template parameters because the function argument types may very well not match the argument types passed into our helper (especially if a placeholder is being used). The function version requires no extra objects:

template < typename R
         , typename P1, typename A1
         , typename P2, typename A2 >
binder_t< R
        , R(*)(P1,P2)
        , boost::fusion::vector<A1,A2> >
bind(R(*f)(P1,P2), A1 a1, A2 a2)
{
	typedef R(*F)(P1,P2);
	typedef boost::fusion::vector<A1,A2> L1;
	return binder_t<R,F,L1>(f,L1(a1,a2));
}
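The reason for the doubled parameter list is template argument deduction: if the helper were declared with P1 and P2 reused for the bound values, passing a placeholder (or any other non-matching type) would give the compiler two conflicting deductions for the same parameter. The sketch below shows the separated form on its own; call_through and add_ints are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// R, P1 and P2 are deduced from the function pointer; A1 and A2 are
// deduced from the values actually passed. Keeping the two sets
// separate lets them differ without a deduction conflict.
template < typename R
         , typename P1, typename A1
         , typename P2, typename A2 >
R call_through(R(*f)(P1,P2), A1 a1, A2 a2)
{
	// A1 -> P1 and A2 -> P2 are ordinary implicit conversions here.
	return f(a1, a2);
}

inline int add_ints(int i, int j) { return i + j; }
```

Passing a short where the function expects an int deduces A2 as short and P2 as int, and the call still goes through.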

The member function version, on the other hand, does need an extra object. Here we want to convert member function pointer call syntax from (var.*ptr)() to mem_fn(var). We do this with a simple wrapper class:

template < typename R, typename T >
struct mf0
{
	R operator()(T t)
	{
		return (t.*f)();
	}

	mf0(R(T::*f)()) : f(f) {}

private:
	R(T::*f)();
};
template < typename R, typename T, typename A1 >
struct mf1
{
	R operator()(T t, A1 a1) { return (t.*f)(a1); }

	mf1(R(T::*f)(A1)) : f(f) {}

private:
	R(T::*f)(A1);
};
template < typename R, typename T, typename A1, typename A2 >
struct mf2
{
	R operator()(T t, A1 a1, A2 a2) { return (t.*f)(a1,a2); }

	mf2(R(T::*f)(A1,A2)) : f(f) {}

private:
	R(T::*f)(A1,A2);
};

Once again, much repetition. The boost library also adds a “dm” type that creates this same call syntax for member variable pointers. I have chosen not to replicate that here. Once we have this construct we can build the many versions of “bind” that will create the binder object (again only the two argument version shown):

template < typename R
         , typename P1, typename A1
         , typename P2, typename A2 >
binder_t< R
        , mf1<R,P1,P2>
        , boost::fusion::vector<A1,A2> >
bind(R(P1::*f)(P2),A1 a1, A2 a2)
{
	typedef mf1<R,P1,P2> F;
	typedef boost::fusion::vector<A1,A2> L1;
	return binder_t<R, F, L1>(F(f), L1(a1,a2));
}

Note that the “mf” we use is mf1, not mf2. That is because that number represents how many arguments the member function takes. We actually call that object with one extra argument, the first argument: this.
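Here is the wrapper used on its own, outside of bind, to show the change in call syntax. The make_mf helper is a hypothetical addition for this sketch (it just deduces R, T and A1 from the pointer) and is not part of the code above.

```cpp
#include <cassert>

struct something
{
	int add_to_42(int i) { return 42 + i; }
};

// Same shape as mf1 above: stores a pointer to a one-argument member
// function and exposes it as wrapper(object, argument).
template < typename R, typename T, typename A1 >
struct mf1
{
	R operator()(T t, A1 a1) { return (t.*f)(a1); }

	mf1(R(T::*f)(A1)) : f(f) {}

private:
	R(T::*f)(A1);
};

// Hypothetical convenience function: deduces the template arguments
// so the caller does not have to spell them out.
template < typename R, typename T, typename A1 >
mf1<R,T,A1> make_mf(R(T::*f)(A1)) { return mf1<R,T,A1>(f); }
```

The wrapper takes two arguments even though the member function takes one: the object comes first, followed by the member function's own parameter.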

Composition

We again want to be able to compose bind expressions so that many functions combine into one, each taking certain parameters from the invocation parameters and using other, bound values along with them. We do so in pretty much exactly the same way as we did before. We first pattern match on the binder template type in place of IDX or placeholder and call an eval function within that binder with the arguments being passed to lookup:

// we need to forward declare binder_t if it appears later in the file, and it does in my case:
template < typename R, typename F, typename L > struct binder_t;

template < typename ARGS
         , typename R
         , typename F
         , typename L1 >
R lookup(ARGS & args, binder_t<R,F,L1> & binder)
{
	return binder.eval(args);
}

template < typename R, typename F, typename L1 >
struct binder_t
{
	...

	// evaluate this binder against the argument list of the enclosing bind
	template < typename L2 >
	R eval(L2 & l2)
	{
		return invoke<R>(f, l1, l2);
	}
};
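Putting the pieces together, the following is a compact end-to-end sketch of composition. It is written under several assumptions that differ from the code above: std::tuple replaces boost::fusion::vector, only the two argument case is supported, the invoker is folded into eval, and the helper is called make_binder rather than bind to stay clear of std::bind. It is meant to show the shape of the technique, not to mirror the full implementation.

```cpp
#include <cassert>
#include <tuple>

template < int I > struct placeholder {};

template < typename R, typename F, typename L1 > struct binder_t;

// Plain values are returned unchanged.
template < typename ARGS, typename IDX >
IDX lookup(ARGS &, IDX idx) { return idx; }

// Placeholders select from the call-time argument list.
template < typename ARGS, int IDX >
typename std::tuple_element<IDX, ARGS>::type &
lookup(ARGS & args, placeholder<IDX>) { return std::get<IDX>(args); }

// Nested binders are evaluated against the same call-time list.
template < typename ARGS, typename R, typename F, typename L1 >
R lookup(ARGS & args, binder_t<R,F,L1> & inner) { return inner.eval(args); }

template < typename R, typename F, typename L1 >
struct binder_t
{
	binder_t(F f, L1 l1) : f(f), l1(l1) {}

	template < typename A1, typename A2 >
	R operator()(A1 a1, A2 a2)
	{
		std::tuple<A1,A2> l2(a1, a2);
		return eval(l2);
	}

	// Only two bound arguments are handled in this sketch.
	template < typename L2 >
	R eval(L2 & l2)
	{
		return f( lookup(l2, std::get<0>(l1))
		        , lookup(l2, std::get<1>(l1)) );
	}

private:
	F f;
	L1 l1;
};

// Hypothetical helper for plain two-argument functions.
template < typename R, typename P1, typename A1, typename P2, typename A2 >
binder_t< R, R(*)(P1,P2), std::tuple<A1,A2> >
make_binder(R(*f)(P1,P2), A1 a1, A2 a2)
{
	typedef std::tuple<A1,A2> L1;
	return binder_t<R, R(*)(P1,P2), L1>(f, L1(a1, a2));
}

inline int add_two(int i, int j) { return i + j; }
```

Binding add_two to (a1, make_binder(add_two, a1, a2)) and calling the result with (42, 5) computes add_two(42, add_two(42, 5)), which mirrors the composition test from the addendum at the top of this article.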

Conclusion

Now you should have a starting point to be able to reference the bind expressions in both boost and the C++11 standard library. I did it a very different way, but the same constructs are there all the same. You should now have some idea how monotonous developing template based, variadic functions can be without the ability built into the language. It used to be that any time you wanted to write a variadic interface that was templated you had to do it like this. Now, in C++11, we have variadic templates. In a future installment I may show how you would do all of this with variadic templates. Even though this language feature now exists, however, it may not be prudent to use it in all cases where you’d do something like this. Variadic templates, as defined in C++11, offer no built-in way to index into a parameter pack, so element access is typically implemented recursively. This means you cannot directly replicate the constant time lookup of things like boost::fusion::vector. You could, however, implement many of the helper objects with variadic templates and build the vector argument list with them. Any time you do this, though, your compile times are going to be slower for the most part, so it’s a cost/benefit analysis that you’ll need to perform.

One very important aspect of this version you should pay special attention to is that at no point is there any virtual function being called, no pointer indirection (unless the F parameter is one) and absolutely no RTTI such as dynamic_cast. This means that although there’s a good 200 lines of code or so here, the compiler can and often will get rid of all of that and just call the function directly! All of the above can be inlined. If you use functor objects instead of function pointers, I’ve actually seen compilers generate VERY tight, small, inlined code where no function calls exist. Thus C++ can be a very good language for environments where speed and/or memory footprint are tight, such as embedded platforms (contrary to what many C fans will say :P). Although my last article focused on how binders can be made easily with dynamic dispatch and thus could be implemented in languages like C or Objective-C, and “missed the point of C++” as one silly commenter mentioned, this article extends the exact same concepts but leverages the full power of the C++ typing and template system.

Until next time, where I will discuss how the boost authors implemented the above and the additions they made, have fun.
