Arbitrary binding pt2 – rolling your own.

This article will discuss how arbitrary binding expressions work. Since the template syntax and meta-programming system can be a sticking point for some students, a runtime version will be described first to discuss the fundamentals. This also paves the way for readers interested in implementing this in some language that lacks generics or templates. Following this version will be a simplified template version that will be limited to 3 arguments. Upon conclusion you should be better prepared to understand the source code for boost::bind or std::bind.

Basic Object System

Since the runtime version of arbitrary binding requires a protocol for introspection a bit beyond what the C++ language does for its classes, a basic object system needs to be used and everything that can be an argument in a binder must inherit from a single base class. What follows is a very elementary set of such objects that will be used for further discussion. This can be done in any language–I’ve implemented this in C in the past–but C++ is used here because it’s just plain easier.


// This is basically all that is needed.  Every object in the system
// must inherit from this base class to allow the parameter system to work
// and then RTTI will take care of the rest because it's a polymorphic class.
struct Object {
  virtual ~Object() {}
};

// A couple obvious example classes...
struct Integer : Object {
	Integer(int i) : value(i) {}
	Integer() : value(0) {}

	Integer& operator = (int i) { value = i; return *this; }
	int get() const { return value; }

private:
	int value;
};

struct Double : Object {
	Double(double d) : value(d) {}
	Double() : value(0.) {}

	Double& operator = (double d) { value = d; return *this; }
	double get() const { return value; }

private:
	double value;
};

This basically provides a dynamic type system for C++. If everything inherits from Object then you can pass around anything as this base type. It’s like void pointers with a touch more predictability. This will be important for our non-templated bind expression system because in order to run arbitrary functions we need to be able to pass arbitrary types of information.

Placeholders and argument lists

The bind system is actually a fairly simple set of constructs and operations that correctly dispatch a call to the requested function with the correct arguments. What we will have is two lists of arguments, a function to call, a lookup operation that will translate arguments into values based on the argument lists and the parameters passed in, and a set of invokers that can be used to translate the two argument lists into a function call and invoke that function. The first argument list is what is provided when you bind to a function and it can contain placeholders. These placeholders point to indices within the second set of arguments, which are the arguments the bind type was invoked with. It is a very simple algorithm, but wrapping your head around it can be difficult without examples and explanation.

The first part of the system then is the argument list. In our case a vector of pointers to Object is quite sufficient. To make the lookup operation for this type then is not all that difficult. We first come up with a test:

BOOST_AUTO_TEST_CASE(lookup) {
	std::vector<Object*> args{ new Integer(0), new Integer(1), new Integer(2) };

	BOOST_CHECK(select_arg(args, a2) == args[1]); // pointer to the second element

	Integer * i = new Integer(42);
	BOOST_CHECK(select_arg(args, i) == i); // pointer to the parameter we passed in
}

In this case what we are testing is that when we pass in a placeholder–the a2 variable–we want the item from the argument list but when we pass in something else, anything else, we want that very same thing back instead. A placeholder then needs to simply be an index wrapped up in a very specific type:

struct Placeholder : Object {
	Placeholder(size_t i) : index(i) {}

	size_t get() const { return index; }

private:
	size_t index;
};

We also need access to particular placeholders to make our lives easier and to provide a well understood language for expressing, “I want the nth argument to go here in the invocation of function ‘f’.” In this runtime system we simply declare a few in the header and then later initialize them as constant globals:

// binder.h:
extern Placeholder *const a1;
extern Placeholder *const a2;
extern Placeholder *const a3;

// binder.cpp:
Placeholder *const a1 = new Placeholder(0);
Placeholder *const a2 = new Placeholder(1);
Placeholder *const a3 = new Placeholder(2);

Now the lookup function is rather rudimentary:

Object* select_arg(std::vector<Object*> const& args, Object * key) {
	if (Placeholder * ph = dynamic_cast<Placeholder*>(key)) {
		return args[ph->get()];
	}
	return key;
}

This will later serve as the central point to the bind invocation algorithm set.

Building argument lists

In this system we want to be able to bind to a function of arbitrary parameter count that can then be invoked with as many arguments as the call site provides. This requires variadic functionality and in our runtime system that means C-style variadic functions, the kind with ‘…’ as an argument. This will be done in at least two places: binder creation and binder invocation. Thus it is prudent to make this a separate operation that can be reused. First, the test:

std::vector<Object*> call_build_args(Object * obj, ...) {
	va_list ap{};
	va_start(ap, obj);
	std::vector<Object*> args = build_args(obj, &ap);
	va_end(ap);
	return args;
}

BOOST_AUTO_TEST_CASE(build_args_test) {
	std::vector<Object*> args = call_build_args(new Integer(2), new Integer(3), new Integer(5), END_ARGS);

	BOOST_CHECK(args.size() == 3);
	BOOST_CHECK(static_cast<Integer*>(args[0])->get() == 2);
	BOOST_CHECK(static_cast<Integer*>(args[1])->get() == 3);
	BOOST_CHECK(static_cast<Integer*>(args[2])->get() == 5);

	for(Object * obj : args) {
		delete obj;
	}

	args = call_build_args(END_ARGS);
	BOOST_CHECK(args.size() == 0);
}

The “call_build_args” function simply makes it easier to call our utility function that will actually require a va_list as its second argument. There’s no way to just build a va_list outside the parameters of a variadic function. You’ll note above the use of another constant called “END_ARGS”. This can be implemented with a special type if desired, but I simply implemented it by making a placeholder I’d never, ever use:

// binder.h:
extern Placeholder *const END_ARGS;

//binder.cpp:
Placeholder *const END_ARGS = new Placeholder(-1);

The “build_args” function now is again quite simple:

std::vector<Object*> build_args(Object * obj, va_list * ap) {
	std::vector<Object*> args{};

	while (obj != END_ARGS) {
		args.push_back(obj);
		obj = va_arg(*ap, Object*);
	}
	return args;
}

All we are doing is iterating the va_list we receive until we reach the end, which we must have a way of finding out and has been shown with the END_ARGS value, and shove the arguments into a vector of Object pointers. Now we have a way to easily construct our argument lists from any variadic function.

Invokers

The next major piece of the puzzle is an operation that can take a function, two lists of parameters, convert the two lists into a list of actual arguments and call the target function with them. This operation is actually going to be done here with the use of a dispatch table based on the number of parameters the target function takes. We know how many the target function takes because it’s the same amount that was provided to the binder creation.

To begin with we need a way to hand an arbitrary function type around to the invokers. This requires some type casting and by C++ standard it is type casting of the worst kind: reinterpret_cast. Used correctly though this construct’s use of reinterpret_cast won’t pose a major problem.

I did not write a unit test for this aspect of the system because it’s quite hidden within and is rather trivial. So here it is in full:

// binder.h
typedef void (*FUNPTR)();

// binder.cpp
typedef Object* (*INVOKER_T)(FUNPTR, std::vector<Object*> const&, std::vector<Object*> const&);

namespace {
Object* invoker0(FUNPTR fun, std::vector<Object*> const&, std::vector<Object*> const&) {
	return reinterpret_cast<Object*(*)()>(fun)();
}
Object* invoker1(FUNPTR fun, std::vector<Object*> const& a, std::vector<Object*> const& b) {
	return reinterpret_cast<Object*(*)(Object*)>(fun)(select_arg(b, a[0]));
}
Object* invoker2(FUNPTR fun, std::vector<Object*> const& a, std::vector<Object*> const& b) {
	return reinterpret_cast<Object*(*)(Object*,Object*)>(fun)(select_arg(b, a[0]), select_arg(b, a[1]));
}
Object* invoker3(FUNPTR fun, std::vector<Object*> const& a, std::vector<Object*> const& b) {
	return reinterpret_cast<Object*(*)(Object*,Object*,Object*)>(fun)(select_arg(b, a[0]), select_arg(b, a[1]), select_arg(b, a[2]));
}
std::vector<INVOKER_T> invokers{invoker0, invoker1, invoker2, invoker3 };
}

Our system is thus limited to functions that can take 3 or less parameters, though we are not bound in any way by how many arguments we can provide when we invoke it…which you will see in the next section.

Note how each invoker that invokes a function that has arguments is asking to select the argument from the second list based on the element in the first at the relative position in the argument list. This is the important step that does all the work. If, for example, the list of arguments we created the bind with were “0, 2, 3″ then we want f called with those exact elements. However, if we say instead something like “a2, 2, 3″, then what we’re really creating is a binder that expects at least two arguments and that second argument will form the FIRST in the actual function call.

The last piece

The last piece of the puzzle is the actual binder object that can be created with a bind expression and then later invoked. The test first:

Object* add_two(Object * obj1, Object * obj2) {
	Integer *int1 = dynamic_cast<Integer*>(obj1);
	Integer *int2 = dynamic_cast<Integer*>(obj2);

	return new Integer(int1->get() + int2->get());
}

BOOST_AUTO_TEST_CASE(final_binder_check) {
	Integer i1{42};
	Integer i2{5};

	binder b1{reinterpret_cast<FUNPTR>(add_two), &i1, &i2, END_ARGS};

	Integer * result = static_cast<Integer*>(b1(END_ARGS));
	BOOST_CHECK(result->get() == 47);
	delete result;

	result = static_cast<Integer*>(b1(&i2, END_ARGS));
	BOOST_CHECK(result->get() == 47);
	delete result;

	binder b2(reinterpret_cast<FUNPTR>(add_two), &i1, a2, END_ARGS);
	result = static_cast<Integer*>(b2(0, &i1, END_ARGS));
	BOOST_CHECK(result->get() == 84);
	delete result;
}

Here we make sure that various combinations of placeholder vs. true argument are applied correctly. For example, in the last test for the b2 binder we are making sure that i1 is taken from the creation site, i1 again from the call site, and used together to call “add_two” to get the appropriate result of i1 + i1.

This then gives us the interface we must create:

struct binder : Object {
	binder(FUNPTR, Object * obj, ...);

	Object* operator()(Object * obj, ...) const;

private:
	FUNPTR fun;
	std::vector<Object*> args;
};

The first step then is to create the first set of arguments from the creation site and store them for later:

binder::binder(FUNPTR f, Object * obj, ...) : fun(f), args() {
	va_list ap{};
	va_start(ap, obj);
	args = build_args(obj, &ap);
	va_end(ap);
}

We then simply use our dispatch table to find the invoker that calls functions with the correct set of arguments (based on what was supplied to us at the creation site) and call it:

Object * binder::operator() (Object * obj, ...) const {
	va_list ap{};
	va_start(ap, obj);
	std::vector<Object*> args_b = build_args(obj, &ap);
	Object * rval = invokers[args.size()](fun, args, args_b);
	va_end(ap);
	return rval;
}

And that is that. We are done!

Conclusion

So as you can see, the algorithm for arbitrary binders is actually quite basic. Take two lists of arguments, one given at the bind creation site and the second at the point of invokation; use the first list to decide which elements in the second to use and in what order, and then invoke the bound function with a thus combined set of arguments. Any time you want to create a bind type of expression this is how it is done. In future installment(s) I will show how to take this exact same algorithm and apply it with templates to a type-safe bind expression that is much faster, and also to template meta-programming to recreate the boost MPL lambda construct. You will find that these concepts apply directly in all three cases though the application is of course quite different.

I have left many bugs in this system. The most obvious of course is that this thing doesn’t take any care about memory leaks at all. If I ran my unit tests in valgrind I’d be yelled at a lot. You will need to account for this somehow but the directions you can go here are so numerous and which you use completely beside the point of this exercise that I just left it out both for my own sanity and to keep things here quite simple. When I wrote arbitrary binding expressions in C I used an Objective-C like system of autorelease pools in all objects so the memory issues are fairly easy to deal with. There are other ways however and assuming they’re consistent you should have no problem at all applying this concept to your own object system to get good results.

About these ads

6 Responses to “Arbitrary binding pt2 – rolling your own.”

  1. Base class Object = you have failed to understand the purpose of C++. You lose! You get nothing! Good day, sir!

    • Pardon me, but what’s it like being that stupid? No offense intended, I’m genuinely curious.

      • Well paid, respected expert in my field; so pretty good, as such things go. And I haven’t used an Object base class in C++ since … oh, 1995 or so.

    • For anyone that might be wondering what’s going on here…

      Having an “Object” base class in a class library for C++ is pretty silly. I made it pretty darn clear in the beginning of my article why I was doing it and it was to do with instructional purposes (to get rid of templates for the time being) and because recreating an object system in C wasn’t something I wanted to do just for an article (I did start that way but got tired of it).

      I figured most people would be smart enough to distinguish between an illustrative example and a recommendation, especially if I explained in the beginning. Although there seems to be at least one exception to that, I still think it a reasonable assumption. If you’re seriously bothered by the fact that I did this in C++ though and didn’t use C or Objective-C or something well…that’s just going to have to remain your problem.

      This article is not meant for people who can just read boost source code right off the bat and see how it works, but for those who can be helped by smaller steps on the way. Perhaps my approach is flawed, but to complain about the Object base just because this is C++ is…to be quite honest completely idiotic.

      This ass-clown has also now been banned from commenting on this blog. I’m quite open to legitimate discussion and instruction from peers but I’ve had it up to a foot over my head with all this damn stupid. Keep it civil, or at least intelligent…or GTFO.

  2. [...] Sometimes, just because I can. « Arbitrary binding pt2 – rolling your own. [...]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 26 other followers

%d bloggers like this: