Tutorial on tag dispatching

Posted in Uncategorized on December 15, 2014 by Crazy Eddie

Update: Stephan T. Lavavej (aka STL) provided some feedback: “Overloading on true_type/false_type is still tag dispatching – indeed, it is the simplest, most common form.” I thought I had good reasons for distinguishing situations in which the type is pre-tagged by a trait or typedef as being “tag dispatching” from times when you need to calculate the tag, or when the tag isn’t really documenting a concept but is just an on/off kind of thing. I now think those reasons were flawed: any time you tag a type and then use that tag as an overload parameter, you are tag dispatching. There is a difference between these methods obviously, but not one that distinguishes a separate technique. I’ve left the language in this article alone though.

There have been some confused notions passed around recently that led me to think there needs to be more information about probably one of the simplest, most powerful metaprogramming techniques that exist in C++: tag dispatching. The reason tag dispatching is powerful is that it leverages the language and the compiler to do work for you so that you don’t have to. It’s a technique whereby you use overload resolution rules to decide between otherwise ambiguous functions.

Concepts and tags

The first part to understand with regard to tag dispatching is the idea of “concepts”. Concepts are not yet a formal part of the C++ language, but they’re integral to discussing many of the components of the standard library–especially those parts that came from the STL.

There’s a good description of concepts over in the boost documentation. There they define a concept as:

A concept is a set of requirements consisting of valid expressions, associated types, invariants, and complexity guarantees. A type that satisfies the requirements is said to model the concept. A concept can extend the requirements of another concept, which is called refinement.

As that link discusses, there are different aspects to a concept. There are the legal expressions available for use on the types that implement the concept. This is the kind of thing that can be analyzed with introspection techniques–in C++ these often center around SFINAE. In this part of the concept we might ask, given a value ‘x’ of type ‘T’ that implements a concept, whether ‘++x’ is a legal expression.
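Such a check can be sketched with the classic expression-SFINAE probe (the names here are mine, purely for illustration):

```cpp
#include <type_traits>
#include <utility>

// Classic SFINAE probe: the first overload participates only when ++x is a
// valid expression for a value x of type T; otherwise it is discarded and
// the variadic fallback is chosen.
template <typename T>
auto has_preincrement_impl(int)
    -> decltype(++std::declval<T&>(), std::true_type{});

template <typename T>
std::false_type has_preincrement_impl(...);

template <typename T>
using has_preincrement = decltype(has_preincrement_impl<T>(0));

struct no_increment {};  // a type with no operator++

static_assert(has_preincrement<int>::value, "int supports ++x");
static_assert(!has_preincrement<no_increment>::value, "no_increment does not");
```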

Other parts of a concept though can’t really be analyzed with C++ metaprogramming. The invariants for example can’t really be asserted at the static level. They can and should be asserted during debug builds, but most of the time that’s the only place where one could get the necessary information to check invariants. Complexity guarantees also can’t be statically analyzed and will be mostly impractical to assert in the product–you’re basically stuck testing if you even have that.

In C++ we address all this using tagging. We tag a type with something that can be accessed at compile time that tells us what concept the type is implementing. This unfortunately means we depend on the author of the type to tag it correctly, but there’s quite literally nothing we can do about that. We might be charged with not writing robust code, but we’re no more beholden to the developer here than when we assume someone won’t subclass an interface and disobey its contract. There are languages such as Rust that try to solve this problem, but C++ does not, so there’s really no solution to it here.

In C++ we tag types with other types. These tag types are just empty classes that will eventually be compiled out of the program. They may inherit from other tags, which will be explained in the next section, but they’ll not be polymorphic so they’ll not have any associated RTTI information. We pass them around only as values, and in ways that never affect the observable behavior of the program–this allows the compiler to remove them from the final object code.

InputIterator is an example of a concept. The standard library provides the tag for this concept by defining input_iterator_tag. We gain access to this tag through iterator_traits, which contains an internal typedef called iterator_category. The standard also defines a default version of the iterator_traits template that tries to access the iterator_category within its parameter. This means that we can tag our custom iterators either by specializing this template or by just tagging the iterator internally like so:

struct custom_iterator
{
  using iterator_category = std::input_iterator_tag;
  // much more stuff...
};

The reason an external traits template is required is that pointers are also iterators, and we can’t tag them in this way. Thus the standard defines a couple of partial specializations for pointers. Here’s the non-const version:

template < typename T >
struct iterator_traits<T*>
{
  using iterator_category = std::random_access_iterator_tag;
  // ... other useful stuff...
};

Even when this is not necessary though, it can be beneficial to provide a metafunction or traits template to retrieve the tag.
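When you do provide one it can be as simple as an alias over iterator_traits (the iterator_category_t name here is my own shorthand, not standard):

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// A convenience alias so call sites never spell out iterator_traits in full.
template <typename Iterator>
using iterator_category_t =
    typename std::iterator_traits<Iterator>::iterator_category;

// Pointers get their tag from the partial specialization shown above.
static_assert(std::is_same<iterator_category_t<int*>,
                           std::random_access_iterator_tag>::value,
              "pointers are tagged random access");
static_assert(std::is_same<iterator_category_t<std::list<int>::iterator>,
                           std::bidirectional_iterator_tag>::value,
              "list iterators are bidirectional");
```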

Concept refinement

Concepts can refine other concepts similar to how classes can inherit other classes. When a concept refines another concept it means that types implementing the refining concept also implement the refined concept. So for example concept ‘A’ might be refined by concept ‘B’. Say ‘A’ has the requirement that ‘x.foo()’ be a legal expression that results in a type convertible to bool. Concept ‘B’ can require additional expressions and can put further requirements on the result of ‘x.foo()’, saying it requires type bool rather than something convertible to bool for example–since bool is convertible to bool this is fine.

ForwardIterator is an example of a concept that refines another. It refines InputIterator and the main thing that it adds is the ability to iterate over the same range multiple times. This part of the refinement is an essential difference between an InputIterator and a ForwardIterator but it changes nothing about the static interface of the types that implement either. Using introspection alone you’d not be able to decide if an iterator has this ability or not–only tags solve this. Another thing it does is override the ‘*i++’ expression to return a reference. This obeys InputIterator’s version because a reference to ‘value_type’ is convertible to ‘value_type’. This is actually a rather unfortunate and short-sighted part of the standard library because it means filter and transform iterators don’t fit very well into the existing tag structure.

In C++ we represent the refinement relationship using inheritance in our tags. Thus in the standard library the tag for ForwardIterator is the aptly named forward_iterator_tag and it inherits from input_iterator_tag. This allows us to write tag-dispatched functions that specialize at the refinement level we want, catching all below. The advance function makes a good example to explain this because if ‘n’ is some value greater than 1 we want it to just jump directly to that new place given a random access iterator (an iterator for which ‘i += n’ is a valid expression).

Tag dispatching

A tag dispatched function is a function that catches types at a generic level, retrieves a tag labeling the type as implementing the different target concepts, and then uses that tag in an argument for implementation functions. Here’s a version of advance to explain:

// This version actually works more like 'next' in the standard library.
namespace detail_ {

template < typename ForwardIterator >
ForwardIterator advance(ForwardIterator it, int n, std::forward_iterator_tag)
{
    assert(n > -1 && "Can only move a forward iterator forward!");
    while (n--) ++it;
    return it;
}

template < typename RandomAccessIterator >
RandomAccessIterator advance(RandomAccessIterator it, int n, std::random_access_iterator_tag)
{
    it += n;
    return it;
}

}

template < typename Iterator >
Iterator advance(Iterator it, int n)
{
    using tag = typename std::iterator_traits<Iterator>::iterator_category;
    // We're being silly here and disallowing input iterators; we should also make a more readable message.
    static_assert(std::is_base_of<std::forward_iterator_tag, tag>::value, "Can only advance ForwardIterators and their refinements.");

    return detail_::advance(it, n, tag{});
}

It’s actually debatable whether we want to have that static_assert in the entry function–I purposefully made the implementation terrible to raise this concern. We might want to allow people to provide an overload for that base iterator concept. In general though you don’t want to do this–you’re writing your dispatch functions to be as complete as they’ll ever be and you don’t want to try being too generic. YAGNI. Without the static_assert you’ll get what is known far and wide as “template vomit”, which is where C++ gets its reputation for very long error messages. It probably wouldn’t be all that deep here though.

The way this code works depends on tags that have direct correlation to the concepts being implemented. RandomAccessIterator is a refinement of BidirectionalIterator, which is in turn a refinement of ForwardIterator. Furthermore we may soon have ContiguousIterator, which will be a refinement of RandomAccessIterator. The tags directly correlate to these concepts and their refinement relations through inheritance:

struct input_iterator_tag {};
struct forward_iterator_tag : input_iterator_tag {};
struct bidirectional_iterator_tag : forward_iterator_tag {};
struct random_access_iterator_tag : bidirectional_iterator_tag {};

// in C++ 17 maybe
struct contiguous_iterator_tag : random_access_iterator_tag {};

// Orthogonal concept that doesn't refine any of the others and isn't refined by them
struct output_iterator_tag {};

C++ has a rule regarding function overloading that requires the compiler to choose the closest match. This means that when we pass a value of type bidirectional_iterator_tag, which due to inheritance is convertible to forward_iterator_tag, the compiler will pick the ForwardIterator version of our overload set. However, if instead random_access_iterator_tag or any subclass is passed in, the RandomAccessIterator version will be chosen.
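That rule can be seen at work in a tiny self-contained sketch (the `which` helper is mine, purely for illustration): each overload names a tag, and the compiler picks the overload whose tag is the most-derived base of the argument’s tag.

```cpp
#include <iterator>
#include <string>

// Three overloads distinguished only by the tag parameter.
inline std::string which(std::input_iterator_tag)         { return "input"; }
inline std::string which(std::forward_iterator_tag)       { return "forward"; }
inline std::string which(std::random_access_iterator_tag) { return "random access"; }
```

A bidirectional tag converts to both forward_iterator_tag and input_iterator_tag, but the conversion to the more-derived base (forward) is the better match, so the forward overload wins.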

We accept the tag by value because we don’t care about slicing and to give the compiler a better opportunity to optimize that parameter out of the program.

It’s important that these relationships are consistent, otherwise people that use them may run into problems. The purpose of these tags is to filter input types to our functions down to the interface that function depends on. We break that if our tags don’t correlate directly with the concepts we’re claiming to implement.

Getting it wrong

In his technique comparison article, Mr. Fultz compares different methods to resolve generic overloads. Unfortunately the entire article is colored by his desired outcome, the last method. Especially problematic is the tag dispatching version because it confounds tags, which should be representing concepts, with a particular overload ordering. Here is his set of tags:

struct base_tag {};
struct sequence_tag : base_tag {};
struct range_tag : sequence_tag {};

These tags are meant to represent a static if logic rather than concepts, yet they still correlate with concepts. The range_tag is meant to represent dynamic sequences like std::vector, sequence_tag is for static sequences like std::tuple, and base_tag is the default, which assumes “streamable”. The goal, explained by the author elsewhere, is to implement logic like the following:

template < typename T >
void print(T const& t)
{
  if (is_range<T>) range_print(t);
  else if (is_sequence<T>) sequence_print(t);
  else stream_print(t);
}

This rather misses the point of tag dispatching though, which is to leverage the compiler to decide this stuff for you. The problems here can be seen if you consider something he claims about his tag deciding function: “Now even though this logic is little complicated it only needs to be written once.” One would assume he means we could use this logic for dispatching other functions, but if I used it for that I might write something like so:

namespace detail_ {

template < typename T >
void foo(T const& t, base_tag);

template < typename T >
void foo(T const& t, sequence_tag);

}

template < typename T >
void foo(T const& t)
{
  using tag = typename get_print_tag<T>::type;
  detail_::foo(t, tag{});
}

It doesn’t really matter how the detail parts are written: if I pass a std::vector to foo then bad things are going to happen. I should be able to depend on it behaving correctly but I can’t. Granted, it’s going to error out at compile time given these particular concepts, but it’s not exactly going to be pretty. To avoid the template vomit phenomenon I’d have to write something like this:

template < typename T >
void foo(T const& t, sequence_tag)
{
    static_assert(is_sequence<T>::value, "Passing off a non-sequence as a sequence you silly sod!");
}

This shouldn’t be necessary.

Fixing it

So let’s see if we can fix the issues. The first thing we should do is make sure the tags correspond to concepts correctly. We have two mutually exclusive concepts and a third that’s simply unrelated to the other two:

  1. Sequence: a static sequence
  2. Range: a dynamic sequence
  3. Streamable: something we can use the stream operator on

The first two are mutually incompatible: there’s no such thing as a static, dynamic sequence–a container is one or the other but not both. The third is entirely unrelated to either of the others. A type that is a Sequence or a Range may or may not also be Streamable. Our corrected tags are then:

struct streamable_tag {};
struct range_tag {};
struct sequence_tag {};

At this point we can see that the problem doesn’t really decompose into something that tag dispatching can get a purchase on. There’s nothing at all for the compiler to decide. Underneath these different kinds of concept there is certainly room for tag dispatching. Ranges for example are one of a variety of different kinds within a hierarchy of concepts. Sequences also come in different kinds like ForwardSequence or BidirectionalSequence. All of these types will have different tags associated with them that can be dispatched upon, but they’ll not be attached to the same traits metafunction or typedefs as in the orthogonal hierarchy.

We can still do something very much like tag dispatching though by creating the static if/else logic that Mr. Fultz was after:

// Because technically we're just defaulting here, so we shouldn't tag the type as streamable
// if we don't know it is.
struct try_streaming_tag {};

template < typename T >
struct select_print_tag
  : mpl::eval_if
    <
      is_range<T>
    , mpl::identity<range_tag>
    , mpl::if_
      <
        is_sequence<T>
      , sequence_tag
      , try_streaming_tag
      >
    >
{};

Note that eval_if is used for the outer if because we don’t want the inner branch evaluated if the first check is true. As far as whether we test range first or sequence first, it doesn’t matter–since they’re orthogonal and unrelated, no call would change either way. What does matter though is the last branch, because we want to use the range or sequence overloads when possible and only default to the stream. Since ranges and sequences can themselves be streamable there’s an ambiguity that must be resolved, and we’re picking the non-stream overloads first. This is also why the whole issue can’t be quickly solved by simple enable_if checks (sequence vs. range could be, but the streaming bit throws a wrench in).
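For comparison, roughly the same selection can be written with nothing but std::conditional (the toy traits below are mine, standing in for the real is_range/is_sequence). Because all three tags are cheap, complete types, the eager nested conditional is harmless here; eval_if’s laziness earns its keep when instantiating the unchosen branch would be expensive or ill-formed.

```cpp
#include <type_traits>

struct range_tag {};
struct sequence_tag {};
struct try_streaming_tag {};

// Toy stand-ins for the real traits, just to exercise the selection.
template <typename T> struct is_range    : std::false_type {};
template <typename T> struct is_sequence : std::false_type {};

struct my_range {};
template <> struct is_range<my_range> : std::true_type {};

struct my_tuple {};
template <> struct is_sequence<my_tuple> : std::true_type {};

// Nested conditional: range wins, then sequence, then the streaming default.
template <typename T>
using select_print_tag_t = typename std::conditional<
    is_range<T>::value,
    range_tag,
    typename std::conditional<is_sequence<T>::value,
                              sequence_tag,
                              try_streaming_tag>::type>::type;
```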

This is admittedly pretty verbose, but we can now use select_print_tag and the tags without screwing up:


template < typename T >
void print(T const& t, try_streaming_tag) { std::cout << t; }

template < typename T >
void print(T const& t, range_tag) { for (auto const& v : t) print(v); }

template < typename T >
void print(T const& t, sequence_tag)
{
  fusion::for_each(t, [](auto const& v) { print(v); });
}

template < typename T >
void print(T const& t)
{
  using tag = typename select_print_tag<T>::type;
  print(t, tag{});
}

No artificial, arbitrary, and ill-conceived inheritance hierarchy for the tags is necessary here. The only reason for tags to inherit from each other is if they are reflecting concepts that refine each other. This allows the base-class resolution to trigger the use of overloads for the base-concept. We don’t need or even desire that here.

It’s hard to justify using tags like this. We’re not really benefiting from them at all. We still have to teach the compiler how to select the tag, so we’re not re-using the compiler’s logic. It does seem more legible though than either the “function overloading” or “conditional overloading” examples from the original blog. The former because that version has to repeat the logic in the select metafunction above for each function (with some sharing between them, which IMO just makes it harder to understand), and the latter because it’s a little bit macro heavy. The selection logic is also more focused this way, but that’s not really usual for tag dispatching (where the selection logic isn’t even in code but in the compiler).

Another method that is arguably better would be to use std::true_type and std::false_type as tags and just have is_xxx parameters–almost tag dispatching but not quite. It would work like so:


template < typename T, typename Ignore >
void print(T const& t, std::true_type, std::false_type, Ignore)
{
  range_print(t);
}

template < typename T, typename Ignore >
void print(T const& t, std::false_type, std::true_type, Ignore)
{
  seq_print(t);
}

template < typename T >
void print(T const& t, std::false_type, std::false_type, std::true_type)
{
  std::cout << t;
}

// Tick traits don't appear to return the standard bool types so we need some translation
template < typename T >
struct test_ : mpl::if_<T, std::true_type, std::false_type> {};

template < typename T >
using test = typename test_<T>::type;

template < typename T >
void print(T const& t)
{
  print(t, test<is_range<T>>{}, test<is_sequence<T>>{}, test<is_streamable<T>>{});
}

There are some key differences here. First, if there ever were a Sequence that was also a Range we’d get a compile failure because there’s no overload to accept it. This is probably desirable because it would point the conflict out rather than silently defaulting one way or the other. Second, the if/else logic required by both my former example and all of Mr. Fultz’s examples is gone. This version truly puts the effort on the compiler: we just give it the patterns and where to get the constraints…and we didn’t need tags at all. Finally, non-streamable types that don’t fit the other two concepts don’t wait until they’re inside the stream overload to abort compilation–there’s no overload for that situation either.
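The shape of this dispatch is easy to exercise in a self-contained sketch–the toy traits below stand in for the Tick ones (all names are mine). Note that standard traits already derive from std::true_type/std::false_type, so the `test` translation shim isn’t even needed here:

```cpp
#include <string>
#include <type_traits>

// Toy traits standing in for is_range / is_sequence / is_streamable.
struct my_range {};
struct my_tuple {};

template <typename T> using is_range_t      = std::is_same<T, my_range>;
template <typename T> using is_sequence_t   = std::is_same<T, my_tuple>;
template <typename T> using is_streamable_t = std::is_arithmetic<T>;

// The pattern overloads: the compiler matches true_type/false_type directly.
template <typename T, typename Ignore>
std::string kind(T const&, std::true_type, std::false_type, Ignore)
{ return "range"; }

template <typename T, typename Ignore>
std::string kind(T const&, std::false_type, std::true_type, Ignore)
{ return "sequence"; }

template <typename T>
std::string kind(T const&, std::false_type, std::false_type, std::true_type)
{ return "stream"; }

// Entry point: hand the compiler the three yes/no answers and let it pick.
template <typename T>
std::string kind(T const& t)
{
    return kind(t,
                typename is_range_t<T>::type{},
                typename is_sequence_t<T>::type{},
                typename is_streamable_t<T>::type{});
}
```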

Just overload it when you can!

There are a couple of branches from the original version that I didn’t address.

The first is the variant version. This one is actually not necessary and does nothing interesting; variant already knows how to print itself because it overloads the stream operator.

The second is for std::string. This one is important because unless we do something we’ll use the range version of the print function. The way I’ve written it we’d get the same output, but it’s going to be the slow way, and it’s completely logical that we’d eventually put commas between range items–something we’d not want to happen to strings. The way to deal with this though is blazingly simple and requires nearly zero special consideration:

void print(std::string const& str) { std::cout << str; }
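A minimal sketch (names mine) of why the plain overload wins: a non-template function is preferred over a function template when both match equally well.

```cpp
#include <string>

// The template catches everything generically...
template <typename T>
std::string print_kind(T const&) { return "generic"; }

// ...but the plain overload beats it on an exact std::string match.
std::string print_kind(std::string const&) { return "string"; }
```

One subtlety worth knowing: a raw literal like `print_kind("hi")` still picks the template, because `char const[3]` matches the template exactly while the std::string overload needs a conversion.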

Concluding remarks

There are a lot of reasons to prefer tag dispatching and the similar construct I explained here over heavy use of SFINAE and/or deep macros. Neither of those tools should be discarded, but both introduce complexities that should be avoided when possible. There’s a lot that can be done with tags and tag-like types to simplify code. It’s important though that when doing so you don’t try to outsmart the compiler, which can lead you to making poor decisions. Function overloading, with or without tags, can be a very powerful tool.

Compile time strings with constexpr

Posted in C++, cpp on October 17, 2014 by Crazy Eddie

In Using strings in C++ template metaprograms, Abel Sinkovics and Dave Abrahams describe a set of macros to create strings for metaprogramming in C++. Since C++11 though there is another way, and with C++14 it’s made easier. This article describes that method.

I’m using C++14 features here–namely, not needing the trailing decltype on auto functions. My compiler seems to have some issues with auto (it’s not the latest version) so I’m not using it everywhere I could, but where I do I’m not doing the whole -> decltype(expression) thing. It cleans up the code a LOT and is a lot simpler. You should be able to do this in C++11 though by adding those bits back where necessary.

With this technique you get a string type that can be used in much easier ways for many of the same purposes. It can be used at both runtime and compile time, and at least in the cases I’ve tried and with the compilers I’ve used, this version works with much longer strings. For example, in previous attempts to implement fizzbuzz as a metaprogram I was able to achieve only very low numbers before reaching my compiler’s limit on template parameters. In the sample code for this article I achieved all 100 entries.

Prerequisite utilities: index generation

For many of the functions in this construct we need a variadic template pack of all legal index values in an array. On the stackoverflow C++ Lounge wiki a technique is described that allows this. I altered it slightly:


namespace detail_ {

template < int ... Nums >
struct indices
{
    template < int I >
    static constexpr auto append(boost::mpl::int_<I>)
    {
        return indices<Nums...,I>();
    }
};

constexpr indices<> make_indices(boost::mpl::int_<0>)
{
    return indices<>();
}

template < int I >
constexpr auto make_indices(boost::mpl::int_<I>)
{
    using prev = boost::mpl::int_<I-1>;
    return make_indices(prev()).append(prev());
}

} // namespace detail_, which is how the later code refers to these helpers

You can use these constructs to create a type that carries all valid indices for an array, retrieve that list of indices as a variadic template pack through a pack expression, and then use the pack to initialize another array.
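To see the trick end to end, here is the same machinery rebuilt on std::integral_constant (standing in for boost::mpl::int_, so this sketch has no Boost dependency), plus a small consumer of my own that expands the pack:

```cpp
#include <type_traits>

template < int ... Nums >
struct indices
{
    template < int I >
    static constexpr auto append(std::integral_constant<int, I>)
    {
        return indices<Nums..., I>();
    }
};

// Base case: zero indices.
constexpr indices<> make_indices(std::integral_constant<int, 0>)
{
    return indices<>();
}

// Recursive case: build indices<0, 1, ..., I-1>.
template < int I >
constexpr auto make_indices(std::integral_constant<int, I>)
{
    using prev = std::integral_constant<int, I - 1>;
    return make_indices(prev()).append(prev());
}

// A small consumer: expand the pack to read every element of an array.
template < int N, int ... Is >
constexpr int sum_impl(int const (&arr)[N], indices<Is...>)
{
    int values[] = { arr[Is]... };  // the pack expansion at work
    int total = 0;
    for (int v : values) total += v;
    return total;
}

constexpr int demo_sum()
{
    int const arr[] = { 1, 2, 3 };
    return sum_impl(arr, make_indices(std::integral_constant<int, 3>{}));
}

static_assert(demo_sum() == 6, "pack expansion visited every index");
```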

Basic operations

Before moving on to the more interesting stuff I will briefly describe the fundamental definitions to get started with:

template < int Size >
struct string
{
    using data_type = char const (&) [Size+1];

    constexpr int size() const
    {
        return Size;
    }

    constexpr data_type c_str() const
    {
        return arr;
    }

    // ...
private:
    char arr[Size+1];
};

The most interesting thing here is the definition of data_type. For a great many uses we could simply return char const*, but in many cases it helps a lot to get the raw array type out of that function; it will decay into a pointer when necessary. The rest of this stuff is pretty self-explanatory: we define constexpr functions to retrieve the size and the actual string, and a place to store it.

The array needs to be Size+1 so that it can store the null terminator and thereby behave like a C string. This could be omitted if you never wish to use it as a C string, but in many ways it’s easier to enable that form of access.

Construction

Building this thing is the first tricky part we run into. We want to be able to construct it from a C-style string. We also want to be able to construct a zero-size string without parameters.

template < int Size >
struct string
{
    // ...

    constexpr string(char const (&str) [Size+1])
        : string(str, detail_::make_indices(boost::mpl::int_<Size>()))
    {}

    constexpr string() : arr{} {}

private:
    char arr[Size+1];

    template < int I, int ... Indices >
    constexpr string(char const (&str) [I], detail_::indices<Indices...>)
        : arr{str[Indices]..., '\0'}
    {}
};

The public converting constructor is where interesting things start. Normally you’d take a C-style string not as an array but as a pointer to char. Here though we want the size of the array, and in C++ quoted string literals are actually static arrays; they have a size we can fetch through the type system. To fetch that size we simply make it a template parameter. The rest is the funky syntax for taking an array by reference.

In its initializer we use the new delegating constructor syntax to defer to a private constructor that uses the indexing trick, calling make_indices to produce the second argument to that constructor.

The private constructor is the other side of the indexing trick. Its template parameter list says that we expect a pack of integers, and its second parameter accepts the type in which that pack expands, thus giving us the pack through template deduction. Finally, the member initializer finishes the indexing trick and uses the pack to initialize our internal array.

The default constructor should probably only be available when Size==0 but I’m being lazy here. You could use enable_if to fix this or create a specialization for string<0>.
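The enable_if fix might be sketched like so on a stand-in class (names mine): giving the constructor its own template parameter defaulted from the class’s lets SFINAE remove it from overload resolution for nonzero sizes.

```cpp
#include <type_traits>

template <int Size>
struct sized
{
    // S defaults to Size; the enable_if parameter vanishes the constructor
    // from overload resolution whenever S != 0.
    template <int S = Size,
              typename std::enable_if<S == 0, int>::type = 0>
    constexpr sized() {}

    constexpr sized(int) {}  // stand-in for the "real" constructors
};

static_assert(std::is_default_constructible<sized<0>>::value,
              "zero-size may be default constructed");
static_assert(!std::is_default_constructible<sized<1>>::value,
              "nonzero sizes may not");
```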

The final thing here though is that it would be nice to have a helper function so we don’t need to count characters and specify the size. It’s pretty darn easy:

template < int Size >
constexpr auto make_string(char const (&cstr) [Size])
{
    return string<Size-1>(cstr);
}

Already we have a lot of power. We can define constexpr strings quite easily and we can demonstrably use them in places that require constant expressions:

BOOST_AUTO_TEST_CASE(from_cstr)
{
    constexpr auto s0 = make_string("hello");
    BOOST_CHECK_EQUAL(s0.c_str(), "hello");

    using c = boost::mpl::char_<s0.c_str()[2]>;
    using s = boost::mpl::int_<s0.size()>;

    BOOST_CHECK_EQUAL(c::value, 'l');
    BOOST_CHECK_EQUAL(s::value, 5);
}

Concatenation

I’m going to skip over the definitions of push_front and push_back and instead just talk about append. It’s the more thorough example, and the other two are fairly easy to understand once you can follow append.

Like the constructor we need to use the indexing trick:

template < int Size >
struct string
{
    // ...
    template < int I >
    constexpr string<Size+I-1> append(char const (&cstr)[I]) const
    {
        return append( detail_::make_indices(boost::mpl::int_<Size>())
                     , detail_::make_indices(boost::mpl::int_<I-1>())
                     , cstr );
    }
    // ...
private:
    char arr[Size+1];

    template < int ... ThisIndices
             , int ... ThatIndices
             , int I >
    constexpr string<Size+I-1> append( detail_::indices<ThisIndices...>
                                     , detail_::indices<ThatIndices...>
                                     , char const (&cstr) [I] ) const
    {
        char const newArr[] = {arr[ThisIndices]..., cstr[ThatIndices]..., '\0'};
        return string<Size+I-1>(newArr);
    }
};

It’s kind of funky looking but there’s no reason to pass on arr because we already have it. So we pass on the indices to reference both arrays and a reference to the array we want to append. We can’t just create a class variable or type to hold pre-created indices because the pack part is essential to the indexing trick and that is only available if we make it part of the function signature.

The interesting part here is in the private overload, where the new C-string is created by merging the two inputs. We use the indexing trick twice here–we can have two packs because each resolves inside clear boundaries–and array initialization does the rest, just as with the constructor.

There’s an alternative to the newArr approach. It requires a constructor that takes individual chars, and it must be a variadic template. Because of that requirement I decided I didn’t much care for that constructor and did it this way instead. You can see the difference in the github change history for string.hpp.

That’s basically the whole bit. It’s nice to have operators though:

template < int I0, int I1>
constexpr string<I0+I1> operator + (string<I0> const& s0, string<I1> const& s1)
{
    return s0.append(s1.c_str());
}

And now we have compile-time string processing! Check out the unit test:

BOOST_AUTO_TEST_CASE(operators)
{
    constexpr auto s0 = '[' + make_string("hello") + ' ' + make_string("world") + ']';

    BOOST_CHECK_EQUAL(s0.c_str(), "[hello world]");
}

That test uses some additional operations (push_front and push_back) that are not significantly different from the append operation.

Limitations

There are a few limitations to this construct. Using it in constexpr scope will eventually hit a wall of recursion limits, though I’ve found that wall to be much, much further out than with the MPL string.

Another limitation is that even in constexpr functions parameters are never constexpr. Thus you can’t do anything like this:

template < int I >
constexpr void fun(string<I> str)
{
    using illegal = boost::mpl::char_<str.c_str()[0]>;
}

A final limitation is that different strings of the same length are not distinct types. This construct would therefore not be sufficient to implement the keys in my json library. Furthermore, because of the aforementioned issue where function parameters are never constexpr, there’s really no great way to turn these strings into types. Anything that did would necessarily follow the same sort of structure as Abel Sinkovics and Dave Abrahams’s version–namely, it would need macros. However, for the purposes discussed in that article, such as creating a compile-time regex generator, this constexpr string type would be sufficient I’d think.

Conclusion

So, why would you use such a thing? I have no great plans to. This was done more or less because I could and because the methods used are interesting. I’ve not invented anything new here, but I have tied together a few different techniques. This string type also might be practically useful for something like the tasks mentioned in Abel Sinkovics and Dave Abrahams’s article.

As a test I did implement fizzbuzz as a constexpr string generator. I did indeed hit a recursion limit wall, but was able to work around it by making the string slightly shorter. It thus goes all the way from 1-100, unlike my previous attempts to create one with the MPL string, where the upper limit was closer to 20 and required increasing some numeric defines before including MPL. It might be possible to remove some recursion by mixing constexpr and template metaprogramming–for example, implementing the index trick more like it was done on the C++ Lounge site.

Obviously to make this construct practically useful in a non-painful way you’d want to add many omitted functions, the most glaring of which is an index operator.

The code for this blog is available on my github if you’ve not noticed already.

Sane C++

Posted in Uncategorized on October 9, 2014 by Crazy Eddie

I’ve created a second C++ blog that will be more geared toward practical, everyday programming. I’ll continue using this one as a place to discuss more advanced or, “just because I can,” topics.

Crazy Eddie’s Sane C++

The first post went up today. It is a brief introduction to unit testing using Boost.Test.

Introduction to unit testing

From the about:

This blog is a complement to my other blog, Crazy Eddie’s Crazy C++. There I focus on more advanced subjects, like template metaprogramming, that are not going to be what you want to write all the time (or at all) because they can be tough to maintain. In this blog I focus on more practical–boring, if you will–subjects you can and should use in your everyday C++ development. In Crazy C++ you can find tricks that respond to hard problems or just exercise the language; here I intend all content to be accessible and usable.

I assume some familiarity with the C++ language. Perhaps you’ve taken a college course or two, or maybe you’ve read a book and written a few exercise programs. You might even be interning or in your first few years of professional software development. Perhaps some of my posts here will give the occasional insight to an experienced software developer well versed in C++, but for the most part I intend to address those people who wish to go from barely knowing anything to being able to write maintainable C++ code. Expect to see a lot of simple idioms and pattern application.

You can find code examples for most posts in this blog at my github: Sane CPP.

Near future installments will include subjects like pimpl, NVPI, RAII…and whatever.

On another issue: I am currently on sabbatical but am on the lookout for potential employers as well. My linkedin profile serves as a resume; feel free to contact me if you have a position I could be interested in. In the meantime I hope to be more active in both of these blogs.

Pimplcow

Posted in Uncategorized on September 13, 2014 by Crazy Eddie

This article briefly describes a simple little extension of the pimpl idiom that I’ve called “pimplcow”.  It’s not a unique idiom, I know at least one other developer/group that has done the same thing.  I invented it independently though and at least for a while thought it uniquely mine.  Perhaps I can lay claim to the name :p

The basic gist of the idiom can be shown with shared_ptr:


struct object
{
    int get() const;
    void set(int);

    object();
    object(object const&);
    object& operator = (object);

private:
    struct impl;
    std::shared_ptr<impl> pimpl_;

    impl * pimpl();
    impl const* pimpl() const;
};

The declaration of the class isn’t really anything new or interesting except perhaps for the pimpl function and the shared_ptr. The pimpl() function is actually useful outside of this idiom as it propagates the constness of your object into the constness of your impl; the two generally should match, but a plain pointer member in C++ doesn’t do that for you. The implementation though is where things get a touch interesting:


struct object::impl { int i; };

object::object() : pimpl_(new impl()) {} // or make_shared
object::object(object const& other) : pimpl_(other.pimpl_) {} // shallow copy!
object & object::operator = (object other) { pimpl_.swap(other.pimpl_); return *this; }

int object::get() const { return pimpl()->i; }
void object::set(int i) { pimpl()->i = i; }

// Now for the tricky parts...

// For const operations we just return the pimpl we have.
object::impl const* object::pimpl() const { return pimpl_.get(); }

// For non-const operation we assume the object is about to be modified and we now copy the pimpl...maybe.
object::impl * object::pimpl()
{
    if (!pimpl_.unique()) pimpl_.reset(new impl(*pimpl_));
    return pimpl_.get();
}

This now implements the copy-on-write optimization in a VERY easy way. In fact, if you write your pimpl-based classes in a simple, consistent style, you end up having to change only two functions to switch between pimplcow and plain pimpl:


object::object(object const& other) : pimpl_(new impl(*other.pimpl_)) {}

object::impl * object::pimpl() { return pimpl_.get(); }

A further optimization recognizes that `shared_ptr` is actually a rather heavy-weight object when you know for certain you’re using shared pointer semantics, such as in the case of pimplcow. So instead you use an invasive reference count pointer such as `boost::intrusive_ptr`.

This construct is really simple and can save a lot of memory in some situations. It can also greatly reduce CPU use if your objects would otherwise be copied a lot but not changed. Further, it lets you offer value semantics rather than handing out shared pointers just to avoid copies; it’s much, MUCH simpler to reason about value types than shared pointers when the only thing you’re trying to do is avoid copying. The main cost you take on is the atomic increment/decrement of the reference count, which is the one thing you need to worry about here.

To close, a bit about the reference count. You need to be wary of, and measure, the effect of the atomic operations needed to increment and decrement the shared pointer’s reference count. If you’re making and/or destroying huge numbers of copies without ever calling a non-const member (and thus never doing the real copy), you can end up with a LOT of contention between CPUs. The pimplcow idiom is thread safe if you use a thread-safe reference counter, but in some cases it may actually be cheaper to make a real copy than to defer that copy behind a reference count. You’ll also want to avoid taking parameters by value when you don’t need to, because doing so injects this atomic (or mutex-protected) reference count manipulation, and that can indeed be a heavy cost in some cases.

Final close note: You can use this same sort of pattern to implement a flyweight quite easily. Use the usual pimplcow versions of copy and pimpl(), but additionally search a flyweight manager for an existing identity for the pimpl after you modify it. I have implemented both of these idioms in a sort of pimpl framework library in my github experiments repository: https://github.com/crazy-eddie/experiments/tree/master/performance – I make no claim that this experiments repository will be in a usable state when/if you choose to clone it. In fact I know right now that the pimplvector stuff is completely broken.

Final, final note: This idiom/pattern is indeed thread safe, but only if values are not shared among threads. So long as each thread has its own “copy” of the value you need not worry about any race conditions. This makes the idiom easy to implement and use. If, on the other hand, you need to share a single copy of a value among multiple threads, then you need to add further synchronization. This is not really unusual and so I don’t consider it a flaw, but you do need to be aware of it, because of course bad things will happen if you use it incorrectly and cause a race condition.

Beginning Template Meta-Programming: Introducing meta-functions

Posted in Template Metaprogramming on June 28, 2013 by Crazy Eddie

I intended this to be an article about “arbitrary binding” in template meta-programming, but I found that there was much I needed to introduce in order to do so and the article was becoming quite lengthy without actually providing much.  Thus I’ve decided to begin a series on template meta-programming that will introduce the topic and expand into more complex subjects, partial application being one of them.

Some history – expression templates

Meta-programming in C++ was pretty much a total accident.  The template system was defined without meta-programming in mind.  For many years of C++ development (and this is unfortunately still not as rare as it should be) developers were afraid of templates and in some cases avoided them entirely.  More than one team has had coding standards dictating that no developer use templates.  The problem was that many compilers did not implement templates very well, and in fact there are still issues with some of the more commonly used compilers.  If you choose to go down the road of becoming proficient at Template Meta-Programming (TMP) you are sure to run into a few compiler bugs.

Even so, some brave souls embraced templates and came up with some very interesting constructs.  The STL is but one of the more fascinating examples.  Many of the ideas introduced there indicated that there was much potential for expressive power in templates.  One very early example of a technique that hinted at TMP is expression templates.  If you search the Internet for this term you’ll find a great many excellent articles and papers.  Expression templates are a powerful concept that can be used to reduce the number of temporaries created during calculations on complex objects such as matrices.  The idea is that operators, such as addition, don’t actually perform addition at all but instead create an object that implements the functions that perform that task.  This is in some ways hinting at lazy evaluation, except that (until C++11 changed the meaning of auto) nobody in their right mind would want to declare the type necessary to store that lazy evaluator in a variable.  Instead, the act of assignment or access performs all the calculations in the complete expression, leaving out that which is unnecessary, encoding the evaluation of each cell into a single expression, and eliminating the need for temporary objects.  What follows is a very brief example (the rest of the operators are left as an exercise):

struct matrix
{
    matrix(std::array<int,9> init) : vals(std::move(init)) {}

    int value(int i,int j) const
    {
        return vals[i * 3 + j];
    }

private:
    std::array<int, 9> vals;
};

template < typename Right
         , typename Left >
struct add_matrix
{
    add_matrix( Right const& r
              , Left const& l )
        : right(r)
        , left(l)
    {}

    add_matrix(add_matrix const& other)
        : right(other.right)
        , left(other.left)
    {}
    add_matrix & operator = (add_matrix const&) = delete;

    operator matrix () const
    {
        return matrix {{{ value(0,0), value(0,1), value(0,2)
                        , value(1,0), value(1,1), value(1,2)
                        , value(2,0), value(2,1), value(2,2) }}};
    }

    int value(int i, int j) const
    {
        return right.value(i,j) + left.value(i,j);
    }
private:
    Right const& right;
    Left const& left;
};

template < typename Right, typename Left >
auto operator + (Right const& r, Left const& l)
{
    return add_matrix<Right,Left>{r,l};
}

BOOST_AUTO_TEST_CASE(addition)
{
    auto m0 = matrix{{{ 1,2,3
                      , 4,5,6
                      , 7,8,9 }}};
    auto m1 = matrix{{{ 9,8,7
                      , 6,5,4
                      , 3,2,1 }}};

    auto exp = m0 + m1;
    matrix m = exp;

    for (int i = 0; i != 3; ++i)
    {
        for (int j = 0; j != 3; ++j)
        {
            BOOST_CHECK_EQUAL(exp.value(i,j), 10);
            BOOST_CHECK_EQUAL(m.value(i,j), 10);
        }
    }

    auto exp2 = m0 + m1;
    auto exp3 = exp + exp2;

    for (int i = 0; i != 3; ++i)
    {
        for (int j = 0; j != 3; ++j)
        {
            BOOST_CHECK_EQUAL(exp3.value(i,j), 20);
        }
    }
}

Expression templates are an example of an embedded Domain Specific Language (DSL). They turn expressions involving the matched types and operator overloads into a set of recursively defined types that represent an abstract syntax tree.

Our first “pseudo” meta-function

One of the first examples of a metafunction generally introduced is the factorial function.  I’m going to call this a pseudo metafunction because it doesn’t match any defined protocol for creating a consistent language for template metaprogramming.  As with functional programming in general, this compile-time factorial calculator is recursive in nature.  In something like Haskell or Scala you would use pattern matching on the argument to invoke one calculation for the general case and another for your base case.  In TMP you do this with template specialization:

template < size_t I > // general case
struct pre_tmp_fact
{
    // recursively invoke pre_tmp_fact
    // and multiply the result with the current value.
    enum { value = I * pre_tmp_fact<I-1>::value };
};

template < > // base case specialization
struct pre_tmp_fact<0>
{
    // 0! == 1
    enum { value = 1 };
};

BOOST_AUTO_TEST_CASE(first_example)
{
    BOOST_CHECK(pre_tmp_fact<5>::value == 5 * 4 * 3 * 2);
}

Defining a protocol

In C++ Template Metaprogramming, Abrahams and Gurtovoy define a protocol for implementing a consistent language for template metaprogramming in C++.  That protocol involves three basic definitions:

Meta-Function
A meta-function is a template that takes types as parameters and ‘returns’ a type in the form of an internal type typedef.
Value Meta-Function
Value meta-functions are metafunctions that represent values. They have all the characteristics of a meta-function and also an internal value member. They are “identity meta-functions” in that they return themselves as their type. If you create instances of them they are generally convertible to the underlying raw value type they represent, with the appropriate value of course.
Meta-Function Class
Since a meta-function is a template, not a type, it cannot be passed as a parameter to another meta-function. Therefore an additional concept, that of a meta-function class, needs to be defined. It is a type that has an internal meta-function called apply.

Our first meta-function

With the above in mind, we can recreate our factorial metafunction using these concepts.  In this article I’ll implement very basic, minimally functional integer wrapper meta-functions.  In the future I will use the Template Metaprogramming Library (MPL), which is part of the boost library set.  Rolling our own is worth doing here for instruction purposes, but it’s generally easier to use what other people have built and tested than to reinvent the wheel.  This time though, we’ll reinvent the wheel; we just won’t add all the spokes, and we won’t true it:


// integer value wrapper metafunction.
template < int I >
struct int_
{
    typedef int_<I> type;
    enum : int { value = I };
};

BOOST_AUTO_TEST_CASE(value_mf)
{
    // Has a value...
    BOOST_CHECK(int_<5>::value == 5);
    // Is an identity metafunction...
    BOOST_CHECK(int_<5>::type::value == 5);
    // This recursion is OK because the compiler won't instantiate until requested.
    // Can go on forever.
    BOOST_CHECK(int_<5>::type::type::value == 5);
}

// This is not how boost's MPL does it!

// Forward declaration of metafunction
template < typename T >
struct prev;

// Specialize for int_ wrappers.
template < int I >
struct prev<int_<I>>
{
    typedef int_<I-1> type;
    enum { value = int_<I-1>::value };
};

BOOST_AUTO_TEST_CASE(prev_int)
{
    BOOST_CHECK(prev<int_<5>>::value == 4);
}

// General case.
template < typename T >
struct tmp_fact
{
    typedef tmp_fact<T> type;
    enum { value = T::value * tmp_fact<typename prev<T>::type>::value };
};

// Specialization for the base case.
template < >
struct tmp_fact<int_<0>>
    : int_<1> // Inheriting from the result, when possible, is to be preferred.
{};

BOOST_AUTO_TEST_CASE(fact_check)
{
    BOOST_CHECK(tmp_fact<int_<5>>::value == 5 * 4 * 3 * 2);
}

Moving forward

If you are interested in continued study of template meta-programming in C++ then I suggest purchasing the book, C++ Template Metaprogramming.  I will also be continuing forward with some other instructional material in future blogs, including implementing named parameters and partial evaluation (arbitrary binding) in templates.  This has been but a rudimentary look at the very basics that will be needed by future blog entries.

I might suggest also playing with the above code and trying to improve it. There are a great many ways in which it can be improved. The factorial metafunction, for example, could use tail recursion and could even use subclassing all the way through. It is much easier to read and work on a metafunction that can be defined in terms of other metafunctions through simple inheritance, so unless you must do otherwise to leverage lazy evaluation (any time you can avoid explicitly referring to ::type within a metafunction you’re doing lazy evaluation) it should be preferred. Often this requires decomposing more complex metafunctions into smaller, simpler ones. This is, of course, like with your usual code, never a bad thing.

GotW #91

Posted in C++11, GotW on May 31, 2013 by Crazy Eddie

I can’t seem to post to Herb Sutter’s blog, my comments just never appear and ones I’ve made are gone.  Maybe he has me blocked for some mysterious reason.  So I thought I’d put my answer to the problems here.

Anyone that wants to write solid C++ should be reading his blog.  I suggest subscribing.

So, my answer to GotW 91 is:

#1 – Some extra locking and reference counting will take place. A new small object will be created on the stack. Not much will happen actually. The locking is the greatest concern.

#2 – I don’t believe it matters until threads are introduced. If you call this function via thread spawning though, you want to be sure you pass by value. Passing by reference can create a race condition in which you reduce the reference count while the pointer is being used or copied by the other side. When that happens it could be that you’re the last one holding the pointer, and then the other side gets its resource yanked out from under it. This is bad voodoo.

#3 has made me want to come up with a set of smart pointer types that secure and ensure correct ownership semantics. In this case you’d take a pointer (by value) of the kind that specifies how you’ll use it. If only within scope then you’d specify that with type. It’s a pretty tough problem though…at least for me.

The answer to the question is one of three (given the types in the question):

1. Use reference – when in doubt, use a reference rather than pointer. If you don’t need a pointer, don’t use one.
2. Use unique_ptr by value – You intend to steal ownership of the pointer and you want this ownership transferred away from client code immediately.
3. The parameter could be null but you don’t want to steal it.  Use a raw pointer (for now at least…if I ever solve the above idea then you may have a smart pointer to use instead)

A case against naming conventions based on type

Posted in Uncategorized with tags on May 17, 2013 by Crazy Eddie

Many places I have worked have naming conventions that are, in my opinion, poorly conceived.  Naming conventions really should be limited to deciding between camel case, pascal case, and simply, “Give it a name that explains what it is supposed to be for.”  Unfortunately a lot of places go much further and do things like putting ‘m_’ before member variables, having different naming conventions for different classes of entities (Pascal for classes, underscore for functions, camel for variables, _t after typedefs, etc…).  These kinds of conventions miss the point of generic and object oriented programming and thus miss the point of C++ in a great way.

The issue is that what kind of thing an entity is doesn’t really matter so long as a certain interface is obeyed.  Take for example putting the “_t” postfix on typedefs, which many developers and firms do.  What happens when someone comes along and decides that this type would be better implemented as a class of its own rather than a typedef of something else?  Since typedefs in C++ are weak, this is not a surprising or all too rare thing to do.  Now, in order to keep from breaking clients everywhere, we need to follow the convention for naming classes and then add a typedef so that the “_t” name still exists.  This is silly.

Another example comes up with regard to functors.  Functors are classes whose instances look and behave like functions that may have state.  If you’ve specified that variables must be named one way, and functions another, then you’ve got a problem here, because conceptually this variable is both a variable and a function!  If you replace a function with a functor someplace then again you are in the unfortunate position of having to violate naming conventions or do something silly like make a function to call the functor just so both names exist.

The C++ standard library has a pretty good C++ naming convention.  It uses one convention for almost everything with the caveats that template parameters are named differently (and this is important because of dependent name issues) and so are macros.  Both of these things require special consideration when using so it’s worthwhile to make them stand out.

In C++ types are strong at the time of compilation, but in the source they are interchangeable to a great degree.  So long as something obeys an interface, calling code shouldn’t have to change when you completely change the type some entity is.  Naming conventions that force you to change names based on type hinder this ideal.

CUDA without Visual Studio

Posted in Uncategorized on February 11, 2013 by Crazy Eddie

So I have a solid-state drive that I put my system on. It is only 55G in size, and Windows 7 itself takes up over half of that, and that’s before installing ANYTHING. Visual Studio takes up around 9G of drive space, and though you can ask it to install on another drive, that only moves 1G…the rest INSISTS on being on your C drive.

I wanted to try out CUDA development though, and I don’t currently have a Linux install in anything but a VM (that will be remedied eventually). The CUDA development kit requires VS though.

It is possible though to get the compiler for VS only by downloading the SDK. The SDK can mostly be installed on another drive. Though there is stuff that insists on being on your system drive, it’s more like 0.5G rather than 6+. Even stuff that VS installs there, and won’t install anywhere else, can be put on another drive this way. Of course, you won’t have the debugger or the IDE…but in a pinch it works.

The next thing I had to wrestle with was getting the CUDA compiler to recognize the configuration. I got a weird error saying it couldn’t find configuration “null” in the amd64 directory within bin of the SDK install. That is easily fixed by creating a file called “vsvars64.bat” in that directory containing a single line: “call drive:\path_to\SetEnv.cmd /x64”. The ‘path_to’ part will depend on where you installed the SDK.

Then, for some reason, you have to close that cmd window and start a new one (I’m talking about the “Windows SDK Command Prompt” from the start menu). After this you can set the PATH to include your cudapath\bin and cudapath\open64\bin. You need to be able to run `nvcc` and `nvopencc`.

Once all this was done I was able to compile a basic CUDA program from the command line with: `nvcc --cl-version 2010 --use-local-env file.cu`.

This took hours of pain, google research and forum posting. Hopefully the next person in my shoes can find this and it helps. I look forward to hearing success stories and otherwise.

“Universal design principle”

Posted in Uncategorized on January 17, 2013 by Crazy Eddie

In this video the presenter associates what he calls the Universal design principle with Scott Meyers. He also attributes him with the name. I can’t find another source on this but I have seen the principle passed around by others such as Robert C. Martin. The principle is, “Make interfaces hard to use incorrectly and easy to use correctly.” The presenter had a picture of a mortar and pestle; the idea being that it’s really easy to use and can’t be used any other way.

That got me thinking and for the first time I think I believe that this principle is the wrong way to think about it. Why? Because being the kind of gutter thinker I am I immediately thought of other uses for that pestle…some places it could be stuck. I suppose this would be considered the “wrong” use and so it doesn’t follow the principle but wait!

What if you WANT to use it that way?

So I think that perhaps a more useful, more defensive way of thinking about this principle is to say, “The use of an interface in any way that works is ‘correct’.” In other words, regardless of the intention of the interface designer, if it can be used in a particular way then that is a correct use of that interface. It then follows that it’s the duty of the designer to make sure his or her intentions are the only uses the interface supports.

We then get a sort of inversion of the principle that I think more clearly defines the intention of the principle: to make sure your interfaces are adequately specified and that implementations protect themselves from violations of that interface…BUT…your implementation must also handle correctly any correct use your interface allows. That way if someone decides they want to stick it where the sun doesn’t shine…then either it should not fit, or it at least shouldn’t break.

SFINAE — but sometimes it is.

Posted in Uncategorized on January 1, 2013 by Crazy Eddie

As I was writing up the code for my last article on bind expressions I actually learned something I had not expected and did not know before. As I often say, “Teaching is one of the best ways to learn,” and well, here’s a great example.

When I initially wrote the bind helper function I only had the one set:

template < typename Fun >
binder<typename return_of<Fun>::type, Fun, empty_list> bind(Fun f);

// further definitions for 1, 2, and 3 argument functions...

I then had a helper metafunction that would derive the return type:

template < typename T >
struct return_of { typedef typename T::type type; };

// one of many...
template < typename R >
struct return_of<R(*)()> { typedef R type; };

This worked just fine for function pointers and functors with a return_type subtype definition. The problem later came when I decided to add member function compatibility. This requires building an extra object to convert the calling convention to function syntax rather than member pointer. Here is one such function overload:

template < typename R
         , typename P1, typename A1
         , typename P2, typename A2 >
binder_t< R
        , mf1<R,P1,P2>
        , boost::fusion::vector<A1,A2> >
bind(R(P1::*f)(P2),A1 a1, A2 a2)
{
	typedef mf1<R,P1,P2> F;
	typedef boost::fusion::vector<A1,A2> L1;
	return binder_t<R, F, L1>(F(f), L1(a1,a2));
}

The assumption I had at this point, previous to compiling and after writing 20 lines of repetitive overloads, was that the substitution of the member function pointer type into the previous overloads would cause a failure that the compiler would ignore because of SFINAE. Yeah…that’s not how it works.

There are a couple of ways to consider why SFINAE will not apply here. The first is to look at 14.8.2 in the standard, note that it lists the only deduction failures that SFINAE can catch, and see that what’s going on here (instantiating something within another template that causes an error) is not in that list.

The more direct reason though is this: during the deduction of the types for the bind function we are instantiating another template. That template goes through its own deduction process and passes! There is no syntactic error caused by substituting a member function pointer into the template parameters for return_of. The deduction process is finished here and that template can be instantiated.

Remember also that in C++ templates are only instantiated as they are needed. It is perfectly legal to instantiate a template class that has members that would cause errors, so long as you never use those members. So this process is finished, and we are no longer within the realm where SFINAE even applies with regard to return_of.

We then instantiate type within this template, and it has one…so again we can’t trigger SFINAE here, because the type name exists! Now the instantiation phase happens for type, and of course that results in the gibberish code R(T::*)(??)::type. This is a syntax error, an ill-formed piece of code, and we are well past the point of SFINAE, which happens only during deduction.

There are a few ways you can fix this issue. The first is what I chose in the article code: use T::type directly in the signature of my function so that it’s discovered during deduction, where “substitution” actually occurs. This of course required that I write alternate overloads for function pointers.

Another way is to write return_of so that it won’t try to define a nested type member if type T doesn’t have one:

#include <boost/mpl/has_xxx.hpp>

BOOST_MPL_HAS_XXX_TRAIT_DEF(type); // generates has_type<T>

template < typename T, bool = has_type<T>::value>
struct return_of{};

template < typename T >
struct return_of<T, true> { typedef typename T::type type; };

Now what happens in the original definition of bind is that the actual substitution going on is return_of<T>, and we’re then trying to access type within the result of that substitution, which fails. This is a substitution failure, and it is not an error; the overload is simply discarded.

Finally, one more way to fix this issue is to force the definition of return_of::type to occur during the deduction phase of return_of while the template parameter is being substituted in for T:

template < typename T, typename Type = typename T::type>
struct return_of { typedef Type type; };

Now the substitution failure happens within the deduction of return_of, which must be deduced in order to finish deducing bind. This substitution failure is also “not an error” and the whole works is discarded as a viable overload for bind.

It’s an interesting little gotcha. Some people I’ve talked to claim that what I did was legal in C++03 but only due to unclear language in the standard, which has now been clarified. Others claim that the standard already defines what I first did as ill-formed. I tend to believe the latter as it seems to coincide with my interpretation, but either way you certainly cannot depend on it working and if it does you should simply ignore that fact as a fluke. If you have code that relies on this working, get working on replacing it with whichever of the above best fits the problem. If you’re just learning how to use SFINAE to do type introspection and function overloading…keep this in mind and be better armed with a better understanding of the mechanism.

Until next time, happy coding!