Archive for the C++ Category

Compile time strings with constexpr

Posted in C++, cpp on October 17, 2014 by Crazy Eddie

In Using strings in C++ template metaprograms, Abel Sinkovics and Dave Abrahams describe a set of macros to create strings for metaprogramming in C++. Since C++11 though there is another way, and with C++14 it’s made easier. This article describes that method.

I’m using C++14 features here. Namely not needing postfix decltype in auto functions. My compiler seems to have some issues with auto (it’s not the latest version) so I’m not using it everywhere I could, but where I do I’m not doing the whole -> decltype(expression) thing. It cleans up the code a LOT and is a lot simpler. You should be able to do this in C++11 though by adding those bits where necessary.

With this technique you get a string type that can be used is much easier ways for many of the same purposes. It can be used at both runtime and compile-time and at least in the cases I’ve tried and compilers I have used this version can be used with much longer strings. For example in previous attempts to implement fizzbuzz as a metaprogram I was able to only acheive very low numbers before reaching the limit of template parameters for my compiler. In the sample code for this article I acheived all 100 entries.

Prerequisite utilities: index generation

For many of the functions in this construct we need a variadic template pack of all legal index values in an array. On the stackoverflow C++ Lounge wiki a technique is described that allows this. I altered it slightly:


template < int ... Nums >
struct indices
{
    template < int I >
    static constexpr auto append(boost::mpl::int_<I>)
    {
        return indices<Nums...,I>();
    }
};

constexpr indices<> make_indices(boost::mpl::int_<0>)
{
    return indices<>();
}

template < int I >
constexpr auto make_indices(boost::mpl::int_<I>)
{
    using prev = boost::mpl::int_<I-1>;
    return make_indices(prev()).append(prev());
}

You can use these constructs to create a type that contains all indices for an array, and then you can use this in a pack expression to retrieve that list of indices as a variadic template pack and then use that to initialize an array.

Basic operations

Before moving on to the more interesting stuff I will briefly describe the fundamental definitions to get started with:

template < int Size >
struct string
{
    using data_type = char const (&) [Size+1];

    constexpr int size() const
    {
        return Size;
    }

    constexpr data_type c_str() const
    {
        return arr;
    }

    // ...
private:
    char arr[Size+1];
};

The most interesting thing here is the definition of data_type. For a great many things we could simply return char const* but in many cases it helps us a lot to get the raw array type out of that function. It will decay into a pointer when necessary. The rest of this stuff is pretty self explanitory: we define constexpr functions to retrieve the size and the actual string, and a place to put it.

The array needs to be Size+1 so that it can store the null terminator and thereby behave like a C string. This could be ommitted if you never wish to use it as a c-string, but in many ways it’s easier to enable that form of access.

Construction

Building this thing then becomes the first tricky part we run into. We want to be able to build this thing out of a c-style string. We also want to be able to construct a zero size string without parameters.

template < int Size >
struct string
{
    // ...

    constexpr string(char const (&str) [Size+1])
        : string(str, detail_::make_indices(boost::mpl::int_<Size>()))
    {}

    constexpr string() : arr{} {}

private:
    char arr[Size+1];

    template < int I, int ... Indices >
    constexpr string(char const (&str) [I], detail_::indices<Indices...>)
        : arr{str[Indices]..., '\0'}
    {}
};

Line 6 here is where interesting things start. Normally you’d not take a c-style string as an array but instead as a pointer to char. Here though we want the size of the array and in C++ strings in quotes are actually static arrays; they have a size we can fetch through the type system. To fetch that size we simply make it a template parameter. The rest is funky syntax to take an array by reference.

On line 7 we use the new delegating constructor syntax to defer to a private constructor that uses the indexing trick. So we call make_indices as the second argument to that constructor.

Line 15 begins the other side of the indexing trick. Here we say that we expect a pack of integers. On line 16 we accept the type in which that pack will expand, thus giving us the pack as part of the deduction system. Finally on line 17 we finish the indexing trick and use it to initialize our internal array.

The default constructor should probably only be available when Size==0 but I’m being lazy here. You could use enable_if to fix this or create a specialization for string<0>.

The final thing here though is that it would be nice to have a helper function so we don’t need to count characters and specify the size. It’s pretty darn easy:

template < int Size >
constexpr auto make_string(char const (&cstr) [Size])
{
    return string<Size-1>(cstr);
}

Already we have a lot of power. We can define constexpr strings quite easily and we can demonstrably use them in places that require constant expressions:

BOOST_AUTO_TEST_CASE(from_cstr)
{
    constexpr auto s0 = make_string("hello");
    BOOST_CHECK_EQUAL(s0.c_str(), "hello");

    using c = boost::mpl::char_<s0.c_str()[2]>;
    using s = boost::mpl::int_<s0.size()>;

    BOOST_CHECK_EQUAL(c::value, 'l');
    BOOST_CHECK_EQUAL(s::value, 5);
}

Concatination

I’m going to skip over the definitions for push_front and push_back and instead just talk about append. It’s the more thorough example, leaving the other two fairly easy to understand once you can follow through append.

Like the constructor we need to use the indexing trick:

template < int Size >
struct string
{
    // ...
    template < int I >
    constexpr string<Size+I-1> append(char const (&cstr)[I]) const
    {
        return append( detail_::make_indices(boost::mpl::int_<Size>())
                     , detail_::make_indices(boost::mpl::int_<I-1>())
                     , cstr );
    }
    // ...
private:
    char arr[Size+1];

    template < int ... ThisIndices
             , int ... ThatIndices
             , int I >
    constexpr string<Size+I-1> append( detail_::indices<ThisIndices...>
                                     , detail_::indices<ThatIndices...>
                                     , char const (&cstr) [I] ) const
    {
        char const newArr[] = {arr[ThisIndices]..., cstr[ThatIndices]..., '\0'};
        return string<Size+I-1>(newArr);
    }
};

It’s kind of funky looking but there’s no reason to pass on arr because we already have it. So we pass on the indices to reference both arrays and a reference to the array we want to append. We can’t just create a class variable or type to hold pre-created indices because the pack part is essential to the indexing trick and that is only available if we make it part of the function signature.

The interesting part here is on line 23 where the new c-string is created by merging the two inputs. Using the indexing trick again here twice–we can have two packs because they resolved inside clear boundaries–we just use array initialization to do so, just as with the constructor.

There’s an alternative to the newArray method. It requires a constructor that takes individual chars and it must be a variadic template. Because of this requirement I decided I didn’t much care for that constructor and so did it this way. You can see the difference in the github change history for string.hpp.

That’s basically the whole bit. It’s nice to have operators though:

template < int I0, int I1>
constexpr string<I0+I1> operator + (string<I0> const& s0, string<I1> const& s1)
{
    return s0.append(s1);
}

And now we have compile-time string processing! Check out the unit test:

BOOST_AUTO_TEST_CASE(operators)
{
    constexpr auto s0 = '[' + make_string("hello") + ' ' + make_string("world") + ']';

    BOOST_CHECK_EQUAL(s0.c_str(), "[hello world]");
}

That test uses some additional operations (push_front and push_back) that are not significantly different from the append operation.

Limitations

There are a few limitations to this construct. Using them in constexpr scope will eventually hit a wall of recursion limits. I’ve found though that this is much, much greater than the MPL string.

Another limitation is that even in constexpr functions parameters are never constexpr. Thus you can’t do anything like this:

template < int I >
constexpr void fun(string<I> str)
{
    using illegal = boost::mpl::char_<str.c_str()[0]>;
}

A final limitation is that different strings of the same length are not distinct types. This construct would therefor not be sufficient to implement the keys in my json library. Furthermore, because of the aforementioned issue where function parameters are never constexpr, there’s really no great way to turn them into types. Anything that might would necessarily follow the same sort of structure as Abel Sinkovics and Dave Abrahams’s version–namely it would need macros. However, for the purposes discussed in that article, such as creating a compile-time regex generator, this constexpr string type would be sufficient I’d think.

Conclusion

So, why would you use such a thing? I have no really great plan to. This was done more or less because I could and because the methods used are interesting. I’ve not initiated anything new here, but I’ve tied together a few different techniques. This string type also might be practically useful for something like the tasks mentioned in Abel Sinkovics and Dave Abrahams’s article.

As a test I did implement fizzbuzz as a constexpr string generator. I indeed hit a recursion limit wall but was able to work around it by making the string slightly shorter. It thus goes all the way from 1-100, unlike my previous attempts to create one with the MPL string where the upper limit was closer to 20 and required increasing some numeric defines before including MPL. It might be possible to remove some recursion by mixing constexpr and template metaprogramming–for example implement the index trick more like was done on the C++ Lounge site.

Obviously to make this construct practically useful in a non-painful way you’d want to add many omitted functions, the most glaring of which is an index operator.

The code for this blog is available on my github if you’ve not noticed already.

Arbitrary binding pt1 – What is boost::bind and why is it cool?

Posted in C++, Template Metaprogramming on April 22, 2012 by Crazy Eddie

The STL has always been a rather brilliant piece of generic coding. The principles applied created a rather wonderful set of tools that could be used for a variety of conditions without losing type safety and without introducing unnecessary coupling by forcing inheritance. The ability to write generic algorithms that work on any type matching a given concept really adds to the expressiveness and power of C++. Until the advent of boost::bind though, the actual, practical use of these generic algorithms was too painful to bother.

To understand what I mean by this claim, consider the following program that finds all employees that make more than 50k a year:

struct employee
{
    int salary() const;
private:
  // ...
};

// Write special function just to compare salary because binds don't nest:
bool makes_less_than(employee const& e, int amount)
{
    return e.salary() < amount;
}

std::vector<employee> find_all_that_make_50k(std::vector<employee> const& employees)
{
    std::vector<employee> rval;

    std::remove_copy_if(employees.begin(), employees.end(), std::back_inserter(rval), std::bind1st(std::ptr_fun(makes_less_than), 50000));

    return rval;
}

In reality you would either write a functor that does exactly what you need, inheriting from std::unary_function, or you’d just use a for loop because use of all these binds and can get really ugly really fast. Compare the above to the alternative version you can do with boost::bind or with the new std::bind.


std::vector<employee> find_all_that_make_50k(std::vector<employee> const& employees)
{
    std::vector<employee> rval;
 
    // standard version...
    std::remove_copy_if(employees.begin(), employees.end(), back_inserter(rval), bind(less<int>, bind(&employee::salary, _1), 50000));

    // boost version...
    std::remove_copy_if(employees.begin(), employees.end(), back_inserter(rval), bind(&employee::salary, _1) < 50000);

    return rval;
}

This bind construct is even more interesting that this. Say I have an object that will call a function when something happens and I want it to call a method on an object that only takes some of the arguments the other object supplies? In other words, consider what I would do here in this C++03 code:

struct does_something
{
    void set_callback(void (*)(int, double));
};

struct can_respond
{
    void functionA(double);
};

There’s really no reasonable way for me to bind those two things. The best kind of thing one might come up with is to use the observer pattern:

struct does_something_observer
{
    virtual ~does_something_observer() {}
    virtual respond(int, double) = 0;
};

struct does_something
{
    void set_observer(does_something_observer *);
};

struct can_respond : does_something_observer
{
    void function(double);
    void respond(int, double d) { function(d); }
};

This option is suboptimal for three reasons:

  1. It requires we couple can_respond to part of the interface of does_something by inheriting from its observer
  2. We can’t change constness. If we make the callback function const, then subclasses that change in response have to cast it away (dangerous). If we make it non-const then we can’t use const observers.
  3. We force can_respond to implement a signature it’s not interested in, it doesn’t care about the int

Furthermore, most observers have many signatures in them…which is why the Java API has “Adapters” you can inherit from when you only want to listen to one part of the signal interface. Not using this adapter requires that you implement all the functions in the observer interface even if only to make them empty. It’s just a bunch of extra garbage code that fills up the brain (because it has to read it and know why these empty functions are there) without doing anything.

However, when we add an arbitrary function object type like boost::function (which I describe on my old blog — go to “W0t” and follow there) we can get rid of both of these problems:

struct does_something
{
    void set_responder(function<void(int,double)>);
};

struct can_respond
{
    void function(double);
};

does_something ds;
can_respond cr;

ds.set_responder(bind(&can_respond::function, ref(cr), _2));

The great thing here is that we’ve created a 2 argument function object out of a binding that will only use 1 of those arguments to call a function that only accepts one of them. The binder object created by bind can take an arbitrary, implementation defined amount of arguments and they can be anything. Only upon being called (well, when the call has to be compiled) does the system check whether or not the target function can take the arguments supplied, whether they match types, or whether they can be converted and how to do it. Basically, when you assign the binder to the function object by calling set_responder it will attempt to compile operator () with the two arguments that function object’s signature represents. This then will decide what happens.

Using these concepts and adding some loop processing and connection management, the boost::signals and signals2 libraries create a signal/slot mechanism that really surpasses any of the alternatives in its expressive power and in helping keep the coupling to a minimum. Unlike any other event marshaling system, which all require adherence to some specific signature and/or inheritence from a particular object, this system allows you to connect events to any object that can respond to them…ignoring or accepting any information provided by the event source.

THAT is pretty darn cool and brilliantly expressive. It wouldn’t be even half as great without the arbitrary binder. How was this binder created? Stay tuned for the next episode of “Arbitrary Binding”: “How to write your own arbitrary binder”

MSVC bug that’s constantly getting me

Posted in C++, cpp, Rant on April 20, 2011 by Crazy Eddie

This is partially a rant and partially instructional. I’ve just gotten bitten by this compiler bug, AGAIN, and I’m a little frustrated with it because it took a good 2 hours to find (involved template instantiations and the damn compiler never showed me the actual line of code trying to instantiate the template). As is the case with most MSVC bugs, this one involves templates.

Consider this code, what is the output you’d expect?

#include <iostream>

struct base
{
  template < typename T >
  base(T const&)
  {
    std::cout << "templated constructor\n";
  }

  base(base const&)
  {
    std::cout << "copy constructor\n";
  }
  base()
  {
    std::cout << "default constructor\n";
  }
};

struct derived
  : base
{
};

int main()
{
  derived d;
  derived d2(d);
}

If you expect the construction of ‘d’ to cause “default constructor” and the construction of ‘d2’ to output “copy constructor”, you’d be absolutely right to expect that. This is supposed to be how it works. A template constructor can never be a copy constructor and the copy constructor of a derived class is supposed to defer to the *copy* constructor of the base before building its own bits. Unfortunately, though you’re right to expect this behavior, you’d be absolutely wrong about it actually happening because MSVC is stupid here.

What you’re actually going to get from that code is a call to the template constructor in base due to the copying of a derived. The MSVC compiler simply passes the derived type up the chain. When it hits a template constructor it says, “Hey, I’ve got a static type of ‘derived’ here, which matches T better than it does ‘base const&’.” BAD, MSVC, BAD!

The workaround is to write a user-defined copy constructor and explicitly perform the cast that the compiler should be doing to begin with:

struct derived
  : base
{
  derived() : base() {} // need this now too.
  derived(derived const& other) 
    : base(static_cast<base const&>(other))
  {}
};

If you can keep this workaround in memory at all times then you’ll be safe. If you’re like me and don’t instantly think of every workaround you need to insert into your code every time you’re going to run into a well-known compiler bug…you might spend more than a short period of time trying to figure out WTF called the template constructor when you know nobody did. Well, someone did….your broken compiler did.

If, like me, the compiler refuses to tell you what bit of code is calling the template (because there actually isn’t any, though it would be nice to see, “during compilation of default created copy constructor for…,” like GCC often does) you might declare a private copy constructor in anything that derives from said object and see if you get complaints about having no access rather that long tirades of template vomit leading to dead ends. I just did this and ran into a couple places where the objects I didn’t think where ever copied where being copied.

I can’t currently be bothered to find the link, but this has been posted to MS connect. They’ve known about it for a long time. It’s probably not ever going to be fixed. It’s been around at least as far back as 2005 and the above code exhibits the same problem in 2010.