Archive for the cpp Category

Compile time strings with constexpr

Posted in C++, cpp on October 17, 2014 by Crazy Eddie

In Using strings in C++ template metaprograms, Abel Sinkovics and Dave Abrahams describe a set of macros to create strings for metaprogramming in C++. Since C++11 though there is another way, and with C++14 it’s made easier. This article describes that method.

I’m using C++14 features here. Namely not needing postfix decltype in auto functions. My compiler seems to have some issues with auto (it’s not the latest version) so I’m not using it everywhere I could, but where I do I’m not doing the whole -> decltype(expression) thing. It cleans up the code a LOT and is a lot simpler. You should be able to do this in C++11 though by adding those bits where necessary.

With this technique you get a string type that can be used is much easier ways for many of the same purposes. It can be used at both runtime and compile-time and at least in the cases I’ve tried and compilers I have used this version can be used with much longer strings. For example in previous attempts to implement fizzbuzz as a metaprogram I was able to only acheive very low numbers before reaching the limit of template parameters for my compiler. In the sample code for this article I acheived all 100 entries.

Prerequisite utilities: index generation

For many of the functions in this construct we need a variadic template pack of all legal index values in an array. On the stackoverflow C++ Lounge wiki a technique is described that allows this. I altered it slightly:


template < int ... Nums >
struct indices
{
    template < int I >
    static constexpr auto append(boost::mpl::int_<I>)
    {
        return indices<Nums...,I>();
    }
};

constexpr indices<> make_indices(boost::mpl::int_<0>)
{
    return indices<>();
}

template < int I >
constexpr auto make_indices(boost::mpl::int_<I>)
{
    using prev = boost::mpl::int_<I-1>;
    return make_indices(prev()).append(prev());
}

You can use these constructs to create a type that contains all indices for an array, and then you can use this in a pack expression to retrieve that list of indices as a variadic template pack and then use that to initialize an array.

Basic operations

Before moving on to the more interesting stuff I will briefly describe the fundamental definitions to get started with:

template < int Size >
struct string
{
    using data_type = char const (&) [Size+1];

    constexpr int size() const
    {
        return Size;
    }

    constexpr data_type c_str() const
    {
        return arr;
    }

    // ...
private:
    char arr[Size+1];
};

The most interesting thing here is the definition of data_type. For a great many things we could simply return char const* but in many cases it helps us a lot to get the raw array type out of that function. It will decay into a pointer when necessary. The rest of this stuff is pretty self explanitory: we define constexpr functions to retrieve the size and the actual string, and a place to put it.

The array needs to be Size+1 so that it can store the null terminator and thereby behave like a C string. This could be ommitted if you never wish to use it as a c-string, but in many ways it’s easier to enable that form of access.

Construction

Building this thing then becomes the first tricky part we run into. We want to be able to build this thing out of a c-style string. We also want to be able to construct a zero size string without parameters.

template < int Size >
struct string
{
    // ...

    constexpr string(char const (&str) [Size+1])
        : string(str, detail_::make_indices(boost::mpl::int_<Size>()))
    {}

    constexpr string() : arr{} {}

private:
    char arr[Size+1];

    template < int I, int ... Indices >
    constexpr string(char const (&str) [I], detail_::indices<Indices...>)
        : arr{str[Indices]..., '\0'}
    {}
};

Line 6 here is where interesting things start. Normally you’d not take a c-style string as an array but instead as a pointer to char. Here though we want the size of the array and in C++ strings in quotes are actually static arrays; they have a size we can fetch through the type system. To fetch that size we simply make it a template parameter. The rest is funky syntax to take an array by reference.

On line 7 we use the new delegating constructor syntax to defer to a private constructor that uses the indexing trick. So we call make_indices as the second argument to that constructor.

Line 15 begins the other side of the indexing trick. Here we say that we expect a pack of integers. On line 16 we accept the type in which that pack will expand, thus giving us the pack as part of the deduction system. Finally on line 17 we finish the indexing trick and use it to initialize our internal array.

The default constructor should probably only be available when Size==0 but I’m being lazy here. You could use enable_if to fix this or create a specialization for string<0>.

The final thing here though is that it would be nice to have a helper function so we don’t need to count characters and specify the size. It’s pretty darn easy:

template < int Size >
constexpr auto make_string(char const (&cstr) [Size])
{
    return string<Size-1>(cstr);
}

Already we have a lot of power. We can define constexpr strings quite easily and we can demonstrably use them in places that require constant expressions:

BOOST_AUTO_TEST_CASE(from_cstr)
{
    constexpr auto s0 = make_string("hello");
    BOOST_CHECK_EQUAL(s0.c_str(), "hello");

    using c = boost::mpl::char_<s0.c_str()[2]>;
    using s = boost::mpl::int_<s0.size()>;

    BOOST_CHECK_EQUAL(c::value, 'l');
    BOOST_CHECK_EQUAL(s::value, 5);
}

Concatination

I’m going to skip over the definitions for push_front and push_back and instead just talk about append. It’s the more thorough example, leaving the other two fairly easy to understand once you can follow through append.

Like the constructor we need to use the indexing trick:

template < int Size >
struct string
{
    // ...
    template < int I >
    constexpr string<Size+I-1> append(char const (&cstr)[I]) const
    {
        return append( detail_::make_indices(boost::mpl::int_<Size>())
                     , detail_::make_indices(boost::mpl::int_<I-1>())
                     , cstr );
    }
    // ...
private:
    char arr[Size+1];

    template < int ... ThisIndices
             , int ... ThatIndices
             , int I >
    constexpr string<Size+I-1> append( detail_::indices<ThisIndices...>
                                     , detail_::indices<ThatIndices...>
                                     , char const (&cstr) [I] ) const
    {
        char const newArr[] = {arr[ThisIndices]..., cstr[ThatIndices]..., '\0'};
        return string<Size+I-1>(newArr);
    }
};

It’s kind of funky looking but there’s no reason to pass on arr because we already have it. So we pass on the indices to reference both arrays and a reference to the array we want to append. We can’t just create a class variable or type to hold pre-created indices because the pack part is essential to the indexing trick and that is only available if we make it part of the function signature.

The interesting part here is on line 23 where the new c-string is created by merging the two inputs. Using the indexing trick again here twice–we can have two packs because they resolved inside clear boundaries–we just use array initialization to do so, just as with the constructor.

There’s an alternative to the newArray method. It requires a constructor that takes individual chars and it must be a variadic template. Because of this requirement I decided I didn’t much care for that constructor and so did it this way. You can see the difference in the github change history for string.hpp.

That’s basically the whole bit. It’s nice to have operators though:

template < int I0, int I1>
constexpr string<I0+I1> operator + (string<I0> const& s0, string<I1> const& s1)
{
    return s0.append(s1);
}

And now we have compile-time string processing! Check out the unit test:

BOOST_AUTO_TEST_CASE(operators)
{
    constexpr auto s0 = '[' + make_string("hello") + ' ' + make_string("world") + ']';

    BOOST_CHECK_EQUAL(s0.c_str(), "[hello world]");
}

That test uses some additional operations (push_front and push_back) that are not significantly different from the append operation.

Limitations

There are a few limitations to this construct. Using them in constexpr scope will eventually hit a wall of recursion limits. I’ve found though that this is much, much greater than the MPL string.

Another limitation is that even in constexpr functions parameters are never constexpr. Thus you can’t do anything like this:

template < int I >
constexpr void fun(string<I> str)
{
    using illegal = boost::mpl::char_<str.c_str()[0]>;
}

A final limitation is that different strings of the same length are not distinct types. This construct would therefor not be sufficient to implement the keys in my json library. Furthermore, because of the aforementioned issue where function parameters are never constexpr, there’s really no great way to turn them into types. Anything that might would necessarily follow the same sort of structure as Abel Sinkovics and Dave Abrahams’s version–namely it would need macros. However, for the purposes discussed in that article, such as creating a compile-time regex generator, this constexpr string type would be sufficient I’d think.

Conclusion

So, why would you use such a thing? I have no really great plan to. This was done more or less because I could and because the methods used are interesting. I’ve not initiated anything new here, but I’ve tied together a few different techniques. This string type also might be practically useful for something like the tasks mentioned in Abel Sinkovics and Dave Abrahams’s article.

As a test I did implement fizzbuzz as a constexpr string generator. I indeed hit a recursion limit wall but was able to work around it by making the string slightly shorter. It thus goes all the way from 1-100, unlike my previous attempts to create one with the MPL string where the upper limit was closer to 20 and required increasing some numeric defines before including MPL. It might be possible to remove some recursion by mixing constexpr and template metaprogramming–for example implement the index trick more like was done on the C++ Lounge site.

Obviously to make this construct practically useful in a non-painful way you’d want to add many omitted functions, the most glaring of which is an index operator.

The code for this blog is available on my github if you’ve not noticed already.

Advertisements

MSVC bug that’s constantly getting me

Posted in C++, cpp, Rant on April 20, 2011 by Crazy Eddie

This is partially a rant and partially instructional. I’ve just gotten bitten by this compiler bug, AGAIN, and I’m a little frustrated with it because it took a good 2 hours to find (involved template instantiations and the damn compiler never showed me the actual line of code trying to instantiate the template). As is the case with most MSVC bugs, this one involves templates.

Consider this code, what is the output you’d expect?

#include <iostream>

struct base
{
  template < typename T >
  base(T const&)
  {
    std::cout << "templated constructor\n";
  }

  base(base const&)
  {
    std::cout << "copy constructor\n";
  }
  base()
  {
    std::cout << "default constructor\n";
  }
};

struct derived
  : base
{
};

int main()
{
  derived d;
  derived d2(d);
}

If you expect the construction of ‘d’ to cause “default constructor” and the construction of ‘d2’ to output “copy constructor”, you’d be absolutely right to expect that. This is supposed to be how it works. A template constructor can never be a copy constructor and the copy constructor of a derived class is supposed to defer to the *copy* constructor of the base before building its own bits. Unfortunately, though you’re right to expect this behavior, you’d be absolutely wrong about it actually happening because MSVC is stupid here.

What you’re actually going to get from that code is a call to the template constructor in base due to the copying of a derived. The MSVC compiler simply passes the derived type up the chain. When it hits a template constructor it says, “Hey, I’ve got a static type of ‘derived’ here, which matches T better than it does ‘base const&’.” BAD, MSVC, BAD!

The workaround is to write a user-defined copy constructor and explicitly perform the cast that the compiler should be doing to begin with:

struct derived
  : base
{
  derived() : base() {} // need this now too.
  derived(derived const& other) 
    : base(static_cast<base const&>(other))
  {}
};

If you can keep this workaround in memory at all times then you’ll be safe. If you’re like me and don’t instantly think of every workaround you need to insert into your code every time you’re going to run into a well-known compiler bug…you might spend more than a short period of time trying to figure out WTF called the template constructor when you know nobody did. Well, someone did….your broken compiler did.

If, like me, the compiler refuses to tell you what bit of code is calling the template (because there actually isn’t any, though it would be nice to see, “during compilation of default created copy constructor for…,” like GCC often does) you might declare a private copy constructor in anything that derives from said object and see if you get complaints about having no access rather that long tirades of template vomit leading to dead ends. I just did this and ran into a couple places where the objects I didn’t think where ever copied where being copied.

I can’t currently be bothered to find the link, but this has been posted to MS connect. They’ve known about it for a long time. It’s probably not ever going to be fixed. It’s been around at least as far back as 2005 and the above code exhibits the same problem in 2010.

An easier way to select types

Posted in cpp, Template Metaprogramming on February 16, 2011 by Crazy Eddie

The Boost Template Meta-Programming Library is a very powerful library but there are some things that become necessary that can make things really tough to read. One example is having to nest if ‘blocks’ instead of having an ‘else if’. Considder

template < typename T >
struct match_type
  : mpl::if_
    <
      std::is_same<T, mpl::true>
    , double
    , if_
      <
        std::is_same<T, int>
      , char*
      , long
      >::type
    >
{};

Imagine that this went on for 5 checks or more. It becomes quite ungainly.

With this specific example we could use an mpl::map and things would be much easier. What if we where doing something else though? What if we really did need to run some metafunction on T in order to decide?

So I decided to create something a bit simpler to work with, a sort of “select/case” mechanism though at this time it doesn’t exactly match the same semantics as your standard switch statement. At any rate, here it is:


#include <boost/mpl/identity.hpp>
#include <boost/mpl/bool.hpp>
#include <boost/mpl/if.hpp>
#include <boost/mpl/lambda.hpp>
#include <boost/mpl/placeholders.hpp>
#include <type_traits>

using namespace boost;
using namespace boost::mpl::placeholders;

template < typename Test, typename Result >
struct case_ : Result 
{
};
template < typename Result >
struct default_ : Result {};

struct null_ {};

template < typename Switch
         , typename C1 = null_
         , typename C2 = null_
         , typename C3 = null_
         , typename C4 = null_ >
struct select_ {};

template < typename Switch, typename C1Test, typename C1Result, typename C2, typename C3, typename C4 >
struct select_<Switch, case_<C1Test, C1Result>, C2, C3, C4>
  : mpl::if_
    <
      typename C1Test::template apply<Switch>::type
    , C1Result
    , select_<Switch, C2, C3, C4>
    >::type
{
};

template < typename Switch, typename C1Result, typename C2, typename C3, typename C4 >
struct select_<Switch, default_<C1Result>, typename C2, typename C3, typename C4 >
  : C1Result
{};

// test code...

#include <iostream>
#include <typeinfo>


template < typename Var >
struct test_select
  : select_
    < Var
    , case_
      <
        mpl::lambda< std::is_same<_, mpl::true_> >::type
      , mpl::identity<double> 
      >
    , case_
      <
        mpl::lambda< std::is_same<_, int> >::type
      , mpl::identity<char*>
      >
    , default_< mpl::identity<long> >
    >
{};


// select as metafunction class...

template < typename C1 = null_
         , typename C2 = null_
         , typename C3 = null_
         , typename C4 = null_ >
struct select
{
  template < typename Switch >
  struct apply : select_<Switch, C1,C2,C3,C4> {};
};

// test select as metafunction class.
typedef select
    < case_
      <
        mpl::lambda< std::is_same<_, mpl::true_> >::type
      , mpl::identity<double> 
      >
    , case_
      <
        mpl::lambda< std::is_same<double, int> >::type
      , mpl::identity<char*>
      >
    , default_< mpl::identity<long> >
    > my_select;

// visually verify answer
int main()
{
  std::cout << typeid( test_select<double>::type ).name() << std::endl;
  std::cout << typeid( my_select::apply<int>::type ).name() << std::endl;
}

This could be easily adjusted to run the second parameter to ‘case_’ as a metafunction instead of simply grabbing a type.

Pointer Promises – what kind of things can we state?

Posted in cpp, Thinking Out Loud on January 30, 2011 by Crazy Eddie

Blog note: “Thinking Out Loud” categorized posts are little more than brain dumps that I am using to go through a thought process. Sometimes it helps to “talk” to someone in order to get your thoughts in some kind of order. That person doesn’t have to be real, they can be imaginary or, in this case, the general public who may or may not be remotely interested. Posting them does two things, allows me to get input from anyone so inclined and allows those who might be interested a chance to see what kinds of things someone like me thinks about when approaching a problem. So, take it for what it is.

Currently there are smart pointer libraries that express pointer scope and ownership but we have no kind of smart pointer types that express what you’re going to do with a pointer given to you. The closest two I can think of that might be considered to are auto_ptr and unique_ptr, simply because they assert that the object they govern can only be owned by one instance of their type. I ask you though, how might you write a function that asserts, in its interface, that the pointer being passed to it will be temporarily used, not copied, and that functions calling it are free to destroy the resource as soon as the function exits? I know of none.

We do have at least one precedent for desiring this kind of thing, the const keyword. You can use the const keyword as a member function modifier stating to all clients that, “This function will not alter the externally visible state of this object.” You can use the const modifier for reference and pointer parameters to state that, “This function will not attempt to alter the externally visible state of the object supplied as parameter.” Sure, it is given that you can break any of these promises but solid practices in C++ say that you both do not break such promises and that you do in fact make them so that clients can be aware of what you will and will not do as a side effect of your operations.

With all of that being said, I have decided that it’s time we had such a set of smart pointers. The question now is only what kind of statements do we want to express and what are the promises they make? Here’s the list of things I can think of that I might want to state about what I’ll do with a pointer supplied to me as a parameter:

  • I might try to make a copy of the object being pointed at.
  • I just need to temporarily use the object being pointed at, I’ll not retain reference or pointer to it.
  • I need to retain a pointer or reference to the object being pointed at, but I’m not going to take ownership of it.
  • I am going to retain a pointer or reference to the object being pointed at and need to retain sole ownership of it.
  • I am going to/may retain a pointer or reference to the object being pointed at and will share ownership of it with anyone else who is also retaining a pointer or reference.

At this point I’m of the opinion that the first statement isn’t really a promise about a pointer but a side effect of the pointed to object’s interface; thus it has no business being specified by a smart pointer library. The last two statements seem to already be taken care of by unique/auto_ptr and shared_ptr respectively. This leaves only the middle two.

The promise being made by the first of those two statements (type A ptr) would seem to me to be:

  1. I’m not myself going to make any copy of the pointer that is assigned to any variable that outlives the scope of this immediate call. and…
  2. I’m not going to pass this pointer to any function that does not make this same promise.

The promise being made by the second of those two statements (type B ptr) would seem to me to be:

  1. I need to be able to validate the validity of the pointer being referenced because I may outlive the object that owns it.
  2. I will never delete the object being pointed at by this pointer.
  3. I will not pass this pointer to any function that expects to hold ownership of it of any kind, shared or otherwise.

This would imply that you can pass one of the latter pointers to a function accepting one of the former, but not visa-versa. In other words, if you promise to not keep a reference to the object then you can’t pass to a function that doesn’t make that promise, but if you haven’t made that promise you can pass the pointer to one that does. In order to complete the rule set, we need to analyze the promises made by the last two, the ones that are already being implemented by existing smart pointers.

If you’re a function that claims to steal ownership (type C ptr) then the following statements are being made:

  1. Once I’m done, you don’t have to delete this object.
  2. Once I’m done, you’d better NOT delete this object.
  3. I promise to delete this object before I’m destroyed.
  4. You need to ask me for a pointer to this object because I might have destroyed it already.

If you’re a function that claims to share ownership of the object (type D ptr) then the following statements are being made:

  1. You have guaranteed, continued access to the object after making a call to this function iff you’ve kept shared ownership as well.
  2. You may have continued access to the object if you request it from the object this function is a member of.
  3. You may have continued access to the object if some other object besides this one has retained shared ownership of it.
  4. Nobody can steal sole ownership of the object.

At this point we can decide what kind of expressions the various kinds of pointers can be used in:

  • A type ‘A’ pointer appears to be the most restrictive. It thus can’t be passed to anyone expecting any of the other kinds of pointer. It can only be used in a type ‘A’ expression.
  • A type ‘B’ pointer is less restrictive (allows the client to do more) but poses more requirements on the caller. It can be used in a type ‘A’ expression and/or type ‘B’ expression but neither a type ‘C’ nor a type ‘D’ since they claim to have ownership responsibilities.
  • A type ‘C’ pointer takes ownership from the caller. It does not claim that it will never give it up nor destroy the pointer immediately. It can be used in type ‘A’, ‘B’, and ‘C’ expressions. It can give up ownership in a type ‘D’ expression.
  • A type ‘D’ pointer shares ownership. It can only destroy the object if it’s the last one left, which clients don’t know. It can be used in a type ‘A’ expression because they don’t make copies. It can be used it a type ‘B’ expression because they don’t take any kind of ownership. It can be used in another type ‘D’ expression because ownership is shared. They cannot be used in a type ‘C’ expression because they are not free to give up ownership.

We are left now with two issues left unexamined: raw pointers, references, and pointer returns. Let’s examine the first two together and leave the last for later. It would seem that examining the first two might give us some insight into the last.

The system being proposed would really be best served and used if neither raw pointers nor references even existed. Nothing can really be said about the former and use of the latter provides no guarantees either even if we are left with some extra requirements. Thus the best answer would seem to never, ever, allow converting raw pointers or references into smart pointers, or visa-versa. Unfortunately, this is impractical and ultimately impossible to implement. The second best answer is to provide explicit casts to/from smart pointers to the raw equivalents. The developer and team can then best decide whether to allow raw pointers at all and when/if to use references.

My advice would be to disallow the retaining of pointers or references to any kind of reference parameter. A reference could then be considered a type ‘A’ pointer or expression. It might, in fact, make sense to implicitly allow such use. Even more interesting, we might consider using a policy to differentiate between cases when any of the above pointer types can be NULL since this is one of the common reasons to use references to begin with (you can’t create a null reference within defined behavior). Even if we totally remove the need for references in this manner though there’s still other considerations such as legacy libraries and maybe even performance issues that I’m not fully aware of.

We do know though that a reference cannot form a type ‘C’ or type ‘D’ pointer. You cannot transfer ownership of a reference. You could pass a reference to a pointer that you could then take the address of and steal ownership of IT, but this is an inherently unsafe system that is prone to developer error. The whole point of this library is to avoid developer error (otherwise why even have smart pointers at all?) and to make it easier to use things the right way, and more difficult to use them the wrong way. Thus it would certainly make no sense to provide any type of conversion, implicit or otherwise, from a reference to a type ‘C’ or ‘D’ pointer.

The only one remaining is the type ‘B’ pointer. In this case it IS useful to use a reference as such a pointer, since ownership is not being transferred, but a reference doesn’t have enough information within it to implement one of the proposed requirements of the incoming data: namely that we can check that it’s valid before trying to use it. However, it certainly makes complete sense to pass a member variable to a function that claims it will retain a pointer to that variable but will not attempt to destroy it. What we need then is a method by which we can express this kind of use and none of our existing objects would appear to so far.

It could be argued that a shared_ptr system can be used to implement the terms of a type ‘B’ expression or pointer. Any object that holds a member can declare that member as a shared_ptr instead of a basic member and can then pass it off to functions that expect a weak_ptr parameter. However, there are two issues with this:

  1. We can’t use basic member variables for a type ‘B’ expression so implemented without hacks like supplying a “null-op deleter” to the shared_ptr. This is of course extremely dangerous and completely breaks the semantics of the shared_ptr construct.
    We can’t stop anyone from taking shared ownership of the data. In many cases this doesn’t matter much, but in others it indicates a serious error if information about an object outlives the object itself.

Noticing the fact that the shared_ptr construct in fact implements all the requirements of a type ‘B’ pointer though is not a useless comparison. We know that we can implement a type ‘B’ pointer in terms of a shared_ptr, we just need to add further restrictions that also allow us to use things like a “null-op deleter” safely. These facts may allow implementing a type ‘B’ pointer to be very quick and easy.

The final issue then regarding promises is return types. What are the various terms we would like to express when providing access to information about the state of a given object? What kind of information do we need to provide clients when handing out pointers? I can think of the following:

  1. I am giving up ownership to this pointer right now. You will either take that ownership over or the data being pointed at will be instantly destroyed.
  2. I am giving access to a pointer I may or may not be retaining some amount of interest in. I am thus going to give shared ownership of it so that others may keep it if they desire.
  3. I am insisting that I retain sole ownership of this data but am providing access to it.

These of course obviously coincide with type ‘C’, type ‘D’ and type ‘B’ pointers respectively. The first two can be implemented by the exact same objects, the last needs to be implemented by something special. You might wonder now why there is no type ‘A’ equivalent in this list; I propose that it makes no sense. The return of a function MUST express what kind of ownership is being retained by the object being called. A type ‘A’ expression does not indicate this kind of information, it only stipulates that there is no interest in ownership whatsoever.

All of this of course totally fails to address issues like multi-threading. It seems to me that such issues are best resolved with policies. A non-thread safe implementation of course could not be used without explicit cast in a thread-safe context but visa-versa would be fine. A thread safe interface would require locks though and objects like pointers to owned members would pose interesting dilemmas. Unfortunately, my experience in this area is literally non-existent; I think I’ve written maybe two programs that even USED threads and that’s far from enough to familiarize yourself with the various problems. Perhaps at a later date I’ll try to consider how MT would effect the various constructs.