Pimplcow

Posted in Uncategorized on September 13, 2014 by Crazy Eddie

This article briefly describes a simple little extension of the pimpl idiom that I’ve called “pimplcow”.  It’s not a unique idiom; I know of at least one other developer/group that has done the same thing.  I invented it independently though, and at least for a while thought it uniquely mine.  Perhaps I can lay claim to the name :p

The basic gist of the idiom can be shown with shared_ptr:


struct object
{
    int get() const;
    void set(int);

    object();
    object(object const&);
    object& operator = (object);

private:
    struct impl;
    std::shared_ptr<impl> pimpl_;

    impl * pimpl();
    impl const* pimpl() const;
};

The declaration of the class isn’t really anything new or interesting, except perhaps for the pimpl() functions and the shared_ptr. The pimpl() accessors are actually useful outside of this idiom as well: they propagate the constness of your object to the constness of your impl, which generally should match but doesn’t automatically, because const does not propagate through pointers in C++. The implementation, though, is where things get a touch interesting:


struct object::impl { int i; };

object::object() : pimpl_(new impl()) {} // or std::make_shared
object::object(object const& other) : pimpl_(other.pimpl_) {} // shallow copy!
object & object::operator = (object other) { pimpl_.swap(other.pimpl_); return *this; }

int object::get() const { return pimpl()->i; }
void object::set(int i) { pimpl()->i = i; }

// Now for the tricky parts...

// For const operations we just return the pimpl we have.
object::impl const* object::pimpl() const { return pimpl_.get(); }

// For non-const operations we assume the object is about to be modified, so we copy the pimpl...maybe.
object::impl * object::pimpl()
{
    if (!pimpl_.unique()) pimpl_.reset(new impl(*pimpl_));
    return pimpl_.get();
}

This implements the copy-on-write optimization in a VERY easy way. In fact, if you write your pimpl-based classes in a consistent, easy-to-follow way, you only have to change two functions to switch between pimplcow and plain pimpl:


object::object(object const& other) : pimpl_(new impl(*other.pimpl_)) {}

object::impl * object::pimpl() { return pimpl_.get(); }
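
Going back to the pimplcow version, here is a quick, purely illustrative usage sketch showing when the real copy actually happens:

object a;
a.set(1);

object b = a;    // shallow copy: a and b now share one impl
int x = b.get(); // const access: still shared, no copy made

b.set(2);        // non-const access: b clones the impl before writing
// a.get() == 1 and b.get() == 2 -- the deep copy happened lazily, only when it was needed.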

A further optimization recognizes that `shared_ptr` is actually a rather heavy-weight object when you know for certain you’re using shared-pointer semantics, as you are in pimplcow. So instead you can use an intrusive reference-counted pointer such as `boost::intrusive_ptr`.
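
A minimal sketch of what that swap might look like, assuming the count lives inside the impl. The hook names intrusive_ptr_add_ref and intrusive_ptr_release are boost::intrusive_ptr’s real customization points; everything else here (including the refs member, and ignoring the private nesting of impl) is illustrative:

#include <atomic>
#include <boost/intrusive_ptr.hpp>

struct impl // stand-in for object::impl
{
    int i = 0;
    std::atomic<long> refs{0}; // the intrusive reference count
};

// Customization points that boost::intrusive_ptr finds via ADL.
inline void intrusive_ptr_add_ref(impl* p)
{
    p->refs.fetch_add(1, std::memory_order_relaxed);
}
inline void intrusive_ptr_release(impl* p)
{
    if (p->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) delete p;
}

// The class then holds a boost::intrusive_ptr<impl> member, and the non-const
// pimpl() tests refs == 1 instead of calling shared_ptr::unique().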

This construct is really simple and can save a lot of memory in some situations. It can also greatly reduce CPU use if your objects would otherwise be copied a lot but rarely changed. Further, you get to keep value semantics instead of passing shared pointers around just to avoid copying, and it’s much, MUCH simpler to reason about value types than shared pointers when the only thing you’re trying to do is avoid copies. Finally, under those conditions you would already be paying the cost of the atomic increment/decrement on a reference count, and that is the one cost you need to worry about here.

To close, a bit about the reference count. You need to be wary of, and measure, the effect of the atomic operations needed to increment and decrement the shared pointer’s reference count. If you’re making huge numbers of copies and/or destroying many values without ever calling a non-const member (and thus never doing the real copy), you can end up with a LOT of contention between CPUs. The pimplcow idiom is thread safe if you use a thread-safe reference counter, but in some cases it may actually be cheaper to make a real copy than to defer that copy and pay for the reference count. You’ll also want to avoid taking parameters by value when you don’t need to, because that injects this atomic (or mutex-protected) reference count manipulation, and that can indeed be a heavy cost in some cases.

Final close note: You can use this same sort of pattern to implement a flyweight quite easily. Use the usual pimplcow versions of copy and pimpl(), but additionally search a flyweight manager for an existing identity for the pimpl after you modify it. I have implemented both of these idioms in a sort of pimpl framework library in my github experiments repository: https://github.com/crazy-eddie/experiments/tree/master/performance – I make no claim that this experiments repository will be in a usable state when/if you choose to clone it. In fact I know right now that the pimplvector stuff is completely broken.
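
As a very rough sketch of that flyweight twist (flyweight_pool and canonical are illustrative names, not code from the linked repository, and impl stands in for the nested object::impl):

#include <map>
#include <memory>

struct impl { int i = 0; };

struct flyweight_pool
{
    // Map from the value's identity (here just the int) to the pooled impl.
    std::map<int, std::weak_ptr<impl>> pooled;

    // Return an existing impl with the same identity if one is pooled,
    // otherwise pool the candidate and hand it back.
    std::shared_ptr<impl> canonical(std::shared_ptr<impl> candidate)
    {
        std::shared_ptr<impl> existing = pooled[candidate->i].lock();
        if (existing) return existing;
        pooled[candidate->i] = candidate;
        return candidate;
    }
};

// A mutating member would then end with something like:
//   pimpl_ = pool.canonical(pimpl_);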

Final, final note: This idiom/pattern is indeed thread safe, but only if values are not shared among threads. So long as each thread has its own “copy” of the value you need not worry about any race conditions. This makes the idiom easy to implement and use. If, on the other hand, you need to share a single copy of a value between multiple threads, then you need to add further synchronization. This is not really unusual and so I don’t consider it a flaw, but you do need to be aware of it, because of course bad things will happen if you use it incorrectly and cause a race condition.

Beginning Template Meta-Programming: Introducing meta-functions

Posted in Template Metaprogramming on June 28, 2013 by Crazy Eddie

I intended this to be an article about “arbitrary binding” in template meta-programming, but I found that there was much I needed to introduce in order to do so and the article was becoming quite lengthy without actually providing much.  Thus I’ve decided to begin a series on template meta-programming that will introduce the topic and expand into more complex subjects, partial application being one of them.

Some history – expression templates

Meta-programming in C++ was pretty much a total accident.  The template system was defined without meta-programming in mind.  For many years of C++ development (and unfortunately it’s still not as rare as it should be) developers were afraid of templates and in some cases avoided them entirely.  More than one team has had coding standards dictating that no developer use templates.  The problem was that many compilers did not implement templates very well, and in fact there are still issues with some of the more commonly used compilers.  If you choose to go down the road of becoming proficient at Template Meta-Programming (TMP) you are sure to run into a few compiler bugs.

Even so, some brave souls embraced templates and came up with some very interesting constructs.  The STL is but one of the more fascinating examples.  Many of the ideas introduced there indicated that there was much potential for expressive power in templates.  One very early example of a technique that hinted at TMP is expression templates.  If you search the Internet for this term you’ll find a great many excellent articles and papers.  Expression templates are a powerful concept that can be used to reduce the number of temporaries created during calculations on complex objects such as matrices.  The idea is that operators, such as addition, don’t actually perform the addition at all but instead create an object that implements the functions that perform that task.  This hints at lazy evaluation, except that (until C++11 changed the meaning of auto) nobody in their right mind would want to declare the type necessary to store that lazy evaluator in a variable.  Instead, the act of assignment or access performs all the calculations in the complete expression, leaving out whatever is unnecessary, encoding the evaluation of each cell into a single expression, and eliminating the need for temporary objects.  What follows is a very brief example (the rest of the operators are left as an exercise):

// represents a delayed addition operation.
// Represents a result but isn't itself one.
template < typename Left
         , typename Right >
struct matrix_addition
{
    // Hold the operands by their own (possibly expression) types so that
    // nested expressions like m1 + m2 + result stay lazy instead of
    // collapsing into temporary expr_matrix objects.
    matrix_addition( Left const& lh
                   , Right const& rh)
        : m1(lh)
        , m2(rh)
    {}

    int get_value(int x, int y) const
    {
        return m1.get_value(x,y) 
             + m2.get_value(x,y)
        ;
    }

    operator expr_matrix() const // perform the addition and return the result.
    {
        expr_matrix tmp;
        for (size_t i = 0; i < 3; ++i)
        {
            for (size_t j = 0; j < 3; ++j)
            {
                tmp.assign_value(i,j, get_value(i,j));
            }
        }
        return tmp;
    }

private:
    Left const& m1;
    Right const& m2;
};

// overload '+' to return an addition result representation.
template < typename Left
         , typename Right >
matrix_addition<Left,Right> operator + (Left const& lh, Right const& rh)
{
    return matrix_addition<Left,Right>(lh,rh);
}

BOOST_AUTO_TEST_CASE(check_expr_approach)
{
    expr_matrix m1;
    expr_matrix m2;

    fill_test_matrices(m1,m2);

    // Creates a full result through implicit casting.
    expr_matrix result = m1 + m2;

    check_test_matrices(m1,m2,result);

    // The only additions actually performed here are for [0,0].  None of the others are evaluated;
    // no complete summation of the matrices involved is ever done.
    BOOST_CHECK((m1 + m2).get_value(0,0) * 2 == (m1 + m2 + result).get_value(0,0));
}

Expression templates are an example of an embedded Domain Specific Language (DSL). They turn expressions involving the matched types and operator overloads into a set of recursively defined types that represent an abstract syntax tree.

Our first “pseudo” meta-function

One of the first examples of a metafunction generally introduced is the factorial function.  I’m going to call this a pseudo metafunction because it doesn’t match any defined protocol for creating a consistent language for template metaprogramming.  As with functional programming in general, this compile-time factorial calculator is recursive in nature.  In something like Haskell or Scala you would use pattern matching on the argument to invoke a different calculation depending on whether the argument is the general case or the base case.  In TMP you do this with template specialization:

template < size_t I > // general case
struct pre_tmp_fact
{
    // recursively invoke pre_tmp_fact
    // and multiply the result with the current value.
    enum { value = I * pre_tmp_fact<I-1>::value };
};

template < > // base case specialization
struct pre_tmp_fact<0>
{
    // 0! == 1
    enum { value = 1 };
};

BOOST_AUTO_TEST_CASE(first_example)
{
    BOOST_CHECK(pre_tmp_fact<5>::value == 5 * 4 * 3 * 2);
}

Defining a protocol

In C++ Template Metaprogramming, Abrahams and Gurtovoy define a protocol for implementing a consistent language for template metaprogramming in C++.  That protocol involves three basic definitions:

Meta-Function
A meta-function is a template that takes types as parameters and ‘returns’ a type in the form of an internal type typedef.
Value Meta-Function
Value meta-functions are metafunctions that represent values. They have all the characteristics of a meta-function and also an internal value member. They are “identity meta-functions” in that they return themselves as their type. If you create instances of them they are generally convertible to the underlying raw value type they represent, with the appropriate value of course.
Meta-Function Class
Since a meta-function is a template rather than a type, it cannot itself be passed as a parameter to another meta-function. Therefore an additional concept, that of the meta-function class, needs to be defined: a type that has an internal meta-function called apply (a sketch follows below).
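
For example, a minimal meta-function and meta-function class might look like this (add_pointer and add_pointer_f are illustrative names, not library code):

// A plain meta-function: a template returning a nested 'type'.
template < typename T >
struct add_pointer { typedef T* type; };

// A meta-function class: an ordinary (non-template) type whose nested
// 'apply' template is the meta-function, so it can itself be passed
// as a template argument to other meta-functions.
struct add_pointer_f
{
    template < typename T >
    struct apply { typedef T* type; };
};

// Usage:
typedef add_pointer<int>::type p1;           // int*
typedef add_pointer_f::apply<int>::type p2;  // int*, invoked through the class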

Our first meta-function

With the above in mind, we can recreate our factorial metafunction using these concepts.  In this article I’ll implement very basic, minimally functional integer-wrapper meta-functions.  In the future I will use the Template Metaprogramming Library (MPL), which is part of the Boost library set.  Building these by hand here is worthwhile for instructional purposes, but it’s generally easier to use what other people have built and tested than to reinvent the wheel.  This time, though, we’ll reinvent the wheel; we just won’t add all the spokes and we won’t true it:


// integer value wrapper metafunction.
template < int I >
struct int_
{
    typedef int_<I> type;
    enum : int { value = I };
};

BOOST_AUTO_TEST_CASE(value_mf)
{
    // Has a value...
    BOOST_CHECK(int_<5>::value == 5);
    // Is an identity metafunction...
    BOOST_CHECK(int_<5>::type::value == 5);
    // This recursion is OK because the compiler won't instantiate 'type' until requested.
    // It can go on forever.
    BOOST_CHECK(int_<5>::type::type::value == 5);
}

// This is not how boost's MPL does it!

// Forward declaration of metafunction
template < typename T >
struct prev;

// Specialize for int_ wrappers.
template < int I >
struct prev<int_<I>>
{
    typedef int_<I-1> type;
    enum { value = int_<I-1>::value };
};

BOOST_AUTO_TEST_CASE(prev_int)
{
    BOOST_CHECK(prev<int_<5>>::value == 4);
}

// General case.
template < typename T >
struct tmp_fact
{
    typedef tmp_fact<T> type;
    enum { value = T::value * tmp_fact<typename prev<T>::type>::value };
};

// Specialization for the base case.
template < >
struct tmp_fact<int_<0>>
    : int_<1> // You can subclass the result sometimes, to be preferred.
{};

BOOST_AUTO_TEST_CASE(fact_check)
{
    BOOST_CHECK(tmp_fact<int_<5>>::value == 5 * 4 * 3 * 2);
}

Moving forward

If you are interested in continued study of template meta-programming in C++ then I suggest purchasing the book, C++ Template Metaprogramming.  I will also be continuing forward with some other instructional material in future blogs, including implementing named parameters and partial evaluation (arbitrary binding) in templates.  This has been but a rudimentary look at the very basics that will be needed by future blog entries.

I might also suggest playing with the above code and trying to improve it. There are a great many ways in which it can be improved. The factorial metafunction, for example, could use tail recursion and could even use subclassing all the way through. It is much easier to read and work on a metafunction that is defined in terms of other metafunctions through simple inheritance, so unless you must do otherwise to leverage lazy evaluation (any time you can avoid explicitly referring to ::type within a metafunction you’re doing lazy evaluation) it should be preferred. Often this requires decomposing more complex metafunctions into smaller, simpler ones, which, as with your usual code, is never a bad thing.
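
As a sketch of what the inheritance-based version might look like, reusing int_ and prev from above (times_ and tmp_fact2 are names I’m introducing here purely for illustration):

// Multiplication of two value wrappers, itself just inheriting its result.
template < typename L, typename R >
struct times_ : int_< L::value * R::value > {};

// Factorial defined purely through inheritance from other metafunctions.
template < typename T >
struct tmp_fact2 : times_< T, tmp_fact2< typename prev<T>::type > > {};

template < >
struct tmp_fact2< int_<0> > : int_<1> {};

BOOST_AUTO_TEST_CASE(fact_check_inherited)
{
    BOOST_CHECK(tmp_fact2< int_<5> >::value == 5 * 4 * 3 * 2);
}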

GotW #91

Posted in C++11, GotW on May 31, 2013 by Crazy Eddie

I can’t seem to post to Herb Sutter’s blog, my comments just never appear and ones I’ve made are gone.  Maybe he has me blocked for some mysterious reason.  So I thought I’d put my answer to the problems here.

Anyone that wants to write solid C++ should be reading his blog.  I suggest subscribing.

So, my answer to GotW 91 is:

#1 – Some extra locking and reference counting will take place. A new small object will be created on the stack. Not much will happen actually. The locking is the greatest concern.

#2 – I don’t believe it matters until threads are introduced. If you pass this to a spawned thread, though, you want to be sure you do so by value. Passing by reference can create a race condition in which you reduce the reference count while the pointer is being used or copied by the other side. When that happens it could be that you’re the last one holding the pointer, and then the other side gets its resource yanked out from under it. This is bad voodoo.

#3 has made me want to come up with a set of smart pointer types that secure and ensure correct ownership semantics. In this case you’d take a pointer (by value) of a kind that specifies how you’ll use it. If it’s only used within the function’s scope then you’d say so with the type. It’s a pretty tough problem though…at least for me.

The answer to the question is one of three (given the types in the question); the corresponding signatures are sketched in code right after the list:

1. Use a reference – when in doubt, use a reference rather than a pointer. If you don’t need a pointer, don’t use one.
2. Use unique_ptr by value – you intend to steal ownership of the pointer and you want this ownership transferred away from client code immediately.
3. The parameter could be null but you don’t want to steal it – use a raw pointer (for now at least…if I ever solve the above idea then you may have a smart pointer to use instead).
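
Here is a sketch of what those three choices look like as signatures (widget and the function names are purely illustrative):

#include <memory>

struct widget { void poke() {} };

// 1. No ownership involved, never null: take a reference.
void use(widget& w) { w.poke(); }

// 2. Takes ownership: take unique_ptr by value; the caller must std::move it in.
void consume(std::unique_ptr<widget> w) { w->poke(); /* the widget dies when w goes out of scope */ }

// 3. Possibly null, no ownership taken: take a raw pointer and check it.
void maybe_use(widget* w) { if (w) w->poke(); }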

A case against naming conventions based on type

Posted in Uncategorized with tags on May 17, 2013 by Crazy Eddie

Many places I have worked have naming conventions that are, in my opinion, poorly conceived.  Naming conventions really should be limited to deciding between camel case, Pascal case, and simply, “Give it a name that explains what it is supposed to be for.”  Unfortunately a lot of places go much further and do things like putting ‘m_’ before member variables or having different naming conventions for different classes of entities (Pascal for classes, underscores for functions, camel for variables, _t after typedefs, etc…).  These kinds of conventions miss the point of generic and object oriented programming, and thus largely miss the point of C++.

The issue is that what kind of thing an entity is doesn’t really matter so long as a certain interface is obeyed.  Take for example putting the “_t” postfix on typedefs, which many developers and firms do.  What happens when someone comes along and decides that this type would be better implemented as a class of its own rather than a typedef of something else?  Since typedefs in C++ are weak, this is neither a surprising nor an especially rare thing to do.  Now, in order to keep from breaking clients everywhere, we need to follow the convention for naming classes and then add a typedef so that the “_t” name still exists.  This is silly.

Another example comes up with regard to functors.  Functors are classes whose instances look and behave like functions that may carry state.  If you’ve specified that variables must be named one way and functions another, then you’ve got a problem here, because conceptually such a variable is both a variable and a function!  If you replace a function with a functor someplace then again you are in the unfortunate position of having to violate the naming conventions or do something silly like write a function that calls the functor just so both names exist.
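
A tiny illustration (the names here are made up):

// A functor: an object that is called like a function but carries state.
struct bigger_than
{
    int limit;
    bool operator()(int value) const { return value > limit; }
};

// Is this a variable or a function? Conceptually it is both, so a rule that
// names variables one way and functions another cannot be satisfied cleanly.
bigger_than is_large = { 10 };
bool b = is_large(42); // true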

The C++ standard library has a pretty good C++ naming convention.  It uses one convention for almost everything with the caveats that template parameters are named differently (and this is important because of dependent name issues) and so are macros.  Both of these things require special consideration when using so it’s worthwhile to make them stand out.

In C++ types are strong at the time of compilation, but in the source they are interchangeable to a great degree.  So long as something obeys an interface, calling code shouldn’t have to change when you completely change the type some entity is.  Naming conventions that force you to change names based on type hinder this ideal.

CUDA without Visual Studio

Posted in Uncategorized on February 11, 2013 by Crazy Eddie

So I have a solid-state drive that I put my system on. It is only 55G in size and Windows 7 itself takes up over half of that, and that’s before installing ANYTHING. Visual Studio takes up around 9G of drive space, and though you can ask it to install on another drive, that only moves about 1G…the rest INSISTS on being on your C drive.

I wanted to try out CUDA development though, and I don’t currently have a Linux install in anything but a VM (that will be remedied eventually). The CUDA development kit requires VS though.

It is possible, though, to get just the VS compiler by downloading the Windows SDK. The SDK can mostly be installed on another drive. There is still stuff that insists on being on your system drive, but it’s more like .5G rather than 6+. Even stuff that VS insists on putting there, and nowhere else, can be put on another drive this way. Of course, you won’t have the debugger or the IDE…but in a pinch it works.

The next thing I had to fight with was getting the CUDA compiler to recognize the configuration. I got a weird error saying it couldn’t find configuration “null” in the amd64 directory within bin of the SDK install. That is easily fixed by creating a file called “vsvars64.bat” in that directory containing a single line: call drive:\path_to\SetEnv.cmd /x64. The ‘path_to’ part will depend on where you installed the SDK.

Then, for some reason, you have to close that cmd window (I’m talking about the “Windows SDK Command Prompt” from the start menu) and start it again. After this you can set the PATH to include your cudapath\bin and cudapath\open64\bin. You need to be able to run `nvcc` and `nvopencc`.

Once all this was done I was able to compile a basic cuda program from the command line with: `nvcc --cl-version 2010 --use-local-env file.cu`.

This took hours of pain, Google research and forum posting. Hopefully the next person in my shoes will find this and it will help. I look forward to hearing success stories and otherwise.

“Universal design principle”

Posted in Uncategorized on January 17, 2013 by Crazy Eddie

In this video the presenter associates what he calls the Universal design principle to Scott Meyers. He also attributes him with the name. I can’t find another source on this but I have seen the principle passed around by others such as Robert C. Martin. The principle is, “Make interfaces hard to use incorrectly and easy to use correctly.” The presenter had a picture of a pestle and cup; the idea being that it’s really easy to use and can’t be used any other way.

That got me thinking and for the first time I think I believe that this principle is the wrong way to think about it. Why? Because being the kind of gutter thinker I am I immediately thought of other uses for that pestle…some places it could be stuck. I suppose this would be considered the “wrong” use and so it doesn’t follow the principle but wait!

What if you WANT to use it that way?

So I think that perhaps a more useful, a more defensive way of thinking about this principle is to say that, “The use of an interface in any way that works is ‘correct’.” In other words, regardless of the intention of the interface designer, if it can be used in a particular way then it’s a correct use of that interface. It then follows that it’s the duty of the designer to make sure that his or her intentions MUST be followed because they’re the only use that the interface supports.

We then get a sort of inversion of the principle that I think more clearly defines the intention of the principle: to make sure your interfaces are adequately specified and that implementations protect themselves from violations of that interface…BUT…your implementation must also handle correctly any correct use your interface allows. That way if someone decides they want to stick it where the sun doesn’t shine…then either it should not fit, or it at least shouldn’t break.

SFINAE — but sometimes it is.

Posted in Uncategorized on January 1, 2013 by Crazy Eddie

As I was writing up the code for my last article on bind expressions I actually learned something I had not expected and did not know before. As I often say, “Teaching is one of the best ways to learn,” and well, here’s a great example.

When I initially wrote the bind helper function I only had the one set:

template < typename Fun >
binder<typename return_of<Fun>::type, Fun, empty_list> bind(Fun f);

// further definitions for 1, 2, and 3 argument functions...

I then had a helper metafunction that would derive the return type:

template < typename T >
struct return_of { typedef typename T::type type; };

// one of many...
template < typename R >
struct return_of<R(*)()> { typedef R type; };

This worked just fine for function pointers and for functors providing the nested type definition that return_of looks for. The problem came later when I decided to add member function compatibility. That requires building an extra object to adapt the member-pointer calling syntax to ordinary function-call syntax. Here is one such function overload:

template < typename R
         , typename P1, typename A1
         , typename P2, typename A2 >
binder_t< R
        , mf1<R,P1,P2>
        , boost::fusion::vector<A1,A2> >
bind(R(P1::*f)(P2),A1 a1, A2 a2)
{
	typedef mf1<R,P1,P2> F;
	typedef boost::fusion::vector<A1,A2> L1;
	return binder_t<R, F, L1>(F(f), L1(a1,a2));
}

The assumption I had at this point, before compiling and after writing 20 lines of repetitive overloads, was that the substitution of the member function pointer type into the previous overloads would cause a failure that the compiler would ignore because of SFINAE. Yeah…that’s not how it works.

There are a couple of ways to consider why SFINAE will not apply here. The first is to look at 14.8.2 in the standard, note that the deduction failures listed there are the only ones SFINAE can catch, and see that what’s going on here, instantiating something within another template that causes an error, is not in that list.

The more direct reason, though, is this: during the deduction of the types for the bind function we are instantiating another template. That template goes through its own deduction process and passes! There is no syntactic error caused by substituting a member function pointer into the template parameters of return_of. Deduction is finished at that point and that template can be instantiated. Remember also that in C++, templates are only instantiated as they are needed; it is perfectly legal to instantiate a template class that has members that would cause errors so long as you never use those members. So that process is finished and we are no longer within the realm where SFINAE even applies with regard to return_of. We then name type within this template, and the name exists, so again we can’t trigger SFINAE. Now the instantiation of that type member happens and of course it results in the gibberish R(T::*)(??)::type. That is an ill-formed piece of code, and we are well past the point of SFINAE, which happens only during deduction.
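
A small illustration of that lazy-instantiation rule (lazy and would_break are names I’m making up here):

template < typename T >
struct lazy
{
    // This member is only instantiated if somebody actually calls it.
    void would_break() { typedef typename T::type nested; }
};

lazy<int> ok; // fine: int has no nested 'type', but would_break() is never instantiated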

There are a few ways you can fix this issue. The first is what I chose in the article code: use T::type directly in the signature of my function so that it’s discovered during deduction, where “substitution” actually occurs. This of course required that I write alternate overloads for function pointers.

Another way is to write return_of so that it won’t try to define a nested type member when T doesn’t have one:

#include <boost/mpl/has_xxx.hpp>

BOOST_MPL_HAS_XXX_TRAIT_DEF(type); // defines has_type<T>

template < typename T, bool = has_type<T>::value >
struct return_of {};

// Partial specialization selected only when T has a nested 'type'.
template < typename T >
struct return_of<T, true> { typedef typename T::type type; };

Now what happens in the original definition of bind is that the substitution going on is return_of<T>, and then we try to access type within the result of that substitution, which fails. This is a substitution failure and it is not an error; the overload is simply discarded from the candidate set.

Finally, one more way to fix this issue is to force the lookup of T::type to happen while return_of’s template parameters are being substituted, by making it a default template argument:

template < typename T, typename Type = typename T::type>
struct return_of { typedef Type type; };

Now the substitution failure happens within the return_of substitution, which must succeed in order to finish deducing bind. This substitution failure is also “not an error”, and the whole works is discarded as a viable overload for bind.

It’s an interesting little gotcha. Some people I’ve talked to claim that what I did was legal in C++03, but only due to unclear language in the standard that has since been clarified. Others claim that the standard already defined what I first did as ill-formed. I tend to believe the latter, as it seems to coincide with my interpretation, but either way you certainly cannot depend on it working, and if it does you should treat that as a fluke. If you have code that relies on this working, get started on replacing it with whichever of the above best fits the problem. If you’re just learning how to use SFINAE to do type introspection and function overloading…keep this in mind and you’ll be better armed with an understanding of the mechanism.

Until next time, happy coding!
