Does Karl Popper have anything to say on testing?

Posted in Uncategorized on May 24, 2012 by Crazy Eddie

I have noticed a tendency among many to hold what I consider a misunderstanding about testing, especially unit testing. For example, when I brought up the topic of testing critical sections, the first response and many after the fact seemed to focus on an inability to prove, through testing, that a multi-threaded component contains no bugs. Being the empiricist that I am, I tend to find this kind of response and this assumption about testing to be extremely questionable at best.

In some respects, and at a very theoretical level, computer programming is an a priori application of logic. Functions are mathematical constructs of pure logic that at least technically can be proven to be correct or incorrect through the rigorous application of predicate logic. In the older days I gather that this was the primary view of software development and programmers proved their code from start to finish.

Nobody does this today; nobody sane anyway. There are still a few adherents to the idea of proven code but instead of proving an entire program they prove individual functions. This seems a lot more realistic and practical, but it still runs into a fundamental problem with a priori understanding: the outcome of your proof is dictated more by your understanding of the problem than by the reality of it.

Almost all developers in today’s landscape have learned that testing a product before releasing it is an absolute requirement. Many have also learned that you need to go even further than this and test each individual module or unit within the product as well. This is, however, a completely different approach from proving and would be utterly unnecessary if proving were possible or practical. Modern software engineering, though, has become way too complex to support the idea of proof as a practical exercise, and the difficulty in predicting behavior in parallel processing has perhaps made the idea ridiculous as well.

We don’t need to assert the impracticality or impossibility of proving a program at this point though. All we need to consider is how different proof and testing are as approaches, and to realize that while one method is entirely philosophical in its application, the other is empirical and scientific. Although math is considered a science by many, it is actually more in line with philosophical thinking, and mathematical formulas, no matter how correct they seem, must always be checked against the real world to make sure the universe they model is the one we live in.

If proving your code is the philosophical approach then testing is the empirical one, the scientific one. I believe it worth reviewing then what this means in other sciences and what the philosophy of science itself says on the matter. Unlike what Lawrence Krauss might say, I think the philosophy of science has done a great deal to improve the practice of science itself, especially the views of Karl Popper, who did much to hone science into the workhorse it is today.

The definitive work on modern science and how it differs from pseudo-sciences like astrology was laid out in Popper’s article, Science as Falsification. Through it and other works he explains how science increases human knowledge not by showing what is true, but by showing what is false. We know that bricks of gold don’t fly not because we’ve proven that they don’t…but because we’ve shown that they do not. Relativity is viewed as true not because it’s been proven so, but because it’s the only theory that has not been proven false yet under most conditions. When a scientist comes up with a new theory it’s because they’re showing some previous theory wrong. When they test the theory they don’t go looking for ways to validate it, they look for ways to falsify it and if that cannot be done then the theory stands until it can be. This is the main reason why unfalsifiable statements are not scientific and I would go further to assert that they’re nonsense anyway.

This is the view that people should have toward software testing. When you’re writing your tests, be they integration, unit, or some other level of testing, you should focus not on proving that your code works but on showing that it doesn’t NOT work. Your code is your theory about how to solve a particular problem; your tests should attempt to falsify that theory. You should not view your tests as validating or justifying your code, but as a continuing effort to fix code that fails to work in some way.
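A minimal sketch of what a falsification-minded test might look like follows. The `midpoint` function and the cases chosen are hypothetical, made up purely for illustration. Each assertion goes after a spot where this kind of code commonly breaks, rather than picking comfortable inputs that merely confirm the happy path.

```cpp
#include <cassert>
#include <climits>

// Hypothetical "theory": the midpoint of lo and hi, assuming 0 <= lo <= hi.
// The naive (lo + hi) / 2 would overflow for large inputs; this form does not.
int midpoint(int lo, int hi)
{
    return lo + (hi - lo) / 2;
}

// Each assertion is an attempt to break the theory, not to celebrate it.
void try_to_falsify_midpoint()
{
    assert(midpoint(0, 0) == 0);                            // degenerate range
    assert(midpoint(0, 1) == 0);                            // rounding direction
    assert(midpoint(INT_MAX - 1, INT_MAX) == INT_MAX - 1);  // overflow territory
    assert(midpoint(0, INT_MAX) == INT_MAX / 2);            // widest range
}
```

If all of these pass, nothing has been proven; the theory has only survived the attempts made so far.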

With this in mind then it would become absurd to claim that tests prove code works or that tested code contains no bugs just like it is absurd to claim that any one of our scientific theories represents the whole and entire truth. Science works better than anything else for the very reason that it does NOT work in proofs nor make grandiose claims of absolute knowledge. Science works because the exercise of filtering away bad ideas, mistaken assumptions, and incomplete understanding sharpens our ability to manipulate the universe toward our own ends. Software development, as both a science and discipline of engineering, can and should be approached with the same understanding.

Arbitrary binding pt1 – What is boost::bind and why is it cool?

Posted in C++, Template Metaprogramming on April 22, 2012 by Crazy Eddie

The STL has always been a rather brilliant piece of generic coding. The principles applied created a rather wonderful set of tools that could be used for a variety of conditions without losing type safety and without introducing unnecessary coupling by forcing inheritance. The ability to write generic algorithms that work on any type matching a given concept really adds to the expressiveness and power of C++. Until the advent of boost::bind though, the actual, practical use of these generic algorithms was too painful to bother with.

To understand what I mean by this claim, consider the following program that finds all employees that make at least 50k a year:

struct employee
{
    int salary() const;
private:
  // ...
};

// Write special function just to compare salary because binds don't nest:
bool makes_less_than(employee const& e, int amount)
{
    return e.salary() < amount;
}

std::vector<employee> find_all_that_make_50k(std::vector<employee> const& employees)
{
    std::vector<employee> rval;

    // bind2nd fixes the second parameter (amount); the employee stays free.
    std::remove_copy_if(employees.begin(), employees.end(), std::back_inserter(rval), std::bind2nd(std::ptr_fun(makes_less_than), 50000));

    return rval;
}

In reality you would either write a functor that does exactly what you need, inheriting from std::unary_function, or you’d just use a for loop because use of all these binders can get really ugly really fast. Compare the above to the alternative version you can do with boost::bind or with the new std::bind.


std::vector<employee> find_all_that_make_50k(std::vector<employee> const& employees)
{
    std::vector<employee> rval;
 
    // standard version...
    std::remove_copy_if(employees.begin(), employees.end(), back_inserter(rval), bind(less<int>(), bind(&employee::salary, _1), 50000));

    // boost version...
    std::remove_copy_if(employees.begin(), employees.end(), back_inserter(rval), bind(&employee::salary, _1) < 50000);

    return rval;
}
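For anyone who wants to compile the nested-bind version on its own, here is a self-contained sketch using C++11 `std::bind`. The `employee` constructor and the `salary_` member are assumptions added only so the example can be built; the original type hides its details.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical stand-in for the employee type; the constructor is an assumption.
struct employee
{
    explicit employee(int s) : salary_(s) {}
    int salary() const { return salary_; }
private:
    int salary_;
};

std::vector<employee> find_all_that_make_50k(std::vector<employee> const& employees)
{
    using namespace std::placeholders;

    std::vector<employee> rval;

    // Inner bind: e -> e.salary(). Outer bind: s -> s < 50000.
    // remove_copy_if copies every element for which the predicate is false,
    // so employees making 50000 or more end up in rval.
    std::remove_copy_if(employees.begin(), employees.end(),
                        std::back_inserter(rval),
                        std::bind(std::less<int>(),
                                  std::bind(&employee::salary, _1),
                                  50000));
    return rval;
}
```

The nested bind expression is evaluated with the same arguments as the outer one, which is what lets the salary lookup and the comparison compose without a hand-written helper.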

This bind construct is even more interesting than this. Say I have an object that will call a function when something happens, and I want it to call a method on an object that only takes some of the arguments the other object supplies. In other words, consider what I would do here in this C++03 code:

struct does_something
{
    void set_callback(void (*)(int, double));
};

struct can_respond
{
    void functionA(double);
};

There’s really no reasonable way for me to bind those two things. The best kind of thing one might come up with is to use the observer pattern:

struct does_something_observer
{
    virtual ~does_something_observer() {}
    virtual void respond(int, double) = 0;
};

struct does_something
{
    void set_observer(does_something_observer *);
};

struct can_respond : does_something_observer
{
    void function(double);
    void respond(int, double d) { function(d); }
};

This option is suboptimal for three reasons:

  1. It requires we couple can_respond to part of the interface of does_something by inheriting from its observer
  2. We can’t change constness. If we make the callback function const, then subclasses that change in response have to cast it away (dangerous). If we make it non-const then we can’t use const observers.
  3. We force can_respond to implement a signature it’s not interested in; it doesn’t care about the int

Furthermore, most observers have many signatures in them…which is why the Java API has “Adapters” you can inherit from when you only want to listen to one part of the signal interface. Not using this adapter requires that you implement all the functions in the observer interface even if only to make them empty. It’s just a bunch of extra garbage code that fills up the brain (because it has to read it and know why these empty functions are there) without doing anything.

However, when we add an arbitrary function object type like boost::function (which I describe on my old blog; go to “W0t” and follow there) we can get rid of all of these problems:

struct does_something
{
    void set_responder(function<void(int,double)>);
};

struct can_respond
{
    void function(double);
};

does_something ds;
can_respond cr;

ds.set_responder(bind(&can_respond::function, ref(cr), _2));

The great thing here is that we’ve created a 2 argument function object out of a binding that will only use 1 of those arguments to call a function that only accepts one of them. The binder object created by bind can take an arbitrary number of arguments (up to an implementation-defined limit) and they can be anything. Only upon being called (well, when the call has to be compiled) does the system check whether or not the target function can take the arguments supplied, whether they match types, or whether they can be converted and how to do it. Basically, when you assign the binder to the function object by calling set_responder it will attempt to compile operator () with the two arguments that function object’s signature represents. This then will decide what happens.
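The same idea can be sketched as a complete, compilable unit with the C++11 standard equivalents, `std::function` and `std::bind`. The `something_happened` trigger, the `last` member, and the `connect` helper are hypothetical additions that exist only to make the effect observable.

```cpp
#include <functional>

struct can_respond
{
    can_respond() : last(0.0) {}
    void function(double d) { last = d; }
    double last; // records the most recent argument, for demonstration only
};

struct does_something
{
    void set_responder(std::function<void(int, double)> f) { responder_ = f; }

    // Hypothetical trigger: fires the stored callback with both arguments.
    void something_happened(int i, double d)
    {
        if (responder_) responder_(i, d);
    }

private:
    std::function<void(int, double)> responder_;
};

void connect(does_something& ds, can_respond& cr)
{
    using namespace std::placeholders;
    // _2 forwards only the second argument (the double); the int is dropped.
    // std::ref keeps the binder pointing at cr instead of storing a copy.
    ds.set_responder(std::bind(&can_respond::function, std::ref(cr), _2));
}
```

After `connect(ds, cr)`, calling `ds.something_happened(7, 2.5)` reaches `cr.function(2.5)`, and the `7` is silently ignored. Because the binder holds a reference, `cr` has to outlive `ds` or the callback will dangle.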

Using these concepts and adding some loop processing and connection management, the boost::signals and signals2 libraries create a signal/slot mechanism that really surpasses any of the alternatives in its expressive power and in helping keep the coupling to a minimum. Unlike any other event marshaling system, which all require adherence to some specific signature and/or inheritance from a particular object, this system allows you to connect events to any object that can respond to them…ignoring or accepting any information provided by the event source.

THAT is pretty darn cool and brilliantly expressive. It wouldn’t be even half as great without the arbitrary binder. How was this binder created? Stay tuned for the next episode of “Arbitrary Binding”: “How to write your own arbitrary binder”

Being friendly without being a slut

Posted in Uncategorized on October 6, 2011 by Crazy Eddie

So you’ve got some sort of problem that you cannot solve without friendship. Being the design oriented developer you are you know that this level of coupling is pretty major and best to be avoided, but you simply can’t avoid it; you absolutely need access to some internal functionality to be exposed to special classes. Knowing how bad it is, and not wanting to encourage developers working on your friends to start molesting you freely, you want to show just enough skin to get the job done. But C++ doesn’t provide any way to do this… or does it?

#include <iostream>

struct almost_friendly
{
  struct funktor
  {
    funktor(almost_friendly * t) : this_(t) {}

  private:
    void operator() ()
    {
      this_->something_private();
    }
    almost_friendly * this_;

    friend struct has_special_access;
  };

  funktor fun;

  almost_friendly() : fun(this) {}

private:
  void something_private() { std::cout << "Only has_special_access can call this.\n"; }
};

struct has_special_access
{
  void fun() { almost_friendly().fun(); }
};

int main() { has_special_access().fun(); }

MSVC bug that’s constantly getting me

Posted in C++, cpp, Rant on April 20, 2011 by Crazy Eddie

This is partially a rant and partially instructional. I’ve just gotten bitten by this compiler bug, AGAIN, and I’m a little frustrated with it because it took a good 2 hours to find (involved template instantiations and the damn compiler never showed me the actual line of code trying to instantiate the template). As is the case with most MSVC bugs, this one involves templates.

Consider this code. What output would you expect?

#include <iostream>

struct base
{
  template < typename T >
  base(T const&)
  {
    std::cout << "templated constructor\n";
  }

  base(base const&)
  {
    std::cout << "copy constructor\n";
  }
  base()
  {
    std::cout << "default constructor\n";
  }
};

struct derived
  : base
{
};

int main()
{
  derived d;
  derived d2(d);
}

If you expect the construction of ‘d’ to cause “default constructor” and the construction of ‘d2’ to output “copy constructor”, you’d be absolutely right to expect that. This is supposed to be how it works. A template constructor can never be a copy constructor and the copy constructor of a derived class is supposed to defer to the *copy* constructor of the base before building its own bits. Unfortunately, though you’re right to expect this behavior, you’d be absolutely wrong about it actually happening because MSVC is stupid here.

What you’re actually going to get from that code is a call to the template constructor in base due to the copying of a derived. The MSVC compiler simply passes the derived type up the chain. When it hits a template constructor it says, “Hey, I’ve got a static type of ‘derived’ here, which matches T better than it does ‘base const&’.” BAD, MSVC, BAD!

The workaround is to write a user-defined copy constructor and explicitly perform the cast that the compiler should be doing to begin with:

struct derived
  : base
{
  derived() : base() {} // need this now too.
  derived(derived const& other) 
    : base(static_cast<base const&>(other))
  {}
};

If you can keep this workaround in memory at all times then you’ll be safe. If you’re like me and don’t instantly think of every workaround you need to insert into your code every time you’re going to run into a well-known compiler bug…you might spend more than a short period of time trying to figure out WTF called the template constructor when you know nobody did. Well, someone did….your broken compiler did.

If, as happened to me, the compiler refuses to tell you what bit of code is calling the template (because there actually isn’t any, though it would be nice to see, “during compilation of default created copy constructor for…,” like GCC often does) you might declare a private copy constructor in anything that derives from said object and see if you get complaints about having no access rather than long tirades of template vomit leading to dead ends. I just did this and ran into a couple places where objects I didn’t think were ever copied were being copied.
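The private-copy-constructor probe described above might look roughly like this. It is a temporary diagnostic, not a fix, and the `static_assert` at the end assumes a C++11 compiler with `<type_traits>`; it is only there to show that outside code can no longer copy the type.

```cpp
#include <iostream>
#include <type_traits>

struct base
{
  template < typename T >
  base(T const&) { std::cout << "templated constructor\n"; }
  base(base const&) { std::cout << "copy constructor\n"; }
  base() { std::cout << "default constructor\n"; }
};

struct derived
  : base
{
  derived() : base() {}

private:
  // Diagnostic only: declared but never defined, so any attempt to copy a
  // derived now fails with an access error that names the offending line.
  derived(derived const&);
};

// With the probe in place the type is no longer copyable from outside.
static_assert(!std::is_copy_constructible<derived>::value,
              "copies of derived should be rejected while the probe is in place");
```

Once the unexpected copies have been tracked down, the private declaration would be swapped for the real workaround constructor.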

I can’t currently be bothered to find the link, but this has been posted to MS connect. They’ve known about it for a long time. It’s probably not ever going to be fixed. It’s been around at least as far back as 2005 and the above code exhibits the same problem in 2010.

Lambda capture and object lifetime

Posted in Uncategorized on April 6, 2011 by Crazy Eddie

Someone brought up an issue in comp.lang.c++ today that might be unexpected enough to some that it warrants discussion. Consider the following code:

#include <functional>
struct up_chuck
{
  int x;

  std::function<int()> get_fun() const
  {
    return [=](){return x;};
  }
};

int main()
{
  auto fun = up_chuck().get_fun();
  
  fun();
}

What do you expect to happen here? If you expect the lambda clause to capture a copy of ‘x’ and create a function that returns it then you’d be quite unfortunately wrong. The issue here is that member variables are not captured directly; the variable that is actually captured is ‘this’! What this means of course is that the lambda captures the ‘this’ pointer by value, making a copy of it, and then using that pointer to access ‘x’ and return it. We can prove this with a different main function:

#include <iostream>
int main()
{
  up_chuck uc;
  uc.x = 0;

  auto fun = uc.get_fun();

  std::cout << fun() << std::endl;
  uc.x = 42;
  std::cout << fun() << std::endl;
}

If you expect that the output of this program should be “0\n0\n” then I suggest you run it and verify that you’ll NOT get that output.

So then, what will happen to our original program if we compile and run it? Well, since the lambda makes a copy of the ‘this’ pointer and accesses its x variable to return it, the behavior of the program is undefined. The object that ‘this’ points to is a temporary that is destroyed as soon as the line that creates the ‘fun’ variable is complete, so the function object stored in ‘fun’ ends up using a dangling pointer.

So what can we do in order to avoid running into this problem? The one thing I can think of is to never, ever use default capture clauses in your lambdas. If we’d written our lambda and attempted to capture ‘x’ by value it would become clear that we’ve got a problem. Then we could assign that value locally and capture that new variable and everything would work fine. If we really do need to capture ‘this’ then we do so explicitly and we know that we’re going to have lifetime issues because we’ve captured a pointer. Thus never using the default capture syntax can at least point out some possible problems to us.
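A minimal sketch of the "assign locally, then capture explicitly" approach follows. The local name `x_copy` and the `make_fun` helper are hypothetical; they exist only to show that the returned function no longer depends on the lifetime of the object it came from.

```cpp
#include <functional>

struct up_chuck
{
  int x;

  std::function<int()> get_fun() const
  {
    int x_copy = x;                        // copy the member into a local
    return [x_copy]() { return x_copy; };  // explicit capture by value
  }
};

// Returns a callable that outlives the object it was created from.
std::function<int()> make_fun(int value)
{
  up_chuck uc;
  uc.x = value;
  return uc.get_fun(); // safe: the lambda owns its own int, not 'this'
}
```

With this version, changing `uc.x` after calling `get_fun()` has no effect on what the stored function returns. In C++14 and later an init-capture such as `[x = x]` expresses the same thing without the extra local.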

Name resolution and overloading.

Posted in Uncategorized on March 28, 2011 by Crazy Eddie

In C++ we’re allowed to use the same name for functions that take different parameter types and/or of different arity. For example:

void f();
void f(int);
void f(std::string);

The three versions of `f` above can coexist in the same program, within the same namespace or class, and don’t break the one-definition rule. They are different functions. This fact can often lead us to begin assuming that parameters are part of the function name. However, in C++ this is not the case and this assumption can often lead us to write code that does things we do not expect.

The truth of the matter is that all of the `f` functions have the same name: f. In most cases this detail can be ignored but sometimes it becomes very important. Consider for example this code:

struct type0 {};
struct type1 {};

struct base0 { virtual int get(type0) const = 0; };
struct base1 { virtual double get(type1) const = 0; };

struct test_object_ : base0,base1 {};
struct test_object : test_object_
{
  int get(type0) const { return 5; }
  double get(type1) const { return 42.666; }
};

int main()
{
  test_object o;
  test_object_ * op = &o;

  int x = o.get(type0());  // 1
  int y = op->get(type0()); // 2
} 

Can we expect that code to compile? The answer actually is no, it will not. The reason this code will not compile is that name resolution happens before overload resolution, and the scope of a name can make a huge difference as to whether overload resolution even applies. To explain a bit about this, let’s take a look at the line labeled ‘1’:

int x = o.get(type0());

In this case, the name `get` is looked up within the scope of test_object, a class. This means that member name resolution applies and because test_object declares one or more `get` names, the compiler stops there. This is called “hiding” and means that no `get` name within the bases is ever searched for. Now that the name `get` has been resolved to two possible functions within the same class declaration, those two functions are candidates for overload resolution. Since only one version of `get` actually works with the `type0` type, that version of `get` is called.

Next let’s consider the line labeled ‘2’:

int y = op->get(type0());

This case is quite different even though it seems as though it should be exactly the same. In this case the static type of the class being used is not `test_object` but instead `test_object_`, its base. This class doesn’t declare any new names but gets all of its names from base classes. This being the case, `get` is not declared in `test_object_` and so the name resolution rules begin applying for base classes and it is here that things go wrong. Because `base0` and `base1` are unrelated, the name `get` is retrieved from both of these classes. One would think that this set of possible functions would be considered for overload resolution but the standard specifically says that they are not in 10.2/2:

If the resulting set of declarations are not all from sub-objects of the same type, or the set has a nonstatic member and includes members from distinct sub-objects, there is an ambiguity and the program is ill-formed. Otherwise that set is the result of the lookup.

All of this happens before overload resolution and even virtual function lookup! Thus the compiler never eliminates `get(type1)` from the list before barfing an error.
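One common way around the ambiguity, which is not part of the discussion above, is to pull both names into the intermediate class with using-declarations. Lookup then stops in `test_object_` with a single overload set, and overload resolution gets its chance. The virtual destructors and the `call_through_base` helper below are assumptions added for the sketch.

```cpp
struct type0 {};
struct type1 {};

struct base0 { virtual ~base0() {} virtual int get(type0) const = 0; };
struct base1 { virtual ~base1() {} virtual double get(type1) const = 0; };

struct test_object_ : base0, base1
{
  // Bring both names into this scope so lookup stops here with one
  // overload set instead of an ambiguous pair of unrelated bases.
  using base0::get;
  using base1::get;
};

struct test_object : test_object_
{
  int get(type0) const { return 5; }
  double get(type1) const { return 42.666; }
};

int call_through_base(test_object_ const& o)
{
  return o.get(type0()); // overload resolution now picks get(type0)
}
```

The call still dispatches virtually, so passing a `test_object` to `call_through_base` ends up in `test_object::get(type0)`.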

The trouble doesn’t actually end there either. The rules for overload resolution are quite complicated and can work in ways that even moderately advanced developers don’t expect; see “Expected conversion routine to be called”.

Why I hate having to work with Visual Studio

Posted in Rant on March 10, 2011 by Crazy Eddie

The fact that the MS C++ compiler is one of the worst on the market is bad enough. Random crashes, lockups, and intellisense taking complete control while also providing next to no actual benefit…annoys me but I’m used to it. What really bothers me though is its tendency to enter the Twilight Zone.

Today reminded me of this fact yet again. For no apparent reason every editor area in the program (luckily only one instance) has begun to:

* Place the edit cursor wherever you put it whether or not there’s anything there (well past the end of lines in other words)
* Ignore arrows, backspace, delete, etc…
* Pop up some kind of window menu when I type ‘-’.

This is the first time I’ve run into behavior this random, but I quite regularly run into a different kind of random craziness. Probably about 1-2 times a day one of the files I’m editing will randomly enter some strange mode where indentation doesn’t work and pressing ‘:’ puts you at the beginning of the line. Makes typing things out like “std::vector::iterator” more than a little annoying. Luckily simply closing the file and opening it again goes back into normal C++ mode from whatever insane mode it was previously in.
