API, threads, and smart pointers


What are the semantics of smart pointers? How could they affect asynchronous code? Creating an API always requires extra effort aimed at future advantages: ease of use, asynchronous facilities, reusability, SOLID principles, etc.

Problem

In this first entry, I would like to start with a question that is on many minds: “Should I use a raw pointer, a reference, a smart pointer or just a simple local variable on the stack?” The answer depends on what you want to tell whoever uses your API or your code in the future.

There is plenty of documentation about smart pointers (SP), but hardly any of it underlines the semantics behind each one, or how you can use them to build better APIs.

Solution

Tool | Scope | Multi-thread safety | Synchronization needed
T | Local scope | Safe | No
const T | Local scope | Safe | No
T& | Multiple scopes | Unsafe: cannot guarantee that the object is still alive across threads | Yes
T* | Multiple scopes | Unsafe: cannot guarantee that the object is still alive across threads | Yes
const T* | Multiple scopes | Unsafe: cannot guarantee that the object is still alive across threads | No, it is lovely const (as long as nobody writes through a non-const alias)
std::unique_ptr<T> | Local scope (single owner, movable) | Safe: the single owner keeps the object alive, and ownership can be moved to another thread | No
std::shared_ptr<T> | Multiple scopes | Safe: it guarantees that the object is alive | Yes (the reference count is thread-safe, the object is not)
std::shared_ptr<const T> | Multiple scopes | Safe: it guarantees that the object is alive | No, it is lovely const (as long as nobody writes through a std::shared_ptr<T>)
std::weak_ptr<T> | Multiple scopes, without extending lifetime | Safe: the object is guaranteed alive only while you hold the result of std::weak_ptr<T>::lock() | Yes
QSharedDataPointer<T> | Multiple scopes for reading, local scope for writing | Safe: it guarantees that the object is alive | No, a write first makes a new deep copy

TL;DR or Why?

The wild: raw pointers

Raw pointers are not bad per se. One problem is that you could leak memory because you forgot a delete or an unexpected exception was thrown. However, the big issue is that a raw pointer carries no information about its owner, in other words, about who bears the responsibility of deleting it. Your API documentation is the only way to convey this, and developers sometimes do not read or follow it (do you remember whether you must free the string returned by the strerror function?). Let’s look at the following example:

Foo* my_function( int* array, int array_size );

Who owns the returned pointer? Should I delete it after using it, or is it something internal? Raw pointers carry no information about multithreading either, so we must use synchronization tools such as mutexes, semaphores, condition variables…
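To make the ownership question concrete, here is a minimal sketch. Foo, create_foo() and borrow_foo() are hypothetical names invented for illustration: Foo counts its live instances, and both functions return a Foo*, yet the caller must delete one result and must never delete the other. Nothing in the types tells them apart.

```cpp
#include <cassert>

// Hypothetical Foo that counts its live instances.
struct Foo {
    static int alive;
    Foo() { ++alive; }
    ~Foo() { --alive; }
};
int Foo::alive = 0;

// Two hypothetical functions with the same raw-pointer return type.
// Nothing in the signatures says who must call delete.

// Caller owns the result and must delete it.
Foo* create_foo() { return new Foo(); }

// Library owns the result (like strerror's string): do NOT delete it.
Foo* borrow_foo() {
    static Foo instance;
    return &instance;
}

// Uses both correctly and returns how many Foo objects are still alive.
int use_both_correctly() {
    Foo* owned = create_foo();
    Foo* borrowed = borrow_foo();
    assert(borrowed == borrow_foo());  // always the same internal object
    delete owned;                      // forgetting this line would leak
    // deleting `borrowed` here would be undefined behaviour
    return Foo::alive;                 // only the static instance remains
}
```

Deleting the borrowed pointer, or forgetting to delete the owned one, compiles just as happily; only documentation (or discipline) saves you.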

The lone ranger: std::unique_ptr

As you probably know, this smart pointer ensures that the memory is freed when we go out of scope. Our previous function would look something like:

std::unique_ptr<Foo> my_function( int* array, int array_size );

At first glance, future developers of your API will know the scope of the returned object. Moreover, we also raise the exception guarantee to “basic exception safety” (a.k.a. the no-leak guarantee).

But there is more: when we use std::unique_ptr, we also ease the transformation from single-threaded to multi-threaded code, because we guarantee that the object has a single owner (its semantic value). The handicap is clear: std::unique_ptr cannot be copied, so it cannot be shared, and it can only be stored in a standard container by moving it in. But let’s move on to our next guest.
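Here is a minimal sketch of the std::unique_ptr version, assuming a hypothetical Foo that counts its live instances and an invented body for my_function() (it just sums the array). It shows the no-leak guarantee when an exception is thrown, ownership moved into a worker thread, and a std::unique_ptr stored in a std::vector by moving it in.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical Foo that counts its live instances.
struct Foo {
    static int alive;
    int value = 0;
    Foo() { ++alive; }
    ~Foo() { --alive; }
};
int Foo::alive = 0;

// The return type now states "the caller owns the result".
// The body is an assumption: it sums the array into Foo::value.
std::unique_ptr<Foo> my_function(int* array, int array_size) {
    auto foo = std::make_unique<Foo>();
    for (int i = 0; i < array_size; ++i) foo->value += array[i];
    return foo;
}

// An exception after acquisition does not leak: unwinding destroys the unique_ptr.
int alive_after_exception() {
    int data[] = {1, 2, 3};
    try {
        auto foo = my_function(data, 3);
        throw std::runtime_error("unexpected error");
    } catch (const std::runtime_error&) {
    }
    return Foo::alive;  // 0: nothing leaked
}

// Ownership can be moved into a worker thread. Only that thread touches
// the object afterwards, so no mutex is needed.
int doubled_in_worker() {
    int data[] = {1, 2, 3};
    auto foo = my_function(data, 3);  // value == 6
    int result = 0;
    std::thread worker([p = std::move(foo), &result] {
        p->value *= 2;
        result = p->value;
    });
    assert(!foo);  // ownership has left this scope
    worker.join();
    return result;  // 12
}

// std::unique_ptr can live in standard containers; it just has to be moved in.
std::size_t stored_in_vector() {
    int data[] = {7};
    std::vector<std::unique_ptr<Foo>> foos;
    foos.push_back(my_function(data, 1));  // a temporary is moved automatically
    auto extra = my_function(data, 1);
    foos.push_back(std::move(extra));      // a named one needs std::move
    return foos.size();  // 2
}
```

Note that the worker reads and writes Foo without any lock: the move makes the single-owner rule explicit, so there is nobody else to race with.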

As mum said “Sharing is good”: std::shared_ptr.

If you come from garbage-collected languages, you will feel very comfortable using std::shared_ptr. Perhaps you are thinking: “Why isn’t std::shared_ptr used everywhere?” or “Will C++ become as easy as Java?” Well, you know us C++ folks: we don’t like a single solution for everything. In fact, we need to be aware of the extra cost of the internal shared reference counter and of the two memory allocations, one for the object and one for the control block (std::make_shared reduces them to a single allocation, but that is another story).

std::shared_ptr<Foo> my_function( const std::vector<int>& values );
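A minimal sketch of shared ownership across threads, assuming a hypothetical Foo with a mutex-protected counter and an invented body for my_function(). Each thread’s std::shared_ptr copy keeps Foo alive, but std::shared_ptr only makes its own reference counting thread-safe; access to Foo itself still needs the mutex.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical Foo shared between threads. It protects its own counter,
// because std::shared_ptr does not synchronize access to the pointee.
class Foo {
public:
    void add(long v) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_total += v;
    }
    long total() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_total;
    }
private:
    mutable std::mutex m_mutex;
    long m_total = 0;
};

// Same shape as the declaration above; the body is an assumption.
// std::make_shared allocates Foo and the control block in one request.
std::shared_ptr<Foo> my_function(const std::vector<int>& values) {
    auto foo = std::make_shared<Foo>();
    for (int v : values) foo->add(v);
    return foo;
}

// Each worker lambda holds its own copy of the std::shared_ptr, so Foo stays
// alive while any of them runs; the mutex inside Foo prevents data races.
long sum_from_threads(int thread_count) {
    auto foo = my_function({1, 2, 3});  // total starts at 6
    assert(foo.use_count() == 1);       // only this scope owns it so far
    std::vector<std::thread> workers;
    for (int i = 0; i < thread_count; ++i) {
        workers.emplace_back([foo] {
            for (int j = 0; j < 1000; ++j) foo->add(1);
        });
    }
    for (auto& worker : workers) worker.join();
    return foo->total();  // 6 + thread_count * 1000
}
```

Removing the lock_guard lines would still keep Foo alive correctly, but the increments would become a data race.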

Asynchronous use is not easier than with std::unique_ptr, because we are sharing memory, so we will need to protect access using synchronization tools (mutexes, semaphores, etc.). Returning an internal std::shared_ptr member is not as good an idea as you might think. Let’s assume we have the following code:

class X {
  public:
    Y y() const;
  private:
    Y m_y;
};

This first option has a big problem: the “y()” function makes a copy of m_y each time we call it. If sizeof(Y) is big (or the copy operation is expensive), performance will suffer. A second option is to return a const reference, isn’t it?

class X {
  public:
    const Y& y() const;
  private:
    Y m_y;
};
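The copy cost of both getters can be measured with a Y that counts its copies. In this sketch, Y, XByValue and XByRef are hypothetical stand-ins for the Y and X above, and the counts assume C++17, where initializing a variable from a returned prvalue is guaranteed not to copy.

```cpp
#include <cassert>

// Hypothetical Y that counts how many times it is copied.
struct Y {
    static int copies;
    Y() = default;
    Y(const Y&) { ++copies; }
    Y& operator=(const Y&) { ++copies; return *this; }
};
int Y::copies = 0;

// First option: return by value (one copy on every call).
class XByValue {
public:
    Y y() const { return m_y; }
private:
    Y m_y;
};

// Second option: return a const reference (no copy inside the getter).
class XByRef {
public:
    const Y& y() const { return m_y; }
private:
    Y m_y;
};

int copies_by_value() {
    XByValue x;
    Y::copies = 0;
    const Y y1 = x.y();   // one copy inside y(); none when initializing y1
    (void)y1;
    return Y::copies;     // 1
}

int copies_by_ref_into_value() {
    XByRef x;
    Y::copies = 0;
    const Y y1 = x.y();   // the caller stores a value: copy
    (void)y1;
    return Y::copies;     // 1
}

int copies_by_ref_into_ref() {
    XByRef x;
    Y::copies = 0;
    const Y& y1 = x.y();  // the caller binds a reference: no copy
    assert(&y1 == &x.y());  // y1 refers to x's own member
    return Y::copies;     // 0
}
```

So the const reference only saves the copy when the caller cooperates, which is exactly the first issue listed below.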

That choice has at least two issues:

  • It does NOT guarantee that copies are avoided, because that depends on how the caller stores the result. For example:
X x1; 
const Y y1 = x1.y(); // copy operation.

instead of something like:

const Y & y1 = x1.y(); // No copy operation
  • Second, and more importantly, you are creating a strong dependency between your class layout and your API (source and binary compatibility in API design will be covered in a future post). Well, let’s return a raw pointer or a shared_ptr:
class X {
public:
  X(): m_y( std::make_shared<Y>())
  {}
  std::shared_ptr<Y> y() const;
private:
  std::shared_ptr<Y> m_y;
};

That solution allows us to return m_y efficiently, because NO deep copy is performed. It just makes a light copy of a couple of pointers (plus an increment of the reference count). In this way, the cost of copying Y does NOT matter: the copy operation takes constant time.

It seems a good idea, but unfortunately it is not. We lose encapsulation: the returned object can be changed from outside X, and of course the X object will not be notified. Java developers solve this with the clone() method when they want to avoid this situation. But we have the elegant copy-on-write pointer to deal with this kind of thing.
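The same copy-counting trick shows both sides of the std::shared_ptr getter: no deep copy at all, and a write from outside that silently changes X. Y and the getter bodies are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical Y that counts deep copies and carries some state.
struct Y {
    static int copies;
    int value = 0;
    Y() = default;
    Y(const Y& other) : value(other.value) { ++copies; }
    Y& operator=(const Y& other) { value = other.value; ++copies; return *this; }
};
int Y::copies = 0;

// The shared_ptr version of X from the text; the getter bodies are assumptions.
class X {
public:
    X() : m_y(std::make_shared<Y>()) {}
    std::shared_ptr<Y> y() const { return m_y; }  // copies the pointer, not Y
    int value() const { return m_y->value; }
private:
    std::shared_ptr<Y> m_y;
};

// Returns {deep copies of Y made, value X now reports} after a write from outside.
std::pair<int, int> write_from_outside() {
    X x;
    Y::copies = 0;
    std::shared_ptr<Y> y1 = x.y();  // cheap: only the reference count changes
    assert(y1.use_count() == 2);    // X and the caller now share one Y
    y1->value = 42;                 // changes X's internal state; X is not notified
    return {Y::copies, x.value()};  // {0, 42}
}
```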

QSharedDataPointer: Cheap encapsulation or Copy-on-write

This SP relies on one of C++’s best capabilities: the const modifier. Initially, the smart pointer makes a shallow copy on each copy operation, similar to what shared_ptr does. However, when we try to change the pointed-to object through non-const access and the data is not uniquely owned, a deep copy is made (a.k.a. the detach operation). We get the best of both worlds!
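To illustrate the detach idea without depending on Qt, here is a hypothetical CowPtr built on std::shared_ptr. It is not QSharedDataPointer’s API (Qt’s own class is used together with QSharedData), just a small sketch of the same copy-on-write behaviour: const access shares, and non-const access deep-copies first when the data is shared. It relies on use_count(), which is only approximate under concurrency, so treat it as a single-threaded illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write pointer written for illustration only;
// it is NOT Qt's QSharedDataPointer API.
// Copies of a CowPtr share one T. Const access never copies; non-const
// access first "detaches" (deep-copies T) if anybody else shares it.
// Caveat: use_count() is only approximate when other threads copy or
// release the same data, so this sketch is meant for single-threaded use.
template <typename T>
class CowPtr {
public:
    explicit CowPtr(T value) : m_data(std::make_shared<T>(std::move(value))) {}

    // Read access: always shares, never copies.
    const T& operator*() const { return *m_data; }
    const T* operator->() const { return m_data.get(); }

    // Write access: detach first if the data is shared.
    T& operator*() { detach(); return *m_data; }
    T* operator->() { detach(); return m_data.get(); }

    // True when both CowPtr objects currently point at the same T.
    bool shares_with(const CowPtr& other) const { return m_data == other.m_data; }

private:
    void detach() {
        if (m_data.use_count() > 1) {
            m_data = std::make_shared<T>(*m_data);  // deep copy of T
        }
        assert(m_data.use_count() == 1);  // from here on, writes are private
    }

    std::shared_ptr<T> m_data;
};
```

Copies of a CowPtr are as cheap as copying a std::shared_ptr, and only the first write through a shared copy pays for the deep copy; the other copies keep seeing the original value.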

The best-known example of this smart pointer is QSharedDataPointer from the Qt framework. In the Qt documentation you will find examples and information about how it works. You can also find other kinds of smart pointers in Qt, such as QExplicitlySharedDataPointer, which is also well worth reading about.

I have to say that some of the issues regarding returning an object have been addressed by the move semantics introduced in C++11. Nevertheless, if your legacy project does not allow you to use the newest compilers, have a look at those types. Regarding multi-threading, this smart pointer is an absolute winner: copies held by different threads share the data while they only read it, and a thread gets its own local deep copy when it writes.

The voyeur: std::weak_ptr

Until now, we have looked at keeping the object alive across different scopes (shared_ptr), tying its life to a specific scope (unique_ptr), or just the raw pointer’s wildlife (no exception safety). But what do I need if I just want to know whether an object is alive, without affecting its lifetime? Sorry, Heisenberg (uncertainty principle), but we can do that.

To solve that, we have std::weak_ptr. It lets us refer to an object managed by std::shared_ptr without extending its lifetime. This means that when the last std::shared_ptr is gone, the managed object is deleted, no matter whether any associated std::weak_ptr remains.

This behaviour is especially useful when you just want to monitor some data from other threads, e.g. when you want to check the status of long tasks working with a huge result set from the database.
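A minimal monitoring sketch, assuming a hypothetical LongTask whose atomic progress stands in for the database work. The worker thread owns the task through the only std::shared_ptr; the monitor keeps a std::weak_ptr and calls lock() to read the progress only while the task is still alive.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical long-running task. progress is atomic because the worker
// writes it while a monitor may read it from another thread.
struct LongTask {
    std::atomic<int> progress{0};
};

// The monitor only holds a std::weak_ptr, so it never extends the task's life.
// lock() returns a non-empty std::shared_ptr while the task is alive
// (keeping it alive during the read) and an empty one once it is deleted.
std::optional<int> poll(const std::weak_ptr<LongTask>& monitor) {
    if (std::shared_ptr<LongTask> task = monitor.lock()) {
        return task->progress.load();
    }
    return std::nullopt;  // the task is already gone
}

// The worker receives the only owning std::shared_ptr. When this function
// returns, that last owner is destroyed and so is the task.
void run_task(std::shared_ptr<LongTask> task) {
    assert(task != nullptr);
    for (int step = 1; step <= 100; ++step) {
        task->progress.store(step);  // stand-in for real work on the result set
    }
}

// Usage: returns what the monitor sees before starting the worker and after joining it.
std::pair<std::optional<int>, std::optional<int>> monitor_demo() {
    auto task = std::make_shared<LongTask>();
    std::weak_ptr<LongTask> monitor = task;         // does not add an owner
    std::optional<int> before = poll(monitor);      // task alive
    std::thread worker(run_task, std::move(task));  // hand over the only owner
    worker.join();
    std::optional<int> after = poll(monitor);       // task deleted
    return {before, after};
}
```

monitor_demo() returns 0 for the first poll and std::nullopt for the second: once the worker finished and dropped the last owner, the std::weak_ptr did not keep the task around.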

Conclusion

The importance of smart pointers is not just that they provide a safe environment for raising exceptions, nor that they solve resource management (RAII), nor that you can write less code (other languages use finally to recover from complex cleanup phases; it is just a workaround for the lack of destructors). No, the big advantage is their semantic value. Another developer, at a glance at the smart pointer type, knows the variable’s scope, its asynchronous constraints, and what the API author had in mind. And all of this without documentation.
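As a small illustration of that semantic value, here are four hypothetical function signatures (the names are invented for this sketch). Each parameter type alone tells the caller who owns Foo during and after the call.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Foo { int value = 0; };

// Hypothetical functions: each parameter type documents an ownership contract.

// Borrows for the duration of the call; never stores or deletes it.
int read_value(const Foo& foo) { return foo.value; }

// Takes exclusive ownership; here it hands ownership back through the return value.
std::unique_ptr<Foo> take_and_bump(std::unique_ptr<Foo> foo) {
    assert(foo);  // the caller must actually hand over an object
    ++foo->value;
    return foo;
}

// Joins the shared ownership of read-only data.
long count_owners(std::shared_ptr<const Foo> foo) {
    return foo.use_count();  // includes this parameter
}

// Observes without owning; the object may already be gone.
bool still_alive(const std::weak_ptr<Foo>& foo) { return !foo.expired(); }
```

A reader of the declarations alone knows that read_value() borrows, take_and_bump() takes ownership, count_owners() shares read-only data and still_alive() merely observes.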

Yes, smart pointers produce safer code than raw pointers. Moreover, they also improve code maintainability, which is one of the greatest values in this industry. Recent statistics show that the average lifetime of software is just 10 years. How many lines of code could a development team produce over 10 years? What does it cost to add a new feature to that legacy code? What is my time-to-market? Well, software maintainability answers those questions, and it translates into cost containment, or into the difference between beating a rival company and being beaten by it. It is not a minor subject. The most important part: your comments, your feelings and your experiences.
