API, threads, and smart pointers
What are the semantics of smart pointers? How do they affect asynchronous code? Creating an API always takes extra effort aimed at future advantages: ease of use, asynchronous facilities, reusability, SOLID principles, etc.
Problem
In this first entry, I would like to start with a question that is on many minds: “Should I use a raw pointer, a reference, a smart pointer or just a simple local variable on the stack?” The answer depends on what you want to tell whoever will use your API or your code in the future.
There is a huge amount of documentation about smart pointers (SP), but hardly any of it underlines the semantics behind each one or how you can use them to build better APIs.
Solution
Tool | Scope | Multi-thread | Synchronization needed |
---|---|---|---|
T | Local scope | Safe | No |
const T | Local scope | Safe | No |
T& | Multiple scopes | Unsafe: cannot guarantee that the variable is still alive between threads | Yes |
T* | Multiple scopes | Unsafe: cannot guarantee that the variable is still alive between threads | Yes |
const T* | Multiple scopes | Unsafe: cannot guarantee that the variable is still alive between threads | No, it is lovely const |
std::unique_ptr<T> | Local scope | Safe: guarantees that the variable is alive between threads | No |
std::shared_ptr<T> | Multiple scopes | Safe: guarantees that the variable is alive | Yes |
std::shared_ptr<const T> | Multiple scopes | Safe: guarantees that the variable is alive | No, it is lovely const |
std::weak_ptr<T> | Multiple scopes, without extending the lifetime | Safe: guarantees that the variable is alive while you hold the result of std::weak_ptr<T>::lock() | Yes |
QSharedDataPointer<T> | Multiple scopes for reading, local scope for writing | Safe: guarantees that the variable is alive | No, but it makes a new copy on write |
TL;DR or Why?
The wild: raw pointers
Raw pointers are not bad per se. One problem is that you can leak memory because you forgot a delete or because an unexpected exception was thrown.
However, the big issue is that you have no information about who owns the raw pointer, in other words, who bears the responsibility for deleting it. Documenting your API is the only way to solve this, and developers sometimes do not read or follow the documentation (do you remember whether you must free the string returned by the strerror function?).
Let’s look at the following example:
Foo* my_function( int* array, int array_size );
Who owns the returned pointer? Should I delete it after using it, or is it something internal? Raw pointers also carry no information about multithreading, so we must use synchronization tools such as mutexes, semaphores, condition variables…
The lone ranger: std::unique_ptr
As you probably know, this smart pointer ensures that memory is freed when we leave the scope. Our previous function becomes something like:
std::unique_ptr<Foo> my_function( int* array, int array_size);
At first glance, future users of your API will know the scope of the returned object. Moreover, we also raise the exception guarantee to “Basic exception safety” (a.k.a. no-leak guarantee).
But there is more: when we use std::unique_ptr we also ease the move from single-threaded to multi-threaded code, because we are stating that the object has a single owner (its semantic value). The handicap is clear: a std::unique_ptr cannot be copied, so it cannot be shared; it can only be moved (for instance into a container or into another thread). But let’s move on to our next guest.
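The single-owner semantics can be sketched as follows. This is only an illustration under assumptions: `Foo`, its `sum` field, the body of `my_function` and the helper `double_in_worker` are hypothetical, and `std::make_unique` and the init-capture require C++14 or later.

```cpp
#include <memory>
#include <thread>
#include <utility>

// Hypothetical payload type, used only for illustration.
struct Foo {
    int sum = 0;
};

// The return type tells the caller: "you are now the sole owner".
std::unique_ptr<Foo> my_function(const int* array, int array_size) {
    auto foo = std::make_unique<Foo>();
    for (int i = 0; i < array_size; ++i) {
        foo->sum += array[i];
    }
    return foo;
}

// Hands the object over to a worker thread. After the move the caller
// no longer holds it, so no other thread can reach it by accident.
int double_in_worker(std::unique_ptr<Foo> foo) {
    int result = 0;
    std::thread worker([owned = std::move(foo), &result]() {
        owned->sum *= 2;
        result = owned->sum;
    });
    worker.join();  // join() synchronizes, so reading result here is safe
    return result;
}
```

Calling `double_in_worker(std::move(foo))` leaves the caller's `foo` empty, which is exactly the guarantee that makes the hand-over to another thread easy to reason about.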
As mum said “Sharing is good”: std::shared_ptr.
If you come from a language with a garbage collector, you will feel very comfortable using std::shared_ptr. Perhaps you are thinking: “why is std::shared_ptr not used everywhere?” or “will C++ be easier than Java?” Well, you know C++ people: we don’t like a single solution for everything. In fact, we need to be aware of the extra cost of the internal shared counter and of the double memory allocation (the latter is mitigated by std::make_shared, but that is another story).
std::shared_ptr<Foo> my_function( const std::vector<int>& values );
Asynchronous use is not as easy as with std::unique_ptr, because we are sharing memory, so we need to protect access with synchronization tools (mutex, semaphore, etc.).
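A minimal sketch of that point, assuming a hypothetical `Counter` type and helper `count_in_parallel` (neither comes from a real API): the std::shared_ptr keeps the object alive for every thread, but writes to the shared object still need a lock.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state: the mutex travels with the data it protects.
struct Counter {
    std::mutex mutex;
    int value = 0;
};

// Each worker copies the shared_ptr, so the Counter stays alive until the
// last worker is done. The lifetime is safe; the access still needs a lock.
int count_in_parallel(int workers, int increments_per_worker) {
    auto counter = std::make_shared<Counter>();
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([counter, increments_per_worker]() {
            for (int j = 0; j < increments_per_worker; ++j) {
                std::lock_guard<std::mutex> lock(counter->mutex);
                ++counter->value;
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return counter->value;
}
```

Without the `std::lock_guard`, the increments would be a data race even though the pointer itself is managed correctly.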
Returning an internal std::shared_ptr member is not as good an idea as you might think. Let’s assume we have the following code:
class X {
public:
Y y() const;
private:
Y m_y;
};
This first option has a big problem: the y() function copies m_y each time we call it. If sizeof(Y) is big (or the copy operation is expensive), performance will suffer. A second option is to return a const reference, isn’t it?
class X {
public:
const Y& y() const;
private:
Y m_y;
};
That choice has at least two issues:
- It does NOT guarantee that copies are avoided, because that depends on where we store the result. For example:
X x1;
const Y y1 = x1.y(); // copy operation.
instead of something like:
const Y & y1 = x1.y(); // No copy operation
- Secondly, and more importantly, you are creating a strong dependency between your class structure and your API (source and binary compatibility in API design will be a future post). Well, let’s return a raw pointer or a shared_ptr:
class X {
public:
X(): m_y( std::make_shared<Y>())
{}
std::shared_ptr<Y> y() const;
private:
std::shared_ptr<Y> m_y;
};
That solution lets us return m_y efficiently, because NO deep copy is made. It just makes a light copy, maybe a couple of pointers. This way, the copy complexity of Y does NOT matter: we get a constant-time copy operation.
It seems a good idea, but unfortunately it is not. This way we lose encapsulation: the returned object can be changed from outside X and, of course, the X object will not be notified. Java people solve that with the clone() method when they want to avoid this situation. But we have the elegant copy-on-write pointer to deal with this kind of thing.
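The encapsulation problem described above can be reproduced in a few lines. This is a sketch under assumptions: the `value` field of `Y`, the `current()` accessor, the `y_read_only()` variant and the `tamper` helper are hypothetical additions made only for illustration.

```cpp
#include <memory>

// Hypothetical Y with one field, for illustration only.
struct Y {
    int value = 0;
};

class X {
public:
    X() : m_y(std::make_shared<Y>()) {}

    // Cheap to return, but hands out write access to X's internals.
    std::shared_ptr<Y> y() const { return m_y; }

    // Returning shared_ptr<const Y> keeps the cheap copy and blocks writes.
    std::shared_ptr<const Y> y_read_only() const { return m_y; }

    int current() const { return m_y->value; }

private:
    std::shared_ptr<Y> m_y;
};

// Changes X's state from outside, even through a const reference,
// without X ever noticing.
void tamper(const X& x) {
    x.y()->value = 42;
}
```

Returning `std::shared_ptr<const Y>` is one possible way to keep the constant-time copy while preventing outside writes; whether it fits depends on how callers need to use the object.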
QSharedDataPointer: Cheap encapsulation or Copy-on-write
This SP uses one of the best C++ capabilities: the const modifier. Initially, this smart pointer makes a shallow copy on each copy operation, much like shared_ptr does. However, when we try to change the object pointed to by our smart pointer and it is not unique, a deep copy is made (a.k.a. the detach operation). We have the best of both worlds!
The best example of this smart pointer is QSharedDataPointer from the Qt framework. In the Qt documentation you will find examples and information about how it works. You can also find other kinds of smart pointers in Qt, like QExplicitlySharedDataPointer, which is also well worth reading about.
I have to say that some of the issues around returning an object have been covered by the new move semantics in C++11. Nevertheless, if your legacy project does not allow you to use the newest compilers, have a look at those types. As for multi-threading, this smart pointer is an absolute winner: the data is shared automatically while it is only read, and a local deep copy is made when it is written.
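To make the detach idea concrete without depending on Qt, here is a rough sketch built on std::shared_ptr. `CowPtr` and its member names are hypothetical, and this is not how QSharedDataPointer is implemented (Qt keeps the reference count inside a QSharedData base class); it only illustrates "share on read, copy on write" for handles that are each used from one thread.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write handle; a sketch of the idea, not Qt's code.
template <typename T>
class CowPtr {
public:
    explicit CowPtr(T value)
        : m_data(std::make_shared<T>(std::move(value))) {}

    // Const access never copies: all handles read the same object.
    const T& read() const { return *m_data; }

    // Non-const access detaches first if the object is shared.
    T& write() {
        if (m_data.use_count() > 1) {
            m_data = std::make_shared<T>(*m_data);  // deep copy (detach)
        }
        return *m_data;
    }

    // True while both handles still point to the same object.
    bool shares_with(const CowPtr& other) const {
        return m_data == other.m_data;
    }

private:
    std::shared_ptr<T> m_data;
};
```

Copying a `CowPtr` costs only a pointer copy; the deep copy is paid by the first writer, and the other handles keep seeing the original value.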
The voyeur: std::weak_ptr
Until now, we have been looking at how to keep an object alive across different scopes (shared_ptr), how to tie its life to a specific scope (unique_ptr), or at the raw pointer’s wildlife (no exception safety). But what do I need if I just want to know whether an object is alive without affecting its lifetime? Sorry, Heisenberg (uncertainty principle), but we can do that.
For that we have std::weak_ptr. It lets us refer to an object managed by a std::shared_ptr without extending its lifetime. That means that when the last std::shared_ptr is gone, the managed object is destroyed, no matter whether any std::weak_ptr is still associated with it.
This behaviour is especially useful when you just want to monitor some data from other threads, e.g. when you want to check the status of long tasks working with a huge result set from the database.
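A minimal sketch of such a monitor, assuming a hypothetical `Task` type with a `progress` field and a hypothetical helper `peek_progress`: the observer holds only a std::weak_ptr and promotes it with lock() for the short time it needs to read.

```cpp
#include <memory>

// Hypothetical long-running task state.
struct Task {
    int progress = 0;
};

// Returns the progress if the task is still alive, or -1 if it is gone.
// lock() yields a temporary shared_ptr, so the task cannot be destroyed
// while we are reading it.
int peek_progress(const std::weak_ptr<Task>& observer) {
    if (std::shared_ptr<Task> task = observer.lock()) {
        return task->progress;
    }
    return -1;
}
```

Holding the std::weak_ptr does not change the owner count, so the task is destroyed as soon as its last std::shared_ptr owner releases it. If another thread updates `progress` at the same time, that field still needs its own synchronization, for example a mutex or std::atomic.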
Conclusion
The importance of smart pointers is not just that they make it safe to throw exceptions, nor that they solve resource management (RAII), nor that you write less code (other languages use finally to recover from some complex cleanup phases, which is just a workaround for the lack of destructors). No, the big advantage is their semantic value. Another developer, at a glance at the smart pointer type, is aware of the variable’s scope, its asynchronous constraints, and what the API author had in mind. And all of this without documentation.
Yes, smart pointers produce safer code than raw pointers. Moreover, they also make code easier to maintain, and that is one of the greatest values in this industry. Recent statistics show that the average software lifetime is just 10 years. How many lines of code can a development team produce over 10 years? What is the cost of adding a new feature to that legacy code? What is my time to market? Well, software maintainability answers those questions, and it translates into cost containment, or into whether a rival company beats us or not. It is not a minor subject. The most important part: your comments, your feelings or your experiences.