DIY std::optional: Using Universal References in C++

This post exists entirely because of Item 27 of Effective Modern C++, "Familiarize yourself with alternatives to overloading on universal references". We start from a practical problem: emulating std::optional (a C++17 feature) in C++14. The optional monad is a good paradigm for expressing values that may be absent. The alternative is to assume that anything might be null, which easily punches a hole in the type system.

Simplest Approach

A std::optional<T> is essentially a wrapper around two things: a T, and a bool recording whether that T holds a meaningful value. We can easily write something like this:

#include <utility>

struct none_t {};
constexpr none_t None;

template <class T> class option {
  T val;        // always constructed, even when the option is empty
  bool has_val; // whether val holds a meaningful value

public:
  option() : has_val(false) {}               // default-constructs val
  option(const none_t &) : has_val(false) {} // default-constructs val
  option(const T &x) : val(x), has_val(true) {}
  option(T &&x) : val(std::move(x)), has_val(true) {}
  option(const option &copy_from) = default;
  option(option &&move_from) = default;

  option &operator=(const option &x) = default;
  option &operator=(option &&x) = default;
  option &operator=(const T &x) {
    val = x;
    has_val = true;
    return *this;
  }
  option &operator=(T &&x) {
    val = std::move(x);
    has_val = true;
    return *this;
  }
  option &operator=(const none_t &) {
    has_val = false;
    return *this;
  }
};

This is obviously correct, but it does not always compile. Because val is an ordinary member, the "empty" constructors option() and option(None) still have to default-construct a T. T may have no default constructor, because it can only be initialized with meaningful arguments. For such a T, our option<T> can never be None.

Pointers are the natural fix. A pointer can stay nullptr until it is given a real value. So we could rewrite option<T> as a thin wrapper around std::unique_ptr<T>. It would add a has_value() const member that checks for nullptr, plus a copy constructor that deep-copies the pointee, since std::unique_ptr itself is move-only. Such a class is easy to write, but it has an obvious drawback: every non-empty option needs a dynamic allocation on the heap, which seriously slows our option<T> down.
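A minimal sketch of that pointer-based design, assuming C++14's std::make_unique. The names heap_option and NoDefault are hypothetical and do not come from the original. An empty heap_option constructs no T at all, so a type without a default constructor is fine. The cost is one heap allocation per engaged value, including every copy.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical heap-backed option: "empty" simply means a null pointer.
template <typename T> class heap_option {
  std::unique_ptr<T> ptr; // nullptr when there is no value

public:
  heap_option() = default; // no T is constructed
  heap_option(const T &x) : ptr(std::make_unique<T>(x)) {}
  heap_option(T &&x) : ptr(std::make_unique<T>(std::move(x))) {}
  // std::unique_ptr is move-only, so copying must deep-copy the pointee
  // (one more heap allocation).
  heap_option(const heap_option &b)
      : ptr(b.ptr ? std::make_unique<T>(*b.ptr) : nullptr) {}
  heap_option(heap_option &&) = default;

  bool has_value() const noexcept { return ptr != nullptr; }
  T &value() { return *ptr; }
  const T &value() const { return *ptr; }
};

// Hypothetical type without a default constructor: the member-based option
// cannot represent "empty" for it, but heap_option can.
struct NoDefault {
  int x;
  explicit NoDefault(int v) : x(v) {}
};
```

For example, heap_option<NoDefault> empty; compiles and creates no NoDefault. Copying a full heap_option yields a second, independent heap object.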

To avoid the heap, we use union, a keyword inherited from C that lets several types share the same memory. Here, though, a union serves another purpose. It suppresses the automatic construction and destruction of T, so we can manage T's lifetime ourselves.
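A small sketch of that effect, using hypothetical Tracker and Holder types that are not from the original. Tracker counts its live instances. Holder keeps a Tracker inside an anonymous union, so the compiler neither constructs nor destroys it. Placement new creates it, and an explicit destructor call ends it. Holder must provide its own constructor and destructor; otherwise they would be deleted because Tracker's special members are non-trivial.

```cpp
#include <array>
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical type that counts its live instances, so we can observe
// exactly when the object inside the union is constructed and destroyed.
struct Tracker {
  static int alive;
  std::string name;
  explicit Tracker(std::string n) : name(std::move(n)) { ++alive; }
  ~Tracker() { --alive; }
};
int Tracker::alive = 0;

// Hypothetical holder: storage for one Tracker inside an anonymous union.
struct Holder {
  union {
    Tracker t; // not constructed or destroyed automatically
  };
  Holder() {}  // user-provided; deliberately leaves t unconstructed
  ~Holder() {} // user-provided; deliberately does not destroy t
};

// Records Tracker::alive at three points: after creating the holder,
// after placement new, and after the explicit destructor call.
inline std::array<int, 3> union_lifetime_demo() {
  std::array<int, 3> counts{};
  Holder h;
  counts[0] = Tracker::alive;  // nothing constructed yet
  new (&h.t) Tracker("hello"); // construct the member in place
  counts[1] = Tracker::alive;
  h.t.~Tracker();              // destroy it by hand
  counts[2] = Tracker::alive;
  return counts;
}
```

The counts come out as 0, 1, 0. The union's member exists exactly between the placement new and the explicit destructor call.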

template <typename T> class option {
  union {
    T val;
  };
  bool has_val;
};

In the definition above, T lives inside an anonymous union, so the compiler no longer constructs or destroys val on its own. When the option must actually hold a T, we construct it ourselves. Either the constructor's member initializer list initializes the union member directly, or placement new builds T in place in the existing storage. The matching explicit destructor call, val.~T(), becomes our job too.

  option(T &&f) : val(std::move(f)), has_val(true) {}
  option(const T &f) : val(f), has_val(true) {}
  option() : has_val(false) {};
  option(const none_t &) : has_val(false) {};
  option(const option &b) : has_val(b.has_val) {
    if (has_val)
      new (&val) T(b.val);
  }
  option(option &&b) : has_val(b.has_val) {
    if (b.has_val) {
      new (&val) T(std::move(b.val));
    }
  }

This looks perfect, and it does meet most of our needs, but can we do better? Suppose we have a custom class whose constructor takes a std::string. A std::string can certainly be constructed from a const char*, so we naturally try to construct the option directly from a string literal. Will the following code compile?

struct StringWrapper {
  std::string s;
  StringWrapper(std::string str) : s(std::move(str)) {}
};
int main() {
  option<StringWrapper> test("123");
  // opt_test.cpp: In function 'int main()':
  // opt_test.cpp:123:45: error: no matching function for call to 'option<StringWrapper>::option(const char [4])'
}

No! You have to construct a StringWrapper explicitly before any option<StringWrapper> constructor can be called. None of those constructors accepts a const char [4]. Turning "123" into the StringWrapper that option(T &&) or option(const T &) expects would take two user-defined conversions in a row (const char [4] → std::string → StringWrapper). An implicit conversion sequence may contain at most one, so C++ will not automatically "forward" the argument to StringWrapper's constructor.
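The difference can be checked at compile time with standard type traits. This is a sketch that reuses the StringWrapper from above. std::is_constructible models direct-initialization, like StringWrapper w("123"), which may convert the argument once (to std::string). std::is_convertible models implicit conversion, which is what binding the literal to an option constructor's StringWrapper parameter requires, and that would need two chained user-defined conversions.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct StringWrapper {
  std::string s;
  StringWrapper(std::string str) : s(std::move(str)) {}
};

// Direct-initialization may apply one user-defined conversion to the
// constructor argument (const char[4] -> std::string), so this holds.
static_assert(std::is_constructible<StringWrapper, const char (&)[4]>::value,
              "direct-initialization from a literal works");

// Implicit conversion would chain const char[4] -> std::string -> StringWrapper,
// two user-defined conversions, which C++ never does.
static_assert(!std::is_convertible<const char (&)[4], StringWrapper>::value,
              "implicit conversion from a literal does not");
```

This is also why a forwarding constructor that hands its argument to T's constructor, and is constrained on is_constructible, is the natural fix.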

Universal References

To forward the arguments to T's constructor, as if our option<T> were a T itself, we need perfect forwarding and universal references.

As Scott Meyers explains in Item 24 of Effective Modern C++ ("Distinguish universal references from rvalue references"):

“T&&” has two different meanings. One is rvalue reference, of course. Such references behave exactly the way you expect: they bind only to rvalues, and their primary raison d’être is to identify objects that may be moved from.
The other meaning for “T&&” is either rvalue reference or lvalue reference. Such references look like rvalue references in the source code (i.e., “T&&”), but they can behave as if they were lvalue references (i.e., “T&”). Their dual nature permits them to bind to rvalues (like rvalue references) as well as lvalues (like lvalue references). Furthermore, they can bind to const or non-const objects, to volatile or non-volatile objects, even to objects that are both const and volatile. They can bind to virtually anything. Such unprecedentedly flexible references deserve a name of their own. I call them universal references.
https://cntransgroup.github.io/EffectiveModernCppChinese/5.RRefMovSemPerfForw/item24.html
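A small sketch of how a universal reference deduces its template parameter. The function name deduced is hypothetical. For an lvalue argument, U is deduced as an lvalue reference (X& or const X&), and U&& collapses to that lvalue reference. For an rvalue argument, U is the plain type and U&& is an ordinary rvalue reference.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Reports how U was deduced for the universal reference parameter U&&.
template <typename U> std::string deduced(U &&) {
  if (std::is_lvalue_reference<U>::value) {
    // Lvalue argument: U = X& or const X&, so U&& collapses to U.
    return std::is_const<std::remove_reference_t<U>>::value ? "const lvalue"
                                                            : "lvalue";
  }
  // Rvalue argument: U = X, so U&& is a plain rvalue reference X&&.
  return "rvalue";
}
```

With int x and const int cx, deduced(x), deduced(cx) and deduced(42) report "lvalue", "const lvalue" and "rvalue". By contrast, a non-template parameter such as std::string&& is a genuine rvalue reference and refuses lvalues.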

I assume you already know how to write perfect forwarding; if not, Effective Modern C++ covers it, so I won't go into details here. The catch is that a perfect-forwarding function does not play well with overloading. Writing a perfect-forwarding constructor is easy:

  template <typename Arg>
  option(Arg &&f) : val(std::forward<Arg>(f)), has_val(true) {}  // <---- Perfect forwarding
  option(const option &b) : has_val(b.has_val) {
    if (has_val)
      new (&val) T(b.val);
  }
  option(option &&b) : has_val(b.has_val) {
    if (b.has_val) {
      new (&val) T(std::move(b.val));
    }
  }

But these overloads do not work together as we expect. Compiling the following code produces an error:

struct StringWrapper {
  std::string s;
  StringWrapper(std::string str) : s(std::move(str)) {}
};

int main() {
  option<StringWrapper> test1;
  option<StringWrapper> test2(test1);
  // In instantiation of 'option<T>::option(Arg&&) [with Arg = option<StringWrapper>&; T = StringWrapper]':
  //    required from here
  //  11 |   option<StringWrapper> test2(test1);
  //     |                                    ^
  // error: no matching function for call to 'StringWrapper::StringWrapper(option<StringWrapper>&)'
  //  23 |   option(Arg &&f) : val(std::forward<Arg>(f)), has_val(true) {}
}

How can that be? We clearly declared option(const option &b)! But overload resolution does not work the way we hoped. For the full story, see Item 26 of Effective Modern C++, "Avoid overloading on universal references", which I won't repeat here. In short, when the argument is a non-const lvalue option<T>, the compiler instantiates the universal-reference template above as:

  // template <typename Arg> // <-- Arg = option&
  option(option &f) : val(f), has_val(true) {}

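Binding a non-const lvalue to option& needs no added const, so this instantiation is a better match than the copy constructor's const option& parameter. A non-template function is preferred only on a tie, and this is not a tie. So the compiler picks the template, which then tries to construct a StringWrapper from an option<StringWrapper>&.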
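The same ranking can be reproduced with free functions. This is a sketch with hypothetical names (Widget, pick) standing in for option's forwarding constructor and copy constructor. The forwarding template wins for non-const lvalues and rvalues. The non-template wins only for a const lvalue, where both candidates are identical matches.

```cpp
#include <cassert>
#include <string>

struct Widget {};

// Plays the role of option(const option &).
inline std::string pick(const Widget &) { return "copy overload"; }

// Plays the role of the perfect-forwarding option(Arg &&).
template <typename U> std::string pick(U &&) { return "forwarding template"; }
```

pick(w) on a non-const Widget deduces U = Widget&, an exact match without adding const, so the template wins. pick(Widget{}) binds an rvalue reference to an rvalue, which also beats const Widget&.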

Use enable_if

(C++20 offers a better tool, concepts, but let's build our option<T> with features already available in C++14.)

std::enable_if gives you a way to force compilers to behave as if a particular template didn’t exist. Such templates are said to be disabled. By default, all templates are enabled, but a template using std::enable_if is enabled only if the condition specified by std::enable_if is satisfied.
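A minimal illustration of the idea, separate from option. The function describe is hypothetical. Each template carries a defaulted non-type template parameter whose type only exists when its condition holds. For any given argument, one of the two templates is silently removed from overload resolution (SFINAE), as if it had never been declared.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Enabled only for integral types.
template <typename U, std::enable_if_t<std::is_integral<U>::value, bool> = true>
std::string describe(U) { return "integral"; }

// Enabled only for the complementary condition.
template <typename U, std::enable_if_t<!std::is_integral<U>::value, bool> = true>
std::string describe(U) { return "something else"; }
```

describe(1) and describe('c') pick the first template and describe(2.5) picks the second. Without the constraints, the two templates would be an ambiguous redeclaration.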

I won't explain the syntax of enable_if here. In short, with enable_if our forwarding constructor takes this shape:

  template <typename Arg,
            typename std::enable_if< expression , bool>::type = true>
  option(Arg &&f) : val(std::forward<Arg>(f)), has_val(true) {}

Here expression stands for a compile-time condition, and the overload takes part in overload resolution only when it is true. Perfect forwarding is appropriate only when Arg can be used to construct T and Arg is not itself an option (for simplicity, we ignore pointless nested types like option<option<T>>). The full version follows. To make the type traits stand out, I use the standard alternative tokens not and and instead of ! and &&:

// ...
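  // True when Arg, stripped of references and cv-qualifiers,
  // is option<T> or a class derived from it.
  template <typename Arg>
  using OptionConstructable =
      std::is_base_of<option<T>,
                      std::remove_cv_t<std::remove_reference_t<Arg>>>;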

public:
  template <typename Arg,
            std::enable_if_t<std::is_constructible<T, Arg>::value and
                                 not OptionConstructable<Arg>::value,
                             bool> = true>
  option(Arg &&f) : val(std::forward<Arg>(f)), has_val(true) {}
  option() : has_val(false) {};
  option(const none_t &) : has_val(false) {};
  option(const option &b) : has_val(b.has_val) {
    if (has_val)
      new (&val) T(b.val);
  }
  option(option &&b) : has_val(b.has_val) {
    if (b.has_val) {
      new (&val) T(std::move(b.val));
    }
  }
// ...
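To check that this constraint really sends copies back to the copy constructor, here is a stripped-down sketch. Box is hypothetical and not part of the original option. It records which constructor ran and uses the same two-part condition. With T = std::string, the is_constructible half already rejects Box& arguments. The is_base_of half matters for types T that could themselves be constructed from a Box.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stripped-down wrapper that records which constructor ran.
template <typename T> struct Box {
  T val;
  std::string made_by;

  // Mirrors OptionConstructable: is Arg (minus references and
  // cv-qualifiers) a Box<T> or something derived from it?
  template <typename Arg>
  using IsBox =
      std::is_base_of<Box, std::remove_cv_t<std::remove_reference_t<Arg>>>;

  // Same constraint as option's forwarding constructor.
  template <typename Arg,
            std::enable_if_t<std::is_constructible<T, Arg>::value and
                                 not IsBox<Arg>::value,
                             bool> = true>
  Box(Arg &&a) : val(std::forward<Arg>(a)), made_by("forwarding") {}

  Box(const Box &b) : val(b.val), made_by("copy") {}
  Box(Box &&b) : val(std::move(b.val)), made_by("move") {}
};
```

Copying a non-const Box now reaches the copy constructor instead of the forwarding template, because the forwarding template is disabled for that argument.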

Code and Tests

Finally, our homemade std::optional<T> built on universal references is ready! Here is the code:

#pragma once
#include <new>         // placement new
#include <type_traits>
#include <utility>

struct none_t {};
constexpr none_t None;

template <typename T> class option {
  union {
    T val;
  };
  bool has_val;

  void release() {
    if (has_val)
      val.~T();
    has_val = false;
  }

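  // True when Arg, stripped of references and cv-qualifiers,
  // is option<T> or a class derived from it.
  template <typename Arg>
  using OptionConstructable =
      std::is_base_of<option<T>,
                      std::remove_cv_t<std::remove_reference_t<Arg>>>;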

public:
  template <typename Arg,
            std::enable_if_t<std::is_constructible<T, Arg>::value and
                                 not OptionConstructable<Arg>::value,
                             bool> = true>
  option(Arg &&f) : val(std::forward<Arg>(f)), has_val(true) {}
  option() : has_val(false) {};
  option(const none_t &) : has_val(false) {};
  option(const option &b) : has_val(b.has_val) {
    if (has_val)
      new (&val) T(b.val);
  }
  option(option &&b) : has_val(b.has_val) {
    if (b.has_val) {
      new (&val) T(std::move(b.val));
    }
  }

  option &operator=(none_t) {
    release();
    return *this;
  }
  template <typename Arg>
  std::enable_if_t<std::is_constructible<T, Arg>::value and
                       not OptionConstructable<Arg>::value,
                   option &>
  operator=(Arg &&fwd) {
    if (has_val) {
      val = std::forward<Arg>(fwd);
    } else {
      new (&val) T(std::forward<Arg>(fwd));
    }
    has_val = true;
    return *this;
  }
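  // Copy and move assignment. The forwarding operator= above is disabled for
  // option arguments, so these handle every option-to-option assignment,
  // including from const lvalues, and never touch an unconstructed val.
  option &operator=(const option &b) {
    if (this != &b) {
      if (b.has_val)
        *this = b.val; // copy-assigns or copy-constructs via the overload above
      else
        release();
    }
    return *this;
  }
  option &operator=(option &&b) {
    if (this != &b) {
      if (b.has_val)
        *this = std::move(b.val); // move-assigns or move-constructs
      else
        release();
    }
    return *this;
  }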

  constexpr bool has_value() const noexcept { return has_val; }
  T &value() { return val; }
  const T &value() const { return val; }

  T *operator->() { return &val; }
  T const *operator->() const { return &val; }

  ~option() { release(); }
};

What lovely code! Compared with the real std::optional, ours behaves the same in the respects that matter here. The value is stored inline with no heap allocation, and a T is constructed, copied, moved or destroyed only when one is actually present.

And C++20

With C++20, a big improvement is that concepts and requires-clauses can replace the convoluted std::enable_if and give more meaningful error messages. The std::enable_if conditions simply become requires-clauses, like this:

  template <typename Arg>
    requires(std::is_constructible<T, Arg>::value and
             not OptionConstructable<Arg>::value)
  option(Arg &&f) : val(std::forward<Arg>(f)), has_val(true) {}
  option() : has_val(false) {};
  option(const none_t &) : has_val(false) {};
  option(const option &b) : has_val(b.has_val) {
    if (has_val)
      new (&val) T(b.val);
  }
  option(option &&b) : has_val(b.has_val) {
    if (b.has_val) {
      new (&val) T(std::move(b.val));
    }
  }

  option &operator=(none_t) {
    release();
    return *this;
  }
  template <typename Arg>
    requires(std::is_constructible<T, Arg>::value and
             not OptionConstructable<Arg>::value)
  option &operator=(Arg &&fwd) {
    if (has_val) {
      val = std::forward<Arg>(fwd);
    } else {
      new (&val) T(std::forward<Arg>(fwd));
    }
    has_val = true;
    return *this;
  }
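  // Copy and move assignment. The constrained operator= above excludes
  // option arguments, so these handle every option-to-option assignment.
  option &operator=(const option &b) {
    if (this != &b) {
      if (b.has_val)
        *this = b.val; // copy-assigns or copy-constructs via the overload above
      else
        release();
    }
    return *this;
  }
  option &operator=(option &&b) {
    if (this != &b) {
      if (b.has_val)
        *this = std::move(b.val); // move-assigns or move-constructs
      else
        release();
    }
    return *this;
  }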

Much simpler!

And Rust

After all that C++ dark magic, let's see how Rust defines its Option:

enum Option<T> {
    None,
    Some(T),
}

Fuck.

Conclusion

Don’t use C++.
