Tests vs. Types — Rust version / Хабр

A few days ago 0xd34df00d has published the translation of the article, describing the possible information about some function if we use it as a "black box", not trying to read its implementation. Of course, this information is quite different from language to language; in the original article, four cases were considered:

Python — dynamic typing, almost no information from signature, some hints are gained by the tests;
C — weak static typing, a little more information;
Haskell — strong static typing, with pure functions by default, a lot more information;
Idris — dependent typing, compiler can prove the function correctness.

"Here's C and there's Haskell, and what about Rust?" — this was the first question in the following discussion. The reply is here.

Let's first recall the task:

Given a list of values and a value, return the index of the value in the list or signify that it is not present in the list.

If someone don't want don't want to read this all, the code examples are provided in the Rust playground.
Otherwise, let's start!

Simple search

The first approach will be the almost naive signature, differing from the C code only in some idiomatic elements:

fn foo(x: &[i32], y: i32) -> Option<usize> {
    // Implementation elided 
}

What do we know about this function? Well, in fact — not very much. Of course, Option<usize> as a return value is a great improvement over whatever is provided by C, but there's no information about the function semantics. In particular, we have no guarantee of the side-effects being absent and no way to somehow check the desired behaviour.

Can a test improve this? Look here:

#[test]
fn test() {
    assert_eq!(foo(&[1, 2, 3], 2), Some(1));
    assert_eq!(foo(&[1, 2, 3], 4), None);
}

Nothing more, it seems — all these checks can be just the same in Python (and, in anticipation, tests will be of little help for the whole article).

Use the generics, Luke!

But is it any good that we have to use only 32-bit signed numbers? Fixing it:

fn foo<El>(x: &[El], y: El) -> Option<usize>
where
    El: PartialEq,
{
    // Implementation elided 
}

Well, that's something! Now we can take any slice, consisting of elements of any comparable type. Explicit polymorphism is almost always better that implicit (hello, Python) and almost always better then no polymorphism at all (hello, C), right?

Although, this function might unexpectedly pass this test:

fn refl<El: PartialEq + Copy>(el: El) -> Option<usize> {
    foo(&[el], el) // should always return Some(0), right?
}
#[test]
fn dont_find_nan() {
    assert_eq!(refl(std::f64::NAN), None);
}

This hints on the one missing point, since the specification wants the refl function, in fact, to always return Some(0). Of course, this is all due to the specific behaviour of the partially-equivalent types in general and floats in particular.
Maybe we want to get rid of this problem? So, we'll simply tighten the bound on the El type:

fn foo<El>(x: &[El], y: El) -> Option<usize>
where
    El: Eq,
{
    // Implementation elided 
}

Now, we're requiring not only the type to be comparable, — we require this comparisons to be equivalences. This, of course, limits the possible types to use with this function, but now both the signature and the tests hint that the behaviour should fit into the specification.

Side note: we want to go MORE generic!

This case has no relation to the initial task, but this seems to be a good example of the well-known principle: "be liberal in what you accept, be conservative in what you do". In other words: if you can generalize the input type without harming the ergonomics and performance — you probably should do it.

Now, we'll check this:

fn foo<'a, El: 'a>(x: impl IntoIterator<Item = &'a El>, y: El) -> Option<usize>
where
    El: Eq,
{
    // Implementation elided 
}

What do we know about this function now? In general, all the same, but now it accepts not only the slice or list, but some arbitrary object, which can yield the references to the type El, so that we compare it with the object in question. For example, if I'm not mistaken, in Java this type would be Iterable<Comparable>.

Like before, but a bit more strict

But now, maybe we need some more guaranties. Or we want to work on the stack (and so can't use Vec), but need to generalize our code for every possible array size. Or we want to compile the function optimized for every concrete array size.

Anyway, we need a generic array — and there's a crate in Rust, providing exactly that.

Now, here's our code:

use generic_array::{GenericArray, ArrayLength};

fn foo<El, Size>(x: GenericArray<El, Size>, y: El) -> Option<usize>
where
    El: Eq,
    Size: ArrayLength<El>,
{
    // Implementation elided 
}

What do we know from it? We know that function will take the array of some particular size, reflected in its type (and will be compiled independently for each such size). For now, this is almost nothing — the same guarantees were provided at runtime by the previous implementation.

But we can get further.

Type-level arithmetic

The initial article mentioned several guarantees provided by Idris which were impossible to get from other languages. One of them — and probably the simplest, since it doesn't involve any proofs or tests, just a little change in types, — states that the return value, if it's not Nothing, will always be less then the list length.

Looks like that the dependent types — or something like them — are necessary for such guarantee, and we can't get the same from Rust, right?

Meet the typenum. Using it, we can write our function in the following way:

use generic_array::{ArrayLength, GenericArray};
use typenum::{IsLess, Unsigned, B1};
trait UnsignedLessThan<T> {
    fn as_usize(&self) -> usize;
}

impl<Less, More> UnsignedLessThan<More> for Less
where
    Less: IsLess<More, Output = B1>,
    Less: Unsigned,
{
    fn as_usize(&self) -> usize {
        <Self as Unsigned>::USIZE
    }
}

fn foo<El, Size>(x: GenericArray<El, Size>, y: El) -> Option<Box<dyn UnsignedLessThan<Size>>>
where
    El: Eq,
    Size: ArrayLength<El>,
{
    // Implementation elided 
}

"What is this black magic?!" — you could ask. And you're right: typenum is a black magic, and any attempts to use it are even more magic.

But this function signature is fairly concrete.

It takes an array of El's with the length Size and one more El.
It returns an Option, which, if it is Some,
- holds a trait object, based on the UnsignedLessThan<Size> trait;
- and the UnsignedLessThan<T> is implemented wherever Unsigned and IsLess<T> are implemented and IsLess<T> returns B1, i.e. true.

In other words, this function is guaranteed to return unsigned integer less than the array size (strictly speaking, it returns the trait object, but we can call as_usize method and get the integer).

I can now speak of two major caveats:

We can get a loss of performance. If somehow this function will be on the "hot" path of the program, the constant dynamic dispatches might slow down the whole process. This might not be a large concern, in fact, but there's another:
For this function to compile, we must either write the proof of its correctness right inside it, or trick the type system with some unsafe. The former is fairly complex, and the latter is just cheating.

Conclusion

Of course, in practice we'll generally use either the second approach (with generic slice) or the approach in spoiler (with iterator). All subsequent discussion is probably not of any practical interest and is here only as an exercise with types.

Anyway, the fact that the Rust type system can emulate the feature from the stronger Idris type system, as for me, is fairly impressive by itself.