michael-filonenko Sep 1 2021 at 16:15

In-Memory Showdown: Redis vs. Tarantool

VK corporate blog, High performance, Database Administration, Tarantool

In this article, I am going to compare Redis and Tarantool. At first glance, they are quite alike: in-memory, NoSQL, key-value. But we are going to look deeper. My goal is to find meaningful similarities and differences; I am not going to claim that one is better than the other.

There are three main parts to my story:

We’ll find out what an in-memory database, or IMDB, is. When and how are they better than disk solutions?
Then, we’ll consider their architecture. What about their efficiency, reliability, and scaling?
Then, we’ll delve into technical details. Data types, iterators, indexes, transactions, programming languages, replication, and connectors.

Feel free to scroll down to the most interesting part or even to the summary comparison table at the very bottom of the article.

Content

Introduction
- What is an in-memory database, or IMDB
- Why IMDBs are needed
- What is Redis
- What is Tarantool
Architecture
- Performance
- Reliability
- Scaling
- Data schema validation
Technical features
- Supported data types
- Data eviction
- Iteration with keys
- Secondary indexes
- Transactions
- Persistence
- Programming languages for stored procedures
- Replication
- Connectors from other programming languages
- When not to use Redis or Tarantool
- Ecosystem
- Redis advantages
- Tarantool advantages
Summary
References

1. Introduction

What Is an In-Memory Database, or IMDB?

Redis and Tarantool are in-memory technologies. What is an in-memory database, or IMDB? It is a database that stores all its data in RAM. Its size is limited by the RAM capacity of the node. This limits the amount of data but greatly increases the speed.

In-memory DBs can persist data to disk, so a node can be restarted without losing information. Today, in-memory DBs can already be used as the main storage in production. For instance, Mail.ru Cloud Solutions uses Tarantool as the main DB to store metadata in its S3-compatible object storage.

In-memory DBs are also used for high-speed data access, handling 10,000 requests per second. Typical cases are large traffic spikes: IMDb on the day Zack Snyder's Justice League is released, Amazon a week before Christmas, or Uber on a Friday night.

Why Are In-Memory DBs Needed?

Cache. In-memory DBs are often used as a cache for disk databases. RAM is way faster than any disk (even an SSD). However, caches restart, crash, can be inaccessible through the network, suffer from memory shortage and other issues.

Eventually, caches learned to provide persistence, redundancy, and sharding.

Persistence means that caches store data on disk. After a restart, the state is restored without querying the main storage. Without persistence, warming up a cold cache takes a really long time and can even crash the main DB.
Redundancy means that caches replicate data. If one node crashes, a second one takes over the queries. The main storage won’t crash from overload, because the standby node is there.
Sharding means that if hot data doesn't fit into one node's RAM, several nodes are used in parallel. This is horizontal scaling.

Sharding makes the system large-scale. Redundancy makes it reliable. Together with persistence, we get clustered data storage that can hold terabytes of data and serve it at remarkable speed, even at 1,000,000 RPS.

OLTP stands for Online Transaction Processing. In-memory solutions are well suited to tasks of this type thanks to their architecture. OLTP comprises many short online transactions such as INSERT, UPDATE, and DELETE. What matters most in OLTP systems is fast query processing and data integrity. Efficiency is usually measured in RPS.

What’s Redis?

Redis is an in-memory data structure store.
Redis is a key-value store.
If you Google "database caching," almost every article will mention Redis.
Redis only offers primary key access and doesn’t support secondary indexing.
Redis contains a Lua stored procedure engine.

What's Tarantool?

Tarantool is an in-memory computing platform.
Tarantool is a key-value store that supports documents and a relational data model.
It was designed for hot data, namely MySQL caching in a social network, but it gradually became a fully featured database.
Tarantool can provide any number of indexes.
Tarantool supports stored procedures in Lua, too.

2. Architecture

Performance

The most popular questions about in-memory DBs are "How fast are they?" and "How many millions of RPS can we get from one core?". Let’s run a simple synthetic test, keeping the settings of the two databases as close as possible. A Go script will fill the storage with random keys having random values.

MacBook Pro 2.9 GHz Quad-Core Intel Core i7

Redis version=6.0.9, bits=64

Tarantool 2.6.2

Redis

File: redis_test.go

Content:

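package main

import (
    "context"
    "fmt"
    "math/rand"
    "testing"

    // The context-aware Ping/Set calls below require go-redis v8 or later.
    "github.com/go-redis/redis/v8"
)

// BenchmarkSetRandomRedisParallel writes random keys with random values from parallel goroutines.
func BenchmarkSetRandomRedisParallel(b *testing.B) {
    ctx := context.Background()
    client := redis.NewClient(&redis.Options{Addr: "127.0.0.1:6379", Password: "", DB: 0})
    defer client.Close()
    if err := client.Ping(ctx).Err(); err != nil {
        b.Fatal(err)
    }

    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            key := fmt.Sprintf("bench-%d", rand.Int31())
            // Expiration 0 means the key never expires.
            if err := client.Set(ctx, key, rand.Int31(), 0).Err(); err != nil {
                // b.Fatal must not be called from RunParallel goroutines.
                b.Error(err)
                return
            }
        }
    })
}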

Tarantool

Command: tarantool

Tarantool initialization:

tarantool>
box.cfg{listen='127.0.0.1:3301', wal_mode='none', memtx_memory=2*1024*1024*1024}
box.schema.user.grant('guest', 'super', nil, nil, {if_not_exists=true,})
box.schema.space.create('kv', {if_not_exists=true,})
box.space.kv:create_index('pkey', {type='TREE', parts={{field=1, type='str'}},
                                   if_not_exists=true,})

File: tarantool_test.go
Content:

package main

import (
    "fmt"
    "math/rand"
    "testing"

    "github.com/tarantool/go-tarantool"
)

type Tuple struct {
    _msgpack struct{} `msgpack:",asArray"`
    Key      string
    Value    int32
}

// BenchmarkSetRandomTntParallel replaces random tuples in the "kv" space from parallel goroutines.
func BenchmarkSetRandomTntParallel(b *testing.B) {
    opts := tarantool.Opts{
        User: "guest",
    }
    conn, err := tarantool.Connect("127.0.0.1:3301", opts)
    if err != nil {
        b.Fatal(err)
    }
    defer conn.Close()
    b.RunParallel(func(pb *testing.PB) {
        var tuple Tuple
        for pb.Next() {
            tuple.Key = fmt.Sprintf("bench-%d", rand.Int31())
            tuple.Value = rand.Int31()
            if _, err := conn.Replace("kv", tuple); err != nil {
                // b.Fatal must not be called from RunParallel goroutines.
                b.Error(err)
                return
            }
        }
    })
}

Launching. To load databases to the maximum, let’s use more threads.

go test -cpu 12 -test.bench . -test.benchtime 10s

goos: darwin
goarch: amd64
BenchmarkSetRandomRedisParallel-12          929368         15839 ns/op
BenchmarkSetRandomTntParallel-12            972978         12749 ns/op

Results. The average duration of a Redis query was about 15.8 microseconds; for Tarantool, it was about 12.7 microseconds. That corresponds to 63,135 RPS for Redis and 78,437 RPS for Tarantool. The test is not meant to show which database is faster, but how efficient in-memory DBs are. You can tweak the benchmark so that your DB of choice wins.
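
The RPS figures follow directly from the ns/op column of the benchmark output. A minimal sketch of the conversion; `rpsFromNsPerOp` is a hypothetical helper name and is not part of the benchmark files:

```go
package main

import "fmt"

// rpsFromNsPerOp converts the ns/op figure reported by `go test -bench`
// into whole requests per second (1 second = 1e9 nanoseconds).
func rpsFromNsPerOp(nsPerOp float64) int {
	return int(1e9 / nsPerOp)
}

func main() {
	fmt.Println("Redis:    ", rpsFromNsPerOp(15839)) // ns/op from the run above
	fmt.Println("Tarantool:", rpsFromNsPerOp(12749))
}
```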

Reliability

Two basic methods are used for reliability in data storages:

Persistence. On restart, the DB loads its data from disk without any queries to outside systems.
Replication. If one node crashes, a copy of the data remains on another one. Replication can be asynchronous or synchronous.

Both Redis and Tarantool can do that. We’ll delve into some technical details further.

Scaling

Scaling can be used:

to reserve additional nodes that can replace each other in case one of them crashes;
in case the data doesn’t fit into a single node and has to be distributed among several ones.

Redis

Redis nodes can be interconnected by means of asynchronous replication. We’ll call such a group of nodes a replica set. The replica set is managed by Redis Sentinel: one dedicated process, or several of them clustered, that monitors Redis nodes. Sentinel performs four primary tasks:

Checking node status within the group — dead or alive.
Notifying the system administrator if something goes wrong within the group.
Automatic switching of the master.
Acting as a configuration provider so that external clients know where to connect.

If data has to be sharded across several nodes, Redis offers Redis Cluster, which is available in the open-source version. It supports building a cluster from several replication groups. Data within the cluster is sharded over 16,384 slots. Slot ranges are distributed among Redis nodes.

Nodes within the cluster communicate over a separate open port to know their neighbors’ statuses. To work with Redis Cluster, the app should use a special connector.
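
To make the slot mapping concrete, here is a minimal sketch in Go. The article itself does not describe the formula; the sketch assumes the rule from the Redis Cluster specification, where the slot is CRC16 of the key (XMODEM variant) modulo 16384. Hash tags (the `{...}` part of a key) are deliberately left out, and the function names are hypothetical.

```go
package main

import "fmt"

const slotCount = 16384 // fixed number of hash slots in Redis Cluster

// crc16 implements CRC-16/XMODEM (polynomial 0x1021, initial value 0).
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = (crc << 1) ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashSlot maps a key to a slot. Hash tags ({...}) are ignored in this sketch.
func hashSlot(key string) int {
	return int(crc16([]byte(key))) % slotCount
}

func main() {
	for _, key := range []string{"bench-1", "bench-2", "user:42"} {
		fmt.Printf("%-8s -> slot %d\n", key, hashSlot(key))
	}
}
```

A cluster-aware connector does this computation on the client and sends the command to the node that owns the slot range.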

Tarantool

Tarantool also supports replication and sharding. The key tool for scalability management is Tarantool Cartridge. It unites nodes into replica sets. You can create a single replica set and use it similarly to Redis Sentinel. Tarantool Cartridge can also manage several replica sets and shard data across them. The vshard library is used for sharding.

Differences

Administration

Scripts and commands in Redis Cluster.
Web-interface or API in Tarantool Cartridge.

Sharding buckets

The number of sharding buckets in Redis is fixed at 16,384.
The number of sharding buckets in Tarantool Cartridge (vshard) is customizable. It is set once, when the cluster is created.

Bucket rebalancing (resharding)

In Redis Cluster, resharding is set up and launched manually.
In Tarantool Cartridge (vshard), it is automatic.

Query routing

In Redis Cluster, queries are routed on the side of the client app.
In Tarantool Cartridge, queries are routed on the cluster router nodes.

Infrastructure

Tarantool Cartridge also contains:

Data Schema Validation

In Redis, the primary data schema is key-value, but the values can contain different structures. You cannot set validation rules on the server side. We can’t indicate how certain data types should be used and what structure a value should have. The schema must be validated by a connector or a client app.

Tarantool supports data schema validation on the server side:

using the built-in space format (box.space.<space_name>:format()), which covers only top-level fields;
using an installed Avro schema extension.
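
To make the idea of top-level validation concrete, here is a minimal sketch in Go of the kind of check a client app has to do itself when the server does not validate. The `Field` type and the `validate` function are hypothetical; they are not part of any Redis or Tarantool connector.

```go
package main

import (
	"errors"
	"fmt"
)

// Field describes one top-level field: its name and expected type.
type Field struct {
	Name string
	Type string // "string" or "integer" in this sketch
}

// validate checks a tuple against a format, top-level fields only.
func validate(format []Field, tuple []interface{}) error {
	if len(tuple) != len(format) {
		return errors.New("field count mismatch")
	}
	for i, f := range format {
		var ok bool
		switch f.Type {
		case "string":
			_, ok = tuple[i].(string)
		case "integer":
			_, ok = tuple[i].(int)
		}
		if !ok {
			return fmt.Errorf("field %q: expected %s", f.Name, f.Type)
		}
	}
	return nil
}

func main() {
	format := []Field{{"key", "string"}, {"value", "integer"}}
	fmt.Println(validate(format, []interface{}{"bench-1", 10}))   // valid tuple
	fmt.Println(validate(format, []interface{}{"bench-1", "10"})) // wrong type
}
```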

3. Technical Features

What Kind of Data Types Can Be Stored?

In Redis, only a string can be a key. Redis supports the following data types:

strings;
lists of strings;
sets (unordered collections of strings);
hashes (maps of string fields to string values);
sorted sets (ordered collections of strings);
bitmaps and HyperLogLog.

Tarantool supports the following data types:

Primitive
- strings;
- boolean (true or false);
- integer;
- floating-point numbers;
- decimal floating-point numbers;
- UUID.
Complex
- arrays;
- hashmaps.

Redis data types are better suited for event counters, including unique ones, and for storing small ready-made data marts.

Tarantool data types are better suited for storing objects and/or documents, as in both SQL and NoSQL DBMSs.

Data Eviction

Redis and Tarantool both have mechanisms that limit the memory occupied. If a client attempts to add data after the limit has been reached, the database returns an error. Both Redis and Tarantool keep serving read queries in that case, though.

Let’s see how data that is no longer required can be deleted. Redis has several data eviction mechanisms:

TTL: objects are evicted as soon as their lifetime expires;
LRU: the least recently used data is evicted;
RANDOM: random objects are evicted;
LFU: the least frequently used data is evicted.

All of them can be set up either for the entire data set or only for the objects marked as evictable (keys with an expiration set).
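
As an illustration of the LRU idea, here is a minimal exact LRU cache in Go. It is a sketch of the policy only, not of how Redis implements it (Redis uses an approximation of LRU); the type and method names are hypothetical.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	key   string
	value int
}

// LRU is a fixed-capacity cache that evicts the least recently used key.
type LRU struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> node in order
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns the value and marks the key as recently used.
func (c *LRU) Get(key string) (int, bool) {
	el, ok := c.items[key]
	if !ok {
		return 0, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Set stores the value, evicting the oldest key when the cache is full.
func (c *LRU) Set(key string, value int) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		delete(c.items, oldest.Value.(*entry).key)
		c.order.Remove(oldest)
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
}

func main() {
	cache := NewLRU(2)
	cache.Set("a", 1)
	cache.Set("b", 2)
	cache.Get("a")    // "a" becomes the most recently used key
	cache.Set("c", 3) // evicts "b"
	_, ok := cache.Get("b")
	fmt.Println("b still cached:", ok)
}
```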

In Tarantool, the expirationd or indexpiration extensions can be used for eviction. Another option is to write your own background procedure that walks an index (e.g., one on a timestamp field) and deletes data that is no longer needed.
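
The hand-written variant can be sketched as follows. This is a language-neutral illustration in Go, not Tarantool code: the `Record` type and the `sweep` function are hypothetical, and sorting the slice stands in for walking an index on the timestamp field.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a stored object with a timestamp field used for expiration.
type Record struct {
	Key       string
	ExpiresAt int64 // Unix time, seconds
}

// sweep walks records in ExpiresAt order (as an index on that field would)
// and returns only those that are still alive at time now.
func sweep(records []Record, now int64) []Record {
	sort.Slice(records, func(i, j int) bool { return records[i].ExpiresAt < records[j].ExpiresAt })
	// First position whose ExpiresAt is after now; everything before it is expired.
	firstAlive := sort.Search(len(records), func(i int) bool { return records[i].ExpiresAt > now })
	return records[firstAlive:]
}

func main() {
	records := []Record{{"a", 100}, {"b", 300}, {"c", 200}}
	fmt.Println(sweep(records, 200)) // keeps only records that expire after t=200
}
```

Because the records are ordered by expiration time, the sweep can stop at the first live record instead of scanning everything.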

Iteration With Keys

In Redis, this can be done with the following commands:

SCAN;
KEYS.

SCAN returns results page by page. To get the next page, send the cursor returned with the previous one. SCAN supports filtering by pattern through the MATCH parameter. The filter is applied when a page is sent, so some pages may be empty. That doesn’t mean there are no more pages.

Tarantool offers a more flexible scheme for iterating over keys. Both forward and reverse iteration are possible, and you can additionally filter values on the fly. You can jump to a certain key value and then walk the subsequent keys in either ascending or descending order. The direction can’t be changed in the middle of an iteration.
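
The paging behavior, including empty pages, can be imitated with a small in-memory sketch in Go. It does not talk to Redis: `scanPage` and `scanAll` are hypothetical names, the cursor here is a plain offset (real Redis cursors are opaque values), and `path.Match` stands in for the MATCH glob.

```go
package main

import (
	"fmt"
	"path"
)

// scanPage imitates one SCAN call over a fixed key list: it takes a cursor and a
// page size, filters the page with a glob pattern, and returns the next cursor.
// A next cursor of 0 means the iteration is finished.
func scanPage(keys []string, cursor, count int, pattern string) (next int, page []string) {
	end := cursor + count
	if end >= len(keys) {
		end, next = len(keys), 0
	} else {
		next = end
	}
	for _, k := range keys[cursor:end] {
		// The filter runs after the page is picked, so a page can come back empty.
		if ok, _ := path.Match(pattern, k); ok {
			page = append(page, k)
		}
	}
	return next, page
}

// scanAll keeps calling scanPage until the cursor returns to 0.
func scanAll(keys []string, count int, pattern string) (matches []string, emptyPages int) {
	cursor := 0
	for {
		next, page := scanPage(keys, cursor, count, pattern)
		if len(page) == 0 {
			emptyPages++
		}
		matches = append(matches, page...)
		if next == 0 {
			return matches, emptyPages
		}
		cursor = next
	}
}

func main() {
	keys := []string{"tmp:1", "tmp:2", "user:1", "tmp:3", "tmp:4", "user:2"}
	matches, emptyPages := scanAll(keys, 2, "user:*")
	fmt.Println(matches, "empty pages:", emptyPages)
}
```

The loop stops only when the cursor is 0, never when a page is empty.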

For example:

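-- Collect tuples from the 'kv' space whose key is >= 'key' and whose value is > 10.
-- Assumes the space format names the second field 'value'.
local results = {}
for _, tuple in box.space.kv:pairs('key', {iterator = 'GE'}) do
    if tuple['value'] > 10 then
        table.insert(results, tuple)
    end
end
return results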

Secondary Indexes

Redis

Redis has no secondary indexes, but there are some ways to imitate them.

In sorted sets, the element's score can be used as a secondary key.
Alternatively, hashes can be used, with their keys treated as an index into the data.
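
What such imitation costs the application can be seen in a minimal Go sketch. It uses plain maps instead of a Redis connection, and the `User`, `Store`, `Put`, and `IDsByCity` names are hypothetical: the point is that every write has to update the data and the hand-made index together.

```go
package main

import (
	"fmt"
	"sort"
)

type User struct {
	ID   string
	City string
}

// Store keeps users by primary key plus a hand-maintained index by city,
// the way an application has to do it on top of a plain key-value store.
type Store struct {
	byID   map[string]User
	byCity map[string]map[string]struct{} // city -> set of user IDs
}

func NewStore() *Store {
	return &Store{byID: map[string]User{}, byCity: map[string]map[string]struct{}{}}
}

// Put writes the user and keeps the city index consistent with it.
func (s *Store) Put(u User) {
	if old, ok := s.byID[u.ID]; ok {
		delete(s.byCity[old.City], old.ID) // drop the stale index entry
	}
	s.byID[u.ID] = u
	if s.byCity[u.City] == nil {
		s.byCity[u.City] = map[string]struct{}{}
	}
	s.byCity[u.City][u.ID] = struct{}{}
}

// IDsByCity answers a "secondary key" lookup, sorted for stable output.
func (s *Store) IDsByCity(city string) []string {
	ids := make([]string, 0, len(s.byCity[city]))
	for id := range s.byCity[city] {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	s := NewStore()
	s.Put(User{"u1", "Moscow"})
	s.Put(User{"u2", "Moscow"})
	s.Put(User{"u1", "Berlin"}) // u1 moves, the index must follow
	fmt.Println(s.IDsByCity("Moscow"), s.IDsByCity("Berlin"))
}
```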

Tarantool

In Tarantool, any number of secondary indexes can be created.

Secondary keys may contain several fields.
The HASH, TREE, RTREE, and BITSET index types can be used for secondary indexes.
Secondary indexes may contain unique and non-unique keys.
Collation settings can be applied to any index, e.g., for case-insensitive string values.
Secondary indexes can be built on fields that hold arrays of values (sometimes referred to as multikey indexes).

Summary

Secondary keys and convenient iterators enable relational data storage model building in Tarantool. It is impossible to build such a model in Redis.

Transactions

Transactions enable atomic execution of several operations. Both Redis and Tarantool support transactions. A transaction example from Redis:

> MULTI
OK
> INCR foo
QUEUED
> INCR bar
QUEUED
> EXEC
1) (integer) 1
2) (integer) 1

A transaction example from Tarantool:

do
  box.begin()
  box.space.kv:update('foo', {{'+', 'value', 1}})
  box.space.kv:update('bar', {{'+', 'value', 1}})
  box.commit()
end
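
The queue-then-execute behavior of MULTI/EXEC from the Redis example can be imitated with a small Go sketch. It is an in-memory model only, with a hypothetical `Tx` type, and it does not cover WATCH or error handling.

```go
package main

import "fmt"

// Tx imitates MULTI/EXEC: commands are queued first and run together on Exec.
type Tx struct {
	store  map[string]int
	queued []string // keys to increment
}

// Incr only queues the command; nothing is applied yet.
func (t *Tx) Incr(key string) { t.queued = append(t.queued, key) }

// Exec applies all queued commands in order, with no other command in
// between, and returns one result per command.
func (t *Tx) Exec() []int {
	results := make([]int, 0, len(t.queued))
	for _, key := range t.queued {
		t.store[key]++
		results = append(results, t.store[key])
	}
	t.queued = nil
	return results
}

func main() {
	store := map[string]int{}
	tx := &Tx{store: store}
	tx.Incr("foo")
	tx.Incr("bar")
	fmt.Println(len(store)) // still empty: commands are only queued
	fmt.Println(tx.Exec())  // one result per queued command
}
```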

Persistence

Data persistence is ensured by two mechanisms:

snapshotting: in-memory data is written to disk at specified intervals;
the transaction journal: all incoming operations are appended to a write-ahead log.

Both Redis and Tarantool have these persistence mechanisms.
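
How the two mechanisms work together on restart can be shown with a minimal Go sketch. It models the idea only (load the snapshot, then replay the journal); the `Op` type and `recoverState` function are hypothetical and say nothing about the real file formats.

```go
package main

import "fmt"

// Op is one journal record: set Key to Value.
type Op struct {
	Key   string
	Value int
}

// recoverState rebuilds in-memory state after a restart: load the last
// snapshot, then replay the journal records written after it, in order.
func recoverState(snapshot map[string]int, journal []Op) map[string]int {
	state := make(map[string]int, len(snapshot))
	for k, v := range snapshot {
		state[k] = v
	}
	for _, op := range journal {
		state[op.Key] = op.Value
	}
	return state
}

func main() {
	snapshot := map[string]int{"foo": 1, "bar": 1}
	journal := []Op{{"foo", 2}, {"baz", 7}} // writes that arrived after the snapshot
	fmt.Println(recoverState(snapshot, journal))
	// Without the journal, only the snapshot survives and the later writes are lost.
	fmt.Println(recoverState(snapshot, nil))
}
```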

Redis

At specified intervals, Redis snapshots all in-memory data. By default, this happens as often as every 60 seconds, depending on how many keys have changed (customizable). Redis calls fork() to create a child process that sees a copy-on-write view of the current in-memory data, and the child writes that data to disk.

In case of an abnormal shutdown, Redis restores its state from the most recent snapshot. If the last snapshot was made a long time ago, all the data received after it will be lost.

The transaction journal is used to store all the information arriving at the database. Every operation is logged in the on-disk journal. When Redis starts, it restores its state from the snapshot and then applies the remaining operations from the journal.

In Redis, a snapshot is called RDB (Redis DataBase).
The transaction journal in Redis is called AOF (Append Only File).

Tarantool

The persistence engine is derived from database architectures.
It is comprehensive, with snapshotting and transaction journaling.
This mechanism ensures reliable WAL-based replication.

Tarantool snapshots the current in-memory data at specified intervals and records every transaction to the journal.

A snapshot in Tarantool is called a snap and can be made at any frequency.
In Tarantool, the transaction journal is called WAL (Write Ahead Log).

Each of the mechanisms can be switched off in both Redis and Tarantool. Both should be on for reliable data storage. Alternatively, you can trade off persistence and turn off snapshotting and journaling to get the highest possible speed.

Differences

Redis uses fork() for snapshotting. Tarantool uses an internal read view of all the data, and this is faster than fork.

By default, Redis has snapshotting only. Tarantool has both snapshotting and transaction journaling.

Redis stores and uses only one file for both snapshotting and transaction journaling. Tarantool stores two snapshot files by default (this number is customizable) and an unlimited, steadily growing number of transaction journals. If a snapshot file is damaged in Tarantool, it can load from the previous one. In Redis, you need to set up backups.

Unlike Redis, snapshotting and journaling in Tarantool form a single mechanism for representing data in the file system. This means that Tarantool snapshot files and journals both store all the meta-information about a transaction: who made it and when. The files share the same format and complement each other.

Troubleshooting

If a journal file is damaged in Redis:

redis-check-aof --fix <filename>

If a journal file is damaged in Tarantool:

tarantool> box.cfg{force_recovery=true}

Programming languages for stored procedures

Stored procedures are code executed next to the data. Both Redis and Tarantool suggest Lua for writing stored procedures. The language is quite simple. It was designed for people who use programming to solve tasks in a specific domain.

From a database developer's point of view:

Lua can be easily embedded into an existing app.
It integrates easily with the app's objects and processes.
Lua has dynamic typing and automatic memory management.
The language has an incremental mark-and-sweep garbage collector.

Differences

Implementation

Redis uses the plain vanilla PUC-Rio implementation of Lua.
Tarantool uses LuaJIT.

Task timeout

Redis allows you to set a timeout after which execution of a stored procedure will end.
In Tarantool, stored procedures are compiled and executed faster, but no timeout can be set. To interrupt a stored procedure, the developer has to build in a check of a stop flag.

Runtime

Redis is single-tasked: it executes tasks one by one.
Tarantool uses cooperative multitasking. It also executes tasks one at a time, but a task gives up control during I/O operations or explicitly by calling yield.

Summary

In Redis, Lua is just about the stored procedures.
In Tarantool, it is a cooperative runtime that supports communication with outside systems.

Replication

Replication is an engine that enables object copying from one node to another. Replication can be asynchronous and synchronous.

Asynchronous replication: after adding an object to one node we don’t wait for it to be replicated to the second node.
Synchronous replication: after adding an object, we wait for it to be saved at the first and the second node.

Redis and Tarantool support asynchronous replication, whereas synchronous replication is only available in Tarantool.

In some cases, we need to wait for the object replication:

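In Redis, we use the WAIT command. It accepts only two parameters:
- the number of replicas that must receive the object;
- the timeout, i.e. how long to wait for that to happen.

In Tarantool, it can be done with a code fragment like this:

local fiber = require('fiber')
-- Wait until replica `dst` has received our latest LSN or `timeout` seconds pass.
local function wait_replicated(dst, timeout)
    local deadline = fiber.clock() + timeout
    while fiber.clock() < deadline do
        local vclock = box.info.replication[dst].downstream.vclock or {}
        if box.info.lsn <= (vclock[box.info.id] or 0) then return true end
        fiber.sleep(0.1)
    end
    return false
end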

Synchronous Replication

Redis has no synchronous replication. Tarantool has it in versions from 2.6.

Connectors From Other Programming Languages

Both Redis and Tarantool support connectors from popular programming languages:

Go;
Python;
NodeJS;
Java.

Complete lists:

When Not to Use Redis or Tarantool

Both Redis and Tarantool are poorly suited for OLAP tasks. Online Analytical Processing deals with historical or archived data. OLAP has relatively few transactions, and queries are often complex and involve aggregation.

In both cases, data is stored row by row. This makes aggregation algorithms less efficient than in column-oriented databases.

Redis and Tarantool process data in a single thread, which makes it impossible to parallelize analytical queries.
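
The difference between the two layouts can be sketched in Go. The `Order` and `OrderColumns` types are hypothetical and only illustrate memory layout; they do not reflect the internal structures of Redis, Tarantool, or any column store.

```go
package main

import "fmt"

// Row-oriented layout: each record keeps all of its fields together.
type Order struct {
	ID     int
	Client string
	Amount int
}

// Column-oriented layout: each field is stored as its own contiguous array.
type OrderColumns struct {
	IDs     []int
	Clients []string
	Amounts []int
}

// sumRows has to step over whole records to reach one field.
func sumRows(rows []Order) int {
	total := 0
	for _, r := range rows {
		total += r.Amount
	}
	return total
}

// sumColumn scans a single dense array and touches no other fields.
func sumColumn(c OrderColumns) int {
	total := 0
	for _, a := range c.Amounts {
		total += a
	}
	return total
}

func main() {
	rows := []Order{{1, "a", 10}, {2, "b", 20}, {3, "a", 30}}
	cols := OrderColumns{[]int{1, 2, 3}, []string{"a", "b", "a"}, []int{10, 20, 30}}
	fmt.Println(sumRows(rows), sumColumn(cols)) // same result, different memory access pattern
}
```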

Ecosystem

Redis

There are three categories of Redis modules:

Enterprise;
verified and certified for Enterprise and Open Source;
unverified.

Enterprise modules:

full-text search;
storage and search by bloom-filters;
time series storage.

Certified:

storage of graphs and queries to them;
storage of JSON and queries to it;
storage of ML models and their operation.

All the modules sorted by the number of stars on GitHub: https://redis.io/modules

Tarantool

There are two categories of modules:

Embedded: https://www.tarantool.io/en/doc/latest/reference/
Enterprise: https://www.tarantool.io/en/enterprise_doc/rocksref/#closed-source-modules

Redis Advantages

It is easier to use.
There is more info on the web: 20,000 questions on Stack Overflow (7,000 of them still unanswered).
The entry barrier is lower.
There are more people experienced with Redis.

Tarantool Advantages

Free developer support on Telegram.
Secondary indexes available.
Index iteration available.
UI for cluster administration available.
Application server with cooperative multitasking. It is similar to single-threaded Go.
Higher ceiling in production.

4. Summary

Redis offers a great advanced cache, but it can’t be used as a primary storage. Tarantool is a multi-paradigm database that can be used as a primary storage. Tarantool supports:

Relational storage model with SQL.
Distributed NoSQL storage.
Advanced cache creation.
Making a queue broker.

Redis has a lower entry barrier. Tarantool has a higher ceiling in production.

	Redis	Tarantool
Description	Advanced in-memory cache.	Multi-paradigm DBMS with an integrated application server.
Data model	Key-value	Key-value, documents, relational DBMS
Website	redis.io	www.tarantool.io
Documentation	redis.io/documentation	www.tarantool.io/ru/doc/latest
Developer	Salvatore Sanfilippo, Redis Labs	Mail.ru Group
Current release	6.2	2.7.2
License	The 3-Clause BSD License	The 2-Clause BSD License
Implementation language	C	C, C++
Supported OS	BSD, Linux, MacOS, Win	BSD, Linux, MacOS
Data schema	Key-value	Flexible
Secondary indexes	No	Yes
SQL support	No	For one instance — ANSI SQL
Foreign keys	No	Yes, with SQL
Triggers	No	Yes
Transactions	Optimistic locking, atomic execution	ACID, read committed
Scaling	Sharding within a fixed range.	Sharding within an adjustable amount of virtual buckets.
Multitasking	Yes, server serialization	Yes, cooperative multitasking
Persistence	Snapshots and journaling.	Snapshots and journaling.
Consistency concept	Eventual consistency; strong eventual consistency with CRDTs	Immediate consistency
API	RESP open protocol	Open binary protocol (on MsgPack base)
Script language	Lua	Lua
Supported languages	C, C#, C++, Clojure, Crystal, D, Dart, Elixir, Erlang, Fancy, Go, Haskell, Haxe, Java, JavaScript (Node.js), Lisp, Lua, MatLab, Objective-C, OCaml, Pascal, Perl, PHP, Prolog, Pure Data, Python, R, Rebol, Ruby, Rust, Scala, Scheme, Smalltalk, Swift, Tcl, Visual Basic	C, C#, C++, Erlang, Go, Java, JavaScript, Lua, Perl, PHP, Python, Rust

5. References

You can download Tarantool here at the official website.
Get help in the Telegram chat.
