VK
Building the Internet

In-Memory Showdown: Redis vs. Tarantool


In this article, I am going to compare Redis and Tarantool. At first glance, they are quite alike: both are in-memory NoSQL key-value stores. But we are going to look deeper. My goal is to find meaningful similarities and differences; I am not going to claim that one is better than the other.

There are three main parts to my story:

  • First, we’ll find out what an in-memory database (IMDB) is. When and how are IMDBs better than disk-based solutions?
  • Then, we’ll consider their architecture. What about their performance, reliability, and scaling?
  • Finally, we’ll delve into technical details: data types, iterators, indexes, transactions, programming languages, replication, and connectors.

Feel free to scroll down to the part you find most interesting, or even to the summary comparison table at the very bottom of the article.

Contents


  1. Introduction
    • What is an in-memory database, or IMDB
    • Why IMDBs are needed
    • What is Redis
    • What is Tarantool
  2. Architecture
    • Performance
    • Reliability
    • Scaling
    • Data schema validation
  3. Technical features
    • Supported data types
    • Data eviction
    • Iteration with keys
    • Secondary indexes
    • Transactions
    • Persistence
    • Programming languages for stored procedures
    • Replication
    • Connectors from other programming languages
    • When not to use Redis or Tarantool
    • Ecosystem
    • Redis advantages
    • Tarantool advantages
  4. Summary
  5. References

1. Introduction


What Is an In-Memory Database, or IMDB?


Redis and Tarantool are in-memory technologies. What is an in-memory database, or IMDB? It is a database that stores all its data in RAM, so its size is limited by the RAM capacity of the node. This limits the amount of data but greatly increases the speed.

In-memory DBs can persist data to disk, so a node can be restarted without losing information. Today, in-memory DBs can already be used as the main storage in production. For instance, Mail.ru Cloud Solutions uses Tarantool as the main DB for storing metadata in its S3-compatible object storage.

In-memory DBs are also used for high-speed data access, where they are capable of handling 10,000 requests per second. Typical cases are large traffic spikes: on the release day of Zack Snyder's Justice League, at Amazon a week before Christmas, or at Uber on a Friday night.

Why Are In-Memory DBs Needed?


Cache. In-memory DBs are often used as a cache for disk databases. RAM is much faster than any disk (even an SSD). However, caches get restarted, crash, become unreachable over the network, run out of memory, and suffer from other issues.

Eventually, caches learned to provide persistence, redundancy, and sharding.

  • Persistence means caches store data on disk. After a restart, the state is restored from disk without querying the main storage. Otherwise, requests to a cold cache fall through to the main DB, which takes a really long time and can even bring the main DB down.
  • Redundancy means caches can replicate data. If one node crashes, another one takes over its queries, so the main storage is not overloaded and does not crash.
  • Sharding means that if hot data doesn't fit into a node's RAM, several nodes are used in parallel. This is horizontal scaling.

Sharding makes the system scalable, and redundancy makes it reliable. Together with persistence, they give us clustered data storage that can hold terabytes of data and serve it at remarkable speed, even at 1,000,000 RPS.

OLTP stands for Online Transaction Processing. Thanks to their architecture, in-memory solutions are well suited to this type of task. OLTP comprises many short online transactions such as INSERT, UPDATE, and DELETE. The key requirements for OLTP systems are fast query processing and data integrity. Performance is usually measured in RPS.

What’s Redis?


  • Redis is an in-memory data structure store.
  • Redis is a key-value store.
  • If you Google «database caching,» almost every article will mention Redis.
  • Redis only offers primary key access and doesn’t support secondary indexing.
  • Redis contains a Lua stored procedure engine.

What's Tarantool?


  • Tarantool is an in-memory computing platform.
  • Tarantool is a key-value store that supports documents and a relational data model.
  • It was originally designed for hot data (MySQL caching in a social network) but gradually became a fully featured database.
  • Tarantool supports multiple indexes, including secondary ones.
  • Tarantool supports stored procedures in Lua, too.

2. Architecture


Performance


The most popular questions about in-memory DBs are «How fast are they?» and «How many millions of RPS can we get from one core?». Let’s run a simple synthetic test, keeping the database settings as close to each other as possible. A Go benchmark fills the storage with random keys having random values.

MacBook Pro, 2.9 GHz Quad-Core Intel Core i7

Redis version=6.0.9, bits=64

Tarantool 2.6.2

Redis


File: redis_test.go

package main

import (
    "context"
    "fmt"
    "math/rand"
    "testing"

    // The context-aware API used below belongs to go-redis v8
    "github.com/go-redis/redis/v8"
)

func BenchmarkSetRandomRedisParallel(b *testing.B) {
    client2 := redis.NewClient(&redis.Options{Addr: "127.0.0.1:6379", Password: "", DB: 0})
    defer client2.Close()
    if _, err := client2.Ping(context.Background()).Result(); err != nil {
        b.Fatal(err)
    }

    // Each goroutine writes random keys with random values until b.N is reached
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            key := fmt.Sprintf("bench-%d", rand.Int31())
            _, err := client2.Set(context.Background(), key, rand.Int31(), 0).Result()
            if err != nil {
                b.Fatal(err)
            }
        }
    })
}

Tarantool


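Command: tarantool

Tarantool initialization (entered at the tarantool> prompt):

-- Listen on port 3301, run without the write-ahead log, give memtx 2 GB of RAM
box.cfg{listen='127.0.0.1:3301', wal_mode='none', memtx_memory=2*1024*1024*1024}
-- Let the benchmark connect as the guest user with full privileges
box.schema.user.grant('guest', 'super', nil, nil, {if_not_exists=true,})
box.schema.space.create('kv', {if_not_exists=true,})
-- Primary TREE index on the first field (the string key)
box.space.kv:create_index('pkey', {type='TREE', parts={{field=1, type='string'}},
                                   if_not_exists=true,})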

File: tarantool_test.go

package main

import (
    "fmt"
    "math/rand"
    "testing"

    "github.com/tarantool/go-tarantool"
)

type Tuple struct {
    _msgpack struct{} `msgpack:",asArray"`
    Key      string
    Value    int32
}

func BenchmarkSetRandomTntParallel(b *testing.B) {
    opts := tarantool.Opts{
        User: "guest",
    }
    pconn2, err := tarantool.Connect("127.0.0.1:3301", opts)
    if err != nil {
        b.Fatal(err)
    }
    b.RunParallel(func(pb *testing.PB) {
        var tuple Tuple
        for pb.Next() {
            tuple.Key = fmt.Sprintf("bench-%d", rand.Int31())
            tuple.Value = rand.Int31()
            _, err := pconn2.Replace("kv", tuple)
            if err != nil {
                b.Fatal(err)
            }
        }
    })
}

Running the benchmarks. To load the databases as much as possible, let’s use more threads (12 parallel goroutines):

go test -cpu 12 -test.bench . -test.benchtime 10s

goos: darwin
goarch: amd64
BenchmarkSetRandomRedisParallel-12          929368         15839 ns/op
BenchmarkSetRandomTntParallel-12            972978         12749 ns/op

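Results. Go reports about 15.8 microseconds per operation for Redis and 12.7 microseconds for Tarantool. Since b.RunParallel divides the total wall-clock time by the number of operations performed by all goroutines, these figures correspond to a throughput of 63,135 RPS for Redis and 78,437 RPS for Tarantool. The test does not show the peak speed of either database, only the throughput of this particular setup. You can tweak the benchmark in a way that your DB of choice wins.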
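The conversion from ns/op to RPS is simple arithmetic: one second is 10^9 ns, so throughput is 10^9 divided by ns/op. The helper below is a small sketch of that calculation, using the two figures from the benchmark output; the function name opsPerSecond is hypothetical.

```go
package main

import "fmt"

// opsPerSecond converts a Go benchmark's ns/op figure into operations per
// second: one second is 1e9 nanoseconds. With b.RunParallel, ns/op is the
// total wall-clock time divided by the operations of all goroutines, so
// the result is aggregate throughput, not single-request latency.
func opsPerSecond(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

func main() {
	// ns/op values copied from the benchmark output above
	fmt.Printf("Redis:     %.1f ops/s\n", opsPerSecond(15839))
	fmt.Printf("Tarantool: %.1f ops/s\n", opsPerSecond(12749))
}
```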

Reliability


Two basic methods are used to make data storage reliable:

  • Persistence: after a restart, the DB loads its data from disk without querying any external systems.
  • Replication: if one node crashes, a copy of the data remains on another node. Replication can be asynchronous or synchronous.

Both Redis and Tarantool can do that. We’ll delve into some technical details further.

Scaling


Scaling can be used:

  • to add redundant nodes that can replace one another if one of them crashes;
  • when the data doesn’t fit into a single node and has to be distributed among several nodes.

Redis


Redis nodes can be interconnected by means of asynchronous replication. We’ll call such a group of nodes a replica set. The replica set is managed by Redis Sentinel: one special process, or a cluster of several such processes, that monitors Redis nodes. Sentinel performs four main tasks:

  • Checking the status of each node in the group (dead or alive).
  • Notifying the system administrator if something goes wrong within the group.
  • Automatic failover, i.e. switching to a new master.
  • Acting as a configuration provider, so that clients know which node to connect to.

If data has to be sharded across several nodes, Redis offers Redis Cluster, which is available in the open-source version. It builds a cluster out of several replication groups. Data within the cluster is sharded over 16,384 hash slots, and ranges of slots are assigned to Redis nodes.

Nodes within the cluster communicate over a separate open port to keep track of their neighbors’ statuses. To work with Redis Cluster, the app has to use a cluster-aware connector.
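How a key maps to one of the 16,384 slots can be sketched in Go. This follows the Redis Cluster specification (CRC16 in the XMODEM variant, modulo 16,384, plus the {hash tag} rule that hashes only the part inside the first pair of braces). It is a standalone illustration, not code taken from Redis or from any client library.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 computes CRC-16/XMODEM (polynomial 0x1021, initial value 0,
// no reflection, no final XOR), bit by bit.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashSlot maps a key to one of 16,384 slots. If the key contains a
// non-empty "{...}" section, only the text inside the first such pair of
// braces is hashed, so related keys can be forced into the same slot.
func hashSlot(key string) uint16 {
	if open := strings.IndexByte(key, '{'); open >= 0 {
		if end := strings.IndexByte(key[open+1:], '}'); end > 0 {
			key = key[open+1 : open+1+end]
		}
	}
	return crc16([]byte(key)) % 16384
}

func main() {
	for _, k := range []string{"bench-42", "{user1000}.following", "{user1000}.followers"} {
		fmt.Printf("%-22s -> slot %d\n", k, hashSlot(k))
	}
}
```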

Tarantool


Tarantool also supports replication and sharding. The key tool for managing scalability is Tarantool Cartridge. It unites nodes into replica sets. You can create a single group of this kind and use it much like Redis Sentinel. Tarantool Cartridge can also manage several replica sets and shard data across them, using the vshard library.

Differences


Administration

  • Scripts and commands in Redis Cluster.
  • Web-interface or API in Tarantool Cartridge.

Sharding buckets

  • The number of sharding buckets in Redis is fixed at 16,384.
  • The number of sharding buckets in Tarantool Cartridge (vshard) is configurable. It is set once, when the cluster is created.
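Why the bucket count is fixed once the cluster is created can be shown with a toy mapping. The sketch assumes a hypothetical bucketID helper that hashes the key with CRC32 (IEEE) and takes the result modulo the bucket count. vshard ships its own helper functions, and their hash may differ. The point is that changing the count reassigns a large share of keys, which is why the number is chosen up front.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// bucketID maps a key to a bucket number in [1, bucketCount].
// Assumption: CRC32 (IEEE) is used purely for illustration; this is not
// vshard's own helper, whose hash function may differ.
func bucketID(key string, bucketCount uint32) uint32 {
	return crc32.ChecksumIEEE([]byte(key))%bucketCount + 1
}

func main() {
	// Count how many keys would land in a different bucket if the bucket
	// count changed after data had already been distributed.
	moved := 0
	for i := 0; i < 10000; i++ {
		key := fmt.Sprintf("bench-%d", i)
		if bucketID(key, 3000) != bucketID(key, 6000) {
			moved++
		}
	}
	fmt.Printf("%d of 10000 keys change bucket when 3000 buckets become 6000\n", moved)
}
```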

Bucket rebalancing (resharding)

  • In Redis Cluster, rebalancing is configured and started manually.
  • In Tarantool Cartridge (vshard), it is automatic.

Query routing

  • In Redis Cluster, queries are routed on the side of the client app.
  • In Tarantool Cartridge, queries are routed on the cluster router nodes.
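Client-side routing means the connector itself must know which node owns a slot and must react when the cluster says otherwise. In Redis Cluster, a node answers a request for a slot it does not serve with a redirection error such as MOVED 3999 127.0.0.1:6381. The sketch below parses that reply and updates a hypothetical slot table. The redirect type and parseRedirect are illustrative names, not part of any client library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// redirect is a parsed Redis Cluster redirection such as
// "MOVED 3999 127.0.0.1:6381": the slot and the node that serves it.
type redirect struct {
	Kind string // "MOVED" (slot has moved for good) or "ASK" (one-off retry)
	Slot int
	Addr string
}

// parseRedirect parses the text of a MOVED or ASK error reply and reports
// whether the text was a redirection at all.
func parseRedirect(msg string) (redirect, bool) {
	parts := strings.Fields(msg)
	if len(parts) != 3 || (parts[0] != "MOVED" && parts[0] != "ASK") {
		return redirect{}, false
	}
	slot, err := strconv.Atoi(parts[1])
	if err != nil || slot < 0 || slot >= 16384 {
		return redirect{}, false
	}
	return redirect{Kind: parts[0], Slot: slot, Addr: parts[2]}, true
}

func main() {
	// Hypothetical client-side routing table: slot -> node address.
	slots := map[int]string{3999: "127.0.0.1:6380"}

	// On MOVED, the client rewrites its table and retries on the new node.
	if r, ok := parseRedirect("MOVED 3999 127.0.0.1:6381"); ok && r.Kind == "MOVED" {
		slots[r.Slot] = r.Addr
	}
	fmt.Println("slot 3999 is now served by", slots[3999])
}
```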

Infrastructure

  • Tarantool Cartridge also contains:


Data Schema Validation


In Redis, the primary data schema is key-value, but the values can contain different structures. You cannot set validation rules on the server side. We can’t indicate how certain data types should be used and what structure a value should have. The schema must be validated by a connector or a client app.

Tarantool supports data schema validation on the server side:

  • using the built-in space format (space_object:format()), which validates only the top-level fields;
  • using an installed Avro schema extension module (avro-schema).
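Top-level-only validation can be pictured as checking each declared field's type while ignoring whatever is nested inside it. The Go sketch below is a simplified model of that idea, not Tarantool's implementation. The field type, the validate function, and the small set of type names are assumptions made for illustration, and integers are represented as int64.

```go
package main

import "fmt"

// field is one entry of a simplified, space-format-like declaration.
type field struct {
	Name string
	Type string // "string", "integer" or "any" in this sketch
}

// validate checks only the top-level fields of a tuple against the format.
// A value declared as "any" (for example, a nested map) is accepted as is:
// nothing inside it is inspected. Integers are expected as int64 here.
func validate(format []field, tuple []interface{}) error {
	if len(tuple) < len(format) {
		return fmt.Errorf("tuple has %d fields, format declares %d", len(tuple), len(format))
	}
	for i, f := range format {
		ok := true // "any" and unknown type names are not checked
		switch f.Type {
		case "string":
			_, ok = tuple[i].(string)
		case "integer":
			_, ok = tuple[i].(int64)
		}
		if !ok {
			return fmt.Errorf("field %d (%s) must be %s, got %T", i+1, f.Name, f.Type, tuple[i])
		}
	}
	return nil
}

func main() {
	format := []field{{"key", "string"}, {"value", "integer"}, {"meta", "any"}}
	fmt.Println(validate(format, []interface{}{"bench-1", int64(42), map[string]interface{}{"x": 1}}))
	fmt.Println(validate(format, []interface{}{"bench-1", "42", nil}))
}
```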

3. Technical Features


What Kind of Data Types Can Be Stored?


In Redis, only a string can be a key. Redis supports the following data types:

  • strings;
  • lists of strings;
  • sets (unordered collections of strings);
  • hashes (maps of string fields to string values);
  • sorted sets (ordered collections of strings);
  • bitmaps and HyperLogLog.

Tarantool supports the following data types:

  • Primitive
    • strings;
    • boolean (true or false);
    • integer;
    • floating-point numbers;
    • decimal floating-point numbers;
    • UUID.
  • Complex
    • arrays;
    • hashmaps.

Redis data types are better suited for event counters, including counters of unique events, and for storing small precomputed data marts.

Tarantool data types are better suited for storing objects and documents, in both SQL and NoSQL styles.

Data Eviction


Redis and Tarantool both have engines to limit the memory occupied. If a client attempts to add data after the limit has been reached, the databases will return an error. Both Redis and Tarantool will proceed with reading queries in that case, though.

Let’s see how «no longer required» data can be deleted. Redis comprises several data eviction engines:

  • TTL: objects are deleted as soon as their lifetime expires;
  • LRU: the least recently used data is evicted;
  • RANDOM: random objects are evicted;
  • LFU: the least frequently used data is evicted.

All the policies can be applied either to all keys or only to keys that have an expiration (TTL) set.
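To make the LRU policy concrete, here is a textbook LRU cache in Go. Note that Redis itself approximates LRU by sampling a few keys rather than keeping an exact recency list, so this shows the idea behind the policy, not Redis's algorithm. The names lruCache, Get, and Set are hypothetical.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what the recency list stores.
type entry struct {
	key, value string
}

// lruCache holds at most capacity keys and evicts the least recently used
// one when a new key would exceed that limit.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> its element in order
}

func newLRU(capacity int) *lruCache {
	return &lruCache{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns the value and marks the key as most recently used.
func (c *lruCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Set inserts or updates a key, evicting the least recently used key if
// the cache grows beyond its capacity.
func (c *lruCache) Set(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := newLRU(2)
	cache.Set("a", "1")
	cache.Set("b", "2")
	cache.Get("a")      // "a" becomes the most recently used key
	cache.Set("c", "3") // over capacity: "b" is evicted
	_, ok := cache.Get("b")
	fmt.Println("b still cached:", ok)
}
```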

In Tarantool, the expirationd or indexpiration extensions can be used for eviction. Another option is to write your own background procedure that walks an index (for example, one on a timestamp field) and deletes data that is no longer needed.

Iteration With Keys


In Redis, keys are iterated with the SCAN command.

Each SCAN call returns a page of results together with a cursor. To get the next page, the cursor returned by the previous call has to be sent. SCAN supports filtering by a glob-style pattern via the MATCH parameter. Filtering is applied after a page has been selected, so some pages may be empty. However, an empty page doesn’t mean there are no more pages: the iteration ends only when the returned cursor is 0.
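The cursor-and-MATCH behavior, including empty pages in the middle of an iteration, can be reproduced with a small model. In this sketch the cursor is a plain offset into a key slice, and path.Match stands in for Redis glob matching (unlike Redis, its * does not match /). Real Redis cursors walk hash-table buckets, so only the paging contract carries over; scan is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
)

// scan returns one page of a cursor-based iteration, following the SCAN
// contract: start with cursor 0, pass the returned cursor to the next
// call, and stop when the returned cursor is 0 again. The pattern is
// applied after the page is taken, so a page can be empty even though the
// iteration is not finished. Simplification: the cursor is a plain offset.
func scan(keys []string, cursor, count int, pattern string) (next int, page []string) {
	end := cursor + count
	if end > len(keys) {
		end = len(keys)
	}
	for _, k := range keys[cursor:end] {
		if ok, _ := path.Match(pattern, k); ok {
			page = append(page, k)
		}
	}
	if end == len(keys) {
		return 0, page
	}
	return end, page
}

func main() {
	keys := []string{"user:1", "order:1", "order:2", "order:3", "user:2"}
	cursor := 0
	for {
		next, page := scan(keys, cursor, 2, "user:*")
		fmt.Printf("cursor %d -> %d, page %v\n", cursor, next, page)
		if next == 0 {
			break
		}
		cursor = next
	}
}
```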

Tarantool offers a more flexible scheme for iterating over keys. Both forward and reverse iteration are possible, and you can additionally filter the values on the fly. You can jump to a certain key value and then walk through the following keys in either ascending or descending order. The direction cannot be changed in the middle of an iteration.

For example:

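-- Walk the primary index of the kv space in ascending order, starting from
-- the first key >= 'bench-1', and keep tuples whose value exceeds 10
local results = {}
for _, tuple in box.space.kv:pairs({'bench-1'}, {iterator = 'GE'}) do
    if tuple[2] > 10 then -- field 2 holds the value
        table.insert(results, tuple)
    end
end
return results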
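The GE iterator works because a TREE index keeps keys sorted: the iterator jumps to the first key greater than or equal to the start key and then walks forward. Below is a rough Go model of that lookup, with a sorted slice standing in for the index and a hypothetical selectGE function; it is not how Tarantool implements its trees.

```go
package main

import (
	"fmt"
	"sort"
)

// tuple is a simplified {key, value} record, like the rows of the kv space.
type tuple struct {
	Key   string
	Value int
}

// selectGE models a GE iterator over a TREE index: it jumps to the first
// key >= start with a binary search and walks forward in ascending order,
// keeping the tuples accepted by keep. index must be sorted by Key.
func selectGE(index []tuple, start string, keep func(tuple) bool) []tuple {
	i := sort.Search(len(index), func(i int) bool { return index[i].Key >= start })
	var out []tuple
	for _, tp := range index[i:] {
		if keep(tp) {
			out = append(out, tp)
		}
	}
	return out
}

func main() {
	index := []tuple{{"bench-1", 5}, {"bench-2", 20}, {"bench-3", 3}, {"bench-4", 15}}
	bigValues := func(tp tuple) bool { return tp.Value > 10 }
	fmt.Println(selectGE(index, "bench-2", bigValues))
}
```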

Secondary Indexes


Redis


Redis has no secondary indexes, but there are some ways to imitate them.

  • In ordered collections (sorted sets), an element's score can be used as a secondary key.
  • Otherwise, a separate hash can serve as an index, with the indexed values as its keys and the primary keys as its values.
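Imitating a secondary index in Redis means maintaining a second structure by hand and updating it on every write, for example one hash per object plus one set per indexed value (SADD/SREM). The Go sketch below models that bookkeeping in memory. The store type and its methods are hypothetical, and in Redis both updates would belong in one MULTI/EXEC block or Lua script to stay consistent.

```go
package main

import (
	"fmt"
	"sort"
)

// store pairs primary data with a hand-maintained secondary index, the way
// one would pair a Redis hash per user with a Redis set per city.
type store struct {
	cityByID  map[string]string              // primary: user id -> city
	idsByCity map[string]map[string]struct{} // index: city -> set of user ids
}

func newStore() *store {
	return &store{cityByID: map[string]string{}, idsByCity: map[string]map[string]struct{}{}}
}

// Put writes the primary record and keeps the index in sync: the old index
// entry must be removed by hand, or lookups would return stale ids.
func (s *store) Put(id, city string) {
	if old, ok := s.cityByID[id]; ok {
		delete(s.idsByCity[old], id)
	}
	s.cityByID[id] = city
	if s.idsByCity[city] == nil {
		s.idsByCity[city] = map[string]struct{}{}
	}
	s.idsByCity[city][id] = struct{}{}
}

// IDsInCity is the secondary-key lookup; ids are sorted for stable output.
func (s *store) IDsInCity(city string) []string {
	ids := make([]string, 0, len(s.idsByCity[city]))
	for id := range s.idsByCity[city] {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	s := newStore()
	s.Put("u1", "Moscow")
	s.Put("u2", "Moscow")
	s.Put("u1", "Kazan") // u1 moves: its Moscow index entry must go away
	fmt.Println(s.IDsInCity("Moscow"), s.IDsInCity("Kazan"))
}
```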

Tarantool


In Tarantool, multiple secondary indexes can be created.

  • Secondary keys may contain several fields.
  • Types HASH, TREE, RTREE, and BITSET can be used for secondary indexes.
  • Secondary indexes may contain unique and non-unique keys.
  • Collations can be set for any index, e.g., for case-insensitive comparison of strings.
  • Secondary indexes can be built on fields containing arrays of values (multikey indexes).

Summary


Secondary keys and convenient iterators make it possible to build a relational data model in Tarantool. Such a model cannot be built in Redis.

Transactions


Transactions allow several operations to be executed atomically. Both Redis and Tarantool support transactions. Transaction example from Redis:

> MULTI
OK
> INCR foo
QUEUED
> INCR bar
QUEUED
> EXEC
1) (integer) 1
2) (integer) 1

Transaction example from Tarantool:

do
  -- Field 2 of the kv space holds the value; update() does nothing
  -- if the key does not exist yet
  box.begin()
  box.space.kv:update('foo', {{'+', 2, 1}})
  box.space.kv:update('bar', {{'+', 2, 1}})
  box.commit()
end
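Redis's MULTI/EXEC model, where commands are only queued until EXEC and then run without interleaving, can be sketched with a mutex standing in for Redis's single execution thread. The kv and tx types are hypothetical. The sketch omits error handling, and it does not model rollback, which Redis does not perform either.

```go
package main

import (
	"fmt"
	"sync"
)

// kv is a toy store; its mutex stands in for Redis's single thread that
// never interleaves commands from different clients.
type kv struct {
	mu   sync.Mutex
	data map[string]int
}

// tx collects commands between MULTI and EXEC.
type tx struct {
	store *kv
	queue []string // keys to increment, in order
}

// Multi starts a transaction; nothing touches the data yet.
func (s *kv) Multi() *tx { return &tx{store: s} }

// Incr only queues the command, like INCR answering QUEUED inside MULTI.
func (t *tx) Incr(key string) { t.queue = append(t.queue, key) }

// Exec applies all queued commands in one critical section and returns
// one result per command, like the reply to EXEC.
func (t *tx) Exec() []int {
	t.store.mu.Lock()
	defer t.store.mu.Unlock()
	results := make([]int, 0, len(t.queue))
	for _, key := range t.queue {
		t.store.data[key]++
		results = append(results, t.store.data[key])
	}
	t.queue = nil
	return results
}

func main() {
	s := &kv{data: map[string]int{}}
	txn := s.Multi()
	txn.Incr("foo")
	txn.Incr("bar")
	fmt.Println(txn.Exec())
}
```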

Persistence


Data persistence is provided by two mechanisms:
  • writing all in-memory data to disk at specified intervals (snapshotting);
  • sequentially logging every incoming operation (the transaction journal).

Both Redis and Tarantool have these persistence engines.

Redis


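At specified intervals, Redis snapshots all in-memory data. The frequency is configurable with save <seconds> <changes> rules: a snapshot is taken once the given number of seconds has passed and at least the given number of changes has been made. Redis copies the current in-memory data by calling fork(), and the child process writes the data to disk.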

In case of an abnormal shutdown, Redis recovers its state from the most recent snapshot. If the last snapshot was made long ago, all the data received after it is lost.

The transaction journal stores every operation arriving at the database: each one is logged in an on-disk journal. When Redis starts, it recovers its state from the snapshot and then replays the remaining operations from the journal.

  • In Redis, a snapshot is called RDB (Redis DataBase).
  • The transaction journal in Redis is called AOF (Append Only File).

Tarantool


  • Its persistence mechanism follows the classic design of disk-based databases.
  • It is comprehensive, combining snapshotting and transaction journaling.
  • The same mechanism provides reliable WAL-based replication.

Tarantool snapshots the current in-memory data at specified intervals and records every transaction to the journal.

  • A snapshot in Tarantool is called a snap and can be made at any frequency.
  • In Tarantool, the transaction journal is called WAL (Write Ahead Log).
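Recovery from a snapshot plus a journal boils down to two steps: load the newest snapshot, then replay only the journal records written after it. Tarantool tracks this position with log sequence numbers (LSNs). The Go sketch below models that replay; walEntry, snapshot, and recoverState are hypothetical names, and a real WAL stores much more per record.

```go
package main

import "fmt"

// walEntry is one journaled write with its log sequence number.
type walEntry struct {
	LSN   int64
	Key   string
	Value string
}

// snapshot is a full copy of the data taken at a known LSN.
type snapshot struct {
	LSN  int64
	Data map[string]string
}

// recoverState loads the snapshot, then re-applies only the journal entries
// written after it (LSN greater than the snapshot's LSN), in log order.
func recoverState(snap snapshot, wal []walEntry) map[string]string {
	state := make(map[string]string, len(snap.Data))
	for k, v := range snap.Data {
		state[k] = v
	}
	for _, e := range wal {
		if e.LSN > snap.LSN {
			state[e.Key] = e.Value
		}
	}
	return state
}

func main() {
	snap := snapshot{LSN: 2, Data: map[string]string{"a": "1", "b": "2"}}
	wal := []walEntry{{1, "a", "1"}, {2, "b", "2"}, {3, "a", "3"}, {4, "c", "4"}}
	fmt.Println(recoverState(snap, wal))
}
```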

Each of the mechanisms can be switched off in both Redis and Tarantool. Both should be on for reliable data storage. Alternatively, you can trade persistence for speed by turning off both snapshotting and journaling.

Differences


Redis uses fork() for snapshotting. Tarantool uses an internal read view of all the data, and this is faster than fork().

By default, Redis has snapshotting only. Tarantool has both snapshotting and transaction journaling.

Redis keeps only a single snapshot file and a single journal file. Tarantool keeps two snapshot files by default (the number is configurable with the checkpoint_count option) plus a growing sequence of transaction journal files. If a snapshot file is damaged, Tarantool can load the previous one instead. In Redis, you need to set up backups for that.

Unlike in Redis, snapshots and journals in Tarantool share a single on-disk data representation. Both snapshot files and journals store metadata on each transaction, such as who made it and when. They use the same format and complement each other.

Troubleshooting


If a journal file is damaged in Redis:

redis-check-aof --fix appendonly.aof

If a journal file is damaged in Tarantool:

tarantool> box.cfg{force_recovery=true}

Programming languages for stored procedures


Stored procedures are code executed next to the data, inside the database. Both Redis and Tarantool use Lua for stored procedures. The language is quite simple: it was designed for people who use programming to solve tasks in their own specific field.

From a database developer's point of view:

  • Lua can be easily integrated into an existing app.
  • It is easy to integrate with objects and processes of the app.
  • Lua has dynamic typing and automatic memory management.
  • This language has a garbage collector — incremental Mark&Sweep.

Differences


Implementation

  • Redis embeds the reference PUC-Rio implementation of Lua (Lua 5.1).
  • Tarantool uses LuaJIT.

Task timeout

  • Redis lets you set a time limit for scripts (lua-time-limit). Once a script runs longer than that, Redis starts answering other clients with a BUSY error, and the script can be stopped with SCRIPT KILL.
  • In Tarantool, stored procedures are compiled and executed faster, but no timeout can be set. To be able to stop a stored procedure, the developer has to make it check a termination flag.

Runtime

  • Redis is single-tasked: it executes tasks one by one.
  • Tarantool uses cooperative multitasking. It also executes tasks one at a time, but a task gives up control to other tasks when it performs I/O or yields explicitly.

Summary


  • In Redis, Lua is just about the stored procedures.
  • In Tarantool, it is a cooperative runtime that supports communication with outside systems.

Replication


Replication is a mechanism that copies objects from one node to another. Replication can be asynchronous or synchronous.

  • Asynchronous replication: after adding an object to one node we don’t wait for it to be replicated to the second node.
  • Synchronous replication: after adding an object, we wait for it to be saved at the first and the second node.

Redis and Tarantool support asynchronous replication, whereas synchronous replication is only available in Tarantool.

In some cases, we need to wait for the object replication:

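  • In Redis, we use the WAIT command. It accepts only two parameters:
    • the number of replicas that must acknowledge the object;
    • the timeout, in milliseconds, to wait for that.
  • In Tarantool, it can be done in Lua by polling the vclock that a replica has acknowledged. Here dst is the replica's ID and timeout is in seconds (both are assumed to be defined):

    local fiber = require('fiber')
    local deadline = fiber.clock() + timeout
    while fiber.clock() < deadline do
       local downstream = box.info.replication[dst].downstream
       -- Stop once the replica has confirmed everything this node wrote
       if downstream and downstream.vclock
          and box.info.lsn <= (downstream.vclock[box.info.id] or 0) then
           break
       end
       fiber.sleep(0.1)
    end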
    


Synchronous Replication


Redis has no synchronous replication. Tarantool has it in versions from 2.6.

Connectors From Other Programming Languages


Both Redis and Tarantool support connectors from popular programming languages:

  • Go;
  • Python;
  • NodeJS;
  • Java.

Complete lists:


When Not to Use Redis or Tarantool


Both Redis and Tarantool are poorly tailored for OLAP tasks. Online Analytical Processing deals with historic or archive data. OLAP has relatively few transactions, queries are often complex and contain aggregation.

In both databases, data is stored row by row. This makes aggregation algorithms less efficient than in column-oriented databases.

Redis and Tarantool process data in a single thread, which makes it impossible to parallelize analytical queries.

Ecosystem


Redis


There are three categories of Redis modules:

  • Enterprise;
  • verified and certified for Enterprise and Open Source;
  • unverified.

Enterprise modules:

  • full-text search;
  • storage and search by bloom-filters;
  • time series storage.

Certified:

  • storage of graphs and queries to them;
  • storage of JSON and queries to it;
  • storage of ML models and their operation.

All the modules, sorted by the number of GitHub stars: https://redis.io/modules

Tarantool


There are two categories of modules:


Redis Advantages


  • It is easier to use.
  • There is more information on the web: 20,000 questions on Stack Overflow (7,000 of them unanswered).
  • The entry barrier is lower.
  • There are more people experienced with Redis.

Tarantool Advantages


  • Free developer support on Telegram.
  • Secondary indexes available.
  • Index iteration available.
  • UI for cluster administration available.
  • An application server with cooperative multitasking, similar to Go running on a single thread.
  • Higher ceiling in production.

4. Summary


Redis is a great advanced cache, but it can’t be used as primary storage. Tarantool is a multi-paradigm database that can be used as primary storage. Tarantool supports:

  • Relational storage model with SQL.
  • Distributed NoSQL storage.
  • Advanced cache creation.
  • Building a queue broker.

Redis has a lower entry barrier. Tarantool has a higher ceiling in production.

Feature | Redis | Tarantool
Description | Advanced in-memory cache | Multi-paradigm DBMS with an integrated application server
Data model | Key-value | Key-value, documents, relational DBMS
Website | redis.io | www.tarantool.io
Documentation | redis.io/documentation | www.tarantool.io/ru/doc/latest
Developer | Salvatore Sanfilippo, Redis Labs | Mail.ru Group
Current release | 6.2 | 2.7.2
License | The 3-Clause BSD License | The 2-Clause BSD License
Implementation language | C | C, C++
Supported OS | BSD, Linux, macOS, Windows | BSD, Linux, macOS
Data schema | Key-value | Flexible
Secondary indexes | No | Yes
SQL support | No | ANSI SQL, within a single instance
Foreign keys | No | Yes, with SQL
Triggers | No | Yes
Transactions | Optimistic locking, atomic execution | ACID, read committed
Scaling | Sharding over a fixed range of slots | Sharding over an adjustable number of virtual buckets
Multitasking | Yes, serialized by the server | Yes, cooperative multitasking
Persistence | Snapshots and journaling | Snapshots and journaling
Consistency concept | Eventual consistency; strong eventual consistency with CRDTs | Immediate consistency
API | RESP open protocol | Open binary protocol (based on MsgPack)
Script language | Lua | Lua
Supported languages | C, C#, C++, Clojure, Crystal, D, Dart, Elixir, Erlang, Fancy, Go, Haskell, Haxe, Java, JavaScript (Node.js), Lisp, Lua, MatLab, Objective-C, OCaml, Pascal, Perl, PHP, Prolog, Pure Data, Python, R, Rebol, Ruby, Rust, Scala, Scheme, Smalltalk, Swift, Tcl, Visual Basic | C, C#, C++, Erlang, Go, Java, JavaScript, Lua, Perl, PHP, Python, Rust

5. References


  1. You can download Tarantool from the official website.
  2. Get help in the Telegram chat.