Pull to refresh
40.52

Data storages *

Posts about data storages

Show first
Rating limit
Level of difficulty

Decentralized Torrent storage in DHT

Reading time5 min
Views2.1K

The DHT system has existed for many years now, and torrents along with it, which we successfully use to get any information we want.

Together with this system, there are commands to interact with it. There are not many of them, but only two are needed to create a decentralized database: put and get.

This is what will be discussed below...

HDB++ TANGO Archiving System

Reading time3 min
Views1.1K
main

What is HDB++?


This is a TANGO archiving system, allows you to save data received from devices in the TANGO system.


Working with Linux will be described here (TangoBox 9.3 on base Ubuntu 18.04), this is a ready-made system where everything is configured.


What is the article about?


  • System architecture.
  • How to set up archiving.

It took me ~ 2 weeks to understand the architecture and write my own scripts for python for this case.


What is it for?


Allows you to store the history of the readings of your equipment.


  • You don't need to think about how to store data in the database.
  • You just need to specify which attributes to archive from which equipment.
Read more →

The Rules for Data Processing Pipeline Builders

Reading time5 min
Views3.8K


"Come, let us make bricks, and burn them thoroughly."
– legendary builders

You may have noticed by 2020 that data is eating the world. And whenever any reasonable amount of data needs processing, a complicated multi-stage data processing pipeline will be involved.


At Bumble — the parent company operating Badoo and Bumble apps — we apply hundreds of data transforming steps while processing our data sources: a high volume of user-generated events, production databases and external systems. This all adds up to quite a complex system! And just as with any other engineering system, unless carefully maintained, pipelines tend to turn into a house of cards — failing daily, requiring manual data fixes and constant monitoring.


For this reason, I want to share certain good engineering practises with you, ones that make it possible to build scalable data processing pipelines from composable steps. While some engineers understand such rules intuitively, I had to learn them by doing, making mistakes, fixing, sweating and fixing things again…


So behold! I bring you my favourite Rules for Data Processing Pipeline Builders.

Read more →

Bcache against Flashcache for Ceph Object Storage

Reading time11 min
Views2.7K

Fast SSDs are getting cheaper every year, but they are still smaller and more expensive than traditional HDD drives. But HDDs have much higher latency and are easily saturated. However, we want to achieve low latency for the storage system, and a high capacity too. There’s a well-known practice of optimizing performance for big and slow devices — caching. As most of the data on a disk is not accessed most of the time but some percentage of it is accessed frequently, we can achieve a higher quality of service by using a small cache.

Server hardware and operating systems have a lot of caches working on different levels. Linux has a page cache for block devices, a dirent cache and an inode cache on the filesystem layer. Disks have their own cache inside. CPUs have caches. So, why not add one more persistent cache layer for a slow disk?
Read more →

Why Enterprise Chat Apps isn’t built on Server-side Database like Hangouts, Slack, & Hip chat?

Reading time3 min
Views2.9K
One of the most significant tools for any organization to smoothen their collaborative world is only through a real-time chat application whether the conversation takes place on mobile or desktop. Hangouts, Slack and Hipchat have been in action for businesses to establish a decent conversation between their internal employees and clients right from small-scale to enterprises.

Those big players come into play where there requires team collaboration. The big players are built on a server-side database where the messages shared from one device to another is stored in their server database. Ultimately, this results in storing a huge amount of data within the server-side database (Cloud-database).

The consumption of cloud storage will be pretty high. The client-side database is more efficient where the messages relayed is stored in the client device. The messages will be queued to minimize the consumption of data in the server.
image
Read more →

A Brief Comparison of the SDS Architectures for Virtualization

Reading time6 min
Views2.8K

The search for a suitable storage platform: GlusterFS vs. Ceph vs. Virtuozzo Storage


This article outlines the key features and differences of such software-defined storage (SDS) solutions as GlusterFS, Ceph, and Virtuozzo Storage. Its goal is to help you find a suitable storage platform.

Gluster



Let’s start with GlusterFS that is often used as storage for virtual environments in open-source-based hyper-converged products with SDS. It is also offered by Red Hat alongside Ceph.
GlusterFS employs a stack of translators, services that handle file distribution and other tasks. It also uses services like Brick that handle disks and Volume that handle pools of bricks. Next, the DHT (distributed hash table) service distributes files into groups based on hashes.
Note: We’ll skip the sharding service due to issues related to it, which are described in linked articles.

image

When a file is written onto GlusterFS storage, it is placed on a brick in one piece and copied to another brick on another server. The next file will be placed on two or more other bricks. This works well if the files are of about the same size and the volume consists of a single group of bricks. Otherwise the following issues may arise:
Read more →

GitHub Package Registry will support Swift packages

Reading time1 min
Views1.2K
On May 10, we announced the limited beta of GitHub Package Registry, a package management service that makes it easy to publish public or private packages next to your source code. It currently supports familiar package management tools: JavaScript (npm), Java (Maven), Ruby (RubyGems), .NET (NuGet), and Docker images, with more to come.

Today we’re excited to announce that we’ll be adding support for Swift packages to GitHub Package Registry. Swift packages make it easy to share your libraries and source code across your projects and with the Swift community.

Read more →

VShard — horizontal scaling in Tarantool

Reading time14 min
Views2.5K


Hi, my name is Vladislav, and I am a member of the Tarantool development team. Tarantool is a DBMS and an application server all in one. Today I am going to tell the story of how we implemented horizontal scaling in Tarantool by means of the VShard module.

Some basic knowledge first.

There are two types of scaling: horizontal and vertical. And there are two types of horizontal scaling: replication and sharding. Replication ensures computational scaling whereas sharding is used for data scaling.

Sharding is also subdivided into two types: range-based sharding and hash-based sharding.

Range-based sharding implies that some shard key is computed for each cluster record. The shard keys are projected onto a straight line that is separated into ranges and allocated to different physical nodes.

Hash-based sharding is less complicated: a hash function is calculated for each record in a cluster; records with the same hash function are allocated to the same physical node.

I will focus on horizontal scaling using hash-based sharding.
Read more →

Authors' contribution