• Lossless ElasticSearch data migration

    • Translation

    Academic data warehouse design recommends keeping everything in a normalized form, with links between. Then the roll forward of changes in relational math will provide a reliable repository with transaction support. Atomicity, Consistency, Isolation, Durability — that's all. In other words, the storage is explicitly built to safely update the data. But it is not optimal for searching, especially with a broad gesture on the tables and fields. We need indices, a lot of indices! Volumes expand, recording slows down. SQL LIKE can not be indexed, and JOIN GROUP BY sends us to meditate in the query planner.

    Read more →
  • Accelerating PHP connectors for Tarantool using Async, Swoole, and Parallel

      In the PHP ecosystem, there are currently two connectors for the Tarantool server: the official PECL extension tarantool/tarantool-php written in C, and tarantool-php/client written in PHP. I am the author of the latter one.

      In this article I would like to share the results of performance testing of both these libraries and show how you can achieve 3x-5x performance improvement (on synthetic tests!) with minimal changes in code.
      Read more →
    • How to write the home address right?

        How Tax Service, OpenStreetMap, and InterSystems IRIS
        could help developers get clean addresses

        Pieter Brueghel the Younger, Paying the Tax (The Tax Collector), 1640

        In my previous article, we just skimmed the surface of objects. Let's continue our reconnaissance. Today's topic is a tough one. It's not quite BIG DATA, but it's still the data not easy to work with: we're talking about fairly large amounts of data. It won't all fit into RAM at once, and some of it won't even fit on the drive (not due to lack of space, but because there's a lot of junk). The name of our subject is FIAS DB: the Federal Information Address System database — the databases of addresses in Russia. The archive is 5.5 GB. And it's a compressed XML file. After extraction, it will be a full 53 GB (set aside 110 GB for extraction). And when you start to parse and convert it, that 110 GB won't be enough. There won't be enough RAM either.
        Read more →