Development

AWS SageMaker: Choosing the Right Inference Type for ML Models

Level of difficulty: Easy
Reading time: 5 min
Views: 59

When I started working with AWS SageMaker, one of the most common questions was: “Which inference type should I choose for my model?” SageMaker offers four different options, and at first glance, the differences between them aren’t always obvious. Let’s break down which approach to use and when.

Read more

StarRocks vs. ClickHouse, Apache Druid, and Trino

Level of difficulty: Easy
Reading time: 8 min
Views: 116

In the big data era, data is one of the most valuable assets for enterprises. The ultimate goal of data analytics is to power swift, agile business decision making. As database technologies have advanced at a breathtaking pace in recent years, a large number of excellent database systems have emerged. Some of them are impressive in wide-table queries but do not work well in complex queries. Some support flexible multi-table queries but are held back by slow query speed.

Each type of data has a data model that best represents it. However, in real business scenarios, there is no such thing as ultra-fast data analytics under the perfect data model. Big data engineers sometimes have to make compromises on data models. Such compromises may cause long latency in complex queries or damage real-time query performance, because engineers must take the trouble to convert complex data models into flat tables.

New business requirements put forward new challenges for database systems. A good OLAP database system must be able to deliver excellent performance in both wide-table and multi-table scenarios. This system must also reduce the workload of big data engineers and enable customers to query data of any dimension in real time without worrying about data construction.

Read more

A Small Practical Guide to Calculating the Economic Value of AppSec and DevSecOps

Level of difficulty: Medium
Reading time: 5 min
Views: 93

Investing in Application Security (AppSec) and DevSecOps is no longer optional; it's a strategic imperative. However, securing budget and justifying these initiatives requires moving beyond fear and speaking the language of business: Return on Investment (ROI).

This guide provides a structured framework for calculating the costs and benefits of embedding security into your software development lifecycle (SDLC). By understanding and applying concepts like Total Cost of Ownership (TCO), Lifecycle Cost Analysis (LCCA), and Return on Security Investment (ROSI), you can build a compelling financial case, guide your security strategy, and prove tangible value to stakeholders.

Read more

Stream-first Gotenberg Client for Go

Level of difficulty: Medium
Reading time: 2 min
Views: 107

Go client for Gotenberg — document conversion service supporting Chromium, LibreOffice, and PDF manipulation engines.

Features

- Chromium: Convert URLs, HTML, and Markdown to PDF

- LibreOffice: Convert Office documents (Word, Excel, PowerPoint) to PDF

- PDF Engines: Merge, split, and manipulate PDFs

- Webhook support: Async conversions with callback URLs

- Stream-first: Built on httpstream for efficient multipart uploads

Read more

Stream-first HTTP Client for Go

Level of difficulty: Medium
Reading time: 5 min
Views: 177

Stream-first HTTP Client for Go. Efficient, zero-buffer streaming for large HTTP payloads — built on top of net/http.

httpstream provides a minimal, streaming-oriented API for building HTTP requests without buffering entire payloads in memory. Ideal for large JSON bodies, multipart uploads, generated archives, or continuous data feeds.

- Stream data directly via io.Pipe—no intermediate buffers

- Constant memory usage (O(1)), regardless of payload size

- Natural backpressure (writes block when receiver is slow)

- Thin net/http wrapper—fully compatible

- Middleware support: func(http.RoundTripper) http.RoundTripper

- Fluent API for readability (GET, POST, Multipart, etc.)

- No goroutine leaks, no globals

httpstream connects your writer directly to the HTTP transport. Data is transmitted as it's produced, allowing the server to start processing immediately—without waiting for the full body to be buffered.
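
The io.Pipe technique named in the list above can be sketched with the standard library alone. This is not the httpstream API, which the teaser does not show; the names streamJSON and countingServer, the record shape, and the content type are assumptions made for illustration. The sketch streams JSON records into a POST body while the transport reads them, so the full payload is never held in memory at once.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// streamJSON POSTs n small JSON records to url without building the whole
// body in memory. The name and the record shape are hypothetical.
func streamJSON(client *http.Client, url string, n int) (string, error) {
	pr, pw := io.Pipe()

	// Producer: each Write into the pipe blocks until the HTTP transport
	// reads it, which is where the backpressure comes from.
	go func() {
		enc := json.NewEncoder(pw)
		for i := 0; i < n; i++ {
			if err := enc.Encode(map[string]int{"id": i}); err != nil {
				pw.CloseWithError(err) // surface the error to the reader side
				return
			}
		}
		pw.Close() // signals end of body (EOF) to the transport
	}()

	// The body length is unknown, so net/http sends it chunked.
	req, err := http.NewRequest(http.MethodPost, url, pr)
	if err != nil {
		pr.Close() // unblock the producer goroutine
		return "", err
	}
	req.Header.Set("Content-Type", "application/x-ndjson")

	resp, err := client.Do(req)
	if err != nil {
		pr.Close() // make sure the producer cannot stay blocked
		return "", err
	}
	defer resp.Body.Close()

	out, err := io.ReadAll(resp.Body)
	return string(out), err
}

// countingServer is a local test server that decodes records one at a time
// as they arrive and reports how many it saw.
func countingServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		dec := json.NewDecoder(r.Body)
		count := 0
		for {
			var rec map[string]int
			if err := dec.Decode(&rec); err != nil {
				break // io.EOF marks the end of the stream
			}
			count++
		}
		fmt.Fprintf(w, "received %d records", count)
	}))
}

func main() {
	srv := countingServer()
	defer srv.Close()

	out, err := streamJSON(srv.Client(), srv.URL, 1000)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints: received 1000 records
}
```

Closing the read side of the pipe on every error path is what keeps the producer goroutine from leaking: once the reader is closed, the next write fails and the goroutine returns.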

Read more

The LLM's Narrative Engine: A Critique of Prompting

Level of difficulty: Easy
Reading time: 8 min
Views: 107

In a previous article, I proposed the holographic hypothesis: an LLM isn't a database of facts, but an interference field—a landscape of probabilities shaped by billions of texts. But a static landscape is just potential. How does the model actually move through it? How does it choose one specific answer from infinite possibilities?

This is where the Narrative Engine comes in. If the holographic hypothesis describes the structure of an LLM's "mind," the narrative engine hypothesis describes its dynamics. It is the mechanism that drives the model, forcing its probabilistic calculations to follow the coherent pathways of stories. This article critiques modern prompting techniques through this new lens, arguing that we are not programming a machine, but initiating a narrative.

Read more

Comparison: StarRocks vs Apache Druid

Level of difficulty: Easy
Reading time: 5 min
Views: 145

Apache Druid has been a staple for real-time analytics. However, with evolving and sophisticated analytics demands, it has faced challenges in satisfying modern data performance needs. Enter StarRocks, a high-performance, open-source analytical database, designed to adeptly meet the advanced analytics needs of contemporary enterprises by offering robust capabilities and performance.

In this article, we’ll explore the functionalities, strengths, and challenges of both Apache Druid and StarRocks. Using practical examples and benchmark results, we aim to guide you in identifying which database might best meet your data needs.

Read more

LLM as a Resonance-Holographic Field of Meanings

Level of difficulty: Easy
Reading time: 14 min
Views: 472

Alright. I pose the same question to an LLM in various forms. And this statistical answer generator, this archive of human knowledge, provides responses that sometimes seem surprisingly novel, and other times, derivative and banal.

On Habr, you'll find arguments that an LLM is incapable of novelty and creativity. And I'm inclined to agree.
You'll also find claims that it shows sparks of a new mind. And, paradoxically, I'm inclined to agree with that, too.

The problem is that we often try to analyze an LLM as a standalone object, without fully grasping what it is at its core. This article posits that the crucial question isn't what an LLM knows or can do, but what it fundamentally is.

Read more

How we boosted SQL query accuracy by 33% with LLMs

Level of difficulty: Medium
Reading time: 8 min
Views: 424

Traditional approaches to SQL query generation often rely on instruction-tuned language models, but these can be inefficient and inaccurate. In this article, we’ll explore a new method based on reinforcement learning for model fine-tuning, which can improve both the accuracy and efficiency of SQL generation.

Read more

OAuth 2.0 authorization in PostgreSQL using Keycloak as an example

Level of difficulty: Easy
Reading time: 27 min
Views: 643

Hello, Habr! We continue the series of articles on the innovations of the Tantor Postgres 17.5.0 DBMS, and today we will talk about authorization support via the OAuth 2.0 Device Authorization Flow. This is a modern and secure access method that allows applications to request access to PostgreSQL on behalf of the user through an external identity and access management provider, such as Keycloak, which is especially convenient for cloud environments and microservice architectures (the feature will also be available in PostgreSQL 18). In this article, we'll take a step-by-step look at configuring OAuth authorization in PostgreSQL using Keycloak: configuring Keycloak, preparing PostgreSQL, writing an OAuth token validator in PostgreSQL, and verifying successful authorization via psql using Device Flow.

Read more

Exposed: Custom column types

Level of difficulty: Easy
Reading time: 8 min
Views: 364

Exposed is an SQL library for Kotlin with DSL and DAO APIs for database interactions. While it comes with support for standard SQL data types, you can extend its functionality by creating custom column types.

Custom column types are useful when Exposed lacks support for specific database types (like PostgreSQL's enum, inet or ltree) or when you want to map columns to domain-specific types that better align with your business logic. By implementing custom columns, you gain control over data storage and retrieval while maintaining type safety.

In this article, we'll explore how to create custom column types in Exposed by creating a simple column type for PostgreSQL's enum.

Read more

4 best tips for building high-quality data products from SYNQ

Level of difficulty: Easy
Reading time: 6 min
Views: 344

The “test everything” principle doesn’t improve data quality — it destroys it. Hundreds of useless alerts create noise that drowns out truly important signals, and the team stops responding to them. Google and Monzo have already moved away from this approach.

Here’s how to shift from blanket testing to targeted checks at nodes with the greatest impact radius — and why one well-placed test at the source is worth more than a hundred checks downstream.

Read more

Privacy on Mobile: a practitioner’s checklist

Level of difficulty: Medium
Reading time: 13 min
Views: 5.7K

People have always valued privacy. Developments of the past decades (the internet, social networks, targeted advertising) turned data into an asset. The AI wave multiplies what can be inferred from crumbs. Phones and apps are integral to people’s lives. Some users keep everything on their phones; others are more restrictive. Privacy shouldn’t rely only on user awareness: developers should provide the first line of defence and the tools that protect a user’s right to privacy. Even if you already deal with most of these pieces daily, I want to share my mental model: how I frame decisions with checklists and a few concrete examples from practice.

Read more

Shardman. A quick guide for the architect

Reading time: 22 min
Views: 730

The myth of the magical fast=true parameter is still alive and well, but in distributed databases, another contender appears: distributed=true. Neither one will save you if you don’t rethink your schema, sharding keys, sequences, queries, and migration process. We walk through every corner with a clear-eyed approach — from choosing sharding keys and colocated tables to CDC, topologies, and foreign key constraints — showing where performance really improves, where it gets more expensive, and how to deal with it.

Read more

AI slop coding, or How to build ridiculously long attack chains with AI

Level of difficulty: Easy
Reading time: 7 min
Views: 776

While researching malware used by attacker groups, we came across a series of unusual attacks that used GitHub repositories to store malicious files and victim data. These campaigns appear targeted rather than large-scale, and it seems the attackers relied heavily on AI during development. The earliest activity we traced was in September 2024, and the most recent in April 2025.

Our Threat Intelligence team investigates complex attacks featuring novel persistence and data collection methods and unique infrastructures. Sometimes we find simple two-line scripts, and other times we run into "bombs" that trigger dozens of different payloads at once. But it's pretty rare for us to come across such long chains of really simple AI-written scripts that still work, tied together in a way that clearly wasn't random. Think of this as an APT-style attack implemented at the "script kiddie" level (a derogatory term in hacker culture for those who rely on scripts or programs written by others).

Read more

How to successfully migrate from Oracle to Postgres Pro Enterprise

Level of difficulty: Medium
Reading time: 8 min
Views: 617

Migration from Oracle to vanilla PostgreSQL hits roadblocks with packages, autonomous transactions, and collections—they simply don’t exist there. We’ll break down why ora2pg stumbles, how native implementations of these mechanisms in Postgres Pro Enterprise make life easier, and how ora2pgpro translates PL/SQL semantically correctly, without hacks or crude regex.

Read more

Give Your AI Agent Sight: Integrating Chrome DevTools with MCP

Level of difficulty: Easy
Reading time: 3 min
Views: 1.2K

Hey everyone! I’m excited to share something that’s a real game-changer for anyone who writes code for the web. I’m talking about the new Chrome DevTools Model Context Protocol (MCP) server. If you want to know more details, read the article until the end.

Read more

Quitting the Samurai Path: How EXANTE Is Changing Its Infrastructure, or How We Failed at Going Cloud Native

Level of difficulty: Easy
Reading time: 5 min
Views: 682

From hype to strategy: how EXANTE redefined Cloud Native after painful Kubernetes mistakes, what lessons it learned, and how it built a more resilient infrastructure.

Read more