melanny20 Apr 28 at 11:00

By next year, we'll be talking to databases in natural language

Translation

Original author: https://habr.com/ru/users/Safreliy/

The future of database interaction

According to Gartner, natural language queries will replace SQL as early as 2026.

Gartner's timeline may be optimistic, but the direction is clear: databases are moving toward natural language interfaces, even if the exact date slips.

Companies are already taking steps in this direction. Oracle has developed the APEX AI Assistant, which interactively generates and executes SQL queries. Meanwhile, Hugging Face now offers tools that allow developers to explore datasets using natural language queries converted into SQL.

The use of LLMs to generate SQL and simplify database interactions is a logical step in the evolution of database management systems. The real question is: how soon will the industry fully embrace this transformation?

Training LLMs to write SQL

At PGProDay 2025, I presented our approach to generating SQL queries from natural language user inputs.

The advantages of LLMs

Not only data analysts and engineers, but also non-technical users — such as marketers, finance professionals, and managers — want easy access to data without needing to know SQL. With LLMs capable of reasoning, the transition to natural language interactions with databases is only a matter of time.

SQL is a structured, declarative language that is well suited to LLM processing. Its predictable structure and templated tasks make it an ideal candidate for democratized data access. Eventually, business users will be able to query databases in natural language without needing to learn SQL syntax.

For example, in a logistics company, queries like “Find flights delayed by more than 2 hours” or “Calculate the average fleet utilization” can be seamlessly translated into SQL statements using SELECT and JOIN.
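To make the first example concrete, here is a minimal sketch of the SQL that "Find flights delayed by more than 2 hours" might be translated into. The flights and carriers tables, their columns, and the sample rows are assumptions invented for illustration; the article does not describe a real schema. Python's built-in sqlite3 module is used only so the query can be run as-is.

```python
import sqlite3

# Hypothetical schema and data: table names, column names, and rows are
# illustrative assumptions, not taken from the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE carriers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE flights (
    id INTEGER PRIMARY KEY,
    carrier_id INTEGER NOT NULL REFERENCES carriers(id),
    scheduled_departure TEXT NOT NULL,
    actual_departure TEXT NOT NULL
);
INSERT INTO carriers VALUES (1, 'NorthAir'), (2, 'CargoJet');
INSERT INTO flights VALUES
    (101, 1, '2025-04-01 08:00:00', '2025-04-01 08:20:00'),
    (102, 1, '2025-04-01 09:00:00', '2025-04-01 12:30:00'),
    (103, 2, '2025-04-01 10:00:00', '2025-04-01 12:05:00');
""")

# One possible translation of "Find flights delayed by more than 2 hours".
# The delay is computed in whole minutes from Unix timestamps.
DELAYED_FLIGHTS_SQL = """
WITH delays AS (
    SELECT f.id AS flight_id,
           c.name AS carrier,
           (CAST(strftime('%s', f.actual_departure) AS INTEGER)
            - CAST(strftime('%s', f.scheduled_departure) AS INTEGER)) / 60
               AS delay_minutes
    FROM flights AS f
    JOIN carriers AS c ON c.id = f.carrier_id
)
SELECT flight_id, carrier, delay_minutes
FROM delays
WHERE delay_minutes > 120
ORDER BY flight_id
"""

delayed = conn.execute(DELAYED_FLIGHTS_SQL).fetchall()
for flight_id, carrier, delay_minutes in delayed:
    print(flight_id, carrier, delay_minutes)
```

With the sample rows above, flights 102 and 103 are returned and flight 101 is filtered out. A real model would have to choose among several equally valid formulations of the same question, which is where schema knowledge starts to matter.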

A key advantage of specialized LLMs is domain-specific customization. Models trained on specific database schemas and business glossaries outperform generic LLMs. For instance, if one database column labeled revenue includes returned items while another does not, a domain-trained LLM will recognize and handle this discrepancy correctly. It can also interpret complex metrics such as “kilometer-hours of vehicle operation” or “repeat purchase conversion rates.”

The evolving role of data engineers

The role of data engineers will shift from writing SQL queries to managing metadata, prompt engineering, and training models. Instead of manually building ETL pipelines, engineers will leverage LLM agents to refine and optimize solutions within existing ecosystems.

However, optimized SQL queries will still be necessary for high-load systems and complex analytical reports. Students today should focus on understanding ontologies, data models, and prompt engineering while maintaining a solid foundation in database fundamentals.

The importance of metadata

Gartner emphasizes the critical role of metadata collection and structuring before implementing AI-driven database management. Metadata provides a semantic framework that enables LLMs to process queries with contextual awareness.

This aligns with the concept of Fluid Data, where data structures dynamically adapt to changing environments. Key metadata components include:

Field types, table relationships, and integrity constraints
Business logic rules (e.g., revenue = sales – returns + discounts)
Data lineage tracking (e.g., temperature_raw → noise filtering → temperature_clean)
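
As a rough illustration of how these three kinds of metadata could be handed to a model, the sketch below stores them in small Python dataclasses and renders them as a text block to prepend to a prompt. The class names, the render_context function, and the example tables are all hypothetical; this is one possible layout under those assumptions, not the format used by any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMeta:
    name: str
    sql_type: str
    description: str = ""
    lineage: list[str] = field(default_factory=list)  # processing steps, oldest first


@dataclass
class TableMeta:
    name: str
    columns: list[ColumnMeta]
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> "table.column"
    business_rules: list[str] = field(default_factory=list)


def render_context(tables: list[TableMeta]) -> str:
    """Render the catalog as plain text suitable for inclusion in an LLM prompt."""
    lines = []
    for table in tables:
        lines.append(f"TABLE {table.name}")
        for col in table.columns:
            line = f"  {col.name} {col.sql_type}"
            if col.description:
                line += f" -- {col.description}"
            lines.append(line)
            if col.lineage:
                lines.append("    lineage: " + " -> ".join(col.lineage))
        for column, target in table.foreign_keys.items():
            lines.append(f"  FK {column} -> {target}")
        for rule in table.business_rules:
            lines.append(f"  RULE {rule}")
    return "\n".join(lines)


# Hypothetical catalog built from the examples in the list above.
catalog = [
    TableMeta(
        name="sales",
        columns=[
            ColumnMeta("revenue", "NUMERIC", "net of returns"),
            ColumnMeta("customer_id", "INTEGER"),
        ],
        foreign_keys={"customer_id": "customers.id"},
        business_rules=["revenue = sales - returns + discounts"],
    ),
    TableMeta(
        name="sensor_readings",
        columns=[
            ColumnMeta(
                "temperature_clean",
                "REAL",
                lineage=["temperature_raw", "noise filtering", "temperature_clean"],
            )
        ],
    ),
]

context = render_context(catalog)
print(context)
```

The point of the design is that the model never has to guess what revenue means: the rule and the lineage travel with the schema in every request.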

Fluid data

Fluid Data simplifies the transition between relational, graph, hierarchical, and document-based databases using LLMs as translators. This capability is particularly valuable for sectors like banking, where detecting fraudulent transactions requires shifting from relational to graph-based analysis. Fraudsters often move money through a network of intermediaries, which is difficult to detect in a relational model but straightforward in a graph database.
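
The sketch below illustrates why the graph view helps here. Once transfers are treated as edges, following money through intermediaries is a plain breadth-first search, whereas in SQL the same question needs a recursive query or one self-join per hop. The account names and amounts are invented, and the code is a simplified illustration rather than a fraud-detection method.

```python
from collections import defaultdict, deque

# Hypothetical transfers as (sender, receiver, amount); in a relational
# model each tuple would be one row of a transfers table.
transfers = [
    ("acct_A", "acct_B", 9500),
    ("acct_B", "acct_C", 9400),
    ("acct_C", "acct_D", 9300),
    ("acct_A", "acct_E", 120),
    ("acct_E", "acct_F", 80),
]


def build_graph(rows):
    """Turn transfer rows into an adjacency list: sender -> list of receivers."""
    graph = defaultdict(list)
    for sender, receiver, _amount in rows:
        graph[sender].append(receiver)
    return graph


def find_chain(graph, source, target):
    """Breadth-first search for the shortest chain of accounts from source to target.

    Returns the list of accounts on the chain, or None if no chain exists.
    """
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(transfers)
chain = find_chain(graph, "acct_A", "acct_D")
print(chain)
```

Here the search recovers the chain from acct_A to acct_D through two intermediaries, regardless of how many hops separate them.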

From relational to graph databases

While transitioning from relational to graph databases is complex, it is feasible. LLMs can automate this process by translating natural language queries into Cypher, a query language for graph databases such as Neo4j.

Additionally, LLMs can automate ETL/ELT processes, analyze data sources, and suggest optimal data pipelines. They can also assist with schema migrations, making database adaptation to new business needs more seamless.

Adapting database schemas to new data

Consider a steel manufacturing plant implementing predictive analytics for monitoring blast furnaces. New IoT sensors collect additional parameters:

vibration_spectrum: JSON data representing vibration frequency, e.g., {"10Hz": 0.5, "20Hz": 0.8}
electrode_wear_rate: electrode wear rate in % per hour

Previously, the database only stored basic metrics such as temperature, pressure, and output_volume. ETL pipelines loaded data into a furnace_health table but did not account for the new parameters.

Engineers need reports to answer:

How does vibration spectrum correlate with electrode wear?
When should furnace maintenance be scheduled?

Manually updating schemas, transforming ETL processes, and generating reports would be time-consuming and costly. This is where LLMs shine — they can instantly detect schema mismatches, rewrite pipelines, and generate reports without human intervention.
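
The mismatch-detection step on its own can be sketched without any model involved. The example below compares an incoming sensor record against the furnace_health table and proposes the missing columns. It uses SQLite purely so that it runs anywhere; the helper names, the sample values, and the type mapping are assumptions made for illustration, and an LLM-based agent would presumably add naming, typing, and pipeline decisions on top of a check like this.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# The table as it existed before the new sensors were installed.
conn.execute(
    "CREATE TABLE furnace_health (temperature REAL, pressure REAL, output_volume REAL)"
)

# Hypothetical incoming record carrying the two new parameters from the text.
record = {
    "temperature": 1510.0,
    "pressure": 2.4,
    "output_volume": 310.0,
    "vibration_spectrum": {"10Hz": 0.5, "20Hz": 0.8},
    "electrode_wear_rate": 0.12,
}


def sql_type_for(value):
    """Very rough Python-to-SQL type mapping (an assumption of this sketch)."""
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "REAL"
    return "TEXT"  # dicts and lists are stored as JSON text


def existing_columns(conn, table):
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def missing_column_ddl(conn, table, record):
    """Return ALTER TABLE statements for record keys the table does not have yet."""
    known = existing_columns(conn, table)
    # Table and column names are interpolated directly, so they must come
    # from a trusted source, never from end-user input.
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {sql_type_for(value)}"
        for name, value in record.items()
        if name not in known
    ]


ddl = missing_column_ddl(conn, "furnace_health", record)
for statement in ddl:
    print(statement)
    conn.execute(statement)

# After the migration the full record can be stored; nested data goes in as JSON.
row = {
    key: json.dumps(value) if isinstance(value, (dict, list)) else value
    for key, value in record.items()
}
placeholders = ", ".join("?" for _ in row)
conn.execute(
    f"INSERT INTO furnace_health ({', '.join(row)}) VALUES ({placeholders})",
    list(row.values()),
)
```

In practice the proposed statements would likely be reviewed before being applied, especially on production tables.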

Autonomous databases are on the horizon

Data is becoming interactive at the semantic rather than syntactic level. This paves the way for fully autonomous databases, where LLMs handle everything from raw data ingestion to analytical report generation.

Business leaders should start preparing now by:

Cataloging metadata and implementing Data Catalogs
Testing NLP-driven database interfaces in low-risk scenarios

Ignoring this trend will leave organizations behind, stuck with rigid data systems. Meanwhile, competitors will be making real-time decisions using chat-based data interfaces.
