Meta-Attention Is All You Need

What if an LLM could observe its own hidden states? Meta-transformers use cognitive tokens and meta-attention to turn internal uncertainty into usable signal.

The basis of artificial intelligence

Developers of smart cameras, smart DVRs, and neural-network video analytics for surveillance systems need AI models capable of operating in real-world street conditions. Out there, nobody walks around with professional cameras, carefully adjusts angles, sets up lighting, records without compression, or follows the common sense taught in cinematography textbooks.
Of course, Gambit can be used for many other tasks, but its main focus is the convenient collection of material FROM video surveillance systems and dataset annotation specifically FOR video surveillance neural networks.
Gambit is not designed for polished photos and Internet reels. Quite the opposite — it is intended for low-quality surveillance archive footage. At SpesLab, we call this kind of content “wild.”

In the previous article I described my “anime factory” in detail — a pipeline that automatically turns episodes into finished Shorts. But inside that system there is one especially important module that deserves a separate deep dive: a virtual camera for automatic reframing.
In this article, I will break down not just an “auto-crop function,” but a full virtual camera algorithm for vertical video. This is exactly the kind of task that looks simple at first glance: you have a horizontal video, you need to turn it into 9:16, keep a person in frame, and avoid making the result look like a jittery autofocus camera from the early 2010s.
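To make the task concrete, here is a minimal sketch of the naive version: a 9:16 window that follows a tracked person with exponential smoothing. It is not the author's algorithm. The per-frame subject positions (`subject_xs`), the function names, and the smoothing factor `alpha` are all assumptions made for illustration.

```python
def crop_width_9x16(frame_h: int) -> int:
    """Width in pixels of a 9:16 window that spans the full frame height."""
    return frame_h * 9 // 16


def smooth_crop_lefts(subject_xs, frame_w, frame_h, alpha=0.15):
    """Return the left edge of the 9:16 crop window for every frame.

    subject_xs: horizontal centre of the tracked person per frame
                (hypothetical output of some detector or tracker).
    alpha:      exponential smoothing factor, an assumed value;
                smaller means a slower, steadier camera.
    """
    crop_w = crop_width_9x16(frame_h)
    half = crop_w / 2
    lefts = []
    center = None
    for x in subject_xs:
        # Exponential moving average damps frame-to-frame jitter.
        center = x if center is None else center + alpha * (x - center)
        # Keep the window inside the source frame.
        clamped = min(max(center, half), frame_w - half)
        lefts.append(int(clamped - half))
    return lefts


if __name__ == "__main__":
    # A detector that flickers between two positions 40 px apart.
    noisy = [960, 1000, 960, 1000, 960, 1000]
    print(smooth_crop_lefts(noisy, frame_w=1920, frame_h=1080))
```

A single smoothing constant trades jitter against lag: the steadier the window, the later it reacts when the subject really moves.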
But as soon as you try to build it not for a demo, but for a real pipeline, engineering problems immediately show up:

Hi, Habr!
Over the past few months, I have been building a system that I internally call an “anime factory”: it takes a source episode as input and produces a ready-to-publish YouTube Short with dynamic reframing, subtitles, post-processing, and metadata.
What makes it interesting is not just the fact that editing can be automated, but that a significant part of this work can be decomposed into engineering stages: transcription, audio and scene analysis, strong-moment discovery, “virtual camera” control, and a feedback loop based on performance metrics.
In this article, I will show how this pipeline is structured, why I chose a modular architecture instead of an end-to-end black box, where the system broke, and which decisions eventually made it actually usable.

After the unexpected divorce between LeCun and Meta, there is a lot of talk that the dead-end in LLM progress will be overcome through the physics of the world. That is, having a neural network work with physical data from the surrounding environment will allow the model to acquire meaning and an understanding of its actions. LeCun has a foundational paper that nobody is going to read. So, I'll summarize it as best I can. Essentially, the idea is that the current trajectory of LLM development is doomed. As long as they are predicting the next token, real understanding — the emergence of real meaning — is impossible. LeCun proposes training neural networks on physical world data, assuming that building a model of it will allow the system to discard details and focus on meaning.
I agree with LeCun that using world data will partially solve the data scarcity problem. But here I see a problem that engineers might not understand. A physical model of the world is actually much poorer than human knowledge. Newton described the entire infinite number of possible falls with a few lines of formulas. I doubt LeCun wants to spend billions of dollars on this wonderful deduction.

LLMs fail at elementary math. Corporations spend billions, but ultimately are forced to attach calculators to computing machines of incredible power. All attempts to fix this via Chain-of-Thought, fine-tuning on arithmetic tasks, or context expansion have failed.
I conducted a series of experiments to understand why, and came to the conclusion that neural networks are simply not meant for discrete arithmetic. Their true purpose is continuous transformations.
This article describes the implementation of a novel neural network architecture that combines the precision of symbolic AI with the generalization capabilities of LLMs. As always, experiments and code are included.

November 20 marked the official launch of Nano Banana Pro (Gemini-3-Pro-Image-Preview), with the powerful Gemini 3 Pro as its foundation. This is a more mature tool for design, infographics, and content. We will not only look at the new features and why this particular model is a breakthrough, but also see it in action with real examples.

In this article, we will not only install a local (and free) alternative to ChatGPT, but also review several open LLMs, delve into the advanced settings of LM Studio, connect the chatbot to Visual Studio Code, and teach it to assist us with programming. We will also look at how to fine-tune the model's behavior using system prompts.

On November 18, 2025, Google introduced a new combination: the Gemini 3 Pro model and the Google Antigravity IDE. The first is about controlled reasoning, long context, and multimodality. The second is about multi-agent development with artifacts and "transparent" steps. Headlines immediately flooded the feeds: "Cursor is dead."
In this article, we break down what exactly Google has launched, why the words "the smartest model" are an exaggeration, how Antigravity differs from Cursor, which development scenarios are already changing, and where it's still too early to abandon your familiar stack.

Now you won't have to blush in front of your employer or teacher because an AI wrote the text for you. Just make the detectors believe that the author is a human. In this article, we'll tell you how to do it and what the trick is.

In 2025, sexting has become a real trend thanks to sexting neural networks and convenient platforms that make online intimate messaging safe and exciting. With the development of artificial intelligence, online sexting has turned into an art where everyone can enjoy virtual flirting without risk. I tested dozens of services and selected the TOP 10 bots and apps for sexting in Russian, evaluating them based on convenience, anonymity, and the quality of sexual correspondence. These sexting services offer everything: from anonymous sexting to virtual sex chat with self-destructing photos. Let's figure out which sexting chatbots and platforms are worthy of your attention and how they work.

In the fall of 2025, many people, myself included, opened their favorite hardware store to 'quickly grab another 32–64 GB of DDR5 for games, an IDE, and a couple of Docker containers'—only to close the tab in mild culture shock. The memory that cost a 'reasonable' amount in the summer suddenly cost almost as much as a mid-range graphics card.
In short, the cause isn't 'greedy stores' but a rather complex restructuring of the entire DRAM market around AI servers and HBM memory. In this article, we'll explore what's happening at memory factories, why PC modules are suffering the most, what to expect in 2026, and how to make upgrade decisions if you're a gamer, developer, or just a hardware enthusiast.


In my previous Russian-language article on Machine Learning as Alchemy, I discussed the possibility of discovering novel solutions without relying on GPUs or expensive computing clusters. In this article, I will share my experiments with continual learning and the compositionality of thought using micro-neural networks, and explain what the philosopher Lev Vygotsky has to do with it all.

Image generation by neural networks has become a 'regular button' alongside familiar design tools. Today, you can create an image from a description in Russian, right in your browser, often without registration and, importantly, for free. Such a 'free image generator' is useful not only for designers: entrepreneurs create product cards and hero banners, SMM specialists create ad creatives and stories, journalists and bloggers create illustrations for their materials, and developers create interface prototypes and game mockups.
Why has this topic become so popular?

In modern neural networks, including Transformer-based LLMs, unbounded activation functions—ReLU and GELU—have become the standard. Their main advantage is good gradient flow and the rapid training of deep models.
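As a small illustration of what "unbounded" means here, the sketch below evaluates ReLU and the exact (erf-based) GELU on growing inputs. The comparison with tanh is an assumption added for contrast; the text above does not name a bounded alternative.

```python
import math


def relu(x: float) -> float:
    """ReLU: passes positive inputs through unchanged, so it has no upper bound."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def bounded(x: float) -> float:
    """tanh, used here only as an assumed example of a saturating activation."""
    return math.tanh(x)


if __name__ == "__main__":
    for x in (1.0, 10.0, 100.0):
        # ReLU and GELU grow with the input; tanh never exceeds 1.
        print(x, relu(x), gelu(x), bounded(x))
```

For large positive inputs both ReLU and GELU return roughly the input itself, so one dominant pre-activation passes to the next layer at full magnitude, while a saturating function caps it.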
However, in practice, a problem is observed: when dominant patterns or high-frequency noise appear in the input context (long dialogues, noisy data, repetitive or dominant tokens), models become unstable and prone to generation degradation and hallucinations.
In this article, I attempted to find out if the choice of activation function could be fundamentally linked to LLM hallucinations.

In my previous article, I noted some interesting behavior regarding Weight Decay; here, I examine it in detail.
It is generally accepted in the ML industry that if we take a pre-trained model and fine-tune it on a new task, the old weights are gradually overwritten. Furthermore, if we add Weight Decay (L2 regularization), the process of "forgetting" superfluous information should theoretically happen even faster.
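The conventional intuition can be sketched with a plain SGD step that includes an L2 penalty. This is a toy illustration of that textbook expectation, not the experiment described in the article; the function name, learning rate, and decay coefficient are assumptions.

```python
def sgd_step(weights, grads, lr=0.1, weight_decay=0.0):
    """One SGD update with L2 regularization folded into the gradient.

    Update rule: w <- w - lr * (g + weight_decay * w)
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


def decay_only(weights, steps, lr=0.1, weight_decay=0.5):
    """Apply repeated updates with zero task gradient, so only the decay acts."""
    for _ in range(steps):
        weights = sgd_step(weights, [0.0] * len(weights), lr, weight_decay)
    return weights


if __name__ == "__main__":
    # With no task gradient, every step scales each weight by (1 - lr * weight_decay).
    print(decay_only([1.0, -2.0], steps=10))
```

Under this rule, any weight that the new task's gradient does not actively maintain shrinks geometrically toward zero, which is why weight decay is usually expected to speed up forgetting.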
I tested this claim experimentally. The results were counter-intuitive: under specific settings, Weight Decay works in the exact opposite way—it protects the old structure from destruction.
Below is a description of the experiment and conclusions for those involved in model training and AI safety.

In my previous article, I explored the phenomenon of subliminal learning, but it raised more questions than answers. It is time to dive deeper. Below, you will find the experiments and the code.
In the fields of AI Alignment and LLM Security, a critical question remains: does fine-tuning or Reinforcement Learning from Human Feedback (RLHF) guarantee the removal of unwanted information?
Spoiler: The experiments demonstrated that the well-known Mode Connectivity effect makes the complete erasure of pre-training information practically impossible during standard fine-tuning. Structural Imprinting persists in the weight topology and can be read through a subliminal channel. Even with full weight unfreezing and aggressive L2 regularization (active forgetting), the latent space topology formed during the pre-training stage persists and determines the solution to the new task with an accuracy of 88–99%.

2025. Algorithms have seamlessly integrated into our lives—from work to education, creativity, and daily routines. They edit texts, select fonts, generate ideas, assist with coding, compose music, and more. Frankly speaking, the only thing they can’t do yet is brew your coffee. Although... that might just be a matter of time.
Just two years ago, we were amazed by neural networks hesitantly manipulating objects in photos. Who could have predicted back then that Will Smith’s spaghetti feast would mark the beginning of such a revolution?
With new opportunities come fresh challenges. How do you navigate this vast landscape? What tools are truly effective? Which ones fit your needs best? Where can you avoid paying, registering, or deciphering complex interfaces?
We’ve compiled a list of reliable and user-friendly neural networks ready for immediate use without unnecessary hassles. The services are categorized neatly: text generation, image creation, video production, music composition, presentations, and much more. Each category showcases three top-rated options!
Yes, many services offer paid subscriptions. But today, we're focusing solely on what works for free, no credit card required!

In my previous article, I showed how researchers confused being 'aware' (signal registration) with being 'conscious' (subjective awareness). But this is no accident — it is part of a narrative being constructed by AI labs. Anthropic is leading this trend. Let’s break down their latest paper, where a "learned pattern" has suddenly turned into "malicious intent."