
The students of the Intelligent Systems Department have successfully defended their bachelor’s and master’s theses. This year, 14 Bachelor’s and 8 Master’s students earned their degrees in Physics, Mathematics, and Computer Sciences. We are proud that our Department is unique in having published the complete set of defense materials for the last ten years. These materials include the thesis text, the published papers, the code of the computational experiments, and the slides and video of the defense talk.
We encourage our students to publish the results of their scientific research in peer-reviewed journals. In 2024, our students published 53 papers. An exemplary thesis takes the form of a formatted scientific paper for a BS student and several papers for an MS student. This ensures that the results of the student's research are critically reviewed and approved by the scientific community.
In 2025, AI assistance became a challenge for thesis defences. The problem is that AI assistants intrude on the text of student work. We omit many unnecessary formalities of text preparation, such as a required number of pages or a list of performance criteria. Instead, we review the focus of the work: the student's personal results must contribute to the theory of machine learning. In this light, AI assistance expands the students' experience rather than invading their texts.
In this post, we gladly summarize the defended works of our BS and MS students and highlight the results. Recordings of their pre-defence presentations can be found here and here in Russian. Most of the theses have a publicly available English version.
We motivate our students to devote the main part of their efforts to the development of the theory of Machine Learning. However, the topics of Optimization and Applied Data Science Research are also on the agenda.
Applied methods in machine learning
Research in applied machine learning methods is one of the most popular topics among our students, as it delivers the fastest contribution to our lives.
Galina Boeva’s master's thesis, supervised by our alumnus Dr. Alexey Zaytsev, introduces LANET — a model for predicting timestamp sets using attention and historical data aggregation. Its key innovation is modeling label relationships, supported by theoretical analysis and attention graph visualizations. The work resulted in a publication at the ECAI conference.

The master's thesis of Kseniia Petrushina, supervised by Dr. Alexander Panchenko, addresses the challenge of detecting images that appear realistic but defy common sense, such as a man sleeping on a rock or a snowplow driving through sand. It introduces two methods: one detects logical contradictions between atomic facts describing the image (NLI), and the other uses the hidden representations of LVLMs (Linear Probing). Predictions from the NLI-based method correlate with the presence of hallucinations in the generated facts. The methods are compared against the Through the Looking Glass (TLG) approach, which learns the importance of each fact and achieves the highest accuracy in identifying strange images, with Linear Probing a close second.

German Gritsay’s master's thesis, under the supervision of Dr. Andrey Grabovoy, considers the problem of machine-generated text detection. The thesis improves detector interpretability and handles diverse classification problems for detecting and analyzing AI-generated fragments. It proposes attention-based architectures, statistical analysis techniques, and multi-task learning methods that regularize feature representations and improve model generalization. The work evaluates these approaches on synthetic datasets and real-world benchmarks, including international competitions, demonstrating their practical value in multilingual and multidomain scenarios. It investigates approaches to detecting generated fragments at the document level, presenting token-level classification with segmentation algorithms for variable-length fragments. The other presented approach, multi-task learning (MTL), reduces model complexity, which is supported by theoretical analysis proving lower Rademacher complexity compared to single-task approaches. Empirical results confirm the ability of MTL to cluster textual representations in vector space, acting as an implicit regularizer and improving robustness across domains and generative models.

The master’s thesis of Bair Mikhailov, supervised by Dr. Dmitry Dylov, addresses the phylogenetic relationship between tomistomas, gavials, crocodiles, and alligators, which remains unresolved due to conflicting morphological and molecular evidence. The study introduces a machine learning framework to analyze brain endocasts derived from CT scans, aiming to resolve these evolutionary uncertainties. Brain endocasts were segmented from crocodilian cranial scans using a 2D U-Net architecture. The results demonstrate the potential of explainable deep learning to address phylogenetic controversies, offering a scalable, data-driven alternative to subjective morphological comparisons. This work bridges computational radiology and evolutionary biology, providing a template for quantitative neuroanatomical phenotyping in extinct and extant species.

Ildar Khabutdinov’s master's thesis, supervised by Dr. Andrey Grabovoy, comprises two studies on grammatical error correction using the Sequence Tagging approach. The first study adapts the GECToR model for Russian, addressing the lack of annotated data by creating a synthetic dataset, achieving strong results on synthetic data and demonstrating knowledge transfer to the RULEC test set without fine-tuning. The second study proposes a fully automated, annotation-free method that uses the Levenshtein algorithm to generate subword-level edits, which are language-agnostic and require no manual rules or dictionaries. Applied to the original GECToR model, it achieves competitive results in English on the CoNLL-2014 and BEA-2019 datasets. Together, these studies showcase the adaptability of Sequence Tagging models for both resource-rich and low-resource languages.
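As a rough illustration of the edit-generation idea (not the thesis implementation), the sketch below derives per-token edit operations from a source/corrected sentence pair. Here Python's `difflib` alignment stands in for the Levenshtein alignment, and the tag names (`KEEP`, `REPLACE`, `DELETE`, `APPEND`) are hypothetical:

```python
from difflib import SequenceMatcher

def edit_tags(source_tokens, target_tokens):
    """Derive per-token edit operations from an alignment of the
    source (errorful) and target (corrected) token sequences."""
    ops = []
    sm = SequenceMatcher(a=source_tokens, b=target_tokens)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops += [("KEEP", tok) for tok in source_tokens[i1:i2]]
        elif tag == "replace":
            # For simplicity, pair tokens positionally (zip truncates
            # when the replaced spans have different lengths).
            ops += [("REPLACE", s, t)
                    for s, t in zip(source_tokens[i1:i2],
                                    target_tokens[j1:j2])]
        elif tag == "delete":
            ops += [("DELETE", tok) for tok in source_tokens[i1:i2]]
        elif tag == "insert":
            ops += [("APPEND", tok) for tok in target_tokens[j1:j2]]
    return ops
```

A Sequence Tagging model is then trained to predict such operations per token, so correction becomes classification rather than generation.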

The bachelor’s thesis of Fedor Sobolevsky, under the supervision of Prof. Konstantin Vorontsov, examines LLMs applied to the hierarchical summarization task, which implies summarizing text as a text tree that goes from key points to more specific details. The task is formalized as text tree generation with the goal of minimizing the distance between the generated and the reference summaries. Since this necessitates a metric on the text tree space, a new metric, the text tree edit distance (TTED), is presented. To measure the informativeness of the metric in terms of highlighting significant aspects of text tree distance, a new metric quality factor is proposed, together with an unbiased estimate of the factor on random tree samples. The experimental evaluation of the TTED metric using the proposed quality factor shows a significant improvement in capturing semantic and structural differences of text trees compared to a previously used similarity score, so TTED can serve as a scoring metric for hierarchical summarization.

Arina Chumachenko’s master’s thesis, supervised by Prof. Ivan Oseledets, analyses text-to-image personalization: while methods like Textual Inversion and DreamBooth achieve high-fidelity concept generation, maintaining an optimal balance between identity preservation and prompt adherence remains an unresolved challenge. Her work advances context regularization by introducing a Gram-based context regularizer that improves pose diversity and generation flexibility in synthesized images. A two-stage training strategy incorporating losses from a non-finetuned U-Net model enhances generalization, while optimized context attention map regularization mitigates overfitting and artifacts. Experimental results demonstrate that these contributions collectively improve concept fidelity and textual alignment, enabling more robust and adaptable personalized image generation within diffusion-based frameworks.

Optimization
A number of this year’s theses are dedicated to various aspects of optimization methods — ranging from theoretical analysis to practical algorithm design. These works hold strong scientific value and lie at the intersection of optimization theory and foundational machine learning.
The bachelor's thesis of Alexey Rebrikov, supervised by Dr. Aleksandr Beznosikov, investigates the No Full Gradient SARAH algorithm—a variance-reduction method for stochastic optimization that avoids computing full gradients. Theoretical analysis is provided for both convex and non-convex settings, with convergence guarantees under standard smoothness assumptions. Experiments on image classification tasks show that the algorithm maintains competitive accuracy while reducing computational costs. This work advances efficient optimization techniques for large-scale machine learning.

The bachelor's theses of Fanis Khafizov and Vadim Kasiuk, supervised by Dr. Aleksandr Beznosikov, introduce ImpK, a novel family of importance-based compression operators for distributed learning that selects gradient coordinates based on their impact on the optimization objective rather than just magnitude or randomness. They propose multiple variants of ImpK alongside SCAM, a new error compensation mechanism that improves convergence by incorporating accumulated errors into gradient selection. Theoretical analysis establishes linear convergence rates, and experiments demonstrate superior performance compared to existing methods, especially when combined with SCAM. This work advances gradient compression by unifying existing approaches into a generalized framework with strong theoretical and empirical support.
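To give a flavor of compression with error compensation in distributed learning (a generic magnitude top-k sketch with error feedback, not the actual ImpK operators or the SCAM mechanism from the theses), consider:

```python
import numpy as np

def topk_compress(grad, k):
    """Keep the k largest-magnitude coordinates; zero out the rest."""
    idx = np.argsort(np.abs(grad))[-k:]
    out = np.zeros_like(grad)
    out[idx] = grad[idx]
    return out

class ErrorFeedback:
    """Accumulate the coordinates dropped by compression and add
    them back to the gradient on the next round, so no signal is
    permanently lost."""
    def __init__(self, dim):
        self.memory = np.zeros(dim)

    def step(self, grad, k):
        corrected = grad + self.memory        # reinject past error
        compressed = topk_compress(corrected, k)
        self.memory = corrected - compressed  # store what was dropped
        return compressed
```

Importance-based operators such as ImpK replace the pure-magnitude selection above with a score reflecting each coordinate's impact on the optimization objective.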

The bachelor's thesis of Denis Rubtsov, supervised by Prof. Alexander Gasnikov, focuses on the convergence of optimization algorithms. While classical stochastic optimization results typically bound the expected number of iterations needed to reach a target accuracy, Denis’s work investigates algorithms that ensure convergence with high probability—limiting the risk of large deviations from the minimum. The thesis develops efficient methods that provide such guarantees under various assumptions on the objective function. The idea of Denis’s thesis builds on the concept of robust distance estimation: given a set of points that are likely to lie near the optimum, it is possible to identify a central point that is very close to the true optimum with high probability. This approach enables the design of optimization algorithms with strong probabilistic convergence guarantees.
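The robust distance estimation idea can be sketched as follows (an illustrative simplification, not Denis's algorithm): among candidate points, pick the one whose median distance to the others is smallest. If most candidates lie near the optimum, the picked point is close to it with high probability, even when a few candidates are far off.

```python
import numpy as np

def robust_center(points):
    """Return the candidate point whose median distance to all
    candidates is smallest; outliers barely affect the median, so a
    majority of good candidates pins the choice near the optimum."""
    points = np.asarray(points)
    # Pairwise distance matrix, shape (n, n).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :],
                           axis=-1)
    med = np.median(dists, axis=1)
    return points[np.argmin(med)]
```

Running independent copies of a stochastic method and combining their outputs this way converts an in-expectation guarantee into a high-probability one.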

The bachelor's thesis of Egor Zadvornov, supervised by Dr. Artemii Malkov, investigates the problem of forecasting the distribution of topics in a media stream. It is shown that classical regression forecasting approaches are unstable to the occurrence of irregular cyclic changes with a variable period. The ARFilter method is proposed, which improves the accuracy of regression models by selecting a subset of dictionary words that correlate with the time distribution of the corresponding topics. For the selected words, local forecasts of their normalized shares are built, which are then aggregated into a forecast of the distribution of topics. On synthetic data, the method reduces the forecast error by ten times compared to the basic ARIMA approach. The developed approach does not depend on the choice of a specific autoregressive model and is effective for forecasting highly volatile topic distributions.
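The word-filtering step can be illustrated with a toy sketch (function names and the simple averaging aggregation are hypothetical; the actual ARFilter method builds local autoregressive forecasts on top of the selected words):

```python
import numpy as np

def select_correlated_words(word_shares, topic_share, threshold=0.8):
    """word_shares: array of shape (T, W), normalized word shares
    over T time steps; topic_share: length-T topic share series.
    Returns the indices of words whose series correlate with the
    topic's time distribution."""
    selected = []
    for w in range(word_shares.shape[1]):
        r = np.corrcoef(word_shares[:, w], topic_share)[0, 1]
        if abs(r) >= threshold:
            selected.append(w)
    return selected

def aggregate_forecast(word_forecasts):
    """Aggregate local per-word forecasts into one topic-share
    forecast (plain averaging here, as an illustrative choice)."""
    return float(np.mean(word_forecasts))
```

Filtering out words uncorrelated with the topic removes a large source of noise before any autoregressive model is fitted, which is why the approach is agnostic to the choice of that model.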
Machine learning fundamentals and computational mathematics
A large number of this year’s theses focused on fundamental machine learning and applied mathematics. These works explore core principles and address key challenges in modern machine learning and data science. Each thesis demonstrates strong research potential, offering valuable insights into representation learning, generalization, model interpretability, and the mathematical foundations of learning algorithms.
Eduard Vladimirov’s master’s thesis, supervised by Prof. Vadim Strijov, introduces CaSCA — a linear autoencoder that separates latent states into causal and reconstructive components by jointly optimizing reconstruction and predictive skill. Applied to sensor data, CaSCA reduces multicollinearity, preserves explained variance, and improves forecasting and classification accuracy. The method formalizes causal dimensionality reduction and proves the identifiability of causal drivers up to rotation.

In connection with the mentioned work, the bachelor’s thesis of Altay Eynullayev, supervised by Prof. Vadim Strijov, explores the use of covariance matrices of multidimensional time series to enhance forecasting accuracy. The approach leverages the Riemannian geometry of the space of symmetric positive definite (SPD) matrices to build models that incorporate covariance structure. Additionally, the thesis analyzes these matrices to identify conditions under which the method is most effective.
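One standard way to operate on covariance matrices while respecting SPD geometry (a textbook log-Euclidean sketch, not the thesis code) is to map them to a flat tangent space with the matrix logarithm, do the linear algebra there, and map back with the matrix exponential:

```python
import numpy as np

def spd_log(M):
    """Matrix logarithm of a symmetric positive definite matrix,
    via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.log(vals)) @ vecs.T

def spd_exp(S):
    """Matrix exponential of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(np.exp(vals)) @ vecs.T

def log_euclidean_mean(mats):
    """Average SPD matrices in log-space, then map back; unlike the
    entrywise mean, the result is guaranteed to stay SPD and respects
    the manifold's geometry."""
    logs = [spd_log(M) for M in mats]
    return spd_exp(np.mean(logs, axis=0))
```

The same log/exp maps let ordinary Euclidean models (regression, forecasting) run on the log-matrices without leaving the SPD manifold.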

Nikita Kornilov’s master’s thesis, supervised by Prof. Alexander Gasnikov, focuses on Flow Matching. Recent advances in Flow Matching for generative modeling increasingly aim to learn straight trajectories for fast inference. However, existing methods often rely on costly iterative optimization or discrete Optimal Transport heuristics. To address this, Nikita proposed Optimal Flow Matching — a novel approach that consistently recovers optimal straight trajectories with a single loss minimization. The method combines probability transformation via ODEs from Flow Matching with straight transports from Optimal Transport. The thesis led to a paper presented at NeurIPS.

Marat Khusainov’s master's thesis, supervised by Dr. Sergey Samsonov, investigates Generative Flow Networks (GFlowNets)—models designed to sample compositional discrete objects like graphs or strings from distributions defined by unnormalized probability mass functions. The study identifies the dormant neuron phenomenon in GFlowNets, where an increasing number of inactive neurons during training reduces network expressivity. To address this, Marat proposes a simple method: periodically checking for dormant neurons across all layers during training and reinitializing their incoming and outgoing weights. This approach was validated on biochemical tasks, showing improved GFlowNet performance.
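The dormant-neuron recipe can be illustrated with a small numpy sketch (the names and the threshold rule are illustrative, following the common definition of dormancy as near-zero average activation relative to the layer):

```python
import numpy as np

def dormant_mask(activations, tau=0.01):
    """A unit is 'dormant' if its mean absolute activation is at most
    a fraction tau of the layer's average activity.
    activations: array of shape (batch, units)."""
    score = np.abs(activations).mean(axis=0)  # per-unit activity
    return score <= tau * score.mean()

def reinit_dormant(W_in, W_out, activations, rng, tau=0.01):
    """Reinitialize the incoming weights of dormant units and zero
    their outgoing weights, leaving active units untouched."""
    mask = dormant_mask(activations, tau)
    W_in, W_out = W_in.copy(), W_out.copy()
    W_in[:, mask] = rng.normal(0.0, 0.1, size=(W_in.shape[0],
                                               int(mask.sum())))
    W_out[mask, :] = 0.0
    return W_in, W_out, mask
```

Zeroing the outgoing weights keeps the network's function unchanged at the moment of the reset, while the fresh incoming weights give the revived unit a chance to learn something useful.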
The bachelor’s thesis of Gleb Karpeev, supervised by Prof. Vadim Strijov, tackles the problem of forecasting sets of correlated time series. The proposed framework encodes time series into covariance matrices and performs forecasting directly on the Riemannian manifold of symmetric positive definite (SPD) matrices. To model dynamics on this manifold, a Riemannian Score-Based Generative Model (RSGM) is adapted for conditional forecasting. Experiments on synthetic data demonstrate the method’s effectiveness, emphasizing the importance of respecting the data’s underlying geometry.

In his bachelor’s thesis, Sergey Firsov, supervised by Dr. Oleg Bakhteev, proposes a hardware-aware Neural Architecture Search method based on DARTS, a well-known differentiable architecture search method. The approach introduces a complexity vector with per-operation penalties and a hypernetwork that maps it to architecture logits. This enables a single training run to yield a family of models adapted to various hardware constraints.

Muhammadsharif Nabiev’s bachelor's thesis, supervised by Dr. Oleg Bakhteev, also considers a model selection problem. He explores inductive bias in multitask learning through learned representations. Using a shared encoder–decoder setup, the work applies the Information Bottleneck framework to study the trade-off between compression and relevance. Experiments reveal that optimal models balance high accuracy, strong compression, and low mutual information, forming a Pareto front across these objectives.

Ivan Papay’s bachelor's thesis, supervised by Prof. Vadim Strijov, addresses ordinal classification with objects described by partially ordered features. He proposes aggregating these partial orders using weighted incidence matrices of order graphs. The solution is found as a projection of response variables onto a superposition of partial order cones. Ivan also introduces an optimal parameter estimation method that improves noise resistance.
The work of Anastasia Linich lies at the intersection of fundamental mathematics and applied machine learning. In her bachelor's thesis, supervised by Prof. Serguei Barannikov, she developed interpretable classifiers for evaluating partial Lean 4 proofs using attention maps from DeepSeek-Prover. Lean 4 is a formal proof language used for writing and verifying mathematical theorems with machine assistance. Anastasia proposed two methods, manifold topology divergence and block-end self-attention, which outperformed baseline approaches. The best model achieved high accuracy, surpassing zero-shot prompting.

Vladislav Meshkov’s bachelor's thesis, under the supervision of Dr. Andrey Grabovoy and Nikita Kiselev, analyzes the Hessian of a neural network, an important object for understanding the loss landscape and the characteristics of a network architecture. The Hessian matrix captures information about the curvature, sensitivity, and local behavior of the loss function. The thesis proposes a method that enhances the understanding of the local behavior of the loss function and can be used to analyze neural networks and to interpret their parameters. The author develops an approach to estimating the Hessian norm for a specific type of convolutional network, obtaining results for both 1D and 2D convolutions, as well as for the fully connected head of these networks. He also evaluated the Hessian norm for neural networks represented as a product of matrices and considered how this estimate affects the loss landscape. An empirical analysis supports these findings, demonstrating convergence of the loss function landscape.

Ilya Stepanov’s bachelor's thesis, under the supervision of Dr. Andrey Grabovoy and Andrey Filatov, investigates the problem of data augmentation. Data augmentation is a crucial tool for modern object detection researchers, enabling the expansion of training datasets. However, existing methods are limited as they fail to provide substantial semantic enrichment of data. This may reduce models' generalization capabilities. The study proposes a novel augmentation technique based on semantic object replacement in images. Ilya’s proposed approach enhances training datasets and improves detection model accuracy. Experiments were conducted demonstrating the proposed method's impact on quality metrics mAP50 and mAP50-95, along with a contribution analysis of individual components to these metrics.


Conclusion
The Intelligent Systems Department has developed a sustainable process for student research, defence, and graduation. It starts immediately after a student joins our Department and meets their scientific advisor. Our advisors must hold a Ph.D. or D.Sc. in physics and mathematics. They create a long-term research plan for the student that meets their career goals, accounts for risks, and forecasts the response of the scientific community. During the studies, all research projects pass through checkpoints with presentations and discussions. We highlight two of these checkpoints. First, the 2025 student conference, which gathers students from other universities along with the Department’s own. Second, the 2025 pre-defence sessions, where MS and BS students present their work and receive feedback from the faculty.
The Intelligent Systems Department ensures that the defended theses are of scientific publication quality. Many of the thesis works are either already published or currently under review at leading machine learning conferences and journals.