This article is based on the experience of developing the memsafe library, which, using the Clang plugin, adds safe memory management and invalidation control of reference data types to C++ during source code compilation.

Compilers *
From source code to machine code
Tree-sitter and Preprocessing: A Syntax Showdown
According to the description,
Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited.
But how does Tree-sitter handle languages that require a preprocessing stage?
Writing an interpreter (virtual machine) for a simple byte-code + JIT compilation
There are two articles on Russian, the author of which writes a virtual machine (interpreter) for executing a simple bytecode and then applies different optimizations to make this virtual machine faster. Besides that, there is a compiler of a simple C-like language into this bytecode. After reading this article and getting familiar with the compiler, I thought that it would be interesting to try writing a virtual machine for this language that would be able to apply JIT-compilation to this bytecode with the libjit library. This article describes the experience of doing that.
I found several articles online that describe the usage of this library, but those that I saw, describe the compilation of concrete programs with libjit, while I was interested in compiling arbitrary bytecode. For people interested in further reading, there is an official titorial, a series of articles and a series of comparisons (in Russian).
The implementation was done in C++ because we aren`t playing games here. All my code is in my repository. The "main" branch has just the interpreter of the PigletVM bytecode; "labels-with-fallbacks" has a partial JIT compilation implementation (that doesn`t support JUMP instructions), "full-jit" has fully working JIT-compilationl; "making-jit-code-faster" makes code generated by JIT work faster and "universal-base-vm*" branches merge the interpreter and JIT-compilation implementations, by implementing a base generalised executor, which can be used for different implementations of PigletVM (both the interpreter and libjit compilation)
DSL (domain-specific language) implementation with macros
This is a translation of my own article
The release of NewLang language with a brand new "feature" is coming, a remodeled version of the preprocessor that allows you to extend the language syntax to create different DSL dialects using macros.
What is it about?
DSL (Subject Oriented Language) is a programming language specialized for a specific application area. It is believed that the use of DSL significantly increases the level of abstractness of the code, and this allows to develop more quickly and efficiently and greatly simplifies the solution of many problems.
Conditionally, we can distinguish two approaches to DSL implementation:
- Development of independent syntax translators using lexer and parser generators to define the grammar of the target language through BNF (Backus–Naur form) and regular expressions (Lex, Yacc, ANTLR, etc.) and then compiling the resulting grammar into machine code.
- Development or integration of the DSL dialect into a general-purpose language (metalanguage), including the use of various libraries or special parsers / preprocessors.
We will talk about the second option, namely the implementation of DSL on the basis of general-purpose languages (metalanguages) and the new implementation of macros in NewLang as the basis for DSL development.
NFun — expression evaluator for .Net

Nfun is an embedded language and expression executor that supports primitive types, arrays, structures and lambda expressions.
Most likely, you have already met tasks that require such a tool, and in this article I want to show examples of its application, its capabilities and why it may be useful to you.
Lexical Analysis in 11l
The task of the lexical analyzer is to split the source code of the program into tokens.
So for example the code
print(1 + 2)
will be tokenized asprint
, (
, 1
, +
, 2
and )
Structured Logging and Interpolated Strings in C# 10
Structured logging is gaining more and more popularity in the developers' community. In this article I'd like to demonstrate how we can use structured logging with the Microsoft.Extensions.Logging package and show the idea how we can extend it using the new features of C# 10.
Detecting errors in the LLVM release 13.0.0
Commercial static analyzers perform deeper and fuller code analysis compared to compilers. Let's see what PVS-Studio found in the source code of the LLVM 13.0.0 project.
Comparing Huawei ExaGear to Apple's Rosetta 2 and Microsoft's solution
November 10, 2020 was in many ways a landmark event in the microprocessor industry: Apple unveiled its new Mac Mini, the main feature of which was the new M1 chip, developed in-house. It is not an exaggeration to say that this processor is a landmark achievement for the ARM ecosystem: finally an ARM architecture chip whose performance surpassed x86 architecture chips from competitors such as Intel, a niche that had been dominated for decades.
But the main interest for us is not the M1 processor itself, but the Rosetta 2 binary translation technology. This allows the user to run legacy x86 software that has not been migrated to the ARM architecture. Apple has a lot of experience in developing binary translation solutions and is a recognized leader in this area. The first version of the Rosetta binary translator appeared in 2006 were it aided Apple in the transition from PowerPC to x86 architecture. Although this time platforms were different from those of 2006, it was obvious that all the experience that Apple engineers had accumulated over the years, was not lost, but used to develop the next version - Rosetta 2.
We were keen to compare this new solution from Apple, a similar product Huawei ExaGear (with its lineage from Eltechs ExaGear) developed by our team. At the same time, we evaluated the performance of binary translation from x86 to Arm provided by Microsoft (part of MS Windows 10 for Arm devices) on the Huawei MateBook E laptop. At present, these are the only other x86 to Arm binary translation solution that we are aware of on the open market.
Is PHP compilable?! PVS-Studio searches for errors in PeachPie

PHP is widely known as an interpreted programming language used mainly for website development. However, few people know that PHP also has a compiler to .NET – PeachPie. But how well is it made? Will the static analyzer be able to find actual bugs in this compiler? Let's find out!
Intermodular analysis of C++ projects in PVS-Studio

Recently PVS-Studio has implemented a major feature—we supported intermodular analysis of C++ projects. This article covers our and other tools' implementations. You'll also find out how to try this feature and what we managed to detect using it.
On commutativity of addition
Let's check out.
Let's write:
__int128 add1(__int128 a, __int128 b) {
return b + a;
}
and compile it with risc-v gcc 8.2.0:
Checking Clang 11 with PVS-Studio

Every now and then, we have to write articles about how we've checked another fresh version of some compiler. That's not really much fun. However, as practice shows, if we stop doing that for a while, folks start doubting whether PVS-Studio is worth its title of a good catcher of bugs and vulnerabilities. What if the new compiler can do that too? Sure, compilers evolve, but so does PVS-Studio – and it proves, again and again, its ability to catch bugs even in high-quality projects such as compilers.
Using Flex (Fast Lexical Analyzer Generator)
It's possible to write a lexer from scratch, but much more convenient to use any lexer generator. If we define some parsing rules, corresponding to an input language syntax, we get a complete lexical analyzer (tokenizer), which can extract tokens from an input program text and pass them to a parser.
One of such generators is Flex. In this article, we'll examine how it works in general, and observe some nontrivial nuances of developing a lexer with Flex.
PVS-Studio is now in Compiler Explorer

Not so long ago, a landmark event has happened: PVS-Studio appeared in Compiler Explorer! Now you can quickly and easily analyze the code for errors right on the godbolt.org site (Compiler Explorer). This feature opens up a large number of new possibilities – from quenching curiosity about the analyzer's abilities to being able to quickly share check results with a friend. This article will cover the topic on how to use these features. Caution – large GIFs!
Checking the GCC 10 Compiler with PVS-Studio

The GCC compiler is written with copious use of macros. Another check of the GCC code using PVS-Studio once again confirms the opinion of our team that macros are evil in the flesh. Not only does the static analyzer struggle with reviewing such code, but also a developer. GCC developers are certainly used to the project and are well versed in it. Nonetheless, it is very difficult to understand something on the third hand. Actually, due to macros, it was not possible to fully perform code checking. However, the PVS-Studio analyzer, as always, showed that it can find errors even in compilers.
Checking the Ark Compiler Recently Made Open-Source by Huawei

During the summer of 2019, Huawei gave a series of presentations announcing the Ark Compiler technology. The company claims that this open-source project will help developers make the Android system and third-party software much more fluent and responsive. By tradition, every new promising open-source project goes through PVS-Studio for us to evaluate the quality of its code.
Introduction
The Ark Compiler was first announced by Huawei at the launch of the new smartphone models P30 and P30 Pro. It is claimed that the Ark Compiler will improve the fluency of the Android system by 24% and response speed by 44%. Third-party Android applications will also gain a 60% speed-up after recompilation with the Ark Compiler. The open-source version of the project is called OpenArkCompiler; its source code is available on Gitee, a Chinese fork of GitHub.
Why LLVM may call a never called function?
I don’t care what your dragon’s said, it’s a lie. Dragons lie. You don’t know what’s waiting for you on the other side.This article is based on the post in the Krister Walfridsson’s blog, “Why undefined behavior may call a never called function?”.
Michael Swanwick, The Iron Dragon’s Daughter
The article draws a simple conclusion: undefined behavior in a compiler can do anything, even something absolutely unexpected. In this article, I examine the internal mechanism of this optimization works.
Finding Bugs in LLVM 8 with PVS-Studio

It's been two years since we last checked the code of the LLVM project with PVS-Studio, so let's see if PVS-Studio is still the leader among tools for detecting bugs and security weaknesses. We'll do that by scanning the LLVM 8.0.0 release for new bugs.
Checking the Roslyn Source Code

Once in a while we go back to the projects that we have previously checked using PVS-Studio, which results in their descriptions in various articles. Two reasons make these comebacks exciting for us. Firstly, the opportunity to assess the progress of our analyzer. Secondly, monitoring the feedback of the project's authors to our article and the report of errors, which we usually provide them with. Of course, errors can be corrected without our participation. However, it is always nice when our efforts help to make a project better. Roslyn was no exception. The previous article about this project check dates back to December 23, 2015. It's quite a long time, in the view of the progress that our analyzer has made since that time. Since the C# core of the PVS-Studio analyzer is based on Roslyn, it gives us additional interest in this project. As a result, we're as keen as mustard about the code quality of this project. Now let's test it once again and find out some new and interesting issues (but let's hope that nothing significant) that PVS-Studio will be able to find.