IntelliJ IDEA: Structural Search and Replace

Original author: Viktor Verbitsky

Modern IDEs are very powerful tools that can help developers in all kinds of situations. Unfortunately, much of this power is often lost because many functions remain unknown to developers, hiding in the shadows.


A simple example of one such function

Did you know that when you press F2 in IntelliJ IDEA, the cursor jumps to the next error in the file, and, if there are no errors, to the next warning? It seems to be a secret only a few people know about.


Structural search and replace are one such pair of features. They can be extremely useful in situations where other functions can’t quite get the job done.


In this post, I will present some of these situations and go beyond artificial cases by demonstrating examples of real code from two projects:


  1. jMonkeyEngine, a 3D engine for game development and an example of a big, interesting project.
  2. My own pet project, plantuml-native-image, where I experiment with compiling PlantUML into native executable code using GraalVM Native Image.

In fact, it is this second project that encouraged me to write this post, but I’m getting ahead of myself. First things first...


A simple task


Before we start looking at Structural Search, let’s consider a simple task where it could be useful. Here is one of my recent tasks, demonstrated on a revision of the jMonkeyEngine project rather than on closed-source code: finding places where the synchronized keyword locks on open (publicly accessible) objects (see "Item 82: Document thread safety", from chapter 11, "Concurrency", in Joshua Bloch’s Effective Java).


The point is that synchronizing on publicly accessible objects is not a great idea. Control over synchronization is lost: third-party code can lock the same object and interfere, which can lead to unwanted contention and, eventually, to deadlocks.


It is important to bear in mind that the synchronized keyword has two use cases:


As a method modifier:


class ClassA {
    public synchronized void someMethod() {
        // ...
    }
}

And as a block inside a method body:


class ClassA {
    public void someMethod() {
        synchronized(this) {
            // ...
        }
    }
}

In fact, both examples synchronize on an open object: the instance itself (this), which any code holding a reference to it can lock as well. The correct approach is to lock on a private object:


class ClassA {
    private final Object sync = new Object();
    public void someMethod() {
        synchronized(sync) {
            // ...
        }
    }
}

In this example, no third-party code can interfere with synchronization.
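The difference between the open and the private lock can be checked with a small experiment. The sketch below is illustrative: MonitorDemo, its nested classes and the blockedByOutsider helper are hypothetical names, while Thread.holdsLock and Thread.getState are standard JDK methods. The helper plays the part of third-party code that locks an object from the outside and then checks whether the object’s own method, called on another thread, ends up blocked behind it.

```java
public class MonitorDemo {
    // Synchronized method: the monitor is "this".
    static class ModifierStyle {
        public synchronized boolean holdsThis() {
            return Thread.holdsLock(this);
        }
    }

    // Synchronized block on "this": the same monitor as above.
    static class BlockStyle {
        public boolean holdsThis() {
            synchronized (this) {
                return Thread.holdsLock(this);
            }
        }
    }

    // Private lock object: callers cannot reach this monitor.
    static class PrivateLockStyle {
        private final Object sync = new Object();

        public boolean holdsThis() {
            synchronized (sync) {
                return Thread.holdsLock(this);
            }
        }
    }

    // Hypothetical helper: "outside" code locks target, runs call on another
    // thread and reports whether that thread got stuck waiting for a monitor.
    static boolean blockedByOutsider(Object target, Runnable call) {
        Thread worker = new Thread(call);
        boolean blocked;
        synchronized (target) {
            worker.start();
            Thread.State state;
            // Spin until the worker has either finished or is blocked on a monitor.
            while ((state = worker.getState()) != Thread.State.BLOCKED
                    && state != Thread.State.TERMINATED) {
                Thread.onSpinWait();
            }
            blocked = state == Thread.State.BLOCKED;
        }
        try {
            worker.join(); // the outside lock is released, so the worker can finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked;
    }

    public static void main(String[] args) {
        ModifierStyle open = new ModifierStyle();
        PrivateLockStyle closed = new PrivateLockStyle();

        System.out.println(open.holdsThis());                             // true
        System.out.println(new BlockStyle().holdsThis());                 // true
        System.out.println(closed.holdsThis());                           // false
        System.out.println(blockedByOutsider(open, open::holdsThis));     // true
        System.out.println(blockedByOutsider(closed, closed::holdsThis)); // false
    }
}
```

The first two calls show that the modifier form and the synchronized(this) form use the same monitor; the last two show that only the open object can be blocked by code outside the class.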


But how can you tell if there is such a pattern in the project code?


The easiest way is to do a full-text search for the synchronized keyword and carefully analyze each occurrence of it. But such an approach is only good for small projects. If we try the same search in jMonkeyEngine, we’ll find an overwhelming 117 occurrences. The word “synchronized” appears not only as the structure we’re looking for, but also in comments and text strings. And it can be rather tedious to deal with so many occurrences.
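A minimal sketch makes the problem concrete. The NaiveSearch class and its sample source are invented for illustration: a whole-word text search, done here with java.util.regex, reports three hits although only one of them is actual synchronization.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveSearch {
    // A made-up source file: one real synchronized method, plus the word
    // "synchronized" in a comment and in a string literal.
    static final String SAMPLE = String.join("\n",
            "// Not synchronized: callers must lock externally",
            "class Queue {",
            "    String describe() { return \"synchronized queue\"; }",
            "    synchronized void push(Object item) { }",
            "}");

    // Counts whole-word occurrences of word, ignoring any code structure.
    static int countWord(String source, String word) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(source);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWord(SAMPLE, "synchronized")); // 3
    }
}
```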


So what can be done? This is where Structural Search and Replace in IntelliJ IDEA comes in.


Structural Search: the fundamentals


First, let’s look at the structural search action in IntelliJ IDEA.


In the jMonkeyEngine project, open the Structural Search window (Edit -> Find -> Search Structurally...), which has two areas:




  1. Search template text area.
  2. Filter setting area.

So how is this different from a typical search?


In contrast to a regular search, where we just look for occurrences of a substring (directly or via a regular expression), here we look for a structural pattern in the code of a programming or markup language (even HTML).


If you want to see examples of the searches this tool can run, there is an extensible catalog of them. To open it, open the menu under the wrench icon and choose the Existing Templates... item:




A window with a wealth of ready-made templates opens:




This catalog contains many examples that are useful both for learning and in practice. But we want to look specifically for synchronization points.


Let’s begin with the case where the synchronized keyword is used as a method modifier, because in that case the lock is a publicly reachable object simply by design: the instance itself, or the Class object for a static method. We enter the following template into the left field:


synchronized $type$ $method$ ($ptype$ $param$) {
    $statement$;
}

Here we are defining a search pattern. We are not searching for an exact match, but rather for any syntax constructs that fit the pattern.


You can see many tokens in this template enclosed in $ signs. These are template variables. A token without $ signs (like synchronized) must appear in the matched code as-is. A token between dollar signs can, by default, match anything in that position; it doesn’t matter what exactly is there.


The names of the variables inside the dollar signs are arbitrary. They can later be used to set additional criteria in the right area of the window if necessary.


So, looking at the template abstractly, we can see that it searches for any method definition with some return type (including void), exactly one parameter, and exactly one statement in the body. And above all, the method must be synchronized.


But what if we want to find all synchronized methods with any number of parameters (including none) and a body of any length?


This is where template variable filters come into play. These are set in the right part of the window.


I must admit, it took me quite a while to understand how to use these filters. The trick is that they are context-dependent: the right area shows the filters for whichever template variable the caret is on in the search template text area.


For instance, if we place the caret on the $param$ variable on the left, we can set on the right how many parameters the method declaration may have. To do so, click the Add Filter link and select the second menu item, Count.




In the filter line that appears, we want to allow anywhere from 0 to an unlimited number of parameters, so we set the minimum to 0 and leave the second field empty:




Now the template finds all synchronized methods with any number of parameters, but still only those with exactly one statement in the body. To fix this, we set the same zero-to-unlimited Count filter on the $statement$ variable.


That’s it. Now you can press Find and see all the synchronized methods in the project, 18 in total:




This is an amazing start!
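To try the finished template on something smaller than jMonkeyEngine, a scratch class like the hypothetical SyncCandidates below can help. The comments record which methods I would expect each version of the template to match, based on the filters described above; they are my reading of the rules rather than verified IDE output.

```java
public class SyncCandidates {
    private int counter;

    // Matched even by the initial template: one parameter, one statement.
    public synchronized void add(int delta) {
        counter += delta;
    }

    // Needs Count [0, unlimited] on $param$: no parameters.
    public synchronized int get() {
        return counter;
    }

    // Needs Count [0, unlimited] on both $param$ and $statement$:
    // two parameters and two statements.
    public synchronized void set(int value, boolean log) {
        counter = value;
        if (log) {
            System.out.println("counter = " + value);
        }
    }

    // Never matched by this template: a synchronized block, not a synchronized method.
    public void increment() {
        synchronized (this) {
            counter++;
        }
    }

    // Never matched: no synchronization at all.
    public static int twice(int x) {
        return 2 * x;
    }
}
```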


Down the rabbit hole: Script Filter


What about synchronized blocks on arbitrary objects? At first glance, this seems even simpler:


synchronized($Obj$) {
    $statement$;
}

The problem is how to express that the $Obj$ variable must be private. In the pattern shown in the book, the lock is a field of type Object, so we can add a Type filter with the type name Object and require an exact match that ignores the type hierarchy (that is, with the checkboxes below the filter left unticked):




This finds the places where synchronization uses closed objects (though still not all of them!), which is the opposite of what we want. It can be useful in some cases, though: for example, to find all synchronization on objects of a particular type, or, with the type hierarchy checkbox below the filter ticked, on its subtypes as well.
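A hypothetical class like TypeFilterSamples shows why the strict Type filter both works and falls short. The comments give the matches I would expect from synchronized($Obj$) with a Type filter of Object and no hierarchy checkboxes ticked; each lock guards its own piece of state so that the class itself stays correct.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeFilterSamples {
    private final Object lock = new Object();             // guards count
    private final List<String> items = new ArrayList<>(); // guards itself
    private int count;
    private String name = "";                             // guarded by this

    // Matched: the declared type of lock is exactly Object.
    public int increment() {
        synchronized (lock) {
            return ++count;
        }
    }

    // Not matched, even though items is private: its type is List<String>.
    public int add(String item) {
        synchronized (items) {
            items.add(item);
            return items.size();
        }
    }

    // Not matched: the type of this is TypeFilterSamples, and the object is open.
    public String rename(String newName) {
        synchronized (this) {
            String old = name;
            name = newName;
            return old;
        }
    }
}
```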


But in general, we would like to find the cases when a non-private object of any type is used.


Unfortunately, there is no simple way to find those. The only thing that can help us is the Script filter: the most powerful, but also the most complicated, structural search feature.


A Script filter is arbitrary Groovy code that returns true or false. The template variables (the names between the $ signs) are available in it as parameters, along with two service variables: __context__ and __log__.


This is the part I find the most frustrating. These variables are elements of the PSI (Program Structure Interface) parse tree, yet:


  • The script entry field, surprisingly, does not offer code completion.
  • It is almost impossible to guess which PSI element class a given variable will be bound to.
  • There is no reference for the PSI tree structure at hand; the only way to find out what you can do with a PSI element is to read the PSI sources themselves.
  • The general usability of the filter is awful. You simply have to accept that the field collapses when it loses focus, so you need to expand and resize it every time to see the full script.

So what do we do? Let’s get into it. First, we need to understand what value ends up in the $Obj$ variable. The best option I came up with was the regular Groovy println function. We add a Script filter to the $Obj$ variable and enter the following text:


println(Obj)
return true



As we can see, variable names are used in the script without the dollar signs. Because the script always returns true, the search simply lists every occurrence of synchronized in the code. That is not what we are after, though: we want to see the output printed by println.


That output goes to IntelliJ IDEA’s own log. You can find it via Help -> Show Log in... (on my KDE desktop the full name of the menu item is Show Log in Dolphin), which opens the system file manager with the actual log file selected. This is where we look for information about the objects of interest. In this case, we find the following lines:


2020-07-05 15:03:00,998 [14151177] INFO - STDOUT - PsiReferenceExpression:pending
2020-07-05 15:03:01,199 [14151378] INFO - STDOUT - PsiReferenceExpression:source
2020-07-05 15:03:01,216 [14151395] INFO - STDOUT - PsiThisExpression:this
2020-07-05 15:03:01,219 [14151398] INFO - STDOUT - PsiReferenceExpression:receiveObjectLock
2020-07-05 15:03:01,222 [14151401] INFO - STDOUT - PsiReferenceExpression:invoke
2020-07-05 15:03:01,226 [14151405] INFO - STDOUT - PsiReferenceExpression:chatServer
2020-07-05 15:03:01,231 [14151410] INFO - STDOUT - PsiReferenceExpression:obj
2020-07-05 15:03:01,236 [14151415] INFO - STDOUT - PsiReferenceExpression:sync
2020-07-05 15:03:01,242 [14151421] INFO - STDOUT - PsiReferenceExpression:image
2020-07-05 15:03:01,377 [14151556] INFO - STDOUT - PsiClassObjectAccessExpression:TerrainExecutorService.class
2020-07-05 15:03:01,409 [14151588] INFO - STDOUT - PsiReferenceExpression:byteBuffer
2020-07-05 15:03:01,429 [14151608] INFO - STDOUT - PsiReferenceExpression:lock
2020-07-05 15:03:01,432 [14151611] INFO - STDOUT - PsiReferenceExpression:eventQueue
2020-07-05 15:03:01,456 [14151635] INFO - STDOUT - PsiReferenceExpression:sensorData.valuesLock
2020-07-05 15:03:01,593 [14151772] INFO - STDOUT - PsiReferenceExpression:createdLock
2020-07-05 15:03:01,614 [14151793] INFO - STDOUT - PsiReferenceExpression:taskLock
2020-07-05 15:03:01,757 [14151936] INFO - STDOUT - PsiReferenceExpression:loaders
2020-07-05 15:03:01,765 [14151944] INFO - STDOUT - PsiReferenceExpression:threadLock
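With many matches the log gets long, so it may be handy to tally the distinct PSI classes instead of reading it line by line. The sketch below is a helper of my own (PsiTypeTally is a made-up name); it relies only on the "STDOUT - PsiXxx:text" line shape visible above and assumes the log file is passed as the first command-line argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PsiTypeTally {
    // Captures the PSI class name written by println(Obj), e.g. "PsiThisExpression".
    private static final Pattern PSI_LINE = Pattern.compile("STDOUT - (Psi\\w+):");

    // Counts how often each PSI class appears among the log lines, sorted by name.
    static Map<String, Integer> tally(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = PSI_LINE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the IDE log file opened via Help -> Show Log in...
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        tally(lines).forEach((type, n) -> System.out.println(type + ": " + n));
    }
}
```

Applied to just the 18 lines above, it would report PsiClassObjectAccessExpression: 1, PsiReferenceExpression: 16 and PsiThisExpression: 1.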

So, we see that objects of at least three types can serve as the Obj value:


  • PsiThisExpression: the this keyword;
  • PsiClassObjectAccessExpression: synchronization on an object of type Class (synchronized (TerrainExecutorService.class) {...});
  • PsiReferenceExpression: a reference to a variable or field (for example, lock or sensorData.valuesLock) whose value is used as the lock.

The first two types can immediately be classified as synchronization on an open object: this is the object itself, and a Class object is reachable by any code. So, if Obj is a PsiThisExpression or a PsiClassObjectAccessExpression, the script should return true.


But what can be done with the PsiReferenceExpression type?


Unfortunately, the only way to answer this question is to dig into the sources. Since the JetBrains Java parser is open source and published on GitHub as part of the IntelliJ IDEA sources, there is no reason not to take a look. The class we’re interested in is located here.


I won’t trouble you with the details of rummaging through PSI sources. I will just show the resulting script:


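// synchronized (this): the object's own monitor, reachable by any caller
if (Obj instanceof com.intellij.psi.PsiThisExpression) return true;
// synchronized (Foo.class): the Class object is reachable by any code
if (Obj instanceof com.intellij.psi.PsiClassObjectAccessExpression) return true;
if (Obj instanceof com.intellij.psi.PsiReferenceExpression) {
    // Resolve the reference to its declaration (parameter, local variable or field)
    def var = Obj.advancedResolve(false).element;
    // A parameter comes from the caller, so the caller can lock it too
    if (var instanceof com.intellij.psi.PsiParameter) return true;
    // A local variable is treated as closed only if it is initialized with "new"
    if (var instanceof com.intellij.psi.PsiLocalVariable) {
        return !(var.initializer instanceof com.intellij.psi.PsiNewExpression);
    }
    // A field is treated as closed if it is private or protected
    if (var instanceof com.intellij.psi.PsiField) {
        return !var.hasModifier(com.intellij.lang.jvm.JvmModifier.PRIVATE) &&
               !var.hasModifier(com.intellij.lang.jvm.JvmModifier.PROTECTED);
    }
}
// Anything else (for example, a method call result) is reported to be on the safe side
return true;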

This search yields only 12 occurrences. The result is not perfect, as some of them are false positives, but I think it is good enough for this example. More importantly, it shows that PSI is very powerful and lets us work with the code structure at a very high level of abstraction.
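To see which cases the script reports, a hypothetical fixture such as OpenLockCases can be dropped into a scratch project. The comments follow my reading of the script above rather than recorded IDE output. The last method hints at where false positives may come from: the script only checks how a local variable is initialized, so an alias of a private field is still reported.

```java
public class OpenLockCases {
    public final Object publicLock = new Object();
    protected final Object protectedLock = new Object();
    private final Object privateLock = new Object();

    private Object lockProvider() {
        return privateLock;
    }

    // Reported: PsiThisExpression.
    public boolean onThis() {
        synchronized (this) { return Thread.holdsLock(this); }
    }

    // Reported: PsiClassObjectAccessExpression.
    public boolean onClass() {
        synchronized (OpenLockCases.class) { return Thread.holdsLock(OpenLockCases.class); }
    }

    // Reported: the reference resolves to a PsiParameter.
    public boolean onParameter(Object lock) {
        synchronized (lock) { return Thread.holdsLock(lock); }
    }

    // Reported: a field that is neither private nor protected.
    public boolean onPublicField() {
        synchronized (publicLock) { return Thread.holdsLock(publicLock); }
    }

    // Not reported: the script treats protected fields as closed.
    public boolean onProtectedField() {
        synchronized (protectedLock) { return Thread.holdsLock(protectedLock); }
    }

    // Not reported: a private field.
    public boolean onPrivateField() {
        synchronized (privateLock) { return Thread.holdsLock(privateLock); }
    }

    // Not reported: a local variable initialized with "new".
    public boolean onFreshLocal() {
        Object local = new Object();
        synchronized (local) { return Thread.holdsLock(local); }
    }

    // Reported by the final "return true": a method call is not a reference expression.
    public boolean onMethodResult() {
        synchronized (lockProvider()) { return Thread.holdsLock(privateLock); }
    }

    // Reported, although it is a false positive: the local is an alias of a private field.
    public boolean onAliasLocal() {
        Object alias = privateLock;
        synchronized (alias) { return Thread.holdsLock(alias); }
    }
}
```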


Your first static code analysis inspection


As we saw a little earlier, looking some things up can be quite challenging, and it is frustrating to see so much effort go to waste. Ideally, the IDE would warn us the moment we write something wrong.


This is another area where IntelliJ IDEA can help. It has plenty of inspections to help developers write more correct code by highlighting incorrect parts and explaining what is wrong. There, hidden in plain sight, is one remarkable inspection that is disabled by default. This is the Structural search inspection.


You can find this inspection in the Settings window (File->Settings...->Editor->Inspections):




Enable this inspection and press the plus icon to add a structural search template. The template from your last search is filled in by default. Press OK and give the template a name, for instance, Open object sync (or something more descriptive).


That’s it. Now IntelliJ IDEA automatically highlights all the code that matches this template:




Presto!!! You’ve created your very first code inspection! Congrats! All that is left now is to commit it with the project so that others can use it too. To do this, add the .idea/inspectionProfiles directory, where these settings are stored, to version control.


Structural Replace


Just like the structural search action, there is a structural replace action in IntelliJ IDEA (Edit -> Find -> Replace Structurally...):




Compared with the Search Structurally window, this window has one more area, where you set the replacement for the code that matches the template. Like the search template, the replacement can contain variables enclosed in dollar signs, and they are configured in the filter area in the same way. Here, however, they are not filters: they can only be Groovy scripts that compute the text substituted for the variable.


Here is a practical example. I had to replace all calls of this form:


classInitializationSupport.initializeAtRunTime(WindowPropertyGetter.class, AWT_SUPPORT);

in a huge class with calls of this form:


classInitializationSupport.initializeAtRunTime("sun.awt.X11.WindowPropertyGetter", AWT_SUPPORT);

The goal was to remove explicit class references from the code, and later I had to do the same for several other functions.


The last thing I wanted was to do all of this manually. So instead I used Replace Structurally. First, I specified the search template:


classInitializationSupport.initializeAtRunTime($Clazz$.class, AWT_SUPPORT)

I imposed no limitations for the $Clazz$ variable, as I did not care which class would appear there.


Then I set the replacement:


classInitializationSupport.initializeAtRunTime("$FullClass$", AWT_SUPPORT)

Next, I needed to compute the value of the new $FullClass$ variable. To do this, I placed the caret on it and entered the following script in the filter area:


Clazz.type.canonicalText

This script takes the type matched by the $Clazz$ search variable, obtains its fully qualified name, and inserts it into the method call as a string literal.
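One detail is worth keeping in mind about canonical text. As far as I know, the canonical text of a PSI class type is the dotted, fully qualified name, the same form java.lang.Class.getCanonicalName() returns at run time. For a top-level class such as WindowPropertyGetter, this coincides with the binary name, but for a nested class it does not, and reflection lookups such as Class.forName expect the binary name. The sketch below (NameForms and loadable are illustrative names) shows the difference; if a nested class ever lands in $Clazz$, it may be worth checking which form the target API expects.

```java
import java.util.ArrayList;
import java.util.Map;

public class NameForms {
    // Returns true if Class.forName can load a class by the given name.
    static boolean loadable(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Top-level class: canonical and binary names coincide.
        System.out.println(ArrayList.class.getCanonicalName()); // java.util.ArrayList
        System.out.println(ArrayList.class.getName());          // java.util.ArrayList

        // Nested class: the canonical name uses a dot, the binary name a dollar sign.
        System.out.println(Map.Entry.class.getCanonicalName()); // java.util.Map.Entry
        System.out.println(Map.Entry.class.getName());          // java.util.Map$Entry

        System.out.println(loadable("java.util.Map.Entry"));    // false
        System.out.println(loadable("java.util.Map$Entry"));    // true
    }
}
```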


The resulting Structural Replace window looks as follows:




Then we press Find and get a list of the possible replacements:




Here we can view every occurrence and what it turns into (using Preview Replacement). We can also exclude individual occurrences (from the context menu) or apply all the replacements in one go (with Replace All).


Ultimately, once you’ve got a feel for PSI, it’s not that difficult.


Structural Replace as Intention


Now it is time to bring up another powerful IntelliJ IDEA tool: intentions. Intentions apply code transformations right at the caret position.


For instance, if you write code that looks like:


public static void main(String[] args) {
    List<Integer> list = Arrays.asList(0, 10, 20, 30);
    for (int i = 0; i < list.size(); i++) {
        System.out.println(list.get(i));
    }
}

and then place the caret on for and press Alt+Enter, IntelliJ IDEA suggests replacing this for loop with an alternative implementation, for instance, a for-each loop:


public static void main(String[] args) {
    List<Integer> list = Arrays.asList(0, 10, 20, 30);
    for (Integer integer : list) {
        System.out.println(integer);
    }
}

A similar intention can be implemented with Structural Replace. To do this, extend the same Structural search inspection that we set up earlier, but when adding a new item, specify that it is a replace template rather than a search template:






Now the matching code is highlighted, and a quick-fix is suggested for it:






Congratulations once again! You’ve made your very first intention!


Conclusion


In this post, I tried to help you familiarize yourself with one of the most complex, but at the same time one of the most powerful, functions in IntelliJ IDEA. Yes, not everything about this function is convenient right now. However, there are cases where what you need to do is not possible by any other means.


I haven’t described these features in full detail, and many aspects remain beyond the scope of this post. But I hope this gives you enough of a foundation to get started and perhaps dive deeper on your own.


Good luck!


P.S. Many thanks to Anton Arhipov (antonarhipov) for his awesome help with this translation!

NPO Krista
