NLazeba Jun 1 2020 at 08:41

NSA, Ghidra, and Unicorns

12 min

968

PVS-Studio corporate blogJava*Open source*Reverse engineering*

This time, the PVS-Studio team's attention was attracted by Ghidra, a big bad reverse-engineering framework allowing developers to analyze binary files and do horrible things to them. The most remarkable fact about it is not even that it's free and easily extensible with plugins but that it was developed and uploaded to GitHub for public access by NSA. On the one hand, you bet NSA has enough resources for keeping their code base clean. On the other hand, new contributors, who are not well familiar with it, may have accidentally introduced bugs that could stay unnoticed. So, we decided to feed the project to our static analyzer and see if it has any code issues.

Overture

PVS-Studio issued a total of 651 high-level, 904 medium-level, and 909 low-level warnings on the Java part of Ghidra (release 9.1.2, commit 687ce7f). About half of the high-level and medium-level ones were "V6022 Parameter is not used inside method's body" warnings, which typically show up after refactoring, when some parameter becomes no longer needed or some feature is temporarily commented out. A quick look through these warnings (there are too many of them to closely examine each as a third-party reviewer) didn't reveal anything outright suspicious. I guess it would be OK to turn off this diagnostic for the project in the analyzer's settings at least temporarily so that it doesn't distract developers. In general practice, though, you don't want to neglect it because setters' or constructors' parameters often have typos in their names. I'm sure most of you have seen an annoying pattern like this at least once:

public class A {
  private String value;
  public A(String val) { // V6022
    this.value = value;
  }
  public int hashCode() {
    return value.hashCode(); // NullPointerException
  }
}

More than half of the low-level warnings were produced by the "V6008 Potential null dereference of 'variable'" diagnostic – for example, the value returned by File.getParentFile() is often used without a prior null check. If the file object the method is called on was constructed without the absolute path, the method will return null, which may cause the application to crash if the check is missing.

As usual, I'll be discussing only high-level and medium-level warnings since they usually reveal the major portion of actual bugs found in the project. When you deal with analysis reports, we recommend always going through the warnings in descending order of their certainty.

Below I'll share several analyzer-reported code snippets, which I found suspicious or interesting. The large size of Ghidra's code base makes it almost impossible to find these defects manually.

Snippet 1: broken validation

private boolean parseDataTypeTextEntry()
throws InvalidDataTypeException {
  ...
  try {
    newDataType = parser.parse(selectionField.getText(),
                               getDataTypeRootForCurrentText());
  }
  catch (CancelledException e) {
    return false;
  }
  if (newDataType != null) {
    if (maxSize >= 0
        && newDataType.getLength() > newDataType.getLength()) { // <=
      throw new InvalidDataTypeException("data-type larger than "
                                         + maxSize + " bytes");
    }
    selectionField.setSelectedValue(newDataType);
    return true;
  }
  return false;
}

PVS-Studio warning: V6001 There are identical sub-expressions 'newDataType.getLength()' to the left and to the right of the '>' operator. DataTypeSelectionEditor.java:366

This class provides a UI component for selecting data types with autocompletion support. The developer can specify the maximum size of the selectable data type (using the maxSize field) or make it unlimited by specifying a negative value. Should the limit be exceeded, the check of input data is supposed to throw an exception, which will be caught further up in the call stack, with the user getting an error message accordingly.

It seems the author of this code got distracted right in the middle of writing that check – or maybe they just started meditating upon the purpose of life and all. Whatever it was, they ended up with broken validation: the condition checks if the value can be greater than itself, and since it never can, the check is ignored. And that means the component could provide invalid data.

Another similar mistake was found in two other classes: GuidUtil and NewGuid.

public class GuidUtil {
  ...
  public static GuidInfo parseLine(...) {
    ...
    long[] data = new long[4];
    ...
    if (isOK(data)) {
      if (!hasVersion) {
        return new GuidInfo(guidString, name, guidType);
      }
      return new VersionedGuidInfo(guidString, version, name, guidType);
    }
    return null;
  }
  ...
  private static boolean isOK(long[] data) {
    for (int i = 0; i < data.length; i++) {
      if ((data[i] != 0) || (data[i] != 0xFFFFFFFFL)) { // <=
        return true;
      }
    }
    return false;
  }
  ...
}

PVS-Studio warning: V6007 Expression 'data[i] != 0xFFFFFFFFL' is always true. GuidUtil.java:200

The condition in the for loop of the isOK method checks if a value is not equal to two different values at the same time. If it's true, the GUID will be acknowledged as valid right away. Thus, the GUID will be acknowledged as invalid only when the data array is empty, which will never be the case because the variable in question gets its value only once – at the beginning of the parseLine method.

The isOK method has the same body in both classes, which makes me think it's just another case of cloning incorrect code using copy-paste. I'm not sure what exactly the author wanted to check, but I'd say this method should be fixed as follows:

private static boolean isOK(long[] data) {
  for (int i = 0; i < data.length; i++) {
    if ((data[i] == 0) || (data[i] == 0xFFFFFFFFL)) {
      return false;
    }
  }
  return true;
}

Snippet 2: hiding exceptions

public void putByte(long offsetInMemBlock, byte b)
throws MemoryAccessException, IOException {
  long offsetInSubBlock = offsetInMemBlock - subBlockOffset;
  try {
    if (ioPending) {
      new MemoryAccessException("Cyclic Access"); // <=
    }
    ioPending = true;
    doPutByte(mappedAddress.addNoWrap(offsetInSubBlock / 8),
              (int) (offsetInSubBlock % 8), b);
  }
  catch (AddressOverflowException e) {
    new MemoryAccessException("No memory at address"); // <=
  }
  finally {
    ioPending = false;
  }
}

PVS-Studio warning: V6006 The object was created but it is not being used. The 'throw' keyword could be missing: 'new MemoryAccessException(«Cyclic Access»)'. BitMappedSubMemoryBlock.java:99

As is widely known, exception objects don't do anything by themselves (at least they shouldn't). New instances are almost always thrown using the throw statement, and in certain rare cases they are passed somewhere or stored in a collection.

The class the method above belongs to is a wrapper over a memory block allowing the developer to read and write data. Since the method doesn't throw exceptions, the restriction imposed on accessing the current memory block using the ioPending flag may get violated, and, furthermore, AddressOverflowException is ignored. That's how the data may get spoiled and you'll never know about it; instead of getting an explicit error message in a particular spot, you'll get strange artifacts that you'll have to debug.

There was a total of eight missing exceptions like that:

BitMappedSubMemoryBlock.java: lines 77, 99, 106, 122
ByteMappedSubMemoryBlock.java: lines 52, 73, 92, 114

Curiously, those files also contain methods that look very much like the method above but do have throw in them. I suspect the programmer made a mistake in the original method, similar to the one discussed, cloned it a few times, and eventually discovered the mistake and fixed every clone of the method that they could remember of.

Snippet 3: minefield

private void processSelection(OptionsTreeNode selectedNode) {
  if (selectedNode == null) {
    setViewPanel(defaultPanel, selectedNode); // <=
    return;
  }
  ...
}
private void setViewPanel(JComponent component, OptionsTreeNode selectedNode) {
  ...
  setHelpLocation(component, selectedNode);
  ...
}
private void setHelpLocation(JComponent component, OptionsTreeNode node) {
  Options options = node.getOptions();
  ...
}

PVS-Studio warning: V6008 Null dereference of 'selectedNode' in function 'setViewPanel'. OptionsPanel.java:266

The analyzer isn't entirely correct about this one: the call of the processSelection method won't currently lead to NullPointerException as this method is called only twice, with an explicit null check of selectedNode before the call. That's not a good thing to do because another developer may assume that since the method explicitly handles the selectedNode == null case, this value should be valid and use it, potentially ending up with a crash. This is particularly true for open-source projects because contributing developers may not be well familiar with the code base.

Actually, the entire processSelection method looks strange. This might well be a copy-paste error because you can also find two if blocks with identical bodies but different conditions further down in this same method. However, selectedNode won't be null by that time and the entire call chain setViewPanel-setHelpLocation won't lead to a NullPointerException.

Snippet 4: harmful autocompletion

public static final int[] UNSUPPORTED_OPCODES_LIST = { ... };
public static final Set<Integer> UNSUPPORTED_OPCODES = new HashSet<>();

static {
  for (int opcode : UNSUPPORTED_OPCODES) {
    UNSUPPORTED_OPCODES.add(opcode);
  }
}

PVS-Studio warning: V6053 The collection is modified while the iteration is in progress. ConcurrentModificationException may occur. DWARFExpressionOpCodes.java:205

Again, the analyzer is not completely right: the exception will not be thrown because the UNSUPPORTED_OPCODES collection is always empty and the loop just won't execute at all. Besides, this collection happens to be a set, which means adding an already existing element won't affect it. The programmer must have entered the collection name in the for-each loop using autocompletion, which suggested the wrong field, with the programmer never noticing that. A collection can't be modified while it is being iterated, but if you are lucky enough – as in this case – the application won't crash. In this snippet, the typo affects the program indirectly: the DWARF file parser relies on the collection to stop analysis when encountering unsupported opcodes.

Starting with Java 9, it's better to use factory methods of the standard library to create constant collections: for example, Set.of(T… elements) is not only much more convenient to use but also makes the newly created collection immutable from the start, thus enhancing reliability.

Snippet 5: all shall be found

public void setValueAt(Object aValue, int row, int column) {
  ...
  int index = indexOf(newName);
  if (index >= 0) {                  // <=
    Window window = tool.getActiveWindow();
    Msg.showInfo(getClass(), window, "Duplicate Name",
                 "Name already exists: " + newName);
    return;
  }

  ExternalPath path = paths.get(row); // <=
  ...
}
private int indexOf(String name) {
  for (int i = 0; i < paths.size(); i++) {
    ExternalPath path = paths.get(i);
    if (path.getName().equals(name)) {
      return i;
    }
  }
  return 0;
}

PVS-Studio warnings:

V6007 Expression 'index >= 0' is always true. ExternalNamesTableModel.java:105
V6019 Unreachable code detected. It is possible that an error is present. ExternalNamesTableModel.java:109

Something distracted the developer, and they accidentally implemented the indexOf method in such a way that it returns 0, i.e. the index of the first element of the paths collection, instead of -1 for a non-existent value. This will happen even if the collection is empty. Or maybe they generated the method but forgot to change the default return value. Anyway, the setValueAt method will refuse any offered value and show the message «Name already exists» even if there's not a single name in the collection.

By the way, the indexOf method is not used anywhere else, and its value is actually needed only to determine if the sought element exists. Rather than writing a separate method and playing around with indexes, it would probably be better to write a for-each loop right in the setValueAt method and have it return when encountering the matching element.

Note: I didn't manage to reproduce the assumed bug. Perhaps the setValueAt method is no longer used or is called only under certain conditions.

Snippet 6: quiet please

final static Map<Character, String> DELIMITER_NAME_MAP = new HashMap<>(20);
// Any non-alphanumeric char can be used as a delimiter.
static {
  DELIMITER_NAME_MAP.put(' ', "Space");
  DELIMITER_NAME_MAP.put('~', "Tilde");
  DELIMITER_NAME_MAP.put('`', "Back quote");
  DELIMITER_NAME_MAP.put('@', "Exclamation point");
  DELIMITER_NAME_MAP.put('@', "At sign");
  DELIMITER_NAME_MAP.put('#', "Pound sign");
  DELIMITER_NAME_MAP.put('$', "Dollar sign");
  DELIMITER_NAME_MAP.put('%', "Percent sign");
  ...
}

PVS-Studio warning: V6033 An item with the same key '@' has already been added. FilterOptions.java:45

Ghidra supports data filtering in different contexts. For example, you can filter project files by name. Filtering by several keywords at a time is also supported – specifying '.java,.c' with 'OR' mode enabled will display all files whose names contain either '.java' or '.c'. Theoretically, you can use any special character as a separator (it's selected in the filter settings), but in reality the exclamation point is not available.

It's very easy to make a typo in long initialization lists like that because they are often created using copy-paste, and your attention weakens quickly when reviewing such code manually. If the duplicate lines are not adjacent, there's almost no chance of noticing the typo through manual review.

Snippet 7: remainder is always 0

void setFactorys(FieldFactory[] fieldFactorys,
                 DataFormatModel dataModel, int margin) {
  factorys = new FieldFactory[fieldFactorys.length];

  int x = margin;
  int defaultGroupSizeSpace = 1;
  for (int i = 0; i < factorys.length; i++) {
    factorys[i] = fieldFactorys[i];
    factorys[i].setStartX(x);
    x += factorys[i].getWidth();
    // add in space between groups
    if (((i + 1) % defaultGroupSizeSpace) == 0) { // <=
      x += margin * dataModel.getUnitDelimiterSize();
    }
  }
  width = x - margin * dataModel.getUnitDelimiterSize() + margin;
  layoutChanged();
}

PVS-Studio warnings:

V6007 Expression '((i + 1) % defaultGroupSizeSpace) == 0' is always true. ByteViewerLayoutModel.java:66
V6048 This expression can be simplified. Operand 'defaultGroupSizeSpace' in the operation equals 1. ByteViewerLayoutModel.java:66

The hex byte viewer allows selecting the length of groups to be displayed: for example, you can set the data output format as 'ffff ffff' or 'ff ff ff ff'. The setFactorys method is responsible for arranging these groups in the UI. While both the customization and representation are fine here, the loop in this method doesn't look right at all: the remainder of division by one is always 0, which means the x coordinate will be increased at each iteration. The fact that the dataModel parameter has a groupSize property makes it even more suspicious.

Are these some refactoring leftovers? Or maybe some calculations of the defaultGroupSizeSpace variable are missing? In any case, attempting to replace its value with dataModel.getGroupSize() results in the broken layout, and only the author of this code could tell us what it's all about.

Snippet 8: broken validation, part 2

private String parseArrayDimensions(String datatype,
                                    List<Integer> arrayDimensions) {
  String dataTypeName = datatype;
  boolean zeroLengthArray = false;
  while (dataTypeName.endsWith("]")) {
    if (zeroLengthArray) {                   // <=
      return null; // only last dimension may be 0
    }
    int rBracketPos = dataTypeName.lastIndexOf(']');
    int lBracketPos = dataTypeName.lastIndexOf('[');
    if (lBracketPos < 0) {
      return null;
    }
    int dimension;
    try {
      dimension = Integer.parseInt(dataTypeName.substring(lBracketPos + 1,
                                                          rBracketPos));
      if (dimension < 0) {
        return null; // invalid dimension
      }
    }
    catch (NumberFormatException e) {
      return null;
    }
    dataTypeName = dataTypeName.substring(0, lBracketPos).trim();
    arrayDimensions.add(dimension);
  }
  return dataTypeName;
}

PVS-Studio warning: V6007 Expression 'zeroLengthArray' is always false. PdbDataTypeParser.java:278

This method parses the dimensions of a multi-dimensional array and returns either the remaining text or null for invalid data. The comment next to one of the validity checks says only the last dimension read may be 0. The parsing runs from right to left, which means '[0][1][2]' is valid input text and '[2][1][0]' is not.

But, unfortunately, the loop has no condition to check if the read dimension is 0, so the parser consumes invalid data without any questions. I think it should be fixed by modifying the try block as follows:

try {
  dimension = Integer.parseInt(dataTypeName.substring(lBracketPos + 1,
                                                      rBracketPos));
  if (dimension < 0) {
    return null; // invalid dimension
  } else if (dimension == 0) {
    zeroLengthArray = true;
  }
}
catch (NumberFormatException e) {
  return null;
}

Sure, this validity criteria may have eventually turned out to be irrelevant or the programmer meant something else by their comment and it's the first read dimension that should be checked. In any case, data validation is a critical spot of any application, and you must take it seriously. Mistakes in it may cause both relatively harmless crashes and severe security holes, data leaks, data corruption, or data loss (overlooking an SQL injection during request validation is one such example).

A few words about other warnings

You may have noticed that while there were many warnings issued, I've actually shared just a few of them. The cloc utility, which I didn't bother to fine tune, counted about 1.25 million lines of Java code (blank lines and comments excluded). It's just that almost all of these warnings look very much the same: some deal with missing null checks, others with legacy code no longer needed but still present. I don't want to bore you by showing you the same types of bugs over and over again, and I did mention some of them in the beginning.

To show you one last example, let's talk about half a hundred "V6009 Function receives an odd argument" warnings in the context of unsafe usage of the substring method (CParserUtils.java:280, ComplexName.java:48, etc.) to get the part of a string following a separator. Developers often hope for the separator to be present in the string and forget that indexOf will return -1 otherwise, which is an incorrect value for substring. Of course, if the data has been validated or doesn't come from «the outside», the program will be much less likely to crash. But generally, these defects are potentially unsafe and we want to help eliminate them.

Conclusion

The overall quality of Ghidra is pretty high – there are no outright screwups, which is cool. The code is mostly well formatted and the style is quite consistent: most of the variables, methods, and other entities have intelligible names, certain details are cleared up through comments, and the number of tests is huge.

Of course, there were some issues, including the following:

Dead-code fragments, which must be leftovers from multiple refactoring sessions;
Many of the javadocs are long obsolete and, for example, point at non-existent parameters;
Convenient development using IntelliJ IDEA is unsupported;
The modular system built around reflection makes project navigation and component dependency search much, much harder.

Please don't disregard developer tools. Static analysis, just like seat belts, is not a cure-all, but it helps prevent some of the accidents before the release. After all, no one likes using bug-teeming software, do they?

Check our blog for other posts about projects we have checked. We also offer a trial license and a handful of options of using the analyzer without paying for it.

Hubs: