SvyatoslavMC Aug 19 2019 at 06:21

CMake: the Case when the Project's Quality is Unforgivable


PVS-Studio corporate blog, C, C++, Open source, Build automation

CMake is a cross-platform system for automating project builds. It is much older than the PVS-Studio static code analyzer, yet no one had tried running the analyzer on its code and reviewing the errors. As it turned out, there are a lot of them. The CMake audience is huge: new projects start on it and old ones are ported to it. I shudder to think how many developers could have run into any given error.

Introduction

CMake is a cross-platform system for automating software building from source code. CMake does not build anything itself; it only generates the files that control a build from CMakeLists.txt files. The first release of the program took place in 2000. For comparison, the PVS-Studio analyzer appeared only in 2008. At that time, it was aimed at finding bugs that result from porting 32-bit code to 64-bit systems. In 2010, the first set of general-purpose diagnostics appeared (V501-V545). By the way, the CMake code triggers a few warnings from this first set.

Unforgivable Errors

V1040 Possible typo in the spelling of a pre-defined macro name. The '__MINGW32_' macro is similar to '__MINGW32__'. winapi.h 4112

/* from winternl.h */
#if !defined(__UNICODE_STRING_DEFINED) && defined(__MINGW32_)
#define __UNICODE_STRING_DEFINED
#endif

The V1040 diagnostic was implemented not long ago. Most likely, it will not have been released yet when this article is posted. Nevertheless, we have already found a cool error with its help.

There's a typo in the name __MINGW32_: one underscore is missing at the end. If you search the code for this name, you can see that the project uses the version with two underscores on both sides, __MINGW32__.
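Why does such a typo go unnoticed? The preprocessor treats an unknown name inside defined() as simply "not defined", so the guarded block is skipped without any diagnostic. Below is a minimal sketch of that behavior. The DEMO_* macros and both functions are hypothetical names chosen for illustration; they are not part of CMake.

```cpp
// Hypothetical macros for illustration only; none of them exist in CMake.
#define DEMO_MINGW32_ 1

// Misspelled check: the trailing underscore is missing, so defined()
// is false and the guarded branch is skipped without any diagnostic.
#if defined(DEMO_MINGW32)
#define DEMO_GUARDED_TYPO 1
#else
#define DEMO_GUARDED_TYPO 0
#endif

// Correctly spelled check: the guarded branch is taken.
#if defined(DEMO_MINGW32_)
#define DEMO_GUARDED_OK 1
#else
#define DEMO_GUARDED_OK 0
#endif

constexpr bool guarded_with_typo() { return DEMO_GUARDED_TYPO == 1; }
constexpr bool guarded_correctly() { return DEMO_GUARDED_OK == 1; }
```

With the misspelled name the code still compiles cleanly, which is exactly why this kind of error can live in a code base for years.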

V531 It is odd that a sizeof() operator is multiplied by sizeof(). cmGlobalVisualStudioGenerator.cxx 558

bool IsVisualStudioMacrosFileRegistered(const std::string& macrosFile,
                                        const std::string& regKeyBase,
                                        std::string& nextAvailableSubKeyName)
{
  ....
  if (ERROR_SUCCESS == result) {
    wchar_t subkeyname[256];                                           // <=
    DWORD cch_subkeyname = sizeof(subkeyname) * sizeof(subkeyname[0]); // <=
    wchar_t keyclass[256];
    DWORD cch_keyclass = sizeof(keyclass) * sizeof(keyclass[0]);
    FILETIME lastWriteTime;
    lastWriteTime.dwHighDateTime = 0;
    lastWriteTime.dwLowDateTime = 0;

    while (ERROR_SUCCESS ==
           RegEnumKeyExW(hkey, index, subkeyname, &cch_subkeyname, 0, keyclass,
                         &cch_keyclass, &lastWriteTime)) {
    ....
  }
  ....
}

For a statically declared array, the sizeof operator returns the size in bytes, which already accounts for both the number of elements and the size of each one. When evaluating the cch_subkeyname variable, the developer did not take this into account and got a value 4 times greater than intended. Here is where "four times" comes from.

The array and its wrong size are passed to the function RegEnumKeyExW:

LSTATUS RegEnumKeyExW(
  HKEY      hKey,
  DWORD     dwIndex,
  LPWSTR    lpName,    // <= subkeyname
  LPDWORD   lpcchName, // <= cch_subkeyname
  LPDWORD   lpReserved,
  LPWSTR    lpClass,
  LPDWORD   lpcchClass,
  PFILETIME lpftLastWriteTime
);

The lpcchName pointer must point to a variable that contains the buffer size in characters: "A pointer to a variable that specifies the size of the buffer specified by the lpName parameter, in characters". The subkeyname array is 512 bytes in size and can store 256 characters of the wchar_t type (on Windows, wchar_t is 2 bytes). It is 256 that should be passed to the function. Instead, 512 is multiplied by 2, which gives 1024.

I think it's clear now how to correct this error. You need to use division instead of multiplication:

DWORD cch_subkeyname = sizeof(subkeyname) / sizeof(subkeyname[0]);

By the way, the same error occurs when evaluating the value of the cch_keyclass variable.
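To make the arithmetic concrete, here is a small sketch that computes all three values for a 256-element wchar_t buffer. The names BufferSizes, element_count and measure are hypothetical and exist only in this example. Note that sizeof(wchar_t) is 2 on Windows but may differ on other platforms, so only the element count is the same everywhere.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements in a fixed-size array.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// The three candidate "sizes" of a 256-element wide-character buffer.
struct BufferSizes {
  std::size_t bytes;       // sizeof(buf): total size in bytes
  std::size_t multiplied;  // the buggy expression
  std::size_t divided;     // the corrected expression: element count
  std::size_t counted;     // same value, obtained via the helper
};

inline BufferSizes measure() {
  wchar_t buf[256];
  return {sizeof(buf),
          sizeof(buf) * sizeof(buf[0]),
          sizeof(buf) / sizeof(buf[0]),
          element_count(buf)};
}
```

A helper that deduces the element count from the array type removes the need to write the division by hand at all.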

The error described can potentially lead to buffer overflow. All such fragments definitely have to be corrected:

V531 It is odd that a sizeof() operator is multiplied by sizeof(). cmGlobalVisualStudioGenerator.cxx 556
V531 It is odd that a sizeof() operator is multiplied by sizeof(). cmGlobalVisualStudioGenerator.cxx 572
V531 It is odd that a sizeof() operator is multiplied by sizeof(). cmGlobalVisualStudioGenerator.cxx 621
V531 It is odd that a sizeof() operator is multiplied by sizeof(). cmGlobalVisualStudioGenerator.cxx 622
V531 It is odd that a sizeof() operator is multiplied by sizeof(). cmGlobalVisualStudioGenerator.cxx 649

V595 The 'this->BuildFileStream' pointer was utilized before it was verified against nullptr. Check lines: 133, 134. cmMakefileTargetGenerator.cxx 133

void cmMakefileTargetGenerator::CreateRuleFile()
{
  ....
  this->BuildFileStream->SetCopyIfDifferent(true);
  if (!this->BuildFileStream) {
    return;
  }
  ....
}

The pointer this->BuildFileStream is dereferenced right before it is checked for validity. Has that really never caused problems for anyone? Below is another example of such a snippet, written like a carbon copy of the first. In fact, there are a lot of V595 warnings, and most of them are not so obvious. From my experience, correcting warnings of this diagnostic takes the longest time.

V595 The 'this->FlagFileStream' pointer was utilized before it was verified against nullptr. Check lines: 303, 304. cmMakefileTargetGenerator.cxx 303
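The safe order is simply "check, then use". Here is a minimal sketch of that order. DemoStream and configure_stream are hypothetical stand-ins, not CMake types, and the sketch assumes that returning early is the desired reaction to a null pointer.

```cpp
// Hypothetical stand-in for the generator's output stream.
struct DemoStream {
  bool copy_if_different = false;
  void SetCopyIfDifferent(bool value) { copy_if_different = value; }
};

// Validate the pointer first, then dereference it.
// Returns false when there is no stream to configure.
inline bool configure_stream(DemoStream* stream) {
  if (!stream) {
    return false;
  }
  stream->SetCopyIfDifferent(true);
  return true;
}
```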

V614 Uninitialized pointer 'str' used. cmVSSetupHelper.h 80

class SmartBSTR
{
public:
  SmartBSTR() { str = NULL; }
  SmartBSTR(const SmartBSTR& src)
  {
    if (src.str != NULL) {
      str = ::SysAllocStringByteLen((char*)str, ::SysStringByteLen(str));
    } else {
      str = ::SysAllocStringByteLen(NULL, 0);
    }
  }
  ....
private:
  BSTR str;
};

The analyzer detected usage of the uninitialized str pointer. It appeared due to an ordinary typo. In the call to SysAllocStringByteLen, src.str should have been used in both arguments instead of str.
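The same pattern can be shown without the Windows BSTR API. The following portable sketch uses a hypothetical DemoString class that owns a plain char buffer; it is an analogue of SmartBSTR, not its actual fix. The point is that the copy constructor reads only from src.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, portable analogue of SmartBSTR that owns a char buffer.
class DemoString {
public:
  explicit DemoString(const char* s) : str_(duplicate(s)) {}
  // Copy from src.str_, never from this->str_, which is not set yet.
  DemoString(const DemoString& src) : str_(duplicate(src.str_)) {}
  DemoString& operator=(const DemoString&) = delete;
  ~DemoString() { delete[] str_; }

  const char* c_str() const { return str_; }

private:
  // Allocates a fresh copy; a null input is treated as an empty string.
  static char* duplicate(const char* s) {
    if (s == nullptr) {
      s = "";
    }
    const std::size_t n = std::strlen(s) + 1;
    char* copy = new char[n];
    std::memcpy(copy, s, n);
    return copy;
  }

  char* str_;
};
```

Initializing the member in the initializer list also makes it impossible to read it before it has a value.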

V557 Array overrun is possible. The value of 'lensymbol' index could reach 28. archive_read_support_format_rar.c 2749

static int64_t
expand(struct archive_read *a, int64_t end)
{
  ....
  if ((lensymbol = read_next_symbol(a, &rar->lengthcode)) < 0)
    goto bad_data;
  if (lensymbol > (int)(sizeof(lengthbases)/sizeof(lengthbases[0])))
    goto bad_data;
  if (lensymbol > (int)(sizeof(lengthbits)/sizeof(lengthbits[0])))
    goto bad_data;
  len = lengthbases[lensymbol] + 2;
  if (lengthbits[lensymbol] > 0) {
    if (!rar_br_read_ahead(a, br, lengthbits[lensymbol]))
      goto truncated_data;
    len += rar_br_bits(br, lengthbits[lensymbol]);
    rar_br_consume(br, lengthbits[lensymbol]);
  }
  ....
}

This piece of code hides several problems at once. When the lengthbases and lengthbits arrays are accessed, the index might go out of bounds, because the developers wrote the '>' operator instead of '>=' in the checks above. As a result, each check lets through exactly one invalid value: the index equal to the number of elements. This is a classic error pattern known as an off-by-one error.

Here's the entire list of array accesses with an invalid index:

V557 Array overrun is possible. The value of 'lensymbol' index could reach 28. archive_read_support_format_rar.c 2750
V557 Array overrun is possible. The value of 'lensymbol' index could reach 28. archive_read_support_format_rar.c 2751
V557 Array overrun is possible. The value of 'lensymbol' index could reach 28. archive_read_support_format_rar.c 2753
V557 Array overrun is possible. The value of 'lensymbol' index could reach 28. archive_read_support_format_rar.c 2754
V557 Array overrun is possible. The value of 'offssymbol' index could reach 60. archive_read_support_format_rar.c 2797
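A corrected bounds check can be sketched as follows. The table demo_table and the function index_in_range are hypothetical; the size 28 is inferred from the warning text above rather than taken from the libarchive sources.

```cpp
#include <cstddef>

// Hypothetical 28-entry table standing in for lengthbases/lengthbits.
static const unsigned char demo_table[28] = {};

// Valid indices are 0..27, so an index equal to the element count
// must be rejected as well: compare with '>=' rather than '>'.
inline bool index_in_range(int symbol) {
  const int count =
      static_cast<int>(sizeof(demo_table) / sizeof(demo_table[0]));
  if (symbol < 0 || symbol >= count) {
    return false;
  }
  return true;
}
```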

Memory Leak

V773 The function was exited without releasing the 'testRun' pointer. A memory leak is possible. cmCTestMultiProcessHandler.cxx 193

void cmCTestMultiProcessHandler::FinishTestProcess(cmCTestRunTest* runner,
                                                   bool started)
{
  ....
  delete runner;
  if (started) {
    this->StartNextTests();
  }
}

bool cmCTestMultiProcessHandler::StartTestProcess(int test)
{
  ....
  cmCTestRunTest* testRun = new cmCTestRunTest(*this);    // <=
  ....
  if (testRun->StartTest(this->Completed, this->Total)) {
      return true;                                        // <=
    }
  }

  this->FinishTestProcess(testRun, false);                // <=
  return false;
}

The analyzer detected a memory leak. The memory pointed to by testRun isn't released if testRun->StartTest returns true. On the other code branch, this memory is released inside this->FinishTestProcess.
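One common way to rule out such leaks is to hold the object in a std::unique_ptr, so it is released on every return path. The sketch below uses a hypothetical DemoRun type and assumes that nothing else needs the object after the function returns. If the object has to outlive the call, ownership would have to be handed over explicitly instead.

```cpp
#include <memory>

// Hypothetical stand-in for cmCTestRunTest that counts live objects.
struct DemoRun {
  static int live;
  DemoRun() { ++live; }
  ~DemoRun() { --live; }
  bool Start(bool ok) const { return ok; }
};
int DemoRun::live = 0;

// The unique_ptr destroys the object on every return path.
inline bool start_process(bool ok) {
  std::unique_ptr<DemoRun> run(new DemoRun());
  if (run->Start(ok)) {
    return true;  // 'run' is destroyed here as well
  }
  return false;
}
```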

Resource Leak

V773 The function was exited without closing the file referenced by the 'fd' handle. A resource leak is possible. rhash.c 450

RHASH_API int rhash_file(....)
{
  FILE* fd;
  rhash ctx;
  int res;

  hash_id &= RHASH_ALL_HASHES;
  if (hash_id == 0) {
    errno = EINVAL;
    return -1;
  }

  if ((fd = fopen(filepath, "rb")) == NULL) return -1;

  if ((ctx = rhash_init(hash_id)) == NULL) return -1;  // <= fclose(fd); ???

  res = rhash_file_update(ctx, fd);
  fclose(fd);

  rhash_final(ctx, result);
  rhash_free(ctx);
  return res;
}
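Here the file stays open if rhash_init fails. In C, the direct fix is to call fclose(fd) before that return. In C++, the same guarantee can be sketched with a RAII wrapper, as below. FileCloser, FilePtr and process_file are hypothetical names, and the init_ok flag only simulates the result of rhash_init.

```cpp
#include <cstdio>
#include <memory>

// Closes the FILE* when the owning unique_ptr goes out of scope.
struct FileCloser {
  void operator()(std::FILE* f) const {
    if (f) {
      std::fclose(f);
    }
  }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical analogue of rhash_file: init_ok simulates rhash_init().
// Returns -1 on failure and 0 on success; the file is closed on all paths.
inline int process_file(const char* path, bool init_ok) {
  FilePtr fd(std::fopen(path, "rb"));
  if (!fd) {
    return -1;
  }
  if (!init_ok) {
    return -1;  // fd is closed automatically here
  }
  return 0;
}
```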

Strange Logic in Conditions

V590 Consider inspecting the '* s != '\0' && * s == ' '' expression. The expression is excessive or contains a misprint. archive_cmdline.c 76

static ssize_t
get_argument(struct archive_string *as, const char *p)
{
  const char *s = p;

  archive_string_empty(as);

  /* Skip beginning space characters. */
  while (*s != '\0' && *s == ' ')
    s++;
  ....
}

Comparing the *s character with the null terminator is redundant. The loop condition depends only on whether the character is a space: the terminator is not a space, so the loop stops on it anyway. This is not an error, just an unnecessary complication of the code.
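The simplified loop can be sketched as a stand-alone function. The name skip_spaces is hypothetical; libarchive keeps this logic inline in get_argument.

```cpp
// Skips leading spaces. No explicit '\0' test is needed: the terminator
// is not a space, so the loop stops there on its own.
inline const char* skip_spaces(const char* s) {
  while (*s == ' ') {
    ++s;
  }
  return s;
}
```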

V592 The expression was enclosed by parentheses twice: ((expression)). One pair of parentheses is unnecessary or misprint is present. cmCTestTestHandler.cxx 899

void cmCTestTestHandler::ComputeTestListForRerunFailed()
{
  this->ExpandTestsToRunInformationForRerunFailed();

  ListOfTests finalList;
  int cnt = 0;
  for (cmCTestTestProperties& tp : this->TestList) {
    cnt++;

    // if this test is not in our list of tests to run, then skip it.
    if ((!this->TestsToRun.empty() &&
         std::find(this->TestsToRun.begin(), this->TestsToRun.end(), cnt) ==
           this->TestsToRun.end())) {
      continue;
    }

    tp.Index = cnt;
    finalList.push_back(tp);
  }
  ....
}

The analyzer warns that the negation operator should probably be moved outside the parentheses. That does not seem to be the case here: the double parentheses are simply unnecessary. Still, a logic error in this code is quite likely.

The continue statement is executed only if the list of tests this->TestsToRun isn't empty and cnt is absent from it. It is reasonable to assume that the same action is needed when the list is empty. Most probably, the condition should be as follows:

if (this->TestsToRun.empty() ||
    std::find(this->TestsToRun.begin(), this->TestsToRun.end(), cnt) ==
      this->TestsToRun.end()) {
  continue;
}

V592 The expression was enclosed by parentheses twice: ((expression)). One pair of parentheses is unnecessary or misprint is present. cmMessageCommand.cxx 73

bool cmMessageCommand::InitialPass(std::vector<std::string> const& args,
                                   cmExecutionStatus&)
{
  ....
  } else if (*i == "DEPRECATION") {
    if (this->Makefile->IsOn("CMAKE_ERROR_DEPRECATED")) {
      fatal = true;
      type = MessageType::DEPRECATION_ERROR;
      level = cmake::LogLevel::LOG_ERROR;
    } else if ((!this->Makefile->IsSet("CMAKE_WARN_DEPRECATED") ||
                this->Makefile->IsOn("CMAKE_WARN_DEPRECATED"))) {
      type = MessageType::DEPRECATION_WARNING;
      level = cmake::LogLevel::LOG_WARNING;
    } else {
      return true;
    }
    ++i;
  }
  ....
}

It's a similar example, but this time I'm more confident that an error takes place. The function IsSet("CMAKE_WARN_DEPRECATED") checks that the value CMAKE_WARN_DEPRECATED is set globally, and the function IsOn("CMAKE_WARN_DEPRECATED") checks that the value is set in the project configuration. Most likely, the negation operator is redundant, as in both cases it is correct to set the same values of type and level.

V728 An excessive check can be simplified. The '(A && !B) || (!A && B)' expression is equivalent to the 'bool(A) != bool(B)' expression. cmCTestRunTest.cxx 151

bool cmCTestRunTest::EndTest(size_t completed, size_t total, bool started)
{
  ....
  } else if ((success && !this->TestProperties->WillFail) ||
             (!success && this->TestProperties->WillFail)) {
    this->TestResult.Status = cmCTestTestHandler::COMPLETED;
    outputStream << "   Passed  ";
  }
  ....
}

This code can be simpler. One can rewrite the conditional expression in the following way:

} else if (success != this->TestProperties->WillFail) {
    this->TestResult.Status = cmCTestTestHandler::COMPLETED;
    outputStream << "   Passed  ";
}

A few more places to simplify:

V728 An excessive check can be simplified. The '(A && B) || (!A && !B)' expression is equivalent to the 'bool(A) == bool(B)' expression. cmCTestTestHandler.cxx 702
V728 An excessive check can be simplified. The '(A && !B) || (!A && B)' expression is equivalent to the 'bool(A) != bool(B)' expression. digest_sspi.c 443
V728 An excessive check can be simplified. The '(A && !B) || (!A && B)' expression is equivalent to the 'bool(A) != bool(B)' expression. tcp.c 1295
V728 An excessive check can be simplified. The '(A && !B) || (!A && B)' expression is equivalent to the 'bool(A) != bool(B)' expression. testDynamicLoader.cxx 58
V728 An excessive check can be simplified. The '(A && !B) || (!A && B)' expression is equivalent to the 'bool(A) != bool(B)' expression. testDynamicLoader.cxx 65
V728 An excessive check can be simplified. The '(A && !B) || (!A && B)' expression is equivalent to the 'bool(A) != bool(B)' expression. testDynamicLoader.cxx 72
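The equivalence behind V728 is easy to verify exhaustively, since two boolean flags have only four combinations. The function names below are hypothetical and serve only as a sketch of that check.

```cpp
// Long form flagged by V728.
constexpr bool long_form(bool a, bool b) { return (a && !b) || (!a && b); }

// Short form: true exactly when the two flags differ.
constexpr bool short_form(bool a, bool b) { return a != b; }

// Exhaustive check over all four input combinations.
constexpr bool forms_agree() {
  return long_form(false, false) == short_form(false, false) &&
         long_form(false, true) == short_form(false, true) &&
         long_form(true, false) == short_form(true, false) &&
         long_form(true, true) == short_form(true, true);
}

static_assert(forms_agree(), "both forms must agree for every input");
```

When the operands are not of type bool (for example, integers used as flags), each side has to be converted first, as the warning text suggests: bool(A) != bool(B).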

Various Warnings

V523 The 'then' statement is equivalent to the subsequent code fragment. archive_read_support_format_ar.c 415

static int
_ar_read_header(struct archive_read *a, struct archive_entry *entry,
  struct ar *ar, const char *h, size_t *unconsumed)
{
  ....
  /*
   * "__.SYMDEF" is a BSD archive symbol table.
   */
  if (strcmp(filename, "__.SYMDEF") == 0) {
    archive_entry_copy_pathname(entry, filename);
    /* Parse the time, owner, mode, size fields. */
    return (ar_parse_common_header(ar, entry, h));
  }

  /*
   * Otherwise, this is a standard entry.  The filename
   * has already been trimmed as much as possible, based
   * on our current knowledge of the format.
   */
  archive_entry_copy_pathname(entry, filename);
  return (ar_parse_common_header(ar, entry, h));
}

The body of the last condition is identical to the last two lines of the function. Either the condition can be removed to simplify the code, or there is an error here that should be fixed.

V535 The variable 'i' is being used for this loop and for the outer loop. Check lines: 2220, 2241. multi.c 2241

static CURLMcode singlesocket(struct Curl_multi *multi,
                              struct Curl_easy *data)
{
  ....
  for(i = 0; (i< MAX_SOCKSPEREASYHANDLE) &&                           // <=
        (curraction & (GETSOCK_READSOCK(i) | GETSOCK_WRITESOCK(i)));
      i++) {
    unsigned int action = CURL_POLL_NONE;
    unsigned int prevaction = 0;
    unsigned int comboaction;
    bool sincebefore = FALSE;

    s = socks[i];

    /* get it from the hash */
    entry = sh_getentry(&multi->sockhash, s);

    if(curraction & GETSOCK_READSOCK(i))
      action |= CURL_POLL_IN;
    if(curraction & GETSOCK_WRITESOCK(i))
      action |= CURL_POLL_OUT;

    actions[i] = action;
    if(entry) {
      /* check if new for this transfer */
      for(i = 0; i< data->numsocks; i++) {                            // <=
        if(s == data->sockets[i]) {
          prevaction = data->actions[i];
          sincebefore = TRUE;
          break;
        }
      }
    }
  ....
}

The i variable is used as the loop counter in both the outer and the inner loop. The inner loop resets the counter to zero, so after it finishes, the outer loop continues from whatever value the inner loop left in i. It might not be a bug here, but the code is suspicious.
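The usual way to remove the ambiguity is to give the inner loop its own counter. The sketch below is not the curl code; count_outer_iterations is a hypothetical function that only shows the outer counter staying intact while an inner search runs.

```cpp
#include <cstddef>
#include <vector>

// Counts how many outer iterations run when the inner search uses its
// own counter 'j' and leaves the outer counter 'i' untouched.
inline std::size_t count_outer_iterations(const std::vector<int>& outer,
                                          const std::vector<int>& inner) {
  std::size_t visited = 0;
  for (std::size_t i = 0; i < outer.size(); ++i) {
    ++visited;
    for (std::size_t j = 0; j < inner.size(); ++j) {
      if (inner[j] == outer[i]) {
        break;  // found; 'i' still refers to the current outer element
      }
    }
  }
  return visited;
}
```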

V519 The 'tagString' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 84, 86. cmCPackLog.cxx 86

void cmCPackLog::Log(int tag, const char* file, int line, const char* msg,
                     size_t length)
{
  ....
  if (tag & LOG_OUTPUT) {
    output = true;
    display = true;
    if (needTagString) {
      if (!tagString.empty()) {
        tagString += ",";
      }
      tagString = "VERBOSE";
    }
  }
  if (tag & LOG_WARNING) {
    warning = true;
    display = true;
    if (needTagString) {
      if (!tagString.empty()) {
        tagString += ",";
      }
      tagString = "WARNING";
    }
  }
  ....
}

The tagString variable is overwritten with a new value in every branch, so the comma appended just before is lost along with any previously collected tags. It's hard to say what the intent was. Perhaps the '=' and '+=' operators were mixed up.

The entire list of such places:

V519 The 'tagString' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 94, 96. cmCPackLog.cxx 96
V519 The 'tagString' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 104, 106. cmCPackLog.cxx 106
V519 The 'tagString' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 114, 116. cmCPackLog.cxx 116
V519 The 'tagString' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 125, 127. cmCPackLog.cxx 127
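If the intent was to collect a comma-separated list of tags, the fix is to append with '+='. Here is a sketch under that assumption. The DemoTag values, append_tag and build_tags are hypothetical; the real flag constants live in cmCPackLog.

```cpp
#include <string>

// Hypothetical flag values; the real constants live in cmCPackLog.
enum DemoTag { DEMO_OUTPUT = 0x1, DEMO_WARNING = 0x2 };

// Appends a tag, inserting a comma separator when needed.
inline void append_tag(std::string& tags, const char* name) {
  if (!tags.empty()) {
    tags += ",";
  }
  tags += name;  // '+=' keeps the previously collected tags
}

// Builds the tag string for a bit mask of DemoTag values.
inline std::string build_tags(int tag) {
  std::string tags;
  if (tag & DEMO_OUTPUT) {
    append_tag(tags, "VERBOSE");
  }
  if (tag & DEMO_WARNING) {
    append_tag(tags, "WARNING");
  }
  return tags;
}
```

Moving the separator logic into one helper also removes the copy-pasted blocks in which the same slip was repeated.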

V519 The 'aes->aes_set' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 4052, 4054. archive_string.c 4054

int
archive_mstring_copy_utf8(struct archive_mstring *aes, const char *utf8)
{
  if (utf8 == NULL) {
    aes->aes_set = 0;            // <=
  }
  aes->aes_set = AES_SET_UTF8;   // <=
  ....
  return (int)strlen(utf8);
}

Unconditionally setting the AES_SET_UTF8 value right after the check looks suspicious: the assignment inside the if has no effect. I think such code will confuse any developer who later has to refine this fragment.

This code was copied to another place:

V519 The 'aes->aes_set' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 4066, 4068. archive_string.c 4068
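One plausible intent is an early return for a null input, which would also avoid calling strlen on a null pointer later. The sketch below shows that reading only. It is an assumption about the intended behavior, and DemoMString, DEMO_SET_UTF8 and copy_utf8 are hypothetical names, not libarchive code.

```cpp
#include <cstring>

// Hypothetical stand-ins for archive_mstring and AES_SET_UTF8.
struct DemoMString {
  int set;
};
const int DEMO_SET_UTF8 = 2;

// Assumed intent: a null input clears the flag and returns at once,
// so the flag is not overwritten and strlen never sees a null pointer.
inline int copy_utf8(DemoMString* s, const char* utf8) {
  if (utf8 == nullptr) {
    s->set = 0;
    return 0;
  }
  s->set = DEMO_SET_UTF8;
  return static_cast<int>(std::strlen(utf8));
}
```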

How to Find Bugs in a Project on CMake

In this section, I'll briefly show how to check CMake-based projects with PVS-Studio. It's as easy as one-two-three.

Windows/Visual Studio

For Visual Studio, you can generate a project file using CMake GUI or the following command:

cmake -G "Visual Studio 15 2017 Win64" ..

Next, you can open the .sln file and check the project using the plugin for Visual Studio.

Linux/macOS

The file compile_commands.json is used for checks on these systems. By the way, it can be generated in different build systems. This is how you do it in CMake:

cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=On ..

The last thing to do is run the analyzer in the directory with the .json file:

pvs-studio-analyzer analyze -l /path/to/PVS-Studio.lic \
  -o /path/to/project.log -e /path/to/exclude-path -j<N>

We have also developed a module for CMake projects, which some people prefer to use. The CMake module and examples of its usage can be found in our repository on GitHub: pvs-studio-cmake-examples.

Conclusion

A huge audience of users is a great way to test a project, but many issues could be prevented before release by using static code analysis tools such as PVS-Studio.

If you liked the analyzer results but your project isn't written in C or C++, I'd like to remind you that the analyzer also supports projects in C# and Java. You can test the analyzer on your project by going to this page.
