Static code analysis of the PMDK library collection by Intel and errors that are not actual errors / Habr

We were asked to check a collection of open source PMDK libraries for developing and debugging applications with NVRAM support by PVS-Studio. Well, why not? Moreover, this is a small project in C and C++ with a total code base size of about 170 KLOC without comments. Which means, the results review won't take much energy and time. Let's go.

The PVS-Studio 7.08 tool will be used to analyze the source code. Of course, readers of our blog have long been familiar with our tool, so I won't focus on it. For those who have visited us for the first time, I suggest you refer to the article "How to quickly check out interesting warnings given by the PVS-Studio analyzer for C and C++ code?" and try the free trial version of the analyzer.

This time I will take a look inside the PMDK project and tell you about the errors and shortcomings that I've noticed. My inner feeling was telling me there weren't many of them, which indicates a high quality of the project code. As for some peculiar things, I found several fragments of incorrect code, which nevertheless was working correctly :). What I mean will become clearer from the rest of the story.

So PMDK is a collection of open source libraries and tools designed to simplify the development, debugging, and management of applications that support NVRAM. Check out more details here: PMDK Introduction. The source code is available here: pmdk.

Let's see what errors and shortcomings I can find in it. I must say straight away that I wasn't always attentive when analyzing the report and could have missed a lot. Therefore, I urge the authors of the project not to be guided by this article when correcting defects, but to double-check the code themselves. As for me, to write the article, it will be enough to cite what I noted while viewing the list of warnings :).

Incorrect code that works

Size of allocated memory

Programmers often spend time debugging code when the program doesn't behave as it should. However, sometimes there are cases when the program works correctly, but the code contains an error. The programmer just got lucky, and the error doesn't reveal itself. In the PMDK project, I stumbled upon several such interesting cases, so I decided to gather them together in a separate section.

int main(int argc, char *argv[])
{
  ....
  struct pool *pop = malloc(sizeof(pop));
  ....
}

PVS-Studio warning: V568 It's odd that 'sizeof()' operator evaluates the size of a pointer to a class, but not the size of the 'pop' class object. util_ctl.c 717

A classic typo due to which the wrong amount of memory is allocated. The sizeof operator will return the size of the pointer to the structure instead of the size of this structure. The correct version is:

struct pool *pop = malloc(sizeof(pool));

struct pool *pop = malloc(sizeof(*pop));

However, this incorrectly written code works fine. The fact is that the pool structure contains exactly one pointer:

struct pool {
  struct ctl *ctl;
};

It turns out that the structure takes exactly as much space as the pointer. So that's all right.

String length

Let's move on to the next case where an error was made again using the sizeof operator.

typedef void *(*pmem2_memcpy_fn)(void *pmemdest, const void *src, size_t len,
    unsigned flags);

static const char *initial_state = "No code.";

static int
test_rwx_prot_map_priv_do_execute(const struct test_case *tc,
  int argc, char *argv[])
{
  ....
  char *addr_map = pmem2_map_get_address(map);
  map->memcpy_fn(addr_map, initial_state, sizeof(initial_state), 0);
  ....
}

PVS-Studio warning: V579 [CWE-687] The memcpy_fn function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. pmem2_map_prot.c 513

To copy a string, a pointer to a special copy function is used. Note the call to this function, or rather its third argument.

The programmer assumes that the sizeof operator will calculate the size of the string literal. But, in fact, it is the size of the pointer that is calculated again.

The lucky thing is that the string consists of 8 characters, and its size matches the size of the pointer if the 64-bit application is being built. As a result, all 8 characters of the string "No code." will be copied successfully.

In fact, the situation is even more complicated and intriguing. The interpretation of this error depends on whether the author wanted to copy the terminal null or not. Let's consider two scenarios.

Scenario 1. Terminal null had to be copied. This way, I'm wrong and this is not just a harmless bug that doesn't manifest itself. Only 8 bytes were copied, not 9 bytes. There is no terminal null, and the consequences can't be predicted. In this case, one can correct the code by changing the definition of the initial_state constant string as follows:

static const char initial_state [] = "No code.";

Now the value of sizeof(initial_state) is 9.

Scenario 2. Terminal null is not required at all. For example, you can see this line of code below:

UT_ASSERTeq(memcmp(addr_map, initial_state, strlen(initial_state)), 0);

As you can see, the strlen function returns 8 and terminal null is not involved in the comparison. Then it's really good luck and all is well.

Bitwise shift

The following example is related to the bitwise shift operation.

static int
clo_parse_single_uint(struct benchmark_clo *clo, const char *arg, void *ptr)
{
  ....
  uint64_t tmax = ~0 >> (64 - 8 * clo->type_uint.size);
  ....
}

PVS-Studio warning: V610 [CWE-758] Unspecified behavior. Check the shift operator '>>'. The left operand '~0' is negative. clo.cpp 205

The result of shifting the negative value to the right depends on the compiler implementation. Therefore, although this code may work correctly and expectedly under all currently existing application compilation modes, it is still a piece of luck.

Operation precedence

And let's look at the last case related to the operation precedence.

#define BTT_CREATE_DEF_SIZE  (20 * 1UL << 20) /* 20 MB */

PVS-Studio warning: V634 [CWE-783] The priority of the '*' operation is higher than that of the '<<' operation. It's possible that parentheses should be used in the expression. bttcreate.c 204

To get a constant equal to 20 MB, the programmer decided to follow these steps:

Shifted 1 by 20 bits to get the value 1048576, i.e. 1 MB.
Multiplied 1 MB by 20.

In other words, the programmer thinks that the calculations occur like this: (20 * (1UL << 20)).

But in fact, the priority of the multiplication operator is higher than the priority of the shift operator and the expression is calculated like this: ((20 * 1UL) << 20).

Agree it is unlikely that the programmer wanted the expression to be calculated in such a sequence. There is no point in multiplying 20 by 1. So this is the case where the code doesn't work the way the programmer intended.

But this error won't manifest itself in any way. It doesn't matter how to write it:

(20 * 1UL << 20)
(20 * (1UL << 20))
((20 * 1UL) << 20)

The result is always the same! The desired value 20971520 is always obtained and the program works perfectly correctly.

Other errors

Parentheses in the wrong place

#define STATUS_INFO_LENGTH_MISMATCH 0xc0000004

static void
enum_handles(int op)
{
  ....
  NTSTATUS status;
  while ((status = NtQuerySystemInformation(
      SystemExtendedHandleInformation,
      hndl_info, hi_size, &req_size)
        == STATUS_INFO_LENGTH_MISMATCH)) {
    hi_size = req_size + 4096;
    hndl_info = (PSYSTEM_HANDLE_INFORMATION_EX)REALLOC(hndl_info,
        hi_size);
  }
  UT_ASSERT(status >= 0);
  ....
}

PVS-Studio warning: V593 [CWE-783] Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as follows: 'A = (B == C)'. ut.c 641

Take a careful look here:

while ((status = NtQuerySystemInformation(....) == STATUS_INFO_LENGTH_MISMATCH))

The programmer wanted to store the value returned from the NtQuerySystemInformation function in the status variable and then compare it with a constant.

The programmer probably knew that the priority of the comparison operator (==) is higher than that of the assignment operator (=), and therefore parentheses should be used. But probably made a mistake and put them in the wrong place. As a result, parentheses don't help in any way. Correct code:

while ((status = NtQuerySystemInformation(....)) == STATUS_INFO_LENGTH_MISMATCH)

Because of this error, the UT_ASSERT macro will never work. After all, the status variable always contains the result of comparison, i.e. false (0) or true (1). So the condition ([0..1] >= 0) is always true.

Potential memory leak

static enum pocli_ret
pocli_args_obj_root(struct pocli_ctx *ctx, char *in, PMEMoid **oidp)
{
  char *input = strdup(in);
  if (!input)
    return POCLI_ERR_MALLOC;

  if (!oidp)
    return POCLI_ERR_PARS;
  ....
}

PVS-Studio warning: V773 [CWE-401] The function was exited without releasing the 'input' pointer. A memory leak is possible. pmemobjcli.c 238

If oidp turns out to be a null pointer, the copy of the string created by calling the strdup function will be lost. It is best to postpone the check until memory is allocated:

static enum pocli_ret
pocli_args_obj_root(struct pocli_ctx *ctx, char *in, PMEMoid **oidp)
{
  if (!oidp)
    return POCLI_ERR_PARS;

  char *input = strdup(in);
  if (!input)
    return POCLI_ERR_MALLOC;
  ....
}

Or one can explicitly free up memory:

static enum pocli_ret
pocli_args_obj_root(struct pocli_ctx *ctx, char *in, PMEMoid **oidp)
{
  char *input = strdup(in);
  if (!input)
    return POCLI_ERR_MALLOC;

  if (!oidp)
  {
    free(input);
    return POCLI_ERR_PARS;
  }
  ....
}

Potential overflow

typedef long long os_off_t;

void
do_memcpy(...., int dest_off, ....., size_t mapped_len, .....)
{
  ....
  LSEEK(fd, (os_off_t)(dest_off + (int)(mapped_len / 2)), SEEK_SET);
  ....
}

PVS-Studio warning: V1028 [CWE-190] Possible overflow. Consider casting operands, not the result. memcpy_common.c 62

Explicit casting the addition result to the os_off_t type doesn't make sense. First, this doesn't protect against the potential overflow that can occur when two int values are added together. Second, the result of addition would have been perfectly extended to the os_off_t type implicitly. Explicit type casting is simply redundant.

I think it would be more correct to write this way:

LSEEK(fd, dest_off + (os_off_t)(mapped_len) / 2, SEEK_SET);

Here an unsigned value of the size_t type is converted to a signed value (to avoid a warning from the compiler). At the same time, overflow won't occur when adding.

Incorrect protection against overflow

static DWORD
get_rel_wait(const struct timespec *abstime)
{
  struct __timeb64 t;
  _ftime64_s(&t);
  time_t now_ms = t.time * 1000 + t.millitm;
  time_t ms = (time_t)(abstime->tv_sec * 1000 +
    abstime->tv_nsec / 1000000);

  DWORD rel_wait = (DWORD)(ms - now_ms);

  return rel_wait < 0 ? 0 : rel_wait;
}

PVS-Studio warning: V547 [CWE-570] Expression 'rel_wait < 0' is always false. Unsigned type value is never < 0. os_thread_windows.c 359

It's not very clear to me what is this case which the check should protect us from. Anyway, the check doesn't work. The rel_wait variable is of the DWORD unsigned type. This means that rel_wait < 0 doesn't make sense, since the result is always true.

Missing check that memory was successfully allocated

Checking that memory is allocated is performed using assert macros, which do nothing if the Release version of the application is compiled. So we can say that there is no handling of the situation when malloc calls return NULL. Example:

static void
remove_extra_node(TOID(struct tree_map_node) *node)
{
  ....
  unsigned char *new_key = (unsigned char *)malloc(new_key_size);
  assert(new_key != NULL);
  memcpy(new_key, D_RO(tmp)->key, D_RO(tmp)->key_size);
  ....
}

PVS-Studio warning: V575 [CWE-628] The potential null pointer is passed into 'memcpy' function. Inspect the first argument. Check lines: 340, 338. rtree_map.c 340

There is even no assert in other places:

static void
calc_pi_mt(void)
{
  ....
  HANDLE *workers = (HANDLE *) malloc(sizeof(HANDLE) * pending);
  for (i = 0; i < pending; ++i) {
    workers[i] = CreateThread(NULL, 0, calc_pi,
      &tasks[i], 0, NULL);
    if (workers[i] == NULL)
      break;
  }
  ....
}

PVS-Studio warning: V522 [CWE-690] There might be dereferencing of a potential null pointer 'workers'. Check lines: 126, 124. pi.c 126

I counted at least 37 of such code fragments. So I don't see the point in listing all of them in the article.

At a first glance, the lack of checks can be considered self-indulgence and smelly code. I don't go along with this point of view. Programmers underestimate the danger of missing such checks. A null pointer won't necessarily immediately manifest itself as a crash when dereferencing. The consequences can be more bizarre and dangerous, especially in multithreaded programs. To understand more about what is happening and why checks are needed, I strongly recommend that everyone read the article "Why it is important to check what the malloc function returned".

Code smell

Double call of CloseHandle

static void
prepare_map(struct pmem2_map **map_ptr,
  struct pmem2_config *cfg, struct pmem2_source *src)
{
  ....
  HANDLE mh = CreateFileMapping(....);
  ....
  UT_ASSERTne(CloseHandle(mh), 0);
  ....
}

PVS-Studio warning: V586 [CWE-675] The 'CloseHandle' function is called twice for deallocation of the same resource. pmem2_map.c 76

Looking at this code and the PVS-Studio warning, it is clear that nothing is clear. Where is double call of CloseHandle possible here? To find the answer, let's look at the implementation of the UT_ASSERTne macro.

#define UT_ASSERTne(lhs, rhs)\
  do {\
    /* See comment in UT_ASSERT. */\
    if (__builtin_constant_p(lhs) && __builtin_constant_p(rhs))\
      UT_ASSERT_COMPILE_ERROR_ON((lhs) != (rhs));\
    UT_ASSERTne_rt(lhs, rhs);\
  } while (0)

It didn't get much clearer. What is UT_ASSERT_COMPILE_ERROR_ON? What is UT_ASSERTne_rt?

I'm not going to clutter the article with description of each macro and torture a reader by forcing to nest one macro into another in their head. Let's look at the final version of the expanded code from the preprocessed file.

do {
  if (0 && 0) (void)((CloseHandle(mh)) != (0));
  ((void)(((CloseHandle(mh)) != (0)) ||
    (ut_fatal(".....", 76, __FUNCTION__, "......: %s (0x%llx) != %s (0x%llx)",
              "CloseHandle(mh)", (unsigned long long)(CloseHandle(mh)), "0",
              (unsigned long long)(0)), 0))); } while (0);

Let's delete the always false condition 0 && 0) and every part that's irrelevant. Here's what we get:

((void)(((CloseHandle(mh)) != (0)) ||
  (ut_fatal(...., "assertion failure: %s (0x%llx) != %s (0x%llx)",
            ....., (unsigned long long)(CloseHandle(mh)), .... ), 0)));

The handle is closed. If an error occurs, a debugging message is generated and CloseHandle is called for the same incorrect handle to get the error code again.

There seems to be no mistake. Once the handle is invalid, it's okay that the CloseHandle function is called twice for it. However, this code has a smell, indeed. It would be more ideologically correct to call the function only once and save the status that it returned, so that if necessary, it can display its value in the message.

The mismatch between the interface of the implementation (constness dropping)

static int
status_push(PMEMpoolcheck *ppc, struct check_status *st, uint32_t question)
{
  ....
  } else {
    status_msg_info_and_question(st->msg);            // <=
    st->question = question;
    ppc->result = CHECK_RESULT_ASK_QUESTIONS;
    st->answer = PMEMPOOL_CHECK_ANSWER_EMPTY;
    PMDK_TAILQ_INSERT_TAIL(&ppc->data->questions, st, next);
  }
  ....
}

The analyzer issues the message: V530 [CWE-252] The return value of function 'status_msg_info_and_question' is required to be utilized. check_util.c 293

The reason is that the status_msg_info_and_question function, from the analyzer's point of view, doesn't change the state of objects external to it, including the passed constant string. In other words, the function just counts something and returns the result. And if so, it is strange not to use the result that this function returns. Although the analyzer is wrong this time, it points to the code smell. Let's see how the called status_msg_info_and_question function works.

static inline int
status_msg_info_and_question(const char *msg)
{
  char *sep = strchr(msg, MSG_SEPARATOR);
  if (sep) {
    *sep = ' ';
    return 0;
  }
  return -1;
}

When calling the strchr function, constness is implicitly cast away. The fact is that in C it is declared as follows:

char * strchr ( const char *, int );

Not the best solution. But the C language is the way it is :).

The analyzer got confused and didn't get that the passed string was actually being changed. If this is the case, then the return value is not the most important one and you don't need to use it.

However, even though the analyzer got confused, it points to a code smell. What confuses the analyzer can also confuse the person who maintains the code. It would be better to declare the function more honestly by removing const:

static inline int
status_msg_info_and_question(char *msg)
{
  char *sep = strchr(msg, MSG_SEPARATOR);
  if (sep) {
    *sep = ' ';
    return 0;
  }
  return -1;
}

This way the intent is immediately clear, and the analyzer will be silent.

Overcomplicated code

static struct memory_block
heap_coalesce(struct palloc_heap *heap,
  const struct memory_block *blocks[], int n)
{
  struct memory_block ret = MEMORY_BLOCK_NONE;

  const struct memory_block *b = NULL;
  ret.size_idx = 0;
  for (int i = 0; i < n; ++i) {
    if (blocks[i] == NULL)
      continue;
    b = b ? b : blocks[i];
    ret.size_idx += blocks[i] ? blocks[i]->size_idx : 0;
  }
  ....
}

PVS-Studio warning: V547 [CWE-571] Expression 'blocks[i]' is always true. heap.c 1054

If blocks[i] == NULL, the continue statement executes and the loop starts the next iteration. Therefore, rechecking the blocks[i]] element doesn't make sense and the ternary operator is unnecessary. The code can be simplified:

....
for (int i = 0; i < n; ++i) {
  if (blocks[i] == NULL)
    continue;
  b = b ? b : blocks[i];
  ret.size_idx += blocks[i]->size_idx;
}
....

Suspicious use of a null pointer

void win_mmap_fini(void)
{
  ....
  if (mt->BaseAddress != NULL)
    UnmapViewOfFile(mt->BaseAddress);
  size_t release_size =
    (char *)mt->EndAddress - (char *)mt->BaseAddress;
  void *release_addr = (char *)mt->BaseAddress + mt->FileLen;
  mmap_unreserve(release_addr, release_size - mt->FileLen);
  ....
}

PVS-Studio warning: V1004 [CWE-119] The '(char *) mt->BaseAddress' pointer was used unsafely after it was verified against nullptr. Check lines: 226, 235. win_mmap.c 235

The mt->BaseAddress pointer can be null, as shown by the check:

if (mt->BaseAddress != NULL)

However, this pointer is already used in arithmetic operations below without checking. For example, here:

size_t release_size =
  (char *)mt->EndAddress - (char *)mt->BaseAddress;

Some large integer value will be obtained, which is actually equal to the value of the mt->EndAddress pointer. This may not be an error, but it looks very suspicious, and I think the code should be rechecked. The code smells as it is incomprehensible and it clearly lacks explanatory comments.

Short names of global variables

I believe that the code smells if it contains global variables with short names. It is easy to mistype and accidentally use a global variable in some function instead of a local one. Example:

static struct critnib *c;

PVS-Studio warnings for such variables:

V707 Giving short names to global variables is considered to be bad practice. It is suggested to rename 'ri' variable. map.c 131
V707 Giving short names to global variables is considered to be bad practice. It is suggested to rename 'c' variable. obj_critnib_mt.c 56
V707 Giving short names to global variables is considered to be bad practice. It is suggested to rename 'Id' variable. obj_list.h 68
V707 Giving short names to global variables is considered to be bad practice. It is suggested to rename 'Id' variable. obj_list.c 34

Stranger things

As for me, the do_memmove function contained the weirdest code. The analyzer issued two warnings that indicate either very serious errors, or the fact that I simply don't understand what was meant. Since the code is very peculiar, I decided to review the warnings issued in a separate section of the article. So, the first warning is issued here.

void
do_memmove(char *dst, char *src, const char *file_name,
    size_t dest_off, size_t src_off, size_t bytes,
    memmove_fn fn, unsigned flags, persist_fn persist)
{
  ....
  /* do the same using regular memmove and verify that buffers match */
  memmove(dstshadow + dest_off, dstshadow + dest_off, bytes / 2);
  verify_contents(file_name, 0, dstshadow, dst, bytes);
  verify_contents(file_name, 1, srcshadow, src, bytes);
  ....
}

PVS-Studio warning: V549 [CWE-688] The first argument of 'memmove' function is equal to the second argument. memmove_common.c 71

Note that the first and second arguments of the function are the same. So the function doesn't actually do anything. What options come to mind:

The author wanted to "touch" the memory block. But will this happen in reality? Will the optimizing compiler remove the code that copies a block of memory to itself?
This is some kind of a unit test for the memmove function.
The code contains a typo.

And here is an equally strange fragment in the same function:

void
do_memmove(char *dst, char *src, const char *file_name,
    size_t dest_off, size_t src_off, size_t bytes,
    memmove_fn fn, unsigned flags, persist_fn persist)
{
  ....
  /* do the same using regular memmove and verify that buffers match */
  memmove(dstshadow + dest_off, srcshadow + src_off, 0);
  verify_contents(file_name, 2, dstshadow, dst, bytes);
  verify_contents(file_name, 3, srcshadow, src, bytes);
  ....
}

PVS-Studio warning: V575 [CWE-628] The 'memmove' function processes '0' elements. Inspect the third argument. memmove_common.c 82

The function transfers 0 bytes. What's that – an error or just an extra check? A unit test? A typo?

For me, this code is incomprehensible and strange.

Why use code analyzers?

It may seem that since few errors are found, the introducing an analyzer in the code development process is not justified. But the point of using static analysis tools is not to perform one-time checks, but to regularly detect errors at the code writing stage. Otherwise, these errors are detected in more expensive and slower ways (debugging, testing, user feedback, and so on). This idea is described in more detail in the article "Errors that static code analysis does not find because it is not used", which I recommend getting acquainted with. And feel free to visit our website to download and try PVS-Studio to scan your projects.

Thanks for your attention!