Andrey2008 Jan 20 2021 at 12:49

Espressif IoT Development Framework: 71 Shots in the Foot

28 min

1.1K

PVS-Studio corporate blogInformation Security*C++*C*IOT

0790_Espressif_IoT_Development_Framework/image1.png
One of our readers recommended paying heed to the Espressif IoT Development Framework. He found an error in the project code and asked if the PVS-Studio static analyzer could find it. The analyzer can't detect this specific error so far, but it managed to spot many others. Based on this story and the errors found, we decided to write a classic article about checking an open source project. Enjoy exploring what IoT devices can do to shoot you in the foot.

Software and hardware systems

The father of the C++ language, Bjarne Stroustrup, once said:

"C" makes it very easy to shoot yourself in the foot. In "C++" it is harder to do this, but when you do it, it tears off the whole leg.

In our case, the statement begins to take on a slightly different meaning. Having started with a simple scenario of a programmer making a mistake that leads to incorrect program operation, we now face cases where such a misstep can cause real physical harm.

Projects such as the Espressif IoT Development Framework serve to implement software and hardware systems that interact with humans and control objects in the real world. All this imposes additional requirements for the quality and reliability of the program code. It is from here that such standards as MISRA or AUTOSAR take their foundations. Anyway, that's another story we won't even get into.

Back to the Espressif IoT Development Framework (source code on GitHub: esp-idf). Check out its brief description:

ESP-IDF is Espressif's official IoT Development Framework for the ESP32 and ESP32-S series of SoCs. It provides a self-sufficient SDK for any generic application development on these platforms, using programming languages such as C and C++. ESP-IDF currently powers millions of devices in the field and enables building a variety of network-connected products, ranging from simple light bulbs and toys to big appliances and industrial devices.

I think readers will be interested to see if the developers of this project pay enough attention to its quality and reliability. Unfortunately, there is no such certainty. After reading the article and descriptions of the defects noticed you will share my concerns. So, grab some tea/coffee, a nice longread of text and code is waiting for you.

Back story

I also would like to tell you how we came up with the idea of this article. Yuri Popov (Hardcore IoT fullstack dev & CTO) follows our publications with great interest. Once he wrote to me. He has just manually found an error in the Espressif IoT Development Framework and asked if PVS-Studio could detect that defect. The error relates to a typo in the code, and PVS-Studio has always been famous for being good at detecting such errors.

The incorrect code was in the mdns.c file:

mdns_txt_linked_item_t * txt = service->txt;
while (txt) {
  data_len += 2 + strlen(service->txt->key) + strlen(service->txt->value);
  txt = txt->next;
}

The list gets traversed. Various objects in the list refer to certain strings. Lengths of these strings have to sum up in a specific way. It all would be correct if it were not for the strings' length of only the first object which is summed up.

Correct code:

data_len += 2 + strlen(txt->key) + strlen(txt->value);

To our mutual disappointment of our reader Yura and me, PVS-Studio failed to notice the error. The tool just doesn't know about this error pattern. Actually, our team did not know about this pattern. PVS-Studio, like any other analyzer, can only notice what it has been programmed for :).

Well, it's a pity, but not a big deal. This is one of the sources where we can get ideas for the PVS-Studio development. Users and clients send various error patterns that they have found in the code of their projects. PVS-Studio is not aware of such errors yet. So, we are gradually creating new diagnostic rules. This will also happen with the pattern above. This example is already in the TODO list. We'll implement a new diagnostic rule for detecting similar cases in one of the upcoming analyzer versions.

As a result of all this, Yura himself wrote a small note about this error, how he was looking for it and also about PVS-Studio: "Bug in ESP-IDF: MDNS, Wireshark and what does unicorns have to do with it" [RU]. Plus, he notified the authors of the project about the found error: Spurious MDNS collision detection (IDFGH-4263).

This was not the end of story. Yura suggested that our team checked the project and wrote a note about the results. We did not refuse, as we often make such publications to promote the methodology of static code analysis and PVS-Studio tool as well :).

Honestly, our check was rather incomplete. Unfortunately, there is no "build all" example. Or we didn't figure it out. We started with getting_started\hello_world. It seems to use part of the framework, but not all of it. So, you can find other bugs by getting more framework files compiled. In other words, the fact that only 71 errors will be described in the article is our fault :).

I wasn't trying to find as many bugs as possible. So, when I skimmed through the incomplete report, I immediately realized that there was already more than enough material for the article. Therefore, I got too lazy to delve further into the project.

Fortunately, Yuri Popov, who started the ball rolling, is much more enthusiastic than I am. He told me he was able to achieve a more complete compilation of the framework and checked many more files. His article will most likely follow this one where he will consider an additional portion of errors.

Examples of where false/pointless positives come from

I'd like to warn all enthusiasts who'd like to check the Espressif IoT Development Framework, that you will need to pre-configure the analyzer. Without it, you will drown in a great number of false/useless positives. But the analyzer is not to blame.

Conditional compilation directives (#ifdef) and macros are very actively used in the project code. This coding style confuses the analyzer and generates many useless warnings of the same type. To make it clearer how and why this happens, let's look at a couple of examples.

PVS-Studio warning: V547 Expression 'ret != 0' is always true. esp_hidd.c 45

esp_err_t esp_hidd_dev_init(....)
{
  esp_err_t ret = ESP_OK;
  ....
  switch (transport) {
#if CONFIG_GATTS_ENABLE
  case ESP_HID_TRANSPORT_BLE:
    ret = esp_ble_hidd_dev_init(dev, config, callback);
    break;
#endif /* CONFIG_GATTS_ENABLE */
  default:
    ret = ESP_FAIL;
    break;
  }

  if (ret != ESP_OK) {
    free(dev);
    return ret;
  }
  ....
}

The developer selected the compilation mode, in which the macro CONFIG_GATTS_ENABLE is not defined. Therefore, for the analyzer, this code looks like this:

esp_err_t ret = ESP_OK;
....
switch (transport) {
default:
  ret = ESP_FAIL;
  break;
}
if (ret != ESP_OK) {

The analyzer seems to be right that the condition is always true. On the other hand, there is no benefit from this warning, since, as we understand, the code is completely correct and makes sense. Such situations are extremely common, which makes it difficult to view the report. This is such an unpleasant cost of active usage of conditional compilation :).

Let's have a look at another example. The code actively uses its own kind of assert macros. Unfortunately, they also confuse the analyzer. PVS-Studio warning: V547 Expression 'sntp_pcb != NULL' is always true. sntp.c 664

#define LWIP_PLATFORM_ASSERT(x) do \
  {printf("Assertion \"%s\" failed at line %d in %s\n", \
    x, __LINE__, __FILE__); fflush(NULL); abort();} while(0)

#ifndef LWIP_NOASSERT
#define LWIP_ASSERT(message, assertion) do { if (!(assertion)) { \
  LWIP_PLATFORM_ASSERT(message); }} while(0)
#else  /* LWIP_NOASSERT */
#define LWIP_ASSERT(message, assertion)
#endif /* LWIP_NOASSERT */

sntp_pcb = udp_new_ip_type(IPADDR_TYPE_ANY);
LWIP_ASSERT("Failed to allocate udp pcb for sntp client", sntp_pcb != NULL);
if (sntp_pcb != NULL) {

The LWIP_ASSERT macro expands into the code that will stop program execution if the sntp_pcb pointer is null (see the abort function call). The analyzer is well aware of this. That's why PVS-Studio warns the user that the sntp_pcb != NULL check is pointless.

On the one hand, the analyzer is right. But everything will change if the macro expands into "nothing" in a different compilation mode. In this case, the check will make sense. Yes, in the second scenario, the analyzer will not complain, but this does not change the main point. In the first case, we have an extra warning.

Still this is not that scary. One can reduce most of useless messages after diligent analyzer configuration. In a number of other places, one can improve the situation by changing the style of writing code and macros. But this goes beyond the scope of this article. Additionally, one can use the mechanism for suppressing warnings in specific places, in macros, etc. There is also a mass markup mechanism. You can read more about all this in the article "How to introduce a static code analyzer in a legacy project and not to discourage the team".

Security

Let's start with the warnings, which, in my opinion, relate to security issues. Developers of operating systems, frameworks, and other similar projects should pay special attention to finding code weaknesses that can potentially lead to vulnerabilities.

For the convenience of classifying code weaknesses, CWE (Common Weakness Enumeration) comes in handy. In PVS-Studio you can enable CWE ID display for warnings. For the warnings from this part of the article, I will additionally provide the corresponding CWE ID.

For more information, the search for potential vulnerabilities is covered in the article "PVS-Studio Static Analyzer as a Tool for Protection against Zero-Day Vulnerabilities".

Error N1; Order of arguments

PVS-Studio warning: V764 Possible incorrect order of arguments passed to 'crypto_generichash_blake2b__init_salt_personal' function: 'salt' and 'personal'. blake2b-ref.c 457

int blake2b_init_salt_personal(blake2b_state *S, const uint8_t outlen,
                               const void *personal, const void *salt);

int
blake2b_salt_personal(uint8_t *out, const void *in, const void *key,
                      const uint8_t outlen, const uint64_t inlen,
                      uint8_t keylen, const void *salt, const void *personal)
{
  ....
  if (blake2b_init_salt_personal(S, outlen, salt, personal) < 0)
    abort();
  ....
}

When calling the blake2b_init_salt_personal function the personal and salt arguments get confused. This is hardly intended on purpose and, most likely, this mistake occurred due to inattention. I'm not familiar with project code and cryptography, but my gut tells me that such confusion can have bad consequences.

According to the CWE, this error is classified as CWE-683: Function Call With Incorrect Order of Arguments.

Error N2; Potential loss of significant bits

PVS-Studio warning: V642 Saving the 'memcmp' function result inside the 'unsigned char' type variable is inappropriate. The significant bits could be lost breaking the program's logic. mbc_tcp_master.c 387

static esp_err_t mbc_tcp_master_set_request(
  char* name, mb_param_mode_t mode, mb_param_request_t* request,
  mb_parameter_descriptor_t* reg_data)
{
  ....
  // Compare the name of parameter with parameter key from table
  uint8_t comp_result = memcmp((const char*)name,
                               (const char*)reg_ptr->param_key,
                               (size_t)param_key_len);
  if (comp_result == 0) {
  ....
}

Storing the result of the memcmp function in a single-byte variable is a very bad practice. This is a flaw that could very well turn into a real vulnerability like this: CVE-2012-2122. For more information about why you can't write like this, see the V642 diagnostic documentation.

In short, some implementations of the memset function may return more than 1 or -1 values in case of mismatch of memory blocks. A function, for example, can return 1024. And the number written to a variable of type uint8_t will turn into 0.

According to the CWE, this error is classified as CWE-197: Numeric Truncation Error.

Error N3-N20; Private data remains in memory

PVS-Studio warning: V597 The compiler could delete the 'memset' function call, which is used to flush 'prk' buffer. The memset_s() function should be used to erase the private data. dpp.c 854

#ifndef os_memset
#define os_memset(s, c, n) memset(s, c, n)
#endif

static int dpp_derive_k1(const u8 *Mx, size_t Mx_len, u8 *k1,
       unsigned int hash_len)
{
  u8 salt[DPP_MAX_HASH_LEN], prk[DPP_MAX_HASH_LEN];
  const char *info = "first intermediate key";
  int res;

  /* k1 = HKDF(<>, "first intermediate key", M.x) */

  /* HKDF-Extract(<>, M.x) */
  os_memset(salt, 0, hash_len);
  if (dpp_hmac(hash_len, salt, hash_len, Mx, Mx_len, prk) < 0)
    return -1;
  wpa_hexdump_key(MSG_DEBUG, "DPP: PRK = HKDF-Extract(<>, IKM=M.x)",
      prk, hash_len);

  /* HKDF-Expand(PRK, info, L) */
  res = dpp_hkdf_expand(hash_len, prk, hash_len, info, k1, hash_len);

  os_memset(prk, 0, hash_len);             // <=
  if (res < 0)
    return -1;

  wpa_hexdump_key(MSG_DEBUG, "DPP: k1 = HKDF-Expand(PRK, info, L)",
                  k1, hash_len);
  return 0;
}

A very common mistake. The compiler has the right to remove the memset function call for optimization purposes, since after filling the buffer with zeros, it is no longer used. As a result, private data is not actually erased, but will continue to hang around somewhere in memory. For more information, see the article "Safe Clearing of Private Data".

According to the CWE, this error is classified as CWE-14: Compiler Removal of Code to Clear Buffers.

Other errors of this type:

V597 The compiler could delete the 'memset' function call, which is used to flush 'prk' buffer. The memset_s() function should be used to erase the private data. dpp.c 883
V597 The compiler could delete the 'memset' function call, which is used to flush 'prk' buffer. The memset_s() function should be used to erase the private data. dpp.c 942
V597 The compiler could delete the 'memset' function call, which is used to flush 'psk' buffer. The memset_s() function should be used to erase the private data. dpp.c 3939
V597 The compiler could delete the 'memset' function call, which is used to flush 'prk' buffer. The memset_s() function should be used to erase the private data. dpp.c 5729
V597 The compiler could delete the 'memset' function call, which is used to flush 'Nx' buffer. The memset_s() function should be used to erase the private data. dpp.c 5934
V597 The compiler could delete the 'memset' function call, which is used to flush 'val' buffer. The memset_s() function should be used to erase the private data. sae.c 155
V597 The compiler could delete the 'memset' function call, which is used to flush 'keyseed' buffer. The memset_s() function should be used to erase the private data. sae.c 834
V597 The compiler could delete the 'memset' function call, which is used to flush 'keys' buffer. The memset_s() function should be used to erase the private data. sae.c 838
V597 The compiler could delete the 'memset' function call, which is used to flush 'pkey' buffer. The memset_s() function should be used to erase the private data. des-internal.c 422
V597 The compiler could delete the 'memset' function call, which is used to flush 'ek' buffer. The memset_s() function should be used to erase the private data. des-internal.c 423
V597 The compiler could delete the 'memset' function call, which is used to flush 'finalcount' buffer. The memset_s() function should be used to erase the private data. sha1-internal.c 358
V597 The compiler could delete the 'memset' function call, which is used to flush 'A_MD5' buffer. The memset_s() function should be used to erase the private data. sha1-tlsprf.c 95
V597 The compiler could delete the 'memset' function call, which is used to flush 'P_MD5' buffer. The memset_s() function should be used to erase the private data. sha1-tlsprf.c 96
V597 The compiler could delete the 'memset' function call, which is used to flush 'A_SHA1' buffer. The memset_s() function should be used to erase the private data. sha1-tlsprf.c 97
V597 The compiler could delete the 'memset' function call, which is used to flush 'P_SHA1' buffer. The memset_s() function should be used to erase the private data. sha1-tlsprf.c 98
V597 The compiler could delete the 'memset' function call, which is used to flush 'T' buffer. The memset_s() function should be used to erase the private data. sha256-kdf.c 85
V597 The compiler could delete the 'memset' function call, which is used to flush 'hash' buffer. The memset_s() function should be used to erase the private data. sha256-prf.c 105

Error N21; Private data buffer is not deleted

PVS-Studio warning: V575 The null pointer is passed into 'free' function. Inspect the first argument. sae.c 1185

static int sae_parse_password_identifier(struct sae_data *sae,
           const u8 *pos, const u8 *end)
{
  wpa_hexdump(MSG_DEBUG, "SAE: Possible elements at the end of the frame",
        pos, end - pos);
  if (!sae_is_password_id_elem(pos, end)) {
    if (sae->tmp->pw_id) {
      wpa_printf(MSG_DEBUG,
           "SAE: No Password Identifier included, but expected one (%s)",
           sae->tmp->pw_id);
      return WLAN_STATUS_UNKNOWN_PASSWORD_IDENTIFIER;
    }
    os_free(sae->tmp->pw_id);
    sae->tmp->pw_id = NULL;
    return WLAN_STATUS_SUCCESS; /* No Password Identifier */
  }
  ....
}

If something is wrong with the password and the pw_id pointer is not null, a debug warning is displayed and the function exits. Interestingly, then there is an attempt to free the buffer using a null pointer. Moreover, NULL is written to the null pointer again. None of this makes sense. Most likely, the memory release lines are out of place. And I think the code should be like this:

if (!sae_is_password_id_elem(pos, end)) {
  if (sae->tmp->pw_id) {
    wpa_printf(MSG_DEBUG,
         "SAE: No Password Identifier included, but expected one (%s)",
         sae->tmp->pw_id);
    os_free(sae->tmp->pw_id);
    sae->tmp->pw_id = NULL;
    return WLAN_STATUS_UNKNOWN_PASSWORD_IDENTIFIER;
  }
  return WLAN_STATUS_SUCCESS; /* No Password Identifier */
}

First, it will probably fix the memory leak. Secondly, private data will no longer be stored for a long time in memory somewhere in vain.

According to the CWE, this error is formally classified as CWE-628: Function Call with Incorrectly Specified Arguments. This is how PVS-Studio classifies it. By judging by its essence and consequences, this is another weakness of the code.

Error N22, N23; An uninitialized buffer is used as a key

PVS-Studio warning: V614 Uninitialized buffer 'hex' used. Consider checking the second actual argument of the 'memcpy' function. wps_registrar.c 1657

int wps_build_cred(struct wps_data *wps, struct wpabuf *msg)
{
  ....
  } else if (wps->use_psk_key && wps->wps->psk_set) {
    char hex[65];
    wpa_printf(MSG_DEBUG,  "WPS: Use PSK format for Network Key");
    os_memcpy(wps->cred.key, hex, 32 * 2);
    wps->cred.key_len = 32 * 2;
  } else if (wps->wps->network_key) {
  ....
}

An uninitialized hex buffer is used to initialize a key. It is not clear why it is done is such a way. This may be an attempt to fill the key with a random value, but it's still a very bad option.

In any case, this code needs to be carefully checked.

According to the CWE, this error is classified as CWE-457: Use of Uninitialized Variable.

Similar error: V614 Uninitialized buffer 'hex' used. Consider checking the second actual argument of the 'memcpy' function. wps_registrar.c 1678

Typos and Copy-Paste

Error N24; Classic Copy-Paste

PVS-Studio warning: V523 The 'then' statement is equivalent to the 'else' statement. timer.c 292

esp_err_t timer_isr_register(....)
{
  ....
  if ((intr_alloc_flags & ESP_INTR_FLAG_EDGE) == 0) {
    intr_source = ETS_TG1_T0_LEVEL_INTR_SOURCE + timer_num;
  } else {
    intr_source = ETS_TG1_T0_LEVEL_INTR_SOURCE + timer_num;
  }
  ....
}

I suspect the author copied the line but forgot to change something in it. As a result, regardless of the condition, the same value is written in the intr_source variable.

Note. Well, chances are, this was intended this way. For example, if the values must really match so far (which is "todo-code"). However, in this case there must be an explanatory comment.

Error N25; Parenthesis is misplaced

PVS-Studio warning: V593 Consider reviewing the expression of the 'A = B != C' kind. The expression is calculated as following: 'A = (B != C)'. esp_tls_mbedtls.c 446

esp_err_t set_client_config(....)
{
 ....
 if ((ret = mbedtls_ssl_conf_alpn_protocols(&tls->conf, cfg->alpn_protos) != 0))
 {
   ESP_LOGE(TAG, "mbedtls_ssl_conf_alpn_protocols returned -0x%x", -ret);
   ESP_INT_EVENT_TRACKER_CAPTURE(tls->error_handle, ERR_TYPE_MBEDTLS, -ret);
   return ESP_ERR_MBEDTLS_SSL_CONF_ALPN_PROTOCOLS_FAILED;
 }
 ....
}

The priority of the comparison operator is higher than the priority of the assignment operator. Therefore, the condition is calculated as follows:

TEMP = mbedtls_ssl_conf_alpn_protocols(....) != 0;
if ((ret = TEMP))
  PRINT(...., -ret);

Basically, an erroneous situation is caught and handled in the code, but not as intended. It was supposed to print the error status that is stored in the ret variable. But the ret value will always be 0 or 1. So if something goes wrong, only one value (-1) will always be printed.

The error occurred due to the misplaced parenthesis. Correct code:

if ((ret = mbedtls_ssl_conf_alpn_protocols(&tls->conf, cfg->alpn_protos)) != 0)

Now everything will be calculated as needed:

ret = mbedtls_ssl_conf_alpn_protocols(....);
if (ret != 0)
  PRINT(...., -ret);

Now let's see another very similar case.

Error N26; MP_MEM turns into MP_YES

V593 Consider reviewing the expression of the 'A = B != C' kind. The expression is calculated as following: 'A = (B != C)'. libtommath.h 1660

Let's start with some constants. We will use them below.

#define MP_OKAY       0   /* ok result */
#define MP_MEM        -2  /* out of mem */
#define MP_VAL        -3  /* invalid input */
#define MP_YES        1   /* yes response */

Next, I should mention about the mp_init_multi function that can return MP_OKAY and MP_MEMvalues:

static int mp_init_multi(mp_int *mp, ...);

Here is the code with the error:

static int
mp_div(mp_int * a, mp_int * b, mp_int * c, mp_int * d)
{
  ....
  /* init our temps */
  if ((res = mp_init_multi(&ta, &tb, &tq, &q, NULL) != MP_OKAY)) {
     return res;
  }
  ....
}

Let's consider the check more carefully:

if ((res = mp_init_multi(....) != MP_OKAY))

Again, the parenthesis is placed incorrectly. Therefore, here's what we get at the beginning:

TEMP = (mp_init_multi(....) != MP_OKAY);

The TEMP value can only be 0 or 1. These numbers correspond to the constants MB_OKAY and MP_YES.

Further we see the assignment and the check at the same time:

if ((res = TEMP))
   return res;

You see the catch? The error status of MP_MEM (-2) suddenly turned into the status of MB_YES (1). Consequences are unpredictable, but there's nothing good about them.

Error N27; Forgot to dereference a pointer

PVS-Studio warning: V595 The 'outbuf' pointer was utilized before it was verified against nullptr. Check lines: 374, 381. protocomm.c 374

static int protocomm_version_handler(uint32_t session_id,
                                     const uint8_t *inbuf, ssize_t inlen,
                                     uint8_t **outbuf, ssize_t *outlen,
                                     void *priv_data)
{
    protocomm_t *pc = (protocomm_t *) priv_data;
    if (!pc->ver) {
        *outlen = 0;
        *outbuf = NULL;                                  // <=
        return ESP_OK;
    }

    /* Output is a non null terminated string with length specified */
    *outlen = strlen(pc->ver);
    *outbuf = malloc(*outlen);                           // <=
    if (outbuf == NULL) {                                // <=
        ESP_LOGE(TAG, "Failed to allocate memory for version response");
        return ESP_ERR_NO_MEM;
    }

    memcpy(*outbuf, pc->ver, *outlen);
    return ESP_OK;
}

At first glance, the warning might seem obscure. Let's figure it out.

If the pointer pc->ver is null, the function terminates its work ahead of time and writes a value to the address stored in the outbuf pointer:

*outbuf = NULL;

This address is accessed further as well:

*outbuf = malloc(*outlen);

The analyzer does not like the reason why this pointer is checked:

if (outbuf == NULL)

The approach is definitely incorrect — the pointer is checked after it is dereferenced. Actually, it is not the pointer that is to be checked but what is written in it. The author just made a typo and missed the dereferencing operator (*).

Correct code:

*outbuf = malloc(*outlen);
if (*outbuf == NULL) {
  ESP_LOGE(TAG, "Failed to allocate memory for version response");
  return ESP_ERR_NO_MEM;
}

Error N28; Reassignment

PVS-Studio warning: V519 The 'usRegCount' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 186, 187. mbfuncholding.c 187

eMBException
eMBFuncReadHoldingRegister( UCHAR * pucFrame, USHORT * usLen )
{
  ....
  USHORT          usRegCount;
  ....
  usRegCount = ( USHORT )( pucFrame[MB_PDU_FUNC_READ_REGCNT_OFF] << 8 );
  usRegCount = ( USHORT )( pucFrame[MB_PDU_FUNC_READ_REGCNT_OFF + 1] );
  ....
}

Copy-Paste has definitely held its hands to this code. The line was copied, but only partially changed. It is followed by this sensible code:

usRegCount = ( USHORT )( pucFrame[MB_PDU_FUNC_WRITE_MUL_REGCNT_OFF] << 8 );
usRegCount |= ( USHORT )( pucFrame[MB_PDU_FUNC_WRITE_MUL_REGCNT_OFF + 1] );

There should have probably been the = and |= operators in the first and second lines respectively in the code with the error.

Logical errors

Error N29-N31; Incorrect handling of return codes (Rare)

PVS-Studio warning: V547 Expression is always false. linenoise.c 256

static int getColumns(void) {
  ....
  /* Restore position. */
  if (cols > start) {
    char seq[32];
    snprintf(seq,32,"\x1b[%dD",cols-start);
    if (fwrite(seq, 1, strlen(seq), stdout) == -1) {
      /* Can't recover... */
    }
    flushWrite();
  }
  ....
}

This is a harmless variant of incorrect handling of the status returned by the function. The error is benign in the sense that no handling is required. One failed to write a line, so no big deal. Even though code fragment is harmless, this style of writing programs is clearly not a role model.

The point of the error itself is that the fwrite function does not return the status -1. This is practically impossible, since the fwrite function returns a value of the size_t integer type:

size_t fwrite( const void *restrict buffer, size_t size, size_t count,
               FILE *restrict stream );

And here's what this function returns:

The number of objects written successfully, which may be less than count if an error occurs.

If size or count is zero, fwrite returns zero and performs no other action.

So, the status check is incorrect.

Similar places of harmless incorrect status checks:

V547 Expression is always false. linenoise.c 481
V547 Expression is always false. linenoise.c 569

Error N32, N33; Incorrect handling of return codes (Medium)

PVS-Studio warning: V547 Expression is always false. linenoise.c 596

int linenoiseEditInsert(struct linenoiseState *l, char c) {
  ....
  if (fwrite(&c,1,1,stdout) == -1) return -1;
  ....
}

This error is more serious, although it is similar to the previous one. If the character can't be written to the file, the linenoiseEditInsert function must stop working and return the status -1. But this will not happen, as fwrite will never return the value -1. So, this is a logical error of handling the situation when it is not possible to write something to a file.

Here is a similar error: V547 Expression is always false. linenoise.c 742

Error N34; Incorrect handling of return codes (Well Done)

PVS-Studio warning: V547 Expression is always false. linenoise.c 828

static int linenoiseEdit(char *buf, size_t buflen, const char *prompt)
  ....
  while(1) {
    ....
    if (fread(seq+2, 1, 1, stdin) == -1) break;
    ....
  }
  ....
}

As in the case of fwrite, the error is that the fread function does not return the value -1 as the status.

size_t fread( void *restrict buffer, size_t size, size_t count,
              FILE *restrict stream );

Return value

Number of objects read successfully, which may be less than count if an error or end-of-file condition occurs.

If size or count is zero, fread returns zero and performs no other action.

fread does not distinguish between end-of-file and error, and callers must use feof and ferror to determine which occurred.

This code is even more dangerous. The error of reading from the file is not caught, and the program continues to work with data that is accidentally available at this moment in the data buffer. That is, the program always believes that it has successfully read another byte from the file, although this may not be the case.

Error N35; || operator instead of &&

PVS-Studio warning: V547 Expression is always true. essl_sdio.c 209

esp_err_t essl_sdio_init(void *arg, uint32_t wait_ms)
{
  ....
  // Set block sizes for functions 1 to given value (default value = 512).
  if (ctx->block_size > 0 || ctx->block_size <= 2048) {
    bs = ctx->block_size;
  } else {
    bs = 512;
  }
  ....
}

One can attribute this bug to typos. In my opinion, by its nature it's closer to logical errors. I think the reader understands that classification of errors is often quite conditional.

So, what we have here is an always true condition. As a certain variable is always either greater than 0 or less than 2048. Because of this, the size of a block will not be limited to 512.

Here is the correct version of code:

if (ctx->block_size > 0 && ctx->block_size <= 2048) {
  bs = ctx->block_size;
} else {
  bs = 512;
}

Error N35-N38; Variable does not change

PVS-Studio warning: V547 Expression 'depth <= 0' is always false. panic_handler.c 169

static void print_backtrace(const void *f, int core)
{
  XtExcFrame *frame = (XtExcFrame *) f;
  int depth = 100;                                          // <=
  //Initialize stk_frame with first frame of stack
  esp_backtrace_frame_t stk_frame =
    {.pc = frame->pc, .sp = frame->a1, .next_pc = frame->a0};
  panic_print_str("\r\nBacktrace:");
  print_backtrace_entry(esp_cpu_process_stack_pc(stk_frame.pc),
                        stk_frame.sp);

  //Check if first frame is valid
  bool corrupted =
    !(esp_stack_ptr_is_sane(stk_frame.sp) &&
      (esp_ptr_executable((void *)esp_cpu_process_stack_pc(stk_frame.pc)) ||
       /* Ignore the first corrupted PC in case of InstrFetchProhibited */
       frame->exccause == EXCCAUSE_INSTR_PROHIBITED));

  //Account for stack frame that's already printed
  uint32_t i = ((depth <= 0) ? INT32_MAX : depth) - 1;      // <=
  ....
}

The depth variable is assigned a value of 100, and until this variable is checked, its value does not change anywhere. It is very suspicious. Did someone forget to do something with it?

Similar cases:

V547 Expression 'xAlreadyYielded == ((BaseType_t) 0)' is always true. event_groups.c 260
V547 Expression 'xAlreadyYielded == ((BaseType_t) 0)' is always true. tasks.c 1475
V547 Expression 'xAlreadyYielded == ((BaseType_t) 0)' is always true. tasks.c 1520

Error N39; Uninitialized buffer

PVS-Studio warning: V614 Potentially uninitialized buffer 'k' used. Consider checking the second actual argument of the 'sae_derive_keys' function. sae.c 854

int sae_process_commit(struct sae_data *sae)
{
  u8 k[SAE_MAX_PRIME_LEN];
  if (sae->tmp == NULL ||
      (sae->tmp->ec && sae_derive_k_ecc(sae, k) < 0) ||
      (sae->tmp->dh && sae_derive_k_ffc(sae, k) < 0) ||
      sae_derive_keys(sae, k) < 0)
    return ESP_FAIL;
  return ESP_OK;
}

Logical error. Let's say the ec and dh pointers are null. In this case, the k array is not initialized, but the sae_derive_keys function will still start processing it.

Error N40; Always false condition

PVS-Studio warning: V547 Expression 'bit_len == 32' is always false. spi_flash_ll.h 371

static inline void spi_flash_ll_set_usr_address(spi_dev_t *dev, uint32_t addr,
                                                int bit_len)
{
  // The blank region should be all ones
  if (bit_len >= 32) {
    dev->addr = addr;
    dev->slv_wr_status = UINT32_MAX;
  } else {
    uint32_t padding_ones = (bit_len == 32? 0 : UINT32_MAX >> bit_len);
    dev->addr = (addr << (32 - bit_len)) | padding_ones;
  }
}

As you can easily see, the condition bit_len == 32 will always give a false result. Perhaps the above should not have been written with greater than-or-equal to (>=), but simply using greater than (>).

Error N41; Ressignment

PVS-Studio warning: V519 The '* pad_num' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 46, 48. touch_sensor_hal.c 48

void touch_hal_get_wakeup_status(touch_pad_t *pad_num)
{
  uint32_t touch_mask = 0;
  touch_ll_read_trigger_status_mask(&touch_mask);
  if (touch_mask == 0) {
    *pad_num = -1;
  }
  *pad_num = (touch_pad_t)(__builtin_ffs(touch_mask) - 1);
}

The code is clearly wrong and there may be a missing else statement. I'm not sure, but maybe the code should look like this:

void touch_hal_get_wakeup_status(touch_pad_t *pad_num)
{
  uint32_t touch_mask = 0;
  touch_ll_read_trigger_status_mask(&touch_mask);
  if (touch_mask == 0) {
    *pad_num = -1;
  } else {
    *pad_num = (touch_pad_t)(__builtin_ffs(touch_mask) - 1);
  }
}

Array index out of bounds

Error N42; Incorrect boundary check

PVS-Studio warning: V557 Array overrun is possible. The value of 'frame->exccause' index could reach 16. gdbstub_xtensa.c 132

int esp_gdbstub_get_signal(const esp_gdbstub_frame_t *frame)
{
  const char exccause_to_signal[] =
    {4, 31, 11, 11, 2, 6, 8, 0, 6, 7, 0, 0, 7, 7, 7, 7};
  if (frame->exccause > sizeof(exccause_to_signal)) {
    return 11;
  }
  return (int) exccause_to_signal[frame->exccause];
}

An index might overrun the array boundary by 1 element. For correct check, one should use the greater than-or-equal operator instead of the greater than operator:

if (frame->exccause >= sizeof(exccause_to_signal)) {

Error N43; Long error example :)

In the function below array overrun might happen in two places, so there are two relevant analyzer warnings at once:

V557 Array overrun is possible. The value of 'other_if' index could reach 3. mdns.c 2206
V557 Array overrun is possible. The '_mdns_announce_pcb' function processes value '[0..3]'. Inspect the first argument. Check lines: 1674, 2213. mdns.c 1674

Get ready, it will be a difficult case. First, let's take a look at the following named constants:

typedef enum mdns_if_internal {
    MDNS_IF_STA = 0,
    MDNS_IF_AP = 1,
    MDNS_IF_ETH = 2,
    MDNS_IF_MAX
} mdns_if_t;

Note that the value of the MDNS_IF_MAX constant is 3.

Now let's take a look at the definition of the mdns_server_s structure. Here it is important that the array interfaces consists of 3 elements.

typedef struct mdns_server_s {
    struct {
        mdns_pcb_t pcbs[MDNS_IP_PROTOCOL_MAX];
    } interfaces[MDNS_IF_MAX];
    const char * hostname;
    const char * instance;
    mdns_srv_item_t * services;
    SemaphoreHandle_t lock;
    QueueHandle_t action_queue;
    mdns_tx_packet_t * tx_queue_head;
    mdns_search_once_t * search_once;
    esp_timer_handle_t timer_handle;
} mdns_server_t;

mdns_server_t * _mdns_server = NULL;

But there's more. We'll need to look inside the _mdns_get_other_if function. Note that it can return the MDNS_IF_MAX constant. That is, it can return the value 3.

static mdns_if_t _mdns_get_other_if (mdns_if_t tcpip_if)
{
  if (tcpip_if == MDNS_IF_STA) {
    return MDNS_IF_ETH;
  } else if (tcpip_if == MDNS_IF_ETH) {
     return MDNS_IF_STA;
  }
  return MDNS_IF_MAX;
}

And now, finally, we got to the errors!

static void _mdns_dup_interface(mdns_if_t tcpip_if)
{
    uint8_t i;
    mdns_if_t other_if = _mdns_get_other_if (tcpip_if);
    for (i=0; i<MDNS_IP_PROTOCOL_MAX; i++) {
        if (_mdns_server->interfaces[other_if].pcbs[i].pcb) {        // <=
            //stop this interface and mark as dup
            if (_mdns_server->interfaces[tcpip_if].pcbs[i].pcb) {
                _mdns_clear_pcb_tx_queue_head(tcpip_if, i);
                _mdns_pcb_deinit(tcpip_if, i);
            }
            _mdns_server->interfaces[tcpip_if].pcbs[i].state = PCB_DUP;
            _mdns_announce_pcb(other_if, i, NULL, 0, true);          // <=
        }
    }
}

So, we know that the _mdns_get_other_iffunction can return 3. The variable other_if can be equal to 3. And here is the first potential array boundary violation:

if (_mdns_server->interfaces[other_if].pcbs[i].pcb)

The second place where the other_if variable is used dangerously is when calling the _mdns_announce_pcb function:

_mdns_announce_pcb(other_if, i, NULL, 0, true);

Let's look inside this function:

static void _mdns_announce_pcb(mdns_if_t tcpip_if,
                               mdns_ip_protocol_t ip_protocol,
                               mdns_srv_item_t ** services,
                               size_t len, bool include_ip)
{
  mdns_pcb_t * _pcb = &_mdns_server->interfaces[tcpip_if].pcbs[ip_protocol];
  ....
}

Again, index 3 can be used to access an array consisting of 3 elements, whereas the maximum available index is two.

Null pointers

Error N44-N47; Incorrect order of checking pointers

PVS-Studio warning: V595 The 'hapd->wpa_auth' pointer was utilized before it was verified against nullptr. Check lines: 106, 113. esp_hostap.c 106

bool hostap_deinit(void *data)
{
  struct hostapd_data *hapd = (struct hostapd_data *)data;

  if (hapd == NULL) {
    return true;
  }

  if (hapd->wpa_auth->wpa_ie != NULL) {
    os_free(hapd->wpa_auth->wpa_ie);
  }

  if (hapd->wpa_auth->group != NULL) {
    os_free(hapd->wpa_auth->group);
  }

  if (hapd->wpa_auth != NULL) {
    os_free(hapd->wpa_auth);
  }
  ....
}

Incorrect order of checking pointers:

if (hapd->wpa_auth->group != NULL)
....
if (hapd->wpa_auth != NULL)

If the pointer hapd->wpa_auth is null, then everything will end up badly. The sequence of actions should be reversed and made nested:

if (hapd->wpa_auth != NULL)
{
  ....
  if (hapd->wpa_auth->group != NULL)
  ....
}

Similar errors:

V595 The 'hapd->conf' pointer was utilized before it was verified against nullptr. Check lines: 118, 125. esp_hostap.c 118
V595 The 'sm' pointer was utilized before it was verified against nullptr. Check lines: 1637, 1647. esp_wps.c 1637
V595 The 'sm' pointer was utilized before it was verified against nullptr. Check lines: 1693, 1703. esp_wps.c 1693

Error N48-N64; No pointer checks after memory allocation

As we can see from the project, authors usually check whether it was possible to allocate memory or not. That is, there is a lot of code with such checks:

dhcp_data = (struct dhcp *)malloc(sizeof(struct dhcp));
if (dhcp_data == NULL) {
  return ESP_ERR_NO_MEM;
}

But in some places checks are omitted.

PVS-Studio warning: V522 There might be dereferencing of a potential null pointer 'exp'. Check lines: 3470, 3469. argtable3.c 3470

TRex *trex_compile(const TRexChar *pattern,const TRexChar **error,int flags)
{
  TRex *exp = (TRex *)malloc(sizeof(TRex));
  exp->_eol = exp->_bol = NULL;
  exp->_p = pattern;
  ....
}

This type of error is more complex and dangerous than it may seem at first glance. This topic is discussed in more detail in the article "Why it is important to check what the malloc function returned".

Other places with no checks:

V522 There might be dereferencing of a potential null pointer 's_ledc_fade_rec[speed_mode][channel]'. Check lines: 668, 667. ledc.c 668
V522 There might be dereferencing of a potential null pointer 'environ'. Check lines: 108, 107. syscall_table.c 108
V522 There might be dereferencing of a potential null pointer 'it'. Check lines: 150, 149. partition.c 150
V522 There might be dereferencing of a potential null pointer 'eth'. Check lines: 167, 159. wpa_auth.c 167
V522 There might be dereferencing of a potential null pointer 'pt'. Check lines: 222, 219. crypto_mbedtls-ec.c 222
V522 There might be dereferencing of a potential null pointer 'attr'. Check lines: 88, 73. wps.c 88
V575 The potential null pointer is passed into 'memcpy' function. Inspect the first argument. Check lines: 725, 724. coap_mbedtls.c 725
V575 The potential null pointer is passed into 'memset' function. Inspect the first argument. Check lines: 3504, 3503. argtable3.c 3504
V575 The potential null pointer is passed into 'memcpy' function. Inspect the first argument. Check lines: 496, 495. mqtt_client.c 496
V575 The potential null pointer is passed into 'strcpy' function. Inspect the first argument. Check lines: 451, 450. transport_ws.c 451
V769 The 'buffer' pointer in the 'buffer + n' expression could be nullptr. In such case, resulting value will be senseless and it should not be used. Check lines: 186, 181. cbortojson.c 186
V769 The 'buffer' pointer in the 'buffer + len' expression could be nullptr. In such case, resulting value will be senseless and it should not be used. Check lines: 212, 207. cbortojson.c 212
V769 The 'out' pointer in the 'out ++' expression could be nullptr. In such case, resulting value will be senseless and it should not be used. Check lines: 233, 207. cbortojson.c 233
V769 The 'parser->m_bufferPtr' pointer in the expression equals nullptr. The resulting value of arithmetic operations on this pointer is senseless and it should not be used. xmlparse.c 2090
V769 The 'signature' pointer in the 'signature + curve->prime_len' expression could be nullptr. In such case, resulting value will be senseless and it should not be used. Check lines: 4112, 4110. dpp.c 4112
V769 The 'key' pointer in the 'key + 16' expression could be nullptr. In such case, resulting value will be senseless and it should not be used. Check lines: 634, 628. eap_mschapv2.c 634

Error N65, N66; No pointer checks after memory allocation (indicative case)

The following code contains exactly the same error as we discussed above, but it is more revealing and vivid. Note that the realloc function is used to allocate memory.

PVS-Studio warning: V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'exp->_nodes' is lost. Consider assigning realloc() to a temporary pointer. argtable3.c 3008

static int trex_newnode(TRex *exp, TRexNodeType type)
{
  TRexNode n;
  int newid;
  n.type = type;
  n.next = n.right = n.left = -1;
  if(type == OP_EXPR)
    n.right = exp->_nsubexpr++;
  if(exp->_nallocated < (exp->_nsize + 1)) {
    exp->_nallocated *= 2;
    exp->_nodes = (TRexNode *)realloc(exp->_nodes,
                                      exp->_nallocated * sizeof(TRexNode));
  }
  exp->_nodes[exp->_nsize++] = n; // NOLINT(clang-analyzer-unix.Malloc)
  newid = exp->_nsize - 1;
  return (int)newid;
}

First, if the realloc function returns NULL, the previous value of the exp->_nodes pointer will be lost. A memory leak will happen.

Secondly, if the realloc function returns NULL, then the value will not be written by the null pointer at all. By saying that I mean this line:

exp->_nodes[exp->_nsize++] = n;

exp->_nsize++ can have any value. If something is written in a random memory area that is available for writing, the program will continue its execution as if nothing had happened. In doing so, data structures will be destroyed, which will lead to unpredictable consequences.

Another such error: V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'm_context->pki_sni_entry_list' is lost. Consider assigning realloc() to a temporary pointer. coap_mbedtls.c 737

Miscellaneous errors

Error N67; Extra or incorrect code

PVS-Studio warning: V547 Expression 'ret != 0' is always false. sdio_slave.c 394

esp_err_t sdio_slave_start(void)
{
  ....
  critical_exit_recv();
  ret = ESP_OK;
  if (ret != ESP_OK) return ret;

  sdio_slave_hal_set_ioready(context.hal, true);
  return ESP_OK;
}

This is strange code that can be shortened to:

esp_err_t sdio_slave_start(void)
{
  ....
  critical_exit_recv();
  sdio_slave_hal_set_ioready(context.hal, true);
  return ESP_OK;
}

I can't say for sure if there is an error or not. Perhaps what we see here is not something that was intended. Or perhaps this code appeared in the process of unsuccessful refactoring and is actually correct. In this case, it is really enough to simplify it a little, so that it looks more decent and understandable. One thing is for sure — this code deserves attention and review by the author.

Error N68; Extra or invalid code

PVS-Studio warning: V547 Expression 'err != 0' is always false. sdio_slave_hal.c 96

static esp_err_t sdio_ringbuf_send(....)
{
  uint8_t* get_ptr = ....;
  esp_err_t err = ESP_OK;
  if (copy_callback) {
    (*copy_callback)(get_ptr, arg);
  }
  if (err != ESP_OK) return err;

  buf->write_ptr = get_ptr;
  return ESP_OK;
}

This case is very similar to the previous one. The err variable is redundant, or someone forgot to change it.

Error N69; A potentially uninitialized buffer

PVS-Studio warning: V614 Potentially uninitialized buffer 'seq' used. Consider checking the first actual argument of the 'strlen' function. linenoise.c 435

void refreshShowHints(struct abuf *ab, struct linenoiseState *l, int plen) {
    char seq[64];
    if (hintsCallback && plen+l->len < l->cols) {
        int color = -1, bold = 0;
        char *hint = hintsCallback(l->buf,&color,&bold);
        if (hint) {
            int hintlen = strlen(hint);
            int hintmaxlen = l->cols-(plen+l->len);
            if (hintlen > hintmaxlen) hintlen = hintmaxlen;
            if (bold == 1 && color == -1) color = 37;
            if (color != -1 || bold != 0)
                snprintf(seq,64,"\033[%d;%d;49m",bold,color);
            abAppend(ab,seq,strlen(seq));                       // <=
            abAppend(ab,hint,hintlen);
            if (color != -1 || bold != 0)
                abAppend(ab,"\033[0m",4);
            /* Call the function to free the hint returned. */
            if (freeHintsCallback) freeHintsCallback(hint);
        }
    }
}

The seq buffer may or may not be full! It is filled only when the condition is met:

if (color != -1 || bold != 0)
  snprintf(seq,64,"\033[%d;%d;49m",bold,color);

It is logical to assume that the condition may not be met, and then the buffer will remain uninitialized. In this case, it can't be used to add to the ab string.

To remedy the situation, one should change the code as follows:

if (color != -1 || bold != 0)
{
  snprintf(seq,64,"\033[%d;%d;49m",bold,color);
  abAppend(ab,seq,strlen(seq));
}

Error N70; Strange mask

PVS-Studio warning: V547 Expression is always false. tasks.c 896

#ifndef portPRIVILEGE_BIT
  #define portPRIVILEGE_BIT ( ( UBaseType_t ) 0x00 )
#endif

static void prvInitialiseNewTask(...., UBaseType_t uxPriority, ....)
{
  StackType_t *pxTopOfStack;
  UBaseType_t x;

  #if (portNUM_PROCESSORS < 2)
  xCoreID = 0;
  #endif

  #if( portUSING_MPU_WRAPPERS == 1 )
    /* Should the task be created in privileged mode? */
    BaseType_t xRunPrivileged;
    if( ( uxPriority & portPRIVILEGE_BIT ) != 0U )
    {
      xRunPrivileged = pdTRUE;
    }
    else
    {
      xRunPrivileged = pdFALSE;
    }
  ....
}

The portPRIVILEGE_BIT constant has the value 0. So, it's weird to use it as a mask:

if( ( uxPriority & portPRIVILEGE_BIT ) != 0U )

Error N71, Memory leak

PVS-Studio warning: V773 The function was exited without releasing the 'sm' pointer. A memory leak is possible. esp_wpa2.c 753

static int eap_peer_sm_init(void)
{
  int ret = 0;
  struct eap_sm *sm;
  ....
  sm = (struct eap_sm *)os_zalloc(sizeof(*sm));
  if (sm == NULL) {
    return ESP_ERR_NO_MEM;
  }

  s_wpa2_data_lock = xSemaphoreCreateRecursiveMutex();
  if (!s_wpa2_data_lock) {
    wpa_printf(MSG_ERROR, ".......");  // NOLINT(clang-analyzer-unix.Malloc)
    return ESP_ERR_NO_MEM;             // <=
  }
  ....
}

If the xSemaphoreCreateRecursiveMutex function fails to create a mutex, then the eap_peer_sm_init function will terminate and a memory leak will occur. As I understand it, one should add a call to the os_free function to clear the memory:

  s_wpa2_data_lock = xSemaphoreCreateRecursiveMutex();
  if (!s_wpa2_data_lock) {
    wpa_printf(MSG_ERROR, ".......");
    os_free(sm);
    return ESP_ERR_NO_MEM;
  }

Interestingly, the Clang compiler also warns us about this error. However, the author of the code for some reason ignored and even specifically suppressed the corresponding warning:

// NOLINT(clang-analyzer-unix.Malloc)

The presence of this suppressing comment is unclear to me. There is definitely a bug. Perhaps the code author simply did not understand what the compiler complained about and decided that it was a false positive.

Conclusion

Thanks for your attention. As you can see, there are a lot of errors. And this was only a cursory review of an incomplete report. I hope that Yuri Popov will take the baton and describe even more mistakes in his subsequent article :).

Use the PVS-Studio static analyzer regularly. This will let you:

find many errors at an early stage, which will significantly reduce the cost of detecting and correcting them;
detect and correct stupid typos and other mistakes using static analysis. You will free up time that can be spent on a higher-level review of the code and algorithms;
better control the quality of the code of beginners and teach them to write clean and reliable code faster.

In addition, when it comes to software for embedded devices, it is very important to eliminate as many errors as possible before the devices are released into service. Therefore, any additional error found using the code analyzer is a great finding. Each undetected error in the hardware and software device potentially carries reputational risks as well as costs for updating the firmware.

You're welcome to download and try a trial PVS-Studio analyzer version. I also remind you that if you are developing an open source project or using the analyzer for academic purposes, we offer several free licenses options for such cases. Don't wait for an insidious bug to eat your leg, start using PVS-Studio right now.

Hubs: