GGribkov Oct 31 2019 at 09:55

On request of Embedded Developers: Detecting Errors in Amazon FreeRTOS

15 min

1.4K

PVS-Studio corporate blogC*Open source*Programming microcontrollers*System Programming*

Anyone who programs microcontrollers probably knows about FreeRTOS, or at least heard of this operating system. Amazon developers decided to enhance the abilities of this operating system to work with AWS Internet of Things services. This is how Amazon FreeRTOS appeared. We, developers of the PVS-Studio static code analyzer, were asked by mail and in comments to check these projects. Well, now get what you asked for. Keep reading to find out what came out of it.

Briefly about projects

To begin with, I'll tell you a bit about the forerunner of the project being tested — FreeRTOS (the source code is available here by link). As Wikipedia states, FreeRTOS is a real-time multitasking operating system for embedded systems.

It is written in good old C, which is not surprising — this operating system should work in conditions typical of microcontrollers: low processing power, a small amount of RAM, and the like. The C language allows you to work with resources at a low level and has high performance, so it is best suited to develop such an OS.

Now back to Amazon, which is always on the move developing various promising directions. For example, Amazon is developing an Amazon Lumberyard AAA-engine, which we've also checked.

One of such directions is Internet of Things (IoT). To develop in this area, Amazon decided to write their own operating system — and they took the FreeRTOS core as a basis.

The resulting system, Amazon FreeRTOS, is positioned to «provide a secure connection to Amazon Web Services, such as AWS IoT Core or AWS IoT Greengrass». The source code of this project is available on Github.

In this article, we'll find out if there are errors in FreeRTOS as well as how secure the Amazon operating system is in terms of static code analysis.

The course of the check

The check was performed using the automatic error-finding tool — the PVS-Studio static code analyzer. It is able to detect errors in programs written in C, C++, C#, and Java.

Before the analysis, we have to build the project. This way, I'll be confident that I have all needed dependencies and the project is ready to be checked. One can check the project in a number of ways — for example, using a compilation monitoring system. In this case, I performed the analysis using the plugin for Visual Studio – it's good that repositories of both projects comprise the sets of project files that make it easy to build under Windows.

I just had to build projects to make sure that I had everything ready for the check. Next I ran the analysis and – voila! – I have a ready-made analyzer report in front of me.

Third-party libraries included in these projects can also contain errors, and they can, of course, also affect the program. However, I have excluded them from the analysis for the sake of the purity of the narrative.

So, the projects are analyzed, reports are received, interesting errors are highlighted. It's time to get their review!

What FreeRTOS hides

Initially, I expected to write two separate articles: one for each operating system. I was already rubbing my hands? as I prepared to write a good article about FreeRTOS. Anticipating the discovery of at least a couple of juicy bugs (like CWE-457), I was looking through sparse warnings of the analyzer, and… found nothing. I didn't find any interesting error.

Many of the warnings issued by the analyzer were not relevant to FreeRTOS. For example, such warnings were 64-bit flaws such as casting size_t to uint32_t. It is related to the fact that FreeRTOS is meant to be working on devices with a pointer size no larger than 32 bits.

I have thoroughly checked all V1027 warnings indicating castings between pointers to unrelated structures. If casted structures have the same alignment, then such a casting is a mistake. And I haven't found a single dangerous casting!

All other suspicious places were either associated with coding style or were staffed with a comment explaining why it was done that way and why it wasn't a mistake.

So I'd like to appeal to FreeRTOS developers. Guys, you're awesome! We have hardly seen such clean and high-quality projects as yours. And it was a pleasure to read the clean, neat, and well-documented code. Hats off to you, guys.

Even though I couldn't find any interesting bugs that day, I knew I wouldn't stop there. I was going home with the firm confidence that Amazon's version would 100% have something interesting, and that tomorrow I'd definitely pick up enough bugs for the article. As you may have guessed, I was right.

What Amazon FreeRTOS hides

Amazon's version of the system turned out to be… to put it mildly, a little worse. The legacy of FreeRTOS remained as clean whereas the new improvements concealed a lot of interesting issues.

Some fragments had the program logic broken, some handled pointers incorrectly. In some places, the code could lead to undefined behavior, and there were cases in which the programmer simply did not know about the pattern of an error he made. I even found several serious potential vulnerabilities.

Seems like I've tightened up with the introduction. Let's start figure out errors!

Breaking of the program logic

Let's start with problem places that obviously indicate that the program works not in the way the programmer expected. Suspicious array handling will come first:

/**
 * @brief Pool of request and associated response buffers, 
 *  handles, and configurations.
 */
static _requestPool_t _requestPool = { 0 };

....

static int _scheduleAsyncRequest(int reqIndex,
                                 uint32_t currentRange)
{
  ....

  /* Set the user private data to use in the asynchronous callback context. 
   */
  _requestPool.pRequestDatas[reqIndex].pConnHandle = &_connHandle;
  _requestPool.pRequestDatas[reqIndex].pConnConfig = &_connConfig;
  _requestPool.pRequestDatas[reqIndex].reqNum = reqIndex;
  _requestPool.pRequestDatas[reqIndex].currRange = currentRange;
  _requestPool.pRequestDatas[reqIndex].currDownloaded = 0;
  _requestPool.pRequestDatas[reqIndex].numReqBytes = numReqBytes;

  ....

  _requestPool.pRequestDatas->scheduled = true;

  ....
}

PVS-Studio issued two warnings for this piece of code:

V619 The array '_requestPool.pRequestDatas' is being utilized as a pointer to single object. iot_demo_https_s3_download_async.c 973
V574 The '_requestPool.pRequestDatas' pointer is used simultaneously as an array and as a pointer to single object. Check lines: 931, 973. iot_demo_https_s3_download_async.c 973

Just in case, let me remind you: the array name is the pointer to its first element. That is, if _requestPool.pRequestDatas is an array of structures, _requestPool.pRequestDatas[i].scheduled is an assess to the scheduled member of the i array structure.And if we write _requestPool.pRequestDatas->scheduled, it will turn out that themember of namely the first array structure will beaccessed.

In the excerpt of the code above, that's what happens. In the last line, the value of only the member of the first array structure is changed. In itself, such an accessing is already suspicious, but here the case is even more clear: the _requestPool.pRequestDatas array is assessed by index throughout the body of the function. But at the end the indexing operation was forgotten.

As I understand it, the last line should look like this:

_requestPool.pRequestDatas[reqIndex].scheduled = true;

The next error lies in a small function, so I'll give it completely:

/* Return true if the string " pcString" is found
 * inside the token pxTok in JSON file pcJson. */
static BaseType_t prvGGDJsoneq( const char * pcJson,   
                                const jsmntok_t * const pxTok,
                                const char * pcString )
{
  uint32_t ulStringSize = ( uint32_t ) pxTok->end 
                         - ( uint32_t ) pxTok->start;
  BaseType_t xStatus = pdFALSE;

  if( pxTok->type == JSMN_STRING )
  {
    if( ( uint32_t ) strlen( pcString ) == ulStringSize )
    {
      if( ( int16_t ) strncmp( &pcJson[ pxTok->start ], // <=
                               pcString,
                               ulStringSize ) == 0 )
      {
        xStatus = pdTRUE;
      }
    }
  }

  return xStatus;
}

PVS-Studio warning: V642 [CWE-197] Saving the 'strncmp' function result inside the 'short' type variable is inappropriate. The significant bits could be lost breaking the program's logic. aws_greengrass_discovery.c 637

Let's take a look at the definition of the strncmp function:

int strncmp( const char *lhs, const char *rhs, size_t count );

In the example, the result of the int type, which is 32 bits in size is converted in a variable of the int16_t type. With this «narrowing» converting, the older bits of the returned value will be lost. For example, if the strncmp function returns 0x00010000, unit will be lost during the conversion, and the condition will be executed.

It's actually strange to see such a casting in the condition. Why is it ever needed here, if an ordinary int can be compared with zero? On the other hand, if a programmer wanted this function to sometimes return true even if it shouldn't, why not support such tricky behavior with a comment? But this way it's some kind of a backdoor. Anyway, I'm inclined to think it's an error. What do you think?

Undefined behavior and pointers

Here comes a large example. It covers up a potential null pointer dereference:

static void _networkReceiveCallback(....)
{
  IotHttpsReturnCode_t status = IOT_HTTPS_OK;
  _httpsResponse_t* pCurrentHttpsResponse = NULL;
  IotLink_t* pQItem = NULL;

  ....

  /* Get the response from the response queue. */
  IotMutex_Lock(&(pHttpsConnection->connectionMutex));
  pQItem = IotDeQueue_PeekHead(&(pHttpsConnection->respQ));
  IotMutex_Unlock(&(pHttpsConnection->connectionMutex));

  /* If the receive callback is invoked 
   * and there is no response expected,
   * then this a violation of the HTTP/1.1 protocol. */
  if (pQItem == NULL)
  {
    IotLogError(....);
    fatalDisconnect = true;

    status = IOT_HTTPS_NETWORK_ERROR;
    goto iotCleanup;
  }

  ....

  iotCleanup :

  /* Report errors back to the application. */
  if (status != IOT_HTTPS_OK)
  {
    if ( pCurrentHttpsResponse->isAsync
      && pCurrentHttpsResponse->pCallbacks->errorCallback)
    {
      pCurrentHttpsResponse->pCallbacks->errorCallback(....);
    }

    pCurrentHttpsResponse->syncStatus = status;
  }

  ....
}

PVS-Studio warning: V522 [CWE-690] There might be dereferencing of a potential null pointer 'pCurrentHttpsResponse'. iot_https_client.c 1184

The last if block contains problematic dereferences. Let's find out what's going on here.

The function begins with pCurrentHttpsResponse and pQItem variables initialized by the NULL value and the status variable is initialized by the IOT_HTTPS_OK value, which means that it's all correct.

Further pQItem is assigned the value, returned from the IotDeQueue_PeekHead function, which returns the pointer to the beginning of the double-linked queue.

What happens if the queue is empty? In this case, the IotDeQueue_PeekHead function will return NULL:

static inline IotLink_t* IotDeQueue_PeekHead
                         (const IotDeQueue_t* const pQueue)
{
  return IotListDouble_PeekHead(pQueue);
}
....
static inline IotLink_t* IotListDouble_PeekHead
                         (const IotListDouble_t* const pList)
/* @[declare_linear_containers_list_double_peekhead] */
{
  IotLink_t* pHead = NULL;

  if (pList != NULL)
  {
    if (IotListDouble_IsEmpty(pList) == false)
    {
      pHead = pList->pNext;
    }
  }

  return pHead;
}

Further the condition pQItem == NULL will become true and the control flow will be passed by goto to the lower part of the function. By this time, the pCurrentHttpsResponse pointer will remain null, while status will not be equal to IOT_HTTPS_OK. In the end, we'll get to the same if branch, and ...boom! Well, you know about consequences of such a dereference.

Okay. It was a slightly tricky example. Now I suggest that you take a look at a very simple and understandable potential dereferencing:

int PKI_mbedTLSSignatureToPkcs11Signature
    (uint8_t * pxSignaturePKCS, uint8_t * pxMbedSignature )
{
  int xReturn = 0;
  uint8_t * pxNextLength;

  /* The 4th byte contains the length of the R component */
  uint8_t ucSigComponentLength = pxMbedSignature[ 3 ]; // <=

  if(  ( pxSignaturePKCS == NULL )
    || ( pxMbedSignature == NULL ) )
  {
      xReturn = FAILURE;
  }

  ....
}

PVS-Studio warning: V595 [CWE-476] The 'pxMbedSignature' pointer was utilized before it was verified against nullptr. Check lines: 52, 54. iot_pki_utils.c 52

This function receives to pointers to uint8_t. Both pointers are checked for NULL, which is a good practice — such situations should be worked out immediately.

But here's the problem: by the time pxMbedSignature is checked, it will already be dereferenced literally one line above. Ta-daa!

Another example of speculative code:

CK_RV vAppendSHA256AlgorithmIdentifierSequence
             ( uint8_t * x32ByteHashedMessage,
               uint8_t * x51ByteHashOidBuffer )
{
  CK_RV xResult = CKR_OK;
  uint8_t xOidSequence[] = pkcs11STUFF_APPENDED_TO_RSA_SIG;

  if(  ( x32ByteHashedMessage == NULL )
    || ( x51ByteHashOidBuffer == NULL ) )
  {
      xResult = CKR_ARGUMENTS_BAD;
  }

  memcpy( x51ByteHashOidBuffer,
          xOidSequence,
          sizeof( xOidSequence ) );

  memcpy( &x51ByteHashOidBuffer[ sizeof( xOidSequence ) ],
          x32ByteHashedMessage,
          32 );

  return xResult;
}

PVS-Studio warnings:

V1004 [CWE-628] The 'x51ByteHashOidBuffer' pointer was used unsafely after it was verified against nullptr. Check lines: 275, 280. iot_pkcs11.c 280
V1004 [CWE-628] The 'x32ByteHashedMessage' pointer was used unsafely after it was verified against nullptr. Check lines: 275, 281. iot_pkcs11.c 281

The analyzer warns that function parameters that are pointers, are unsafely used after their check for NULL. Indeed, the arguments are checked. But in case if any of them isn't NULL, no action is taken except for writing in xResult. This section of the code kind of says: «Yeah, so the arguments turned out to be bad. We're going to note it now, and you — keep going, keep going.»

Result: NULL will be passed to memcpy. What can come of it? Where will the values be copied and which ones? In fact, guessing won't help, as the standard clearly states that such a call leads to undefined behavior (see the section 1).

There are other examples of incorrect pointers handling in the analyzer report found in Amazon FreeRTOS, but I think, given examples are enough to show capabilities of PVS-Studio in detecting such errors. Let's take a look at something new.

TRUE != 1

There were several errors related to the pattern, which, unfortunately, is often overlooked.

The fact is that the bool type (from C++) is different from the BOOL type (commonly used in C). The first can only contain a true or false value. The second one is the typedef of an integer type (int, long, and others). The 0 value is «false» for it, and any other value different from zero is «true».

Since there is no built-in Boolean type in C, these constants are defined for convenience:

#define FALSE 0
#define TRUE 1

Let's look at the example.

int mbedtls_hardware_poll(void* data,
                          unsigned char* output,
                          size_t len,
                          size_t* olen)
{
  int lStatus = MBEDTLS_ERR_ENTROPY_SOURCE_FAILED;
  HCRYPTPROV hProv = 0;

  /* Unferenced parameter. */
  (void)data;

  /*
   * This is port-specific for the Windows simulator,
   * so just use Crypto API.
   */

  if (TRUE == CryptAcquireContextA(
                &hProv, NULL, NULL, 
                PROV_RSA_FULL, 
                CRYPT_VERIFYCONTEXT))
  {
    if (TRUE == CryptGenRandom(hProv, len, output))
    {
      lStatus = 0;
      *olen = len;
    }

    CryptReleaseContext(hProv, 0);
  }

  return lStatus;
}

PVS-Studio warnings:

V676 [CWE-253] It is incorrect to compare the variable of BOOL type with TRUE. aws_entropy_hardware_poll.c 48
V676 [CWE-253] It is incorrect to compare the variable of BOOL type with TRUE. Correct expression is: 'FALSE != CryptGenRandom(hProv, len, output)'. aws_entropy_hardware_poll.c 51

Did you find an error? Don't doubt, it is here :) The CryptAcquireContextA and CryptGenRandom functions are standard functions from the wincrypt.h header. If successful, they return the non-zero value. Let me emphasize that it is non-zero. So, theoretically, it could be any value different from zero: 1, 314, 42, 420.

Apparently, the programmer who was writing the function from the example, wasn't thinking about that, and in the end, resulting values are compared to one.

How likely is that the TRUE == CryptGenRandom(....) condition will not be fulfilled? It's hard to say. Perhaps, CryptGenRandom might return 1 more often than other values, but maybe it might return only 1. We can't know this for sure: the implementation of this cryptographic function is hidden from the eyes of mortal programmers :)

It is important to remember that such comparisons are potentially dangerous. Instead of:

if (TRUE == GetBOOL())

Use a safer version of the code:

if (FALSE != GetBOOL())

Optimization problems

Several warnings of the analyzer were related to slowly operating structures. For example:

int _IotHttpsDemo_GetS3ObjectFileSize(....)
{
  ....

  pFileSizeStr = strstr(contentRangeValStr, "/");

  ....
}

PVS-Studio warning: V817 It is more efficient to seek '/' character rather than a string. iot_demo_https_common.c 205

It's short and simple, isn't it? The strstr function is used here for searching only one character, passed in the parameter as a string (it's in double quotes).

This place can potentially be optimized by replacing strstr with strchr:

int _IotHttpsDemo_GetS3ObjectFileSize(....)
{
  ....

  pFileSizeStr = strchr(contentRangeValStr, '/');

  ....
}

This way, searching will work a bit faster. A small, but nice thing.

Well, such optimizations are good, but the analyzer has also found another place, which could be optimized in a much more noticeable way:

void vRunOTAUpdateDemo(void)
{
  ....

  for (; ; )
  {
    ....
    
    xConnectInfo.cleanSession = true;

    xConnectInfo.clientIdentifierLength 
      = (uint16_t)strlen(clientcredentialIOT_THING_NAME);

    xConnectInfo.pClientIdentifier 
      = clientcredentialIOT_THING_NAME;
    
    ....
  }
}

PVS-Studio warning: V814 Decreased performance. The 'strlen' function was called multiple times inside the body of a loop. aws_iot_ota_update_demo.c 235

Hmm… Inside the loop, with each iteration strlen is called that evaluates the length of the same line each time. Not the most effective operation :)

Let's look inside the definition of clientcredentialIOT_THING_NAME:

/*
 * @brief Host name.
 *
 * @todo Set this to the unique name of your IoT Thing.
 */
#define clientcredentialIOT_THING_NAME               ""

The user is asked to enter the name of their device here. By default, it's empty, and in this case, everything is fine. What if a user wants to enter a long and beautiful name there? For example, I'd love to call my brainchild "The Passionate And Sophisticated Coffee Machine BarBarista-N061E The Ultimate Edition." Can you imagine what my surprise would be like if my beautiful coffee machine started working a little slower after that? Nuisance!

To correct the error, it is worth taking strlen outside the body loop. After all, the name of the device does not change during the program working. Oh, constexpr from C++ would suit perfectly here…

Okay, well, let's not gild the lily. As my colleague Andrey Karpov noted, modern compilers know what is strlen and he personally watched them using a constant in binary code if they get that the length of the line can't change. So there's a good chance that in the release build mode, instead of a real line length evaluation, the pre-evaluated value will be used. However, this does not always work, so writing such code is not a good practice.

A few words about MISRA

The PVS-Studio analyzer has a large set of rules to check your code for compliance with MISRA C and MISRA C standards. What are these standards?

MISRA is the coding standard for highly responsible embedded systems. It contains a set of strict rules and guidelines for writing code and setting up a development process. These rules are quite a lot, and they are aimed not only at eliminating serious errors, but also at various «code smells». It is also aimed at writing the most comprehensible and readable code.

Thus, following the MISRA standard not only helps to avoid mistakes and vulnerabilities, but also to significantly reduce the likelihood of their appearing in already existing code.

MISRA is used in the aerospace, medical, automotive and military industries, where human lives depend on the quality of embedded software.

Apparently, Amazon FreeRTOS developers know about this standard, and for the most part follow it. Such approach is absolutely reasonable: if you write a broad-based OS for embedded systems, then you must think about security.

However, I have found quite a lot of violations of the MISRA standard. I'm not going to give examples of rules like «don't use union» or «function should only have one return at the end of the body» — unfortunately, they are not spectacular, as are most of MISRA rules. I'd rather give you examples of violations that could potentially lead to serious consequences.

Let's start with macros:

#define FreeRTOS_ms_to_tick(ms)  ( ( ms * configTICK_RATE_HZ + 500 ) / 1000 )

#define SOCKETS_htonl( ulIn )    ( ( uint32_t )                             \
  (   ( ( ulIn & 0xFF )     << 24 ) | ( ( ulIn & 0xFF00 )     << 8  )       \
    | ( ( ulIn & 0xFF0000 ) >> 8 )  | ( ( ulIn & 0xFF000000 ) >> 24 ) ) )

#define LEFT_ROTATE( x, c )    ( ( x << c ) | ( x >> ( 32 - c ) ) )

PVS-Studio warnings:

V2546 [MISRA C 20.7] The macro and its parameters should be enclosed in parentheses. Consider inspecting the 'ms' parameter of the 'FreeRTOS_ms_to_tick' macro. FreeRTOS_IP.h 201
V2546 [MISRA C 20.7] The macro and its parameters should be enclosed in parentheses. Consider inspecting the 'ulIn' parameter of the 'SOCKETS_htonl' macro. iot_secure_sockets.h 512
V2546 [MISRA C 20.7] The macro and its parameters should be enclosed in parentheses. Consider inspecting parameters 'x', 'c' of the 'LEFT_ROTATE' macro. iot_device_metrics.c 90

Yes, that's exactly what you're thinking. The parameters of these macros are not bracketed. If someone accidentally writes something like

val = LEFT_ROTATE(A[i] | 1, B);

such a «call» of a macro will expand into:

val = ( ( A[i] | 1 << B ) | ( A[i] | 1 >> ( 32 - B ) ) );

Remember the priorities of operations? First, a bitwise shift is made, and only after it — a bitwise «or». Therefore, the logic of the program will be broken. A simpler example: what would happen if the expression "x + y" is passed in the macro FreeRTOS_ms_to_tick? One of the main objectives of MISRA is to prevent such situations.

Some might argue, «If you have programmers who don't know about this, no standard can help you!». I won't agree with that. Programmers are also people, and no matter how experienced a person is, they also can get tired and make a mistake at the end of the day. This is one of the reasons why MISRA strongly recommends using automatic analysis tools to test a project for compliance.

Let me address the developers of Amazon FreeRTOS: PVS-Studio has found 12 more unsafe macros, so you kind of need to be careful with them :)

Another interesting MISRA violation:

/**
 * @brief Callback for an asynchronous request to notify 
 *        that the response is complete.
 *
 * @param[in] 0pPrivData - User private data configured 
 *            with the HTTPS Client library request configuration.
 * @param[in] respHandle - Identifier for the current response finished.
 * @param[in] rc - Return code from the HTTPS Client Library
 *            signaling a possible error.
 * @param[in] status - The HTTP response status.
 */
 static void _responseCompleteCallback(void* pPrivData,
                                       IotHttpsResponseHandle_t respHandle,
                                       IotHttpsReturnCode_t rc,
                                       uint16_t status)
{
  bool* pUploadSuccess = (bool*)pPrivData;

  /* When the remote server response with 200 OK,
     the file was successfully uploaded. */
  if (status == IOT_HTTPS_STATUS_OK)
  {
    *pUploadSuccess = true;
  }
  else
  {
    *pUploadSuccess = false;
  }

  /* Post to the semaphore that the upload is finished. */
  IotSemaphore_Post(&(_uploadFinishedSem));
}

Can you find the bug yourself?

PVS-Studio warning: V2537 [MISRA C 2.7] Functions should not have unused parameters. Consider inspecting the parameter: 'rc'. iot_demo_https_s3_upload_async.c 234

Take a closer look: the rc parameter isn't used anywhere in the function body. While the comment of the function clearly says that this parameter is a return code of another function, and that it can signal an error. Why isn't this parameter handled in any way? Something is clearly wrong here.

However, even without such comments, unused parameters often point to the broken logic of the program. Otherwise, why do you need them in the function signature?

Here I've given a small function that's good for an example in the article. In addition to it, I found 10 other unused parameters. Many of them are used in larger functions, and it is not easy to detect them.

Suspiciously, they haven't been found before. After all, compilers easily detect such cases.

Conclusion

These were not all the issues found by the analyzer, but the article already turned out to be quite large. I hope that thanks to it, amazon FreeRTOS developers will be able to correct some of shortcomings, and may even want to try PVS-Studio on their own. This way it will be more convenient to thoroughly investigate warnings. And as a matter of fact — working with a convenient interface is much easier than looking at a text report.

Thanks for reading our articles! See you in the next publication :D

P.S. It just so happened that this article was published on October 31. Happy Halloween, guys and girls!

Hubs:

On request of Embedded Developers: Detecting Errors in Amazon FreeRTOS

Briefly about projects

The course of the check

What FreeRTOS hides

What Amazon FreeRTOS hides

Breaking of the program logic

Undefined behavior and pointers

TRUE != 1

Optimization problems

A few words about MISRA

Conclusion

Articles

Information