Checking BitTorrent in honor of the 20th anniversary. Time == quality / Habr

A couple of weeks ago (on July 2, 2021, to be precise), the legendary BitTorrent protocol turned twenty years old. Created by Bram Cohen, the protocol has developed rapidly since its inception and quickly became one of the most popular ways to exchange files. So why not check a couple of long-lived related projects with the PVS-Studio analyzer for Linux?

0846_BitTorrent/image1.png

Introduction

Today we're checking two projects: libtorrent (aka "Rasterbar libtorrent" or "rb-libtorrent") and Transmission.

Libtorrent is a free cross-platform library for working with the BitTorrent protocol, written in C++. The official website lists efficient use of CPU and memory resources and ease of use among its advantages. According to the English wiki, about half of the available BitTorrent clients are based on this library.

Transmission is an open-source cross-platform BitTorrent client. Just like libtorrent, its main advantages are usability and efficient use of resources. The program has no ads, analytics, or paid versions. It also offers a GUI (graphical user interface) for various platforms, as well as headless versions (without a GUI) for installation on servers, routers, etc.

How it was checked

We used the PVS-Studio static analyzer for Linux running in a container with Ubuntu 20.04 via WSL2. To install it, run the following console commands. Instructions for other systems are available in the documentation.

wget -q -O - https://files.viva64.com/etc/pubkey.txt | \
  sudo apt-key add -

sudo wget -O /etc/apt/sources.list.d/viva64.list \
  https://files.viva64.com/etc/viva64.list

sudo apt-get update
sudo apt-get install pvs-studio

Before the first check, enter the license data with the following command:

pvs-studio-analyzer credentials NAME KEY

(where NAME and KEY are the license name and key, respectively).

The license is then saved in the ~/.config/PVS-Studio/ directory, so we do not have to specify it again with every launch.

By the way, about the license… We actively support developers of open-source projects. Therefore, we not only report the bugs we find to the project's repository, but also provide a free PVS-Studio version for them. Everyone else can download the PVS-Studio analyzer and try it in action with a temporary license :)

The easiest way to start the analysis is to ask the build system to generate the compile_commands.json file (which lists all the parameters and commands needed to build the project) and then pass it to the PVS-Studio analyzer. To do this, we add the -DCMAKE_EXPORT_COMPILE_COMMANDS=On argument to the cmake call when configuring the build. For example:

cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=On ..

To start the analysis, run the following command in the folder that contains the compile_commands.json file:

pvs-studio-analyzer analyze -o transmission.log -j 8

Here the -o flag specifies the file in which to save the analyzer results, and the -j flag sets the number of threads used to run the analysis in parallel.

If this way of running PVS-Studio does not suit you, the documentation has examples for various other build systems and compilers.

Another notable point is the use of the SARIF format for viewing the analyzer report. This is especially convenient for developers who prefer the Visual Studio Code editor, because the Sarif Viewer extension available for this editor lets you view the report and jump directly from it to the affected places in the code. The screenshot below shows the check of the Transmission project.

0846_BitTorrent/image3.png

To create a report in the SARIF format when working with PVS-Studio for Linux, run the following command after the analysis:

plog-converter -t sarif -o ./transmission.sarif ./transmission.log -d V1042

Here -t sarif indicates that the result should be saved in the SARIF format, the -o flag sets the name of the report file, and the -d flag excludes the listed diagnostics from the report (V1042 is irrelevant in this case).

Read more about the open standard for the exchange of static analysis results (SARIF) on the OASIS Open website. For an example of integration with GitHub, see the article "How to Get Nice Error Reports Using SARIF in GitHub".

Check results

We should compliment the developers: the code is quite clean, and very few warnings are worth mentioning. Of course, I wanted to find some interesting errors and look into the details, but… alas. The projects are small, and they are clearly maintained by experienced developers. We also found references to third-party static analyzers (Coverity, Cppcheck) in the changelogs. Nevertheless, PVS-Studio managed to find a couple of peculiar mistakes.

Transmission

Let's start with the Transmission project, as it's more popular and more widely used. Note: the code has been shortened and lightly refactored for ease of reading.

Fragment 1: using memset to clear memory.

static void freeMetaUI(gpointer p)
{
  MakeMetaUI* ui = p;

  tr_metaInfoBuilderFree(ui->builder);
  g_free(ui->target);
  memset(ui, ~0, sizeof(MakeMetaUI));
  g_free(ui);
}

Warning V597 The compiler could delete the 'memset' function call, which is used to flush 'ui' object. The memset_s() function should be used to erase the private data. makemeta-ui.c:53

One of the most frequent mistakes is using the memset function to clear memory. In short, the compiler has every right to delete a memset call if it considers the call meaningless. This usually happens when a buffer is cleared at the end of an operation and is never used afterwards. To see for yourself that compilers really do remove such calls, check the same code with Compiler Explorer.

0846_BitTorrent/image5.png

Clang 12.0.1 cuts out the memset call when the -O2 compilation flag is used. Many people may say "whatever", but the problem is that the user's private data may not be wiped. Maybe data privacy isn't a pressing concern for a torrent client, but a developer may write the same code in a more sensitive place. To avoid this, use specially designed functions such as memset_s or RtlSecureZeroMemory. My colleagues have already written about this problem in detail one, two and three times.
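As a rough illustration of the idea, here is a minimal sketch of a wipe routine that compilers do not remove in practice. The name secure_wipe is hypothetical and is not taken from Transmission. The sketch relies on the assumption that stores made through a volatile pointer are kept by the optimizer. Where the platform offers a dedicated function (memset_s from the optional Annex K of C11, explicit_bzero in glibc and the BSDs, SecureZeroMemory on Windows), that function is the better choice.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Transmission.
// Each byte is written through a volatile pointer, so the compiler is
// expected to treat every store as a visible side effect and keep it,
// even if the buffer is freed right after the call.
inline void secure_wipe(void* data, std::size_t size) noexcept
{
    assert(data != nullptr || size == 0);
    volatile unsigned char* p = static_cast<volatile unsigned char*>(data);
    for (std::size_t i = 0; i < size; ++i)
        p[i] = 0;
}
```

In freeMetaUI, a call such as secure_wipe(ui, sizeof(MakeMetaUI)) placed right before g_free(ui) would take the place of the memset call.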

Fragment 2: errors in libraries are also errors.

void jsonsl_jpr_match_state_init(jsonsl_t jsn,
                                 jsonsl_jpr_t *jprs,
                                 size_t njprs)
{
  size_t ii, *firstjmp;
  ...
  jsn->jprs = (jsonsl_jpr_t *)malloc(sizeof(jsonsl_jpr_t) * njprs);
  jsn->jpr_count = njprs;
  jsn->jpr_root = (size_t*)calloc(1, sizeof(size_t) * njprs * jsn->levels_max);
  memcpy(jsn->jprs, jprs, sizeof(jsonsl_jpr_t) * njprs);

  /* Set the initial jump table values */
  firstjmp = jsn->jpr_root;
  for (ii = 0; ii < njprs; ii++) {
    firstjmp[ii] = ii+1;
  }
}

Warning V575: The potential null pointer is passed into 'memcpy' function. Inspect the first argument. Check lines: 1142, 1139. jsonsl.c:1142

Warning V522 There might be dereferencing of a potential null pointer 'firstjmp'. Check lines: 1147, 1141. jsonsl.c:1147

Two problems were hiding in this fragment. Both stem from the missing check of the pointer returned by malloc/calloc. The error may never manifest itself, but the code should still be corrected. Why? It's simple: a developer who uses a third-party library unconditionally entrusts it with part of the work and calculations. Few people would be pleased if the program suddenly damaged important data, especially because of a third-party library. This problem and its solutions are described in more detail in one of our previous articles: "Why it is important to check what the malloc function returns".

The analyzer also revealed similar suspicious code fragments:

V522 There might be dereferencing of a potential null pointer 'jsn'. Check lines: 117, 113. jsonsl.c:117
V522 There might be dereferencing of a potential null pointer 'i'. DetailsDialog.cc:133
V522 There might be dereferencing of a potential null pointer. TorrentFilter.cc:320
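The general pattern for such allocations can be sketched as follows. The function checked_array_dup is a hypothetical helper written for this article, not part of jsonsl or Transmission. It only shows the idea: check the result of malloc before passing it to memcpy, and let the caller decide how to report the failure.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not part of jsonsl.
// Copies `count` elements of `elem_size` bytes each into a new buffer.
// Returns nullptr on invalid input or allocation failure, so the caller
// can handle the error instead of dereferencing a null pointer.
inline void* checked_array_dup(const void* src, std::size_t count,
                               std::size_t elem_size)
{
    if (src == nullptr || count == 0 || elem_size == 0)
        return nullptr;

    // Reject sizes whose product would wrap around std::size_t.
    if (count > static_cast<std::size_t>(-1) / elem_size)
        return nullptr;

    void* copy = std::malloc(count * elem_size);
    if (copy == nullptr) // allocation failed: do not touch the pointer
        return nullptr;

    std::memcpy(copy, src, count * elem_size);
    return copy;
}
```

The caller owns the returned buffer and releases it with std::free.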

libtorrent

Let's wrap up with Transmission and see what interesting things we found in the libtorrent project.

Fragment 1: insufficient check of array indexes

template <typename Handler>
void handshake2(error_code const& e, Handler h)
{
  ...
  std::size_t const read_pos = m_buffer.size();
  ...
  if (m_buffer[read_pos - 1] == '\n' && read_pos > 2) // <=
  {
    if (m_buffer[read_pos - 2] == '\n')
    {
      found_end = true;
    }
    else if (read_pos > 4
      && m_buffer[read_pos - 2] == '\r'
      && m_buffer[read_pos - 3] == '\n'
      && m_buffer[read_pos - 4] == '\r')
    {
      found_end = true;
    }
  }
  ...
}

Warning V781 The value of the 'read_pos' index is checked after it was used. Perhaps there is a mistake in program logic. http_stream.hpp:166.

A classic mistake. The developer first reads the m_buffer element at index read_pos - 1 and only then checks that read_pos is valid (read_pos > 2). If the buffer is empty, read_pos - 1 wraps around to the largest std::size_t value, and the access goes far out of bounds. It's hard to say what would happen in practice: perhaps unrelated memory would be read, or perhaps an access violation would occur. After all, undefined behavior is called that for a reason :) The correct solution here is to swap the two checks:

if (read_pos > 2 && m_buffer[read_pos - 1] == '\n')
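To make the effect of the reordering concrete, here is a small self-contained sketch. The free function ends_with_blank_line and the use of std::string as the buffer type are assumptions made for illustration. In libtorrent the logic lives inside handshake2 and works on m_buffer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-alone version of the end-of-headers test.
// The size check comes first: && short-circuits, so buf[n - 1] is never
// evaluated for an empty or very short buffer.
inline bool ends_with_blank_line(std::string const& buf)
{
    std::size_t const n = buf.size();
    if (n > 2 && buf[n - 1] == '\n')
    {
        if (buf[n - 2] == '\n')
            return true;

        if (n > 4
            && buf[n - 2] == '\r'
            && buf[n - 3] == '\n'
            && buf[n - 4] == '\r')
            return true;
    }
    return false;
}
```

The size thresholds (2 and 4) are copied from the original fragment as they are.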

Fragment 2, 3: overwriting values

void dht_tracker::dht_status(session_status& s)
{
  s.dht_torrents += int(m_storage.num_torrents());    // <=

  s.dht_nodes = 0;
  s.dht_node_cache = 0;
  s.dht_global_nodes = 0;
  s.dht_torrents = 0;                                 // <=
  s.active_requests.clear();
  s.dht_total_allocations = 0;

  for (auto& n : m_nodes)
    n.second.dht.status(s);
}

Warning V519 The 's.dht_torrents' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 205, 210. dht_tracker.cpp 210.

In this fragment the variable s.dht_torrents is changed twice: first a value is added to it, and a couple of lines later it is reset to zero without being used in between. In other words, we are dealing with a so-called dead store. It's hard to say what the code should look like, as the session_status type contains a large number of fields. Perhaps one of the assignments is superfluous, or the wrong variable is accidentally reset to zero.

A similar problem lies in the next code fragment. It is aggravated by the fact that the overwritten variables are harder to notice because of the large amount of code and comments between them. Moreover, three variables are involved here at once, and one of them gets the same value it had before the condition. Such problems are hard to catch manually, but static analysis finds them with ease:
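The dead store pattern itself is easy to show on a reduced example. The sketch below uses a hypothetical status_sketch struct and a hypothetical collect_status function. It does not claim to be the right fix for libtorrent. It only demonstrates one ordering in which no write is lost: reset every counter first, then accumulate.

```cpp
#include <vector>

// Hypothetical reduced model of a status structure.
struct status_sketch
{
    int torrents = 0;
    int nodes = 0;
};

// Reset first, accumulate second. With this order the value added to
// `torrents` is not overwritten by a later reset.
inline void collect_status(status_sketch& s, int local_torrents,
                           std::vector<int> const& per_node_counts)
{
    s.torrents = 0;
    s.nodes = 0;

    s.torrents += local_torrents;
    for (int n : per_node_counts)
        s.nodes += n;
}
```

If the accumulation were placed before the reset, as in the original fragment, the value added to torrents would be thrown away.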

void torrent::bytes_done(torrent_status& st, status_flags_t const flags) const
{
  ...
  st.total_done = 0;
  st.total_wanted_done = 0;
  st.total_wanted = m_size_on_disk;
  ...
  if (m_seed_mode || is_seed())
  {
    st.total_done = m_torrent_file->total_size() - m_padding_bytes;
    st.total_wanted_done = m_size_on_disk;
    st.total_wanted = m_size_on_disk;
    ...
    return;
  }
  else if (!has_picker())
  {
    st.total_done = 0;
    st.total_wanted_done = 0;
    st.total_wanted = m_size_on_disk;
    return;
  }
  ...
}

Warnings from PVS-Studio:

V1048 The 'st.total_wanted' variable was assigned the same value. torrent.cpp 3784
V1048 The 'st.total_done' variable was assigned the same value. torrent.cpp 3792
V1048 The 'st.total_wanted_done' variable was assigned the same value. torrent.cpp 3793
V1048 The 'st.total_wanted' variable was assigned the same value. torrent.cpp 3794

Fragment 4: failed explicit type conversion

void torrent::get_download_queue(std::vector<partial_piece_info>* queue) const
{
  ...
  const int blocks_per_piece = m_picker->blocks_in_piece(piece_index_t(0));
  ...
  int counter = 0;
  for (auto i = q.begin(); i != q.end(); ++i, ++counter)
  {
    partial_piece_info pi;
    ...
    pi.blocks = &blk[std::size_t(counter * blocks_per_piece)];
  }
}

Warning V1028 Possible overflow. Consider casting operands of the 'counter * blocks_per_piece' operator to the 'size_t' type, not the result. torrent.cpp 7092

Here an explicit conversion to size_t is used to index the array correctly. The problem is that both operands are signed integers, so the multiplication is performed in int and may overflow before the result is converted to size_t. Such code often appears when developers try to quickly silence compiler warnings, but that only multiplies the errors. In this case it is enough to cast at least one operand to size_t to fix the problem. Something like this:

pi.blocks = &blk[std::size_t(counter) * blocks_per_piece];

Similar problems are also found in the following fragments:

V1028 Possible overflow. Consider casting operands of the 'new_size_words + 1' operator to the 'size_t' type, not the result. bitfield.cpp 179
V1028 Possible overflow. Consider casting operands of the 'm_capacity + amount_to_grow' operator to the 'size_t' type, not the result. heterogeneous_queue.hpp 207
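The difference between casting the result and casting the operands can be sketched with a small helper. The function block_offset is hypothetical and only mirrors the shape of the expression from get_download_queue. The sketch assumes both inputs are non-negative.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the index computation.
// Both operands are widened to std::size_t before the multiplication,
// so the product is computed in the wider unsigned type rather than in int.
inline std::size_t block_offset(int counter, int blocks_per_piece)
{
    assert(counter >= 0 && blocks_per_piece >= 0);
    return static_cast<std::size_t>(counter)
         * static_cast<std::size_t>(blocks_per_piece);
}
```

On a platform with a 64-bit size_t, block_offset(70000, 70000) yields 4900000000, a value that does not fit into a 32-bit int. Writing std::size_t(counter * blocks_per_piece) with the same inputs would overflow int first, which is undefined behavior.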

Fragment 5: unnecessary conditions

We found many warnings about unnecessary conditions in both libtorrent and Transmission. They are not false positives, but listing them all makes little sense because they are not that interesting. To illustrate, look at the following fragment:

char const* operation_name(operation_t const op)
{
  ...
  static char const* const names[] = {
    ...
  };

  int const idx = static_cast<int>(op);
  if (idx < 0 || idx >= int(sizeof(names) / sizeof(names[0])))
    return "unknown operation";
  return names[idx];
}

Warning V560 A part of conditional expression is always false: idx < 0. alert.cpp 1885.

Here the analyzer warns that the idx < 0 check makes no sense, because idx gets its value from an enumeration whose underlying type is unsigned:

enum class operation_t : std::uint8_t

Should we pay attention to such warnings? Every developer may have their own opinion. Some will say it's pointless to fix them because they do not point to real errors. Others, on the contrary, will say there is no reason to clutter the code with redundant checks. We think such diagnostics are a great way to find good places for future refactoring.
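For those who prefer to remove the redundant check, one possible shape of the refactoring is sketched below. The enumeration op_sketch, its three values and the function op_name are hypothetical stand-ins for operation_t and operation_name. The static_assert documents the assumption that the underlying type is unsigned, so the single upper-bound comparison stays correct if the enumeration is changed later.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for operation_t.
enum class op_sketch : std::uint8_t { read, write, connect };

inline char const* op_name(op_sketch const op)
{
    static char const* const names[] = { "read", "write", "connect" };

    // Compile-time guard: if the underlying type ever becomes signed,
    // the build fails instead of silently dropping the lower-bound check.
    static_assert(
        std::is_unsigned<std::underlying_type<op_sketch>::type>::value,
        "op_sketch must have an unsigned underlying type");

    std::size_t const idx = static_cast<std::size_t>(op);
    if (idx >= sizeof(names) / sizeof(names[0]))
        return "unknown operation";
    return names[idx];
}
```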

Conclusion

As you can see, there were not many captivating mistakes, which speaks to the high quality and cleanliness of the code in the tested projects. The projects have existed for quite a long time and are actively developed by the open-source community. Judging by the commit history, they have been checked by static analyzers before.

The PVS-Studio team loves and actively supports open-source projects. That's why we not only report bugs to developers, but also give them the opportunity to use PVS-Studio for free. Let me also quickly remind you about our free license for students and teachers. For commercial projects, download and try our PVS-Studio analyzer. To do this, request a trial license on our website :)
