SvyatoslavMC Oct 22 2019 at 07:05

Analyzing the Code of ROOT, Scientific Data Analysis Framework

14 min

2.5K

PVS-Studio corporate blogBig Data*C*C++*Open source*

While Stockholm was holding the 118th Nobel Week, I was sitting in our office, where we develop the PVS-Studio static analyzer, working on an analysis review of the ROOT project, a big-data processing framework used in scientific research. This code wouldn't win a prize, of course, but the authors can definitely count on a detailed review of the most interesting defects plus a free license to thoroughly check the project on their own.

Introduction

ROOT is a modular scientific software toolkit. It provides all the functionalities needed to deal with big data processing, statistical analysis, visualisation and storage. It is mainly written in C++. ROOT was born at CERN, at the heart of the research on high-energy physics. Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations.

PVS-Studio is a tool for detecting software bugs and potential vulnerabilities in the source code of programs written in C, C++, C#, and Java. It runs on 64-bit Windows, Linux, and macOS and can analyze source code written for 32-bit, 64-bit, and embedded ARM platforms.

A new diagnostic's debut

V1046 Unsafe usage of the bool' and 'int' types together in the operation '&='. GSLMultiRootFinder.h 175

int AddFunction(const ROOT::Math::IMultiGenFunction & func) {
  ROOT::Math::IMultiGenFunction * f = func.Clone();
  if (!f) return 0;
  fFunctions.push_back(f);
  return fFunctions.size();
}

template<class FuncIterator>
bool SetFunctionList( FuncIterator begin, FuncIterator end) {
  bool ret = true;
  for (FuncIterator itr = begin; itr != end; ++itr) {
    const ROOT::Math::IMultiGenFunction * f = *itr;
    ret &= AddFunction(*f);
  }
  return ret;
}

First off, here's a wonderful bug found by the beta version of PVS-Studio, which I was using for this review.

Expectations. The SetFunctionList function traverses an iterator list. If at least one iterator is invalid, the function returns false, or true otherwise.

Reality. The SetFunctionList function can return false even for valid iterators. Let's figure out why. TheAddFunction function returns the number of valid iterators on the fFunctions list. That is, adding non-null iterators will cause the list to incrementally grow in size: 1, 2, 3, 4, and so on. This is where the bug comes into play:

ret &= AddFunction(*f);

Since the function returns a value of type int rather than bool, the '&=' operation will return false for even values because the least significant bit of an even number is always set to zero. This is how one subtle bug can break the return value of SetFunctionsList even when its arguments are valid.

Errors in conditional expressions

V501 There are identical sub-expressions to the left and to the right of the '&&' operator: module && module rootcling_impl.cxx 3650

virtual void HandleDiagnostic(....) override
{
  ....
  bool isROOTSystemModuleDiag = module && ....;
  bool isSystemModuleDiag = module && module && module->IsSystem;
  if (!isROOTSystemModuleDiag && !isSystemModuleDiag)
    fChild->HandleDiagnostic(DiagLevel, Info);
  ....
}

Let's start with the least harmful bug. The module pointer is checked twice. One of the checks is probably redundant, yet it would still be wise to fix it to avoid any confusion in the future.

V501 There are identical sub-expressions 'strchr(fHostAuth->GetHost(), '*')' to the left and to the right of the '||' operator. TAuthenticate.cxx 300

TAuthenticate::TAuthenticate(TSocket *sock, const char *remote,
                             const char *proto, const char *user)
{
  ....
  // If generic THostAuth (i.e. with wild card or user == any)
  // make a personalized memory copy of this THostAuth
  if (strchr(fHostAuth->GetHost(),'*') || strchr(fHostAuth->GetHost(),'*') ||
     fHostAuth->GetServer() == -1 ) {
    fHostAuth = new THostAuth(*fHostAuth);
    fHostAuth->SetHost(fqdn);
    fHostAuth->SetUser(checkUser);
    fHostAuth->SetServer(servtype);
  }
  ....
}

The fHostAuth->GetHost() string is scanned for the '*' character twice. One of these checks was probably meant to look for the '?' character as these two characters are typically the ones used to specify various wildcard masks.

V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 163, 165. TProofMonSenderML.cxx 163

Int_t TProofMonSenderML::SendSummary(TList *recs, const char *id)
{
  ....
  if (fSummaryVrs == 0) {
    if ((dsn = recs->FindObject("dataset"))) recs->Remove(dsn);
  } else if (fSummaryVrs == 0) {
    // Only the first records
    xrecs = new TList;
    xrecs->SetOwner(kFALSE);
    TIter nxr(recs);
    TObject *o = 0;
    while ((o = nxr())) {
       if (!strcmp(o->GetName(), "vmemmxw")) break;
       xrecs->Add(o);
    }
  }
  ....
}

The fSummaryVrs variable is compared with zero twice, so execution never reaches the code in the else-if branch. And there's quite a bit of code there…

V523 The 'then' statement is equivalent to the 'else' statement. TKDTree.cxx 805

template <typename  Index, typename Value>
void TKDTree<Index, Value>::UpdateRange(....)
{
  ....
  if (point[fAxis[inode]]<=fValue[inode]){
    //first examine the node that contains the point
    UpdateRange(GetLeft(inode),point, range, res);
    UpdateRange(GetRight(inode),point, range, res);
  } else {
    UpdateRange(GetLeft(inode),point, range, res);
    UpdateRange(GetRight(inode),point, range, res);
  }
  ....
}

The same block of code, which is a copy-paste clone, is executed no matter the condition. I guess there's a confusion between the words left and right.

The project is full of suspicious spots like that:

V523 The 'then' statement is equivalent to the 'else' statement. TContainerConverters.cxx 51
V523 The 'then' statement is equivalent to the 'else' statement. TWebFile.cxx 1310
V523 The 'then' statement is equivalent to the 'else' statement. MethodMLP.cxx 423
V523 The 'then' statement is equivalent to the 'else' statement. RooAbsCategory.cxx 394

V547 Expression '!file_name_value.empty()' is always false. SelectionRules.cxx 1423

bool SelectionRules::AreAllSelectionRulesUsed() const {
  for(auto&& rule : fClassSelectionRules){
    ....
    std::string file_name_value;
    if (!rule.GetAttributeValue("file_name", file_name_value))
     file_name_value.clear();

    if (!file_name_value.empty()) {                  // <=
      // don't complain about defined_in rules
      continue;
    }

    const char* attrName = nullptr;
    const char* attrVal = nullptr;
    if (!file_name_value.empty()) {                  // <=
      attrName = "file name";
      attrVal = file_name_value.c_str();
    } else {
      attrName = "class";
      if (!name.empty()) attrVal = name.c_str();
    }
    ROOT::TMetaUtils::Warning(0,"Unused %s rule: %s\n", attrName, attrVal);
  }
  ....
}

This is probably not a bug; the analyzer just found some code that can be simplified. Since the return value of file_name_value.empty() is already checked at the beginning of the loop, the second duplicate check can be removed, thus throwing away a good amount of unnecessary code.

V590 Consider inspecting the '!file1 || c <= 0 || c == '*' || c != '('' expression. The expression is excessive or contains a misprint. TTabCom.cxx 840

TString TTabCom::DetermineClass(const char varName[])
{
  ....
  c = file1.get();
  if (!file1 || c <= 0 || c == '*' || c != '(') {
    Error("TTabCom::DetermineClass", "variable \"%s\" not defined?",
        varName);
    goto cleanup;
  }
  ....
}

Here's the problem part of the conditional expression reported by the analyzer:

if (.... || c == '*' || c != '(') {
  ....
}

The check for the asterisk character won't affect the condition's result. This part will always be true for any character other than '('. You can easily check it for yourself by drawing a truth table.

Two more warnings about conditions with strange logic:

V590 Consider inspecting this expression. The expression is excessive or contains a misprint. TFile.cxx 3963
V590 Consider inspecting this expression. The expression is excessive or contains a misprint. TStreamerInfoActions.cxx 3084

V593 Consider reviewing the expression of the 'A = B < C' kind. The expression is calculated as following: 'A = (B < C)'. TProofServ.cxx 1903

Int_t TProofServ::HandleSocketInput(TMessage *mess, Bool_t all)
{
  ....
  if (Int_t ret = fProof->AddWorkers(workerList) < 0) {
    Error("HandleSocketInput:kPROOF_GETSLAVEINFO",
          "adding a list of worker nodes returned: %d", ret);
  }
  ....
}

This bug reveals itself only in the case of the program's faulty behavior. The ret variable is supposed to store the return code of the AddWorkers function and write that value to the log in case of error condition. But it doesn't work as intended. The condition lacks additional parentheses forcing the desired order of evaluation. What the ret variable actually stores is not the return code but the result of the logical comparison, i.e. either 0 or 1.

Another similar problem:

V593 Consider reviewing the expression of the 'A = B < C' kind. The expression is calculated as following: 'A = (B < C)'. TProofServ.cxx 3897

V768 The enumeration constant 'kCostComplexityPruning' is used as a variable of a Boolean-type. MethodDT.cxx 283

enum EPruneMethod {kExpectedErrorPruning=0, kCostComplexityPruning, kNoPruning};

void TMVA::MethodDT::ProcessOptions()
{
  ....
  if (fPruneStrength < 0) fAutomatic = kTRUE;
  else fAutomatic = kFALSE;
  if (fAutomatic && fPruneMethod==!DecisionTree::kCostComplexityPruning){
    Log() << kFATAL
          <<  "Sorry automatic pruning strength determination is ...." << Endl;
  }
  ....
}

Hm… Why negate the constant value kCostComplexityPruning? I suspect the negation character is a typo, which now distorts the execution logic.

Pointer handling errors

V522 Dereferencing of the null pointer 'pre' might take place. TSynapse.cxx 61

void TSynapse::SetPre(TNeuron * pre)
{
  if (pre) {
    Error("SetPre","this synapse is already assigned to a pre-neuron.");
    return;
  }
  fpre = pre;
  pre->AddPost(this);
}

I did my best trying to understand this strange code, and it seems the idea was to avoid assigning a new value to the fpre field. If so, the programmer is accidentally checking the wrong pointer. The current implementation leads to dereferencing a null pointer if you pass the nullptr value to the SetPre function.

I think this snippet should be fixed as follows:

void TSynapse::SetPre(TNeuron * pre)
{
  if (fpre) {
    Error("SetPre","this synapse is already assigned to a pre-neuron.");
    return;
  }
  fpre = pre;
  pre->AddPost(this);
}

This, however, wouldn't prevent the passing of a null pointer to the function, but at least this version is more logically consistent than the original one.

A slightly modified clone of this code can be found in another spot:

V522 Dereferencing of the null pointer 'post' might take place. TSynapse.cxx 74

V595 The 'N' pointer was utilized before it was verified against nullptr. Check lines: 484, 488. Scanner.cxx 484

bool RScanner::shouldVisitDecl(clang::NamedDecl *D)
{
   if (auto M = D->getOwningModule()) {                      // <= 2
      return fInterpreter.getSema().isModuleVisible(M);
   }
   return true;
}

bool RScanner::VisitNamespaceDecl(clang::NamespaceDecl* N)
{
 if (fScanType == EScanType::kOnePCM)
  return true;

 if (!shouldVisitDecl(N))                                    // <= 1
  return true;

 if((N && N->isImplicit()) || !N){                           // <= 3
    return true;
 }
 ....
}

This is an extremely dangerous piece of code! The N pointer isn't checked for null before it gets dereferenced the first time. What is more, you can't see it happen here because the dereference takes place inside the shouldVisitDecl function.

This diagnostic traditionally generates a bunch of relevant warnings. Here are just a few examples:

V595 The 'file' pointer was utilized before it was verified against nullptr. Check lines: 141, 153. TFileCacheRead.cxx 141
V595 The 'fFree' pointer was utilized before it was verified against nullptr. Check lines: 2029, 2038. TFile.cxx 2029
V595 The 'tbuf' pointer was utilized before it was verified against nullptr. Check lines: 586, 591. TGText.cxx 586
V595 The 'fPlayer' pointer was utilized before it was verified against nullptr. Check lines: 3425, 3430. TProof.cxx 3425
V595 The 'gProofServ' pointer was utilized before it was verified against nullptr. Check lines: 1192, 1194. TProofPlayer.cxx 1192
V595 The 'projDataTmp' pointer was utilized before it was verified against nullptr. Check lines: 791, 804. RooSimultaneous.cxx 791

The next one is not a bug, but it's yet another example of how macros encourage writing faulty or redundant code.

V571 Recurring check. The 'if (fCanvasImp)' condition was already verified in line 799. TCanvas.cxx 800

#define SafeDelete(p) { if (p) { delete p; p = 0; } }

void TCanvas::Close(Option_t *option)
{
  ....
  if (fCanvasImp)
    SafeDelete(fCanvasImp);
  ....
}

The fCanvasImp pointer is checked twice, with one of the checks already implemented in the SafeDelete macro. One of the problems with macros is that they are difficult to navigate from within the code, which is the reason why many programmers don't examine their contents before use.

Array handling errors

V519 The 'Line[Cursor]' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 352, 353. Editor.cpp 353

size_t find_last_non_alnum(const std::string &str,
                             std::string::size_type index = std::string::npos) {
  ....
  char tmp = Line.GetText()[Cursor];
  Line[Cursor] = Line[Cursor - 1];
  Line[Cursor] = tmp;
  ....
}

The element Line[Cursor] is assigned a new value, which then immediately gets overwritten. That doesn't look right…

V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 130

bool BasicMinimizer::SetVariableValue(unsigned int ivar, double val) {
  if (ivar > fValues.size() ) return false;
  fValues[ivar] = val;
  return true;
}

Making this mistake when checking array indexes is a recent trend; we see it in almost every third project. While indexing into an array inside a loop is easy – you typically use the '<' operator to compare the index with the array's size – checks like the one shown above require the '>=' operator, not '>'. Otherwise you risk indexing one element beyond the array's bound.

This bug was cloned throughout the code a few times:

V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 186
V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 194
V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 209
V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 215
V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 230

V621 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. TDataMember.cxx 554

Int_t TDataMember::GetArrayDim() const
{
 if (fArrayDim<0 && fInfo) {
    R__LOCKGUARD(gInterpreterMutex);
    TDataMember *dm = const_cast<TDataMember*>(this);
    dm->fArrayDim = gCling->DataMemberInfo_ArrayDim(fInfo);
    // fArrayMaxIndex should be zero
    if (dm->fArrayDim) {
       dm->fArrayMaxIndex = new Int_t[fArrayDim];
       for(Int_t dim = 0; dim < fArrayDim; ++dim) {
          dm->fArrayMaxIndex[dim] = gCling->DataMemberInfo_MaxIndex(fInfo,dim);
       }
    }
 }
 return fArrayDim;
}

In the for loop, the developers apparently meant to compare the dim variable with dm->fArrayDim rather than fArrayDim. The value of fArrayDim is negative, which is guaranteed by the condition at the beginning of the function. Consequently, this loop will never execute.

V767 Suspicious access to element of 'current' array by a constant index inside a loop. TClingUtils.cxx 3082

llvm::StringRef ROOT::TMetaUtils::DataMemberInfo__ValidArrayIndex(....)
{
  ....
  while (current!=0) {
    // Check the token
    if (isdigit(current[0])) {
       for(i=0;i<strlen(current);i++) {
          if (!isdigit(current[0])) {
             if (errstr) *errstr = current;
             if (errnum) *errnum = NOT_INT;
             return llvm::StringRef();
          }
       }
    } else { // current token is not a digit
      ....
    }
    ....
  }
  ....
}

This code is parsing and checking some string. If the current string's first character (i.e. at index 0) has been recognized as a number, the loop will traverse all the rest characters to make sure all of them are numbers. Well, at least that's the idea. The problem is, the i counter is not used in the loop. The condition should be rewritten so that it checks current[i] rather than current[0].

Memory leak

V773 The function was exited without releasing the 'optionlist' pointer. A memory leak is possible. TDataMember.cxx 355

void TDataMember::Init(bool afterReading)
{
  ....
  TList *optionlist = new TList();       //storage for options strings

  for (i=0;i<token_cnt;i++) {
     if (strstr(tokens[i],"Items")) {
        ptr1 = R__STRTOK_R(tokens[i], "()", &rest);
        if (ptr1 == 0) {
           Fatal("TDataMember","Internal error, found \"Items....",GetTitle());
           return;
        }
        ptr1 = R__STRTOK_R(nullptr, "()", &rest);
        if (ptr1 == 0) {
           Fatal("TDataMember","Internal error, found \"Items....",GetTitle());
           return;
        }
        ....
     }
     ....
  }
  ....
  // dispose of temporary option list...
  delete optionlist;
  ....
}

The optionList pointer is not freed before returning from the function. I don't know whether such freeing is necessary in this particular case, but when we report errors like that, developers usually fix them. It all depends on whether or not you want your program to keep running in case of error condition. ROOT has a bunch of defects like that, so I'd advise the authors to recheck the project themselves.

memset again

V597 The compiler could delete the 'memset' function call, which is used to flush 'x' buffer. The memset_s() function should be used to erase the private data. TMD5.cxx 366

void TMD5::Transform(UInt_t buf[4], const UChar_t in[64])
{
  UInt_t a, b, c, d, x[16];
  ....
  // Zero out sensitive information
  memset(x, 0, sizeof(x));
}

Many think the comment won't make it to the binary file after compilation, and they are absolutely correct :D. What some may not know is that the compiler will remove the memset function as well. And this will happen for sure. If the buffer in question is no longer used further in the code, the compiler will optimize away the function call. Technically, it's a reasonable decision, but if the buffer was storing any private data, those data will stay there. This is a classic security weakness CWE-14.

Miscellaneous

V591 Non-void function should return a value. LogLikelihoodFCN.h 108

LogLikelihoodFCN & operator = (const LogLikelihoodFCN & rhs) {
  SetData(rhs.DataPtr() );
  SetModelFunction(rhs.ModelFunctionPtr() );
  fNEffPoints = rhs.fNEffPoints;
  fGrad = rhs.fGrad;
  fIsExtended = rhs.fIsExtended;
  fWeight = rhs.fWeight;
  fExecutionPolicy = rhs.fExecutionPolicy;
}

The overloaded operator has no return value. This is another recent trend.

V596 The object was created but it is not being used. The 'throw' keyword could be missing: throw runtime_error(FOO); RTensor.hxx 363

template <typename Value_t, typename Container_t>
inline RTensor<Value_t, Container_t> RTensor<Value_t, Container_t>::Transpose()
{
  if (fLayout == MemoryLayout::RowMajor) {
    fLayout = MemoryLayout::ColumnMajor;
  } else if (fLayout == MemoryLayout::ColumnMajor) {
    fLayout = MemoryLayout::RowMajor;
  } else {
    std::runtime_error("Memory layout is not known.");
  }
  ....
}

The problem is that the programmer accidentally left out the throw keyword, thus preventing the throwing of an exception in case of error condition.

There were only two warnings of this type. Here's the second:

V596 The object was created but it is not being used. The 'throw' keyword could be missing: throw runtime_error(FOO); Forest.hxx 137

V609 Divide by zero. Denominator range [0..100]. TGHtmlImage.cxx 340

const char *TGHtml::GetPctWidth(TGHtmlElement *p, char *opt, char *ret)
{
  int n, m, val;
  ....
  if (n < 0 || n > 100) return z;
  if (opt[0] == 'h') {
    val = fCanvas->GetHeight() * 100;
  } else {
    val = fCanvas->GetWidth() * 100;
  }
  if (!fInTd) {
    snprintf(ret, 15, "%d", val / n);  // <=
  } else {
    ....
  }
  ....
}

This one is similar to the array handling examples discussed earlier. The n variable is limited to the range from 0 up to 100. But then there's a branch that performs division by the n variable which may have the value 0. I think the range limits of n should be fixed as follows:

if (n <= 0 || n > 100) return z;

V646 Consider inspecting the application's logic. It's possible that 'else' keyword is missing. TProofServ.cxx 729

TProofServ::TProofServ(Int_t *argc, char **argv, FILE *flog)
       : TApplication("proofserv", argc, argv, 0, -1)
{
  ....
  if (!logmx.IsDigit()) {
    if (logmx.EndsWith("K")) {
      xf = 1024;
      logmx.Remove(TString::kTrailing, 'K');
    } else if (logmx.EndsWith("M")) {
      xf = 1024*1024;
      logmx.Remove(TString::kTrailing, 'M');
    } if (logmx.EndsWith("G")) {
      xf = 1024*1024*1024;
      logmx.Remove(TString::kTrailing, 'G');
    }
  }
  ....
}

The analyzer reports a strangely formatted if statement with the missing else keyword. The way this code looks suggests that it does need to be fixed.

A couple more warnings of this type:

V646 Consider inspecting the application's logic. It's possible that 'else' keyword is missing. TFormula_v5.cxx 3702
V646 Consider inspecting the application's logic. It's possible that 'else' keyword is missing. RooAbsCategory.cxx 604

V663 Infinite loop is possible. The 'cin.eof()' condition is insufficient to break from the loop. Consider adding the 'cin.fail()' function call to the conditional expression. MethodKNN.cxx 602

void TMVA::MethodKNN::ReadWeightsFromStream(std::istream& is)
{
  ....
  while (!is.eof()) {
    std::string line;
    std::getline(is, line);

    if (line.empty() || line.find("#") != std::string::npos) {
       continue;
    }
    ....
  }
  ....
}

When working with the std::istream class, calling the eof() function is not enough to terminate the loop. The eof() function will always return false if the data can't be read, and there are no other termination points in this code. To guarantee the termination of the loop, an additional check of the value returned by the fail() function is required:

while (!is.eof() && !is.fail())
{ 
....
}

As an alternative, it can be rewritten as follows:

while (is)
{ 
....
}

V678 An object is used as an argument to its own method. Consider checking the first actual argument of the 'Copy' function. TFormLeafInfo.cxx 2414

TFormLeafInfoMultiVarDim::TFormLeafInfoMultiVarDim(
  const TFormLeafInfoMultiVarDim& orig) : TFormLeafInfo(orig)
{
   fNsize = orig.fNsize;
   fSizes.Copy(fSizes);   // <=
   fCounter2 = orig.fCounter2?orig.fCounter2->DeepCopy():0;
   fSumOfSizes = orig.fSumOfSizes;
   fDim = orig.fDim;
   fVirtDim = orig.fVirtDim;
   fPrimaryIndex = orig.fPrimaryIndex;
   fSecondaryIndex = orig.fSecondaryIndex;
}

Let's finish the article with this nice little typo. The Copy function should be called with orig.fSizes, not fSizes.

Conclusion

About one year ago, we checked the NCBI Genome Workbench project, which is another program used in scientific research that deals with genome analysis. I'm mentioning this because the quality of scientific software is extremely crucial, yet developers tend to underestimate it.

By the way, macOS 10.15 Catalina was released the other day, where they ceased support of 32-bit applications. Luckily, PVS-Studio offers a large set of diagnostics specifically designed to detect bugs that accompany the porting of programs to 64-bit systems. Learn more in this post by the PVS-Studio team.

Hubs: