There is an open project COVID-19 CovidSim Model, written in C++. There is also a PVS-Studio static code analyzer that detects errors very well. One day they met. Embrace the fragility of mathematical modeling algorithms and why you need to make every effort to enhance the code quality.
This little story begins with my ordinary search on GitHub. While looking through the search results, I accidentally came across the COVID-19 CovidSim Model project. Without thinking twice, I decided to check it using the PVS-Studio analyzer.
The project turned out to be tiny. It contains only 13,000 lines of code, not counting empty lines and comments. And there are almost no errors there either. But one mistake is so simple and beautiful that I couldn't pass it by!
void CalcLikelihood(int run, std::string const& DataFile,
std::string const& OutFileBase)
{
....
double m = Data[row][col]; // numerator
double N = Data[row][col + 1]; // denominator
double ModelValue;
// loop over all days of infection up to day of sample
for (int k = offset; k < day; k++)
{
// add P1 to P2 to prevent degeneracy
double prob_seroconvert = P.SeroConvMaxSens *
(1.0 - 0.5 * ((exp(-((double)(_I64(day) - k)) * P.SeroConvP1) + 1.0) *
exp(-((double)(_I64(day) - k)) * P.SeroConvP2)));
ModelValue += c * TimeSeries[k - offset].incI * prob_seroconvert;
}
ModelValue += c * TimeSeries[day - offset].S * (1.0 - P.SeroConvSpec);
ModelValue /= ((double)P.PopSize);
// subtract saturated likelihood
LL += m * log((ModelValue + 1e-20) / (m / N + 1e-20)) +
(N - m) * log((1.0 - ModelValue + 1e-20) / (1.0 - m / N + 1e-20));
....
}
Serious scientific code. Something is calculated. Formulas. Everything looks smart and detailed.
But all these calculations shattered into pieces by human inattention. It's good that the PVS-Studio code analyzer can come to the rescue and point out the bug: V614 [CWE-457] Uninitialized variable 'modelValue' used. CovidSim.cpp 5412
Indeed, let's take a closer look at it:
double ModelValue;
for (int k = offset; k < day; k++)
{
double prob_seroconvert = ....;
ModelValue += c * TimeSeries[k - offset].incI * prob_seroconvert;
}
We are facing a simple and at the same time terrible error: an uninitialized variable. This algorithm can calculate anything.
Well, that's it. There is nothing to explain here. It only remains to remind again that developers of scientific libraries and scientific applications should make additional efforts to ensure the code quality. Crash of an ordinary application is likely to cost much less than the use of incorrect results for scientific, medical, and other calculations.
This is not our first article on this topic:
- Analyzing the Code of ROOT, Scientific Data Analysis Framework
- NCBI Genome Workbench: Scientific Research under Threat
- The Big Calculator Gone Crazy
Use the PVS-Studio static code analyzer! When errors are timely detected you can expect enormous benefits. Thanks for your attention!