Methodology for calculating results of a task set: taking into account its level of difficulty

Yury Korolev (CEO Winkid)

In academic knowledge evaluation, comparing large sets of results objectively is a serious problem. Can a student who gets B marks in an Advanced Maths class be evaluated on equal terms with a student who gets B marks in a General Maths class? Can we create a system that takes into account the level of difficulty those students face?

This article describes a system of independent evaluation that we use for school olympiads in five subjects (Mathematics, English Language, Russian Language, Tatar Language, Social Science) for students in grades 1 to 11. In each academic year we organise six qualification tournaments, with about 15,000 students from different regions of Russia. We then select the top ten participants in each subject and each grade for the final (seventh) tournament, where only the best of the best are chosen. This means that 550 participants (5 subjects × 11 grades × 10 places) compete in the final tournament, which is about 5.5% of all participants in the academic year.

These tournaments cannot be perfectly homogeneous, and the level of difficulty inevitably varies from one set of tasks to another. It is therefore critical for us to take those variations in difficulty into account and to calculate the results as objectively as possible.

How do we calculate the scores?

The top ten participants in each subject and grade (550 in total) are selected based on their two best results in the school year.

Each participant earns points based on the number of correct answers.

The tasks are divided into three groups, and a more difficult task is worth more: 1 point, 2 points or 3 points.

Level of difficulty | Maths: Number of Questions | Maths: Sum of Points | Other subjects: Number of Questions | Other subjects: Sum of Points
Simple task (1 point) | | | |
Medium task (2 points) | | | |
Hard task (3 points) | | | |
There are fewer questions in the Maths olympiads because, unlike in other subjects, the solutions involve calculations and take more time. Since all our olympiads have a time limit of one hour, we have to reduce the number of questions in Maths.

The number of questions differs between the groups so that the groups are roughly equal in total points. In addition, students are given a different amount of time for each group of questions: 10 minutes for the simple tasks, 20 minutes for the medium tasks, and 30 minutes for the hard tasks.
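The point weighting described above can be sketched in a few lines of Python. This is only an illustration: the function name `raw_points`, the dictionary layout and the sample counts of correct answers are assumptions made for the example, not figures from our olympiads.

```python
# Point value of one task at each difficulty level, as stated in the text.
POINTS_PER_TASK = {"simple": 1, "medium": 2, "hard": 3}


def raw_points(correct_answers):
    """Total points for a mapping {difficulty level: number of correct answers}."""
    return sum(POINTS_PER_TASK[level] * count
               for level, count in correct_answers.items())


# Hypothetical participant: 8 simple, 4 medium and 2 hard tasks solved correctly.
example_points = raw_points({"simple": 8, "medium": 4, "hard": 2})
print(example_points)  # 8*1 + 4*2 + 2*3 = 22
```

The result of this step is the raw number of points for one olympiad; it does not yet say anything about how difficult that olympiad was.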

Apart from the difficulty of each task within an individual olympiad, we have to take into account the difficulty of a given olympiad compared to the others in the same academic year, as described below.

Calculating the rating

We found the solution in calculating a rating for each participant, based on his or her performance in each subject and grade. A rating is awarded only to students who took part in at least two qualification tournaments in that subject and grade.

To calculate a participant's rating score (RS), we first calculate the rating score the participant earned in each olympiad in a given subject in that year (RSO), and then take the mean of the two best RSOs. This mean is the participant’s rating score.
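As a minimal sketch of this step, assuming the RSO values for one participant, subject and grade are already available as a list of numbers, the rating score could be computed as follows. The function name `rating_score` and the sample values are hypothetical, and returning `None` for fewer than two olympiads is one possible way to express "no rating awarded".

```python
def rating_score(rso_values):
    """Mean of the two best RSOs, or None if fewer than two olympiads were taken."""
    if len(rso_values) < 2:
        return None  # no rating without at least two qualification tournaments
    best_two = sorted(rso_values, reverse=True)[:2]
    return sum(best_two) / 2


# Hypothetical participant who took part in four qualification tournaments:
# only the two best results (121.0 and 117.2) are averaged.
print(rating_score([104.5, 121.0, 98.3, 117.2]))

# Hypothetical participant with a single tournament: no rating.
print(rating_score([130.0]))
```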

To calculate an RSO, we find the Z-score of the participant’s result in that olympiad. The Z-score measures relative standing: it shows how many standard deviations the result lies from the mean value. Following the convention used for scoring intelligence tests, we map the Z-score onto a scale with a mean of 100 and a standard deviation of 15.

RSO is calculated as follows: RSO = 100 + 15 *(NPS - ANPS) / SD

where NPS is the number of points scored by the participant, ANPS is the average number of points scored (among all participants in the given olympiad), and SD is the standard deviation of points for that olympiad. The value of RSO is rounded to one decimal place.
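The formula can be sketched in Python as shown below. Several details are assumptions made for the example: the function name `rso`, the use of the population standard deviation (the text does not say whether the population or the sample formula is used), the sample list of points, and returning 100 when every participant has the same result, which follows the exceptional case mentioned later in the article.

```python
import statistics


def rso(points, all_points):
    """Rating score for one olympiad: RSO = 100 + 15 * (NPS - ANPS) / SD."""
    anps = statistics.fmean(all_points)   # ANPS: average number of points scored
    sd = statistics.pstdev(all_points)    # SD: population standard deviation (assumption)
    if sd == 0:
        # Everyone scored the same, so nobody deviates from the mean.
        return 100.0
    return round(100 + 15 * (points - anps) / sd, 1)


# Hypothetical results of five participants in one olympiad.
olympiad_points = [10, 14, 18, 22, 26]

print(rso(18, olympiad_points))  # exactly the mean result -> 100.0
print(rso(26, olympiad_points))  # best result, above 100
print(rso(10, olympiad_points))  # weakest result, below 100
```

Because the score depends only on the distance from the mean measured in standard deviations, the same number of raw points yields a different RSO in an easier or harder olympiad.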

Eliminating mistakes

In this way we offset possible mistakes by the test compilers, for example when the tasks of one olympiad turn out to be too easy. The same calculation model also helps us to deal with cheating in online olympiads, when the tasks are solved with the parents’ help.

This way we make sure that online participants who might have cheated cannot outperform the offline participants. Those who do well in offline olympiads will usually win over those who do very well in online olympiads, because the average result (ANPS) of the offline students will be lower than that of the online participants. This means that the offline olympiad is treated as more difficult, so children who earn the same number of points in the online olympiad receive a lower rating.
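The effect described above can be illustrated with a small Python sketch. Both lists of points are invented for the example, and the helper `rso` repeats the RSO formula using the population standard deviation, which is an assumption; only the direction of the comparison matters here, not the specific numbers.

```python
import statistics


def rso(points, all_points):
    """RSO = 100 + 15 * (NPS - ANPS) / SD, rounded to one decimal place."""
    anps = statistics.fmean(all_points)
    sd = statistics.pstdev(all_points)  # population SD (assumption)
    if sd == 0:
        return 100.0
    return round(100 + 15 * (points - anps) / sd, 1)


# Hypothetical point lists: the online average is inflated by outside help.
offline_points = [8, 10, 12, 14, 16, 20]
online_points = [16, 18, 20, 20, 22, 24]

# The same raw result, 20 points, in both olympiads.
offline_rating = rso(20, offline_points)
online_rating = rso(20, online_points)

print(offline_rating)  # well above 100: 20 is the top offline result
print(online_rating)   # 100.0: 20 is only the online average
```

With these made-up numbers, 20 points is an outstanding result offline but merely average online, so the offline participant receives the higher rating.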

We see this clearly in practice. Statistically, the winners of offline olympiads receive higher ratings than the winners of online ones, because some online participants, unfortunately, complete their tasks dishonestly, and cheating raises the average score. For example, in the exceptional case when all participants of an olympiad answer all questions correctly, the standard deviation is zero and the formula degenerates to 100+15*(25-25)/0. Strictly speaking, this expression is undefined, so we take the deviation term to be zero: every participant receives exactly 100 rating points for this olympiad, which is the most average result possible. In offline olympiads, the winner gets about 130-140 points.

This gives us confidence that the winners deserve their victory.