Preprocessing
First, we compute the percentage of missing values in each column to decide how to handle them.
df.isnull().mean() * 100
del df['YEAR']
# 'сумма больше на 1' = "sum is greater by 1"; 'сумма исправлена' = "sum corrected"
df['COMMENT'] = df['COMMENT'].replace({'сумма больше на 1': 'сумма исправлена'})
We drop the rows with missing values, since the affected attributes have only a small share of missing entries and removing those rows has little effect on the dataset.
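# Keep only rows that have a competition grade
df = df[df['GRADE_OF_COMPETITION'].notnull()]
df['ID'] = df['ID'].astype(np.int64)
# Exclude records marked as deleted ('удален' = "deleted")
df = df[df['REGIONAL_STATUS'] != 'удален']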
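The "low percentage of missing values" criterion can also be applied programmatically instead of naming the column by hand. The following is only a minimal sketch on a made-up toy frame: the data, the `MAX_MISSING_PCT` threshold, and the names `toy`, `sparse_cols`, and `cleaned` are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are invented for illustration).
toy = pd.DataFrame({
    "ID": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GRADE_OF_COMPETITION": ["a", None, "b", "c", "d"],
    "COMMENT": [None, None, None, None, "x"],
})

# Hypothetical cutoff: columns with at most this share of missing
# values are considered safe to clean by dropping rows.
MAX_MISSING_PCT = 25.0

# Percentage of missing values per column.
missing_pct = toy.isnull().mean() * 100

# Columns that have some missing values, but no more than the cutoff.
is_sparse = (missing_pct > 0) & (missing_pct <= MAX_MISSING_PCT)
sparse_cols = missing_pct[is_sparse].index.tolist()

# Drop rows only where those low-missingness columns are empty;
# columns that are mostly empty (here COMMENT) are left alone.
cleaned = toy.dropna(subset=sparse_cols).copy()

# With no NaN left in the affected rows, ID can be cast to an integer type.
cleaned["ID"] = cleaned["ID"].astype("int64")

print(sparse_cols)
print(len(toy), "->", len(cleaned), "rows")
```

Restricting `dropna` to the selected columns avoids discarding rows because of an attribute that is empty almost everywhere.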
# The ten winners ('Победитель' = "Winner") with the smallest SUM
df[df['REGIONAL_STATUS'] == 'Победитель'][['SUM', 'PERCENTAGE', 'REGIONAL_STATUS']].sort_values(by='SUM', ascending=True).head(10)