Marketing with ML decision making / Habr

Backlog prioritization comes down to a choice between strategies. Each strategy has its own metrics, and the most important one has to be selected. ML scoring is a solution when the underlying economics is nonlinear. See introduction here. Two groups of metrics are considered. The first (I) corresponds to web conversion: {bounce rate, micro conversion, time, depth}. The second (II) corresponds to attracting new visitors from the organic channel: {visits, viewers, views}. The target function is the number of commercial offers per day. The task is to reduce the dimension and arrive at the simplest optimal strategy. In this case the online and offline B2B channels can't be separated: the market is thin, and new customers may have learned about 'the brand' from both channels. Statistical evaluation is therefore closer to reality than direct CJM tracking.

The example combines ensemble voting, binning of the target, and dimension reduction. Features and target are normalized to the [0,1] interval. The dataset is relatively small: four years of records with daily sampling. Web metrics are pulled from the analytics platform through its Python API, and sales data comes from the CRM. The correlation matrix shows no significant correlation between converted offers and web metrics, so linear regression can't be applied.
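The preprocessing steps described above (scaling to [0,1], binning the target, checking correlations) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the author's notebook: the column names, the number of bins, the equal-width binning and the helper functions `min_max_scale` and `bin_target` are all assumptions.

```python
import numpy as np
import pandas as pd


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Scale every column to [0, 1]; constant columns are mapped to 0."""
    span = (df.max() - df.min()).replace(0, 1)
    return (df - df.min()) / span


def bin_target(y: pd.Series, n_bins: int = 3) -> pd.Series:
    """Equal-width binning of a [0, 1] target into integer class labels."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    return pd.cut(y, bins=edges, labels=False, include_lowest=True).astype(int)


# Synthetic stand-in for four years of daily records (hypothetical columns).
rng = np.random.default_rng(0)
n_days = 4 * 365
raw = pd.DataFrame({
    "sessions": rng.poisson(200, n_days),
    "bounce_rate": rng.uniform(0.2, 0.8, n_days),
    "depth": rng.uniform(1, 6, n_days),
    "offers": rng.poisson(3, n_days),
})

scaled = min_max_scale(raw.astype(float))

# Pearson correlation of each web metric with the target.
corr_with_offers = scaled.corr()["offers"].drop("offers")

# Discrete classes that a classifier can predict instead of the raw count.
target_classes = bin_target(scaled["offers"], n_bins=3)
```

Binning turns the regression target into a small number of classes, which is what allows classifiers and an accuracy score to be used further on.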

An ensemble of nonlinear voting estimators is used: KNeighborsClassifier, Decision Tree, AdaBoost, Gradient Boosting, Support Vector Classifier, Naive Bayes, and a Multi-layer Perceptron with three hidden layers. Hyperparameter tuning is applied to KNeighborsClassifier only. Both linear (high-bias) and nonlinear (high-variance) models are considered. Models are compared by accuracy and by the scattering of accuracy across models; smaller scattering means higher stability. In the first experiment all metrics are used as input features. This gives a scattering of model accuracy of 0.11 and a median accuracy of 96%.
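A comparison of this kind might look like the sketch below, built on scikit-learn and synthetic data. Several things here are assumptions rather than the author's setup: "scattering" is taken to be the range (max minus min) of per-model accuracy, accuracy is measured by 3-fold cross-validation, and the hidden layer sizes and the KNN grid are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=0):
    """The seven estimators named in the text; only KNN is tuned."""
    knn = GridSearchCV(KNeighborsClassifier(),
                       {"n_neighbors": [3, 5, 9, 15]}, cv=3)
    return {
        "knn": knn,
        "tree": DecisionTreeClassifier(random_state=random_state),
        "ada": AdaBoostClassifier(random_state=random_state),
        "gb": GradientBoostingClassifier(random_state=random_state),
        "svc": SVC(),
        "nb": GaussianNB(),
        # Three hidden layers; the sizes are an arbitrary choice.
        "mlp": MLPClassifier(hidden_layer_sizes=(32, 16, 8),
                             max_iter=500, random_state=random_state),
    }


def accuracy_summary(X, y, cv=3):
    """Return per-model accuracy, its median, and its range (assumed 'scattering')."""
    scores = {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in build_models().items()
    }
    values = np.array(list(scores.values()))
    return scores, float(np.median(values)), float(values.max() - values.min())


# Synthetic stand-in for the seven normalized web metrics and a binned target.
X, y = make_classification(n_samples=300, n_features=7, n_informative=3,
                           n_redundant=0, random_state=0)
scores, median_acc, scatter = accuracy_summary(X, y)
```

The same estimators could also be combined with `sklearn.ensemble.VotingClassifier` when one joint prediction is needed; here they are scored separately because the comparison relies on the spread between models.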

The second experiment removes metrics one at a time to reduce the dimension. At each step, the metric whose removal causes the smallest drop in accuracy is discarded. In the end a single metric is left: the number of sessions. Accuracy is still 96%, but the scattering across the ensemble models is lower: 0.01.
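The elimination loop described above is a form of greedy backward feature elimination and can be sketched as follows. For brevity the score is the median cross-validated accuracy of only three models, and the data is synthetic with a target that depends on one feature only. Both are assumptions for illustration, as are the helper names `median_accuracy` and `backward_elimination`.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def median_accuracy(X, y, cv=3):
    """Median cross-validated accuracy over a small set of models."""
    models = [KNeighborsClassifier(),
              DecisionTreeClassifier(random_state=0),
              GaussianNB()]
    accs = [cross_val_score(m, X, y, cv=cv, scoring="accuracy").mean()
            for m in models]
    return float(np.median(accs))


def backward_elimination(X, y, names, score_fn=median_accuracy):
    """Repeatedly drop the feature whose removal costs the least accuracy."""
    remaining = list(names)
    history = []  # (dropped feature, accuracy after dropping it)
    while len(remaining) > 1:
        candidates = []
        for feature in remaining:
            kept = [name for name in remaining if name != feature]
            columns = [names.index(name) for name in kept]
            candidates.append((score_fn(X[:, columns], y), feature))
        # Highest remaining accuracy means the smallest draw-down.
        best_score, dropped = max(candidates)
        remaining.remove(dropped)
        history.append((dropped, best_score))
    return remaining[0], history


# Synthetic data: the class depends on the first feature only.
rng = np.random.default_rng(0)
names = ["sessions", "bounce_rate", "time", "depth"]
X = rng.uniform(0.0, 1.0, size=(400, len(names)))
y = (X[:, 0] > 0.5).astype(int)

survivor, history = backward_elimination(X, y, names)
```

The loop refits the models once per candidate feature at every step, so the cost grows quadratically with the number of metrics. That is acceptable for a handful of web metrics but would need a cheaper criterion for wide datasets.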

Conclusion: the other features add no further information. Traffic has the highest importance for prediction and yields a more stable prediction model. The proposed method may be generalized to decision making wherever non-linearity is inevitable.

A piece of the Jupyter code is given here.

Thanks for Karma