Machine Learning and Theory of Constraints / Хабр

Backlog prioritization requires simplifying and weighting tasks. Each task belongs to a strategy, such as ad acquisition or CRO (conversion rate optimization). In retail, for example, we may treat turnover, operational costs, and other metrics as inputs, and profit margin or ROI as outputs. The ideal goal is to find the 20/80 solution and focus resources on a single strategy at a time. The number of metrics tied to strategies gives the dimension of the model. Sometimes unit-economics relations are violated because of nonlinearity. In practice this shows up as low or insignificant correlation and poor regression fits. For example, it is impossible to separate acquisition and conversion: the quantity of acquisition affects its quality, and vice versa. Decomposing tasks or strategies assumes a linear decomposition of a nonlinear system. Nonlinear statistical evaluation of strategies is also required when the CJM (customer journey map) can't be tracked or when online and offline channels can't be separated.
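As a small illustration of why nonlinearity produces low correlation, the sketch below computes the Pearson coefficient for a made-up metric whose output is fully determined by it, but not linearly. The data and the names (metric, output, linear_output) are assumptions chosen for the example, not figures from a real backlog.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: the output depends on the metric exactly, but not linearly.
metric = list(range(-5, 6))
output = [m * m for m in metric]             # nonlinear dependence
linear_output = [2 * m + 1 for m in metric]  # linear dependence, for comparison

r_nonlinear = pearson(metric, output)        # close to 0 despite full dependence
r_linear = pearson(metric, linear_output)    # close to 1
```

A linear regression on the first pair would report almost no relation, although the metric predicts the output perfectly. A nonlinear model is needed to see the connection.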

Should we sacrifice accuracy to simplify our model? No. A large number of features may result in unstable predictions and overfitting. The dimension can be reduced while accuracy increases. A complex system is described by a small number of parameters because of its nonlinear coupling. Nonlinearity means simplification, not complication. This idea was noted by Eliyahu Goldratt, the founder of TOC (Theory of Constraints). If acquisition and conversion are closely connected, then linear decomposition is not possible. We may select a single parameter (acquisition or conversion) as the optimum of efficiency versus cost.

Machine learning (ML) models open a window for realistic dimension reduction. They give an outlook on how to find the most effective strategy. An unambiguous {strategy<=>metric} mapping simplifies the task. ML combines feature weighting with realistic nonlinear (!) outputs such as a logistic function or a neural network. The approach is based on a practical question: what set of metrics/strategies is enough to predict the business goal with acceptable accuracy, say 90%? The efficiency of a strategy (S1) is assumed to be a function of the predictive power of its metric (M1).

The process is iterative. First, analyze the influence of the full group of metrics. The next iteration throws away the one of N metrics that has the minimum effect on predictive power. N combinations have to be tested to find the optimum in the first iteration. Part of the shuffled (mixed) historical data is used to fit the model, which then predicts the out-of-sample part of the data. The iterative process goes on until the acceptable accuracy threshold is reached. The same (!) ML model is used in all iterations. According to TOC, at any moment there should be a single constraint that limits the business most. Therefore the iterative process should stop at N=1. With the N=1 stop condition, the procedure tests N + (N-1) + ... + 2 combinations, so its complexity is of order N^2 model fits.
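The iterative elimination can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code used for the article. A nearest-centroid classifier stands in for "the same ML model", the data is synthetic (metric 0 carries the signal, metrics 1 and 2 are noise), and all function names and the 0.9 threshold are hypothetical.

```python
import random

def fit_centroids(rows, labels):
    """Nearest-centroid 'model': one mean vector per target class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest to the row."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=squared_distance)

def accuracy(columns, train, test):
    """Out-of-sample accuracy when only the given metric columns are used."""
    def pick(row):
        return [row[c] for c in columns]
    centroids = fit_centroids([pick(x) for x, _ in train], [y for _, y in train])
    return sum(predict(centroids, pick(x)) == y for x, y in test) / len(test)

def backward_elimination(n_metrics, train, test, threshold):
    """Drop one metric per iteration while accuracy stays at or above threshold."""
    kept = list(range(n_metrics))
    history = [(tuple(kept), accuracy(kept, train, test))]
    while len(kept) > 1:  # TOC stop condition: a single remaining constraint
        scored = []
        for drop in kept:  # test every way of removing one metric
            cols = [c for c in kept if c != drop]
            scored.append((accuracy(cols, train, test), cols))
        best_acc, best_cols = max(scored, key=lambda pair: pair[0])
        if best_acc < threshold:
            break  # removing any further metric costs too much accuracy
        kept = best_cols
        history.append((tuple(kept), best_acc))
    return kept, history

# Synthetic, shuffled "historical" data (an assumption for this sketch).
rng = random.Random(0)  # fixed seed so the run is repeatable

def sample():
    label = rng.randint(0, 1)
    signal = rng.uniform(0.6, 1.0) if label else rng.uniform(0.0, 0.4)
    return [signal, rng.random(), rng.random()], label

data = [sample() for _ in range(300)]
train, test = data[:200], data[200:]  # the last 100 records are out-of-sample

kept, history = backward_elimination(3, train, test, threshold=0.9)
```

Here kept holds the indices of the surviving metrics and history records the accuracy after each removal. Any other classifier could replace the centroid model, as long as the same one is used in every iteration.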

It may seem that ML requires extra Big Data. However, we may divide the target range into prediction intervals: ROI = (10%-20%), (20%-30%), etc. The fewer the intervals, the fewer records are needed to apply ML. If the accuracy threshold is reached before N=1, there are two ways forward. First, weights may be required for the N remaining constraints/metrics. Second, use fewer intervals and a rougher binarization. The example of web strategy evaluation is given here. Some piece of Jupyter code is given here. If the dimension is sufficiently reduced, a simple and stable online/offline separation is possible. We know the online weight and the target.
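Dividing the target into prediction intervals could look like the sketch below. The interval edges, the ROI values, and the helper name to_interval are assumptions for illustration. The point is only that coarser edges leave fewer classes to predict.

```python
from bisect import bisect_right

def to_interval(value, edges):
    """Index of the interval containing value.

    edges = [0.10, 0.20, 0.30] defines four classes:
    below 10%, 10-20%, 20-30%, and 30% or more.
    """
    return bisect_right(edges, value)

fine_edges = [0.10, 0.20, 0.30]   # four ROI classes
coarse_edges = [0.20]             # rougher binarization: two classes

roi = [0.05, 0.12, 0.27, 0.31, 0.20]  # hypothetical ROI records

fine_labels = [to_interval(r, fine_edges) for r in roi]      # [0, 1, 2, 3, 2]
coarse_labels = [to_interval(r, coarse_edges) for r in roi]  # [0, 0, 1, 1, 1]
```

With the coarse edges each class receives more records, which is what makes the classification feasible on a small data set.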

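In this case the target (profit margin, offers, ROI) can be represented in the following way:
target = weight × online_metric + const

Here weight × online_metric is the online contribution and const is the offline contribution. Averaging (denoted by <>) both sides gives <target> = weight × <online_metric> + const, so the required relation is:
online/offline = (weight × <online_metric>) / const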
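A worked example of this relation, as a sketch: the weight and the records below are invented numbers, and the function name online_offline_ratio is hypothetical. The constant is recovered from the averaged equation, const = <target> - weight × <online_metric>.

```python
from statistics import mean

def online_offline_ratio(weight, online_metric, target):
    """Ratio of the online to the offline contribution to the average target."""
    online_part = weight * mean(online_metric)  # weight x <online_metric>
    const = mean(target) - online_part          # offline part of <target>
    if const == 0:
        raise ValueError("offline contribution is zero, ratio is undefined")
    return online_part / const

# Hypothetical records: the online weight is assumed to be known from the model.
weight = 2.0
online_metric = [10, 20, 30]
target = [50, 70, 90]

ratio = online_offline_ratio(weight, online_metric, target)  # 40 / 30
```

With these numbers the average target of 70 splits into 40 from online and 30 from offline, so online outweighs offline by a factor of about 1.33.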
