• Keyword Tree: graph analysis for semantic extraction


      This post is a short summary of a full-scale research project on keyword recognition. The semantic extraction technique was first applied to social media research on depressive patterns. Here I focus on the NLP and mathematical aspects and leave out the psychological interpretation. Analyzing single-word frequencies is clearly not enough: repeatedly shuffling the words of a collection leaves their relative frequencies unchanged but destroys the information completely (the bag-of-words effect). We need a more accurate approach to mining semantic attractors.
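
      The bag-of-words effect mentioned above can be illustrated with a minimal sketch. The sample text, the helper names `unigram_counts` and `bigram_counts`, and the fixed random seed are all assumptions made for illustration; this is not the method used in the research itself.

```python
import random
from collections import Counter


def unigram_counts(tokens):
    """Count how often each single word occurs (order is ignored)."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Count adjacent word pairs, which do depend on word order."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical toy corpus, invented for this example only.
text = (
    "i feel tired and i feel alone "
    "i feel tired and i cannot sleep "
    "i feel alone and i cannot sleep"
)
tokens = text.split()

# Shuffle a copy of the token list; the seed only makes the run repeatable.
rng = random.Random(0)
shuffled = tokens[:]
rng.shuffle(shuffled)

# Single-word frequencies cannot change under any permutation.
same_unigrams = unigram_counts(tokens) == unigram_counts(shuffled)

# Word-pair statistics carry the ordering information.
original_pairs = bigram_counts(tokens)
shuffled_pairs = bigram_counts(shuffled)

print("unigram counts preserved:", same_unigrams)
print("('i', 'feel') in original:", original_pairs[("i", "feel")])
print("('i', 'feel') after shuffle:", shuffled_pairs[("i", "feel")])
```

      The unigram comparison holds for any permutation, while the pair counts generally change after shuffling. This is why a frequency-only view cannot distinguish the original text from a scrambled one.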

      Read more →
    • A/B test is not enough

        There is a common opinion that an A/B test is a universal, semi-automatic tool that always helps to increase conversion, loyalty and UX. However, misinterpreting the results or sampling incorrectly leads to the loss of a loyal audience and a decrease in margin. Why? A/B testing rests on the basic assumption that the sample is homogeneous and representative, so that the results scale to the whole audience. In reality, the audience is heterogeneous: recall the “20/80” distribution of income. Heterogeneity means that sensitivity to the A/B change varies significantly within the sample.
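
        A small sketch of how heterogeneity can mislead a pooled A/B readout. All numbers below (segment sizes, conversion rates, revenue per conversion) are hypothetical and chosen only to make the effect visible; the segment names and the `summarize` helper are likewise assumptions.

```python
# Hypothetical segments, invented for illustration:
# name -> (users per arm, conversion rate in A, conversion rate in B,
#          revenue per conversion)
segments = {
    "casual": (8000, 0.020, 0.025, 10.0),
    "loyal": (2000, 0.100, 0.090, 100.0),
}


def summarize(variant):
    """Return (pooled conversion rate, total revenue) for variant 'A' or 'B'."""
    idx = {"A": 1, "B": 2}[variant]
    users = sum(s[0] for s in segments.values())
    conversions = sum(s[0] * s[idx] for s in segments.values())
    revenue = sum(s[0] * s[idx] * s[3] for s in segments.values())
    return conversions / users, revenue


rate_a, revenue_a = summarize("A")
rate_b, revenue_b = summarize("B")

print(f"pooled conversion: A={rate_a:.1%}  B={rate_b:.1%}")
print(f"total revenue:     A={revenue_a:.0f}  B={revenue_b:.0f}")
```

        With these made-up numbers, variant B wins on pooled conversion (3.8% against 3.6%) yet loses on revenue (20,000 against 21,600), because the small loyal segment reacts in the opposite direction from the large casual one.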
        Read more →
      • Machine Learning and Theory of Constraints

          Backlog prioritization requires simplifying and weighting tasks. Each task belongs to a strategy, such as ad acquisition or CRO. In retail, for example, we may take turnover, operational costs and other metrics as inputs, and profit margin and ROI as outputs. The ideal goal is to find a 20/80 solution and focus resources on a single strategy at a time. The metrics tied to the strategies give the dimension of the model. Sometimes unit-economics relations are violated because of nonlinearity. In practice, this shows up as low or insignificant correlation and poor regression fits. Example: it is impossible to separate acquisition from conversion, because the quantity of acquisition affects its quality and vice versa. Decomposing tasks and strategies assumes a linear decomposition of a nonlinear system. Besides, nonlinear statistical evaluation of strategies is required when the CJM can't be tracked or online and offline channels can't be separated.
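
          To see why nonlinearity shows up as low correlation, consider a minimal sketch with an invented inverted-U relation between acquisition volume and profit. The data, the variable names and the `pearson` helper are assumptions for illustration, not figures from the article.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: profit peaks at a moderate acquisition volume and
# falls off on both sides (quantity of traffic hurts its quality).
volume = list(range(11))
profit = [25 - (v - 5) ** 2 for v in volume]

# Linear view: correlate profit with raw volume.
r_linear = pearson(volume, profit)

# Nonlinear view: correlate profit with squared distance from the optimum.
distance = [(v - 5) ** 2 for v in volume]
r_nonlinear = pearson(distance, profit)

print(f"profit vs volume:                {r_linear:+.3f}")
print(f"profit vs distance from optimum: {r_nonlinear:+.3f}")
```

          In this toy case profit is fully determined by volume, yet the linear correlation is zero; the dependence only becomes visible after a nonlinear transformation of the input.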
          Read more →
        • Marketing with ML decision making

          Backlog prioritization leads to a choice between strategies. Each strategy has its own metrics, and the most important one has to be chosen. ML scoring is a solution when the economics are nonlinear. See introduction here. Two groups of metrics are considered. The first (I) corresponds to web conversion: {bounce rate, micro conversion, time, depth}. The second (II) corresponds to attracting new visitors from the organic channel: {visits, viewers, views}. The target function is the number of commercial offers per day. The task is to reduce the dimension to obtain an optimal, simple strategy. In this case, online and offline B2B channels can't be separated: the market is thin, and new customers may have some information about 'the brand' from both channels. Therefore, statistical evaluation is closer to reality than direct CJM tracking.
          Read more →