• How to display a gazillion of metrics and keep your sanity

      Large scale equals distributed. Distributed equals inevitable complexity. Complexity at runtime equals extensive monitoring. At Hazelcast, doing distributed systems well is our bread and butter. It means we have no choice but to be huge fans of collecting all kinds of metrics to keep watch over the data our users trust us with.

      In Management Center 4.2020.08, we drastically changed the model of how we transfer the metric data from the cluster members to the Management Center, how we store it, and how we display it. In this post, we are going to talk about the last part of the triad.

      We will discuss what to do when you want to display all the data at once, but your users have a limited number of monitors and only one pair of eyes. We will speculate about what users actually want to see when they look at a chart of a monitoring web app. We will go over different approaches to filter the data, and how an average, a median, and a definite integral play their key roles.
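
The teaser above names an average, a median, and a definite integral as ways to decide what to show. As a minimal sketch only, and not the Management Center implementation, the snippet below assumes each metric is a list of `(timestamp, value)` samples, summarizes it with those three numbers, and keeps the few series with the highest score. The names `summarize`, `top_series`, and the sample data are hypothetical.

```python
from statistics import mean, median


def trapezoid_integral(samples):
    """Approximate the definite integral of (timestamp, value) samples
    with the trapezoidal rule."""
    total = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        total += (v0 + v1) / 2 * (t1 - t0)
    return total


def summarize(samples):
    """Reduce one time series to three candidate ranking scores."""
    values = [value for _, value in samples]
    return {
        "average": mean(values),
        "median": median(values),
        "integral": trapezoid_integral(samples),
    }


def top_series(series_by_name, key="integral", limit=3):
    """Return the names of the `limit` series with the highest score."""
    scores = {name: summarize(s)[key] for name, s in series_by_name.items()}
    return sorted(scores, key=scores.get, reverse=True)[:limit]


# Hypothetical data: three cluster members, samples taken every 10 seconds.
series = {
    "member-1": [(0, 1.0), (10, 3.0), (20, 2.0)],
    "member-2": [(0, 5.0), (10, 5.0), (20, 5.0)],
    "member-3": [(0, 0.0), (10, 0.0), (20, 30.0)],
}
print(top_series(series, key="median", limit=2))
```

Which score to rank by is a design choice: the median is less sensitive to a single spike than the average, while the integral also accounts for how long a value persisted.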

      Cross-posted from hazelcast.com

      Read more →
    • How to find an English teacher. Part 2


        This is the continuation of a story about using data science to find an English teacher. If you have not read the first part yet, now is a good time to catch up on it.

        In brief: we had information about language teachers and tried to apply some basic ideas using pandas and our own expectations. Unfortunately, we got stuck on the third step because there was not enough information to satisfy our last requirement: we need no more than 3 candidates at the end.

        This approach is based on my own experience and may not suit your point of view, ideas, or principles.
        Read more →
      • How to find an English teacher. Part 1

          In the modern world, ideas about using data science for extra benefit keep popping up. For instance, Google can use your history of watched videos to recommend new ones. Online shops use recommendation systems to increase the total on your receipt. However… if companies use data for their benefit, could we do the same for our own needs, such as looking for an online English teacher?


          This approach is based on my own experience and may not suit your point of view, ideas, or principles.

          Read more →
        • Kibana Tips & Tricks: How to view events in Discover mode

          • Tutorial

          Hi Habrausers!

          As you may know, Kibana is a visualization tool and part of the ELK (Elasticsearch, Logstash, Kibana) stack. With Kibana you can analyze and visualize your data, build different charts, and combine them on a dashboard to present data in the most beautiful way.
          The people who use Kibana in our company have different backgrounds: some are technical staff who process data, and some are managers who simply want to monitor a few KPIs. All of them have various questions. Although Kibana is rather popular in IT companies, there are not many articles or courses about it. To fill the gap, I created Kibana Tips & Tricks: weekly letters covering frequently asked questions or themes. These letters help our users become more familiar with Kibana. There are no secrets, just a detailed description of how you can work with your data.
          I would like to share the first part of 'Kibana Tips & Tricks' with you: a series of simple how-to articles for people who would like to know more about data analysis and visualization in Kibana. Today we will see how to view events in Kibana.
          Read more →
        • COVID YAAA! or Yet Another Analyze Attempt


            Hello, Habr!

            About a month ago, I had a feeling of constant anxiety. I began to eat poorly, sleep even worse, and constantly read a ton of news about the pandemic. According to it, the coronavirus had either captured or liberated our planet, was either a conspiracy of world governments or the vengeance of the pangolin, and threatened either everyone at once or just me and my sleeping cat…

            Hundreds of articles, social media posts, and youtube-telegram-instagram-tik-tok (yes, I sin) content of varying quality led me to nothing but an even greater sense of anxiety.

            But one day I bought buckwheat decided to end it all. As soon as possible!

            What did you do?
          • Habr — best articles, authors and statistics 2019

              2019 is coming to an end, and Christmas is almost here. It is also time to grab all the data and compile statistics and a rating of the most interesting Habr articles of the year.

              In this post I will present the best articles and the best Habr authors of 2019, and I will also show some statistical graphs that I find interesting or unusual.

              Let's get started.
              Read more →
            • Machine Learning for your flat hunt. Part 2

                Have you thought about the influence of the nearest metro station on the price of your flat?
                What about several kindergartens around your apartment? Are you ready to plunge into the world of geospatial data?

                The world provides so much information…

                Read more →
              • Keyword Tree: graph analysis for semantic extraction


                  This post is a short summary of full-scale research focused on keyword recognition. The semantics extraction technique was initially applied in social media research on depressive patterns. Here I focus on the NLP and math aspects without psychological interpretation. It is clear that analyzing single word frequencies is not enough. Randomly shuffling a collection, however many times, does not affect the relative frequencies but destroys the information completely: the bag-of-words effect. We need a more accurate approach to mining semantic attractors.
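
The bag-of-words effect described above can be checked in a few lines. This is a minimal sketch using only the Python standard library, not the method from the research itself: the sample sentence and the helper names `relative_frequencies` and `bigrams` are made up for illustration. Randomly mixing the tokens leaves every relative frequency unchanged, while word-order information such as adjacent pairs is free to change.

```python
import random
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of the token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def bigrams(tokens):
    """Count adjacent word pairs, which do depend on word order."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample text, split on whitespace.
tokens = "i feel tired and i feel alone".split()

# Shuffle a copy with a fixed seed so the run is repeatable.
shuffled = tokens[:]
random.Random(0).shuffle(shuffled)

# Single-word statistics cannot tell the two texts apart.
print(relative_frequencies(tokens) == relative_frequencies(shuffled))

# Pair statistics depend on order, so they usually differ after shuffling.
print(bigrams(tokens) == bigrams(shuffled))
```

The first comparison is always true by construction, because shuffling only reorders the same tokens; that is why single word frequencies alone cannot capture the structure the post is after.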

                  Read more →
                • Automatic respiratory organ segmentation

                    Manual lung segmentation takes about 10 minutes, and it requires a certain skill to get the same high-quality result as automatic segmentation. Automatic segmentation takes about 15 seconds.

                    I assumed that without a neural network it would be impossible to get an accuracy above 70%. I also assumed that morphological operations only prepare an image for more complex algorithms. But after processing the 40 samples of tomographic data I had on hand, few as they are, the algorithm segmented the lungs without errors. Moreover, after testing on the first five cases, the algorithm did not change significantly and worked correctly on the other 35 studies without any change to the settings.

                    Also, neural networks have a disadvantage: training them requires hundreds of lung samples, which need to be marked up manually.

                    Read more →
                  • Google News and Leo Tolstoy: visualizing Word2Vec word embeddings using t-SNE

                      Everyone perceives texts in a unique way, whether they read news on the Internet or world-famous classic novels. The same applies to the variety of algorithms and machine learning techniques that understand texts in a more mathematical way, namely, by using a high-dimensional vector space.

                      This article is devoted to visualizing high-dimensional Word2Vec word embeddings using t-SNE. The visualization can be useful to understand how Word2Vec works and how to interpret relations between vectors captured from your texts before using them in neural networks or other machine learning algorithms. As training data, we will use articles from Google News and classical literary works by Leo Tolstoy, the Russian writer who is regarded as one of the greatest authors of all time.

                      We will start with a brief overview of the t-SNE algorithm, then move on to computing word embeddings with Word2Vec, and finally visualize the word vectors with t-SNE in 2D and 3D space. We will write our scripts in Python using Jupyter Notebook.

                      Read more →