Statistics and monitoring of PHP scripts in real time: ClickHouse and Grafana come to Pinba's rescue

  • Tutorial
In this article I will explain how to use pinba with clickhouse and grafana instead of pinba_engine and pinboard.

On a PHP project, pinba is probably the only reliable way to understand what is going on with performance. But people usually start using pinba only once problems have already appeared and it isn't clear where to look.

Developers often have no idea how many requests per second (RPS) each script handles, so they start optimizing wherever a problem seems likely.

Some analyze the nginx logs, others look at slow queries in the database.

Of course, pinba would be useful, but there are several reasons why it is not used on every project.



The first reason is installation.

To get any real benefit from pinba, you really want to see the metrics not only for the last few minutes but also over a long period (from days to months).

For that you need:

  • install the PHP extension (and, optionally, the nginx module)
  • compile the pinba_engine storage engine for MySQL
  • install pinboard and configure cron

Because little has been written about pinba recently, many people think it only worked on PHP 5 and is now dead, but as we will see, that is not true.

The first step is the easiest; all you need to do is run one command:

apt install php-pinba

The repositories provide this extension for PHP versions up to and including 7.3, so there is nothing to compile.

Right after installation we get a working extension that collects the metrics of each script run (duration, memory, etc.) and sends them over UDP to 127.0.0.1:30002 in protobuf format.

We don't yet have an application that receives and processes these UDP packets, but this does not hurt the speed or stability of your PHP scripts: UDP is fire-and-forget, so the extension does not wait for any reply.
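Even before any server is installed, you can check that the extension really sends something. Below is a minimal debugging sketch, assuming you can temporarily bind the UDP port yourself; the function name collectDatagramSizes is hypothetical and not part of pinba. It only counts datagrams and their sizes, it does not decode protobuf.

```php
<?php
// Debugging aid only (hypothetical helper, not part of pinba): waits for
// datagrams on an already bound UDP socket and returns their sizes.
// Protobuf payloads are NOT decoded.
function collectDatagramSizes($socket, int $maxPackets, float $timeoutSec): array
{
    $sizes = [];
    $deadline = microtime(true) + $timeoutSec;
    while (count($sizes) < $maxPackets) {
        $remaining = $deadline - microtime(true);
        if ($remaining <= 0) {
            break; // time budget used up
        }
        $read = [$socket];
        $write = null;
        $except = null;
        $sec = (int) $remaining;
        $usec = (int) (($remaining - $sec) * 1000000);
        // Block until a datagram is readable or the remaining time runs out.
        if (stream_select($read, $write, $except, $sec, $usec) < 1) {
            break;
        }
        // A 65535-byte buffer is enough for any UDP datagram.
        $packet = stream_socket_recvfrom($socket, 65535);
        if ($packet !== false) {
            $sizes[] = strlen($packet);
        }
    }
    return $sizes;
}

// Usage: bind the pinba port (stop any pinba server using it first),
// then open a few PHP pages while this waits for up to 30 seconds.
// $socket = stream_socket_server('udp://127.0.0.1:30002', $errno, $errstr, STREAM_SERVER_BIND);
// print_r(collectDatagramSizes($socket, 5, 30.0));
```

If the list stays empty while you load pages, check the pinba.server setting and that the extension is enabled for the SAPI you are testing.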

Until recently, only pinba_engine could receive and process these UDP packets. Its description of a "simple" installation discourages you from ever reading it again: long lists of dependencies, package names, program names and links to other pages with yet more dependencies. No one has the time or the desire to deal with this mess.

Installing pinba2 is not much easier.

Perhaps in the future pinba10 will install with one or two commands and without reading a pile of documentation to figure out how, but that is not the case yet.

Installing pinba_engine is only half the story: without pinboard you only have data for the last few minutes. Fortunately, pinboard is fairly simple to install.

But all the metrics from PHP already arrive at a UDP port in protobuf format, so isn't all you need an application that receives these UDP packets and writes them to some storage? Apparently, the developers who thought so wrote their own applications, and some of them published them on GitHub.

Below is a review of four open-source projects that write the metrics to a storage from which Grafana can easily query and visualize them.

olegfedoseev/pinba-influxdb (November 2017)


A UDP server written in Go that saves metrics to OpenTSDB. If you already use OpenTSDB on your project, this solution may suit you.

olegfedoseev/pinba-influxdb (June 2018)


A UDP server written in Go by the same GitHub user, this time saving metrics to InfluxDB. Many projects already use InfluxDB for monitoring, so this solution may be a good fit for them.

pros:

  • InfluxDB can aggregate the collected metrics and delete the raw data after a specified period.

cons:

  • this solution does not save timer data.
  • InfluxDB stores page addresses as tags, so if you have many unique page addresses, its RAM consumption grows; past a certain point it will "start using a lot of memory" (source).

ClickHouse-Ninja/Proton (January 2019)


A UDP server written in Go that saves metrics to ClickHouse. This is a friend's application; after using it, I started working on my own pinba application for ClickHouse.

pros:

  • ClickHouse is ideal for such tasks: it compresses data well enough that you can store all the raw data without aggregating it
  • if needed, you can easily aggregate the collected metrics
  • a ready-made Grafana dashboard
  • saves timer data

cons:

  • Not invented here
  • there are no configuration options for the database and table names or for the server address and port
  • other small things that follow from the first drawback

pinba-server/pinba-server (April 2019)


A UDP server written in PHP that saves metrics to ClickHouse. This is my application, the result of my R&D with pinba, ClickHouse and protobuf. I wrote a proof of concept which, to my surprise, consumed very few resources (30 MB of RAM and less than 1% of one of eight CPU cores), so I decided to share it.

The advantages are the same as for the previous solution, and I kept the familiar names from the original pinba_engine. I also added a config that lets you run several pinba-server instances saving metrics to different tables, which is useful if you want to collect measurements not only from PHP but also from nginx.

Cons: «Not invented here» and whatever small things don't suit you personally. But my solution is very simple, only about 100 lines of code, so any PHP developer can change anything they don't like in a couple of minutes.

How it works

It listens on UDP port 30002. Incoming packets are decoded according to the protobuf schema and collected into a batch; once a minute the batch is inserted into the ClickHouse table pinba.requests (all of these settings can be changed in the config).
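The "collect, then insert once a minute" step can be sketched as follows. This is only an illustration of the batching logic under stated assumptions, not pinba-server's actual code: the class name MinuteBatcher is hypothetical, and protobuf decoding and the INSERT itself are left out.

```php
<?php
// Hypothetical illustration of per-interval batching (not pinba-server code).
// Packets are buffered; once the interval has elapsed, the whole buffer is
// handed back so the caller can insert it into ClickHouse in one query.
final class MinuteBatcher
{
    private array $buffer = [];
    private float $intervalSec;
    private float $lastFlush;

    public function __construct(float $intervalSec = 60.0, ?float $now = null)
    {
        $this->intervalSec = $intervalSec;
        $this->lastFlush = $now ?? microtime(true);
    }

    /** Adds a packet; returns the batch to insert if the interval elapsed, else null. */
    public function add(string $packet, float $now): ?array
    {
        $this->buffer[] = $packet;
        if ($now - $this->lastFlush < $this->intervalSec) {
            return null;
        }
        $batch = $this->buffer;
        $this->buffer = [];
        $this->lastFlush = $now;
        return $batch;
    }
}
```

Inserting one batch per minute instead of one row per request suits ClickHouse, which prefers fewer, larger inserts. Note that this sketch only flushes when a new packet arrives; a real server would also flush on a timer so a quiet minute does not hold data back.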

About ClickHouse

ClickHouse supports various table engines; the most commonly used is MergeTree.

If at some point you decide to keep aggregated data forever and raw data only for a recent period, you can create a materialized view with grouping and periodically clean the main table pinba.requests; the aggregated data will remain in the materialized view. Moreover, you can specify «ENGINE = Null» for the pinba.requests table, so the raw data is never written to disk at all while still flowing into the materialized view. I use this scheme for nginx metrics, because nginx handles 50 times more requests than PHP on my servers.
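A sketch of the Null-table scheme, sent through ClickHouse's HTTP interface (port 8123 by default). The column names (timestamp, script_name, req_time) and the view name requests_by_minute are hypothetical; adapt them to the real pinba.requests schema, and recreate the table only if you really want to stop storing raw rows.

```php
<?php
// Hypothetical schema: column and view names are illustrative only.
function nullEngineDdl(string $db = 'pinba'): array
{
    return [
        "CREATE TABLE IF NOT EXISTS $db.requests (
            timestamp DateTime,
            script_name String,
            req_time Float32
        ) ENGINE = Null",
        // Rows inserted into the Null table are never stored on disk, but the
        // view below still receives every inserted block and keeps aggregates.
        "CREATE MATERIALIZED VIEW IF NOT EXISTS $db.requests_by_minute
        ENGINE = SummingMergeTree()
        ORDER BY (minute, script_name)
        AS SELECT
            toStartOfMinute(timestamp) AS minute,
            script_name,
            count() AS hits,
            sum(req_time) AS req_time_total
        FROM $db.requests
        GROUP BY minute, script_name",
    ];
}

// Sends one statement to ClickHouse's HTTP interface as a POST body.
function runOnClickHouse(string $sql, string $url = 'http://127.0.0.1:8123/'): string
{
    $context = stream_context_create(['http' => [
        'method' => 'POST',
        'header' => 'Content-Type: text/plain',
        'content' => $sql,
    ]]);
    $response = file_get_contents($url, false, $context);
    if ($response === false) {
        throw new RuntimeException('ClickHouse request failed');
    }
    return $response;
}

// Usage (needs a running ClickHouse):
// foreach (nullEngineDdl() as $sql) { runOnClickHouse($sql); }
```

SummingMergeTree only collapses rows with the same key during background merges, so queries should still aggregate, e.g. `SELECT script_name, sum(hits) ... GROUP BY script_name`. If the default user has a password, add credentials to the request (for example the X-ClickHouse-User and X-ClickHouse-Key headers).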

If you have read this far, here is a detailed description of installing and configuring my solution and everything it needs. The whole process is described for Ubuntu 18.04 LTS and CentOS 7; on other distributions and versions it may differ slightly.

Installation


I've put all the necessary commands into a Dockerfile so the instructions are reproducible. Below I only describe the pitfalls.

php-pinba

After installation, make sure all the options in the /etc/php/7.2/fpm/conf.d/20-pinba.ini file are uncommented. In some distributions (for example, CentOS) they may be commented out.

extension = pinba.so
pinba.enabled = 1
pinba.server = 127.0.0.1:30002
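To check these settings from PHP itself, a small sketch like the one below can help. The function name pinbaConfigProblems is hypothetical, and it expects an explicit host:port in pinba.server, as in the config above. Run it under the same SAPI you want to monitor (php-fpm and CLI read different ini directories).

```php
<?php
// Hypothetical helper: returns a list of problems with the pinba settings
// (an empty list means the settings look fine).
function pinbaConfigProblems(bool $loaded, $enabled, $server): array
{
    if (!$loaded) {
        return ['the pinba extension is not loaded'];
    }
    $problems = [];
    // ini_get() returns "1" when enabled and "" or "0" when disabled.
    if (!$enabled) {
        $problems[] = 'pinba.enabled is off';
    }
    if (!is_string($server) || !preg_match('/^(.+):(\d+)$/', $server, $m)) {
        $problems[] = 'pinba.server must look like host:port';
    } elseif ((int) $m[2] < 1 || (int) $m[2] > 65535) {
        $problems[] = "port {$m[2]} is outside 1..65535";
    }
    return $problems;
}

print_r(pinbaConfigProblems(
    extension_loaded('pinba'),
    ini_get('pinba.enabled'),
    ini_get('pinba.server')
));
```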

clickhouse

During installation, ClickHouse asks you to set a password for the default user. By default this user can connect from any IP address, so if your server has no firewall, be sure to set a password. You can also do this after installation in the /etc/clickhouse-server/users.xml file.
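Instead of a plaintext `<password>`, users.xml also accepts a SHA-256 hash in `<password_sha256_hex>`. A minimal sketch that prints such a fragment; the script name hash.php in the usage note is just an example.

```php
<?php
// Prints a users.xml fragment containing the SHA-256 hash of a password,
// so the plaintext password does not have to be stored in the file.
function clickhousePasswordXml(string $password): string
{
    $hex = hash('sha256', $password);
    return "<password_sha256_hex>$hex</password_sha256_hex>";
}

echo clickhousePasswordXml($argv[1] ?? 'change-me'), PHP_EOL;
```

Run it as `php hash.php 'your-password'` and put the output in place of the `<password>` element in the default user's section. Keep in mind that a password passed on the command line may end up in your shell history.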

Also note that ClickHouse uses several ports, including 9000. In some distributions (for example, CentOS) php-fpm listens on this port as well. If the port is already taken, you can change it in the /etc/clickhouse-server/config.xml file.
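A rough way to see whether port 9000 is already taken before installing ClickHouse is to try binding it. This sketch (the function name tcpPortIsFree is hypothetical) only checks 127.0.0.1, so a service bound to another interface only would not be detected.

```php
<?php
// Returns true if the TCP port could be bound on $host, i.e. nothing is
// listening there. Rough check: only the given address is tested.
function tcpPortIsFree(int $port, string $host = '127.0.0.1'): bool
{
    $server = @stream_socket_server("tcp://$host:$port", $errno, $errstr);
    if ($server === false) {
        return false;
    }
    fclose($server);
    return true;
}

// 9000 is ClickHouse's native protocol port; 8123 is its HTTP port.
foreach ([9000, 8123] as $port) {
    printf("port %d: %s\n", $port, tcpPortIsFree($port) ? 'free' : 'in use');
}
```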

grafana with clickhouse plugin

After installing Grafana, log in with username «admin» and password «admin». On first login, Grafana asks you to set a new password.

Next, go to the "+" menu -> Import and enter the dashboard ID 10011. I prepared this dashboard so you don't have to build it yourself.

Grafana supports ClickHouse through a third-party plugin, but Grafana does not support alerts for third-party data source plugins (the ticket for this has been open for several years).

pinba-server

Installing protobuf and libevent is optional, but it improves pinba-server performance. If you install pinba-server into a folder other than /opt, you will also need to change the systemd unit file.

pinba module for nginx

To compile the module, you need the source code of exactly the nginx version already installed on your server, and the same configure options. Otherwise the build will succeed, but loading the module will fail with the error «module is not binary compatible». You can see the configure options with the «nginx -V» command.
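A sketch that reads `nginx -V` and prints matching build commands for a dynamic module. It assumes the nginx sources are downloaded from nginx.org and that the pinba module sources are unpacked in a hypothetical directory /usr/src/ngx_http_pinba_module; check the printed commands before running them.

```php
<?php
// Extracts the nginx version and configure arguments from `nginx -V` output.
function parseNginxV(string $output): array
{
    preg_match('#nginx version: nginx/([0-9.]+)#', $output, $v);
    preg_match('/^configure arguments:(.*)$/m', $output, $a);
    return [
        'version' => $v[1] ?? null,
        'configure_args' => isset($a[1]) ? trim($a[1]) : null,
    ];
}

// nginx -V writes to stderr, hence 2>&1.
$output = function_exists('shell_exec') ? (string) shell_exec('nginx -V 2>&1') : '';
$info = parseNginxV($output);
// Hypothetical path to the unpacked pinba module sources.
$modulePath = '/usr/src/ngx_http_pinba_module';

if ($info['version'] === null) {
    echo "Could not read `nginx -V` output\n";
} else {
    printf(
        "wget https://nginx.org/download/nginx-%s.tar.gz && tar xzf nginx-%s.tar.gz\n" .
        "cd nginx-%s && ./configure %s --add-dynamic-module=%s && make modules\n",
        $info['version'], $info['version'], $info['version'],
        $info['configure_args'], $modulePath
    );
}
```

Reusing the configure arguments verbatim (quotes included) is what keeps the resulting module binary compatible with the installed nginx.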

Lifehacks

All my sites run on HTTPS only, so the «schema» field (normally the request scheme, http or https) carries no information for me, and I use it to separate web and console runs.
In the web scripts I use:

if (ini_get('pinba.enabled')) {
    pinba_schema_set('web');
}

and in console scripts (for example, cron jobs):

if (ini_get('pinba.enabled')) {
    pinba_schema_set('console');
}
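The two snippets above can also be merged into one bootstrap line by checking the PHP_SAPI constant, which is 'cli' for command-line runs. This is a sketch assuming every CLI run counts as "console"; the helper name pinbaSchemaFor is hypothetical.

```php
<?php
// Hypothetical helper: command-line runs (cron, console commands) are
// reported as "console", everything else (php-fpm, etc.) as "web".
function pinbaSchemaFor(string $sapi): string
{
    return $sapi === 'cli' ? 'console' : 'web';
}

// function_exists() keeps this safe on machines without the extension.
if (ini_get('pinba.enabled') && function_exists('pinba_schema_set')) {
    pinba_schema_set(pinbaSchemaFor(PHP_SAPI));
}
```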

My Grafana dashboard has a web/console switch for viewing these statistics separately.
You can also send your own tags to pinba, for example:

pinba_tag_set('country', $countryCode); 
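Besides request tags, the pinba extension provides timers (pinba_timer_start() and pinba_timer_stop()); these are the timers the review above refers to. A small wrapper sketch, assuming you want code to keep working where the extension is missing; the helper name pinbaTimed and the tag values are hypothetical. Tag values with few distinct options (such as 'mysql' or 'select') keep the stored data compact.

```php
<?php
// Hypothetical wrapper: runs $fn inside a pinba timer when the extension is
// available, otherwise just runs it. Returns whatever $fn returns.
function pinbaTimed(array $tags, callable $fn)
{
    if (!function_exists('pinba_timer_start')) {
        return $fn();
    }
    $timer = pinba_timer_start($tags);
    try {
        return $fn();
    } finally {
        // Stop the timer even if $fn throws.
        pinba_timer_stop($timer);
    }
}

$rows = pinbaTimed(['group' => 'mysql', 'op' => 'select'], function () {
    return [1, 2, 3]; // stands in for a real database query
});
```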

That's all.

You can also read the Russian version.
