polarnik Aug 23 2016 at 04:18

Choosing an HTML parser for Apache JMeter

HTML, IT systems testing, Web services testing

Comments 5

gmaker Aug 23 2016 at 06:03

I liked this part:

> Conclusions
>
> The article has no particular practical value.

polarnik Aug 23 2016 at 12:31

On the one hand, many links are skipped; on the other, many of the skipped links are ads. On the one hand, the parser works rather poorly; on the other, it is simple and convenient. On the one hand, not all static content gets loaded; on the other, the bottleneck is usually the database and the backend rather than the frontend.

I wanted to test it, I tested it, and I'm happy. If it amused you as well, so much the better.

Deosis Aug 23 2016 at 08:33

> the parsers work almost identically, so any of them can be used;
They work almost equally badly. What is the point of picking, out of several hammers, the one that hurts least when it hits your finger?
I would like to see which links the parsers ignore.
If a class of such links can be identified, the parsers could be improved.
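
One way to look for such a class is to take the URLs a browser requested, subtract the URLs a parser extracted, and count what is left per host. The snippet below is only a minimal sketch of that idea: the function name `missed_by_host` and all sample URLs are hypothetical and do not come from the article's logs.

```python
from collections import Counter
from urllib.parse import urlsplit


def missed_by_host(browser_urls, parser_urls):
    """Count, per host, the URLs the browser requested but the parser did not find."""
    missed = set(browser_urls) - set(parser_urls)
    return Counter(urlsplit(url).hostname for url in missed)


# Hypothetical sample data, for illustration only.
browser = [
    "https://habrahabr.ru/styles/main.css",
    "https://mc.yandex.ru/metrika/watch.js",
    "https://mc.yandex.ru/watch/1",
    "https://ads.example.invalid/banner.js",
]
parser = ["https://habrahabr.ru/styles/main.css"]

for host, count in missed_by_host(browser, parser).most_common():
    print(host, count)
```

If most of the missed URLs fall on a handful of advertising or analytics hosts, the parser may be good enough for load testing; if they fall on the site's own host, that points to a gap in the parser.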

polarnik Aug 23 2016 at 09:12

Hello, and thanks for the question. The logs are available, so the answer is easy to find.

For habrahabr.ru, for example, what gets skipped is mostly ads and analytics: https://docs.google.com/spreadsheets/d/1FqgnkRm4gYrWUN9bBCEPvVo0mdi5lQl_a3mv1wY7tko/edit?usp=sharing

philmdot Aug 23 2016 at 19:08

Hello,
First, thanks for this great comparison of parsers! I don't read Russian, so I read it through translation tools (I hope I didn't misunderstand anything).
As a JMeter committer, I wanted to clarify a few points from the 3.0 release notes.

What was improved in 3.0:

the connection simulation
the throughput of resource downloads
the parsing of CSS resources, which did not exist before

We never claimed that JMeter downloads everything a browser does.
We always write "JMeter is not a browser".

We don't download any resources loaded by JavaScript and never will, because JMeter is not a browser.
Besides, from a load-testing perspective, resources that hit third-party servers (Yandex, Google Analytics, ...) are not useful; JMeter only downloads resources that match the regular expression you enter.
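
To illustrate the regular-expression filter, here is a rough Python sketch of the idea, not JMeter's actual code. It assumes the pattern has to match the whole URL; the function name `filter_embedded` and the sample URLs are made up for the example.

```python
import re


def filter_embedded(urls, must_match):
    """Keep only the URLs that the pattern matches in full (assumed semantics)."""
    pattern = re.compile(must_match)
    return [url for url in urls if pattern.fullmatch(url)]


# Hypothetical embedded resources found on a page.
found = [
    "https://habrahabr.ru/styles/main.css",
    "https://mc.yandex.ru/metrika/watch.js",
    "https://www.google-analytics.com/analytics.js",
    "https://habrahabr.ru/images/logo.png",
]

# Only resources served by the site under test are kept.
kept = filter_embedded(found, r"https://habrahabr\.ru/.*")
print(kept)
```

With a pattern like this, third-party analytics and ad hosts are never requested, so the load goes only to the system under test.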

Your results are very interesting, and I strongly encourage you to report:

to Jodd (http://jodd.org/): a bug about the difference in downloaded resources compared to jsoup
to JMeter: the recursion issue you ran into, with an example
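
For the Jodd report, a short list of URLs that one parser finds and the other does not would probably be the most useful attachment. A minimal sketch of producing such a list from two sets of extracted URLs follows; the function name `compare_parsers` and the sample data are hypothetical.

```python
def compare_parsers(jodd_urls, jsoup_urls):
    """Return the URLs found by only one of the two parsers, sorted for stable output."""
    jodd, jsoup = set(jodd_urls), set(jsoup_urls)
    return {
        "only_jodd": sorted(jodd - jsoup),
        "only_jsoup": sorted(jsoup - jodd),
    }


# Hypothetical extraction results for the same page.
jodd_found = ["https://example.invalid/a.css", "https://example.invalid/b.js"]
jsoup_found = ["https://example.invalid/a.css", "https://example.invalid/c.png"]

diff = compare_parsers(jodd_found, jsoup_found)
print(diff["only_jodd"])
print(diff["only_jsoup"])
```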

Of course, any patch improving JMeter is very welcome.

Regards
Philippe M.