Analysis of the performance puzzles from JBreak (part 3)

    This is the next-to-last part of the analysis, covering the third puzzle. The analyses of the first and second puzzles were published earlier.

    The code for the third puzzle:

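        // Variant 1 (VectorAlgebra.computeWithRawScalars in the benchmark): plain scalar arithmetic, no objects.
        public static double compute(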
                double x1, double y1, double z1,
                double x2, double y2, double z2) {
            double x = y1 * z2 - z1 * y2;
            double y = z1 * x2 - x1 * z2;
            double z = x1 * y2 - y1 * x2;
            return x * x + y * y + z * z;
        }
    
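        // Variant 2 (VectorAlgebra.computeWithVectors in the benchmark): the same computation through Vector objects.
        public static double compute(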
                double x1, double y1, double z1,
                double x2, double y2, double z2) {
            Vector v1 = new Vector(x1, y1, z1);
            Vector v2 = new Vector(x2, y2, z2);
            return v1.crossProduct(v2).squared();
        }
    
        public static final class Vector {
            private final double x, y, z;
    
            public Vector(double x, double y, double z) {
                this.x = x; this.y = y; this.z = z;
            }
    
            public double squared() {
                return x * x + y * y + z * z;
            }
    
            public Vector crossProduct(Vector v) {
                return new Vector(
                        y * v.z - z * v.y,
                        z * v.x - x * v.z,
                        x * v.y - y * v.x);
            }
        }

    Task (simplified):
    Determine which of the methods are fast and which are slow (JRE 1.8.0_161).

    Both variants implement the same math: first we compute the cross product of two vectors in three-dimensional space, then we take the squared length of the resulting vector (or, if you prefer, the dot product of the vector with itself).
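To make the "same math" claim concrete, here is a minimal sketch that puts both variants side by side and compares their results. The class name `CrossProductCheck` and the sample inputs in `main` are assumptions made for illustration; the method bodies mirror the puzzle code above.

```java
// Hypothetical helper class for illustration; it is not part of the original repository.
final class CrossProductCheck {

    // Variant 1: squared length of the cross product, computed on raw scalars.
    static double computeWithRawScalars(double x1, double y1, double z1,
                                        double x2, double y2, double z2) {
        double x = y1 * z2 - z1 * y2;
        double y = z1 * x2 - x1 * z2;
        double z = x1 * y2 - y1 * x2;
        return x * x + y * y + z * z;
    }

    // Variant 2: the same computation expressed through small immutable objects.
    static double computeWithVectors(double x1, double y1, double z1,
                                     double x2, double y2, double z2) {
        Vector v1 = new Vector(x1, y1, z1);
        Vector v2 = new Vector(x2, y2, z2);
        return v1.crossProduct(v2).squared();
    }

    static final class Vector {
        private final double x, y, z;

        Vector(double x, double y, double z) {
            this.x = x; this.y = y; this.z = z;
        }

        double squared() {
            return x * x + y * y + z * z;
        }

        Vector crossProduct(Vector v) {
            return new Vector(
                    y * v.z - z * v.y,
                    z * v.x - x * v.z,
                    x * v.y - y * v.x);
        }
    }

    public static void main(String[] args) {
        // Unit vectors along X and Y: their cross product is the unit vector along Z.
        System.out.println(computeWithRawScalars(1, 0, 0, 0, 1, 0));
        System.out.println(computeWithVectors(1, 0, 0, 0, 1, 0));
        // Parallel vectors: the cross product is the zero vector.
        System.out.println(computeWithVectors(1, 2, 3, 2, 4, 6));
    }
}
```

Both methods evaluate the same expressions in the same order, so for identical inputs they should return identical values; any difference between them can only be in how the work is carried out, not in what is computed.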

    The obvious wrong answer


    The first variant is faster because no time is spent creating Vector objects.
    One can also argue that the first algorithm, unlike the second, is garbage-free, so it puts no additional load on the GC.

    Solution


    As usual, let's start with a small benchmark (the full code is available on GitHub):

    @Fork(value = 5, warmups = 0)
    @Warmup(iterations = 5, time = 1_000, timeUnit = TimeUnit.MILLISECONDS)
    @Measurement(iterations = 10, time = 1_000, timeUnit = TimeUnit.MILLISECONDS)
    @OutputTimeUnit(value = TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @State(Scope.Benchmark)
    public class ComputationOnlyBenchmark {
        private double x1, y1, z1;
        private double x2, y2, z2;
    
        @Setup(value = Level.Iteration)
        public void setup() {
            x1 = 123.4;
            y1 = 234.5;
            z1 = 345.6;
            x2 = 456.7;
            y2 = 567.8;
            z2 = 678.9;
        }
    
        @Benchmark
        public void computeWithRawScalars(Blackhole bh) {
            bh.consume(VectorAlgebra.computeWithRawScalars(x1, y1, z1, x2, y2, z2));
        }
    
        @Benchmark
        public void computeWithVectors(Blackhole bh) {
            bh.consume(VectorAlgebra.computeWithVectors(x1, y1, z1, x2, y2, z2));
        }
    }

    The result is rather interesting:

    Benchmark                                Mode  Cnt  Score   Error  Units
    ComputationOnly...computeWithRawScalars  avgt   50  4,783 ± 0,031  ns/op
    ComputationOnly...computeWithVectors     avgt   50  4,785 ± 0,040  ns/op

    Full benchmark output
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationOnlyBenchmark.computeWithRawScalars
    
    # Run progress: 0,00% complete, ETA 00:02:30
    # Fork: 1 of 5
    # Warmup Iteration   1: 5,236 ns/op
    # Warmup Iteration   2: 4,710 ns/op
    # Warmup Iteration   3: 4,779 ns/op
    # Warmup Iteration   4: 5,481 ns/op
    # Warmup Iteration   5: 4,908 ns/op
    Iteration   1: 4,785 ns/op
    Iteration   2: 4,886 ns/op
    Iteration   3: 4,888 ns/op
    Iteration   4: 4,739 ns/op
    Iteration   5: 4,738 ns/op
    Iteration   6: 4,749 ns/op
    Iteration   7: 4,737 ns/op
    Iteration   8: 4,739 ns/op
    Iteration   9: 4,749 ns/op
    Iteration  10: 4,745 ns/op
    
    # Run progress: 10,00% complete, ETA 00:02:17
    # Fork: 2 of 5
    # Warmup Iteration   1: 5,108 ns/op
    # Warmup Iteration   2: 4,692 ns/op
    # Warmup Iteration   3: 4,746 ns/op
    # Warmup Iteration   4: 4,738 ns/op
    # Warmup Iteration   5: 4,750 ns/op
    Iteration   1: 4,888 ns/op
    Iteration   2: 4,753 ns/op
    Iteration   3: 4,741 ns/op
    Iteration   4: 4,734 ns/op
    Iteration   5: 4,741 ns/op
    Iteration   6: 4,789 ns/op
    Iteration   7: 4,880 ns/op
    Iteration   8: 4,771 ns/op
    Iteration   9: 4,746 ns/op
    Iteration  10: 4,742 ns/op
    
    # Run progress: 20,00% complete, ETA 00:02:02
    # Fork: 3 of 5
    # Warmup Iteration   1: 5,129 ns/op
    # Warmup Iteration   2: 4,733 ns/op
    # Warmup Iteration   3: 4,760 ns/op
    # Warmup Iteration   4: 5,021 ns/op
    # Warmup Iteration   5: 4,834 ns/op
    Iteration   1: 4,787 ns/op
    Iteration   2: 4,818 ns/op
    Iteration   3: 4,773 ns/op
    Iteration   4: 4,735 ns/op
    Iteration   5: 4,742 ns/op
    Iteration   6: 4,758 ns/op
    Iteration   7: 4,749 ns/op
    Iteration   8: 4,741 ns/op
    Iteration   9: 4,794 ns/op
    Iteration  10: 4,937 ns/op
    
    # Run progress: 30,00% complete, ETA 00:01:47
    # Fork: 4 of 5
    # Warmup Iteration   1: 5,158 ns/op
    # Warmup Iteration   2: 4,889 ns/op
    # Warmup Iteration   3: 4,823 ns/op
    # Warmup Iteration   4: 4,994 ns/op
    # Warmup Iteration   5: 4,847 ns/op
    Iteration   1: 4,779 ns/op
    Iteration   2: 4,818 ns/op
    Iteration   3: 4,804 ns/op
    Iteration   4: 4,900 ns/op
    Iteration   5: 5,045 ns/op
    Iteration   6: 4,841 ns/op
    Iteration   7: 4,773 ns/op
    Iteration   8: 4,738 ns/op
    Iteration   9: 4,770 ns/op
    Iteration  10: 4,745 ns/op
    
    # Run progress: 40,00% complete, ETA 00:01:31
    # Fork: 5 of 5
    # Warmup Iteration   1: 5,215 ns/op
    # Warmup Iteration   2: 4,731 ns/op
    # Warmup Iteration   3: 5,088 ns/op
    # Warmup Iteration   4: 4,776 ns/op
    # Warmup Iteration   5: 4,769 ns/op
    Iteration   1: 4,750 ns/op
    Iteration   2: 4,775 ns/op
    Iteration   3: 4,759 ns/op
    Iteration   4: 4,759 ns/op
    Iteration   5: 4,731 ns/op
    Iteration   6: 4,737 ns/op
    Iteration   7: 4,764 ns/op
    Iteration   8: 4,751 ns/op
    Iteration   9: 4,755 ns/op
    Iteration  10: 4,763 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationOnlyBenchmark.computeWithRawScalars":
      4,783 ±(99.9%) 0,031 ns/op [Average]
      (min, avg, max) = (4,731, 4,783, 5,045), stdev = 0,063
      CI (99.9%): [4,751, 4,814] (assumes normal distribution)
    
    
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationOnlyBenchmark.computeWithVectors
    
    # Run progress: 50,00% complete, ETA 00:01:16
    # Fork: 1 of 5
    # Warmup Iteration   1: 5,246 ns/op
    # Warmup Iteration   2: 4,695 ns/op
    # Warmup Iteration   3: 4,713 ns/op
    # Warmup Iteration   4: 4,693 ns/op
    # Warmup Iteration   5: 4,705 ns/op
    Iteration   1: 4,717 ns/op
    Iteration   2: 4,705 ns/op
    Iteration   3: 4,718 ns/op
    Iteration   4: 4,720 ns/op
    Iteration   5: 4,697 ns/op
    Iteration   6: 4,739 ns/op
    Iteration   7: 4,714 ns/op
    Iteration   8: 4,718 ns/op
    Iteration   9: 4,711 ns/op
    Iteration  10: 4,818 ns/op
    
    # Run progress: 60,00% complete, ETA 00:01:01
    # Fork: 2 of 5
    # Warmup Iteration   1: 5,253 ns/op
    # Warmup Iteration   2: 4,827 ns/op
    # Warmup Iteration   3: 4,850 ns/op
    # Warmup Iteration   4: 4,745 ns/op
    # Warmup Iteration   5: 4,912 ns/op
    Iteration   1: 4,889 ns/op
    Iteration   2: 4,732 ns/op
    Iteration   3: 5,133 ns/op
    Iteration   4: 5,010 ns/op
    Iteration   5: 4,820 ns/op
    Iteration   6: 4,748 ns/op
    Iteration   7: 4,762 ns/op
    Iteration   8: 4,831 ns/op
    Iteration   9: 4,797 ns/op
    Iteration  10: 4,826 ns/op
    
    # Run progress: 70,00% complete, ETA 00:00:45
    # Fork: 3 of 5
    # Warmup Iteration   1: 5,227 ns/op
    # Warmup Iteration   2: 4,750 ns/op
    # Warmup Iteration   3: 4,811 ns/op
    # Warmup Iteration   4: 4,769 ns/op
    # Warmup Iteration   5: 4,792 ns/op
    Iteration   1: 4,783 ns/op
    Iteration   2: 4,780 ns/op
    Iteration   3: 4,754 ns/op
    Iteration   4: 4,747 ns/op
    Iteration   5: 4,753 ns/op
    Iteration   6: 4,807 ns/op
    Iteration   7: 4,773 ns/op
    Iteration   8: 4,786 ns/op
    Iteration   9: 4,770 ns/op
    Iteration  10: 4,772 ns/op
    
    # Run progress: 80,00% complete, ETA 00:00:30
    # Fork: 4 of 5
    # Warmup Iteration   1: 5,190 ns/op
    # Warmup Iteration   2: 4,780 ns/op
    # Warmup Iteration   3: 4,704 ns/op
    # Warmup Iteration   4: 4,762 ns/op
    # Warmup Iteration   5: 4,725 ns/op
    Iteration   1: 4,745 ns/op
    Iteration   2: 4,753 ns/op
    Iteration   3: 4,775 ns/op
    Iteration   4: 4,804 ns/op
    Iteration   5: 4,771 ns/op
    Iteration   6: 4,737 ns/op
    Iteration   7: 4,875 ns/op
    Iteration   8: 4,834 ns/op
    Iteration   9: 4,833 ns/op
    Iteration  10: 5,000 ns/op
    
    # Run progress: 90,00% complete, ETA 00:00:15
    # Fork: 5 of 5
    # Warmup Iteration   1: 5,220 ns/op
    # Warmup Iteration   2: 4,766 ns/op
    # Warmup Iteration   3: 5,140 ns/op
    # Warmup Iteration   4: 4,835 ns/op
    # Warmup Iteration   5: 4,963 ns/op
    Iteration   1: 4,743 ns/op
    Iteration   2: 4,738 ns/op
    Iteration   3: 4,780 ns/op
    Iteration   4: 4,749 ns/op
    Iteration   5: 4,716 ns/op
    Iteration   6: 4,791 ns/op
    Iteration   7: 4,759 ns/op
    Iteration   8: 4,772 ns/op
    Iteration   9: 4,797 ns/op
    Iteration  10: 4,768 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationOnlyBenchmark.computeWithVectors":
      4,785 ±(99.9%) 0,040 ns/op [Average]
      (min, avg, max) = (4,697, 4,785, 5,133), stdev = 0,081
      CI (99.9%): [4,746, 4,825] (assumes normal distribution)
    
    
    # Run complete. Total time: 00:02:33
    
    Benchmark                                       Mode  Cnt  Score   Error  Units
    ComputationOnlyBenchmark.computeWithRawScalars  avgt   50  4,783 ± 0,031  ns/op
    ComputationOnlyBenchmark.computeWithVectors     avgt   50  4,785 ± 0,040  ns/op

    Besides the arithmetic, the code in our example creates three Vector objects (each creation calls two constructors, Vector's and Object's) and calls two methods, Vector.crossProduct() and Vector.squared(), yet none of this affected the performance of VectorAlgebra.computeWithVectors(). Surely creating these objects cannot be so cheap that it disappears into the measurement error!

    One could fairly point out that the JIT has no trouble inlining simple methods, which is exactly what it did:

    @ 25   ru.gnkoshelev.jbreak2018.perf_tests.vector.VectorAlgebra::computeWithVectors (39 bytes)        3  inline (hot)
      @ 8   ru.gnkoshelev.jbreak2018.perf_tests.vector.VectorAlgebra$Vector::<init> (21 bytes)   inline (hot)
        @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
      @ 23   ru.gnkoshelev.jbreak2018.perf_tests.vector.VectorAlgebra$Vector::<init> (21 bytes)   inline (hot)
        @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
      @ 32   ru.gnkoshelev.jbreak2018.perf_tests.vector.VectorAlgebra$Vector::crossProduct (65 bytes)   inline (hot)
        @ 61   ru.gnkoshelev.jbreak2018.perf_tests.vector.VectorAlgebra$Vector::<init> (21 bytes)   inline (hot)
          @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
      @ 35   ru.gnkoshelev.jbreak2018.perf_tests.vector.VectorAlgebra$Vector::squared (30 bytes)   inline (hot)

    But inlining only embeds the body of the called method into the calling code, so objectively there is still a lot of code left to execute.
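The article does not say how the inlining log above was produced. One common way to get such output on HotSpot is to run with the diagnostic flags `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`. The sketch below is a standalone harness for trying this outside JMH; the class name `InliningDemo`, the `run` helper and the iteration count are assumptions, not the author's setup.

```java
// Hypothetical standalone harness, assumed for illustration only.
// Compile and run, for example, with:
//   javac InliningDemo.java
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InliningDemo
final class InliningDemo {

    static final class Vector {
        private final double x, y, z;

        Vector(double x, double y, double z) {
            this.x = x; this.y = y; this.z = z;
        }

        double squared() {
            return x * x + y * y + z * z;
        }

        Vector crossProduct(Vector v) {
            return new Vector(
                    y * v.z - z * v.y,
                    z * v.x - x * v.z,
                    x * v.y - y * v.x);
        }
    }

    static double computeWithVectors(double x1, double y1, double z1,
                                     double x2, double y2, double z2) {
        Vector v1 = new Vector(x1, y1, z1);
        Vector v2 = new Vector(x2, y2, z2);
        return v1.crossProduct(v2).squared();
    }

    // Calls the method often enough for the JIT to consider it hot.
    // The loop index feeds into the arguments and the results are summed,
    // so the computation cannot be discarded as dead code.
    static double run(int iterations) {
        double sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += computeWithVectors(i, 234.5, 345.6, 456.7, 567.8, 678.9);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

With JMH, the same flags can be passed to the forked JVM through the `jvmArgsAppend` attribute of `@Fork`. The exact format of the log differs between JVM versions.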

    Let's try to measure the cost of object creation and estimate the JMH overhead:

    @Fork(value = 5, warmups = 0)
    @Warmup(iterations = 5, time = 1_000, timeUnit = TimeUnit.MILLISECONDS)
    @Measurement(iterations = 10, time = 1_000, timeUnit = TimeUnit.MILLISECONDS)
    @OutputTimeUnit(value = TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @State(Scope.Benchmark)
    public class CreateAndConsumeBenchmark {
        private double x1, y1, z1;
        private double x2, y2, z2;
    
        @Setup(value = Level.Iteration)
        public void setup() {
            x1 = 123.4;
            y1 = 234.5;
            z1 = 345.6;
            x2 = 456.7;
            y2 = 567.8;
            z2 = 678.9;
        }
    
        @Benchmark
        public void consumeDouble(Blackhole bh) {
            bh.consume(x1);
        }
    
        @Benchmark
        public void consumeObject(Blackhole bh) {
            bh.consume(new Object());
        }
    
        @Benchmark
        public void createAndConsumeSingleVector(Blackhole bh) {
            bh.consume(new VectorAlgebra.Vector(x1, y1, z1));
        }
    
        @Benchmark
        public void createAndConsumeTwoVectors(Blackhole bh) {
            bh.consume(new VectorAlgebra.Vector(x1, y1, z1));
            bh.consume(new VectorAlgebra.Vector(x2, y2, z2));
        }
    
        @Benchmark
        public void createAndConsumeThreeVectors(Blackhole bh) {
            bh.consume(new VectorAlgebra.Vector(x1, y1, z1));
            bh.consume(new VectorAlgebra.Vector(x2, y2, z2));
            bh.consume(new VectorAlgebra.Vector(x1, y2, z1));
        }
    }

    Below is the result of running this benchmark:

    Benchmark                                        Mode  Cnt   Score   Error  Units
    CreateAndConsume...consumeDouble                 avgt   50   2,762 ± 0,103  ns/op
    CreateAndConsume...consumeObject                 avgt   50   3,084 ± 0,036  ns/op
    CreateAndConsume...createAndConsumeObject        avgt   50   4,233 ± 0,081  ns/op
    CreateAndConsume...createAndConsumeSingleVector  avgt   50   7,010 ± 0,147  ns/op
    CreateAndConsume...createAndConsumeThreeVectors  avgt   50  20,710 ± 0,654  ns/op
    CreateAndConsume...createAndConsumeTwoVectors    avgt   50  13,702 ± 0,276  ns/op
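A rough way to read this table is to look at how much each additional Vector adds and to compare the three-vector figure with the whole computeWithVectors call measured earlier. The sketch below only does arithmetic on the scores reported above; the class and method names are assumptions. Every figure still includes the cost of Blackhole.consume, so these are estimates rather than pure allocation costs.

```java
import java.util.Locale;

// Hypothetical helper for back-of-the-envelope arithmetic on the reported scores.
final class AllocationCostEstimate {

    // Average scores in ns/op, copied from the benchmark summaries above.
    static final double ONE_VECTOR = 7.010;
    static final double TWO_VECTORS = 13.702;
    static final double THREE_VECTORS = 20.710;
    static final double COMPUTE_WITH_VECTORS = 4.785;

    // Extra time attributed to the second created-and-consumed Vector.
    static double secondVectorCost() {
        return TWO_VECTORS - ONE_VECTOR;
    }

    // Extra time attributed to the third created-and-consumed Vector.
    static double thirdVectorCost() {
        return THREE_VECTORS - TWO_VECTORS;
    }

    // How many times slower creating and consuming three vectors is
    // than the whole computeWithVectors call.
    static double threeVectorsVersusCompute() {
        return THREE_VECTORS / COMPUTE_WITH_VECTORS;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "second vector adds  %.3f ns%n", secondVectorCost());
        System.out.printf(Locale.ROOT, "third vector adds   %.3f ns%n", thirdVectorCost());
        System.out.printf(Locale.ROOT, "three vectors / computeWithVectors = %.2f%n",
                threeVectorsVersusCompute());
    }
}
```

Each extra Vector adds roughly 7 ns here, while computeWithVectors, which nominally creates three vectors and also does the arithmetic, takes under 5 ns in total. Inlining alone does not explain that gap.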

    Full benchmark output
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.consumeDouble
    
    # Run progress: 0,00% complete, ETA 00:07:30
    # Fork: 1 of 5
    # Warmup Iteration   1: 2,888 ns/op
    # Warmup Iteration   2: 2,949 ns/op
    # Warmup Iteration   3: 2,716 ns/op
    # Warmup Iteration   4: 2,635 ns/op
    # Warmup Iteration   5: 2,674 ns/op
    Iteration   1: 2,710 ns/op
    Iteration   2: 2,622 ns/op
    Iteration   3: 2,651 ns/op
    Iteration   4: 2,628 ns/op
    Iteration   5: 2,620 ns/op
    Iteration   6: 2,623 ns/op
    Iteration   7: 2,613 ns/op
    Iteration   8: 2,613 ns/op
    Iteration   9: 2,618 ns/op
    Iteration  10: 2,621 ns/op
    
    # Run progress: 3,33% complete, ETA 00:07:24
    # Fork: 2 of 5
    # Warmup Iteration   1: 2,828 ns/op
    # Warmup Iteration   2: 2,762 ns/op
    # Warmup Iteration   3: 2,612 ns/op
    # Warmup Iteration   4: 2,640 ns/op
    # Warmup Iteration   5: 2,620 ns/op
    Iteration   1: 2,665 ns/op
    Iteration   2: 2,637 ns/op
    Iteration   3: 2,622 ns/op
    Iteration   4: 2,623 ns/op
    Iteration   5: 2,654 ns/op
    Iteration   6: 2,628 ns/op
    Iteration   7: 2,634 ns/op
    Iteration   8: 2,658 ns/op
    Iteration   9: 2,630 ns/op
    Iteration  10: 2,626 ns/op
    
    # Run progress: 6,67% complete, ETA 00:07:08
    # Fork: 3 of 5
    # Warmup Iteration   1: 2,886 ns/op
    # Warmup Iteration   2: 2,755 ns/op
    # Warmup Iteration   3: 2,627 ns/op
    # Warmup Iteration   4: 2,662 ns/op
    # Warmup Iteration   5: 2,726 ns/op
    Iteration   1: 2,691 ns/op
    Iteration   2: 2,642 ns/op
    Iteration   3: 2,675 ns/op
    Iteration   4: 2,687 ns/op
    Iteration   5: 2,663 ns/op
    Iteration   6: 2,717 ns/op
    Iteration   7: 2,624 ns/op
    Iteration   8: 2,692 ns/op
    Iteration   9: 2,706 ns/op
    Iteration  10: 2,659 ns/op
    
    # Run progress: 10,00% complete, ETA 00:06:53
    # Fork: 4 of 5
    # Warmup Iteration   1: 3,347 ns/op
    # Warmup Iteration   2: 2,833 ns/op
    # Warmup Iteration   3: 3,462 ns/op
    # Warmup Iteration   4: 3,186 ns/op
    # Warmup Iteration   5: 3,187 ns/op
    Iteration   1: 3,143 ns/op
    Iteration   2: 3,144 ns/op
    Iteration   3: 3,189 ns/op
    Iteration   4: 3,171 ns/op
    Iteration   5: 3,178 ns/op
    Iteration   6: 3,199 ns/op
    Iteration   7: 3,137 ns/op
    Iteration   8: 3,180 ns/op
    Iteration   9: 3,133 ns/op
    Iteration  10: 3,204 ns/op
    
    # Run progress: 13,33% complete, ETA 00:06:37
    # Fork: 5 of 5
    # Warmup Iteration   1: 2,881 ns/op
    # Warmup Iteration   2: 2,792 ns/op
    # Warmup Iteration   3: 2,629 ns/op
    # Warmup Iteration   4: 2,656 ns/op
    # Warmup Iteration   5: 2,709 ns/op
    Iteration   1: 2,630 ns/op
    Iteration   2: 2,731 ns/op
    Iteration   3: 2,674 ns/op
    Iteration   4: 2,781 ns/op
    Iteration   5: 2,667 ns/op
    Iteration   6: 2,736 ns/op
    Iteration   7: 2,657 ns/op
    Iteration   8: 2,737 ns/op
    Iteration   9: 2,692 ns/op
    Iteration  10: 2,659 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.consumeDouble":
      2,762 ±(99.9%) 0,103 ns/op [Average]
      (min, avg, max) = (2,613, 2,762, 3,204), stdev = 0,209
      CI (99.9%): [2,659, 2,865] (assumes normal distribution)
    
    
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.consumeObject
    
    # Run progress: 16,67% complete, ETA 00:06:22
    # Fork: 1 of 5
    # Warmup Iteration   1: 3,251 ns/op
    # Warmup Iteration   2: 3,207 ns/op
    # Warmup Iteration   3: 3,434 ns/op
    # Warmup Iteration   4: 3,136 ns/op
    # Warmup Iteration   5: 3,108 ns/op
    Iteration   1: 3,113 ns/op
    Iteration   2: 3,126 ns/op
    Iteration   3: 3,111 ns/op
    Iteration   4: 3,072 ns/op
    Iteration   5: 3,048 ns/op
    Iteration   6: 3,484 ns/op
    Iteration   7: 3,138 ns/op
    Iteration   8: 3,052 ns/op
    Iteration   9: 3,119 ns/op
    Iteration  10: 3,071 ns/op
    
    # Run progress: 20,00% complete, ETA 00:06:07
    # Fork: 2 of 5
    # Warmup Iteration   1: 3,283 ns/op
    # Warmup Iteration   2: 3,129 ns/op
    # Warmup Iteration   3: 3,124 ns/op
    # Warmup Iteration   4: 3,119 ns/op
    # Warmup Iteration   5: 3,092 ns/op
    Iteration   1: 3,123 ns/op
    Iteration   2: 3,101 ns/op
    Iteration   3: 3,121 ns/op
    Iteration   4: 3,112 ns/op
    Iteration   5: 3,141 ns/op
    Iteration   6: 3,142 ns/op
    Iteration   7: 3,045 ns/op
    Iteration   8: 3,090 ns/op
    Iteration   9: 3,145 ns/op
    Iteration  10: 3,094 ns/op
    
    # Run progress: 23,33% complete, ETA 00:05:51
    # Fork: 3 of 5
    # Warmup Iteration   1: 3,232 ns/op
    # Warmup Iteration   2: 3,160 ns/op
    # Warmup Iteration   3: 3,092 ns/op
    # Warmup Iteration   4: 3,090 ns/op
    # Warmup Iteration   5: 3,101 ns/op
    Iteration   1: 3,077 ns/op
    Iteration   2: 3,027 ns/op
    Iteration   3: 3,039 ns/op
    Iteration   4: 3,086 ns/op
    Iteration   5: 3,043 ns/op
    Iteration   6: 3,073 ns/op
    Iteration   7: 3,078 ns/op
    Iteration   8: 3,054 ns/op
    Iteration   9: 3,040 ns/op
    Iteration  10: 3,042 ns/op
    
    # Run progress: 26,67% complete, ETA 00:05:36
    # Fork: 4 of 5
    # Warmup Iteration   1: 3,272 ns/op
    # Warmup Iteration   2: 3,215 ns/op
    # Warmup Iteration   3: 3,053 ns/op
    # Warmup Iteration   4: 3,074 ns/op
    # Warmup Iteration   5: 3,056 ns/op
    Iteration   1: 3,236 ns/op
    Iteration   2: 3,036 ns/op
    Iteration   3: 3,060 ns/op
    Iteration   4: 3,067 ns/op
    Iteration   5: 3,061 ns/op
    Iteration   6: 3,083 ns/op
    Iteration   7: 3,045 ns/op
    Iteration   8: 3,031 ns/op
    Iteration   9: 3,070 ns/op
    Iteration  10: 3,099 ns/op
    
    # Run progress: 30,00% complete, ETA 00:05:21
    # Fork: 5 of 5
    # Warmup Iteration   1: 3,314 ns/op
    # Warmup Iteration   2: 3,134 ns/op
    # Warmup Iteration   3: 3,055 ns/op
    # Warmup Iteration   4: 3,065 ns/op
    # Warmup Iteration   5: 3,058 ns/op
    Iteration   1: 3,064 ns/op
    Iteration   2: 3,048 ns/op
    Iteration   3: 3,022 ns/op
    Iteration   4: 3,034 ns/op
    Iteration   5: 3,049 ns/op
    Iteration   6: 3,008 ns/op
    Iteration   7: 3,022 ns/op
    Iteration   8: 3,053 ns/op
    Iteration   9: 3,047 ns/op
    Iteration  10: 3,036 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.consumeObject":
      3,084 ±(99.9%) 0,036 ns/op [Average]
      (min, avg, max) = (3,008, 3,084, 3,484), stdev = 0,072
      CI (99.9%): [3,048, 3,119] (assumes normal distribution)
    
    
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.createAndConsumeObject
    
    # Run progress: 33,33% complete, ETA 00:05:05
    # Fork: 1 of 5
    # Warmup Iteration   1: 6,357 ns/op
    # Warmup Iteration   2: 6,220 ns/op
    # Warmup Iteration   3: 4,057 ns/op
    # Warmup Iteration   4: 3,996 ns/op
    # Warmup Iteration   5: 4,095 ns/op
    Iteration   1: 4,189 ns/op
    Iteration   2: 4,137 ns/op
    Iteration   3: 4,176 ns/op
    Iteration   4: 4,212 ns/op
    Iteration   5: 4,114 ns/op
    Iteration   6: 4,216 ns/op
    Iteration   7: 4,214 ns/op
    Iteration   8: 4,158 ns/op
    Iteration   9: 4,159 ns/op
    Iteration  10: 4,132 ns/op
    
    # Run progress: 36,67% complete, ETA 00:04:50
    # Fork: 2 of 5
    # Warmup Iteration   1: 6,585 ns/op
    # Warmup Iteration   2: 6,339 ns/op
    # Warmup Iteration   3: 4,486 ns/op
    # Warmup Iteration   4: 4,182 ns/op
    # Warmup Iteration   5: 4,234 ns/op
    Iteration   1: 4,196 ns/op
    Iteration   2: 4,119 ns/op
    Iteration   3: 4,221 ns/op
    Iteration   4: 4,350 ns/op
    Iteration   5: 4,268 ns/op
    Iteration   6: 4,422 ns/op
    Iteration   7: 4,856 ns/op
    Iteration   8: 4,463 ns/op
    Iteration   9: 4,387 ns/op
    Iteration  10: 4,666 ns/op
    
    # Run progress: 40,00% complete, ETA 00:04:35
    # Fork: 3 of 5
    # Warmup Iteration   1: 6,676 ns/op
    # Warmup Iteration   2: 5,895 ns/op
    # Warmup Iteration   3: 4,275 ns/op
    # Warmup Iteration   4: 4,967 ns/op
    # Warmup Iteration   5: 4,296 ns/op
    Iteration   1: 3,988 ns/op
    Iteration   2: 4,229 ns/op
    Iteration   3: 4,043 ns/op
    Iteration   4: 4,176 ns/op
    Iteration   5: 4,132 ns/op
    Iteration   6: 4,116 ns/op
    Iteration   7: 4,112 ns/op
    Iteration   8: 4,141 ns/op
    Iteration   9: 4,376 ns/op
    Iteration  10: 4,046 ns/op
    
    # Run progress: 43,33% complete, ETA 00:04:20
    # Fork: 4 of 5
    # Warmup Iteration   1: 6,594 ns/op
    # Warmup Iteration   2: 6,165 ns/op
    # Warmup Iteration   3: 4,236 ns/op
    # Warmup Iteration   4: 4,239 ns/op
    # Warmup Iteration   5: 4,261 ns/op
    Iteration   1: 4,410 ns/op
    Iteration   2: 4,354 ns/op
    Iteration   3: 4,389 ns/op
    Iteration   4: 4,404 ns/op
    Iteration   5: 4,372 ns/op
    Iteration   6: 4,383 ns/op
    Iteration   7: 4,339 ns/op
    Iteration   8: 4,262 ns/op
    Iteration   9: 4,225 ns/op
    Iteration  10: 4,263 ns/op
    
    # Run progress: 46,67% complete, ETA 00:04:05
    # Fork: 5 of 5
    # Warmup Iteration   1: 6,359 ns/op
    # Warmup Iteration   2: 5,736 ns/op
    # Warmup Iteration   3: 4,054 ns/op
    # Warmup Iteration   4: 4,209 ns/op
    # Warmup Iteration   5: 4,989 ns/op
    Iteration   1: 4,128 ns/op
    Iteration   2: 4,111 ns/op
    Iteration   3: 4,050 ns/op
    Iteration   4: 4,145 ns/op
    Iteration   5: 3,997 ns/op
    Iteration   6: 4,055 ns/op
    Iteration   7: 4,131 ns/op
    Iteration   8: 4,109 ns/op
    Iteration   9: 4,258 ns/op
    Iteration  10: 4,259 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.createAndConsumeObject":
      4,233 ±(99.9%) 0,081 ns/op [Average]
      (min, avg, max) = (3,988, 4,233, 4,856), stdev = 0,163
      CI (99.9%): [4,152, 4,314] (assumes normal distribution)
    
    
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.createAndConsumeSingleVector
    
    # Run progress: 50,00% complete, ETA 00:03:50
    # Fork: 1 of 5
    # Warmup Iteration   1: 9,465 ns/op
    # Warmup Iteration   2: 8,339 ns/op
    # Warmup Iteration   3: 6,843 ns/op
    # Warmup Iteration   4: 6,870 ns/op
    # Warmup Iteration   5: 7,000 ns/op
    Iteration   1: 6,830 ns/op
    Iteration   2: 6,889 ns/op
    Iteration   3: 6,719 ns/op
    Iteration   4: 6,963 ns/op
    Iteration   5: 7,735 ns/op
    Iteration   6: 6,903 ns/op
    Iteration   7: 7,118 ns/op
    Iteration   8: 6,859 ns/op
    Iteration   9: 7,087 ns/op
    Iteration  10: 7,379 ns/op
    
    # Run progress: 53,33% complete, ETA 00:03:34
    # Fork: 2 of 5
    # Warmup Iteration   1: 9,744 ns/op
    # Warmup Iteration   2: 9,466 ns/op
    # Warmup Iteration   3: 6,871 ns/op
    # Warmup Iteration   4: 6,820 ns/op
    # Warmup Iteration   5: 6,858 ns/op
    Iteration   1: 6,869 ns/op
    Iteration   2: 6,898 ns/op
    Iteration   3: 7,045 ns/op
    Iteration   4: 6,835 ns/op
    Iteration   5: 6,868 ns/op
    Iteration   6: 6,941 ns/op
    Iteration   7: 6,998 ns/op
    Iteration   8: 6,807 ns/op
    Iteration   9: 7,175 ns/op
    Iteration  10: 6,743 ns/op
    
    # Run progress: 56,67% complete, ETA 00:03:19
    # Fork: 3 of 5
    # Warmup Iteration   1: 10,032 ns/op
    # Warmup Iteration   2: 8,211 ns/op
    # Warmup Iteration   3: 6,765 ns/op
    # Warmup Iteration   4: 6,623 ns/op
    # Warmup Iteration   5: 6,686 ns/op
    Iteration   1: 6,888 ns/op
    Iteration   2: 6,890 ns/op
    Iteration   3: 6,801 ns/op
    Iteration   4: 6,948 ns/op
    Iteration   5: 6,917 ns/op
    Iteration   6: 6,983 ns/op
    Iteration   7: 7,424 ns/op
    Iteration   8: 6,883 ns/op
    Iteration   9: 6,852 ns/op
    Iteration  10: 7,131 ns/op
    
    # Run progress: 60,00% complete, ETA 00:03:04
    # Fork: 4 of 5
    # Warmup Iteration   1: 9,733 ns/op
    # Warmup Iteration   2: 9,382 ns/op
    # Warmup Iteration   3: 7,941 ns/op
    # Warmup Iteration   4: 6,613 ns/op
    # Warmup Iteration   5: 6,822 ns/op
    Iteration   1: 6,882 ns/op
    Iteration   2: 6,867 ns/op
    Iteration   3: 6,746 ns/op
    Iteration   4: 6,705 ns/op
    Iteration   5: 6,797 ns/op
    Iteration   6: 6,912 ns/op
    Iteration   7: 6,829 ns/op
    Iteration   8: 6,918 ns/op
    Iteration   9: 6,794 ns/op
    Iteration  10: 6,676 ns/op
    
    # Run progress: 63,33% complete, ETA 00:02:48
    # Fork: 5 of 5
    # Warmup Iteration   1: 9,569 ns/op
    # Warmup Iteration   2: 8,417 ns/op
    # Warmup Iteration   3: 7,498 ns/op
    # Warmup Iteration   4: 6,733 ns/op
    # Warmup Iteration   5: 7,604 ns/op
    Iteration   1: 7,897 ns/op
    Iteration   2: 7,120 ns/op
    Iteration   3: 7,500 ns/op
    Iteration   4: 6,625 ns/op
    Iteration   5: 6,770 ns/op
    Iteration   6: 7,269 ns/op
    Iteration   7: 7,241 ns/op
    Iteration   8: 7,620 ns/op
    Iteration   9: 7,856 ns/op
    Iteration  10: 7,113 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.createAndConsumeSingleVector":
      7,010 ±(99.9%) 0,147 ns/op [Average]
      (min, avg, max) = (6,625, 7,010, 7,897), stdev = 0,296
      CI (99.9%): [6,864, 7,157] (assumes normal distribution)
    
    
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.createAndConsumeThreeVectors
    
    # Run progress: 66,67% complete, ETA 00:02:33
    # Fork: 1 of 5
    # Warmup Iteration   1: 31,232 ns/op
    # Warmup Iteration   2: 24,643 ns/op
    # Warmup Iteration   3: 20,248 ns/op
    # Warmup Iteration   4: 20,570 ns/op
    # Warmup Iteration   5: 20,308 ns/op
    Iteration   1: 19,842 ns/op
    Iteration   2: 20,232 ns/op
    Iteration   3: 20,029 ns/op
    Iteration   4: 20,176 ns/op
    Iteration   5: 20,115 ns/op
    Iteration   6: 19,805 ns/op
    Iteration   7: 21,714 ns/op
    Iteration   8: 22,290 ns/op
    Iteration   9: 19,326 ns/op
    Iteration  10: 20,043 ns/op
    
    # Run progress: 70,00% complete, ETA 00:02:18
    # Fork: 2 of 5
    # Warmup Iteration   1: 26,737 ns/op
    # Warmup Iteration   2: 23,298 ns/op
    # Warmup Iteration   3: 19,492 ns/op
    # Warmup Iteration   4: 20,015 ns/op
    # Warmup Iteration   5: 19,786 ns/op
    Iteration   1: 20,654 ns/op
    Iteration   2: 24,989 ns/op
    Iteration   3: 23,062 ns/op
    Iteration   4: 20,066 ns/op
    Iteration   5: 19,356 ns/op
    Iteration   6: 20,228 ns/op
    Iteration   7: 21,509 ns/op
    Iteration   8: 22,263 ns/op
    Iteration   9: 21,233 ns/op
    Iteration  10: 19,880 ns/op
    
    # Run progress: 73,33% complete, ETA 00:02:02
    # Fork: 3 of 5
    # Warmup Iteration   1: 26,036 ns/op
    # Warmup Iteration   2: 23,763 ns/op
    # Warmup Iteration   3: 20,667 ns/op
    # Warmup Iteration   4: 21,922 ns/op
    # Warmup Iteration   5: 21,267 ns/op
    Iteration   1: 23,255 ns/op
    Iteration   2: 19,302 ns/op
    Iteration   3: 18,863 ns/op
    Iteration   4: 19,233 ns/op
    Iteration   5: 19,925 ns/op
    Iteration   6: 20,173 ns/op
    Iteration   7: 21,392 ns/op
    Iteration   8: 20,636 ns/op
    Iteration   9: 20,912 ns/op
    Iteration  10: 24,070 ns/op
    
    # Run progress: 76,67% complete, ETA 00:01:47
    # Fork: 4 of 5
    # Warmup Iteration   1: 29,365 ns/op
    # Warmup Iteration   2: 27,052 ns/op
    # Warmup Iteration   3: 21,789 ns/op
    # Warmup Iteration   4: 19,787 ns/op
    # Warmup Iteration   5: 20,056 ns/op
    Iteration   1: 21,602 ns/op
    Iteration   2: 22,444 ns/op
    Iteration   3: 20,305 ns/op
    Iteration   4: 21,075 ns/op
    Iteration   5: 19,933 ns/op
    Iteration   6: 22,111 ns/op
    Iteration   7: 22,645 ns/op
    Iteration   8: 19,873 ns/op
    Iteration   9: 19,664 ns/op
    Iteration  10: 19,952 ns/op
    
    # Run progress: 80,00% complete, ETA 00:01:32
    # Fork: 5 of 5
    # Warmup Iteration   1: 30,277 ns/op
    # Warmup Iteration   2: 25,022 ns/op
    # Warmup Iteration   3: 20,405 ns/op
    # Warmup Iteration   4: 19,999 ns/op
    # Warmup Iteration   5: 20,755 ns/op
    Iteration   1: 20,470 ns/op
    Iteration   2: 21,499 ns/op
    Iteration   3: 20,766 ns/op
    Iteration   4: 19,998 ns/op
    Iteration   5: 19,515 ns/op
    Iteration   6: 20,064 ns/op
    Iteration   7: 19,542 ns/op
    Iteration   8: 20,014 ns/op
    Iteration   9: 19,758 ns/op
    Iteration  10: 19,717 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.createAndConsumeThreeVectors":
      20,710 ±(99.9%) 0,654 ns/op [Average]
      (min, avg, max) = (18,863, 20,710, 24,989), stdev = 1,320
      CI (99.9%): [20,057, 21,364] (assumes normal distribution)
    
    
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.createAndConsumeTwoVectors
    
    # Run progress: 83,33% complete, ETA 00:01:16
    # Fork: 1 of 5
    # Warmup Iteration   1: 18,072 ns/op
    # Warmup Iteration   2: 25,260 ns/op
    # Warmup Iteration   3: 15,438 ns/op
    # Warmup Iteration   4: 13,649 ns/op
    # Warmup Iteration   5: 13,361 ns/op
    Iteration   1: 13,433 ns/op
    Iteration   2: 13,303 ns/op
    Iteration   3: 13,019 ns/op
    Iteration   4: 13,528 ns/op
    Iteration   5: 14,091 ns/op
    Iteration   6: 13,546 ns/op
    Iteration   7: 13,573 ns/op
    Iteration   8: 13,638 ns/op
    Iteration   9: 14,691 ns/op
    Iteration  10: 13,792 ns/op
    
    # Run progress: 86,67% complete, ETA 00:01:01
    # Fork: 2 of 5
    # Warmup Iteration   1: 18,286 ns/op
    # Warmup Iteration   2: 17,930 ns/op
    # Warmup Iteration   3: 14,022 ns/op
    # Warmup Iteration   4: 13,687 ns/op
    # Warmup Iteration   5: 13,751 ns/op
    Iteration   1: 14,289 ns/op
    Iteration   2: 15,563 ns/op
    Iteration   3: 14,257 ns/op
    Iteration   4: 13,320 ns/op
    Iteration   5: 13,521 ns/op
    Iteration   6: 13,466 ns/op
    Iteration   7: 13,302 ns/op
    Iteration   8: 14,263 ns/op
    Iteration   9: 14,169 ns/op
    Iteration  10: 13,351 ns/op
    
    # Run progress: 90,00% complete, ETA 00:00:46
    # Fork: 3 of 5
    # Warmup Iteration   1: 18,666 ns/op
    # Warmup Iteration   2: 16,649 ns/op
    # Warmup Iteration   3: 14,153 ns/op
    # Warmup Iteration   4: 13,350 ns/op
    # Warmup Iteration   5: 13,531 ns/op
    Iteration   1: 13,186 ns/op
    Iteration   2: 13,436 ns/op
    Iteration   3: 14,136 ns/op
    Iteration   4: 14,686 ns/op
    Iteration   5: 13,111 ns/op
    Iteration   6: 13,267 ns/op
    Iteration   7: 13,264 ns/op
    Iteration   8: 15,579 ns/op
    Iteration   9: 13,763 ns/op
    Iteration  10: 13,015 ns/op
    
    # Run progress: 93,33% complete, ETA 00:00:30
    # Fork: 4 of 5
    # Warmup Iteration   1: 17,933 ns/op
    # Warmup Iteration   2: 17,535 ns/op
    # Warmup Iteration   3: 12,788 ns/op
    # Warmup Iteration   4: 13,409 ns/op
    # Warmup Iteration   5: 13,425 ns/op
    Iteration   1: 13,695 ns/op
    Iteration   2: 13,744 ns/op
    Iteration   3: 13,878 ns/op
    Iteration   4: 13,978 ns/op
    Iteration   5: 13,653 ns/op
    Iteration   6: 13,535 ns/op
    Iteration   7: 13,110 ns/op
    Iteration   8: 14,358 ns/op
    Iteration   9: 13,280 ns/op
    Iteration  10: 13,538 ns/op
    
    # Run progress: 96,67% complete, ETA 00:00:15
    # Fork: 5 of 5
    # Warmup Iteration   1: 18,142 ns/op
    # Warmup Iteration   2: 16,007 ns/op
    # Warmup Iteration   3: 15,354 ns/op
    # Warmup Iteration   4: 14,272 ns/op
    # Warmup Iteration   5: 13,961 ns/op
    Iteration   1: 13,698 ns/op
    Iteration   2: 13,758 ns/op
    Iteration   3: 13,508 ns/op
    Iteration   4: 13,410 ns/op
    Iteration   5: 13,533 ns/op
    Iteration   6: 13,457 ns/op
    Iteration   7: 13,454 ns/op
    Iteration   8: 13,197 ns/op
    Iteration   9: 13,234 ns/op
    Iteration  10: 13,514 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.CreateAndConsumeBenchmark.createAndConsumeTwoVectors":
      13,702 ±(99.9%) 0,276 ns/op [Average]
      (min, avg, max) = (13,015, 13,702, 15,579), stdev = 0,557
      CI (99.9%): [13,426, 13,978] (assumes normal distribution)
    
    
    # Run complete. Total time: 00:07:41
    
    Benchmark                                               Mode  Cnt   Score   Error  Units
    CreateAndConsumeBenchmark.consumeDouble                 avgt   50   2,762 ± 0,103  ns/op
    CreateAndConsumeBenchmark.consumeObject                 avgt   50   3,084 ± 0,036  ns/op
    CreateAndConsumeBenchmark.createAndConsumeObject        avgt   50   4,233 ± 0,081  ns/op
    CreateAndConsumeBenchmark.createAndConsumeSingleVector  avgt   50   7,010 ± 0,147  ns/op
    CreateAndConsumeBenchmark.createAndConsumeThreeVectors  avgt   50  20,710 ± 0,654  ns/op
    CreateAndConsumeBenchmark.createAndConsumeTwoVectors    avgt   50  13,702 ± 0,276  ns/op

    A few more conclusions from this benchmark's results:

    1. Blackhole.consume(double) is expensive compared to our arithmetic (that's bad)
    2. Blackhole.consume(Object) is slightly more expensive than Blackhole.consume(double) (that's bad too)
    3. There are no significant outliers in the measurements (which tells us nothing)
    4. The important conclusion: the cost of creating an object is comparable to the arithmetic above
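
    As a rough sanity check of point 4, the summary table can be differenced row by row. The sketch below only restates the Score column from that table (values hard-coded by hand; the class name ScoreDeltas is made up for illustration) and prints how much each step adds. Reading each increment as "the price of one more object" is an interpretation, not something the table proves on its own.

```java
import java.util.Locale;

// Hypothetical helper: differences between rows of the summary table.
final class ScoreDeltas {
    // Score column (ns/op), copied from the JMH summary table.
    static final double CONSUME_OBJECT = 3.084;
    static final double CREATE_AND_CONSUME_OBJECT = 4.233;
    static final double SINGLE_VECTOR = 7.010;
    static final double TWO_VECTORS = 13.702;
    static final double THREE_VECTORS = 20.710;

    // How much slower `to` is than `from`, in ns/op.
    static double delta(double from, double to) {
        return to - from;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "create Object on top of consume: %.3f ns%n",
                delta(CONSUME_OBJECT, CREATE_AND_CONSUME_OBJECT));
        System.out.printf(Locale.ROOT, "second vector on top of first:   %.3f ns%n",
                delta(SINGLE_VECTOR, TWO_VECTORS));
        System.out.printf(Locale.ROOT, "third vector on top of second:   %.3f ns%n",
                delta(TWO_VECTORS, THREE_VECTORS));
    }
}
```

    The increments come out at roughly 1.1 ns, 6.7 ns and 7.0 ns, all well within an order of magnitude of the few nanoseconds the arithmetic itself takes.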

    One can speculate at length about these results, or swap double for int and back: the benchmark showed only an insignificant difference between the two algorithms. Admittedly, in tests like these (single-digit nanoseconds) anything can have a significant effect, right down to the hardware, but we got lucky. And yet... we still haven't figured out where the object creation costs from the first benchmark went.

    Where did the Vector objects go


    If you look at the code closely, you'll notice that the Vector objects being created are local to the method VectorAlgebra.computeWithVectors() (their scope is limited to that method), and everything done with them boils down to arithmetic on their fields. This means the JIT compiler could, in theory, skip creating these objects altogether and replace the code with something like this (recall that the methods were inlined):

        public static double computeJittedPseudocode(
                double x1, double y1, double z1,
                double x2, double y2, double z2) {
            final double v1_x = x1, v1_y = y1, v1_z = z1;// new Vector(x1, y1, z1);
            final double v2_x = x2, v2_y = y2, v2_z = z2;// new Vector(x2, y2, z2);
            double x = v1_y * v2_z - v1_z * v2_y; // inline crossProduct
            double y = v1_z * v2_x - v1_x * v2_z; // inline crossProduct
            double z = v1_x * v2_y - v1_y * v2_x; // inline crossProduct
            // new Vector(x, y, z);
            return x * x + y * y + z * z;// inline squared
        }

    What are the benefits of this optimization? First, the values are kept on the stack instead of in a heap-allocated object (with luck, the whole computation stays in CPU registers). Second, no object means no load on the GC.

    What's written above can be formalized as two steps:

    1. Find out which objects have a limited scope (determine where an object can be reached from). This is called Escape analysis.
    2. Replace operations on the object's fields with operations on local variables (and thus skip creating the objects themselves, since they are no longer needed). This optimization is called Scalar replacement, or scalarization.
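
    To make the two steps more concrete, here is a minimal sketch (not taken from the original benchmarks) contrasting an allocation that stays local with one that escapes. The class EscapeSketch, the Point type and the escaped field are hypothetical names introduced for illustration. Whether HotSpot actually eliminates the allocation depends on inlining and on the JIT compiler in use, so treat the comments as expected behaviour rather than a guarantee.

```java
// Hypothetical illustration; none of these names come from the original article.
final class EscapeSketch {
    static final class Point {
        final double x, y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        double lengthSquared() {
            return x * x + y * y;
        }
    }

    // A field that outlives any single method call.
    static Point escaped;

    // The Point never leaves this method: a candidate for scalar replacement.
    static double local(double x, double y) {
        Point p = new Point(x, y);
        return p.lengthSquared();
    }

    // The Point is published through a static field, so it escapes
    // and the JIT has to keep the heap allocation.
    static double escaping(double x, double y) {
        Point p = new Point(x, y);
        escaped = p;
        return p.lengthSquared();
    }

    public static void main(String[] args) {
        System.out.println(local(3.0, 4.0));    // prints 25.0
        System.out.println(escaping(3.0, 4.0)); // prints 25.0
    }
}
```

    Both methods return the same value; the only difference is whether a reference to the object remains reachable after the method returns.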

    You can read more about this in the article "Escape analysis и скаляризация: Пусть GC отдохнет" (the link leads to a talk by Ruslan cheremin at the JPoint conference, together with its text transcript). The original article by Brian Goetz, published in 2005, is "Urban performance legends, revisited"; it describes this feature as a novelty of Java 6. Escape analysis has been enabled by default since 6u23.

    To confirm this, let's run the first benchmark with Escape analysis disabled via -XX:-DoEscapeAnalysis:

    Benchmark                                Mode  Cnt   Score   Error  Units
    ComputationOnly...computeWithRawScalars  avgt   50   4,977 ± 0,122  ns/op
    ComputationOnly...computeWithVectors     avgt   50  20,335 ± 1,005  ns/op

    Full benchmark output
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: -XX:-DoEscapeAnalysis
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationOnlyBenchmark.computeWithRawScalars
    
    # Run progress: 0,00% complete, ETA 00:02:30
    # Fork: 1 of 5
    # Warmup Iteration   1: 6,116 ns/op
    # Warmup Iteration   2: 5,668 ns/op
    # Warmup Iteration   3: 5,795 ns/op
    # Warmup Iteration   4: 5,484 ns/op
    # Warmup Iteration   5: 5,185 ns/op
    Iteration   1: 4,794 ns/op
    Iteration   2: 4,919 ns/op
    Iteration   3: 5,092 ns/op
    Iteration   4: 5,231 ns/op
    Iteration   5: 5,187 ns/op
    Iteration   6: 4,912 ns/op
    Iteration   7: 4,899 ns/op
    Iteration   8: 5,254 ns/op
    Iteration   9: 5,218 ns/op
    Iteration  10: 5,095 ns/op
    
    # Run progress: 10,00% complete, ETA 00:02:19
    # Fork: 2 of 5
    # Warmup Iteration   1: 6,257 ns/op
    # Warmup Iteration   2: 5,560 ns/op
    # Warmup Iteration   3: 6,078 ns/op
    # Warmup Iteration   4: 5,831 ns/op
    # Warmup Iteration   5: 6,079 ns/op
    Iteration   1: 5,250 ns/op
    Iteration   2: 5,434 ns/op
    Iteration   3: 5,472 ns/op
    Iteration   4: 5,099 ns/op
    Iteration   5: 5,204 ns/op
    Iteration   6: 5,217 ns/op
    Iteration   7: 5,628 ns/op
    Iteration   8: 5,152 ns/op
    Iteration   9: 5,273 ns/op
    Iteration  10: 5,126 ns/op
    
    # Run progress: 20,00% complete, ETA 00:02:03
    # Fork: 3 of 5
    # Warmup Iteration   1: 5,595 ns/op
    # Warmup Iteration   2: 5,203 ns/op
    # Warmup Iteration   3: 5,247 ns/op
    # Warmup Iteration   4: 5,157 ns/op
    # Warmup Iteration   5: 5,184 ns/op
    Iteration   1: 4,924 ns/op
    Iteration   2: 4,831 ns/op
    Iteration   3: 4,816 ns/op
    Iteration   4: 4,787 ns/op
    Iteration   5: 4,843 ns/op
    Iteration   6: 4,758 ns/op
    Iteration   7: 4,788 ns/op
    Iteration   8: 4,771 ns/op
    Iteration   9: 5,051 ns/op
    Iteration  10: 4,767 ns/op
    
    # Run progress: 30,00% complete, ETA 00:01:47
    # Fork: 4 of 5
    # Warmup Iteration   1: 5,296 ns/op
    # Warmup Iteration   2: 4,822 ns/op
    # Warmup Iteration   3: 4,827 ns/op
    # Warmup Iteration   4: 4,884 ns/op
    # Warmup Iteration   5: 4,863 ns/op
    Iteration   1: 4,807 ns/op
    Iteration   2: 4,880 ns/op
    Iteration   3: 5,747 ns/op
    Iteration   4: 4,862 ns/op
    Iteration   5: 4,800 ns/op
    Iteration   6: 4,802 ns/op
    Iteration   7: 4,843 ns/op
    Iteration   8: 4,858 ns/op
    Iteration   9: 4,864 ns/op
    Iteration  10: 4,837 ns/op
    
    # Run progress: 40,00% complete, ETA 00:01:32
    # Fork: 5 of 5
    # Warmup Iteration   1: 5,158 ns/op
    # Warmup Iteration   2: 4,728 ns/op
    # Warmup Iteration   3: 4,759 ns/op
    # Warmup Iteration   4: 4,751 ns/op
    # Warmup Iteration   5: 4,753 ns/op
    Iteration   1: 4,758 ns/op
    Iteration   2: 4,793 ns/op
    Iteration   3: 4,773 ns/op
    Iteration   4: 4,755 ns/op
    Iteration   5: 4,771 ns/op
    Iteration   6: 4,759 ns/op
    Iteration   7: 4,811 ns/op
    Iteration   8: 4,763 ns/op
    Iteration   9: 4,768 ns/op
    Iteration  10: 4,822 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationOnlyBenchmark.computeWithRawScalars":
      4,977 ±(99.9%) 0,122 ns/op [Average]
      (min, avg, max) = (4,755, 4,977, 5,747), stdev = 0,247
      CI (99.9%): [4,855, 5,100] (assumes normal distribution)
    
    
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: -XX:-DoEscapeAnalysis
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationOnlyBenchmark.computeWithVectors
    
    # Run progress: 50,00% complete, ETA 00:01:16
    # Fork: 1 of 5
    # Warmup Iteration   1: 27,236 ns/op
    # Warmup Iteration   2: 19,767 ns/op
    # Warmup Iteration   3: 17,886 ns/op
    # Warmup Iteration   4: 18,233 ns/op
    # Warmup Iteration   5: 19,181 ns/op
    Iteration   1: 18,941 ns/op
    Iteration   2: 19,268 ns/op
    Iteration   3: 19,248 ns/op
    Iteration   4: 18,410 ns/op
    Iteration   5: 18,542 ns/op
    Iteration   6: 18,864 ns/op
    Iteration   7: 19,185 ns/op
    Iteration   8: 19,991 ns/op
    Iteration   9: 20,165 ns/op
    Iteration  10: 23,206 ns/op
    
    # Run progress: 60,00% complete, ETA 00:01:01
    # Fork: 2 of 5
    # Warmup Iteration   1: 25,746 ns/op
    # Warmup Iteration   2: 22,360 ns/op
    # Warmup Iteration   3: 21,799 ns/op
    # Warmup Iteration   4: 20,638 ns/op
    # Warmup Iteration   5: 19,795 ns/op
    Iteration   1: 19,828 ns/op
    Iteration   2: 19,245 ns/op
    Iteration   3: 19,671 ns/op
    Iteration   4: 19,241 ns/op
    Iteration   5: 20,262 ns/op
    Iteration   6: 23,844 ns/op
    Iteration   7: 21,305 ns/op
    Iteration   8: 19,938 ns/op
    Iteration   9: 24,120 ns/op
    Iteration  10: 22,911 ns/op
    
    # Run progress: 70,00% complete, ETA 00:00:46
    # Fork: 3 of 5
    # Warmup Iteration   1: 28,404 ns/op
    # Warmup Iteration   2: 25,285 ns/op
    # Warmup Iteration   3: 20,809 ns/op
    # Warmup Iteration   4: 20,383 ns/op
    # Warmup Iteration   5: 20,395 ns/op
    Iteration   1: 21,113 ns/op
    Iteration   2: 21,447 ns/op
    Iteration   3: 21,666 ns/op
    Iteration   4: 20,485 ns/op
    Iteration   5: 21,662 ns/op
    Iteration   6: 20,139 ns/op
    Iteration   7: 21,567 ns/op
    Iteration   8: 20,834 ns/op
    Iteration   9: 21,712 ns/op
    Iteration  10: 20,869 ns/op
    
    # Run progress: 80,00% complete, ETA 00:00:30
    # Fork: 4 of 5
    # Warmup Iteration   1: 30,493 ns/op
    # Warmup Iteration   2: 24,889 ns/op
    # Warmup Iteration   3: 21,871 ns/op
    # Warmup Iteration   4: 19,788 ns/op
    # Warmup Iteration   5: 18,893 ns/op
    Iteration   1: 18,813 ns/op
    Iteration   2: 18,546 ns/op
    Iteration   3: 19,503 ns/op
    Iteration   4: 20,699 ns/op
    Iteration   5: 27,849 ns/op
    Iteration   6: 19,529 ns/op
    Iteration   7: 26,412 ns/op
    Iteration   8: 20,032 ns/op
    Iteration   9: 19,040 ns/op
    Iteration  10: 19,013 ns/op
    
    # Run progress: 90,00% complete, ETA 00:00:15
    # Fork: 5 of 5
    # Warmup Iteration   1: 25,171 ns/op
    # Warmup Iteration   2: 23,385 ns/op
    # Warmup Iteration   3: 19,782 ns/op
    # Warmup Iteration   4: 21,491 ns/op
    # Warmup Iteration   5: 20,863 ns/op
    Iteration   1: 20,694 ns/op
    Iteration   2: 18,793 ns/op
    Iteration   3: 17,919 ns/op
    Iteration   4: 18,117 ns/op
    Iteration   5: 18,309 ns/op
    Iteration   6: 20,848 ns/op
    Iteration   7: 18,970 ns/op
    Iteration   8: 18,359 ns/op
    Iteration   9: 18,688 ns/op
    Iteration  10: 18,948 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationOnlyBenchmark.computeWithVectors":
      20,335 ±(99.9%) 1,005 ns/op [Average]
      (min, avg, max) = (17,919, 20,335, 27,849), stdev = 2,030
      CI (99.9%): [19,331, 21,340] (assumes normal distribution)
    
    
    # Run complete. Total time: 00:02:34
    
    Benchmark                                Mode  Cnt   Score   Error  Units
    ComputationOnly...computeWithRawScalars  avgt   50   4,977 ± 0,122  ns/op
    ComputationOnly...computeWithVectors     avgt   50  20,335 ± 1,005  ns/op

    Escape analysis is not an optimization in itself: it only gathers the information that Scalar Replacement then uses. To disable the optimization itself, you can use the -XX:-EliminateAllocations flag.
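
    One way to see the effect of these flags without JMH is to count the bytes the thread allocates. The sketch below is an assumption-laden probe, not part of the original benchmarks: AllocationProbe and batch() are hypothetical names, the Vector class is a copy of the one from the task, and the allocation counter relies on the HotSpot-specific com.sun.management.ThreadMXBean, which may be unavailable on other JVMs.

```java
import java.lang.management.ManagementFactory;

// Hypothetical probe, not part of the original benchmarks.
final class AllocationProbe {
    // Same shape as the Vector class from the task.
    static final class Vector {
        private final double x, y, z;

        Vector(double x, double y, double z) {
            this.x = x; this.y = y; this.z = z;
        }

        double squared() {
            return x * x + y * y + z * z;
        }

        Vector crossProduct(Vector v) {
            return new Vector(y * v.z - z * v.y, z * v.x - x * v.z, x * v.y - y * v.x);
        }
    }

    static double computeWithVectors(double x1, double y1, double z1,
                                     double x2, double y2, double z2) {
        return new Vector(x1, y1, z1).crossProduct(new Vector(x2, y2, z2)).squared();
    }

    // Bytes allocated so far by the current thread.
    // HotSpot-specific API; returns -1 if the measurement is not supported.
    static long allocatedBytes() {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        return bean.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    static double batch(int calls) {
        double sum = 0;
        for (int i = 0; i < calls; i++) {
            sum += computeWithVectors(123.4, 234.5, 345.6, 456.7, 567.8, 678.9);
        }
        return sum;
    }

    public static void main(String[] args) {
        double sink = 0;
        // Warm-up: call batch() often enough for the JIT to compile it.
        for (int i = 0; i < 20_000; i++) sink += batch(1_000);

        long before = allocatedBytes();
        for (int i = 0; i < 1_000; i++) sink += batch(1_000); // 1,000,000 measured calls
        long after = allocatedBytes();

        System.out.println("JVM flags: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
        System.out.println("bytes allocated per call: " + (after - before) / 1_000_000.0);
        System.out.println("checksum: " + sink);
    }
}
```

    Running it three times (with no flags, with -XX:-DoEscapeAnalysis and with -XX:-EliminateAllocations) should, if the reasoning above holds, show a per-call figure close to zero in the first case and noticeably above zero in the other two. The exact numbers depend on the JVM build and object layout.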

    The correct answer?

    Both algorithms show the same performance, because thanks to Escape Analysis and Scalar Replacement the Vector objects are never created.

    The hidden catch


    At the conference, Vladimir vladimirsitnikov Sitnikov and Nikita Koval discussed this task at our booth. They correctly figured out that Escape analysis is at work here. However, Vova voiced a suspicion that things might not be that simple and that the task could well have a hidden catch.

    Let's go back to the fact that a single call of our method ran into side effects of its environment. In particular, Blackhole.consume() took about as long to execute as the algorithm itself. Let's rework the benchmark to reduce this effect:

    @Fork(value = 5, warmups = 0)
    @Warmup(iterations = 5, time = 1_000, timeUnit = TimeUnit.MILLISECONDS)
    @Measurement(iterations = 10, time = 1_000, timeUnit = TimeUnit.MILLISECONDS)
    @OutputTimeUnit(value = TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @State(Scope.Benchmark)
    public class ComputationBatchBenchmark {
        private double x1, y1, z1;
        private double x2, y2, z2;
    
        @Setup(value = Level.Iteration)
        public void setup() {
            x1 = 123.4;
            y1 = 234.5;
            z1 = 345.6;
            x2 = 456.7;
            y2 = 567.8;
            z2 = 678.9;
        }
    
        @Benchmark
        @OperationsPerInvocation(10_000)
        public void computeWithRawScalars(Blackhole bh) {
            double sum = 0;
            for (int i = 0; i < 10_000; i++) {
                sum += VectorAlgebra.computeWithRawScalars(x1, y1, z1, x2, y2, z2);
            }
            bh.consume(sum);
        }
    
        @Benchmark
        @OperationsPerInvocation(10_000)
        public void computeWithVectors(Blackhole bh) {
            double sum = 0;
            for (int i = 0; i < 10_000; i++) {
                sum += VectorAlgebra.computeWithVectors(x1, y1, z1, x2, y2, z2);
            }
            bh.consume(sum);
        }
    }

    Suddenly:

    Benchmark                                 Mode  Cnt  Score   Error  Units
    ComputationBatch...computeWithRawScalars  avgt   50  0,921 ± 0,004  ns/op
    ComputationBatch...computeWithVectors     avgt   50  2,609 ± 0,029  ns/op

    Full benchmark output
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationBatchBenchmark.computeWithRawScalars
    
    # Run progress: 0,00% complete, ETA 00:02:30
    # Fork: 1 of 5
    # Warmup Iteration   1: 1,003 ns/op
    # Warmup Iteration   2: 0,936 ns/op
    # Warmup Iteration   3: 0,918 ns/op
    # Warmup Iteration   4: 0,929 ns/op
    # Warmup Iteration   5: 0,919 ns/op
    Iteration   1: 0,923 ns/op
    Iteration   2: 0,936 ns/op
    Iteration   3: 0,914 ns/op
    Iteration   4: 0,913 ns/op
    Iteration   5: 0,914 ns/op
    Iteration   6: 0,914 ns/op
    Iteration   7: 0,916 ns/op
    Iteration   8: 0,913 ns/op
    Iteration   9: 0,919 ns/op
    Iteration  10: 0,923 ns/op
    
    # Run progress: 10,00% complete, ETA 00:02:18
    # Fork: 2 of 5
    # Warmup Iteration   1: 0,941 ns/op
    # Warmup Iteration   2: 0,926 ns/op
    # Warmup Iteration   3: 0,931 ns/op
    # Warmup Iteration   4: 0,927 ns/op
    # Warmup Iteration   5: 0,927 ns/op
    Iteration   1: 0,940 ns/op
    Iteration   2: 0,918 ns/op
    Iteration   3: 0,919 ns/op
    Iteration   4: 0,915 ns/op
    Iteration   5: 0,915 ns/op
    Iteration   6: 0,914 ns/op
    Iteration   7: 0,918 ns/op
    Iteration   8: 0,912 ns/op
    Iteration   9: 0,916 ns/op
    Iteration  10: 0,918 ns/op
    
    # Run progress: 20,00% complete, ETA 00:02:02
    # Fork: 3 of 5
    # Warmup Iteration   1: 0,923 ns/op
    # Warmup Iteration   2: 0,912 ns/op
    # Warmup Iteration   3: 0,916 ns/op
    # Warmup Iteration   4: 0,930 ns/op
    # Warmup Iteration   5: 0,916 ns/op
    Iteration   1: 0,921 ns/op
    Iteration   2: 0,932 ns/op
    Iteration   3: 0,931 ns/op
    Iteration   4: 0,919 ns/op
    Iteration   5: 0,918 ns/op
    Iteration   6: 0,915 ns/op
    Iteration   7: 0,914 ns/op
    Iteration   8: 0,918 ns/op
    Iteration   9: 0,917 ns/op
    Iteration  10: 0,917 ns/op
    
    # Run progress: 30,00% complete, ETA 00:01:47
    # Fork: 4 of 5
    # Warmup Iteration   1: 0,926 ns/op
    # Warmup Iteration   2: 0,915 ns/op
    # Warmup Iteration   3: 0,912 ns/op
    # Warmup Iteration   4: 0,917 ns/op
    # Warmup Iteration   5: 0,915 ns/op
    Iteration   1: 0,917 ns/op
    Iteration   2: 0,915 ns/op
    Iteration   3: 0,915 ns/op
    Iteration   4: 0,929 ns/op
    Iteration   5: 0,939 ns/op
    Iteration   6: 0,919 ns/op
    Iteration   7: 0,919 ns/op
    Iteration   8: 0,936 ns/op
    Iteration   9: 0,929 ns/op
    Iteration  10: 0,939 ns/op
    
    # Run progress: 40,00% complete, ETA 00:01:31
    # Fork: 5 of 5
    # Warmup Iteration   1: 0,939 ns/op
    # Warmup Iteration   2: 0,928 ns/op
    # Warmup Iteration   3: 0,947 ns/op
    # Warmup Iteration   4: 0,930 ns/op
    # Warmup Iteration   5: 0,948 ns/op
    Iteration   1: 0,925 ns/op
    Iteration   2: 0,930 ns/op
    Iteration   3: 0,914 ns/op
    Iteration   4: 0,918 ns/op
    Iteration   5: 0,914 ns/op
    Iteration   6: 0,918 ns/op
    Iteration   7: 0,928 ns/op
    Iteration   8: 0,923 ns/op
    Iteration   9: 0,924 ns/op
    Iteration  10: 0,920 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationBatchBenchmark.computeWithRawScalars":
      0,921 ±(99.9%) 0,004 ns/op [Average]
      (min, avg, max) = (0,912, 0,921, 0,940), stdev = 0,008
      CI (99.9%): [0,917, 0,925] (assumes normal distribution)
    
    
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationBatchBenchmark.computeWithVectors
    
    # Run progress: 50,00% complete, ETA 00:01:16
    # Fork: 1 of 5
    # Warmup Iteration   1: 2,489 ns/op
    # Warmup Iteration   2: 2,493 ns/op
    # Warmup Iteration   3: 2,457 ns/op
    # Warmup Iteration   4: 2,688 ns/op
    # Warmup Iteration   5: 2,699 ns/op
    Iteration   1: 2,647 ns/op
    Iteration   2: 2,631 ns/op
    Iteration   3: 2,637 ns/op
    Iteration   4: 2,653 ns/op
    Iteration   5: 2,724 ns/op
    Iteration   6: 2,601 ns/op
    Iteration   7: 2,590 ns/op
    Iteration   8: 2,600 ns/op
    Iteration   9: 2,589 ns/op
    Iteration  10: 2,617 ns/op
    
    # Run progress: 60,00% complete, ETA 00:01:01
    # Fork: 2 of 5
    # Warmup Iteration   1: 2,500 ns/op
    # Warmup Iteration   2: 2,461 ns/op
    # Warmup Iteration   3: 2,478 ns/op
    # Warmup Iteration   4: 2,870 ns/op
    # Warmup Iteration   5: 2,695 ns/op
    Iteration   1: 2,576 ns/op
    Iteration   2: 2,558 ns/op
    Iteration   3: 2,594 ns/op
    Iteration   4: 2,587 ns/op
    Iteration   5: 2,654 ns/op
    Iteration   6: 2,605 ns/op
    Iteration   7: 2,631 ns/op
    Iteration   8: 2,573 ns/op
    Iteration   9: 2,574 ns/op
    Iteration  10: 2,595 ns/op
    
    # Run progress: 70,00% complete, ETA 00:00:45
    # Fork: 3 of 5
    # Warmup Iteration   1: 2,499 ns/op
    # Warmup Iteration   2: 2,463 ns/op
    # Warmup Iteration   3: 2,465 ns/op
    # Warmup Iteration   4: 2,596 ns/op
    # Warmup Iteration   5: 2,686 ns/op
    Iteration   1: 2,695 ns/op
    Iteration   2: 2,665 ns/op
    Iteration   3: 2,573 ns/op
    Iteration   4: 2,827 ns/op
    Iteration   5: 2,620 ns/op
    Iteration   6: 2,654 ns/op
    Iteration   7: 2,641 ns/op
    Iteration   8: 2,636 ns/op
    Iteration   9: 2,642 ns/op
    Iteration  10: 2,805 ns/op
    
    # Run progress: 80,00% complete, ETA 00:00:30
    # Fork: 4 of 5
    # Warmup Iteration   1: 2,710 ns/op
    # Warmup Iteration   2: 2,549 ns/op
    # Warmup Iteration   3: 2,713 ns/op
    # Warmup Iteration   4: 2,616 ns/op
    # Warmup Iteration   5: 2,566 ns/op
    Iteration   1: 2,577 ns/op
    Iteration   2: 2,569 ns/op
    Iteration   3: 2,562 ns/op
    Iteration   4: 2,563 ns/op
    Iteration   5: 2,559 ns/op
    Iteration   6: 2,570 ns/op
    Iteration   7: 2,560 ns/op
    Iteration   8: 2,558 ns/op
    Iteration   9: 2,552 ns/op
    Iteration  10: 2,580 ns/op
    
    # Run progress: 90,00% complete, ETA 00:00:15
    # Fork: 5 of 5
    # Warmup Iteration   1: 2,461 ns/op
    # Warmup Iteration   2: 2,443 ns/op
    # Warmup Iteration   3: 2,465 ns/op
    # Warmup Iteration   4: 2,558 ns/op
    # Warmup Iteration   5: 2,554 ns/op
    Iteration   1: 2,547 ns/op
    Iteration   2: 2,636 ns/op
    Iteration   3: 2,553 ns/op
    Iteration   4: 2,568 ns/op
    Iteration   5: 2,582 ns/op
    Iteration   6: 2,586 ns/op
    Iteration   7: 2,559 ns/op
    Iteration   8: 2,657 ns/op
    Iteration   9: 2,567 ns/op
    Iteration  10: 2,565 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.ComputationBatchBenchmark.computeWithVectors":
      2,609 ±(99.9%) 0,029 ns/op [Average]
      (min, avg, max) = (2,547, 2,609, 2,827), stdev = 0,059
      CI (99.9%): [2,580, 2,639] (assumes normal distribution)
    
    
    # Run complete. Total time: 00:02:33
    
    Benchmark                                 Mode  Cnt  Score   Error  Units
    ComputationBatch...computeWithRawScalars  avgt   50  0,921 ± 0,004  ns/op
    ComputationBatch...computeWithVectors     avgt   50  2,609 ± 0,029  ns/op

    An almost 3-fold difference. And that is with the objects really not being created (we saw in the benchmarks above what happens when Scalar Replacement does not kick in).

    What is going on here?


    The key to the puzzle is the final keyword on the fields of the VectorAlgebra.Vector class.

    Let's run one more benchmark that compares the results of computeWithVectors() for two classes, FinalVector and NonFinalVector, which differ only in whether the x, y, z fields are declared final:

        public final static class FinalVector {
            private final double x, y, z;
    
            public FinalVector(double x, double y, double z) {
                this.x = x; this.y = y; this.z = z;
            }
    
            public double squared() {
                return x * x + y * y + z * z;
            }
    
            public FinalVector crossProduct(FinalVector v) {
                return new FinalVector(
                        y * v.z - z * v.y,
                        z * v.x - x * v.z,
                        x * v.y - y * v.x);
            }
        }
    
        public final static class NonFinalVector {
            private double x, y, z;
    
            public NonFinalVector(double x, double y, double z) {
                this.x = x; this.y = y; this.z = z;
            }
    
            public double squared() {
                return x * x + y * y + z * z;
            }
    
            public NonFinalVector crossProduct(NonFinalVector v) {
                return new NonFinalVector(
                        y * v.z - z * v.y,
                        z * v.x - x * v.z,
                        x * v.y - y * v.x);
            }
        }

    Full benchmark code
    package ru.gnkoshelev.jbreak2018.perf_tests.vector;
    
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Level;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OperationsPerInvocation;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Warmup;
    import org.openjdk.jmh.infra.Blackhole;
    
    import java.util.concurrent.TimeUnit;
    
    /**
     * Created by kgn on 20.03.2018.
     */
    @Fork(value = 5, warmups = 0)
    @Warmup(iterations = 5, time = 1_000, timeUnit = TimeUnit.MILLISECONDS)
    @Measurement(iterations = 10, time = 1_000, timeUnit = TimeUnit.MILLISECONDS)
    @OutputTimeUnit(value = TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @State(Scope.Benchmark)
    public class FinalOrNotFinalBenchmark {
        private double x1, y1, z1;
        private double x2, y2, z2;
    
        @Setup(value = Level.Iteration)
        public void setup() {
            x1 = 123.4;
            y1 = 234.5;
            z1 = 345.6;
            x2 = 456.7;
            y2 = 567.8;
            z2 = 678.9;
        }
    
        @Benchmark
        @OperationsPerInvocation(10_000)
        public void computeWithFinalsBenchmark(Blackhole bh) {
            double sum = 0;
            for (int i = 0; i < 10_000; i++) { // must match @OperationsPerInvocation
                sum += computeWithFinals(x1, y1, z1, x2, y2, z2);
            }
            bh.consume(sum);
        }
    
        @Benchmark
        @OperationsPerInvocation(10_000)
        public void computeWithNonFinalsBenchmark(Blackhole bh) {
            double sum = 0;
            for (int i = 0; i < 10_000; i++) { // must match @OperationsPerInvocation
                sum += computeWithNonFinals(x1, y1, z1, x2, y2, z2);
            }
            bh.consume(sum);
        }
    
        public static double computeWithFinals(
                double x1, double y1, double z1,
                double x2, double y2, double z2) {
            FinalVector v1 = new FinalVector(x1, y1, z1);
            FinalVector v2 = new FinalVector(x2, y2, z2);
            return v1.crossProduct(v2).squared();
        }
    
        public static double computeWithNonFinals(
                double x1, double y1, double z1,
                double x2, double y2, double z2) {
            NonFinalVector v1 = new NonFinalVector(x1, y1, z1);
            NonFinalVector v2 = new NonFinalVector(x2, y2, z2);
            return v1.crossProduct(v2).squared();
        }
    
        public final static class FinalVector {
            private final double x, y, z;
    
            public FinalVector(double x, double y, double z) {
                this.x = x; this.y = y; this.z = z;
            }
    
            public double squared() {
                return x * x + y * y + z * z;
            }
    
            public FinalVector crossProduct(FinalVector v) {
                return new FinalVector(
                        y * v.z - z * v.y,
                        z * v.x - x * v.z,
                        x * v.y - y * v.x);
            }
        }
    
        public final static class NonFinalVector {
            private double x, y, z;
    
            public NonFinalVector(double x, double y, double z) {
                this.x = x; this.y = y; this.z = z;
            }
    
            public double squared() {
                return x * x + y * y + z * z;
            }
    
            public NonFinalVector crossProduct(NonFinalVector v) {
                return new NonFinalVector(
                        y * v.z - z * v.y,
                        z * v.x - x * v.z,
                        x * v.y - y * v.x);
            }
        }
    }

    Result:
    Benchmark                                        Mode  Cnt  Score   Error  Units
    FinalOrNotFinal...computeWithFinalsBenchmark     avgt   50  2,618 ± 0,075  ns/op
    FinalOrNotFinal...computeWithNonFinalsBenchmark  avgt   50  0,929 ± 0,005  ns/op

    Full benchmark output
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.FinalOrNotFinalBenchmark.computeWithFinalsBenchmark
    
    # Run progress: 0,00% complete, ETA 00:02:30
    # Fork: 1 of 5
    # Warmup Iteration   1: 2,461 ns/op
    # Warmup Iteration   2: 2,434 ns/op
    # Warmup Iteration   3: 2,428 ns/op
    # Warmup Iteration   4: 2,543 ns/op
    # Warmup Iteration   5: 2,546 ns/op
    Iteration   1: 2,546 ns/op
    Iteration   2: 2,545 ns/op
    Iteration   3: 2,546 ns/op
    Iteration   4: 2,545 ns/op
    Iteration   5: 2,541 ns/op
    Iteration   6: 2,543 ns/op
    Iteration   7: 2,543 ns/op
    Iteration   8: 2,673 ns/op
    Iteration   9: 2,686 ns/op
    Iteration  10: 2,637 ns/op
    
    # Run progress: 10,00% complete, ETA 00:02:18
    # Fork: 2 of 5
    # Warmup Iteration   1: 2,487 ns/op
    # Warmup Iteration   2: 2,436 ns/op
    # Warmup Iteration   3: 2,431 ns/op
    # Warmup Iteration   4: 2,574 ns/op
    # Warmup Iteration   5: 2,560 ns/op
    Iteration   1: 2,581 ns/op
    Iteration   2: 2,575 ns/op
    Iteration   3: 2,600 ns/op
    Iteration   4: 2,633 ns/op
    Iteration   5: 2,573 ns/op
    Iteration   6: 2,628 ns/op
    Iteration   7: 2,568 ns/op
    Iteration   8: 2,553 ns/op
    Iteration   9: 2,582 ns/op
    Iteration  10: 2,603 ns/op
    
    # Run progress: 20,00% complete, ETA 00:02:02
    # Fork: 3 of 5
    # Warmup Iteration   1: 2,499 ns/op
    # Warmup Iteration   2: 2,570 ns/op
    # Warmup Iteration   3: 2,564 ns/op
    # Warmup Iteration   4: 2,655 ns/op
    # Warmup Iteration   5: 2,544 ns/op
    Iteration   1: 2,537 ns/op
    Iteration   2: 2,541 ns/op
    Iteration   3: 2,543 ns/op
    Iteration   4: 2,548 ns/op
    Iteration   5: 2,547 ns/op
    Iteration   6: 2,543 ns/op
    Iteration   7: 2,540 ns/op
    Iteration   8: 2,584 ns/op
    Iteration   9: 2,590 ns/op
    Iteration  10: 2,615 ns/op
    
    # Run progress: 30,00% complete, ETA 00:01:47
    # Fork: 4 of 5
    # Warmup Iteration   1: 2,474 ns/op
    # Warmup Iteration   2: 2,524 ns/op
    # Warmup Iteration   3: 2,457 ns/op
    # Warmup Iteration   4: 2,607 ns/op
    # Warmup Iteration   5: 2,573 ns/op
    Iteration   1: 2,574 ns/op
    Iteration   2: 2,569 ns/op
    Iteration   3: 2,806 ns/op
    Iteration   4: 2,735 ns/op
    Iteration   5: 2,570 ns/op
    Iteration   6: 2,709 ns/op
    Iteration   7: 2,556 ns/op
    Iteration   8: 2,551 ns/op
    Iteration   9: 2,561 ns/op
    Iteration  10: 2,569 ns/op
    
    # Run progress: 40,00% complete, ETA 00:01:31
    # Fork: 5 of 5
    # Warmup Iteration   1: 2,464 ns/op
    # Warmup Iteration   2: 2,537 ns/op
    # Warmup Iteration   3: 2,568 ns/op
    # Warmup Iteration   4: 2,766 ns/op
    # Warmup Iteration   5: 2,607 ns/op
    Iteration   1: 2,687 ns/op
    Iteration   2: 2,573 ns/op
    Iteration   3: 2,553 ns/op
    Iteration   4: 2,527 ns/op
    Iteration   5: 2,608 ns/op
    Iteration   6: 2,550 ns/op
    Iteration   7: 2,775 ns/op
    Iteration   8: 2,570 ns/op
    Iteration   9: 3,349 ns/op
    Iteration  10: 3,218 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.FinalOrNotFinalBenchmark.computeWithFinalsBenchmark":
      2,618 ±(99.9%) 0,075 ns/op [Average]
      (min, avg, max) = (2,527, 2,618, 3,349), stdev = 0,152
      CI (99.9%): [2,543, 2,693] (assumes normal distribution)
    
    
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.vector.FinalOrNotFinalBenchmark.computeWithNonFinalsBenchmark
    
    # Run progress: 50,00% complete, ETA 00:01:16
    # Fork: 1 of 5
    # Warmup Iteration   1: 0,966 ns/op
    # Warmup Iteration   2: 0,920 ns/op
    # Warmup Iteration   3: 0,967 ns/op
    # Warmup Iteration   4: 0,910 ns/op
    # Warmup Iteration   5: 0,932 ns/op
    Iteration   1: 0,917 ns/op
    Iteration   2: 0,919 ns/op
    Iteration   3: 0,926 ns/op
    Iteration   4: 0,917 ns/op
    Iteration   5: 0,915 ns/op
    Iteration   6: 0,908 ns/op
    Iteration   7: 0,924 ns/op
    Iteration   8: 0,923 ns/op
    Iteration   9: 0,927 ns/op
    Iteration  10: 0,925 ns/op
    
    # Run progress: 60,00% complete, ETA 00:01:01
    # Fork: 2 of 5
    # Warmup Iteration   1: 0,947 ns/op
    # Warmup Iteration   2: 0,929 ns/op
    # Warmup Iteration   3: 0,945 ns/op
    # Warmup Iteration   4: 0,926 ns/op
    # Warmup Iteration   5: 0,931 ns/op
    Iteration   1: 0,924 ns/op
    Iteration   2: 0,925 ns/op
    Iteration   3: 0,925 ns/op
    Iteration   4: 0,930 ns/op
    Iteration   5: 0,929 ns/op
    Iteration   6: 0,934 ns/op
    Iteration   7: 0,927 ns/op
    Iteration   8: 0,942 ns/op
    Iteration   9: 0,947 ns/op
    Iteration  10: 0,932 ns/op
    
    # Run progress: 70,00% complete, ETA 00:00:46
    # Fork: 3 of 5
    # Warmup Iteration   1: 0,940 ns/op
    # Warmup Iteration   2: 0,930 ns/op
    # Warmup Iteration   3: 0,929 ns/op
    # Warmup Iteration   4: 0,928 ns/op
    # Warmup Iteration   5: 0,930 ns/op
    Iteration   1: 0,925 ns/op
    Iteration   2: 0,928 ns/op
    Iteration   3: 0,933 ns/op
    Iteration   4: 0,931 ns/op
    Iteration   5: 0,928 ns/op
    Iteration   6: 0,932 ns/op
    Iteration   7: 0,928 ns/op
    Iteration   8: 0,932 ns/op
    Iteration   9: 0,936 ns/op
    Iteration  10: 0,930 ns/op
    
    # Run progress: 80,00% complete, ETA 00:00:30
    # Fork: 4 of 5
    # Warmup Iteration   1: 0,944 ns/op
    # Warmup Iteration   2: 0,931 ns/op
    # Warmup Iteration   3: 0,925 ns/op
    # Warmup Iteration   4: 0,963 ns/op
    # Warmup Iteration   5: 0,929 ns/op
    Iteration   1: 0,930 ns/op
    Iteration   2: 0,926 ns/op
    Iteration   3: 0,923 ns/op
    Iteration   4: 0,929 ns/op
    Iteration   5: 0,929 ns/op
    Iteration   6: 0,933 ns/op
    Iteration   7: 0,927 ns/op
    Iteration   8: 0,931 ns/op
    Iteration   9: 0,926 ns/op
    Iteration  10: 0,934 ns/op
    
    # Run progress: 90,00% complete, ETA 00:00:15
    # Fork: 5 of 5
    # Warmup Iteration   1: 0,939 ns/op
    # Warmup Iteration   2: 0,931 ns/op
    # Warmup Iteration   3: 0,935 ns/op
    # Warmup Iteration   4: 0,928 ns/op
    # Warmup Iteration   5: 0,932 ns/op
    Iteration   1: 0,985 ns/op
    Iteration   2: 0,931 ns/op
    Iteration   3: 0,930 ns/op
    Iteration   4: 0,928 ns/op
    Iteration   5: 0,932 ns/op
    Iteration   6: 0,926 ns/op
    Iteration   7: 0,929 ns/op
    Iteration   8: 0,932 ns/op
    Iteration   9: 0,926 ns/op
    Iteration  10: 0,923 ns/op
    
    
    Result "ru.gnkoshelev.jbreak2018.perf_tests.vector.FinalOrNotFinalBenchmark.computeWithNonFinalsBenchmark":
      0,929 ±(99.9%) 0,005 ns/op [Average]
      (min, avg, max) = (0,908, 0,929, 0,985), stdev = 0,010
      CI (99.9%): [0,924, 0,934] (assumes normal distribution)
    
    
    # Run complete. Total time: 00:02:33
    
    Benchmark                                               Mode  Cnt  Score   Error  Units
    FinalOrNotFinalBenchmark.computeWithFinalsBenchmark     avgt   50  2,618 ± 0,075  ns/op
    FinalOrNotFinalBenchmark.computeWithNonFinalsBenchmark  avgt   50  0,929 ± 0,005  ns/op

    Why is that? One hypothesis is that the semantics of final fields described in the specification (JLS 17.5.1: Semantics of final Fields), specifically the freeze action, affects whether Scalar Replacement is applied and therefore the result of JIT-compiling the method.
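    One rough way to probe this hypothesis is to check whether the vector objects are really eliminated in each variant. The sketch below is not part of the original benchmarks: the class AllocationProbe and the helper allocatedBytes are hypothetical names, and it assumes a HotSpot JVM where the platform ThreadMXBean can be cast to com.sun.management.ThreadMXBean. It reports how many bytes the current thread allocated while running the loop; the numbers are only indicative and do not replace a JMH run.

```java
import java.lang.management.ManagementFactory;

public class AllocationProbe {
    // Same shape as FinalVector from the benchmark: final fields.
    static final class FinalVector {
        private final double x, y, z;
        FinalVector(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
        double squared() { return x * x + y * y + z * z; }
        FinalVector crossProduct(FinalVector v) {
            return new FinalVector(y * v.z - z * v.y, z * v.x - x * v.z, x * v.y - y * v.x);
        }
    }

    // Same shape as NonFinalVector from the benchmark: non-final fields.
    static final class NonFinalVector {
        private double x, y, z;
        NonFinalVector(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
        double squared() { return x * x + y * y + z * z; }
        NonFinalVector crossProduct(NonFinalVector v) {
            return new NonFinalVector(y * v.z - z * v.y, z * v.x - x * v.z, x * v.y - y * v.x);
        }
    }

    // Hypothetical helper: bytes allocated by the current thread during `calls` computations.
    public static long allocatedBytes(boolean useFinals, int calls) {
        // Assumption: HotSpot-specific extension of the standard ThreadMXBean.
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        double sum = 0;
        long before = bean.getThreadAllocatedBytes(id);
        for (int i = 0; i < calls; i++) {
            sum += useFinals
                    ? new FinalVector(123.4, 234.5, 345.6)
                            .crossProduct(new FinalVector(456.7, 567.8, 678.9)).squared()
                    : new NonFinalVector(123.4, 234.5, 345.6)
                            .crossProduct(new NonFinalVector(456.7, 567.8, 678.9)).squared();
        }
        long after = bean.getThreadAllocatedBytes(id);
        // Use the sum so the loop cannot be discarded as dead code.
        if (Double.isNaN(sum)) throw new IllegalStateException("unexpected NaN");
        return after - before;
    }

    public static void main(String[] args) {
        for (boolean useFinals : new boolean[] {true, false}) {
            allocatedBytes(useFinals, 1_000_000); // warm-up so the JIT can compile the loop
            long bytes = allocatedBytes(useFinals, 1_000_000);
            System.out.println((useFinals ? "final fields: " : "non-final fields: ") + bytes + " bytes");
        }
    }
}
```

    If both variants report roughly zero bytes after warm-up, allocation elimination is probably not what differs, and the generated machine code would be the next thing to compare. Running the same probe with the HotSpot flag -XX:-DoEscapeAnalysis gives a baseline in which the objects are actually allocated.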

    If anyone has a rigorous explanation (for example, with a reference to the specification) of why the JIT compiler behaves this way, please write in the comments.

    UPD. I reported the bug to Oracle, and it was confirmed: JDK-8200412.

    Conclusion


    I stumbled on the interesting result with final entirely by accident while preparing the problems. Of course, a dive this deep (or deeper) was not required for the correct answer to the problem:
    Both algorithms perform the same, because thanks to Escape Analysis and Scalar Replacement no Vector objects are created.
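    To make the expected answer concrete, here is a minimal sketch of what Scalar Replacement conceptually does to the object-based method. The class ScalarReplacementSketch and the method names are illustrative, not taken from the benchmark repository, and the second method is a hand-written approximation of the transformation, not actual JIT output.

```java
public class ScalarReplacementSketch {
    static final class Vector {
        private final double x, y, z;
        Vector(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
        double squared() { return x * x + y * y + z * z; }
        Vector crossProduct(Vector v) {
            return new Vector(y * v.z - z * v.y, z * v.x - x * v.z, x * v.y - y * v.x);
        }
    }

    // Object-based version, as in the problem statement: three Vector instances per call.
    public static double computeWithVectors(
            double x1, double y1, double z1,
            double x2, double y2, double z2) {
        Vector v1 = new Vector(x1, y1, z1);
        Vector v2 = new Vector(x2, y2, z2);
        return v1.crossProduct(v2).squared();
    }

    // After inlining, none of the Vector objects escapes the method, so each one
    // can be replaced by its fields held in locals (registers); no allocation remains.
    public static double computeScalarReplaced(
            double x1, double y1, double z1,
            double x2, double y2, double z2) {
        double x = y1 * z2 - z1 * y2; // field x of the cross-product vector
        double y = z1 * x2 - x1 * z2; // field y
        double z = x1 * y2 - y1 * x2; // field z
        return x * x + y * y + z * z;
    }

    public static void main(String[] args) {
        double a = computeWithVectors(123.4, 234.5, 345.6, 456.7, 567.8, 678.9);
        double b = computeScalarReplaced(123.4, 234.5, 345.6, 456.7, 567.8, 678.9);
        System.out.println("results equal: " + (a == b));
    }
}
```

    Both methods perform the same floating-point operations in the same order, so they return identical values; the only difference is whether temporary objects exist at all.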

    Statistics


    Of the 32 submitted answers, 4 were correct (EA/SR) and another 3 were partially correct.

    Java 9


    I was curious how the same benchmarks would behave on JRE 9 (jre 9.0.4); the results of the run are below:
    Benchmark                                        Mode  Cnt   Score   Error  Units
    ComputationOnly...computeWithRawScalars          avgt   50   5,071 ± 0,114  ns/op
    ComputationOnly...computeWithVectors             avgt   50   5,000 ± 0,106  ns/op
    CreateAndConsume...consumeDouble                 avgt   50   3,223 ± 0,147  ns/op
    CreateAndConsume...consumeObject                 avgt   50   3,222 ± 0,130  ns/op
    CreateAndConsume...createAndConsumeObject        avgt   50   4,236 ± 0,052  ns/op
    CreateAndConsume...createAndConsumeSingleVector  avgt   50   7,497 ± 0,188  ns/op
    CreateAndConsume...createAndConsumeThreeVectors  avgt   50  21,976 ± 0,677  ns/op
    CreateAndConsume...createAndConsumeTwoVectors    avgt   50  14,247 ± 0,339  ns/op

    These results are fully consistent with those obtained earlier (for jre 1.8.0_161).

    For a couple of other benchmarks, however, things turned out to be interesting:
    Benchmark                                        Mode  Cnt   Score   Error  Units
    ComputationBatch...computeWithRawScalars         avgt   50   0,924 ± 0,008  ns/op
    ComputationBatch...computeWithVectors            avgt   50   0,920 ± 0,007  ns/op
    FinalOrNotFinal...computeWithFinalsBenchmark     avgt   50   0,931 ± 0,021  ns/op
    FinalOrNotFinal...computeWithNonFinalsBenchmark  avgt   50   0,922 ± 0,006  ns/op

    The result is the same regardless of whether the fields of the class are final or not.

    P.S.


    Benchmark code on GitHub: jbreak2018-vector-perf-tests.

    UPD. Other posts in the series: Part 1, Part 2, Part 4.

    Comments: 7
      0
      Can't Java use AVX?
        +1
        On my Intel Core i7-4710MQ processor, the code shown compiles to AVX instructions such as vaddsd and vmulsd (the scalar, VEX-encoded forms).
        +1

        I think it's time to shift the emphasis from specific puzzles to describing the algorithms and optimizations implemented in the JVM. That would improve overall understanding of the problem and let readers analyze code on their own.

          +1
          The emphasis in all three parts was on studying the behavior and finding its causes.
          Perhaps you are right, and a more academic presentation would be more useful to the reader. I'll try to cover the fourth problem in a different manner, since the text itself is only partly written.
          +2
          I didn't notice that during editing I lost a chunk of markup with the most interesting result, the Vector-with-final-fields vs Vector-with-non-final-fields benchmark on JRE 1.8.0_161:
          Benchmark                                        Mode  Cnt  Score   Error  Units
          FinalOrNotFinal...computeWithFinalsBenchmark     avgt   50  2,618 ± 0,075  ns/op
          FinalOrNotFinal...computeWithNonFinalsBenchmark  avgt   50  0,929 ± 0,005  ns/op

          Besides the result being unusual in itself, it is also interesting that it does not reproduce on JRE 9.0.4.

          Thanks to sheknitrtch for spotting the error!
            0
            The fourth and final part has been published: Analysis of the JBreak performance problems (part 4).
              +1
              I filed a bug with Oracle about the strange performance difference between essentially identical code (see FinalOrNotFinalBenchmark). The bug was confirmed today: JDK-8200412.
