Rick

Rick
Rick

Wednesday, April 2, 2014

Improved Path Expressions by 3x to 5x for complex list navigation expressions.

Path expressions seem only about 1/3 to 1/6 the speed of hand coded Java which is pretty good considering.

Started here

Benchmark                                                                     Mode Thr     Count  Sec         Mean   Mean error    Units
o.b.PathExpression.employeeExtractionBenchBoon                               thrpt   8         5    1     1271.227       56.568    ops/s
o.b.PathExpression.employeeExtractionBenchJava                               thrpt   8         5    1    12588.107      342.374    ops/s

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentBoonBench      thrpt   8         5    1      102.523       10.285    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentJavaBench      thrpt   8         5    1     3195.343      131.266    ops/s


o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench    thrpt   8         5    1      119.087        6.978    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListJavaBench    thrpt   8         5    1     3176.597      295.307    ops/s

Made it to here

Benchmark                                                                     Mode Thr     Count  Sec         Mean   Mean error    Units
o.b.PathExpression.employeeExtractionBenchBoon                               thrpt   8         5    1     1432.687       43.977    ops/s
o.b.PathExpression.employeeExtractionBenchJava                               thrpt   8         5    1    12036.937       65.131    ops/s

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentBoonBench      thrpt   8         5    1      587.150       71.170    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentJavaBench      thrpt   8         5    1     3333.123      682.546    ops/s

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench    thrpt   8         5    1      927.043       65.186    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListJavaBench    thrpt   8         5    1     3279.203      282.563    ops/s
We are 5x to 10x faster than when we started. There might be some more optimization. But it is looking a lot better. :)

Notes on benchmarking.
This is more notebook than docs.
I have been benchmarking different aspect of Boon and improving them.
I did this for JSON (now faster than Jackson and GSON). I did this String parsing (now faster than JDK string parsing). And now I am looking at path expressions.
    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void employeeExtractionBenchBoon(BlackHole bh) throws Exception {
        bh.consume(employeeExtractionBoon());
    }

    private Object employeeExtractionBoon() {
         return  atIndex(departments, "this.employees");
    }

    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void employeeExtractionBenchJava(BlackHole bh) throws Exception {
        bh.consume(employeeExtractionJava());
    }

    private Object employeeExtractionJava() {

        List<Employee> employees = new ArrayList();

        for (Department department : departments) {
            employees.addAll(department.employees);
        }
        return employees;
    }


I am using JMH and as you can see from above. I am just using two ways to iterate through a list and extract employees from departments. There are only four departments so this might not be the best way to test.
Here are the initial results:
Benchmark                                          Mode Thr     Count  Sec         Mean   Mean error    Units
o.b.PathExpression.employeeExtractionBenchBoon    thrpt   8         5    1  3192132.090   164404.276    ops/s
o.b.PathExpression.employeeExtractionBenchJava    thrpt   8         5    1 54021283.153 21598841.289    ops/s
3,192,132 operations per second versus 54,021,283 operations per second.
The time difference between plain java and the Boon expression is 15x. It is 15 times faster to use plain Java for grabbing list of lists. 16.9 to be exact. Next I will start up visualvm and see where Boon is spending its time.
According to jvisualvm profiler Boon is spending most of its time in Conversions.unifyList. Then BeanUtils.getFieldValues, and thenBeanUtils.getPropertyValues. With Conversions.unifyList is in the far lead. It is a nasty bit of code to unify lists of lists into a single list.
I actually did not think it would call unifyList is this use case, but now that I think about it of course it does.
jvisualvm tells a different story. It says unifyList, splitByCharsNoneEmpty are about equal in time. The Str.splitByCharsNoneEmpty made the list, but not by much.
Let me add some more paths to the mix.
    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench(BlackHole bh) throws Exception {
        bh.consume(getAllOfTheEmployeesFirstNameFromEmployeeListBoon());
    }

    private Object getAllOfTheEmployeesFirstNameFromEmployeeListBoon() {
        return atIndex(allEmployees, "firstName");
    }


    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void getAllOfTheEmployeesFirstNameFromEmployeeListJavaBench(BlackHole bh) throws Exception {
        bh.consume(getAllOfTheEmployeesFirstNameFromEmployeeListJava());
    }

    private Object getAllOfTheEmployeesFirstNameFromEmployeeListJava() {

        List<String> employeesFirstNames = new ArrayList();

        for (Employee employee : allEmployees) {
            employeesFirstNames.add(employee.getFirstName());
        }
        return employeesFirstNames;
    }

The above is a path that should not hit unifyList.
Here is another path that will hit unifyList
    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void getAllOfTheEmployeesFirstNameFromDepartmentBoonBench(BlackHole bh) throws Exception {
        bh.consume(getAllOfTheEmployeesFirstNameFromDepartmentBoon());
    }

    private Object getAllOfTheEmployeesFirstNameFromDepartmentBoon() {
        return atIndex(departments, "employees.firstName");
    }

    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void getAllOfTheEmployeesFirstNameFromDepartmentJavaBench(BlackHole bh) throws Exception {
        bh.consume(getAllOfTheEmployeesFirstNameFromDepartmentJava());
    }

    private Object getAllOfTheEmployeesFirstNameFromDepartmentJava() {

        List<String> employeesFirstNames = new ArrayList();

        for (Department department : departments) {
            for (Employee employee : department.getEmployees()) {
                employeesFirstNames.add(employee.getFirstName());
            }
        }
        return employeesFirstNames;
    }

For this benchmark. I am using 100,000 employees (generated with mostly random names).
A test run in JMH looks like this:
$ java -jar target/microbenchmarks.jar ".*PathExpression.*" -wi 5 -i 5 -f 1 -t 8

Here are the results.
Benchmark                                                                     Mode Thr     Count  Sec         Mean   Mean error    Units
o.b.PathExpression.employeeExtractionBenchBoon                               thrpt   8         5    1  2723259.153  1145620.830    ops/s
o.b.PathExpression.employeeExtractionBenchJava                               thrpt   8         5    1 48914088.320  5119843.984    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentBoonBench      thrpt   8         5    1   903079.780    30861.050    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentJavaBench      thrpt   8         5    1 45053486.367  4804329.054    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench    thrpt   8         5    1  1324116.787    81300.600    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListJavaBench    thrpt   8         5    1 51982899.700  3694392.319    ops/s
I was expecting different results and it looks like I messed up and did not add the 100K auto-generated employees. Damn it.
Fixed. Let me try again.
Ok. Now here is actually processing 100K employees.
Benchmark                                                                     Mode Thr     Count  Sec         Mean   Mean error    Units
o.b.PathExpression.employeeExtractionBenchBoon                               thrpt   8         5    1     1271.227       56.568    ops/s
o.b.PathExpression.employeeExtractionBenchJava                               thrpt   8         5    1    12588.107      342.374    ops/s

Boon 10x slower than doing it by Java by hand.

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentBoonBench      thrpt   8         5    1      102.523       10.285    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentJavaBench      thrpt   8         5    1     3195.343      131.266    ops/s

Boon 30x slower than Java by hand (for path expressions)

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench    thrpt   8         5    1      119.087        6.978    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListJavaBench    thrpt   8         5    1     3176.597      295.307    ops/s

Boon 30x slower than doing it by hand. 
I was expecting getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench to do much better so I will start there with the profiling.
I run the test again but this time with just the method I want to profile.
java -jar target/microbenchmarks.jar ".PathExpression.*getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench." -wi 5 -i 5000 -f 1 -t 8
And a lot of iteration so I can look at jvisualvm and think.
When I do this Reflection.combineFieldMaps is taking the most time (I don't remember exactly what this does)... SplitByCharsNoneEmpty is taking the second most time. I know how to fix that one. Then we have BeanUtils.getPropertyValue. This is with the jvisualvmSampler. Now let's look at the jvisualvm profiler. It says splitbychars is where I want to look the most and Reflection.combineFieldMaps a far second.
Ok.. So I cache the results of a string split, and I fixed the combineFieldMaps problem (or at least I think I did).
java -jar target/microbenchmarks.jar ".PathExpression." -wi 5 -i 5 -f 1 -t 8
Benchmark                                                                     Mode Thr     Count  Sec         Mean   Mean error    Units
o.b.PathExpression.employeeExtractionBenchBoon                               thrpt   8         5    1     1249.670       95.282    ops/s
o.b.PathExpression.employeeExtractionBenchJava                               thrpt   8         5    1    12028.693      124.035    ops/s

10x slower still.

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentBoonBench      thrpt   8         5    1      217.363       11.890    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentJavaBench      thrpt   8         5    1     3227.960      233.379    ops/s

Now instead of 30x slower, we are at 11x so almost a 3x improvement in speed.

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench    thrpt   8         5    1      292.207       20.021    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListJavaBench    thrpt   8         5    1     3241.347      167.004    ops/s

From 30x slower to 12 x slower so like a 2.75 x improvements in speed.

2x to 3x improvement in speed is not too bad for a couple hours of performance tuning.
Lets run jvisualvm again and see what the long poles are....
$ java -jar target/microbenchmarks.jar ".*PathExpression.*getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench.*" -wi 5 -i 5000 -f 0 -t 8
# Fork: N/A, test runs in the existing VM
# Warmup: 5 iterations, 1 s each
# Measurement: 5000 iterations, 1 s each
# Threads: 8 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Running: org.boon.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench
# Warmup Iteration   1: 308.500 ops/s
# Warmup Iteration   2: 314.417 ops/s
# Warmup Iteration   3: 311.867 ops/s
# Warmup Iteration   4: 321.533 ops/s
# Warmup Iteration   5: 312.867 ops/s
Iteration   1: 314.667 ops/s
Iteration   2: 318.733 ops/s
Iteration   3: 319.383 ops/s
Iteration   4: 319.700 ops/s
Iteration   5: 264.350 ops/s
Iteration   6: 266.783 ops/s

According to jvisualvm sampler, I need to look at BeanUtils.atIndex (which is the method we are using... so...). Next up is BeanUtils.getPropertyValue, then we have unifyList (which I don't know why this is getting called per se). Also FastStringUtils.toCharArray (and I don't know why this is getting called at all). Those are the tops. Now let's look at the profiler. The profiler says look at BeanUtils.getFieldValues, getPropertyValue, Conversion.unifyList, StringScanner.isDigits (which is the one that calls FastStringUtils.toCharArray).
I optimized unifyList, and I avoid the call to FastStringUtils.toCharArray altogether as well as StringScanner.isDigits.
Benchmark                                                                     Mode Thr     Count  Sec         Mean   Mean error    Units
o.b.PathExpression.employeeExtractionBenchBoon                               thrpt   8         5    1     1479.100       90.829    ops/s
o.b.PathExpression.employeeExtractionBenchJava                               thrpt   8         5    1    11888.187      546.265    ops/s

Now it is only 8x slower instead of 10.

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentBoonBench      thrpt   8         5    1      265.120       17.130    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentJavaBench      thrpt   8         5    1     3170.723      284.468    ops/s

Looks like about a 20% improvement here.

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench    thrpt   8         5    1      323.763       20.463    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListJavaBench    thrpt   8         5    1     3229.353      143.103    ops/s

Looks to be about a 20% improvement here to.

So it is not time to crack open the vodka yet, but it is getting better.
It looks like getFieldValues is the long poll.
I did some tweaks to getFieldValues and this is what I got now...
Benchmark                                                                     Mode Thr     Count  Sec         Mean   Mean error    Units
o.b.PathExpression.employeeExtractionBenchBoon                               thrpt   8         5    1     1563.507      142.736    ops/s
o.b.PathExpression.employeeExtractionBenchJava                               thrpt   8         5    1    12033.940      270.036    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentBoonBench      thrpt   8         5    1      279.177       12.791    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentJavaBench      thrpt   8         5    1     3287.407      301.868    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench    thrpt   8         5    1      378.337       35.866    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListJavaBench    thrpt   8         5    1     3263.107      232.833    ops/s
There are some improvements but nothing to write home about yet.
Now everything points to getPropertyValue. :) Slay one dragon and Pareto has you focused on another.
getFieldValues is still around but no where near the top.
Ok so now another round of optimization.
Benchmark                                                                     Mode Thr     Count  Sec         Mean   Mean error    Units
o.b.PathExpression.employeeExtractionBenchBoon                               thrpt   8         5    1     1360.407      202.870    ops/s
o.b.PathExpression.employeeExtractionBenchJava                               thrpt   8         5    1    11996.353       54.584    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentBoonBench      thrpt   8         5    1      570.703       29.017    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentJavaBench      thrpt   8         5    1     3291.660      250.998    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench    thrpt   8         5    1      946.940       13.226    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListJavaBench    thrpt   8         5    1     3298.597      185.509    ops/s
Looks like that last round helped out a lot. We increased our throughput by 2x to 3x.

Summary

Started here

Benchmark                                                                     Mode Thr     Count  Sec         Mean   Mean error    Units
o.b.PathExpression.employeeExtractionBenchBoon                               thrpt   8         5    1     1271.227       56.568    ops/s
o.b.PathExpression.employeeExtractionBenchJava                               thrpt   8         5    1    12588.107      342.374    ops/s

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentBoonBench      thrpt   8         5    1      102.523       10.285    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentJavaBench      thrpt   8         5    1     3195.343      131.266    ops/s


o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench    thrpt   8         5    1      119.087        6.978    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListJavaBench    thrpt   8         5    1     3176.597      295.307    ops/s

Made it to here

Benchmark                                                                     Mode Thr     Count  Sec         Mean   Mean error    Units
o.b.PathExpression.employeeExtractionBenchBoon                               thrpt   8         5    1     1432.687       43.977    ops/s
o.b.PathExpression.employeeExtractionBenchJava                               thrpt   8         5    1    12036.937       65.131    ops/s

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentBoonBench      thrpt   8         5    1      587.150       71.170    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromDepartmentJavaBench      thrpt   8         5    1     3333.123      682.546    ops/s

o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListBoonBench    thrpt   8         5    1      927.043       65.186    ops/s
o.b.PathExpression.getAllOfTheEmployeesFirstNameFromEmployeeListJavaBench    thrpt   8         5    1     3279.203      282.563    ops/s
We are 5x to 10x faster than when we started. There might be some more optimization. But it is looking a lot better. :)

No comments:

Post a Comment

Kafka and Cassandra support, training for AWS EC2 Cassandra 3.0 Training