multithreading - Explicit vs intrinsic locking degradation in Java
I have two nearly identical classes that compute the next Fibonacci number. The only difference is that one of them uses intrinsic locking and the other uses explicit locking. The intrinsic locking implementation is much faster than the explicit locking one, and also faster than the STM and lock-free implementations.
| impl/threads | 1 | 2 | 4 | 8 | 16 |
| --- | --- | --- | --- | --- | --- |
| IntrinsicLocking | 4959 | 3132 | 3458 | 3059 | 3402 |
| ExplicitLocking | 4112 | 5348 | 6478 | 12895 | 13492 |
| STM | 5193 | 5210 | 4899 | 5259 | 6733 |
| LockFree | 4362 | 3601 | 3660 | 4732 | 4923 |
The table shows the average time of a single computation of the next Fibonacci number for the given number of threads. Tested on Java 8 and Java 7. The code is on GitHub: https://github.com/f0y/fibench/tree/master/src/main/java/fibonacci/mdl
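For context, the two locking variants look roughly like this (a simplified sketch rather than the exact code from the repository; only the class and interface names match the real ones, the internals are assumed):

```java
import java.math.BigInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch of the two variants; the real classes are in the repository above.
interface FibonacciGenerator {
    BigInteger next();
}

// Intrinsic locking: the monitor of "this", taken via synchronized.
class IntrinsicLocking implements FibonacciGenerator {
    private BigInteger previous = BigInteger.ZERO;
    private BigInteger current = BigInteger.ONE;

    public synchronized BigInteger next() {
        BigInteger result = current;
        current = previous.add(current);
        previous = result;
        return result;
    }
}

// Explicit locking: a ReentrantLock taken and released manually.
class ExplicitLocking implements FibonacciGenerator {
    private final Lock lock = new ReentrantLock();
    private BigInteger previous = BigInteger.ZERO;
    private BigInteger current = BigInteger.ONE;

    public BigInteger next() {
        lock.lock();
        try {
            BigInteger result = current;
            current = previous.add(current);
            previous = result;
            return result;
        } finally {
            lock.unlock();
        }
    }
}
```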
Can anyone explain why the intrinsic locking implementation wins?
That benchmark is wrong on many levels, so it does not make sense to discuss its results yet.
Here is a simple cut at a JMH benchmark:
```java
package fibonacci.bench;

import fibonacci.mdl.ExplicitLocking;
import fibonacci.mdl.FibonacciGenerator;
import fibonacci.mdl.IntrinsicLocking;
import fibonacci.mdl.LockFree;
import fibonacci.mdl.STM;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

import java.math.BigInteger;

/*
 * Implementation notes:
 *   * The benchmark does not exhibit a steady state, which means we can not do timed runs.
 *     Instead, we have to time single invocations; hence the preset benchmark mode.
 *   * Each iteration should start from a pristine state, therefore we reinitialize in @Setup(Level.Iteration).
 *   * Since we are interested in the performance beyond the first invocation, we have to call
 *     the generator several times and aggregate the time; that is why we have batchSize > 1.
 *     Note that the performance might be different depending on the given batch size.
 *   * Since we have to provide warmup, we do many iterations.
 *   * The performance differs from run to run, because we are measuring non-deterministic
 *     thread allocations; JMH covers that for us with multiple forks.
 *   * We don't want the profiles of the different FibonacciGenerators to mix up; JMH takes
 *     care of that by forking each test.
 */
@BenchmarkMode(Mode.SingleShotTime)
@Warmup(iterations = 100, batchSize = JMHBench.BATCH_SIZE)
@Measurement(iterations = 100, batchSize = JMHBench.BATCH_SIZE)
@State(Scope.Benchmark)
public class JMHBench {

    public static final int BATCH_SIZE = 50000;

    private FibonacciGenerator explicitLock;
    private IntrinsicLocking intrinsicLock;
    private LockFree lockFree;
    private STM stm;

    @Setup(Level.Iteration)
    public void setup() {
        explicitLock = new ExplicitLocking();
        intrinsicLock = new IntrinsicLocking();
        lockFree = new LockFree();
        stm = new STM();
    }

    @GenerateMicroBenchmark
    public BigInteger stm() {
        return stm.next();
    }

    @GenerateMicroBenchmark
    public BigInteger explicitLock() {
        return explicitLock.next();
    }

    @GenerateMicroBenchmark
    public BigInteger intrinsicLock() {
        return intrinsicLock.next();
    }

    @GenerateMicroBenchmark
    public BigInteger lockFree() {
        return lockFree.next();
    }
}
```
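One way to launch this with four threads is through the JMH runner API, roughly as sketched here (the JMHBenchRunner class is hypothetical and the exact Options builder methods can differ between JMH versions; running the generated JMH uber-jar from the command line works as well):

```java
package fibonacci.bench;

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Hypothetical launcher for the JMHBench benchmarks defined above.
public class JMHBenchRunner {
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(".*" + JMHBench.class.getSimpleName() + ".*") // all four benchmark methods
                .threads(4)                                            // 4 benchmark threads, as in the results below
                .build();
        new Runner(options).run();
    }
}
```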
On Linux x86_64, JDK 8 b129, and 4 threads, this yields:
```
Benchmark                      Mode   Samples       Mean   Mean error   Units
f.b.JMHBench.explicitLock        ss       100   1010.921       31.117      ms
f.b.JMHBench.intrinsicLock       ss       100   1121.355       20.386      ms
f.b.JMHBench.lockFree            ss       100   1848.635       83.700      ms
f.b.JMHBench.stm                 ss       100   1893.477       52.665      ms
```