Skip to content

Commit 68e036f

Browse files
authored
Merge pull request #5 from franz1981/ap
Add async-profiler JMH instructions
2 parents 7b0d520 + 8a1e634 commit 68e036f

File tree

7 files changed

+135
-16
lines changed

7 files changed

+135
-16
lines changed

README.md

Lines changed: 109 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,109 @@
1+
## How to run a specific JMH benchmark attaching async-profiler?
2+
3+
Install https://github.com/async-profiler/async-profiler/releases/tag/v3.0
4+
and configure it as described as per https://github.com/async-profiler/async-profiler?tab=readme-ov-file#basic-usage.
5+
6+
Build a proper JMH benchmark jar to have more options to run the benchmark.
7+
```shell
8+
$ ./gradlew jmhJar
9+
```
10+
11+
On x86 Linux, run `QueryEntityLazyInitCollectionLoop` attaching wall/cpu and alloc profiling, gc and perfnorm
12+
with 2 forks and 100 loop iterations (it's a specific benchmark parameter):
13+
```shell
14+
$ java -jar basic/target/libs/hibernate-orm-benchmark-basic-1.0-SNAPSHOT-jmh.jar QueryEntityLazyInitCollectionLoop -f 2 -prof gc -prof perfnorm -prof "async:rawCommand=alloc,wall;event=cpu;output=jfr;dir=/tmp;libPath=${ASYNC_PROFILER_HOME}/lib/libasyncProfiler.so" -pcount=100
15+
```
16+
The `rawCommand` parameter has been used to additionally set secondary events which are not yet included into the JMH async-profiler plugin
17+
i.e. wall-clock profiling and allocation profiling (using the default `alloc` event thresholds, see `JVMTI_EVENT_SAMPLED_OBJECT_ALLOC` at https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html i.e. 512KB).
18+
19+
To change the allocation threshold replace `alloc` with `alloc=128` for 128 bytes threshold.
20+
21+
Running the previous command should print something like (assuming Linux perf to be installed):
22+
```
23+
Secondary result "org.hibernate.benchmark.queryl1hit.QueryEntityLazyInitCollectionLoop.test:·async":
24+
Async profiler results:
25+
/tmp/org.hibernate.benchmark.queryl1hit.QueryEntityLazyInitCollectionLoop.test-Throughput-count-100/jfr-cpu.jfr
26+
Async profiler results:
27+
/tmp/org.hibernate.benchmark.queryl1hit.QueryEntityLazyInitCollectionLoop.test-Throughput-count-100/jfr-cpu.jfr
28+
```
29+
and finally the results:
30+
```
31+
Benchmark (count) Mode Cnt Score Error Units
32+
QueryEntityLazyInitCollectionLoop.test 100 thrpt 6 40.391 ± 1.919 ops/s
33+
QueryEntityLazyInitCollectionLoop.test:CPI 100 thrpt 2 0.303 clks/insn
34+
QueryEntityLazyInitCollectionLoop.test:IPC 100 thrpt 2 3.300 insns/clk
35+
QueryEntityLazyInitCollectionLoop.test:L1-dcache-load-misses 100 thrpt 2 5329572.054 #/op
36+
QueryEntityLazyInitCollectionLoop.test:L1-dcache-loads 100 thrpt 2 173216246.787 #/op
37+
QueryEntityLazyInitCollectionLoop.test:L1-icache-load-misses 100 thrpt 2 192632.126 #/op
38+
QueryEntityLazyInitCollectionLoop.test:L1-icache-loads 100 thrpt 2 14780212.318 #/op
39+
QueryEntityLazyInitCollectionLoop.test:branch-misses 100 thrpt 2 187771.566 #/op
40+
QueryEntityLazyInitCollectionLoop.test:branches 100 thrpt 2 92994927.715 #/op
41+
QueryEntityLazyInitCollectionLoop.test:cycles 100 thrpt 2 140706458.155 #/op
42+
QueryEntityLazyInitCollectionLoop.test:dTLB-load-misses 100 thrpt 2 38567.276 #/op
43+
QueryEntityLazyInitCollectionLoop.test:dTLB-loads 100 thrpt 2 3969019.541 #/op
44+
QueryEntityLazyInitCollectionLoop.test:iTLB-load-misses 100 thrpt 2 16704.402 #/op
45+
QueryEntityLazyInitCollectionLoop.test:iTLB-loads 100 thrpt 2 414076.369 #/op
46+
QueryEntityLazyInitCollectionLoop.test:instructions 100 thrpt 2 464215980.492 #/op
47+
QueryEntityLazyInitCollectionLoop.test:stalled-cycles-frontend 100 thrpt 2 13056272.972 #/op
48+
QueryEntityLazyInitCollectionLoop.test:·async 100 thrpt NaN ---
49+
QueryEntityLazyInitCollectionLoop.test:·gc.alloc.rate 100 thrpt 6 1155.831 ± 53.937 MB/sec
50+
QueryEntityLazyInitCollectionLoop.test:·gc.alloc.rate.norm 100 thrpt 6 30431129.790 ± 11700.499 B/op
51+
QueryEntityLazyInitCollectionLoop.test:·gc.count 100 thrpt 6 22.000 counts
52+
QueryEntityLazyInitCollectionLoop.test:·gc.time 100 thrpt 6 25.000 ms
53+
```
54+
55+
The position of the `jfr` files can be used along with the async-profiler converter, to produce the 3 types of flamegraphs we're interested in.
56+
57+
For alloc flamegraphs:
58+
```
59+
$ java -cp ${ASYNC_PROFILER_HOME}/lib/converter.jar jfr2flame --alloc --total /tmp/org.hibernate.benchmark.queryl1hit.QueryEntityLazyInitCollectionLoop.test-Throughput-count-100/jfr-cpu.jfr alloc.html
60+
```
61+
The `--total` flag is used to produce flamegraph samples based on the configred allocation threshold, aiming to show the total size of allocations.
62+
63+
Instead, for wall clock and cpu flamegraphs:
64+
```
65+
$ java -cp ~/async-profiler/lib/converter.jar jfr2flame --state default /tmp/org.hibernate.benchmark.queryl1hit.QueryEntityLazyInitCollectionLoop.test-Throughput-count-100/jfr-cpu.jfr cpu.html
66+
67+
$ java -cp ~/async-profiler/lib/converter.jar jfr2flame --state runnable,sleeping /tmp/org.hibernate.benchmark.queryl1hit.QueryEntityLazyInitCollectionLoop.test-Throughput-count-100/jfr-cpu.jfr wall.html
68+
```
69+
Why we're using `--state` here?
70+
71+
see https://github.com/async-profiler/async-profiler/issues/740#issue-1650739133 for more info: TLDR both wall and cpu events are collected
72+
using `jdk.ExecutionSample`, hence to distinguish between them, async-profiler uses the `state` field to differentiate between them.
73+
74+
To better understand this, we can type:
75+
```shell
76+
$ jfr print --events jdk.ExecutionSample /tmp/org.hibernate.benchmark.queryl1hit.QueryEntityLazyInitCollectionLoop.test-Throughput-count-100/jfr-cpu.jfr | grep -c STATE_DEFAULT
77+
302
78+
$ jfr summary /tmp/org.hibernate.benchmark.queryl1hit.QueryEntityLazyInitCollectionLoop.test-Throughput-count-100/jfr-cpu.jfr
79+
80+
Version: 2.0
81+
Chunks: 1
82+
Start: 2024-05-17 12:32:19 (UTC)
83+
Duration: 3 s
84+
85+
Event Type Count Size (bytes)
86+
=========================================================
87+
jdk.ObjectAllocationInNewTLAB 7087 141954
88+
jdk.ExecutionSample 3509 55622
89+
jdk.NativeLibrary 28 2157
90+
jdk.ActiveSetting 23 705
91+
jdk.InitialSystemProperty 17 1066
92+
jdk.CPULoad 3 63
93+
jdk.GCHeapSummary 3 123
94+
jdk.Metadata 1 6952
95+
jdk.Checkpoint 1 294850
96+
jdk.ActiveRecording 1 75
97+
jdk.OSInformation 1 108
98+
jdk.CPUInformation 1 365
99+
jdk.JVMInformation 1 348
100+
jdk.ObjectAllocationOutsideTLAB 0 0
101+
jdk.JavaMonitorEnter 0 0
102+
jdk.ThreadPark 0 0
103+
profiler.Log 0 0
104+
profiler.Window 0 0
105+
profiler.LiveObject 0 0
106+
```
107+
Which shows 2 interesting things:
108+
- as expected, the cpu-related `jdk.ExecutionSample` events are just a part of the overall ones e.g. 302 vs 3509
109+
- there are other events collected by flight-recorder while `async-profiler` agent has been attached

basic/build.gradle

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -12,9 +12,6 @@ plugins {
1212
dependencies {
1313
testRuntimeOnly 'com.h2database:h2:2.2.224'
1414

15-
testImplementation 'org.junit.jupiter:junit-jupiter-api:5.10.2'
16-
testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine'
17-
1815
testImplementation 'org.openjdk.jmh:jmh-core:1.36'
1916
testImplementation 'org.openjdk.jmh:jmh-generator-annprocess:1.36'
2017
def orm = project.findProperty("orm")

basic/src/test/java/org/hibernate/benchmark/queryl1hit/QueryEntityLazyInitCollectionBase.java

Lines changed: 10 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -23,10 +23,8 @@
2323
import org.openjdk.jmh.annotations.State;
2424
import org.openjdk.jmh.annotations.TearDown;
2525

26-
import static org.junit.jupiter.api.Assertions.assertFalse;
27-
2826
@State(Scope.Thread)
29-
public abstract class QueryEntityLazyInitCollectionBase {
27+
public class QueryEntityLazyInitCollectionBase {
3028

3129
protected EntityManagerFactory entityManagerFactory;
3230

@@ -56,7 +54,15 @@ protected void queryAuthors(EntityManager em) {
5654
authors.forEach( author -> assertFalse( author.books.isEmpty() ) );
5755
}
5856

59-
public abstract void test();
57+
private static void assertFalse(boolean empty) {
58+
if (empty) {
59+
throw new AssertionError();
60+
}
61+
}
62+
63+
public void test() {
64+
throw new UnsupportedOperationException("Not implemented");
65+
}
6066

6167
public void populateData(EntityManager entityManager) {
6268
final Book book = new Book();

basic/src/test/java/org/hibernate/benchmark/queryl1hit/QueryEntityLazyInitCollectionLoop.java

Lines changed: 12 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,34 +1,42 @@
11
package org.hibernate.benchmark.queryl1hit;
22

33
import java.io.IOException;
4-
import java.util.List;
54

65
import jakarta.persistence.EntityManager;
76
import jakarta.persistence.FlushModeType;
87
import org.openjdk.jmh.Main;
98
import org.openjdk.jmh.annotations.Benchmark;
9+
import org.openjdk.jmh.annotations.Fork;
10+
import org.openjdk.jmh.annotations.Measurement;
11+
import org.openjdk.jmh.annotations.Param;
1012
import org.openjdk.jmh.annotations.Scope;
1113
import org.openjdk.jmh.annotations.State;
14+
import org.openjdk.jmh.annotations.Warmup;
1215
import org.openjdk.jmh.runner.Runner;
1316
import org.openjdk.jmh.runner.RunnerException;
1417
import org.openjdk.jmh.runner.options.Options;
1518
import org.openjdk.jmh.runner.options.OptionsBuilder;
1619
import org.openjdk.jmh.runner.options.TimeValue;
1720

18-
import static org.junit.jupiter.api.Assertions.assertFalse;
19-
2021
/**
2122
* Query an entity and initialize a lazy collection in a loop to hit the L1 cache with repeatedly.
2223
*/
2324
@State(Scope.Thread)
25+
@Fork(2)
26+
@Warmup(iterations = 3, time = 1)
27+
@Measurement(iterations = 3, time = 1)
2428
public class QueryEntityLazyInitCollectionLoop extends QueryEntityLazyInitCollectionBase {
2529

30+
@Param({ "100000" })
31+
public int count;
32+
33+
2634
@Benchmark
2735
public void test() {
2836
final EntityManager em = entityManagerFactory.createEntityManager();
2937
em.setFlushMode( FlushModeType.COMMIT );
3038
em.getTransaction().begin();
31-
for ( int i = 0; i < 100_000; i++ ) {
39+
for ( int i = 0; i < count; i++ ) {
3240
queryAuthors( em );
3341
}
3442
em.getTransaction().commit();

basic/src/test/java/org/hibernate/benchmark/queryl1hit/QueryEntityLazyInitCollectionSingle.java

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -10,9 +10,6 @@
1010
import org.openjdk.jmh.runner.options.TimeValue;
1111

1212
import java.io.IOException;
13-
import java.util.List;
14-
15-
import static org.junit.jupiter.api.Assertions.assertFalse;
1613

1714
/**
1815
* Query an entity and initialize a lazy collection.

basic/src/test/java/org/hibernate/benchmark/queryl1hit/QueryEntityQueryReferringEntityBase.java

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@
2222
import org.openjdk.jmh.annotations.State;
2323
import org.openjdk.jmh.annotations.TearDown;
2424
@State(Scope.Thread)
25-
public abstract class QueryEntityQueryReferringEntityBase {
25+
public class QueryEntityQueryReferringEntityBase {
2626

2727
static final int NUMBER_OF_ENTITIES = 1000;
2828

@@ -49,7 +49,9 @@ public void destroy() {
4949
entityManagerFactory.close();
5050
}
5151

52-
public abstract void perf();
52+
public void perf() {
53+
throw new UnsupportedOperationException("Not implemented");
54+
}
5355

5456
public void queryEmployees(EntityManager entityManager, boolean includeQueryAccess) {
5557
TypedQuery<Employee> query = entityManager.createQuery( "select e From Employee e", Employee.class );

gradlew

100644100755
File mode changed.

0 commit comments

Comments
 (0)