This repository was archived by the owner on Dec 15, 2025. It is now read-only.
Sometimes you may need to clean up the data inside ZooKeeper. First stop the server, then run `rm -rf /path/to/zookeeper/datadir` to clean the data directory. The directory is defined in your config file.
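The cleanup above can be sketched as a small script. The paths here are throwaway stand-ins for a real installation (a real deployment would use its own `ZK_HOME` and `conf/zoo.cfg`):

```shell
# Stand-in ZooKeeper layout for illustration -- replace with your real install.
ZK_HOME=/tmp/zookeeper-demo
mkdir -p "$ZK_HOME/conf" "$ZK_HOME/data"
echo "dataDir=$ZK_HOME/data" > "$ZK_HOME/conf/zoo.cfg"

# 1. Stop the server first (real command, commented out here):
# "$ZK_HOME/bin/zkServer.sh" stop

# 2. Read dataDir from the config file, then wipe it.
DATA_DIR=$(grep '^dataDir=' "$ZK_HOME/conf/zoo.cfg" | cut -d= -f2)
rm -rf "$DATA_DIR"
echo "cleaned $DATA_DIR"
```

Reading `dataDir` out of the config file avoids hard-coding the path twice and keeps the script in sync with whatever the server actually uses.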
3. Kafka setup
When configuring Kafka and the topic count, we need to ensure that disk I/O won't become a bottleneck. It is suggested to start several brokers on each Kafka node and to configure several disks for each broker. Different brokers on the same node may share disks, but each should have its own directory on the disk. Our topic partition count is 16 per Kafka node: if the Kafka cluster contains only 1 node, we create topics with 16 partitions; for an environment with 3 Kafka nodes, we create topics with 48 partitions.
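The sizing rule above (16 partitions per Kafka node) is simple arithmetic. The topic-creation command is shown commented out since it needs a live cluster, and `zk_host` and `your_topic` are placeholders:

```shell
# 16 partitions per Kafka node, per the sizing rule above.
KAFKA_NODES=3
PARTITIONS=$((KAFKA_NODES * 16))
echo "$PARTITIONS"

# With a live cluster, the topic would then be created with Kafka's own tool:
# bin/kafka-topics.sh --create --zookeeper zk_host:2181 \
#     --topic your_topic --partitions "$PARTITIONS" --replication-factor 1
```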
Same as with ZooKeeper, you may need to clean up old data located on the disks of the Kafka brokers. Just run `rm -rf <all_data_path>` on all your Kafka nodes and data directories.
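A hedged sketch of that cleanup across nodes. The host names and data paths below are placeholders, and the `ssh` calls are only echoed so nothing is actually deleted:

```shell
# Placeholder broker hosts and data directories -- substitute your own.
KAFKA_NODES="kafka1 kafka2 kafka3"
DATA_PATHS="/data1/kafka-logs /data2/kafka-logs"

for node in $KAFKA_NODES; do
  for path in $DATA_PATHS; do
    # Drop the leading 'echo' to actually run the removal over ssh.
    echo ssh "$node" "rm -rf $path"
  done
done
```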
4. Spark setup
All Spark Streaming related parameters can be defined in `conf/99-user_defined_properties.conf`.
Spark Streaming can be deployed in YARN mode or in standalone mode. For YARN mode, just set `hibench.spark.master` to `yarn-client`. For standalone mode, set it to `spark://spark_master_ip:port` and run `sbin/start-master.sh` in your Spark home.
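In `conf/99-user_defined_properties.conf` the two modes would look like this (the host is a placeholder; 7077 is only Spark's default standalone master port, your setup may differ):

```
# YARN mode:
hibench.spark.master    yarn-client

# Standalone mode (after running sbin/start-master.sh on the master):
# hibench.spark.master  spark://spark_master_ip:7077
```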
5. Storm setup
The conf file is `conf/storm.yaml`. Basically we configure the following params:
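As a rough illustration, a minimal `conf/storm.yaml` sketch might look like the following. The key names are standard Storm options, but the hosts and ports are placeholders and your cluster's required settings may differ:

```yaml
# Placeholder hosts/ports -- adjust to your cluster.
storm.zookeeper.servers:
  - "zk_host1"
  - "zk_host2"
nimbus.host: "nimbus_host"
supervisor.slots.ports:
  - 6700
  - 6701
```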
Run `bin/storm nimbus` to start nimbus and `bin/storm ui` to set up the Storm UI.
Run `bin/storm supervisor` to start the Storm supervisors.
6. HiBench setup
Same as [step.2 in previous section](#hibenchconf).
Note: for Spark Streaming in receiver mode (Spark version >= 1.4), the first run will always fail. You'll need to wait a few more minutes and run `prepare/zkUtils.sh` to ensure the topic has been created, then re-run the workload. For Spark version == 1.3, it'll be OK.
7. View the report:
Same as [step.4 in previous section](#viewreport).
However, the streaming benchmarks are very different from the non-streaming workloads: streaming workloads collect throughput and latency endlessly, print them directly to the terminal, and log them to `report/<workload>/<language APIs>/bench.log`.
8. Stop the streaming workloads:
For SparkStreaming, pressing `ctrl+c` will stop the workloads. For Storm & Trident, you'll need to execute `storm/bin/stop.sh` to stop the workloads. For Samza, currently you'll have to kill all applications in YARN manually, or restart the YARN cluster directly.
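For the Samza case, killing every YARN application can be scripted. This is a hypothetical convenience loop, not part of HiBench; it assumes the Hadoop `yarn` CLI is on PATH and points at your cluster (on a machine without it, the loop simply finds nothing to kill):

```shell
# List running YARN applications and kill each one (for Samza workloads).
# Harmless where the 'yarn' CLI is absent: the list is then empty.
for app in $(yarn application -list 2>/dev/null | awk '/application_/ {print $1}'); do
  yarn application -kill "$app"
done
echo "done"
```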