Open
24 commits
4b378cb
Added e2e workflow
GlassOfWhiskey Mar 11, 2025
4513ec0
Use C++ vector instead of C array
bebora Mar 7, 2025
8642464
Add weights to diagonal terms
bebora Mar 20, 2025
99bbdff
Use annealer with floating point support
bebora Mar 24, 2025
69a0e4c
Compute Silhouette score from a separate executable
bebora Mar 24, 2025
e712242
Added score-based loop
GlassOfWhiskey Mar 28, 2025
a91c87f
Use pseudo-random parameters for clustering algorithms
bebora Mar 25, 2025
e5f2bb4
Added E4 scripts
GlassOfWhiskey Mar 31, 2025
f5e0c57
Fix overflow when using larger datasets (more than 2^16 points)
bebora Apr 8, 2025
33bd6dc
Handle edge cases in Silhouette score computation
bebora Apr 8, 2025
bba227b
Allow https url for SimulatedAnnealing submodule
bebora Apr 8, 2025
f30d92c
Generate medium size dataset
bebora Apr 9, 2025
87bfbe9
Update workflow inputs
bebora Apr 9, 2025
ebfacfb
Update default threshold
bebora Apr 10, 2025
2a58727
Launch and measure serial jobs using a bash script
bebora Apr 10, 2025
cd05777
Compute aggregate metrics for workflow approach
bebora Apr 11, 2025
eb7346b
Profile single workflow with annealing sleep
bebora Apr 12, 2025
848c475
Add option to run workflow with already compiled executables
bebora Apr 12, 2025
c7a354e
Coarse metrics for dual workflow jobs
bebora Apr 12, 2025
6d60e7d
Launch dual workflow runs
bebora Apr 12, 2025
6e5f265
Compute finer workflow metrics
bebora Apr 13, 2025
6e869fd
Update .gitignore
bebora Apr 13, 2025
ca7803d
Reduce resource requirements for compilation and Silhouette score com…
bebora Jul 13, 2025
4dca67d
Equalise compilation and execution behaviour between E4 and CINECA cl…
bebora Jul 30, 2025
5 changes: 5 additions & 0 deletions .gitignore
@@ -55,3 +55,8 @@ Thumbs.db

# Build directory
build/

# StreamFlow
.streamflow
report.html
output.txt
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "SimulatedAnnealing"]
	path = SimulatedAnnealing
	url = git@github.com:E4-Computer-Engineering/SimulatedAnnealing.git
55 changes: 53 additions & 2 deletions README.md
@@ -3,24 +3,33 @@
This work is based upon the approach from [A clustering aggregation algorithm on neutral-atoms and annealing quantum processors](https://arxiv.org/pdf/2412.07558).

## How to run

Make sure to have a working MPI installation available. Its include path should either be added to `$INCLUDE` or `$MPI_INC`.

The code can be compiled using `make`. The newly built executable will be placed in the `build/bin` directory.

The code can be run as follows:

```bash
mpirun -n 3 build/bin/clustering data/input/cluster_points_article.csv
```

You can optionally add another argument to save the output matrix to a file:

```bash
mpirun -n 3 build/bin/clustering data/input/cluster_points_article.csv example_output.txt
```

You can add one more optional argument to save the indices of points that comprise each cluster:

```bash
mpirun -n 3 build/bin/clustering data/input/cluster_points_article.csv example_output.txt cluster_indices.txt
```

### Expected output

Running the clustering executable will create an overlap matrix in the following form:

```
-1 8 8 8 0 0 0 0
0 -1 0 8 0 0 0 0
@@ -31,33 +40,75 @@ Running the clustering executable will create an overlap matrix in the following
0 0 0 0 0 0 -1 8
0 0 0 0 0 0 0 -1
```

Each row and column represents a candidate cluster. The diagonal terms are equal to -1; the off-diagonal terms are either 0 or a positive integer $\lambda$. A positive value marks an overlap between the two corresponding clusters. The value of $\lambda$ is set to the number of candidate clusters (8 in this case), so that selecting overlapping clusters is penalized.
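
The matrix layout described above can be illustrated with a minimal Python sketch. It is not the repository's C++ implementation: the function names `overlap_matrix` and `energy` are hypothetical, and it assumes that two clusters "overlap" when they share at least one point and that a selection is scored as the sum of the matrix entries over the selected rows and columns.

```python
def overlap_matrix(clusters):
    """Build an upper-triangular overlap matrix from lists of point indices."""
    n = len(clusters)
    lam = n  # penalty: the number of candidate clusters
    sets = [set(c) for c in clusters]
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = -1  # reward for selecting cluster i
        for j in range(i + 1, n):
            if sets[i] & sets[j]:  # assumed overlap rule: a shared point
                matrix[i][j] = lam
    return matrix


def energy(matrix, selected):
    """Sum the matrix entries over every pair of selected clusters."""
    return sum(matrix[i][j] for i in selected for j in selected)


# Hypothetical input: clusters 0 and 2 share point 3.
clusters = [[0, 1, 3], [2, 5], [3, 4]]
Q = overlap_matrix(clusters)
for row in Q:
    print(" ".join(str(v) for v in row))

print(energy(Q, {0, 1}))  # disjoint clusters
print(energy(Q, {0, 2}))  # overlapping clusters
```

Under these assumptions, a selection of disjoint clusters has a lower (better) score than one containing an overlapping pair, because a single penalty $\lambda$ outweighs the reward from the diagonal terms.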

If you choose to also save the points of each cluster, they will be in this form:

```
0,1,3,4
2,5,7
6,8,9
```

Each line corresponds to a different cluster. Each of its comma-separated values corresponds to a point from the original input file.
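
As a rough illustration of this file format, the following Python sketch parses it into a list of clusters. The helper name `read_cluster_indices` is hypothetical, and the in-memory stream stands in for a file such as `cluster_indices.txt`.

```python
import io


def read_cluster_indices(stream):
    """Parse one cluster per line; each line holds comma-separated point indices."""
    clusters = []
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            clusters.append([int(value) for value in line.split(",")])
    return clusters


# In-memory stand-in for the indices file written by the clustering executable.
example = io.StringIO("0,1,3,4\n2,5,7\n6,8,9\n")
clusters = read_cluster_indices(example)
print(clusters)
```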

## Workflow run

It is also possible to run the whole Classical-Quantum pipeline (clustering + simulated annealing) as a workflow using the [StreamFlow](https://streamflow.di.unito.it) WMS. To do that, you need to clone this repository and all the included submodules, as follows:

```bash
git clone --recurse-submodules git@github.com:E4-Computer-Engineering/clustering-mis.git
```

The StreamFlow WMS requires Python 3.9 or newer. It can easily be installed as a Python package using the following commands:

```bash
python -m venv venv
source venv/bin/activate
pip install "streamflow[report]==0.2.0.dev12"
```

The workflow configuration is expressed in a declarative `streamflow.yml` file. An [example](workflow/streamflow.yml) targeting the [CINECA@Leonardo](https://leonardo-supercomputer.cineca.eu/) HPC facility is included in this repository. Modify it by adding your credentials (`username` and `sshKey`) and a path to a working directory in a shared portion of the Leonardo filesystem (e.g., in your `$HOME` folder).
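
Before launching, it can help to confirm that no template placeholders remain in the configuration. The following is a minimal Python sketch and not part of the repository: the helper name `find_placeholders` and the inline sample are illustrative assumptions, and it only looks for `<...>` tokens like those used in the example file.

```python
import re

# Matches template placeholders such as <username> or </path/to/ssh/key>.
PLACEHOLDER = re.compile(r"<[^<>\s]+>")


def find_placeholders(text):
    """Return every <...> placeholder still present in the given text."""
    return PLACEHOLDER.findall(text)


# Inline sample mirroring the placeholder fields of the example configuration.
sample = """\
sshKey: </path/to/ssh/key>
username: <username>
workdir: </path/to/shared/workdir>
"""

print(find_placeholders(sample))
```

To check the real file, pass the contents of `workflow/streamflow.yml` instead of `sample`; an empty list means every placeholder has been filled in.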

At this point, simply run the workflow using this command:

```bash
streamflow run --name smart-hpc-qc workflow/streamflow.yml
```

When the workflow completes successfully, you should find an `output.txt` file containing the results of the simulated annealing phase. In addition, the following command generates a report of the workflow run:

```bash
streamflow report --file workflow/streamflow.yml smart-hpc-qc
```

## TODO

- [ ] Add brief description with images

## Suggested dev setup

It is recommended to use [VS Code](https://code.visualstudio.com/).

### Linting and autocompletion

IntelliSense from the Microsoft-provided C++ and Makefile extensions reports errors even if the code compiles.
It is recommended to use the [clangd extension](https://marketplace.visualstudio.com/items?itemName=llvm-vs-code-extensions.vscode-clangd) instead.

Install the clangd extension and allow it to disable IntelliSense. Install the clangd language server if prompted.

Then install [bear](https://github.com/rizsotto/Bear) and, from the project root directory, run the following:

```bash
make clean; bear -- make
```

This will create a `compile_commands.json` file that is used by clangd to correctly inspect code.
Run "clangd: Restart language server" from the Command Palette (Ctrl+Shift+P) to read the newly created file.

You need to rerun the commands above and restart the language server after each Makefile change.
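
To verify that the regenerated `compile_commands.json` covers the expected sources, a short Python sketch like the one below may be useful. It assumes the standard JSON Compilation Database layout (a list of entries with a `file` key); the sample entry, including the compiler name and flags, is hypothetical.

```python
import json

# Hypothetical sample in the JSON Compilation Database format.
sample = """
[
  {
    "directory": "/path/to/project",
    "file": "clustering.cpp",
    "arguments": ["mpicxx", "-Iinclude", "-c", "clustering.cpp"]
  }
]
"""


def list_sources(database_text):
    """Return the source file recorded in each compilation database entry."""
    return [entry["file"] for entry in json.loads(database_text)]


print(list_sources(sample))
```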

### Formatting

The clangd extension from the previous section can also format C/C++ code. Invoke it from Command Palette -> Format Document.
1 change: 1 addition & 0 deletions SimulatedAnnealing
Submodule SimulatedAnnealing added at 61f0d7
22 changes: 22 additions & 0 deletions workflow/cwl/clt/annealing.cwl
@@ -0,0 +1,22 @@
cwlVersion: v1.2
class: CommandLineTool
requirements:
  ToolTimeLimit:
    timelimit: 300
arguments:
  - position: 3
    valueFrom: output.txt
inputs:
  annealing:
    type: File
    inputBinding:
      position: 1
  qubo:
    type: File
    inputBinding:
      position: 2
outputs:
  output:
    type: File
    outputBinding:
      glob: output.txt
18 changes: 18 additions & 0 deletions workflow/cwl/clt/build.cwl
@@ -0,0 +1,18 @@
cwlVersion: v1.2
class: CommandLineTool
requirements:
  InitialWorkDirRequirement:
    listing: $(inputs.src)
baseCommand: [make]
inputs:
  src:
    type:
      type: array
      items: [File, Directory]
  output_path: string
outputs:
  output:
    type: File
    outputBinding:
      glob: $(inputs.output_path)

34 changes: 34 additions & 0 deletions workflow/cwl/clt/clustering.cwl
@@ -0,0 +1,34 @@
cwlVersion: v1.2
class: CommandLineTool
requirements:
  ToolTimeLimit:
    timelimit: 300
baseCommand: [mpirun, --bind-to, core:overload-allowed]
arguments:
  - position: 4
    valueFrom: output.txt
  - position: 5
    valueFrom: indices.txt
inputs:
  clustering:
    type: File
    inputBinding:
      position: 2
  points:
    type: File
    inputBinding:
      position: 3
  processes:
    type: int
    inputBinding:
      position: 1
      prefix: -n
outputs:
  indices:
    type: File
    outputBinding:
      glob: indices.txt
  output:
    type: File
    outputBinding:
      glob: output.txt
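
For reviewers less familiar with CWL, the command line this tool produces can be approximated with a small Python sketch. It assumes the usual CWL rule that `baseCommand` comes first and that arguments and input bindings follow in ascending `position` order. The file paths are placeholders, since a CWL runner stages inputs at paths of its own choosing.

```python
import shlex

base_command = ["mpirun", "--bind-to", "core:overload-allowed"]

# (position, tokens) pairs taken from the `arguments` and `inputs` sections.
bindings = [
    (4, ["output.txt"]),
    (5, ["indices.txt"]),
    (2, ["build/bin/clustering"]),        # placeholder path for `clustering`
    (3, ["cluster_points_article.csv"]),  # placeholder path for `points`
    (1, ["-n", "3"]),                     # `processes` with its `-n` prefix
]

# Sort by position, then flatten the tokens after the base command.
argv = list(base_command)
for _, tokens in sorted(bindings, key=lambda binding: binding[0]):
    argv.extend(tokens)

print(shlex.join(argv))
```

The resulting order matches the three-argument invocation documented in the README: process count, executable, input points, output matrix, and cluster indices.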
19 changes: 19 additions & 0 deletions workflow/cwl/config.yml
@@ -0,0 +1,19 @@
annealing_src:
  - class: File
    path: ../../SimulatedAnnealing/Makefile
  - class: Directory
    path: ../../SimulatedAnnealing/src
clustering_src:
  - class: File
    path: ../../Makefile
  - class: File
    path: ../../clustering.cpp
  - class: File
    path: ../../points.cpp
  - class: Directory
    path: ../../include
  - class: Directory
    path: ../../vendor
points:
  class: File
  path: ../../data/input/cluster_points_article.csv
51 changes: 51 additions & 0 deletions workflow/cwl/main.cwl
@@ -0,0 +1,51 @@
cwlVersion: v1.2
class: Workflow
requirements:
  StepInputExpressionRequirement: {}
  ToolTimeLimit:
    timelimit: 300
inputs:
  annealing_src:
    type:
      type: array
      items: [File, Directory]
  clustering_src:
    type:
      type: array
      items: [File, Directory]
  points: File
  processes:
    type: int
    default: 3
outputs:
  annealing:
    type: File
    outputSource: annealing/output
steps:
  build-clustering:
    run: clt/build.cwl
    in:
      src: clustering_src
      output_path:
        valueFrom: build/bin/clustering
    out: [output]
  clustering:
    run: clt/clustering.cwl
    in:
      clustering: build-clustering/output
      points: points
      processes: processes
    out: [indices, output]
  build-annealing:
    run: clt/build.cwl
    in:
      src: annealing_src
      output_path:
        valueFrom: build/bin/simAnnSingle.out
    out: [output]
  annealing:
    run: clt/annealing.cwl
    in:
      annealing: build-annealing/output
      qubo: clustering/output
    out: [output]
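
The data dependencies between the four steps can be summarized with a short Python sketch. The dependency map below is read off the `in:` sections of the workflow; the use of `graphlib` is only an illustration of one valid execution order and does not reflect how StreamFlow schedules jobs.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose outputs it consumes.
deps = {
    "build-clustering": set(),
    "build-annealing": set(),
    "clustering": {"build-clustering"},
    "annealing": {"build-annealing", "clustering"},
}

# static_order() yields the steps so that dependencies always come first.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

The two build steps do not depend on each other, so a workflow engine is free to run them concurrently; `annealing` can start only after both `build-annealing` and `clustering` have finished.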
5 changes: 5 additions & 0 deletions workflow/slurm/leonardo.sh
@@ -0,0 +1,5 @@
#!/bin/bash

module load openmpi/4.1.6--gcc--12.2.0

{{streamflow_command}}
55 changes: 55 additions & 0 deletions workflow/streamflow.yml
@@ -0,0 +1,55 @@
version: v1.0
workflows:
  smart-hpc-qc:
    type: cwl
    config:
      file: cwl/main.cwl
      settings: cwl/config.yml
    bindings:
      - step: /build-clustering
        target:
          deployment: leonardo
          service: dcgp
      - step: /clustering
        target:
          deployment: leonardo
          service: dcgp
      - step: /build-annealing
        target:
          deployment: leonardo
          service: booster
      - step: /annealing
        target:
          deployment: leonardo
          service: booster
database:
  type: sqlite
  config:
    connection: .streamflow/sqlite.db
deployments:
  leonardo-ssh:
    type: ssh
    config:
      nodes:
        - hostname: login.leonardo.cineca.it
      checkHostKey: false
      sshKey: </path/to/ssh/key>
      username: <username>
    workdir: </path/to/shared/workdir>
  leonardo:
    type: slurm
    config:
      maxConcurrentJobs: 2
      services:
        booster:
          account: IscrC_SHPC-QC
          file: slurm/leonardo.sh
          partition: boost_usr_prod
        dcgp:
          account: IscrC_SHPC-QC_0
          file: slurm/leonardo.sh
          gres: tmpfs:1g
          partition: dcgp_usr_prod
          ntasks: 3
    wraps: leonardo-ssh