modules/nixos: add comin #559

Merged
merged 2 commits into from
Mar 22, 2024

zowoq
Contributor

@zowoq zowoq commented May 5, 2023

Polling the git repo looks much nicer than autoUpgrade that we're currently using.

I'd like to trial it on build01 but need to address a couple of things first.

@nlewo
Member

nlewo commented May 5, 2023

That could still be a bit early since I haven't made any release yet, so I still feel free to break interfaces ;)

But do not hesitate to create issues with features you would like to see.

@Mic92
Member

Mic92 commented May 7, 2023

How does it make deployment failures visible? Does it make the service fail? In my experience, many service upgrade failures can be mitigated by just adding one retry of nixos-rebuild, so it would be cool if it did that.
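
A minimal sketch of the "one retry" idea, for illustration only. It is not how comin or autoUpgrade behaves; the function name `run_with_retry` and the injectable `runner` parameter are assumptions introduced here so the logic can be exercised without actually running nixos-rebuild.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run_with_retry(
    command: Sequence[str],
    retries: int = 1,
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> int:
    """Run `command`; on a non-zero exit code, retry up to `retries` more times.

    `runner` is a hypothetical hook that executes the command and returns its
    exit code. By default it shells out with subprocess.run.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(list(cmd)).returncode

    exit_code = runner(command)
    attempts_left = retries
    while exit_code != 0 and attempts_left > 0:
        attempts_left -= 1
        exit_code = runner(command)
    # The exit code of the last attempt is what the caller sees.
    return exit_code
```

With the default runner, `run_with_retry(["nixos-rebuild", "switch"])` would run the rebuild once and repeat it a single time if the first attempt exits non-zero.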

@nlewo
Member

nlewo commented May 7, 2023

How does it make deployment failures visible?

I'm planning to provide Prometheus metrics. Currently, comin reports its status over HTTP, and when the head_commit_deployed attribute is false, it means an issue occurred during the deployment.

Does it make the service fail?

No (it is a daemon).

In my experience, many service upgrade failures can be mitigated by just adding one retry of nixos-rebuild, so it would be cool if it did that.

Noted (it should not be hard to add this option).
I had also planned to address this kind of need by adding a force=true parameter to the HTTP route /deploy (currently used by webhooks).
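
As a rough illustration of the status check described above, a monitoring script could treat `head_commit_deployed` being false as a failed deployment. This is a sketch under assumptions: the attribute is assumed to sit at the top level of the status document, and `deployment_failed` is a hypothetical helper, not part of comin.

```python
import json


def deployment_failed(status_json: str) -> bool:
    """Return True only when the status reports head_commit_deployed as false.

    Assumption: the attribute is a top-level boolean in the status JSON.
    A missing attribute is treated as "unknown", not as a failure.
    """
    status = json.loads(status_json)
    return status.get("head_commit_deployed") is False
```

Treating a missing attribute as unknown is a deliberate choice here, since the status output was still unreleased and its shape could change.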

@zowoq
Contributor Author

zowoq commented Aug 1, 2023

We talked about deployments a bit more recently, and it looks like we'll switch to cachix deploy, as they are going to add telemetry and have support for nix-darwin.

Sorry for the noise here @nlewo.

@zowoq zowoq closed this Aug 1, 2023
@zowoq zowoq deleted the comin branch August 1, 2023 23:04
@nlewo
Member

nlewo commented Aug 13, 2023

@zowoq no worries and thank you for your explanations.
(Moreover, comin is definitely not ready to be used by others. Sometimes I "forget" to publish my software... but for the first time, I published it too early :/)

@zowoq zowoq restored the comin branch March 7, 2024 01:28
@zowoq
Contributor Author

zowoq commented Mar 7, 2024

The cachix deploy telemetry didn't end up working out so I'd like to try this now that it has had a release.

We would need to use HTTP for the status, but that doesn't look hard to add to our monitoring.

{
  "Generation": {
    "BuildEndedAt": "0001-01-01T00:00:00Z",
    "BuildErr": null,
    "BuildStartedAt": "0001-01-01T00:00:00Z",
    "DrvPath": "",
    "EvalEndedAt": "0001-01-01T00:00:00Z",
    "EvalErr": null,
    "EvalMachineId": "",
    "EvalStartedAt": "0001-01-01T00:00:00Z",
    "OutPath": "",
    "RepositoryStatus": {
      "Error": null,
      "main_branch_name": "",
      "main_commit_id": "",
      "main_remote_name": "",
      "remotes": null,
      "selected_branch_is_testing": false,
      "selected_branch_name": "",
      "selected_commit_id": "",
      "selected_commit_msg": "",
      "selected_remote_name": ""
    },
    "Status": 0
  },
  "deployment": {
    "end_at": "0001-01-01T00:00:00Z",
    "error_msg": "",
    "generation": {
      "BuildEndedAt": "0001-01-01T00:00:00Z",
      "BuildErr": null,
      "BuildStartedAt": "0001-01-01T00:00:00Z",
      "DrvPath": "",
      "EvalEndedAt": "0001-01-01T00:00:00Z",
      "EvalErr": null,
      "EvalMachineId": "",
      "EvalStartedAt": "0001-01-01T00:00:00Z",
      "OutPath": "",
      "RepositoryStatus": {
        "Error": null,
        "main_branch_name": "",
        "main_commit_id": "",
        "main_remote_name": "",
        "remotes": null,
        "selected_branch_is_testing": false,
        "selected_branch_name": "",
        "selected_commit_id": "",
        "selected_commit_msg": "",
        "selected_remote_name": ""
      },
      "Status": 0
    },
    "operation": "",
    "restart_comin": false,
    "start_at": "0001-01-01T00:00:00Z",
    "status": 0
  },
  "hostname": "build01",
  "is_fetching": false,
  "is_running": false,
  "repository_status": {
    "Error": {},
    "main_branch_name": "",
    "main_commit_id": "",
    "main_remote_name": "",
    "remotes": [
      {
        "fetched": true,
        "fetched_at": "2024-03-07T00:57:43.502544262Z",
        "main": {
          "error_msg": "The branch 'origin/fail' doesn't exist",
          "name": "fail"
        },
        "name": "origin",
        "testing": {},
        "url": "https://github.com/nix-community/infra.git"
      }
    ],
    "selected_branch_is_testing": false,
    "selected_branch_name": "",
    "selected_commit_id": "",
    "selected_commit_msg": "",
    "selected_remote_name": ""
  }
}
time="2024-03-07T04:48:38Z" level=info msg="The manager is started"
time="2024-03-07T04:48:38Z" level=info msg="  hostname = build01"
time="2024-03-07T04:48:38Z" level=info msg="  machineId = 2a7f74e3b4f6409eb031a1e606fff1bc"
time="2024-03-07T04:48:38Z" level=info msg="Starting the poller for the remote 'origin' with period 300s"
time="2024-03-07T04:48:38Z" level=info msg="  repositoryPath = /var/lib/comin/repository"
time="2024-03-07T04:48:38Z" level=info msg="Starting the webhook server on 127.0.0.1:4242"
time="2024-03-07T04:48:39Z" level=info msg="New commits have been fetched from 'https://github.com/nix-community/infra.git'"
time="2024-03-07T04:48:39Z" level=info msg="Running 'nix --extra-experimental-features nix-command --extra-experimental-features flakes show-derivation /var/lib/comin/repository#nixosConfigurations.build01.config.system.build.toplevel -L'"
warning: 'show-derivation' is a deprecated alias for 'derivation show'
Pass '--accept-flake-config' to trust it
warning: ignoring untrusted flake configuration setting 'extra-trusted-public-keys'.
Pass '--accept-flake-config' to trust it
time="2024-03-07T04:48:50Z" level=info msg="The derivation path is /nix/store/7xd89npijmy73y286zzkw25qd379g7p7-nixos-system-build01-24.05.20240306.edf9f14.drv"
time="2024-03-07T04:48:50Z" level=info msg="The output path is /nix/store/zjvs12b8bc7x2hvmcq9rzmpdalr67yb0-nixos-system-build01-24.05.20240306.edf9f14"
time="2024-03-07T04:48:50Z" level=info msg="Running 'nix --extra-experimental-features nix-command --extra-experimental-features flakes eval /var/lib/comin/repository#nixosConfigurations.build01.config.services.comin.machineId --json'"
Pass '--accept-flake-config' to trust it
warning: ignoring untrusted flake configuration setting 'extra-trusted-public-keys'.
Pass '--accept-flake-config' to trust it
time="2024-03-07T04:48:51Z" level=info msg="Evaluation error: The defined comin.machineId (2a7f74e3b4f6409eb031a1e606fff1bc) is different to the machine id () of this machine"
time="2024-03-07T04:49:16Z" level=info msg="Getting status request /status from 127.0.0.1:40056"
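
For monitoring, the status output above could be reduced to a list of per-remote branch errors. The sketch below relies only on the fields visible in that JSON (`repository_status.remotes[].main.error_msg` and the matching `testing` object); the function name `remote_branch_errors` is hypothetical, and since the status format is described as unstable, the field names may change.

```python
import json
from typing import List


def remote_branch_errors(status_json: str) -> List[str]:
    """Collect error messages reported for the main/testing branch of each remote."""
    status = json.loads(status_json)
    # "remotes" can be null in the status output, hence the `or` fallbacks.
    repository_status = status.get("repository_status") or {}
    errors = []
    for remote in repository_status.get("remotes") or []:
        for kind in ("main", "testing"):
            branch = remote.get(kind) or {}
            message = branch.get("error_msg")
            if message:
                errors.append(f"{remote.get('name', '?')}/{kind}: {message}")
    return errors
```

Applied to the status shown above, this would yield a single entry for the `origin` remote's main branch, carrying the "doesn't exist" message.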

@zowoq zowoq reopened this Mar 7, 2024
@zowoq zowoq force-pushed the comin branch 2 times, most recently from d65dfc2 to 61d95bb Compare March 7, 2024 04:46
@nlewo
Member

nlewo commented Mar 11, 2024

time="2024-03-07T04:48:51Z" level=info msg="Evaluation error: The defined comin.machineId (2a7f74e3b4f6409eb031a1e606fff1bc) is different to the machine id () of this machine"

This is an issue in comin which should be fixed by nlewo/comin#14.

We would need to use HTTP for the status, but that doesn't look hard to add to our monitoring.

Which metrics would you like to collect?
In case you use Prometheus, I created nlewo/comin#9 to collect a list of useful metrics. Maybe that would fit your needs better than the unstable status output.

@zowoq
Contributor Author

zowoq commented Mar 11, 2024

This is an issue in comin which should be fixed by ...

Thank you!

Which metrics would you like to collect?

I think a status/exit code of the most recent deployment and the commit of the last successful deployment would be enough for us.

In case you use Prometheus ...

Yes, we're using Prometheus, so metrics in that format would be excellent.
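
To make the request concrete, here is a sketch of how those two values might look in the Prometheus text exposition format. The metric names (`comin_last_deployment_status`, `comin_last_successful_deployment_info`) and label names are invented for illustration; they are not taken from nlewo/comin#9 or from any comin release.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(hostname: str, last_status: int, last_success_commit: str) -> str:
    """Render two hypothetical gauges in the Prometheus text exposition format."""
    host = escape_label_value(hostname)
    commit = escape_label_value(last_success_commit)
    lines = [
        "# HELP comin_last_deployment_status Status code of the most recent deployment.",
        "# TYPE comin_last_deployment_status gauge",
        f'comin_last_deployment_status{{hostname="{host}"}} {last_status}',
        "# HELP comin_last_successful_deployment_info Commit of the last successful deployment.",
        "# TYPE comin_last_successful_deployment_info gauge",
        # "Info"-style metric: the commit is carried in a label and the value is 1.
        f'comin_last_successful_deployment_info{{hostname="{host}",commit_id="{commit}"}} 1',
    ]
    return "\n".join(lines) + "\n"
```

Carrying the commit in a label with a constant value of 1 is a common pattern for exposing string-valued facts, since Prometheus sample values must be numeric.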

@zowoq zowoq changed the title from "comin" to "modules/nixos: add comin" Mar 16, 2024
@zowoq
Contributor Author

zowoq commented Mar 22, 2024

I'll test this on build01 for a bit, which also gives me a live machine to work on while I figure out the HTTP status monitoring.

Follow up in #1166.

@zowoq zowoq marked this pull request as ready for review March 22, 2024 00:59
@zowoq zowoq added this pull request to the merge queue Mar 22, 2024
Merged via the queue into master with commit 748591f Mar 22, 2024
36 checks passed
@zowoq zowoq deleted the comin branch March 22, 2024 01:03