-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[APP-7364] [APP-7366] [RSDK-9684] [APP-7154] Add more logging, reduce monitoring loop time, misc small fixes. #56
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code scan ✅
Have some questions which are mainly aimed at me getting up to speed! I will manually test these fixes later today.
@@ -37,7 +37,7 @@ <h2>Smart Machine Setup</h2> | |||
<div class="form-group"> | |||
<label for="network">Network</label> | |||
{{if eq (len .VisibleSSIDs) 0}} | |||
<input type="text" name="ssid" placeholder="Enter Wifi SSID" id="network" required> | |||
<input type="text" name="ssid" placeholder="Enter Wifi SSID" id="network" required autocorrect="off" autocapitalize="off"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context these are "on" if omitted and that was not allowing you to connect with a SSID that was lowercased? Nice find if so 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Micheal Lee found/suggested this fix. But yeah, I guess some phones/browsers see an empty text field, and if you type a password that's a normal "word" it auto-capitalizes it, which isn't ideal for passwords.
We don't use a "password" type field here, because I wanted this to be visible before submitting. Otherwise it's way too easy to accidentally typo and then it takes several minutes for the device to try out the thing, fail, and restart a hotspot.
if err := os.MkdirAll(filepath.Dir(serviceFilePath), 0o755); err != nil { | ||
return errw.Wrapf(err, "creating directory %s", filepath.Dir(serviceFilePath)) | ||
} | ||
// use this later to avoid re-enabling an existing agent service a user might have disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to confirm my understanding...there are users who have Viam agent installed but who don't want it to automatically start up on boot, so they've "disabled" auto start. When they upgrade their Viam agent, the configuration file where they've "disabled" the auto start gets overwritten such that startup on boot becomes true...and that's what you're fixing in this PR.
Copied the following snippet from the ticket - is this the final product (i.e. the user now has to manually restart Viam agent after an upgrade)?
Jan 07 16:11:09 pluto viam-agent[911]: 2025-01-07T22:11:09.496Z INFO viam-agent agent/subsystem.go:355 viam-agent updated from 0.11.0 to 0.12.0
Jan 07 16:11:10 pluto viam-agent[911]: 2025-01-07T22:11:09.515Z INFO viam-agent viamagent/viamagent.go:131 writing systemd service file to /usr/local/lib/systemd/system/viam-agent.service
Jan 07 16:11:10 pluto viam-agent[911]: 2025-01-07T22:11:09.515Z INFO viam-agent viamagent/viamagent.go:146 enabling systemd viam-agent service
Jan 07 16:11:10 pluto viam-agent[911]: 2025-01-07T22:11:10.091Z INFO viam-agent viamagent/viamagent.go:168 Install complete. Please (re)start the service with 'systemctl restart viam-agent' when ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if someone installed viam-agent, then runs systemctl disable viam-agent
it won't start at boot. The problem is that, previously, when we did an upgrade, this same code got called, and it ALWAYS enabled the service again. Now it should only enable it on first/new install.
The snippet is the old behavior, where it shows "enabling systemd viam-agent service" was the problem. That should only happen on new installs now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working on getting manual testing setup. Lmk if I'm moving in right direction. Using an RPi5 because agent runs on linux (and I use Mac). Should I be:
- pulling a config from prod (without actually starting the viam-server)
- check out your branch and run
make
- run the binary that I've created which includes the changes in your PR
- validate the viam-agent starts at reboot of the RPi5
- run
systemctl disable viam-agent
- upgrade the
viam-agent
on the RPi5 by doing what exactly? - validate that the
viam-agent
service isn't running on 2nd reboot
Please let me know gaps here. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validated by disabling the agent and upgrading from a stable version
{
"agent": {
"viam-agent": {
"pin_version": "0.12.0"
},
"viam-server": {
"release_channel": "stable",
"attributes": {}
}
}
}
to a pinned URL:
{
"agent": {
"viam-agent": {
"pin_url": "file:///home/jeep/dev/agent/bin/viam-agent-custom-aarch64"
},
"viam-server": {
"release_channel": "stable",
"attributes": {}
}
}
}
After the upgrade and subsequent reboot, I validated the viam-agent
system service wasn't running and was still disabled:
○ viam-agent.service - Viam Services Agent
Loaded: loaded (/usr/local/lib/systemd/system/viam-agent.service; disabled; preset: enabled)
Active: inactive (dead)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After re-enable and reboot the viam-agent starts up on its own 👍
● viam-agent.service - Viam Services Agent
Loaded: loaded (/usr/local/lib/systemd/system/viam-agent.service; enabled; preset: enabled)
Active: active (running) since Fri 2025-01-10 13:56:50 EST; 1min 22s ago
Main PID: 758 (viam-agent)
Tasks: 10 (limit: 9247)
CPU: 4.953s
CGroup: /system.slice/viam-agent.service
└─758 /opt/viam/bin/viam-agent --config /etc/viam.json
@@ -24,7 +24,7 @@ import ( | |||
) | |||
|
|||
const ( | |||
minimalCheckInterval = time.Second * 60 | |||
minimalCheckInterval = time.Second * 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the monitoring loop checking for on a super high level? Is this monitoring for FTDC data in the viam-server? What other things are "monitored" on a regular basis via the agent ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how often it fetches a new config from the cloud. The cloud itself can say "check again after X time") but that's currently not implemented, so it falls to this.
The main loop basically: check for new config > apply changes/updates (if there) > start subsystems that should be started > check health of subsystems > repeat
So the check interval determins (roughly) the overall timing. At 5 seconds, it may have trouble keeping up if there are slow responses or other work to do, but that should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
} | ||
// use this later to avoid re-enabling an existing agent service a user might have disabled | ||
_, err = os.Stat(serviceFilePath) | ||
newInstall := err != nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth checking for a file does not exist error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This covers edge cases like file corruption too. If we can't stat, then it's best to treat it as a new install, IMHO. But a good thought! In many other cases it'd be worth differentiating.
@Otterverse - working through the cases you need tested - I'm not familiar with the wifi setup flow - how should I be testing
And does the following mean adding an unknown field to the cloud config? Otherwise, how would it be able to re-fetch its config if I change one of its values? Or is it cached on the robot at startup?
|
After installing and using the custom version, you can delete the wifi network with
You can always get the config from the link in App for the machine. Click the status bar "offline" notification to make it drop down and give details. There's a "machine credentials" button there which gives you the contents of /etc/viam.json. And to corrupt that file, I meant just make it invalid... open it up with a text editor like nano or whatever, and just delete some curly brackets or something to make it invalid json. Ping me on slack if you have any questions. Thanks again for the work here! |
Quick PR for easy fixes.
Needs testing for the linked Jira issues