Replies: 2 comments
-
Something caught my eye after setting up a small project just now. I wanted to run /execute-task and noticed that I only had the /execute-tasks command this time. My main project still has two command options, so I asked the AI to investigate, and it claimed there should only be a single execute-tasks command. Clearly something went wrong during the setup process, then. Maybe this has been causing some bigger issues in general. After all the modifications I've made, I can't really check whether it would make a difference, at least not without a lot of work. In any case, my other points still stand.
-
Apologies for re-opening this discussion, but I thought my recent findings could be useful to someone else. I had just about accepted that I had probably done something stupid and was to blame for the issues with the execute-task(s) commands and some other oddities, when it happened again. This time I was seeing test output mixed in with the agent instructions, which was baffling to say the least. I ended up writing a Python script that clears all message history from .claude.json, and I haven't experienced any issues since. With that said, I would recommend updating the current Agent OS instructions to use XML tags, chain-of-thought prompting, and clear personas. The execution loops are more reliable once that is done. I don't know how active this repo is currently, but updating the syntax is easy: just point Claude Code at https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts and some of the pages linked there. I hope it helps someone.
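For reference, the script was roughly along these lines (a minimal sketch, assuming the file sits at ~/.claude.json and keeps per-project history under a top-level "projects" key; adjust to whatever your file actually looks like):

```python
#!/usr/bin/env python3
"""Clear stored message history from ~/.claude.json (keeps a backup)."""
import json
import shutil
from pathlib import Path

CONFIG = Path.home() / ".claude.json"  # assumed location of the config file


def clear_history() -> None:
    # Keep a backup so the change is easy to undo.
    backup = CONFIG.with_name(CONFIG.name + ".bak")
    shutil.copy2(CONFIG, backup)

    data = json.loads(CONFIG.read_text())

    # Assumption: per-project history is a list stored under
    # data["projects"][<project path>]["history"]. Adjust the key names
    # if your file is laid out differently.
    for project in data.get("projects", {}).values():
        if isinstance(project, dict) and "history" in project:
            project["history"] = []

    CONFIG.write_text(json.dumps(data, indent=2))


if __name__ == "__main__":
    clear_history()
```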
-
First, thank you for creating Agent OS. I've implemented it into an already existing project and thought I would share my experience with it so far.
To be completely honest, it's been a bit of a struggle, mainly because the "flow" doesn't quite seem to fit my use case. But the spec system is nice enough that I decided to persist and improve the setup a bit.
Mainly, the system seems too rigid when you're still experimenting in some areas. It also seems to make a few too many assumptions in general, for example running tests that don't make sense for a particular task. But the main problem for me was that I felt like I was losing control of which steps it should be taking. I'll try to go into a bit more detail.
The initial setup of planning the product was an absolute slog for me. I understand the benefits of having a clear vision of your product, but it needs to allow for some more uncertainty. I don't remember the entire process, but things like success criteria, primary users, or describing "The Problem" may not be important to a project, at least not to mine. I was uncertain whether these were absolute requirements and what would happen if I wasn't following the instructions exactly, so I went ahead and described everything as well as I could. This took forever, and to add to the problem, I was not able to reject a document change without it wanting to start the whole process from scratch, and doing so meant it would skip obvious steps. This first step nearly made me abort implementing Agent OS entirely.
I also immediately noticed that the "Already implemented" text it generated was not even close to reality, despite my having been very clear about the current state of everything. If there is any uncertainty about implementation status, maybe it could ask for clarification? Or maybe it could be a step-by-step process with more user interaction.
Once I had the initial setup done I was able to create a spec. It took a few tries to get it right, but this process felt easy compared to what came before. I love the idea of creating a spec like this, and it's something I would adopt even if I wasn't using Agent OS.
I honestly don't remember how my initial /create-task command went, and I don't recall any glaring issues there. Though I do remember being a bit confused by some of the tasks it would add, especially the tests: it didn't detail what the tests would actually do, at least not very well.
The real problems started when I executed the first task. First I made the mistake of accidentally running /execute-tasks instead of /execute-task. I think these two commands should have a much clearer distinction. But I let it run since I was curious about the tests it created and I considered this first task execution a bit of a test anyways.
In the spec I had been very clear about what was already implemented, what was working and what was not. Despite this, it would breeze through everything: it barely made any changes to the existing code, it didn't check whether any utilities were already in place, it didn't ask for any validation of the test results, and it assumed no manual testing was needed and marked all tasks as complete. FWIW, I had spent around 6 hours just documenting, so I'm certain I had not missed anything. It was basically a runaway train.
I've made some changes to make it fit my workflow better. I know you mention customizing Agent OS to fit each user's own needs, but considering I've spent two days making it work for me, maybe some of that customization could be automated. I found it difficult to know where to start; even the AI would get confused by the instructions.
Starting with the plan-product command, maybe being able to specify which features of Agent OS we want to take advantage of would help, especially for smaller projects or already existing projects where the workflow is already defined to a degree.
Here are some of the modifications I ended up making:
Task creation: I modified the generated tasks to include manual verification requirements. Most of the changes I'm currently making involve UI/UX improvements that need visual confirmation, so I added (verify: manual) to tasks that require human testing rather than automated tests. This helped a lot with the post-execution flow.
Post-execution flow: The 8-step post-execution sequence would take 5+ minutes even for small changes. It would run automated tests even after I had manually verified that everything was working, and it would generate multiple redundant summaries. I removed the notification sound entirely: if I'm not at the computer I won't hear it anyway, and if I'm at the computer I don't need it to begin with. I also deleted the redundant completion summary (git commits already capture that info well enough for me), modified the test logic so it skips automated testing when manual verification is specified and complete, and made it stop re-verifying tasks.md, which it would do even when I had said it was already done.
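To give a concrete idea of the skip logic (a rough sketch, not actual Agent OS code; the tasks.md checkbox format and the exact marker handling are my own assumptions):

```python
import re
from pathlib import Path

# Matches the "(verify: manual)" marker I add to tasks that need human testing.
VERIFY_MANUAL = re.compile(r"\(verify:\s*manual\)", re.IGNORECASE)


def needs_automated_tests(task_line: str) -> bool:
    """A task marked (verify: manual) and already checked off skips automated tests."""
    manually_verified = bool(VERIFY_MANUAL.search(task_line)) and "[x]" in task_line.lower()
    return not manually_verified


def tasks_still_needing_tests(tasks_md: str = "tasks.md") -> list[str]:
    # Assumes markdown checkboxes, e.g. "- [x] Align the settings button (verify: manual)"
    lines = Path(tasks_md).read_text().splitlines()
    return [line for line in lines
            if line.lstrip().startswith("- [") and needs_automated_tests(line)]
```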
Recap generation: It completely ignored template constraints and generated marketing-style documentation. I'd specify "1 paragraph + bullet points" and get back 163 lines claiming my system was "comprehensive and sophisticated" when it was barely halfway done. I had to add strict constraints to prevent that kind of language and to limit recap documents to 50 lines maximum.
Utility discovery: It wouldn't discover existing utilities I had already built, so it would try to create duplicate functionality instead of using what was there. I had to update my README.md to reflect the current codebase and add an automatic step to keep it current when new utilities are added.
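As a rough illustration of that "keep it current" step (paths and layout are just an example, assuming utilities live in a utils/ folder):

```python
from pathlib import Path


def utilities_missing_from_readme(utils_dir: str = "utils",
                                  readme: str = "README.md") -> list[str]:
    """List utility modules that README.md does not mention yet."""
    readme_text = Path(readme).read_text()
    return [module.stem
            for module in sorted(Path(utils_dir).glob("*.py"))
            if module.stem != "__init__" and module.stem not in readme_text]


if __name__ == "__main__":
    for name in utilities_missing_from_readme():
        print(f"README.md does not mention utility: {name}")
```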
These changes have reduced the overhead from 5-10 minutes to under a minute and at least so far I haven't found any issues with them.
I think the core issue for me is that Agent OS seems designed for planned, team-based development where you know exactly what you're building upfront. For experimental solo work where the scope might change, it imho needs more flexibility.
Agent OS also seems to assume a bit too much knowledge about how to properly write a spec; it could do with some more documentation. The same goes for planning a product. Knowing the importance of each step would help.
With that said, I like where I'm at now after my modifications, but the process of getting there took far too long.
I hope you can use this feedback. It's very possible that I've just completely missed the point, but hey, that's a kind of feedback as well right?
Oh and please don't get me wrong, I think Agent OS could be a real game changer, it's just not quite there for me yet, at least not without some modifications.
Cheers!