Multi-region StackSet deployment fails #85
Comments
OK, let me help you out here, because I just went through this particular nightmare myself (still working on it, actually). You are the victim of a bad README and even worse naming conventions.

First off, the datadog-forwarder StackSet does NOT deploy the Datadog Forwarder. It deploys a Lambda that runs once to create a Datadog integration with that particular AWS account. You cannot usefully deploy it to multiple regions (well, you can, but there's no reason to, and it's a major hassle; I did find a way and wish I hadn't). The StackSet exists to create the Datadog integration via that Lambda. It runs once, and if you already have an integration set up for a particular AWS account and run it again, whether from a different region or by upgrading to a new version of the StackSet template, it will fail with a 409. Despite what the README says, you can only deploy it in one region per account. Full stop. This is a discrepancy between the README and the web docs for this thing.

What you want is to deploy the StackSet once, which creates an integration in each account you deploy it to, and then deploy the datadog-forwarder stack (see what I mean about bad naming conventions?) to each account and each region where you need it. There is no StackSet to deploy the actual Forwarder, as far as I can see, so you have to do it manually in each account. Follow the directions here: https://docs.datadoghq.com/logs/guide/forwarder/?tab=cloudformation

If you do try to deploy the StackSet to multiple regions, or you try to update the template (when we first deployed it, it was using a Python 3.8 runtime, which is being deprecated, hence the start of this whole nightmare for me: I needed to upgrade all our Lambdas to a newer runtime), it will fail with a 409 because the integration already exists for that account, and Datadog will then delete the integration for that account (see the mention of this issue from June 24). Deleting the integration when a StackSet deploy fails is hugely bad design: you'll lose all your clickops-based configuration for that integration, and if you didn't make note of it, you'll have to reconstruct what you had configured from scratch.

So, yeah. Bad design, bad docs, bad naming conventions. Hope this helps.
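For what it's worth, since there's no StackSet for the Forwarder itself, a rough sketch of scripting the per-region stack creation with boto3 is below. The template URL and parameter names (`DdApiKey`, `DdSite`) are taken from the forwarder guide linked above; double-check them against the current template, and in practice you'd pass a Secrets Manager ARN rather than a literal API key.

```python
import boto3

# Regions where the log forwarder is needed (example values).
REGIONS = ["us-east-1", "eu-west-1"]

# Template URL from the Datadog forwarder guide; verify before use.
TEMPLATE_URL = (
    "https://datadoghq-cloudformation-templates.s3.amazonaws.com"
    "/aws/forwarder/latest.yaml"
)

for region in REGIONS:
    cfn = boto3.client("cloudformation", region_name=region)
    cfn.create_stack(
        StackName="datadog-forwarder",
        TemplateURL=TEMPLATE_URL,
        # The forwarder template creates IAM resources and uses transforms,
        # so these capabilities are required.
        Capabilities=[
            "CAPABILITY_IAM",
            "CAPABILITY_NAMED_IAM",
            "CAPABILITY_AUTO_EXPAND",
        ],
        Parameters=[
            {"ParameterKey": "DdApiKey", "ParameterValue": "<datadog-api-key>"},
            {"ParameterKey": "DdSite", "ParameterValue": "datadoghq.com"},
        ],
    )
    print(f"Forwarder stack creation started in {region}")
```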
Expected Behavior
As stated in the README, one can choose multiple regions for a StackSet:
"Select which regions in which you’d like to deploy the integration. Note that you can modify regions to monitor from the Datadog AWS configuration page after deploying the stack."
Actual Behavior
Deployment fails for all stacks except one because the `DatadogAPICall` resource fails with a 409 Conflict error. The root cause is that all stacks try to create an AWS integration for the same account through the Datadog API, and only one succeeds. I haven't found a feasible way to fix this so far: the Datadog API does not appear to let you retrieve the External ID of an existing integration, only reset it, and a reset would still cause failures for the other parallel stacks.

Steps to Reproduce the Problem
Specifications
Stacktrace