
Commit cf4eef5

committed Oct 30, 2018
Squash private work and make public on github.
commit 7b28c785af6a2b14ae1d4ef01a03fc47aecab687 Author: Rob Brackett <rob@robbrackett.com> Date: Tue Oct 30 09:10:58 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-db#422 commit c9dd8a1e87bff17ce2ebbf0a83f716079c55aa39 Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Tue Oct 30 09:38:53 2018 -0500 add link to components guide in README.md. commit 7264026217867661ffd25aafc79048f48e646fbb Merge: 8219358 0f887d5 Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Tue Oct 30 09:29:44 2018 -0500 Merge pull request #12 from danielballan/components-docs Add documentation for getting up and running. commit 0f887d5d915de752658d16d006051cc00211ddbc Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Tue Oct 30 09:23:40 2018 -0500 Added information about services to components.md. Local services config should now be copied from Keybase. commit 82193585757d50216b04ba268ac7bd2bec5310e0 Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Tue Oct 30 09:17:38 2018 -0500 correct typo in README.md commit 1e00e21d2b5e969ecab5b9c8ad4826297881bd89 Merge: 076db08 217f7f0 Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Tue Oct 30 09:15:01 2018 -0500 Merge pull request #16 from danielballan/go-public drop sensitive info so we can squash and go public. commit 076db08401cd7bcd9056de1ec717e2f57b42fb67 Author: Rob Brackett <rob@robbrackett.com> Date: Mon Oct 29 16:12:25 2018 -0700 Roll API + Import services to get new secrets I updated the passwords for the auto annotation bot since we keep going back and forth about what's right here. commit 217f7f0dd6307e6946bcb6ff95bc9e589a67cfbd Author: Jason Sherman <jsn.sherman@gmail.com> Date: Mon Oct 29 15:44:26 2018 -0500 fix eol issue that makes the diff of go-public garbage. 
commit 9f10460e519e8e6ba110787a2aa7911bfdbeafcc Merge: a0acde2 c3b0904 Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Mon Oct 29 15:34:32 2018 -0500 Merge pull request #14 from danielballan/incident-2018-10-10-differ-crashed-whole-cluster Incident: 2018-10-10 - Differ locked up cluster commit a0acde2bf8ebd90d139b553b2efdaab9182d0c38 Author: Jason Sherman <jsn.sherman@gmail.com> Date: Mon Oct 29 13:56:17 2018 -0500 use INCREMENTAL_UPDATE to deploy api pods with latest secrets. Add note about this to README.md. commit ebd97551223a192e8702fd404a1cfe10b6e2bfef Author: Rob Brackett <rob@robbrackett.com> Date: Sun Oct 28 14:19:28 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-ui#311, edgi-govdata-archiving/web-monitoring-ui#313 commit a2bba65fe2944ea99d0869bb2dba568f9dd63c41 Author: Rob Brackett <rob@robbrackett.com> Date: Tue Oct 23 23:06:46 2018 -0700 Release three updates to -db edgi-govdata-archiving/web-monitoring-db#412, edgi-govdata-archiving/web-monitoring-db#419, edgi-govdata-archiving/web-monitoring-db#421 commit a459a7a5bab8137811fe0d69f3f3a06fc6ee99ad Author: Rob Brackett <rob@robbrackett.com> Date: Tue Oct 23 08:58:54 2018 -0700 Update CACHE_DATE_DIFFER for diff color change commit c3b09040e87fafc1aff713f08d0d37541c6dcc83 Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Fri Oct 26 14:37:01 2018 -0500 Update 2018-10-10--differ-locked-cluster.md commit f994e52e204c5958a9e9358b2c4d4b0682ca05f8 Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Fri Oct 26 14:36:07 2018 -0500 Update lessons for incident 2018-10-10 Diff Service Locked Up the Whole Cluster commit 2ec97b0783e45b3c90039ba3826a6ca6dfe97fed Merge: b5d04b0 17f6ddf Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Thu Oct 25 10:07:36 2018 -0500 Merge pull request #13 from danielballan/incident-2018-10-09-reboot-loop Incident: 2018-10-09 secrets problems on staging commit 17f6ddfa968a36950632c7576228406687a7c4f4 Author: Jason Sherman 
<jsnshrmn@users.noreply.github.com> Date: Thu Oct 25 09:59:45 2018 -0500 reworded action item for validation changed it from an open question to an action item that states that we need to make a determination. commit b691a6cf029baa010f323c656090c50d46e0f459 Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Thu Oct 25 08:20:44 2018 -0500 Add action item for @jsnshrmn commit b5d04b089eb535351ab91f883322e457dda6c079 Author: Rob Brackett <rob@robbrackett.com> Date: Tue Oct 23 00:29:12 2018 -0700 Try out some new diff colors commit 3f6cc4991f183e75c4addd2596c3a53700866c9f Author: Jason Sherman <jsn.sherman@gmail.com> Date: Tue Oct 16 14:51:12 2018 -0500 update README.md to reference new services configuration. commit bc70249602866f2e7eb752b62898bd3a67d227ad Author: Jason Sherman <jsn.sherman@gmail.com> Date: Tue Oct 16 14:35:29 2018 -0500 Move services with sensitive information out of version control. Add example services template. commit 17b547eae596b906e8c1fda791f0cf46c1534c03 Merge: 56876e0 35c4ecf Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Tue Oct 16 11:42:37 2018 -0500 Merge pull request #10 from danielballan/diffing-liveness-and-readiness Configure liveness and readiness probes for diffing deployment. commit 35c4ecf617e6650b8b086212ecd7a5b5096eed6f Author: Jason Sherman <jsn.sherman@gmail.com> Date: Tue Oct 16 11:40:18 2018 -0500 switch to tcp socket healthcheck commit b73aab55165d07f18df06a5636b88b7ce8ed515f Author: Jason Sherman <jsn.sherman@gmail.com> Date: Tue Oct 16 11:28:17 2018 -0500 Deploy ui image:6fa54911bede5b135e890391198fbba68cd20853 commit 0cf0787fc7c262fecd73d4c36b7c43ec417eab39 Author: Jason Sherman <jsn.sherman@gmail.com> Date: Tue Oct 16 10:31:15 2018 -0500 Configure liveness and readiness probes to check defined container ports for api and ui. 
commit 47f045a6196430df764ef88ee81caba8a825ef1e Author: Jason Sherman <jsn.sherman@gmail.com> Date: Tue Oct 16 10:26:17 2018 -0500 Configure liveness and readiness probes to check defined container ports for diffing. commit 56876e088f5fbc3329d1c1d5ea64942b98f56015 Author: Jason Sherman <jsn.sherman@gmail.com> Date: Tue Oct 16 09:30:27 2018 -0500 Deploy 4 diffing replicas to production. commit 53b9007b31f95389b78d33ad9f2424f60cbbc93b Author: Rob Brackett <rob@robbrackett.com> Date: Sun Oct 14 16:48:04 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-db#409, edgi-govdata-archiving/web-monitoring-db#415 commit a4bf791b596300d6bac0b8fd733547cbe9a44d52 Author: Rob Brackett <rob@robbrackett.com> Date: Sat Oct 13 21:45:18 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-processing#281, edgi-govdata-archiving/web-monitoring-processing#285, and edgi-govdata-archiving/web-monitoring-processing#289 commit 2cc6024bd7d1a227265c607cc4dd85521933f13c Author: Jason Sherman <jsn.sherman@gmail.com> Date: Thu Oct 11 18:18:05 2018 -0500 add resource limits to import worker as well. commit 5d47264b6a5e639c5e0df1ed20466ce76df58a38 Author: Jason Sherman <jsn.sherman@gmail.com> Date: Thu Oct 11 17:55:21 2018 -0500 lower cpu request values to the silliness required to get them all to 'fit' on our available nodes. commit 557c4a918fa238c441ed588221d15fe95dd5dc67 Merge: 73771c5 b30ae3b Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Thu Oct 11 17:21:04 2018 -0500 Merge pull request #15 from danielballan/add-resource-limits-for-containers merging with the idea that this will benefit from iteration based on real world load information and refinement based on more experience with kubernetes. 
commit 93931135cc9a9f089c35f4f0dc4149b36b74911e Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Thu Oct 11 15:46:09 2018 -0500 Update timeline with notes from @jsnshrmn commit b30ae3b04ede348c92fdbd5381e792bbe5c8950e Author: Jason Sherman <jsn.sherman@gmail.com> Date: Thu Oct 11 14:36:18 2018 -0500 Set some baseline resource limits for containers. commit 73771c5c3fe13ab923e0c972937bb05f497b5cd5 Author: Jason Sherman <jsn.sherman@gmail.com> Date: Thu Oct 11 14:04:00 2018 -0500 Enable core metrics. commit ca091c7d55acf6961c2701c3b2a38636c54e069e Author: Rob Brackett <rob@robbrackett.com> Date: Thu Oct 11 11:49:08 2018 -0700 Incident: 2018-10-10 - Differ locked up cluster This is not a full report; it needs additional details from @jsnshrmn and some lessons learned. commit d876cb356074f29dbd72262c1c66d51def46a4a2 Author: Jason Sherman <jsn.sherman@gmail.com> Date: Thu Oct 11 10:06:13 2018 -0500 capture work on resource limits. commit 5fef23f0d5b82596b150112c1ac780ee3a8d01a1 Author: Rob Brackett <rob@robbrackett.com> Date: Wed Oct 10 23:37:32 2018 -0700 Deploy DB hotfix: edgi-govdata-archiving/web-monitoring-db@0a46db0 commit 04e1764dbc5fa381ae13168871e4f3a909885f68 Author: Rob Brackett <rob@robbrackett.com> Date: Wed Oct 10 21:05:57 2018 -0700 Add clarification about cause and timing commit 0332711f1bb60700043fee84bf277176a29a1629 Author: Rob Brackett <rob@robbrackett.com> Date: Wed Oct 10 20:58:07 2018 -0700 Add incident report for 2018-10-09 commit 83278665dc390795e16e9f964775c1683f21e0cb Author: Jason Sherman <jsn.sherman@gmail.com> Date: Wed Oct 10 21:00:26 2018 -0500 HOTFIX: frantically avoid diffing PDFs in jobs commit 22450e3120f80b53d753b46988e6085f88ed707d Author: Rob Brackett <rob@robbrackett.com> Date: Wed Oct 10 15:37:45 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-db#406 and edgi-govdata-archiving/web-monitoring-db#408 commit fe4a375ef03d424ffd413e6de596110bf2a0e923 Author: Rob Brackett <rob@robbrackett.com> Date: Tue Oct 9 
17:41:33 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-db#401 Note there was some hijinks with the staging environment, hence the `INCREMENTAL_UPDATE` change. I *think* this was down to a screwed-up entry in the secrets file (newline at the end of the encoded `cache-date-differ` string), but the logs were super unclear :\ commit 6993841db12a314739ccc6fa78153176771c46d2 Merge: f02fb2f e6a1663 Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Tue Oct 9 16:25:24 2018 -0500 Merge pull request #11 from danielballan/better-logging Merging this in. We can modify this configuration fairly easily in the future now that we have all of the bits in place to do so. commit a218bcd97f33bcb10c82beab195bb7ed3d7a0f73 Author: Dan Allan <daniel.b.allan@gmail.com> Date: Tue Oct 9 16:10:14 2018 -0400 Add documentation for getting up and running. commit e6a16631e574ec078051b720104aae69008e9df0 Author: Jason Sherman <jsn.sherman@gmail.com> Date: Tue Oct 9 13:53:46 2018 -0500 Combine a number of the aws cloudwatch logstreams that are written by default and give them more human readable names. commit a9197822986c2532802665b6a4194247c1495614 Author: Jason Sherman <jsn.sherman@gmail.com> Date: Tue Oct 9 13:52:11 2018 -0500 setting K8S_NODE_NAME appropriately reduces calls to the kubernetes api from fluent-plugin-kubernetes_metadata_filter. commit f02fb2f1c4f973ad01f8abf94e01a3a4acd78d31 Author: Dan Allan <daniel.b.allan@gmail.com> Date: Mon Oct 8 19:19:12 2018 -0400 Clean up duplicates / unused files. commit 6fe2df4dd8de856563adf02e71459e4957e97bd6 Author: Dan Allan <daniel.b.allan@gmail.com> Date: Mon Oct 8 19:13:16 2018 -0400 Remove copies of cloudwatch config, which are in kube-system now. commit 090e733b66c562bb808307b56711e5d445f489f2 Author: Dan Allan <daniel.b.allan@gmail.com> Date: Mon Oct 8 19:13:00 2018 -0400 Add explicit namespace to staging templates. 
commit 615f25fd9edaeb130568685b7422baf55da791d0 Author: Dan Allan <daniel.b.allan@gmail.com> Date: Mon Oct 8 18:59:57 2018 -0400 Disable New Relic. commit 8fc47a1f944876f916de49af63f030890ca4a3ba Author: Dan Allan <daniel.b.allan@gmail.com> Date: Mon Oct 8 18:56:24 2018 -0400 This was what we deployed to production the first time. commit bcb5acae74d307cb95bbe1171e4df3e7116a101d Merge: 5657da8 8c8404b Author: Jason Sherman <jsnshrmn@users.noreply.github.com> Date: Mon Oct 8 15:54:38 2018 -0500 Merge pull request #8 from danielballan/fixes-from-making-new-cluster Fixups from deploying on danallan.com commit 8c8404b7ffc8aaf696ccd814486b1523bee35567 Author: Dan Allan <daniel.b.allan@gmail.com> Date: Mon Oct 8 15:56:34 2018 -0400 Make it easier to copy/paste exports. commit 5657da80d8a4b0dbd13a0e80d7d9c88c6c64600a Author: Jason Sherman <jsn.sherman@gmail.com> Date: Mon Oct 8 14:20:10 2018 -0500 Initial logging cloudwatch implementation. commit 82eb85b7ad923afa9b0e23be5aa29152b6835d9b Author: Rob Brackett <rob@robbrackett.com> Date: Fri Oct 5 14:42:35 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-processing#278 and edgi-govdata-archiving/web-monitoring-processing#280 commit 9de7e2269664275b513a4c32f623a1326d2bd4d5 Author: Rob Brackett <rob@robbrackett.com> Date: Fri Oct 5 14:40:21 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-db#402 and edgi-govdata-archiving/web-monitoring-db#405 commit 72e95cdf04b27989f595577391c06b38ca512ddc Author: Dan Allan <daniel.b.allan@gmail.com> Date: Tue Oct 2 15:10:54 2018 -0600 Do not hard-code our HOSTED_ZONE but make it an env var. commit d3e3c8cbcf69287b0e7863d19588da34156605aa Author: Dan Allan <daniel.b.allan@gmail.com> Date: Tue Oct 2 14:35:29 2018 -0600 kops create ... was missing --state arg. commit 53019677d10489643ab82e5605cef3a55bb53d3a Author: Dan Allan <daniel.b.allan@gmail.com> Date: Tue Oct 2 14:35:16 2018 -0600 Remove dollar signs from lines for easier copy/paste. 
commit 974c5b04385245d1e3274d43e9a28fd92821339c Author: Rob Brackett <rob@robbrackett.com> Date: Wed Sep 26 17:10:29 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-ui#297 commit 107bbeff0782e44804dac9f6332cd3ac41c2e409 Author: Rob Brackett <rob@robbrackett.com> Date: Wed Sep 26 14:50:36 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-db#396, edgi-govdata-archiving/web-monitoring-db#397 commit 6fa306cc5c75675d79c95b48ab074d3e3abac436 Author: Rob Brackett <rob@robbrackett.com> Date: Wed Sep 26 10:09:43 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-db#397 commit d82a8aa678146363533a7dfe5655d78d1c309985 Author: Rob Brackett <rob@robbrackett.com> Date: Wed Sep 26 01:36:33 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-ui#290 and edgi-govdata-archiving/web-monitoring-ui#289. commit e4d1fd51d4011347ac8d45511f55cef2a9da41b3 Author: Rob Brackett <rob@robbrackett.com> Date: Wed Sep 12 15:05:00 2018 -0700 Switch to using EDGI's new S3 buckets Note this includes/requires a change to our secrets file. 
This is a subtask of edgi-govdata-archiving/web-monitoring#101 commit 22679aeddadfed7d1c17d415f534d6644a77eda5 Author: Rob Brackett <rob@robbrackett.com> Date: Wed Aug 29 21:24:42 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-processing#250 commit f7dfc82f10dac39e09a462ead906f862cdd38dea Author: Rob Brackett <rob@robbrackett.com> Date: Sun Aug 19 13:56:05 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-ui#246 commit ecd619de5c6f88dd37a98ffe1e9c28bac6880246 Author: Rob Brackett <rob@robbrackett.com> Date: Mon Jun 25 21:07:21 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-processing#200, edgi-govdata-archiving/web-monitoring-processing#201 commit 26a0cd2cbacf16f4641bce5d324cfad11b0ab167 Author: Rob Brackett <rob@robbrackett.com> Date: Tue Jun 19 17:36:16 2018 -0700 Deploy security fix edgi-govdata-archiving/web-monitoring-db#331 commit 055421da81f394b19c39b19f5f2290799aab0e28 Author: Rob Brackett <rob@robbrackett.com> Date: Tue Jun 19 17:12:12 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-processing#153 commit 3c5720e8825286fd44497d1754f799c828ca0cf9 Author: Rob Brackett <rob@robbrackett.com> Date: Wed Jun 13 12:58:42 2018 -0700 Remove deprecated/unused app configuration Also makes notes about other config we *should* change/remove. commit 5e4291a5448bb98d94fa2572a6b6abf93fcaa129 Author: Rob Brackett <rob@robbrackett.com> Date: Mon Jun 11 18:00:24 2018 -0700 Deploy hotfix edgi-govdata-archiving/web-monitoring-ui#227 commit 4fed9584eaa58f3b7eb228a172efcb46ab6115e1 Author: Rob Brackett <rob@robbrackett.com> Date: Mon Jun 11 16:39:35 2018 -0700 Reduce UI from 3 to 2 replicas UI does little to no heavy server work, so it doesn't really make sense to spend a lot of resources on it. Keep two in case one machine goes haywire, but we shouldn't need more than that. 
commit bbf1ef18066aab9ece0d938edeed7b4a2812d623 Author: Rob Brackett <rob@robbrackett.com> Date: Mon Jun 11 16:38:48 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-ui#225 commit e80e575da7285566998486c426fb73e778a882cc Author: Rob Brackett <rob@robbrackett.com> Date: Fri Jun 8 13:13:30 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-db#325 web-monitoring-db commit: 239b9d3858d5bef21811a2271f923089a320f53b commit cfccb2679bb32f05a0192a85811c1c0219b8c74f Author: Rob Brackett <rob@robbrackett.com> Date: Wed May 9 11:54:22 2018 -0700 Deploy edgi-govdata-archiving/web-monitoring-ui#218 commit d7de27db569fa433de60de1368cf0855a36301e7 Author: Rob Brackett <rob@robbrackett.com> Date: Mon Apr 16 23:15:08 2018 -0700 Deploy DB: import jobs not creating new pages edgi-govdata-archiving/web-monitoring-db#277 commit 0fb13c2b53b5f53c7d75a9eeb4a930b286673c3d Author: Rob Brackett <rob@robbrackett.com> Date: Mon Apr 16 22:48:02 2018 -0700 Deploy UI: Fix assigned pages infinite spinner (edgi-govdata-archiving/web-monitoring-ui#177) commit 50249006f6e384c3d0ff5f3799121a7614d7be43 Author: Rob Brackett <rob@robbrackett.com> Date: Sun Apr 8 22:08:24 2018 -0700 Update API server to hash validation & backoff Deploys edgi-govdata-archiving/web-monitoring-db#271, edgi-govdata-archiving/web-monitoring-db#272 commit 627f7cfa91ce50a7a2190239efcddfce4cf935c1 Author: Rob Brackett <rob@robbrackett.com> Date: Wed Apr 4 12:24:44 2018 -0700 Fix misconfigured S3 region for staging commit 75f39ddec356b232a22bc08af4c59ae663c6cb37 Author: Rob Brackett <rob@robbrackett.com> Date: Mon Mar 19 15:05:34 2018 -0700 Deploy DB dependency updates commit d9c9025746620d0cf3fddbc30880a76da85b33b1 Author: Rob Brackett <rob@robbrackett.com> Date: Mon Mar 19 13:19:26 2018 -0700 Deploy UI with tags/maintainers edgi-govdata-archiving/web-monitoring-ui#210 commit e9fef4a031531d837545cf11538eb699a3e9c101 Author: Rob Brackett <rob@robbrackett.com> Date: Tue Mar 13 23:38:59 2018 -0700 Update deployed images 
for all services commit c537f8e9dcdc564d5fcd71e0bb0499b460bf11d1 Author: Rob Brackett <rob@robbrackett.com> Date: Wed Jan 31 12:24:09 2018 -0800 Fix working bucket on import worker and API commit 75749547e7d8f9b7f266c0e4e7646c2c7bf22609 Author: Rob Brackett <rob@robbrackett.com> Date: Tue Jan 30 09:50:08 2018 -0800 Deploy new differ; add CACHE_DATE_DIFFER to API commit 234624f11d77847d22b169f79fb5db1a097d2709 Author: danielballan <daniel.b.allan@gmail.com> Date: Mon Jan 22 13:02:38 2018 -0500 Use new cert that includes monitoring-staging DNS entry. commit 4cc91745b9d3ed4da7cea61350f9aac9b004ba0d Author: danielballan <daniel.b.allan@gmail.com> Date: Mon Jan 22 12:32:34 2018 -0500 Fix typo commit b9a92a189aaf1591b8ebc9f000ac629933eb03ab Author: danielballan <daniel.b.allan@gmail.com> Date: Mon Jan 22 12:28:08 2018 -0500 Set HOST_URL in secrets. commit d10e0df0cd10150c8f2c7bb00d54ef1d78854abb Author: danielballan <daniel.b.allan@gmail.com> Date: Mon Jan 22 12:15:21 2018 -0500 Use new cert that includes monitoring-staging api-staging.monitoring DNS entry. commit b17fc8b8cb3779e5089d1d153212aaac7d5d40ee Author: Rob Brackett <rob@robbrackett.com> Date: Sat Jan 20 11:57:20 2018 -0800 Update configuration from deploying staging today! commit 0ab116797fb638f4b43f91ed578589a8655b8ad8 Author: danielballan <daniel.b.allan@gmail.com> Date: Sat Jan 20 11:07:57 2018 -0500 Add link to cheatsheet. commit 0e3d3e20d7e281486c3d1a2219163ff42f7a8163 Author: danielballan <daniel.b.allan@gmail.com> Date: Sat Jan 20 10:41:36 2018 -0500 Fix typo commit 0bc933e27fa12f2ce69df75429be6801b81117d2 Author: danielballan <daniel.b.allan@gmail.com> Date: Sat Jan 20 10:39:33 2018 -0500 Add install instructions. commit 0d8790cfe60e9d7a9345c830f3213073ef4f486b Author: danielballan <daniel.b.allan@gmail.com> Date: Sat Jan 20 10:14:10 2018 -0500 Tweak formatting. 
commit 3a35870266beccf8be8c5ba7c4e9021af3b3fd58 Author: danielballan <daniel.b.allan@gmail.com> Date: Sat Jan 20 10:10:47 2018 -0500 Fix Markdown formatting. commit df6f3e8c034f85954764fe492143b09ba2717ac5 Author: danielballan <daniel.b.allan@gmail.com> Date: Thu Jan 18 22:05:30 2018 -0500 Mention secrets in instructions. commit db59eaab596dd464f48191f0379b544c0b533d17 Author: danielballan <daniel.b.allan@gmail.com> Date: Thu Jan 18 14:44:43 2018 -0500 Simplify instructions and fix typos. commit 5f8e181098bdd73d829cc42bd6b10849dde6eeb2 Author: danielballan <daniel.b.allan@gmail.com> Date: Thu Jan 18 14:44:10 2018 -0500 Make naming consistent. commit ce91bcd209ff5af335b8ec878e625e79866d0e78 Author: danielballan <daniel.b.allan@gmail.com> Date: Thu Jan 18 13:45:33 2018 -0500 Update README after first successful CLI-only deployment. commit fd75afad8001e86a09ff2855fbf805848a3ab413 Author: danielballan <daniel.b.allan@gmail.com> Date: Thu Jan 18 12:55:22 2018 -0500 Fix typo commit 2438686b56e120ea47380b8e4fa24cd5192557b9 Author: danielballan <daniel.b.allan@gmail.com> Date: Thu Jan 18 12:51:39 2018 -0500 Update addresses. commit 8535080b5e113e59042664a0833e6455cc0bf2d2 Author: danielballan <daniel.b.allan@gmail.com> Date: Thu Jan 18 12:51:31 2018 -0500 Remove hard-coded namespace. commit 5ee5b6b8b0290a6ffe5d9453406f10ee18c11a03 Author: danielballan <daniel.b.allan@gmail.com> Date: Thu Jan 18 12:51:13 2018 -0500 Deployment works. commit cb5d79dbfa74e702deba07f8d313ae1dbcecf35e Author: danielballan <daniel.b.allan@gmail.com> Date: Wed Jan 17 23:24:41 2018 -0500 Set NAMESPACE before creating RDS. commit 47305ffa08fe21d7861d1458ec706cdfaf86232b Author: danielballan <daniel.b.allan@gmail.com> Date: Wed Jan 17 23:20:26 2018 -0500 Working instructions for creating the RDS from the CLI. commit c563adcc11418f129162f8489db7a699aa162b27 Author: danielballan <daniel.b.allan@gmail.com> Date: Wed Jan 17 19:24:23 2018 -0500 Explain namespaces more in the README. 
commit 858ee09bf935fb578b7972e344a7a001cd937df7 Author: danielballan <daniel.b.allan@gmail.com> Date: Wed Jan 17 19:22:32 2018 -0500 Format references. commit eaa32b6714e9fa1ff5d5e32a8acc4a6931e453f8 Author: danielballan <daniel.b.allan@gmail.com> Date: Wed Jan 17 19:21:39 2018 -0500 Remove unused file. commit cb910cf03fe94b6f09310d57d5f4c9c579289718 Author: danielballan <daniel.b.allan@gmail.com> Date: Wed Jan 17 19:08:17 2018 -0500 Fix Markdown formatting. commit b80372406b59e4ece420ff4c8a26d0fb6b77d7d9 Author: danielballan <daniel.b.allan@gmail.com> Date: Wed Jan 17 19:07:22 2018 -0500 initial commit
0 parents  commit cf4eef5

35 files changed

+2026
-0
lines changed
 

‎.gitignore

+7
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
templates/secrets.yaml
2+
templates/secrets.*.yaml
3+
templates/ui-secrets.yaml
4+
templates/ui-secrets.*.yaml
5+
*secrets*
6+
services.yaml
7+
!examples/services.yaml

‎README.md

+456
Large diffs are not rendered by default.

‎components.md

+111
@@ -0,0 +1,111 @@
1+
# Components
2+
3+
In this section, you will install and configure all the components necessary to
4+
connect to and modify EDGI's Kubernetes cluster.
5+
6+
## This Repository
7+
8+
This repository contains templates that specify the configuration of the
9+
cluster. You will need a local copy.
10+
11+
```sh
12+
git clone https://github.com/edgi-govdata-archiving/web-monitoring-kube
13+
cd web-monitoring-kube
14+
```
15+
16+
## The Kubernetes Client, ``kubectl``
17+
18+
To operate on the cluster, we use ``kubectl``, a command-line program that runs
19+
on your local machine, connects to the cluster, and issues commands to the
20+
cluster.
21+
22+
The cluster is running version 1.10.3. Install a compatible version of the
23+
client (>= 1.10.2, <= 1.10.4).
24+
25+
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
26+
27+
## Keybase
28+
29+
To share secret files containing authentication keys and other sensitive
30+
configuration, the development team uses Keybase.
31+
32+
[Install Keybase](https://keybase.io/download)
33+
34+
If you do not have an account, you will be prompted to create one when you start
35+
to use Keybase. Ask a member of the development team to invite you to the
36+
``edgi_wm_kube`` team.
37+
38+
## Kubernetes configuration
39+
40+
To connect to the cluster, you will need a configuration file that includes the
41+
address of the cluster and secret authentication information. Because this file
42+
contains secrets, it is not stored in this repository but rather shared via
43+
Keybase.
44+
45+
If you are not using Kubernetes to manage any other clusters, you can simply
46+
copy the file from Keybase:
47+
48+
```sh
49+
mkdir -p ~/.kube
50+
cp /keybase/team/edgi_wm_kube/kube_config.yaml ~/.kube/config
51+
```
52+
53+
If you have other clusters to manage, you will have to manually merge the
54+
contents of that file with your existing ``~/.kube/config``.
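
One way to do that merge, sketched here rather than prescribed, is to let ``kubectl`` combine the files itself: it reads every path listed in the ``KUBECONFIG`` environment variable, and ``kubectl config view --flatten`` writes the combined result as a single file. The helper name ``merge_kubeconfig`` and the ``config.merged`` / ``config.bak`` file names are hypothetical.

```sh
# Assumption: the team config lives at the Keybase path used above.
TEAM_CONFIG=/keybase/team/edgi_wm_kube/kube_config.yaml

# Hypothetical helper: back up the existing config, then write a merged
# copy next to it for review instead of overwriting anything in place.
merge_kubeconfig() {
  existing="$HOME/.kube/config"
  merged="$HOME/.kube/config.merged"
  cp "$existing" "$existing.bak"
  KUBECONFIG="$existing:$TEAM_CONFIG" kubectl config view --flatten > "$merged"
  echo "Review $merged, then move it over $existing"
}
```

Running ``merge_kubeconfig`` leaves your current config untouched, so you can inspect the merged file before replacing anything.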
55+
56+
Set the context and verify that it worked:
57+
58+
```sh
59+
kubectl config use-context kube.monitoring.envirodatagov.org
60+
kubectl config current-context
61+
```
62+
63+
The output should be ``kube.monitoring.envirodatagov.org``.
64+
65+
## Try communicating with the cluster
66+
67+
```sh
68+
kubectl get nodes
69+
```
70+
71+
The output should look something like:
72+
73+
```
74+
NAME                                          STATUS    ROLES     AGE       VERSION
75+
ip-172-20-63-114.us-west-2.compute.internal   Ready     node      32d       v1.10.3
76+
ip-172-20-63-2.us-west-2.compute.internal     Ready     master    32d       v1.10.3
77+
ip-172-20-81-52.us-west-2.compute.internal    Ready     node      32d       v1.10.3
78+
```
79+
80+
## Secrets
81+
82+
Templates containing secret configuration parameters are stored in Keybase as
83+
well. Copy them into your checkout of ``web-monitoring-kube`` like so:
84+
85+
```sh
86+
cp /keybase/team/edgi_wm_kube/secrets.production.yaml templates/production
87+
cp /keybase/team/edgi_wm_kube/secrets.staging.yaml templates/staging
88+
cp /keybase/team/edgi_wm_kube/ui-secrets.production.yaml templates/production
89+
cp /keybase/team/edgi_wm_kube/ui-secrets.staging.yaml templates/staging
90+
```
91+
92+
## Services
93+
94+
Services provide the network endpoints to access running pods. While most services contain no sensitive information (and are therefore in version control), a few web-monitoring services require sensitive information. Templates containing our local service configuration parameters are also stored in Keybase. Copy them into your checkout of ``web-monitoring-kube`` like so:
95+
96+
```sh
97+
cp /keybase/team/edgi_wm_kube/services.production.yaml templates/production
98+
cp /keybase/team/edgi_wm_kube/services.staging.yaml templates/staging
99+
```
100+
101+
## Getting Oriented
102+
103+
In ``templates/``, there are separate directories corresponding to the
104+
*namespaces* in the Kubernetes cluster.
105+
106+
* ``kube-system`` -- cluster-wide objects related to capturing logs
107+
* ``production`` -- objects deployed to the production namespace
108+
* ``staging`` -- objects deployed to the staging namespace
109+
110+
The contents of the templates in ``production/`` and ``staging/`` differ only by
111+
their ``namespace: ...`` parameter and the values of the secrets.

‎create-db.jq

+12
@@ -0,0 +1,12 @@
1+
{
2+
DBName: "web_monitoring_db",
3+
AllocatedStorage: 20,
4+
DBInstanceClass: "db.t2.medium",
5+
Engine: "postgres",
6+
MasterUserPassword: env.DB_PASSWORD,
7+
DBInstanceIdentifier: env.DB_INSTANCE_IDENTIFIER,
8+
MasterUsername: "master",
9+
PubliclyAccessible: true,
10+
VpcSecurityGroupIds: [env.NODES_SEC_GROUP],
11+
DBSubnetGroupName: "web-monitoring-db-subnet"
12+
}
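
A filter like this can be rendered with ``jq -n``, which evaluates it without reading any input and fills in the ``env.*`` lookups from the environment. The sketch below shows one way that might look; ``render_db_request`` is a hypothetical helper, and the README (not rendered in this view) may do this differently.

```sh
# Hypothetical helper: render create-db.jq to JSON, supplying the three
# environment variables the filter reads through `env.`.
# Arguments: password, instance identifier, security group ID.
render_db_request() {
  DB_PASSWORD=$1 DB_INSTANCE_IDENTIFIER=$2 NODES_SEC_GROUP=$3 \
    jq -n -f create-db.jq
}
```

The resulting JSON uses the parameter names of the RDS "create DB instance" call, so it could presumably be passed to ``aws rds create-db-instance --cli-input-json``; treat that pairing as an assumption until you have checked it against the README.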

‎create-ingress-alias.jq

+33
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
{
2+
HostedZoneId: env.KUBE_ZONE,
3+
ChangeBatch:
4+
{
5+
Comment: "create subdomains for rails server and ui",
6+
Changes: [
7+
{
8+
Action: "UPSERT",
9+
ResourceRecordSet: {
10+
Name: env.API_DNS_NAME,
11+
Type: "A",
12+
AliasTarget: {
13+
DNSName: env.API_TARGET,
14+
EvaluateTargetHealth: false,
15+
HostedZoneId: env.ELB_ZONE
16+
}
17+
}
18+
},
19+
{
20+
Action: "UPSERT",
21+
ResourceRecordSet: {
22+
Name: env.UI_DNS_NAME,
23+
Type: "A",
24+
AliasTarget: {
25+
DNSName: env.UI_TARGET,
26+
EvaluateTargetHealth: false,
27+
HostedZoneId: env.ELB_ZONE
28+
}
29+
}
30+
}
31+
]
32+
}
33+
}

‎examples/services.yaml

+55
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
1+
---
2+
apiVersion: v1
3+
kind: Service
4+
metadata:
5+
name: rds
6+
namespace: example
7+
spec:
8+
type: ExternalName
9+
externalName: postgres.db.server.example.com
10+
ports:
11+
- port: 5432
12+
targetPort: 5432
13+
protocol: TCP
14+
---
15+
apiVersion: v1
16+
kind: Service
17+
metadata:
18+
name: api
19+
namespace: example
20+
annotations:
21+
service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012
22+
service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
23+
service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
24+
spec:
25+
selector:
26+
app: api
27+
ports:
28+
- name: https
29+
port: 443
30+
targetPort: 3000
31+
- name: http
32+
port: 80
33+
targetPort: 3000
34+
type: LoadBalancer
35+
---
36+
apiVersion: v1
37+
kind: Service
38+
metadata:
39+
name: ui
40+
namespace: example
41+
annotations:
42+
service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012
43+
service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
44+
service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
45+
spec:
46+
selector:
47+
app: ui
48+
ports:
49+
- name: https
50+
port: 443
51+
targetPort: 3001
52+
- name: http
53+
port: 80
54+
targetPort: 3001
55+
type: LoadBalancer
@@ -0,0 +1,57 @@
1+
# 2018-10-09: API & Import Worker Infinitely Rebooting on Staging
2+
3+
## Summary
4+
5+
After updating the `api` and `import-worker` deployments, Kubernetes was stuck in a loop infinitely rebooting them. It turns out that this was caused by a badly base64-encoded secret. Specifically `cache-date-differ` ended with a newline character.
6+
7+
Updating the secrets to have a correct value and then replacing `api-deployment.yaml` and `import-worker-deployment.yaml` (by updating the `INCREMENTAL_UPDATE` env var) resolved the issue. It appears this happened because we pushed new secrets to staging *after* all the other deployment configs, so the issue didn’t crop up until the next deployment change that used those secrets.
8+
9+
10+
## Timeline
11+
12+
All times in PDT.
13+
14+
### 2018-10-09 17:15
15+
16+
@Mr0grog pushed new deploy configurations to the cluster for both staging and production. Staging immediately started rebooting repeatedly; production was fine.
17+
18+
### 2018-10-09 17:20
19+
20+
@Mr0grog tried deleting the rebooting pods (rookie move, me!), which just made more rebooting pods.
21+
22+
### 2018-10-09 17:25
23+
24+
@Mr0grog checked the logs, which revealed only one cryptic log line:
25+
26+
```
27+
starting container process caused "process_linux.go:295: setting oom score for ready process caused write /proc/3002/oom_score_adj: invalid argument
28+
```
29+
30+
Luckily, Stack Overflow [gave us a good lead](https://stackoverflow.com/questions/49296359/kubernetes-secret-in-google-container-engine-fails-setting-oom-score-for-read) that the problem might be in decoding secrets. We checked all the secrets and noted that `cache-date-differ`, when decoded, ended with a newline, which seemed wrong. Re-encoding the correct value, replacing the secrets in the cluster, then replacing the deployment configs immediately resolved the issue.
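
To make this failure mode concrete, here is a minimal sketch of how a trailing newline can sneak into a base64-encoded secret and how to spot it afterwards. The value `2018-10-09` is a hypothetical stand-in (the real `cache-date-differ` value is not shown in this report), and `ends_with_newline` is an illustrative helper, not part of our tooling.

```sh
# Hypothetical stand-in value; the real secret is not shown in this report.
value='2018-10-09'

# `echo` appends a newline, so the newline is encoded as part of the secret.
bad=$(echo "$value" | base64)

# `printf '%s'` emits exactly the bytes of the value and nothing else.
good=$(printf '%s' "$value" | base64)

# Succeeds if the decoded form of the base64 string in $1 ends in a newline.
# Command substitution strips trailing newlines, so look at the last byte as hex.
ends_with_newline() {
  last_byte=$(printf '%s' "$1" | base64 --decode | tail -c 1 | od -An -tx1 | tr -d ' \n')
  [ "$last_byte" = "0a" ]
}

for encoded in "$bad" "$good"; do
  if ends_with_newline "$encoded"; then
    echo "$encoded decodes with a trailing newline"
  else
    echo "$encoded decodes cleanly"
  fi
done
```

The two encoded strings differ only near the end, which is easy to miss by eye; checking the decoded bytes is more reliable than comparing the encoded text.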
31+
### 2018-10-09 17:30

Incident resolved.


## Lessons

### What Went Well

Kubernetes did its job swimmingly: although we were having problems, the staging environment was up and available the whole time (it never replaced one of the pods because the others were still restarting). At first, @Mr0grog was freaking out, but he quickly realized the staging service had been available the whole time. This was a huge relief.

### What Went Wrong

That is one cryptic and unhelpful error message. It’s also still unclear what the *real* issue was. The secrets value in question decoded fine; it just ended with a newline. Was Kubernetes choking on the newline? Or was the server crashing when it booted and tried to parse the string as a date, but Kubernetes swallowed what would have been a useful error log line and threw out the confusing error instead?


## Action Items


- We should only use the binary Data field where it is useful to have base64-encoded values. Examples could include actual binary data, or string data that would require some of its characters to be escaped. @jsnshrmn will stringify our secrets where convenient so that we can avoid the error-prone process of manually base64-encoding every secret.
- Determine if additional measures, such as a script or system for validating our secrets files, are necessary.


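If we do end up wanting a validation script, the core check could look something like the sketch below. It is only a starting point under stated assumptions: `decoded_ends_with_newline` is a hypothetical helper name, and it expects a single base64 string as its argument rather than a whole secrets file.

```shell
# Hypothetical helper: succeeds (exit status 0) if the base64 string in $1
# decodes to data whose last byte is a newline (hex 0a).
decoded_ends_with_newline() {
  last_byte=$(printf '%s' "$1" | base64 --decode | tail -c 1 | od -An -tx1 | tr -d ' \n')
  [ "$last_byte" = "0a" ]
}

# "YWJjCg==" is the base64 encoding of "abc" followed by a newline.
if decoded_ends_with_newline 'YWJjCg=='; then
  echo "warning: decoded value ends with a newline"
fi
```

A fuller script would need to pull each value out of the secrets file first; how best to do that depends on how the files are laid out.
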
## Responders

- @Mr0grog
@@ -0,0 +1,134 @@
# 2018-10-10: Diff Service Locked Up the Whole Cluster

## Summary

The diffing service consumed all resources on all cluster nodes, locking up not only itself but also all our other services (API, Import Worker, UI, Redis). This appears to have happened as a result of the new automated analysis job (see https://github.com/edgi-govdata-archiving/web-monitoring-db/pull/406) requesting a source code diff of a PDF and an HTML page, specifically the change:

```
0c2f9d24-df8e-48b4-919b-7d1eca02d3c4..7c9904f4-c090-4f0d-8c3f-ba3dc4fade25
```


## Timeline

All times in PDT.

### 2018-10-10 15:30

@Mr0grog pushed the new analysis code to production (https://github.com/danielballan/web-monitoring-kube/commit/22450e3120f80b53d753b46988e6085f88ed707d) and queued a day’s worth of analysis jobs. He watched the first ~350 work successfully, then headed to a meeting.

### 2018-10-10 16:45

@Mr0grog checked in on the analysis progress between meetings (Web Monitoring Analyst meeting was at 17:00) only to find that the cluster had spun up about 9 pods for each type of deployment, with each being listed as unknown status.

@Mr0grog posted the situation in Slack and proceeded to try and delete some pods and to look at the flood of errors starting on Sentry. Errors make it clear that the analysis job is hitting problems, so it appears this is another instance of the diff service running amok.

### 2018-10-10 17:12

@jsnshrmn signs on to help. Now that we have an idea of the cause and two people, @jsnshrmn works on addressing Kubernetes while @Mr0grog works on patching code to avoid triggering the problem.

### 2018-10-10 17:28

@jsnshrmn clears out testing resources of all types (`kubectl --namespace=testing delete all --all`) since they are of only occasional value but were still consuming some resources. The remaining resources were forcibly removed (`kubectl --namespace=staging delete pods --all --force --grace-period=0`) because they were stuck in an "Unknown" state.

@jsnshrmn clears out the cluster’s production resources by deleting each deployment (`kubectl delete deployment.apps/<appname>`), which should have deleted all pods. This took several minutes per deployment because many pods were in an "Unknown" state and could not be gracefully evicted, eventually timing out. @jsnshrmn tries to gracefully shut down the remaining pods, which were all in an "Unknown" state (`kubectl --namespace=production delete pods --all` and `kubectl --namespace=production delete pod/<podname>`), but without success.

@jsnshrmn forcefully deleted all remaining pods (`kubectl --namespace=staging delete pods --all --force --grace-period=0`). This quiets things down. We think it is safe to start services again since we won’t have saved the state of any queues, so no more analysis jobs will be queued.

### 2018-10-10 17:34

@Mr0grog finishes writing a hotfix (https://github.com/edgi-govdata-archiving/web-monitoring-db/commit/6ea87de05d52d823f989e531dc1a600e25edab25) that basically causes the analysis job to check the URL for any file extensions that might indicate non-HTML content.

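To make the idea behind the hotfix concrete, here is a rough sketch of an extension check. It is not the actual hotfix (that lives in web-monitoring-db and is linked above); the `url_looks_non_html` name and the extension list are assumptions chosen for illustration.

```shell
# Hypothetical helper: succeeds (exit status 0) if the URL ends in a file
# extension that suggests non-HTML content. The extension list is
# illustrative and not taken from the real hotfix.
url_looks_non_html() {
  path=${1%%[?#]*}   # drop any query string or fragment
  path=$(printf '%s' "$path" | tr '[:upper:]' '[:lower:]')
  case "$path" in
    *.pdf|*.doc|*.docx|*.xls|*.xlsx|*.zip|*.jpg|*.jpeg|*.png|*.gif) return 0 ;;
    *) return 1 ;;
  esac
}

if url_looks_non_html 'https://example.gov/report.PDF?download=1'; then
  echo "skipping diff: URL does not look like HTML"
fi
```

A check like this is only a heuristic, since a URL without a telltale extension can still serve a PDF.
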
### 2018-10-10 17:42

@jsnshrmn also clears out all staging resources (we noticed that staging was still stuck, probably because everything is sharing just a couple nodes).

### 2018-10-10 17:45

Hotfix image is published.

### 2018-10-10 17:50

All services and deployments are recreated in staging and production (with the hotfix).

### 2018-10-10 17:55

`kubectl` reports that all pods are stuck in a pending state.

### 2018-10-10 18:02

@jsnshrmn hard restarts one of the cluster nodes from the AWS console. That seems to give Kubernetes a nice kick in the pants and the pods start coming up.

### 2018-10-10 18:15

All nodes have been restarted, all pods in staging and production appear to be up and operational. Incident appears to be resolved.

### 2018-10-10 22:40

Issue recurs. @Mr0grog writes another hotfix that simply stops the problematic behavior (optimistically trying a diff if we aren’t sure that it won’t work): https://github.com/edgi-govdata-archiving/web-monitoring-db/commit/0a46db02b78d8332f974b86eafff067a09eb3ac2

### 2018-10-10 22:50

Pods still seem stuck. @Mr0grog attempts to stop one of the nodes; instead the node winds up terminated (not actually sure what happened here; it seems like Kubernetes did this itself when stopping occurred; it looks like we should have been more careful to delete all services and pods first) and Kubernetes auto-created a new node.

### 2018-10-10 22:58

New node comes online and services are accessible, but clearly not stable. Sentry reports lots of odd connectivity errors and `kubectl` still reports lots of unknown status pods. @Mr0grog decides to give it a break for 30 minutes until imports complete before doing anything more damaging.

### 2018-10-10 23:35

Sentry errors seem to have stopped, all the parts of the cluster can communicate, and bad pods are no longer reported. It probably just took some time for everything to settle out after Kubernetes terminated and recreated a whole node.

@Mr0grog starts a long Versionista archive job to recover data lost from 15:00 through now.

### 2018-10-10 23:55

Versionista jobs complete without major problems. Incident appears to be over.

### 2018-10-11 07:26

@jsnshrmn determines that implementing resource limits in a verifiable way requires core metrics, a set of services that are deployed by default in Kubernetes, but not in our kops deployment. Built-in diagnostic tools like `kubectl top <pods|nodes>` aren't working. @jsnshrmn notices that a previously deployed metrics gathering framework based on Prometheus and Grafana is broken.

### 2018-10-11 08:01

@jsnshrmn gets verification from @dallan that the old metrics framework can be deleted and does so (`kubectl --namespace=monitoring delete all --all; kubectl delete namespaces monitoring`).

### 2018-10-11 08:28

While checking the `kube-system` namespace (where core metrics would be deployed), @jsnshrmn discovered that pods related to proxy and logging are not functioning correctly.

### 2018-10-11 08:40

@jsnshrmn runs `kubectl --namespace=kube-system inspect|logs` on pods and deployments, which consistently show communication errors in various layers of the stack.

### 2018-10-11 09:07

@jsnshrmn notices that one of the errors shown in the proxy logs on this node is related to the dns pod that should be providing services for the node. @jsnshrmn deletes the dns pod, then runs `kubectl --namespace=kube-system logs pods/<dnspodname>` and finds `Error response from daemon: grpc: the connection is unavailable`, yet another network error, this time from a pod that reported it was working before deletion. @jsnshrmn stops/starts the node via the EC2 console.

### 2018-10-11 11:35

Basic troubleshooting tools now work. Incident resolved.


## Lessons

### What Went Well

Since we didn't configure the Kubernetes master node as a cluster node too, it stayed up, so we could still communicate with the cluster via `kubectl`. Our data, which is stored in an external database and AWS S3 buckets, was not lost. The Kubernetes scheduler behaved as documented.

### What Went Wrong

There is no magic in the Kubernetes scheduler. If it hasn't been given any specifications regarding the resources that may normally be consumed by a container, it assumes the container will consume *no resources at all* and may attempt to deploy an unlimited number of containers to a node. Additionally, if no resource limits are set, a single container is allowed to consume all of the resources on a node, potentially knocking it offline.

We supplied the scheduler with no information about how many containers should be running simultaneously on a single node, and we set no resource limits for any of our containers. Because of this, the problematic diffing containers overwhelmed one node (which they were all deployed to), causing it to become unavailable. The scheduler diligently noted that the specified number of container replicas were no longer available, and redeployed to the remaining cluster node to come into compliance with the replica count specified. Once the diffing pod was operational again, it then knocked the only remaining cluster node offline, leaving us with an outage.


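The scheduling behavior described above can be illustrated with some back-of-the-envelope arithmetic. This is a deliberately simplified model that looks at memory requests only (the real scheduler also weighs CPU and other factors); the `pods_that_fit` helper and the 4096Mi node size are made up for the example, while 256Mi matches the memory request used in most of the deployment configs in this commit.

```shell
# Simplified model: how many pods fit on a node, judged by memory request alone.
# Arguments: allocatable node memory (Mi), per-pod memory request (Mi).
pods_that_fit() {
  allocatable_mi=$1
  request_mi=$2
  if [ "$request_mi" -eq 0 ]; then
    # With no request, nothing in this model ever says "the node is full".
    echo "unbounded"
  else
    echo $(( allocatable_mi / request_mi ))
  fi
}

pods_that_fit 4096 0     # prints "unbounded"
pods_that_fit 4096 256   # prints "16"
```

Requests give the scheduler a reason to stop packing pods onto a node; limits are a separate setting that cap what a running container may actually use.
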
## Action Items

- Look into ways to set resource limits or node affinity ([processing#154](https://github.com/edgi-govdata-archiving/web-monitoring-processing/issues/154))
- Clean up the ugly hotfix code ([db#411](https://github.com/edgi-govdata-archiving/web-monitoring-db/issues/411))
- Protect `html_text_dmp` and `html_source_dmp` with content-type checking and sniffing ([processing#287](https://github.com/edgi-govdata-archiving/web-monitoring-processing/issues/287))


## Responders

- @Mr0grog
- @jsnshrmn
@@ -0,0 +1,104 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-system

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluentd
  namespace: kube-system
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  verbs:
  - get
  - list
  - watch

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: kube-system

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      annotations:
        iam.amazonaws.com/role: us-west-2a.staging.kubernetes.ruist.io-service-role
      labels:
        k8s-app: fluentd-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:cloudwatch
        env:
        - name: LOG_GROUP_NAME
          value: "k8s"
        - name: AWS_REGION
          value: "us-west-2"
        - name: FLUENT_UID
          value: "0"
        - name: FLUENTD_CONF
          value: web-monitoring/fluent.conf
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-vol
          mountPath: /fluentd/etc/web-monitoring
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-vol
        configMap:
          name: fluentd-config
@@ -0,0 +1,24 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: kube-system
data:
  fluent.conf: |
    @include /fluentd/etc/kubernetes.conf
    # Amend tags before the cloudwatch_logs plugin picks them up. Modifies
    # the name of the destination AWS cloudwatch log stream.
    <match kubernetes.var.log.containers.*.log>
      @type record_reformer
      renew_record false
      enable_ruby true
      tag ${record['kubernetes']['namespace_name']}.${record['kubernetes']['labels']['app']||record['kubernetes']['labels']['k8s-app']}.${record['kubernetes']['container_name']}.log
    </match>
    <match **>
      @type cloudwatch_logs
      @id out_cloudwatch_logs
      log_group_name "#{ENV['LOG_GROUP_NAME']}"
      auto_create_stream true
      use_tag_as_stream true
    </match>
@@ -0,0 +1,13 @@
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
@@ -0,0 +1,14 @@
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
@@ -0,0 +1,14 @@
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
@@ -0,0 +1,41 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.0
        imagePullPolicy: Always
        # @TODO: properly configure TLS.
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
@@ -0,0 +1,15 @@
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/name: "Metrics-server"
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    protocol: TCP
    targetPort: 443
@@ -0,0 +1,38 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
@@ -0,0 +1,116 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: rails-server
        image: envirodgi/db-rails-server:629266d0d8fb87260343ed7a763de37653108efc
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "1024Mi"
            cpu: "1500m"
        readinessProbe:
          tcpSocket:
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 3000
          initialDelaySeconds: 5
        env:
        - name: ALLOWED_ARCHIVE_HOSTS
          value: "https://edgi-wm-versionista.s3.amazonaws.com/ https://edgi-wm-versionista.s3-us-west-2.amazonaws.com/ https://s3-us-west-2.amazonaws.com/edgi-wm-versionista/ https://edgi-versionista-archive.s3.amazonaws.com/ https://edgi-versionista-archive.s3.amazonaws.com/edgi-versionista-archive/"
        - name: AUTO_ANNOTATION_USER
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: auto_annotation_user
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: aws_access_key_id
        - name: AWS_ARCHIVE_BUCKET
          value: edgi-wm-archive
        - name: AWS_REGION
          value: us-west-2
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: aws_secret_access_key
        - name: AWS_WORKING_BUCKET
          value: edgi-wm-db-internal
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database_rds
        - name: DIFFER_DEFAULT
          value: http://diffing:80
        - name: CACHE_DATE_DIFFER
          value: "2018-10-23T07:00:00Z"
        # TODO: consider making this not a secret
        - name: HOST_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: host_url
        - name: LANG
          value: en_US.UTF-8
        - name: MAIL_SENDER
          value: website.monitoring@envirodatagov.org
        - name: MAX_COLLECTION_PAGE_SIZE
          value: "1000"
        # FIXME: We don't have a New Relic account that isn't tied in with
        # Heroku. We can't really afford it and it's not doing us much good
        # right now anyway, so we should just remove all the config for it.
        - name: NEW_RELIC_AGENT_ENABLED
          value: "false"
        - name: POSTMARK_API_TOKEN
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: postmark_api_token
        - name: RACK_ENV
          value: production
        - name: RAILS_ENV
          value: production
        - name: RAILS_LOG_TO_STDOUT
          value: enabled
        - name: RAILS_SERVE_STATIC_FILES
          value: enabled
        - name: REDIS_URL
          value: redis://redis-master:6379
        - name: SECRET_KEY_BASE
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: secret_key_base
        - name: SENTRY_DSN
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: sentry_dsn
        - name: TOKEN_PRIVATE_KEY
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: token_private_key
        - name: INCREMENTAL_UPDATE
          value: "3"
@@ -0,0 +1,40 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: diffing
  namespace: production
spec:
  replicas: 4
  template:
    metadata:
      labels:
        app: diffing-server
    spec:
      containers:
      - name: processing
        image: envirodgi/processing:ce708709d66daa02a6c2811ee5b4d38f9543536a
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "265Mi"
            cpu: "100m"
          limits:
            memory: "1024Mi"
            cpu: "500m"
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        env:
        - name: DIFFER_COLOR_INSERTION
          value: "#a1d76a"
        - name: DIFFER_COLOR_DELETION
          value: "#e8a4c8"
@@ -0,0 +1,14 @@
apiVersion: v1
kind: Service
metadata:
  name: diffing
  namespace: production
spec:
  selector:
    app: diffing-server
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP
@@ -0,0 +1,110 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: import-worker
  namespace: production
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: import-worker
    spec:
      containers:
      - name: db-import-worker
        image: envirodgi/db-import-worker:629266d0d8fb87260343ed7a763de37653108efc
        imagePullPolicy: Always
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "1024Mi"
            cpu: "1500m"
        env:
        - name: ALLOWED_ARCHIVE_HOSTS
          value: "https://edgi-wm-versionista.s3.amazonaws.com/ https://edgi-wm-versionista.s3-us-west-2.amazonaws.com/ https://s3-us-west-2.amazonaws.com/edgi-wm-versionista/ https://edgi-versionista-archive.s3.amazonaws.com/ https://edgi-versionista-archive.s3.amazonaws.com/edgi-versionista-archive/"
        # Set to "true" to be safe/conservative on what kinds of diffs to try
        # If we suddenly see the diff server going nuts, maybe uncomment this.
        # - name: ANALYSIS_REQUIRE_MEDIA_TYPE
        #   value: "true"
        - name: AUTO_ANNOTATION_USER
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: auto_annotation_user
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: aws_access_key_id
        - name: AWS_ARCHIVE_BUCKET
          value: edgi-wm-archive
        - name: AWS_REGION
          value: us-west-2
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: aws_secret_access_key
        - name: AWS_WORKING_BUCKET
          value: edgi-wm-db-internal
        - name: CACHE_DATE_DIFFER
          value: "2018-10-23T07:00:00Z"
        - name: DATABASE_RDS
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database_rds
        - name: DIFFER_DEFAULT
          value: http://diffing:80
        - name: HOST_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: host_url
        - name: LANG
          value: en_US.UTF-8
        - name: MAIL_SENDER
          value: website.monitoring@envirodatagov.org
        - name: MAX_COLLECTION_PAGE_SIZE
          value: "1000"
        - name: NEW_RELIC_AGENT_ENABLED
          value: "false"
        - name: POSTMARK_API_TOKEN
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: postmark_api_token
        - name: RACK_ENV
          value: production
        - name: RAILS_ENV
          value: production
        - name: RAILS_LOG_TO_STDOUT
          value: enabled
        - name: RAILS_SERVE_STATIC_FILES
          value: enabled
        - name: REDIS_URL
          value: redis://redis-master:6379
        - name: REDIS_CACHE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: redis_cache_url
        - name: SECRET_KEY_BASE
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: secret_key_base
        - name: SENTRY_DSN
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: sentry_dsn
        - name: TOKEN_PRIVATE_KEY
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: token_private_key
        - name: INCREMENTAL_UPDATE
          value: "3"
@@ -0,0 +1,23 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: redis-master
  namespace: production
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
        role: master
        tier: backend
    spec:
      containers:
      - name: master
        image: gcr.io/google_containers/redis:e2e  # or just image: redis
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
@@ -0,0 +1,17 @@
apiVersion: v1
kind: Service
metadata:
  name: redis-master
  namespace: production
  labels:
    app: redis
    role: master
    tier: backend
spec:
  ports:
  - port: 6379
    targetPort: 6379
  selector:
    app: redis
    role: master
    tier: backend
@@ -0,0 +1,31 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: redis-slave
  namespace: production
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: redis
        role: slave
        tier: backend
    spec:
      containers:
      - name: slave
        image: gcr.io/google_samples/gb-redisslave:v1
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GET_HOSTS_FROM
          value: dns
          # If your cluster config does not include a dns service, then to
          # instead access an environment variable to find the master
          # service's host, comment out the 'value: dns' line above, and
          # uncomment the line below:
          # value: env
        ports:
        - containerPort: 6379
@@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
  name: redis-slave
  namespace: production
  labels:
    app: redis
    role: slave
    tier: backend
spec:
  ports:
  - port: 6379
  selector:
    app: redis
    role: slave
    tier: backend
@@ -0,0 +1,69 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ui
  namespace: production
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: ui
    spec:
      containers:
      - name: ui
        image: envirodgi/ui:fd12f43b6818e06994aab358fc52ad3e74b5c6c4
        imagePullPolicy: Always
        ports:
        - containerPort: 3001
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "1024Mi"
            cpu: "500m"
        readinessProbe:
          tcpSocket:
            port: 3001
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 3001
          initialDelaySeconds: 5
        env:
        - name: FORCE_SSL
          value: "true"
        - name: GOOGLE_DICTIONARY_SHEET_ID
          valueFrom:
            secretKeyRef:
              name: ui-secrets
              key: google_dictionary_sheet_id
        - name: GOOGLE_IMPORTANT_CHANGE_SHEET_ID
          valueFrom:
            secretKeyRef:
              name: ui-secrets
              key: google_important_change_sheet_id
        - name: GOOGLE_SERVICE_CLIENT_EMAIL
          valueFrom:
            secretKeyRef:
              name: ui-secrets
              key: google_service_client_email
        - name: GOOGLE_SHEETS_PRIVATE_KEY
          valueFrom:
            secretKeyRef:
              name: ui-secrets
              key: google_sheets_private_key
        - name: GOOGLE_TASK_SHEET_ID
          valueFrom:
            secretKeyRef:
              name: ui-secrets
              key: google_task_sheet_id
        - name: WEB_MONITORING_DB_URL
          valueFrom:
            secretKeyRef:
              name: ui-secrets
              key: web_monitoring_db_url
        - name: INCREMENTAL_UPDATE
          value: "1"

‎templates/staging/api-deployment.yaml

@@ -0,0 +1,113 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: api
  namespace: staging
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: rails-server
        image: envirodgi/db-rails-server:629266d0d8fb87260343ed7a763de37653108efc
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "1024Mi"
            cpu: "1500m"
        readinessProbe:
          tcpSocket:
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 3000
          initialDelaySeconds: 5
        env:
        - name: ALLOWED_ARCHIVE_HOSTS
          value: "https://edgi-wm-versionista.s3.amazonaws.com/ https://edgi-wm-versionista.s3-us-west-2.amazonaws.com/ https://s3-us-west-2.amazonaws.com/edgi-wm-versionista/ https://edgi-versionista-archive.s3.amazonaws.com/ https://edgi-versionista-archive.s3.amazonaws.com/edgi-versionista-archive/"
        - name: AUTO_ANNOTATION_USER
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: auto_annotation_user
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: aws_access_key_id
        - name: AWS_ARCHIVE_BUCKET
          value: edgi-wm-archive-staging
        - name: AWS_REGION
          value: us-west-2
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: aws_secret_access_key
        - name: AWS_WORKING_BUCKET
          value: edgi-wm-db-internal-staging
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database_rds
        - name: DIFFER_DEFAULT
          value: http://diffing:80
        - name: CACHE_DATE_DIFFER
          value: "2018-10-23T07:00:00Z"
        # TODO: consider making this not a secret
        - name: HOST_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: host_url
        - name: LANG
          value: en_US.UTF-8
        - name: MAIL_SENDER
          value: website.monitoring@envirodatagov.org
        - name: MAX_COLLECTION_PAGE_SIZE
          value: "1000"
        - name: NEW_RELIC_AGENT_ENABLED
          value: "false"
        - name: POSTMARK_API_TOKEN
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: postmark_api_token
        - name: RACK_ENV
          value: production
        - name: RAILS_ENV
          value: production
        - name: RAILS_LOG_TO_STDOUT
          value: enabled
        - name: RAILS_SERVE_STATIC_FILES
          value: enabled
        - name: REDIS_URL
          value: redis://redis-master:6379
        - name: SECRET_KEY_BASE
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: secret_key_base
        - name: SENTRY_DSN
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: sentry_dsn
        - name: TOKEN_PRIVATE_KEY
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: token_private_key
        - name: INCREMENTAL_UPDATE
          value: "4"
@@ -0,0 +1,40 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: diffing
  namespace: staging
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: diffing-server
    spec:
      containers:
      - name: processing
        image: envirodgi/processing:ce708709d66daa02a6c2811ee5b4d38f9543536a
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "1024Mi"
            cpu: "500m"
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        env:
        - name: DIFFER_COLOR_INSERTION
          value: "#a1d76a"
        - name: DIFFER_COLOR_DELETION
          value: "#e8a4c8"
+14
@@ -0,0 +1,14 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: diffing
+  namespace: staging
+spec:
+  selector:
+    app: diffing-server
+  ports:
+  - name: http
+    protocol: TCP
+    port: 80
+    targetPort: 80
+  type: ClusterIP
@@ -0,0 +1,105 @@
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+  name: import-worker
+  namespace: staging
+spec:
+  replicas: 2
+  template:
+    metadata:
+      labels:
+        app: import-worker
+    spec:
+      containers:
+      - name: db-import-worker
+        image: envirodgi/db-import-worker:629266d0d8fb87260343ed7a763de37653108efc
+        imagePullPolicy: Always
+        resources:
+          requests:
+            memory: "256Mi"
+            cpu: "100m"
+          limits:
+            memory: "1024Mi"
+            cpu: "1500m"
+        env:
+        - name: ALLOWED_ARCHIVE_HOSTS
+          value: "https://edgi-wm-versionista.s3.amazonaws.com/ https://edgi-wm-versionista.s3-us-west-2.amazonaws.com/ https://s3-us-west-2.amazonaws.com/edgi-wm-versionista/ https://edgi-versionista-archive.s3.amazonaws.com/ https://edgi-versionista-archive.s3.amazonaws.com/edgi-versionista-archive/"
+        # Set to "true" to be safe/conservative on what kinds of diffs to try
+        # If we suddenly see the diff server going nuts, maybe uncomment this.
+        # - name: ANALYSIS_REQUIRE_MEDIA_TYPE
+        #   value: "true"
+        - name: AUTO_ANNOTATION_USER
+          valueFrom:
+            secretKeyRef:
+              name: app-secrets
+              key: auto_annotation_user
+        - name: AWS_ACCESS_KEY_ID
+          valueFrom:
+            secretKeyRef:
+              name: app-secrets
+              key: aws_access_key_id
+        - name: AWS_ARCHIVE_BUCKET
+          value: edgi-wm-archive-staging
+        - name: AWS_REGION
+          value: us-west-2
+        - name: AWS_SECRET_ACCESS_KEY
+          valueFrom:
+            secretKeyRef:
+              name: app-secrets
+              key: aws_secret_access_key
+        - name: AWS_WORKING_BUCKET
+          value: edgi-wm-db-internal-staging
+        - name: CACHE_DATE_DIFFER
+          value: "2018-10-23T07:00:00Z"
+        - name: DATABASE_RDS
+          valueFrom:
+            secretKeyRef:
+              name: app-secrets
+              key: database_rds
+        - name: DIFFER_DEFAULT
+          value: http://diffing:80
+        - name: HOST_URL
+          valueFrom:
+            secretKeyRef:
+              name: app-secrets
+              key: host_url
+        - name: LANG
+          value: en_US.UTF-8
+        - name: MAIL_SENDER
+          value: website.monitoring@envirodatagov.org
+        - name: MAX_COLLECTION_PAGE_SIZE
+          value: "1000"
+        - name: NEW_RELIC_AGENT_ENABLED
+          value: "false"
+        - name: POSTMARK_API_TOKEN
+          valueFrom:
+            secretKeyRef:
+              name: app-secrets
+              key: postmark_api_token
+        - name: RACK_ENV
+          value: production
+        - name: RAILS_ENV
+          value: production
+        - name: RAILS_LOG_TO_STDOUT
+          value: enabled
+        - name: RAILS_SERVE_STATIC_FILES
+          value: enabled
+        - name: REDIS_URL
+          value: redis://redis-master:6379
+        - name: SECRET_KEY_BASE
+          valueFrom:
+            secretKeyRef:
+              name: app-secrets
+              key: secret_key_base
+        - name: SENTRY_DSN
+          valueFrom:
+            secretKeyRef:
+              name: app-secrets
+              key: sentry_dsn
+        - name: TOKEN_PRIVATE_KEY
+          valueFrom:
+            secretKeyRef:
+              name: app-secrets
+              key: token_private_key
+        - name: INCREMENTAL_UPDATE
+          value: "4"
@@ -0,0 +1,23 @@
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+  name: redis-master
+  namespace: staging
+spec:
+  replicas: 1
+  template:
+    metadata:
+      labels:
+        app: redis
+        role: master
+        tier: backend
+    spec:
+      containers:
+      - name: master
+        image: gcr.io/google_containers/redis:e2e # or just image: redis
+        resources:
+          requests:
+            cpu: 100m
+            memory: 100Mi
+        ports:
+        - containerPort: 6379
@@ -0,0 +1,17 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: redis-master
+  namespace: staging
+  labels:
+    app: redis
+    role: master
+    tier: backend
+spec:
+  ports:
+  - port: 6379
+    targetPort: 6379
+  selector:
+    app: redis
+    role: master
+    tier: backend
@@ -0,0 +1,31 @@
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+  name: redis-slave
+  namespace: staging
+spec:
+  replicas: 2
+  template:
+    metadata:
+      labels:
+        app: redis
+        role: slave
+        tier: backend
+    spec:
+      containers:
+      - name: slave
+        image: gcr.io/google_samples/gb-redisslave:v1
+        resources:
+          requests:
+            cpu: 100m
+            memory: 100Mi
+        env:
+        - name: GET_HOSTS_FROM
+          value: dns
+          # If your cluster config does not include a dns service, then to
+          # instead access an environment variable to find the master
+          # service's host, comment out the 'value: dns' line above, and
+          # uncomment the line below:
+          # value: env
+        ports:
+        - containerPort: 6379
+16
@@ -0,0 +1,16 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: redis-slave
+  namespace: staging
+  labels:
+    app: redis
+    role: slave
+    tier: backend
+spec:
+  ports:
+  - port: 6379
+  selector:
+    app: redis
+    role: slave
+    tier: backend

‎templates/staging/ui-deployment.yaml

+69
@@ -0,0 +1,69 @@
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+  name: ui
+  namespace: staging
+spec:
+  replicas: 2
+  template:
+    metadata:
+      labels:
+        app: ui
+    spec:
+      containers:
+      - name: ui
+        image: envirodgi/ui:fd12f43b6818e06994aab358fc52ad3e74b5c6c4
+        imagePullPolicy: Always
+        ports:
+        - containerPort: 3001
+        resources:
+          requests:
+            memory: "256Mi"
+            cpu: "100m"
+          limits:
+            memory: "1024Mi"
+            cpu: "500m"
+        readinessProbe:
+          tcpSocket:
+            port: 3001
+          initialDelaySeconds: 5
+          periodSeconds: 10
+        livenessProbe:
+          tcpSocket:
+            port: 3001
+          initialDelaySeconds: 5
+        env:
+        - name: FORCE_SSL
+          value: "true"
+        - name: GOOGLE_DICTIONARY_SHEET_ID
+          valueFrom:
+            secretKeyRef:
+              name: ui-secrets
+              key: google_dictionary_sheet_id
+        - name: GOOGLE_IMPORTANT_CHANGE_SHEET_ID
+          valueFrom:
+            secretKeyRef:
+              name: ui-secrets
+              key: google_important_change_sheet_id
+        - name: GOOGLE_SERVICE_CLIENT_EMAIL
+          valueFrom:
+            secretKeyRef:
+              name: ui-secrets
+              key: google_service_client_email
+        - name: GOOGLE_SHEETS_PRIVATE_KEY
+          valueFrom:
+            secretKeyRef:
+              name: ui-secrets
+              key: google_sheets_private_key
+        - name: GOOGLE_TASK_SHEET_ID
+          valueFrom:
+            secretKeyRef:
+              name: ui-secrets
+              key: google_task_sheet_id
+        - name: WEB_MONITORING_DB_URL
+          valueFrom:
+            secretKeyRef:
+              name: ui-secrets
+              key: web_monitoring_db_url
+        - name: INCREMENTAL_UPDATE
+          value: "1"

‎validate-certs.jq

+34
@@ -0,0 +1,34 @@
+{
+  "HostedZoneId": env.KUBE_ZONE,
+  "ChangeBatch": {
+    "Comment": "validate certificates for rails server and ui",
+    "Changes": [
+      {
+        "Action": "CREATE",
+        "ResourceRecordSet": {
+          "Name": env.API_VALIDATE_NAME,
+          "Type": "CNAME",
+          "TTL": 300,
+          "ResourceRecords": [
+            {
+              "Value": env.API_VALIDATE_VALUE
+            }
+          ],
+        }
+      },
+      {
+        "Action": "CREATE",
+        "ResourceRecordSet": {
+          "Name": env.UI_VALIDATE_NAME,
+          "Type": "CNAME",
+          "TTL": 300,
+          "ResourceRecords": [
+            {
+              "Value": env.UI_VALIDATE_VALUE
+            }
+          ],
+        }
+      }
+    ]
+  }
+}
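
For readers less familiar with jq, the following Python sketch builds the same structure that the `validate-certs.jq` template above describes: one change batch with two CNAME `CREATE` records, filled in from five environment variables. It is an illustration only, not part of this commit. The function name `build_change_batch` and the sample values are hypothetical, and how the commit's tooling actually invokes the template is not shown on this page.

```python
import json


def build_change_batch(env):
    """Return the same dict shape as the jq template, given a mapping of env vars."""

    def cname(name, value):
        # One CREATE change for a CNAME validation record with a 300 second TTL.
        return {
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                "TTL": 300,
                "ResourceRecords": [{"Value": value}],
            },
        }

    return {
        "HostedZoneId": env["KUBE_ZONE"],
        "ChangeBatch": {
            "Comment": "validate certificates for rails server and ui",
            "Changes": [
                cname(env["API_VALIDATE_NAME"], env["API_VALIDATE_VALUE"]),
                cname(env["UI_VALIDATE_NAME"], env["UI_VALIDATE_VALUE"]),
            ],
        },
    }


# Hypothetical placeholder values; real ones would come from the environment.
example_env = {
    "KUBE_ZONE": "ZEXAMPLE",
    "API_VALIDATE_NAME": "_abc.api.example.org.",
    "API_VALIDATE_VALUE": "_def.acm-validations.example.",
    "UI_VALIDATE_NAME": "_ghi.ui.example.org.",
    "UI_VALIDATE_VALUE": "_jkl.acm-validations.example.",
}
batch = build_change_batch(example_env)
print(json.dumps(batch, indent=2))
```

Passing `os.environ` instead of `example_env` would mirror the template's use of `env.*`; a missing variable then raises `KeyError` rather than silently producing `null`, which may or may not be the behavior you want.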
