[usage] Upload usage reports to cloud storage #11519

Merged
roboquat merged 2 commits into main from af/write-usage-reports-to-cloud-storage
Jul 25, 2022

Conversation

andrew-farries
Contributor

@andrew-farries andrew-farries commented Jul 21, 2022

Description

As part of the move towards usage-based pricing (#9036), we'd like the usage aggregator (components/usage) to upload its usage reports to cloud storage. This provides an audit trail of usage reports, allowing us to cross-reference usage entries in the database with the usage reports that produced them. In future, we may also give users direct access to these reports.

This is the third part of the PR stack:

#11474: Add a means of getting signed S3 upload URLs from content-service.
#11493: Request the signed upload URLs from the usage component.
This PR: Use the upload URL to upload compressed usage reports to object storage and stop writing the reports to the container filesystem.

Related Issue(s)

Part of #9036

How to test

  • Port forward to minio in the preview environment:
kubectl port-forward svc/minio 9000:9000
  • Get the access key and secret key for the preview minio instance:
kubectl exec deploy/content-service -- cat /config/config.json
mc alias set mm http://localhost:9000
  • Change the usage component to run reconciliation every 30 seconds or so:
kubectl edit cm usage
 # edit the `controllerSchedule` field
  • Tail the logs of the usage component:
kubectl logs -f deploy/usage
  • See that when the usage component runs reconciliation, the resulting usage report will be uploaded to minio in-cluster storage:
mc ls mm/usage-reports
  • Download and unzip the usage report to see that it has the expected contents:
mc cp mm/usage-reports/<some report file> .
gunzip <filename>.gz
cat <filename>

Release Notes

NONE

Werft options:

  • /werft with-preview

@andrew-farries andrew-farries requested a review from a team July 21, 2022 06:53
@github-actions github-actions bot added the team: webapp Issue belongs to the WebApp team label Jul 21, 2022
@werft-gitpod-dev-com

started the job as gitpod-build-af-write-usage-reports-to-cloud-storage.11 because the annotations in the pull request description changed
(with .werft/ from main)

@andrew-farries
Contributor Author

/hold because it's based on #11493

@andrew-farries andrew-farries marked this pull request as draft July 21, 2022 07:46
@andrew-farries andrew-farries force-pushed the af/upload-usage-reports branch 2 times, most recently from 7d5fa5d to 3ef8e01 Compare July 21, 2022 08:42
@andrew-farries andrew-farries force-pushed the af/write-usage-reports-to-cloud-storage branch from 28eb811 to e66f8b7 Compare July 21, 2022 09:51
@andrew-farries andrew-farries force-pushed the af/upload-usage-reports branch from 3ef8e01 to 68c28fc Compare July 22, 2022 06:10
@andrew-farries andrew-farries force-pushed the af/write-usage-reports-to-cloud-storage branch 3 times, most recently from b1f0719 to cc18f67 Compare July 22, 2022 11:03
@andrew-farries andrew-farries force-pushed the af/upload-usage-reports branch from aed2c3f to f149203 Compare July 22, 2022 11:19
@andrew-farries andrew-farries force-pushed the af/write-usage-reports-to-cloud-storage branch from cc18f67 to 5bbc6ed Compare July 22, 2022 11:22
@andrew-farries andrew-farries marked this pull request as ready for review July 22, 2022 11:25
@werft-gitpod-dev-com

started the job as gitpod-build-af-write-usage-reports-to-cloud-storage.17 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-af-write-usage-reports-to-cloud-storage.18 because the annotations in the pull request description changed
(with .werft/ from main)

@geropl
Member

geropl commented Jul 23, 2022

@andrew-farries Blocked by this PR: #11596

@andrew-farries andrew-farries force-pushed the af/upload-usage-reports branch from f149203 to 7d88600 Compare July 23, 2022 18:32
@andrew-farries andrew-farries force-pushed the af/write-usage-reports-to-cloud-storage branch from 5bbc6ed to eeb3bf9 Compare July 23, 2022 18:34
Base automatically changed from af/upload-usage-reports to main July 25, 2022 08:56
@roboquat roboquat added size/L and removed size/M labels Jul 25, 2022
Make the `GetSignedUploadUrl` method unexported and remove it from the
interface.

Compress reports before uploading to obj storage

* Stop writing the usage report to disk.
* Compress the report in memory.
* Upload the compressed report to object storage.
@andrew-farries andrew-farries force-pushed the af/write-usage-reports-to-cloud-storage branch from eeb3bf9 to b8c20bd Compare July 25, 2022 08:57
@roboquat roboquat added size/M and removed size/L labels Jul 25, 2022
return fmt.Errorf("failed to construct http request: %w", err)
}

req.Header.Set("Content-Encoding", "gzip")
Member

Does this cause the content to be gzipped on the server? I would have expected this header to indicate that the content being uploaded is already gzipped, but I don't see us compressing anything in this implementation.

Contributor Author

See this comment.

Comment on lines 78 to 80
reportBytes := &bytes.Buffer{}
gz := gzip.NewWriter(reportBytes)
err = json.NewEncoder(gz).Encode(report)
Contributor Author

This is where the json report is gzipped.
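
One detail worth keeping in mind with this pattern: a `gzip.Writer` buffers data and writes its trailer only on `Close`, so the buffer is not a complete gzip stream until the writer has been closed. The three quoted lines do not show whether `Close` is called afterwards, so the following is a hedged sketch of the general technique rather than a statement about the PR. `encodeGzippedJSON` and `decodeGzippedJSON` are hypothetical helpers, and the `closeWriter` flag exists purely for demonstration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// encodeGzippedJSON JSON-encodes v through a gzip writer into memory.
// closeWriter is a demonstration switch: real code should always close.
func encodeGzippedJSON(v interface{}, closeWriter bool) ([]byte, error) {
	buf := &bytes.Buffer{}
	gz := gzip.NewWriter(buf)
	if err := json.NewEncoder(gz).Encode(v); err != nil {
		return nil, fmt.Errorf("failed to encode: %w", err)
	}
	if closeWriter {
		// Close flushes buffered data and writes the gzip trailer.
		if err := gz.Close(); err != nil {
			return nil, fmt.Errorf("failed to close gzip writer: %w", err)
		}
	}
	return buf.Bytes(), nil
}

// decodeGzippedJSON reverses encodeGzippedJSON.
func decodeGzippedJSON(data []byte, v interface{}) error {
	gz, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return err
	}
	defer gz.Close()

	raw, err := io.ReadAll(gz)
	if err != nil {
		return err
	}
	return json.Unmarshal(raw, v)
}

func main() {
	report := map[string]int{"credits": 3} // placeholder data, not a real report

	for _, closeWriter := range []bool{true, false} {
		data, err := encodeGzippedJSON(report, closeWriter)
		if err != nil {
			fmt.Println("encode failed:", err)
			return
		}
		var out map[string]int
		err = decodeGzippedJSON(data, &out)
		fmt.Printf("closed=%v bytes=%d decodeErr=%v\n", closeWriter, len(data), err)
	}
}
```

With the writer closed the round trip succeeds; without it, decoding fails because the stream is truncated.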

Member

I'd recommend moving this into the Storage Client. As it stands it's confusing: the client sets the content encoding to gzip, but nothing in the API signature indicates that the caller must gzip the content beforehand.

Ideally, the Client takes a bytes buffer, wraps it in a gzip writer, and then writes the whole thing as a stream.

Member

Or it could even take an io.Writer, which is a much better interface for wrapping.

Contributor Author

The UploadFile method really should only take usage reports: the signed object storage upload URL that it obtains will always place the file in the usage-reports bucket.

As such, I've renamed the method on the interface and changed its argument to be a report rather than an io.Reader (aa9f2d4).

Contributor Author

I also moved the JSON encoding and compression of the report out of the caller and into the method.

The method will upload to the usage-reports bucket so should not take
arbitrary inputs, only usage reports.

Do the encoding and gzipping of the report in the method rather than
the caller.
@roboquat roboquat added size/L and removed size/M labels Jul 25, 2022
@andrew-farries
Contributor Author

/unhold

@roboquat roboquat merged commit 8ce9022 into main Jul 25, 2022
@roboquat roboquat deleted the af/write-usage-reports-to-cloud-storage branch July 25, 2022 14:09
@roboquat roboquat added deployed: webapp Meta team change is running in production deployed Change is completely running in production labels Jul 26, 2022
Labels
deployed: webapp Meta team change is running in production deployed Change is completely running in production release-note-none size/L team: webapp Issue belongs to the WebApp team
4 participants