Skip to content

Conversation

daytime-saxophone
Copy link

Azure is just really awful and slow to work with, and using sops with a azkv key to edit secrets from e.g. your laptop where you are authenticating with your user and not through a SP/MSI is nightmarishly slow.

To improve the situation this PR implements:

  • Allow explicitly specifying auth method through environement variable, this bypass the need to go through the default authentication chain.
  • Adds two cachable auth methods InteractiveBrowserCredential and DeviceCodeCredential which adds a very significant speedup compared to AzureCliCredential. See below.

The time to unlock a secret using sops v3.9.4 and authenticating as a user through azure-cli:

time sops -d ./path/to/secret
4.77s user 1.17s system 82% cpu 7.217 total

The time to unlock a secret using my PR and authenticating as a user with InteractiveBrowserCredential after the initial authentication:

> time SOPS_AZURE_AUTH_METHOD="BROWSER" sops -d ./path/to/secret
0.08s user 0.05s system 25% cpu 0.511 total

Related issues: #885, #1606

I'm happy to add documentation, tests or whatever is needed. I just don't want to go through the trouble before I know if the design is acceptable, and that there is a chance that this could be merged.

@daytime-saxophone daytime-saxophone marked this pull request as ready for review February 28, 2025 14:20
@felixfontein
Copy link
Contributor

Thanks for your contribution! I've been looking at this and thinking about this since yesterday, and I think the general idea is sound. (I'm not an Azure user though... I only use AZP for another open source project, and if you're writing that Azure is awful and slow to work with, I can guess what you mean since AZP definitely also falls into that category :D)

I'm a bit hesitant on the caching parts, since that writes credentials to disk (I guess not in an encrypted form, and even if its encrypted, it likely won't be anything safe). This should definitely be explicitly mentioned in the documentation, and maybe it would be better to rename the corresponding values for SOPS_AZURE_AUTH_METHOD to include "caching on disk" (I first just wanted to suggest to include that it's caching, but then I thought that later one might want to cache it somewhere else, like in the OS'es keyring, or some other agent, so including the method where it's cached is probably a good idea). WDYT?

I also have some smaller comments, I'm adding them here (even though they're not relevant to the general design) before I forget about them:

  • Please undo the changes in go.mod to go and toolchain, that's likely the reason that the builds fails in CI.
  • I'd probably move most of the new functions to a separate file, say azkv/credentials.go.
  • Is there a reason the environment variable values are upper-case? I think lower-case is more common. (They also used to be lower-case with the AZURE_AUTH_METHOD env variable that was removed in azkv: update SDK to latest, add tests, tidy #1067.)
  • The error message in the switch's default case should mention that it's an environment variable. Someone might have forgotten that the environment variable is set and be wondering where to find this SOPS_AZURE_AUTH_METHOD setting.
  • For the cached credential filenames, I'd use dashes instead of underscores.
  • Generally I'd move the strings used (env variable values, cache filenames) into constants.

CC @hiddeco, since you created #1067 which removed the AZURE_AUTH_METHOD environment variable.

@felixfontein
Copy link
Contributor

(And I think it would be better if Azure's SDK would take care of things like this in its default auth flow, or would offer another auth flow that allows the user to configure this via env variables. But if it doens't offer that, I guess we have to do it ourselves...)

@daytime-saxophone
Copy link
Author

The AuthenticationRecord that I am storing in the function cacheStoreRecord is not actually a secret, it only contains information like identifiers, version, username, etc.

The actual storing of the token is handled by the azure-sdk, the tokens are encrypted at rest via an os specific mechanisms:

Persistent caches are encrypted at rest using a mechanism that depends on the operating system see docs

  • Linux: kernel key retention service (keyctl)
  • Macos: Keychain (requires cgo and native build tools)
  • Windows: Data Protection API (DPAPI)

Their respective implementations can be found here:

This should definitely be explicitly mentioned in the documentation, and maybe it would be better to rename the corresponding values for SOPS_AZURE_AUTH_METHOD to include "caching on disk" (I first just wanted to suggest to include that it's caching, but then I thought that later one might want to cache it somewhere else, like in the OS'es keyring, or some other agent, so including the method where it's cached is probably a good idea). WDYT?

Since the token is encrypted using a method specfic to the OS it makes it difficult to think of what to add as a contextual information in SOPS_AZURE_AUTH_METHOD, the best I can come up with is: browser_cache_in_keyring and device_code_cache_in_keyring. Does that seem fine?

I'll submit an update in the next few days with the fixes to the comments, adding some docs and tests (if I can figure something out to get around the interactive parts).

I just have one question: should I keep the environment variable SOPS_AZURE_AUTH_METHOD or should I change it back to AZURE_AUTH_METHOD? The context for me changing it is that it was not clear to me when I was using AZURE_AUTH_METHOD that it was a sops specific variable until digging through the code. That's why I figured it might be good to make that clear by prefixing it, kind of like SOPS_AGE_KEY is prefixed with sops. I'm happy to defer to you judgement on this though.

@felixfontein
Copy link
Contributor

The AuthenticationRecord that I am storing in the function cacheStoreRecord is not actually a secret, it only contains information like identifiers, version, username, etc.

The actual storing of the token is handled by the azure-sdk, the tokens are encrypted at rest via an os specific mechanisms:

Persistent caches are encrypted at rest using a mechanism that depends on the operating system see docs

Thanks for clarifying that! In that case, I don't mind the caching on disk.

This should definitely be explicitly mentioned in the documentation, and maybe it would be better to rename the corresponding values for SOPS_AZURE_AUTH_METHOD to include "caching on disk" (I first just wanted to suggest to include that it's caching, but then I thought that later one might want to cache it somewhere else, like in the OS'es keyring, or some other agent, so including the method where it's cached is probably a good idea). WDYT?

Since the token is encrypted using a method specfic to the OS it makes it difficult to think of what to add as a contextual information in SOPS_AZURE_AUTH_METHOD, the best I can come up with is: browser_cache_in_keyring and device_code_cache_in_keyring. Does that seem fine?

Since no sensitive information is cached, I think it's OK to not use a prefix, or simply use cached- or something like that (without more details).

I just have one question: should I keep the environment variable SOPS_AZURE_AUTH_METHOD or should I change it back to AZURE_AUTH_METHOD? The context for me changing it is that it was not clear to me when I was using AZURE_AUTH_METHOD that it was a sops specific variable until digging through the code. That's why I figured it might be good to make that clear by prefixing it, kind of like SOPS_AGE_KEY is prefixed with sops. I'm happy to defer to you judgement on this though.

I would keep the prefix. I think using env variables without a SOPS_ prefix is only a good idea if they use same convention as the tool they hint at (which is usually best handled by a library provided by that tool). Also the new environment variable uses different values and semantics than the previous one, so its previous existence is no reason to re-use that name.

@daytime-saxophone
Copy link
Author

I've updated the minor comments and added docs for the feature. However I am not sure what to do about the go and toolchain version in go.mod file. This auto upgrading behaviour of the go toolchain is a bit new to me.

If I run go mod tidy with the toolchain and go version that is specified in the go.mod file it gives me this error:

> GOTOOLCHAIN=go1.23.6 go mod tidy -go=1.22
go: cloud.google.com/go/[email protected] requires [email protected], but 1.22 is requested

If I run go mod tidy with the toolchain as the latest go1.22 patch release it gives me this error:

> GOTOOLCHAIN=go1.22.12 go mod tidy 
go: golang.org/x/[email protected] requires go >= 1.23.0 (running go 1.22.12; GOTOOLCHAIN=go1.22.12)

If I run go mod tidy with the toolchain that is specified in the go.mod but don't specify -go=<version> it succeeds but it modifies the go version in the go.mod file.

GOTOOLCHAIN=go1.23.6 go mod tidy

@felixfontein
Copy link
Contributor

Please note that you must sign off every single commit, otherwise we cannot merge this (see the failing DCO test).

@daytime-saxophone daytime-saxophone force-pushed the main branch 2 times, most recently from 5515ec0 to 081c0f1 Compare April 11, 2025 10:55
@daytime-saxophone
Copy link
Author

Please note that you must sign off every single commit, otherwise we cannot merge this (see the failing DCO test).
I've rebased all the changes to a single commit, will sign the DCO.

Since the main branch has updated the go version in the go.mod file that also fixed the issues I was having where go wanted to upgrade to 1.23. So hopefully the tests will pass now.

Since we've been using the forked version for a little while now I've realized that on the initial authentication the browser and device-code login auth methods would pollute the stdout, I've provided a fix to this issue in this PR.

…user authentication methods

Signed-off-by: Isak Johansson <[email protected]>
Copy link
Member

@hiddeco hiddeco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To actually get this into a shape where it is mergable, changes have to be made to build the macOS binaries with CGO enabled. This will likely raise some issues around cross-compilation.

If we go the CGO route, we should probably plan out how this affects our CI, release process, and broader contributor experience.

Comment on lines +266 to +279
func sopsCacheDir() (string, error) {
userCacheDir, err := os.UserCacheDir()
if err != nil {
return "", err
}

cacheDir := filepath.Join(userCacheDir, "/sops")

if err = os.MkdirAll(cacheDir, 0o700); err != nil {
return "", err
}

return cacheDir, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to create a cache package, with something like:

var (
	cacheDir     string
	cacheDirErr  error
	cacheDirOnce sync.Once
)

// Directory returns the path to the SOPS cache directory, creating it if it
// does not exist. The directory with the name "sops" is created in the user's
// cache directory (see [os.UserCacheDir]). Once created, it will not be
// recreated, even if the function is called multiple times. If an error occurs
// while creating the directory, it will be returned on later calls.
func Directory() (string, error) {
	cacheDirOnce.Do(func() {
		base, err := os.UserCacheDir()
		if err != nil {
			cacheDirErr = err
			return
		}

		cacheDir = filepath.Join(base, "sops")
		cacheDirErr = os.MkdirAll(cacheDir, 0o700)
	})
	if cacheDirErr != nil {
		return "", cacheDirErr
	}
	return cacheDir, nil
}

// Put writes the provided data to a file in the SOPS cache directory.
// At present, no validation is performed on the file name, and it is the
// caller's responsibility to ensure that the file name is safe (i.e., does not
// contain path traversal characters or other unsafe elements).
func Put(fileName string, data []byte) error {
	dir, err := Directory()
	if err != nil {
		return err
	}

	filePath := filepath.Join(dir, fileName)
	if err = os.WriteFile(filePath, data, 0o600); err != nil {
		return fmt.Errorf("failed to write to cache file %s: %w", filePath, err)
	}
	return nil
}

// Get reads the contents of a file from the SOPS cache directory.
// If the file does not exist or cannot be read, an error will be returned.
func Get(fileName string) ([]byte, error) {
	dir, err := Directory()
	if err != nil {
		return nil, err
	}

	filePath := filepath.Join(dir, fileName)
	data, err := os.ReadFile(filePath)
	if err != nil {
		return nil, fmt.Errorf("failed to read cache file %s: %w", filePath, err)
	}
	return data, nil
}

// Exists checks if a file exists in the SOPS cache directory.
func Exists(fileName string) (bool, error) {
	dir, err := Directory()
	if err != nil {
		return false, err
	}

	filePath := filepath.Join(dir, fileName)
	if _, err = os.Stat(filePath); err != nil {
		if os.IsNotExist(err) {
			return false, nil
		}
		return false, err
	}
	return true, nil
}

Comment on lines 9 to +18
"context"
"encoding/base64"
"fmt"
"path/filepath"
"regexp"
"strings"
"time"

"encoding/json"
"os"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These imports aren't properly ordered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants