- 
                Notifications
    
You must be signed in to change notification settings  - Fork 263
 
Enable the attestation key generation and registration #2147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
In order to support new clevis pin, either they need to be added each time in the hardcoded list of pins or ignition can allow any name for the pin. This is required in order to enable the clevis trustee pin used for confidential clusters. Signed-off-by: Alice Frosi <[email protected]>
The attestation includes the fields necessary to attest the machine. For example, in confidential clusters. The registration of the attestation key enables to extract the AK of the TPM and register it before using it to sign the TPM quote. Signed-off-by: Alice Frosi <[email protected]>
Generate and persistent the AK in the TPM if the key registration is request in the config. The public AK is then registered at the url specified in the config. The AK is generate only once and if the /var/tpm/ak.pub exists then it means that this step has already been performed and it is skipped of the next stages. The registration of the AK required networking hence, during the fetch-offline stage it signals that the networking is necessary. The retries mechanism ensures that the registration is tried multiple times for allowing the parallel network configuration in the fetch phase. Signed-off-by: Alice Frosi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces attestation key generation and registration functionality for confidential clusters using raw TPMs. It includes changes to config translation, clevis validation, schema definitions, and the addition of attestation handling logic. The code introduces a new Attestation type to the configuration schema, and implements the logic to generate, persist, and register the attestation key with a remote service. The review focuses on error handling, code clarity, and potential security considerations.
| 
               | 
          ||
| type Registration struct { | ||
| Url *string `json:"url,omitempty"` | ||
| Certificate *string `json:"certificat,omitempty"` | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if util.NilOrEmpty(cu.Pin) { | ||
| r.AddOnError(c.Append("pin"), errors.ErrClevisPinRequired) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If cu.Pin is nil or empty, the validation logic adds an error indicating that a Clevis pin is required. However, the code does not validate the value of cu.Pin against a list of known pins, potentially leading to runtime errors if an invalid pin is provided. It's better to validate the pin against known values to provide early feedback to the user.
Consider adding a validation step to ensure that the pin value is one of the supported types (e.g., "tpm2", "tang", "sss").
| if _, err := os.Stat(AKRegisterd); err == nil { | ||
| return nil | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if !util.NilOrEmpty(registration.Certificate) { | ||
| tlsConfig, err := createTLSConfig(*registration.Certificate) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to create TLS config: %w", err) | ||
| } | ||
| 
               | 
          ||
| client.Transport = &http.Transport{ | ||
| TLSClientConfig: tlsConfig, | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code creates a TLS configuration only if registration.Certificate is not nil or empty. However, it does not handle the case where the certificate string is invalid or malformed. This could lead to a crash or unexpected behavior at runtime.
Consider adding error handling to check if the certificate is valid and return an error if it is not.
| if resp.StatusCode < 200 || resp.StatusCode >= 300 { | ||
| return fmt.Errorf("registration failed with status code: %d", resp.StatusCode) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code checks if the response status code is outside the range of 200-299. However, it does not provide any context or information about the specific status code. This makes it difficult to debug registration failures.
Consider logging the status code and response body to provide more context for debugging.
| if err := os.WriteFile(AKRegisterd, []byte{}, 0644); err != nil { | ||
| return fmt.Errorf("failed to create AK registered file: %w", err) | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code creates an empty file AKRegisterd to indicate successful registration. However, it does not handle potential errors during file creation. This could lead to a false positive if the file creation fails.
Consider adding error handling to check if the file creation was successful and return an error if it was not.
| func createTLSConfig(certPEM string) (*tls.Config, error) { | ||
| block, _ := pem.Decode([]byte(certPEM)) | ||
| if block == nil { | ||
| return nil, fmt.Errorf("failed to decode PEM certificate") | ||
| } | ||
| 
               | 
          ||
| cert, err := x509.ParseCertificate(block.Bytes) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to parse certificate: %w", err) | ||
| } | ||
| 
               | 
          ||
| certPool := x509.NewCertPool() | ||
| certPool.AddCert(cert) | ||
| 
               | 
          ||
| return &tls.Config{ | ||
| RootCAs: certPool, | ||
| }, nil | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if err := attestation.HandleAttestation(e.Logger, &cfg, e.PlatformConfig.Name(), e.NeedNet); err != nil { | ||
| if err == resource.ErrNeedNet && stageName == "fetch-offline" { | ||
| err = e.signalNeedNet() | ||
| if err != nil { | ||
| e.Logger.Crit("failed to signal neednet: %v", err) | ||
| } | ||
| return cfg, resource.ErrNeedNet | ||
| } | ||
| return types.Config{}, err | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code calls HandleAttestation and checks for resource.ErrNeedNet specifically in the fetch-offline stage. If this error occurs in other stages, it will not be handled in the same way, potentially leading to unexpected behavior or failures. It's important to ensure that network-related errors are handled consistently across all stages.
Consider handling resource.ErrNeedNet in a more generic way, or ensuring that signalNeedNet is called in all relevant stages.
| 
           👋  hey @alicefr! I wanted to ask, so firstly  Second I am a bit worried about the impact this could have on all distros we are assuming these constants const (
	TPMDir      = "/var/tpm"
	AKPath      = "/var/tpm/ak.pub"
	AKCtxPath   = "/var/tpm/ak.ctx"
	AKRegisterd = "/var/tpm/ak.registerd"
	AKHandle    = "0x81010002"
	EKHandle    = "0x81010001"
)Would the TPM directory always be mounted at /var/tpm? and are teh AK / EK handles standard. If not it might make sense to expose those configs as configs not constants, and let sugar take the burden (butane) that way ignition does not work differently on some distros?  | 
    
          
 Right now, in our operator we release the ignition configuration with the clevis pin configuration without protection with a merge derective. The idea would be that we would release those only after a valid attestation. An attestation produce an attestation token that can be used to release resources, and in that can it will be the ignition configuration for the clevis pin. That's why I have tried to generate the AK after the fetch of the provisioner config but before all the merge directive. The flow will be: 
 
 I tried to the keep the configuration changes as minimal as possible. But if you think, they can be configurable, I can add them.  | 
    
          
 Im sorry for this if its a silly question and I am just missing it. I can see step 1, step 2 in the changes, could you help me by pointing me to where steps 3, 4, and 5 are handled? 
 Okay yes! if thats the case we can def reduce the amount of config. I was incorrectly under the impression that the location could change but since its us handling the folder it does not matter. Lets keep the constants.  | 
    
          
 We haven't implemented that yet, sorry if it wasn't clear. Maybe it will be better if I write a design document for the 2 phase attestations so we can reference it here and the entire flow becomes clearer  | 
    
| 
           @travier @prestist what worries me more is that the information passed to register the AK isn't enough. Right now, I'm simply passing the platform, but this isn't enough for getting the endorsement key of the VM instance. From the google cloud docu, the EK can be fetched with a get request: but we need the   | 
    
| 
           @prestist this PR drafts confidential-clusters/cocl-operator#68 the idea of having 2 attestation phase for protecting the ignition fetch directive with the clevis pin  | 
    
In confidential clusters, using raw TPMs support, we need to register the attestation key in Trustee in order to be able to verify the attestation quote. The key is trusted at first use (TOFU), and the registration point passed with ignition allows the operator to get the key and configure trustee properly.
The AK registration is done before any merge/replace directives because in a second step we want to be able to protect the ignition config with an attestation phase. In this way, the key can be registered before any fetching.
We assume that only one AK is used and can be registered for the attestation per system.
This PR includes the commit from #2145 because it has been tested with the clevis pin for Trustee
Example:
Design document including the AK registration for the confidential cluster operator