feat: Hive listener integration #605

Open
wants to merge 44 commits into base: main

Conversation

Maleware
Member

@Maleware Maleware commented May 27, 2025

Description

Adds listener support.

Definition of Done Checklist

  • Not all of these items are applicable to all PRs; the author should update this template to leave in only the relevant boxes
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and deployed operator works
  • Integration tests passed (for non-trivial changes)
  • Changes need to be "offline" compatible

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Roadmap has been updated

@Maleware
Member Author

=== NAME  kuttl
    harness.go:403: run tests finished
    harness.go:510: cleaning up
    harness.go:567: removing temp folder: ""
--- PASS: kuttl (2381.48s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/smoke_postgres-12.5.6_hive-3.1.3_openshift-false_s3-use-tls-false (109.29s)
        --- PASS: kuttl/harness/resources_hive-4.0.1_openshift-false (25.30s)
        --- PASS: kuttl/harness/kerberos-hdfs_postgres-12.5.6_hive-4.0.1_hdfs-latest-3.4.1_zookeeper-latest-3.9.3_krb5-1.21.1_openshift-false_kerberos-realm-PROD.MYCORP_kerberos-backend-mit (174.61s)
        --- PASS: kuttl/harness/kerberos-hdfs_postgres-12.5.6_hive-4.0.0_hdfs-latest-3.4.1_zookeeper-latest-3.9.3_krb5-1.21.1_openshift-false_kerberos-realm-PROD.MYCORP_kerberos-backend-mit (173.43s)
        --- PASS: kuttl/harness/kerberos-hdfs_postgres-12.5.6_hive-3.1.3_hdfs-latest-3.4.1_zookeeper-latest-3.9.3_krb5-1.21.1_openshift-false_kerberos-realm-PROD.MYCORP_kerberos-backend-mit (191.31s)
        --- PASS: kuttl/harness/kerberos-s3_postgres-12.5.6_hive-4.0.1_krb5-1.21.1_openshift-false_s3-use-tls-true_kerberos-realm-PROD.MYCORP_kerberos-backend-mit (109.84s)
        --- PASS: kuttl/harness/kerberos-s3_postgres-12.5.6_hive-4.0.1_krb5-1.21.1_openshift-false_s3-use-tls-false_kerberos-realm-PROD.MYCORP_kerberos-backend-mit (107.09s)
        --- PASS: kuttl/harness/kerberos-s3_postgres-12.5.6_hive-4.0.0_krb5-1.21.1_openshift-false_s3-use-tls-true_kerberos-realm-PROD.MYCORP_kerberos-backend-mit (132.82s)
        --- PASS: kuttl/harness/kerberos-s3_postgres-12.5.6_hive-4.0.0_krb5-1.21.1_openshift-false_s3-use-tls-false_kerberos-realm-PROD.MYCORP_kerberos-backend-mit (110.03s)
        --- PASS: kuttl/harness/kerberos-s3_postgres-12.5.6_hive-3.1.3_krb5-1.21.1_openshift-false_s3-use-tls-true_kerberos-realm-PROD.MYCORP_kerberos-backend-mit (119.68s)
        --- PASS: kuttl/harness/kerberos-s3_postgres-12.5.6_hive-3.1.3_krb5-1.21.1_openshift-false_s3-use-tls-false_kerberos-realm-PROD.MYCORP_kerberos-backend-mit (118.54s)
        --- PASS: kuttl/harness/upgrade_postgres-12.5.6_hive-old-3.1.3_hive-new-4.0.1_openshift-false (70.49s)
        --- PASS: kuttl/harness/cluster-operation_hive-latest-4.0.1_openshift-false (61.16s)
        --- PASS: kuttl/harness/logging_postgres-12.5.6_hive-4.0.0_openshift-false (79.92s)
        --- PASS: kuttl/harness/resources_hive-4.0.0_openshift-false (26.53s)
        --- PASS: kuttl/harness/resources_hive-3.1.3_openshift-false (26.19s)
        --- PASS: kuttl/harness/external-access_hive-latest-4.0.1_openshift-false (36.33s)
        --- PASS: kuttl/harness/orphaned-resources_hive-latest-4.0.1_openshift-false (37.07s)
        --- PASS: kuttl/harness/logging_postgres-12.5.6_hive-4.0.1_openshift-false (78.39s)
        --- PASS: kuttl/harness/smoke_postgres-12.5.6_hive-4.0.1_openshift-false_s3-use-tls-false (102.36s)
        --- PASS: kuttl/harness/logging_postgres-12.5.6_hive-3.1.3_openshift-false (87.95s)
        --- PASS: kuttl/harness/smoke_postgres-12.5.6_hive-4.0.1_openshift-false_s3-use-tls-true (96.98s)
        --- PASS: kuttl/harness/smoke_postgres-12.5.6_hive-4.0.0_openshift-false_s3-use-tls-false (102.11s)
        --- PASS: kuttl/harness/smoke_postgres-12.5.6_hive-4.0.0_openshift-false_s3-use-tls-true (100.83s)
        --- PASS: kuttl/harness/smoke_postgres-12.5.6_hive-3.1.3_openshift-false_s3-use-tls-true (103.19s)
PASS

@Maleware Maleware marked this pull request as ready for review May 28, 2025 13:22
@Maleware
Member Author

Maleware commented Jun 4, 2025

I might have run into stackabletech/hdfs-operator#686 during development.

I noticed that I can get an empty string in my discovery ConfigMap from time to time. The behaviour appears to be flaky, but more often than not the empty string appears:

Expected

apiVersion: v1
data:
  HIVE: thrift://hive-postgres-s3-metastore-default.default.svc.cluster.local:9083
kind: ConfigMap
metadata:

Flaky faulty one

apiVersion: v1
data:
  HIVE: ""
kind: ConfigMap
metadata:
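The flaky empty value is consistent with the discovery entry being derived from a Listener whose status carries no ingress addresses yet. A minimal, stdlib-only sketch of that failure mode (the function and its input shape are hypothetical, not the operator's actual code):

```rust
// Hypothetical sketch: build the discovery value by joining the Listener's
// ingress addresses. If the Listener status is not yet populated, the join
// over an empty list yields "", which would land verbatim in the ConfigMap.
fn discovery_value(addresses: &[(String, u16)]) -> String {
    addresses
        .iter()
        .map(|(host, port)| format!("thrift://{host}:{port}"))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let ready = vec![(
        "hive-postgres-s3-metastore-default.default.svc.cluster.local".to_string(),
        9083u16,
    )];
    assert_eq!(
        discovery_value(&ready),
        "thrift://hive-postgres-s3-metastore-default.default.svc.cluster.local:9083"
    );

    // A Listener with no addresses yet reproduces the flaky `HIVE: ""` case.
    assert_eq!(discovery_value(&[]), "");
}
```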

@lfrancke lfrancke moved this to Development: In Progress in Stackable Engineering Jun 4, 2025
@Maleware Maleware changed the title WIP: Listener integration feat: Hive listener integration Jun 5, 2025
@Maleware
Member Author

Maleware commented Jun 5, 2025

🟢

=== NAME  kuttl
    harness.go:403: run tests finished
    harness.go:510: cleaning up
    harness.go:567: removing temp folder: ""
--- PASS: kuttl (38.21s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/external-access_hive-latest-4.0.1_openshift-false (38.17s)
PASS

@Maleware Maleware moved this from Development: In Progress to Development: Waiting for Review in Stackable Engineering Jun 5, 2025
@maltesander maltesander linked an issue Jun 11, 2025 that may be closed by this pull request
Comment on lines 135 to 140
listener_ref: Listener,
rolegroup: &String,
chroot: Option<&str>,
) -> Result<String, Error> {
// We only need the first address corresponding to the rolegroup
let listener_address = listener_ref
Member

This should probably be called listener, as it isn't a ref.

Suggested change
listener_ref: Listener,
rolegroup: &String,
chroot: Option<&str>,
) -> Result<String, Error> {
// We only need the first address corresponding to the rolegroup
let listener_address = listener_ref
listener: Listener,
rolegroup: &String,
chroot: Option<&str>,
) -> Result<String, Error> {
// We only need the first address corresponding to the rolegroup
let listener_address = listener

Member Author

This is gone, as it's not needed anymore.

@@ -448,6 +417,10 @@ pub struct MetaStoreConfig {
/// Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
#[fragment_attrs(serde(default))]
pub graceful_shutdown_timeout: Option<Duration>,

/// This field controls which [ListenerClass](DOCS_BASE_URL_PLACEHOLDER/listener-operator/listenerclass.html) is used to expose the webserver.
#[serde(default)]
Member

All other fields use fragment_attrs; is that needed here too?

Suggested change
#[serde(default)]
#[fragment_attr(serde(default))]

Member

Also, do we want the String::default() to be called? Doesn't it have to be a valid class, or it falls back to cluster-internal?

Member Author

also outdated I think.

Yes, now it falls back to cluster-internal if listenerClass is not given.
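That fallback can be sketched in a few lines; the helper name below is illustrative, not the operator's actual API:

```rust
// Illustrative sketch of the fallback behaviour described above:
// when no listenerClass is configured, default to cluster-internal.
fn effective_listener_class(configured: Option<&str>) -> &str {
    configured.unwrap_or("cluster-internal")
}

fn main() {
    // listenerClass omitted -> falls back to cluster-internal.
    assert_eq!(effective_listener_class(None), "cluster-internal");
    // An explicit class is passed through unchanged.
    assert_eq!(
        effective_listener_class(Some("external-unstable")),
        "external-unstable"
    );
}
```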

hive,
&resolved_product_image.app_version_label,
&HiveRole::MetaStore.to_string(),
"discovery",
Member

Would this cause a collision if there are multiple role groups defined?

Member Author

Also outdated as it's on role level now
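Why role-level naming sidesteps the collision concern can be sketched as follows (the naming helper is hypothetical, not the operator's actual code): with the rolegroup dropped from the name, all rolegroups of a role resolve to one shared discovery ConfigMap instead of competing per-group objects.

```rust
// Hypothetical sketch: a role-level discovery ConfigMap name contains no
// rolegroup component, so two rolegroups of the same role agree on one name.
fn role_level_cm_name(cluster: &str, role: &str) -> String {
    format!("{cluster}-{role}")
}

fn main() {
    let from_default_group = role_level_cm_name("hive-postgres-s3", "metastore");
    let from_other_group = role_level_cm_name("hive-postgres-s3", "metastore");
    // Identical regardless of which rolegroup triggers the reconcile.
    assert_eq!(from_default_group, from_other_group);
    assert_eq!(from_default_group, "hive-postgres-s3-metastore");
}
```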

@Maleware Maleware moved this from Development: In Review to Development: In Progress in Stackable Engineering Jun 18, 2025
Comment on lines 508 to 530
// Init listener struct. Collect listener after applied to cluster_resources
// to use listener object in later created discovery configMap
let mut listener = Listener::new("name", ListenerSpec::default());
let role_config = hive.role_config(&hive_role);
if let Some(GenericRoleConfig {
pod_disruption_budget: pdb,
if let Some(HiveMetastoreRoleConfig {
common: GenericRoleConfig {
pod_disruption_budget: pdb,
},
listener_class,
}) = role_config
{
add_pdbs(pdb, hive, &hive_role, client, &mut cluster_resources)
.await
.context(FailedToCreatePdbSnafu)?;

let group_listener: Listener =
build_group_listener(hive, &resolved_product_image, &hive_role, listener_class)?;
listener = cluster_resources
.add(client, group_listener)
.await
.with_context(|_| ApplyGroupListenerSnafu {
role: hive_role.to_string(),
})?;
Member

I would split the pdbs and listener stuff like so:

    let role_config = hive.role_config(&hive_role);
    if let Some(HiveMetastoreRoleConfig {
        common: GenericRoleConfig {
            pod_disruption_budget: pdb,
        },
        ..
    }) = role_config
    {
        add_pdbs(pdb, hive, &hive_role, client, &mut cluster_resources)
            .await
            .context(FailedToCreatePdbSnafu)?;
    }

    // std's SipHasher is deprecated, and DefaultHasher is unstable across Rust releases.
    // We don't /need/ stability, but it's still nice to avoid spurious changes where possible.
    let mut discovery_hash = FnvHasher::with_key(0);
    if let Some(HiveMetastoreRoleConfig { listener_class, .. }) = role_config {
        let group_listener: Listener =
            build_group_listener(hive, &resolved_product_image, &hive_role, listener_class)?;
        let listener = cluster_resources
            .add(client, group_listener)
            .await
            .context(ApplyGroupListenerSnafu {
                role: hive_role.to_string(),
            })?;

        for discovery_cm in discovery::build_discovery_configmaps(
            hive,
            hive,
            hive_role,
            &resolved_product_image,
            None,
            listener,
        )
        .await
        .context(BuildDiscoveryConfigSnafu)?
        {
            let discovery_cm = cluster_resources
                .add(client, discovery_cm)
                .await
                .context(ApplyDiscoveryConfigSnafu)?;
            if let Some(generation) = discovery_cm.metadata.resource_version {
                discovery_hash.write(generation.as_bytes())
            }
        }
    }
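The suggestion above reaches for FNV precisely because std's DefaultHasher is not stable across Rust releases. A stdlib-only FNV-1a sketch of that determinism argument (the suggestion itself uses the fnv crate's FnvHasher; this inline version is only to illustrate why a fixed algorithm gives reproducible discovery hashes):

```rust
use std::hash::Hasher;

// Minimal FNV-1a implementation with the standard 64-bit offset basis and
// prime; for illustration only, not a replacement for the fnv crate.
struct Fnv1a(u64);

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
}

fn main() {
    // Hash a ConfigMap resource_version, as the suggested code does.
    let mut a = Fnv1a(0xcbf2_9ce4_8422_2325);
    a.write(b"12345");

    // The same input always yields the same hash, independent of the
    // compiler release or process, which avoids spurious restarts driven
    // by a changed discovery hash.
    let mut b = Fnv1a(0xcbf2_9ce4_8422_2325);
    b.write(b"12345");
    assert_eq!(a.finish(), b.finish());
}
```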

Member Author

done with aa56092

Comment on lines 990 to 991
let recommended_object_labels: ObjectLabels<'_, v1alpha1::HiveCluster> =
build_recommended_labels(
Member

Why annotate the type here?

Member Author

Overlooked it. Probably IDE autocompletion from double-clicking the type :(

Member Author

done with cae1450

obj_ref: ObjectRef<Service>,
},
#[snafu(display("could not find port [{port_name}] for rolegroup listener {role}"))]
NoServicePort { port_name: String, role: String },
#[snafu(display("service [{obj_ref}] port [{port_name}] does not have a nodePort "))]
NoNodePort {
Member

NoNodePort, FindEndpoints, InvalidNodePort etc. are unused.

Member Author

done with aa56092

Successfully merging this pull request may close these issues.

Integrate Hive Operator with Listener Operator
3 participants