Apache Kudu 1.17.0
Upgrade Notes
-
TLSv1.2 is the minimum TLS protocol version that newer Kudu clients are able to use for secure Kudu RPC. The newer clients are not able to communicate with servers built and run with OpenSSL versions prior to 1.0.1. If such a Kudu cluster is running on a deprecated OS version (e.g., RHEL/CentOS 6.4), the following options are available to work around the incompatibility:
- use Kudu clients of version 1.14 or earlier to communicate with such a cluster
- disable RPC encryption and authentication for Kudu RPC, setting --rpc_authentication=disabled and --rpc_encryption=disabled for all masters and tablet servers in the cluster to allow the new client to work with the old cluster
-
TLSv1.2 is the minimum TLS protocol version that newer Kudu servers are able to use for secure Kudu RPC. The newer servers are not able to communicate using secure Kudu RPC with Kudu C++ client applications linked with a libkudu_client library built against OpenSSL versions prior to 1.0.1, or with Java client applications run with an outdated Java runtime that doesn't support TLSv1.2. The following options are available to work around this incompatibility:
- customize settings for the --rpc_tls_min_protocol and --rpc_tls_ciphers flags on all masters and tablet servers in the cluster, setting --rpc_tls_min_protocol=TLSv1 and adding TLSv1-capable cipher suites (e.g. AES128-SHA and AES256-SHA) into the list
- disable RPC encryption and authentication for Kudu RPC, setting --rpc_authentication=disabled and --rpc_encryption=disabled for all masters and tablet servers in the cluster to allow such Kudu clients to work with newer clusters
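As a rough illustration of the OpenSSL constraint in the notes above, the following Python sketch checks whether an OpenSSL version tuple meets the 1.0.1 minimum. It is not part of Kudu: the function name is hypothetical, and the script inspects only the OpenSSL build linked into the local Python interpreter, which is merely a proxy for the OpenSSL that a Kudu server or client library was built against.

```python
import ssl

# Minimum OpenSSL version named in the upgrade notes (needed for TLSv1.2).
MIN_OPENSSL = (1, 0, 1)


def openssl_supports_tls12(version_info):
    """Return True if an OpenSSL (major, minor, fix, ...) tuple is >= 1.0.1."""
    return tuple(version_info[:3]) >= MIN_OPENSSL


if __name__ == "__main__":
    # ssl.OPENSSL_VERSION_INFO describes the OpenSSL linked into this Python,
    # not the one used by any Kudu binary on the machine.
    print(ssl.OPENSSL_VERSION, openssl_supports_tls12(ssl.OPENSSL_VERSION_INFO))
```

If the check fails for the OpenSSL actually used by the cluster or the client library, the workarounds listed above apply.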
Obsoletions
Deprecations
Support for Python 2.x and Python 3.4 and earlier is deprecated and may be removed in the next minor release.
New features
-
Kudu now supports encrypting data at rest. Kudu supports the AES-128-CTR, AES-192-CTR, and AES-256-CTR ciphers to encrypt data, and supports Apache Ranger KMS and Apache Hadoop KMS. See Data at rest for more details.
-
Kudu now supports range-specific hash schemas for tables. It's now possible to add ranges with their own unique hash schema independent of the table-wide hash schema. This can be done at table creation time and while altering the table. It’s controlled by the --enable_per_range_hash_schemas master flag, which is enabled by default (see KUDU-2671).
-
Kudu now supports soft-deleted tables. Kudu keeps a soft-deleted table aside for a period of time (a.k.a. reservation) without purging its data. The table can be recalled (restored) before its reservation expires. The reservation period can be customized via the Kudu client API upon soft-deleting the table. The default reservation period is controlled by the --default_deleted_table_reserve_seconds master flag. NOTE: As of the Kudu 1.17 release, the soft-delete functionality is not supported when HMS integration is enabled, but this should be addressed in a future release (see KUDU-3326).
-
Introduced the Auto-Incrementing column. An auto-incrementing column is populated on the server side with a monotonically increasing counter. The counter is local to every tablet, i.e. each tablet has a separate auto-incrementing counter (see KUDU-1945).
-
Kudu now supports experimental non-unique primary keys. When a table with a non-unique primary key is created, an Auto-Incrementing column named auto_incrementing_id is added automatically to the table as a key column. The non-unique key columns and the Auto-Incrementing column together form the effective primary key (see KUDU-1945).
-
Introduced the Immutable column. It's useful for representing a semantically constant entity (see KUDU-3353).
-
An experimental feature has been added to Kudu that allows it to automatically rebalance tablet leader replicas among tablet servers. The background task can be enabled by setting the --auto_leader_rebalancing_enabled flag on the Kudu masters. By default, the flag is set to 'false' (see KUDU-3390).
-
Introduced an experimental feature: authentication of Kudu client applications to Kudu servers using JSON Web Tokens (JWT). JWT-based authentication can be used as an alternative to Kerberos authentication for Kudu applications running at edge nodes where configuring Kerberos might be cumbersome. Similar to Kerberos credentials, a JWT is considered a client's primary credential. The server-side capability of JWT-based authentication is controlled by the --enable_jwt_token_auth flag (set to 'false' by default). When the flag is set to 'true', a Kudu server is capable of authenticating Kudu clients using the JWT provided by the client during RPC connection negotiation. For its part, a Kudu client authenticates a Kudu server by verifying its TLS certificate. For the latter to succeed, the client should use the Kudu client API to add the cluster's IPKI CA certificate to the list of trusted certificates.
-
The C++ client scan token builder can now create multiple tokens per tablet, so it's now possible to dynamically scale the set of readers/scanners fetching data from a Kudu table in parallel. To use this functionality, use the newly introduced SetSplitSizeBytes() method of the Kudu client API to specify how many bytes of data each token should scan (see KUDU-3393).
-
Kudu's default replica placement algorithm is now range and table aware to prevent hotspotting, unlike the old power-of-two-choices algorithm. New replicas from the same range are spread evenly across available tablet servers, and the table the range belongs to is used as a tiebreaker (see KUDU-3476).
-
Statistics on various write operations are now available via the Kudu client API at the session level (see KUDU-3351, KUDU-3365).
-
Kudu now exposes all its metrics except for string gauges in Prometheus format via the embedded webserver's /metrics_prometheus endpoint (see KUDU-3375).
-
It’s now possible to deploy a Kudu cluster in an internal network (e.g. in a K8S environment) so that internal traffic (i.e. between tservers and masters) avoids the advertised addresses, while Kudu clients running in external networks are still able to connect. This can be achieved by customizing the settings for the newly introduced --rpc_proxy_advertised_addresses and --rpc_proxied_addresses server flags. This might be useful in various scenarios where a Kudu cluster is running in an internal network behind a firewall, but Kudu clients are running on the other side of the firewall using JWT to authenticate to Kudu servers, and the RPC traffic between the clients and the Kudu cluster is forwarded through a TCP/SOCKS proxy (see KUDU-3357).
-
It’s now possible to clean up metadata for deleted tables/tablets from Kudu master's in-memory map and the sys.catalog table. This is useful in reducing the memory consumption and bootstrap time for masters. This can be achieved by customizing the settings for the newly introduced --enable_metadata_cleanup_for_deleted_tables_and_tablets and --metadata_for_deleted_table_and_tablet_reserved_secs kudu-master flags.
-
It’s now possible to perform range rebalancing for a single table per run in the kudu cluster rebalance CLI tool by setting the newly introduced --enable_range_rebalancing tool flag. This is useful for addressing various hot-spotting issues when too many tablet replicas from the same range (but different hash buckets) have been placed on the same tablet server. The hot-spotting issue in tablet replica placement should be addressed in a follow-up release; see KUDU-3476 for details.
-
It’s now possible to compact log container metadata files at runtime. This is useful in reclaiming the disk space once the container becomes full. This feature can be turned on/off by customizing the setting for the newly introduced --log_container_metadata_runtime_compact kudu-tserver flag (see KUDU-3318).
-
New CLI tools kudu master/tserver set_flag_for_all have been added to update flags for all masters and tablet servers in a Kudu cluster at once.
-
A new CLI tool kudu local_replica copy_from_local has been added to copy tablet replicas' data at the filesystem level. It can be used when adding disks, for quick rebalancing of data between disks, or when migrating data from one data directory to another. The copied data is also more densely packed than the data in the old data directories.
-
A new CLI tool kudu diagnose parse_metrics has been added to parse metrics out of diagnostic logs (see KUDU-2353).
-
A new CLI tool kudu local_replica tmeta delete_rowsets has been added to delete rowsets from the tablet.
-
A sanity check has been added to detect wall clock jumps. It is controlled by the newly introduced --wall_clock_jump_detection and --wall_clock_jump_threshold_sec flags. That should help to address issues reported in KUDU-2906.
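To make the per-tablet semantics of the auto-incrementing column concrete, here is a toy Python model. It is not Kudu code and does not use the Kudu client API: the Tablet class, the starting value of 1, and the tablet names are assumptions made purely for illustration. It shows that counter values are distinct within a tablet but can repeat across tablets.

```python
import itertools


class Tablet:
    """Toy stand-in for a tablet that owns its own auto-incrementing counter."""

    def __init__(self, name):
        self.name = name
        self._counter = itertools.count(1)  # assumed start value, illustration only
        self.rows = []

    def insert(self, key):
        # The counter value is assigned here, on the "server side" of the model;
        # the caller supplies only the non-unique key.
        auto_id = next(self._counter)
        self.rows.append((key, auto_id))
        return auto_id


tablets = {"t0": Tablet("t0"), "t1": Tablet("t1")}

# Two rows with the same non-unique key in one tablet get distinct counter
# values, so the (key, auto_id) pair stays unique within that tablet.
first = tablets["t0"].insert("sensor-a")
second = tablets["t0"].insert("sensor-a")

# Another tablet counts independently, so values repeat across tablets.
other = tablets["t1"].insert("sensor-b")

print(first, second, other)  # prints "1 2 1"
```

One consequence of this model is that the counter value alone should not be treated as a table-wide unique identifier.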
Optimizations and improvements
-
Reduced the memory consumption of tablet servers when there are frequent alter schema operations (see KUDU-3197).
-
Reduced the memory consumption by implementing memory budgeting for RowSet merge compactions (i.e. CompactRowSetsOp maintenance operations). Several flags have been introduced; among them, the --rowset_compaction_memory_estimate_enabled flag controls whether to check for the available memory necessary to run CompactRowSetsOp maintenance operations (see KUDU-3406).
-
Optimized evaluating in-list predicates based on RowSet PK bounds. A tablet server can now effectively skip rows when the predicate is on a non-prefix part of the primary key and the leading columns' cardinality is 1 (see KUDU-1644).
-
Sped up the kudu cluster rebalance CLI tool by running intra-location rebalancing in parallel for location-aware Kudu clusters. Theoretically, running intra-location rebalancing in parallel might shorten the runtime by N times compared with running sequentially, where N is the number of locations in a Kudu cluster. This can be achieved by customizing the setting for the newly introduced --intra_location_rebalancing_concurrency flag.
-
Two new flags, --show_tablet_partition_info and --show_hash_partition_info, have been introduced for the kudu table list CLI tool to show the correspondence between partitions and tablet ids. It's also possible to choose the output format by specifying the --list_table_output_format flag.
-
A new flag --create_table_replication_factor has been introduced for the kudu table copy CLI tool to specify the replication factor for the destination table.
-
A new flag --create_table_hash_bucket_nums has been introduced for the kudu table copy CLI tool to specify the number of hash buckets in each hash dimension for the destination table.
-
A new flag --tables has been introduced for the kudu master unsafe_rebuild CLI tool to rebuild the metadata of specified tables on Kudu master; it has no effect on the other tables.
-
A new flag --fault_tolerant has been introduced for the kudu table copy/scan and kudu perf table_scan CLI tools to make the scanner fault-tolerant and return the results in primary key order per tablet.
-
A new flag --show_column_comment has been introduced for the kudu table describe CLI tool to show column comments.
-
A new flag --current_leader_uuid has been introduced for the kudu tablet leader_step_down CLI tool to conveniently step down the leader replica using a given UUID.
-
A new flag --use_readable_format has been introduced for the kudu local_replica dump rowset CLI tool to indicate whether to dump the primary key in human-readable format. In addition, another flag, --dump_primary_key_bounds_only, has been introduced for this tool to indicate whether to dump rowset primary key bounds only.
-
A new flag --tables has been introduced for the kudu local_replica delete CLI tool to conveniently delete multiple tablets by table name.
-
It’s now possible to specify owner and comment fields when using the kudu table create CLI tool to create tables.
-
It’s now possible to use the kudu local_replica copy_from_remote CLI tool to copy tablets in a batch.
-
It’s now possible to enable or disable the auto rebalancer at runtime by setting the --auto_rebalancing_enabled flag on the Kudu master.
-
It’s now possible for the kudu tserver/master get_flags CLI tool to filter flags even if the server side doesn’t support the flag filtering function (as is the case for Kudu servers of releases prior to 1.12).
-
Added a CSP (Content Security Policy) header to prevent security scanners from flagging Kudu's web UI as vulnerable.
-
A separate section listing all non-default flags has been added to the /varz page of Kudu's web UI.
-
A separate section showing slow scans has been added to the /scans page of Kudu's web UI. It can be enabled by setting the --show_slow_scans flag for tablet servers. A scan is called 'slow' if it takes more time than defined by --slow_scanner_threshold_ms.
-
A new Data retained column has been introduced in the Non-running operations section on the /maintenance-manager page of Kudu's web UI to indicate the approximate amount of disk space that would be freed.
-
The default value of tablet history retention time (controlled by the --tablet_history_max_age_sec flag) on Kudu master has been reduced from 7 days to 5 minutes. It's not necessary to keep such a long history of the system tablet since masters always scan data at the latest available snapshot.
-
Kudu can now be built and run on Apple M chips and macOS 11, 12. As with prior releases, Kudu's support for macOS is experimental, and should only be used for development.
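The "N times" estimate given above for parallel intra-location rebalancing is a best case that holds only when every location takes about the same time. The following back-of-the-envelope Python sketch compares sequential and parallel runtimes. The per-location durations are made-up numbers, and the greedy scheduling model is an assumption for illustration, not a description of how the rebalancer tool schedules its work.

```python
import heapq


def sequential_runtime(durations):
    """Locations are rebalanced one after another."""
    return sum(durations)


def parallel_runtime(durations, concurrency):
    """Greedy model: each location goes to the worker that frees up first."""
    workers = [0.0] * max(1, min(concurrency, len(durations)))
    heapq.heapify(workers)
    for d in sorted(durations, reverse=True):
        # Pop the least-loaded worker and give it the next location.
        heapq.heappush(workers, heapq.heappop(workers) + d)
    return max(workers)


equal = [10.0, 10.0, 10.0, 10.0]   # hypothetical minutes per location
skewed = [25.0, 5.0, 5.0, 5.0]     # one location dominates the runtime

speedup_equal = sequential_runtime(equal) / parallel_runtime(equal, 4)
speedup_skewed = sequential_runtime(skewed) / parallel_runtime(skewed, 4)
print(speedup_equal, speedup_skewed)
```

With four equally sized locations the model gives a fourfold speedup, while a single dominant location caps the gain well below N.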
Fixed Issues
-
Fixed an issue where historical MVCC data older than the ancient history mark (configured by --tablet_history_max_age_sec) that had only DELETE operations wouldn't be compacted correctly. As a result, the ancient history data could not be GCed if the tablet had been created by Kudu servers of releases prior to 1.10 (those versions did not support live row counting) (see KUDU-3367).
-
Fixed an issue where the Kudu server could potentially crash on malicious negotiation attempts.
-
Fixed a bug where a Kudu tablet server started under an OS account that had no permission to access tablet metadata files would get stuck in the tablet bootstrapping phase (see KUDU-3419).
-
Fixed a bug in the C++ client where toggling SetFaultTolerant(false) would not work.
-
Fixed a bug in the C++ client where toggling KuduScanner::SetSelection() would not work.
-
Fixed a bug in the Java client where, under certain conditions, the same rows would be returned multiple times even if the scanner was configured to be fault-tolerant.
-
Fixed a bug in the Java client where the last propagated timestamp and resource metrics would not be updated in subsequent scan responses.
-
Fixed a bug in the Java client where it would not invalidate stale locations of the leader master.
-
Fixed a bug in the Kudu HMS client that was causing failures when scanning Kudu tables from Hive (see KUDU-3401).
-
Fixed a bug where the kudu table copy CLI tool would fail copying an unpartitioned table.
-
Fixed a bug where the kudu master unsafe_rebuild CLI tool would rebuild the system catalog with outdated schemas of tables that were unhealthy during the rebuild process.
-
Fixed a bug where kudu table copy failed to copy tables that had STRING, BINARY or VARCHAR columns in their range keys (see KUDU-3306).
-
Fixed a bug where the kudu table copy CLI tool would crash if it encountered an error while copying rows to the destination table. The tool now exits gracefully and provides additional information for troubleshooting in such a condition.
-
Fixed a bug where the kudu local_replica list CLI tool would crash if the --list_detail flag was enabled.
-
Fixed a bug where a sub-process running the Ranger client would crash when receiving an oversized message from the Kudu master. With the fix, each peer communicating via the Subprocess protocol now discards an oversized message, logs the issue, clears the channel, and is able to receive further messages after encountering such a condition.
-
Fixed a bug where a Kudu application linked with the kudu_client library would crash with SIGILL when running on a machine lacking SSE4.2 support (see KUDU-3248).
-
Fixed a bug where the subprocess would crash when receiving large messages from the Kudu master if the pipe was too full to transport the entire message in one go or if there was a delay in sending from the master (see KUDU-3489).
Wire Protocol compatibility
Kudu 1.17.0 is wire-compatible with previous versions of Kudu:
- Kudu 1.17 clients may connect to servers running Kudu 1.0 or later. If the client uses features that are not available on the target server, an error will be returned.
- Rolling upgrade between Kudu 1.16 and Kudu 1.17 servers is believed to be possible though has not been sufficiently tested. Users are encouraged to shut down all nodes in the cluster, upgrade the software, and then restart the daemons on the new version.
- Kudu 1.0 clients may connect to servers running Kudu 1.17 with the exception of the below-mentioned restrictions regarding secure clusters.
The authentication features introduced in Kudu 1.3 place the following limitations on wire compatibility between Kudu 1.17 and versions earlier than 1.3:
- If a Kudu 1.17 cluster is configured with authentication or encryption set to "required", clients older than Kudu 1.3 will be unable to connect.
- If a Kudu 1.17 cluster is configured with authentication and encryption set to "optional" or "disabled", older clients will still be able to connect.
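The two rules above for clients older than Kudu 1.3 can be restated as a small predicate. This is a plain Python restatement of the release-note text, not Kudu code: the function name and the VALID_SETTINGS constant are hypothetical, and the sketch covers only pre-1.3 clients, leaving aside the TLS version restrictions described in the upgrade notes.

```python
VALID_SETTINGS = {"required", "optional", "disabled"}


def pre_1_3_client_can_connect(authentication, encryption):
    """Apply the wire-compatibility rules for a pre-1.3 client and a 1.17 cluster.

    Both arguments are the cluster's settings: "required", "optional" or "disabled".
    """
    for setting in (authentication, encryption):
        if setting not in VALID_SETTINGS:
            raise ValueError(f"unknown security setting: {setting!r}")
    # A pre-1.3 client is locked out as soon as either setting is "required";
    # with both "optional" or "disabled" it can still connect.
    return "required" not in (authentication, encryption)


if __name__ == "__main__":
    print(pre_1_3_client_can_connect("optional", "optional"))
    print(pre_1_3_client_can_connect("required", "optional"))
```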
Incompatible Changes in Kudu 1.17.0
Client Library Compatibility
-
The Kudu 1.17 Java client library is API- and ABI-compatible with Kudu 1.16. Applications written against Kudu 1.16 will compile and run against the Kudu 1.17 client library. Applications written against Kudu 1.17 will compile and run against the Kudu 1.16 client library unless they use the API newly introduced in Kudu 1.17.
-
The Kudu 1.17 C++ client is API- and ABI-forward-compatible with Kudu 1.16. Applications written and compiled against the Kudu 1.16 client library will run without modification against the Kudu 1.17 client library. Applications written and compiled against the Kudu 1.17 client library will run without modification against the Kudu 1.16 client library unless they use the API newly introduced in Kudu 1.17.
-
The Kudu 1.17 Python client is API-compatible with Kudu 1.16. Applications written against Kudu 1.16 will continue to run against the Kudu 1.17 client and vice-versa.
Known Issues and Limitations
Please refer to the Known Issues and Limitations section of the documentation.
Contributors
Kudu 1.17.0 includes contributions from 26 people, including 12 first-time contributors:
- Ashwani Raina
- Hari Reddy
- Kurt Deschler
- Marton Greber
- Song Jiacheng
- Zoltan Martonka
- bsglz
- mammadli.khazar
- wzhou-code
- xinghuayu007
- xlwh
- Ádám Bakai