[Package Request] - zlib-ng-compat and all related zlib-ng enhancements for core OS, from Fedora 40/CentOS 10/Redhat 10 #872

Open
plasticity-cloud opened this issue Jan 4, 2025 · 7 comments
Labels
enhancement New feature or request performance Performance related issues

Comments

@plasticity-cloud

plasticity-cloud commented Jan 4, 2025

zlib-ng-compat

Could zlib-ng-compat obsolete the core zlib package and provide zlib-ng (with Cloudflare and Intel optimizations) instead?

The package is available in Fedora and EPEL 9:

https://packages.fedoraproject.org/pkgs/zlib-ng/zlib-ng/epel-9.html

The benefits range from speeding up kernel loading, to gzip support in Amazon Corretto and Python, to Amazon EMR via libhadoop, to accelerating MySQL InnoDB with gzip support.

Hi Team,
We are currently testing and trying to backport zlib-ng and zlib-ng-compat from Fedora 40/42,
which ship them by default with the operating system.

We will be sharing the rpm build for AL2023.

We noticed a significant reduction in CPU usage and a speed-up in processing gzip content, especially when using the zstd binary directly to decompress gzip content to disk,
which is 2 times faster than the standard gzip support in the JDK.
Gzip support in the JDK with zlib-ng-compat is also 2 times faster than standard gzip in the JDK.

We believe this will be a game changer for Amazon Linux 2023 users,
considering it would become the mainstream OS for EKS Hybrid on premises
and also for core AWS services such as Lambda, Fargate, Aurora and ECS.

The same applies to S3 clients and the SOCI snapshotter for containerd, compiled with CGO.

Amazon Corretto benefits from it automatically,
as it is one of the rare official JDK distributions that sources zlib from the operating system.
Tested with Corretto JDK 21 on Fedora 40/42.
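
One way to check the claim that a running JVM picks up zlib from the operating system is to look at which libz shared object the process has mapped. The following is a minimal sketch, not part of the original report: the helper names `mapped_libz_paths` and `libz_for_pid` are hypothetical, and it assumes a Linux `/proc/<pid>/maps` file and a dynamically linked libz (a statically linked binary would show nothing here).

```python
import re
from pathlib import Path

# Matches paths ending in libz.so, libz.so.1, libz.so.1.3.1.zlib-ng, ...
# It does not match libzstd.so or other libraries that merely start with "libz".
_LIBZ_RE = re.compile(r"/libz\.so[^/\s]*$")
_DELETED = " (deleted)"


def mapped_libz_paths(maps_text):
    """Return the distinct libz paths listed in the text of a /proc/<pid>/maps file."""
    paths = []
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        if path.endswith(_DELETED):
            # The library was replaced on disk while the process kept it mapped.
            path = path[: -len(_DELETED)]
        if _LIBZ_RE.search(path) and path not in paths:
            paths.append(path)
    return paths


def libz_for_pid(pid):
    """Hypothetical convenience wrapper: inspect a live process by PID."""
    return mapped_libz_paths(Path(f"/proc/{pid}/maps").read_text())
```

Calling `libz_for_pid(<pid of the java process>)` after the JVM has used `java.util.zip` should list the library path actually in use; an empty list suggests the process is not using a shared system libz.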

Official documents from Fedora and zlib-ng:

https://fedoraproject.org/wiki/Changes/ZlibNGTransition

https://github.com/zlib-ng/zlib-ng/blob/develop/PORTING.md

@plasticity-cloud
Author

plasticity-cloud commented Jan 4, 2025

Tested with official gz files from public datasets:

https://dumps.wikimedia.org/commonswiki/latest/

Amazon Corretto 21

Code for the test tool (requires at least Java 17 to compile with Maven):

https://github.com/plasticity-cloud/aws-next-gen/tree/main/al2023/zlib-ng-testing/zlibNextGen
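
For readers who want to reproduce the measurement without the Java tool, the same kind of test (stream a .gz file to disk and time it) can be sketched in Python, whose `gzip` module also uses the zlib library the interpreter is linked against. This is an illustrative analogue, not the linked test tool; the function name `gunzip_timed` and the 1 MiB chunk size are assumptions.

```python
import gzip
import shutil
import time


def gunzip_timed(src, dst, chunk_size=1 << 20):
    """Decompress the gzip file `src` into `dst`.

    Returns (wall_seconds, cpu_seconds), where cpu_seconds is the user + system
    CPU time consumed by this process during the copy.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream in fixed-size chunks so large dumps never sit fully in memory.
        shutil.copyfileobj(fin, fout, chunk_size)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start
```

A call such as `gunzip_timed("commonswiki-latest-page.sql.gz", "commonswiki-latest-page.sql")` can then be run once with stock zlib and once with zlib-ng-compat installed, and the two result pairs compared.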

Each test was executed:

  1. on a fresh VM, to avoid the effects of disk caching;

  2. on the same VM, after dropping all caches, to make the tests reliable:

sync; echo 3 > /proc/sys/vm/drop_caches

standard zlib

time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql

real 1m19.458s
user 1m0.409s
sys 0m17.593s

time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml

real 0m20.122s
user 0m13.025s
sys 0m6.726s

zlib-ng

time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql

real 0m59.927s
user 0m40.901s
sys 0m17.308s

time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml

real 0m14.648s
user 0m7.545s
sys 0m6.978s
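
The ratios implied by the timings above can be worked out with a short script. This is only arithmetic on the numbers reported in this comment; the helper names `parse_time` and `speedups` are hypothetical.

```python
import re


def parse_time(value):
    """Convert a shell `time` value such as '1m19.458s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings copied from the runs reported above.
RESULTS = {
    "commonswiki-latest-page.sql.gz": {
        "zlib": {"real": "1m19.458s", "user": "1m0.409s", "sys": "0m17.593s"},
        "zlib-ng": {"real": "0m59.927s", "user": "0m40.901s", "sys": "0m17.308s"},
    },
    "commonswiki-latest-pages-logging2.xml.gz": {
        "zlib": {"real": "0m20.122s", "user": "0m13.025s", "sys": "0m6.726s"},
        "zlib-ng": {"real": "0m14.648s", "user": "0m7.545s", "sys": "0m6.978s"},
    },
}


def speedups(results):
    """Return, per file, the zlib / zlib-ng time ratio for real, user and sys."""
    return {
        name: {
            kind: parse_time(runs["zlib"][kind]) / parse_time(runs["zlib-ng"][kind])
            for kind in ("real", "user", "sys")
        }
        for name, runs in results.items()
    }


for name, ratios in speedups(RESULTS).items():
    summary = ", ".join(f"{kind} {ratio:.2f}x" for kind, ratio in ratios.items())
    print(f"{name}: {summary}")
```

For these two files the ratios come out at roughly 1.3x to 1.4x in wall-clock time and roughly 1.5x to 1.7x in user CPU time, while system time is essentially unchanged.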

@plasticity-cloud
Author

A builder for a standalone zlib-ng version that can be bundled for e.g. Lambda
is provided in the repo:

https://github.com/plasticity-cloud/aws-next-gen/tree/main/al2023/zlib-ng-testing/zlib-ng-al2023-integration/standalone

@stewartsmith stewartsmith added performance Performance related issues enhancement New feature or request labels Jan 9, 2025
@stewartsmith
Member

I have also run some of these experiments, and have found good performance improvements in a number of places.

I can't commit to making the change within Amazon Linux 2023, as we do need to balance risks of such a change within a major version of the Operating System.

I am really interested in your experiences with using zlib-ng in place of zlib on AL2023, as that's great input into our decision making.

@plasticity-cloud
Author

Hi @stewartsmith, I much appreciate your initial feedback,
and I definitely understand that substituting a core OS library
requires extra regression testing.

If you could direct me to the public pipelines or regression test suite
that your team executes for every release, I would really like to run those.

For the zlib-ng tests, I will be able to share feedback by early next week for Lambda container-based deployment and EMR on classic EC2 and the ECS AMI.

Regards,
Karol

@plasticity-cloud
Author

Hi Stewart,
apologies for the delays.

Test setup: m6g.xlarge, 80GB GP3, Throughput 125MB/s, Standard IOPS,

  1. SOCI Snapshotter:
    it seems that even with stock AL2023 we are not getting the expected results when pulling
    large images in regular mode, when a SOCI index is not available,

we are getting 3 seconds difference in favour

In terms of pulling using a SOCI index,
we had to transfer and generate ztoc indexes for e.g. public EMR images:

public.ecr.aws/emr-on-eks/spark/emr-7.5.0:latest

When first pulling the image:
Bootstrapping a container with a SOCI index takes 7 seconds on average, both with and without zlib-ng.
Bootstrapping a container without a SOCI index takes 60 seconds.

We suspect that by default the SOCI snapshotter doesn't use CGO bindings (with either standard zlib or zlib-ng) and relies only on Go bindings, despite having a build requirement on zlib-devel and zlib-static, or equivalently zlib-ng-compat-devel and zlib-ng-compat-static.

We will investigate this, as it would be really beneficial for ECS/EKS users on Fargate.

Official performance benchmarks:

https://github.com/awslabs/soci-snapshotter/blob/main/docs/benchmark.md#prerequisites

executed on both setups:

soci-snapshotter-performanceTest_zlib_standard.zip
soci-snapshotter-performanceTest_zlib_ng.zip

@plasticity-cloud
Author

Test on m6.large, decompressing on the same volume using zstd with zlib/zlib-ng-compat bindings:

CPU-wise, zlib-ng performs at least 2 times better than stock zlib,
leaving enough bandwidth for other preprocessing tasks:

cpu_stats_decompress_gp3_m6_large.zip

@plasticity-cloud
Author

The code to build the zlib-ng rpms is hosted in the following repository:

https://github.com/plasticity-cloud/aws-next-gen/tree/main/al2023/zlib-ng-testing/zlib-ng-al2023-integration/os-bundle

After checkout, to build the rpms locally:

cd al2023/zlib-ng-testing/zlib-ng-al2023-integration/os-bundle/

./rpm-builder-standalone.sh
./zlib-ng-build-standalone.sh

This will output a tar.gz bundle with the rpms in:

./releases/latest
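
Assuming the rpms from the bundle have been installed, a quick sanity check is to ask a dynamically linked consumer which zlib it actually loaded. The sketch below uses Python's `zlib.ZLIB_RUNTIME_VERSION`; it relies on two assumptions that are not stated in this thread: that the system Python links libz dynamically, and that zlib-ng in compat mode reports a version string containing "zlib-ng" (for example "1.3.1.zlib-ng"). The function name `describe_zlib` is hypothetical.

```python
import zlib


def describe_zlib(runtime_version=None):
    """Return (version, flavour) for the zlib library loaded by this interpreter.

    `runtime_version` can be passed explicitly to classify a version string
    obtained elsewhere; by default the interpreter's own runtime version is used.
    """
    version = zlib.ZLIB_RUNTIME_VERSION if runtime_version is None else runtime_version
    # Assumption: the compat build tags its version string with "zlib-ng".
    flavour = "zlib-ng (compat mode)" if "zlib-ng" in version else "stock zlib"
    return version, flavour


version, flavour = describe_zlib()
print(f"runtime zlib version: {version} -> {flavour}")
```

If the output still reports stock zlib after installation, the interpreter may be linked statically or may be resolving a different libz on its library path.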
