coreos-assembler breaks on ppc64le #194

Closed
manojnkumar opened this issue Nov 6, 2018 · 74 comments

@manojnkumar

manojnkumar commented Nov 6, 2018

This is the error I see when trying to run this on the ppc64le architecture.

[root@rhel-ocpapp1 coreos]# coreos-assembler init https://github.com/coreos/fedora-coreos-config
Trying to pull quay.io/cgwalters/coreos-assembler...Getting image source signatures
Copying blob sha256:e69e955c514f72cef9e9e7db408266a32ba48133f0e2a27d5e4b52ed1995d864
 85.99 MB / 85.99 MB [=====================================================] 10s
Copying blob sha256:a8a4d821e15658a276717ed65d865469e12dabbde7cf1871c755025118b58f3b
 160 B / 160 B [============================================================] 0s
Copying blob sha256:99a83b419d3cb4c515d32e0af4ca354f48dd4fdc40872f7bd0e4b939a04afee8
 2.53 KB / 2.53 KB [========================================================] 0s
Copying blob sha256:8faffdba319aeddefcd0a7598774221bf6ac3e7ba136bc147e0231a74e6166fc
 1.16 KB / 1.16 KB [========================================================] 0s
Copying blob sha256:a417b5e014d6bfb6b81ce057971d82f67739a0c9ebd47dd615939a63255cd7a7
 561.86 MB / 561.86 MB [===================================================] 56s
Copying blob sha256:acf8382988a511a2e77d66b6db5d6ab926198625b1d451a21040458b984184ba
 3.51 MB / 3.51 MB [========================================================] 0s
Copying blob sha256:b4118eadddddafb8e15b73ac929c6cb0e40a53036d49d87c611982c9c66adf96
 169.87 MB / 169.87 MB [===================================================] 19s
Copying blob sha256:134fd40227f49150593822289e2fd1efdf359ec2730aaf6ab01488001eab3b65
 3.32 KB / 3.32 KB [========================================================] 0s
Copying blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
 32 B / 32 B [==============================================================] 0s
Copying blob sha256:710caf03ad410b4ffd7ae487eb56e51e6a3b58f16a9711f551f03cbed8a40973
 137 B / 137 B [============================================================] 0s
Skipping fetch of repeat blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Skipping fetch of repeat blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Skipping fetch of repeat blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Skipping fetch of repeat blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Writing manifest to image destination
Storing signatures
standard_init_linux.go:190: exec user process caused "exec format error"
@manojnkumar changed the title from "coreos-assembler brbeaks on ppc64le" to "coreos-assembler breaks on ppc64le" on Nov 6, 2018
@dustymabe
Member

hi @manojnkumar. Thanks for trying out coreos-assembler!

This is probably not going to work out of the box right now because we haven't done any enablement work to get it to happen yet. One example would be the container runtime is pulling an x86_64 container from quay.io (we haven't built it for non x86 architectures yet).

In order to get this to work you'll need to build the container locally on ppc64le hardware first and then use the container you just built as the assembler container instead of quay.io/cgwalters/coreos-assembler. Does that make sense?

You'll probably hit other issues along the way, but let's document them and we can try to get it working.
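
The "exec format error" above is what a mismatch between the host and image architectures looks like. A minimal sketch of a pre-flight check, assuming the image has already been pulled; the helper names `goarch_from_uname` and `arch_matches` are hypothetical and not part of coreos-assembler:

```shell
# Map `uname -m` machine names to the architecture names used in
# container image metadata (Go-style GOARCH values).
goarch_from_uname() {
    case "$1" in
        x86_64)  echo amd64 ;;
        aarch64) echo arm64 ;;
        ppc64le) echo ppc64le ;;
        s390x)   echo s390x ;;
        *)       echo "$1" ;;   # unknown names are passed through unchanged
    esac
}

# Succeeds only when the image architecture matches the host.
# $1 = host machine name (from uname -m), $2 = image architecture.
arch_matches() {
    [ "$(goarch_from_uname "$1")" = "$2" ]
}

# Example (needs podman or docker and a pulled image; not run here):
#   img_arch=$(podman image inspect --format '{{.Architecture}}' quay.io/cgwalters/coreos-assembler)
#   arch_matches "$(uname -m)" "$img_arch" || echo "image is $img_arch, host is $(uname -m)"
```

On a ppc64le host with the x86_64 image from quay.io, the check would be expected to fail, pointing at the need for a locally built image.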

@manojnkumar
Author

I cloned this project and ran the coreos-assembler script directly and am now running into the following missing prerequisites:

/home/cloud-user//coreos-assembler/coreos-assembler init https://github.com/coreos/fedora-coreos-config
...
error: Failed to find expected dependencies: rpm-ostree dnf-utils fedpkg rpmdistro-gitoverlay distribution-gpg-keys python2-gobject-base python3-gobject-base podman buildah jq awscli ignition python3-dateutil

@dustymabe : Could you provide me instructions on building a ppc64le container?

@manojnkumar
Author

Should I be building on RHEL, or somewhere else (CentOS etc.)?

@dustymabe
Member

I cloned this project and ran the coreos-assembler script directly

the tool primarily expects to be run from within a container. So you'd need to clone this repo and then do something like podman build -t coreos-assembler . (you can use docker there if you prefer). Then set up your alias to use coreos-assembler rather than quay.io/cgwalters/coreos-assembler and run it again.

Does that make sense?

Should I be building on RHEL, or somewhere else (CentOS etc.)?

I run on Fedora usually, but it shouldn't matter that much I don't think because the environment is in the container.
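
As a rough sketch of the "build locally, then point the alias at the local image" step: the helper below only prints an alias definition. The function name `cosa_alias` is hypothetical, and the run flags are copied from the alias quoted later in this thread, so treat them as an assumption rather than a recommended set.

```shell
# Print an alias definition for a given runtime and locally built image tag.
# $1 = container runtime ("podman" or "docker"), $2 = image tag.
cosa_alias() {
    runtime=$1
    image=$2
    # \$(pwd) is kept literal so it is evaluated when the alias is used.
    printf "alias coreos-assembler='%s run --rm --net=host -ti --privileged --userns=host -v \$(pwd):/srv --workdir /srv %s'\n" "$runtime" "$image"
}

# Typical sequence from a checkout of this repo (not run here):
#   podman build -t coreos-assembler .
#   eval "$(cosa_alias podman coreos-assembler)"
```

Printing the alias instead of defining it directly makes it easy to inspect the exact command before using it in an interactive shell.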

@manojnkumar
Author

docker build fails at this stage:

=False
+ read line
+ mv /etc/yum.repos.d/fedora.repo.new /etc/yum.repos.d/fedora.repo
+ cat
+ cat
 ---> fe56827f4074
Removing intermediate container ca31db715dba
Step 5/12 : RUN ./build.sh install_rpms
 ---> Running in 72e39808c1a3
++ pwd
+ srcdir=/root/containerbuild
+ install_rpms
+ dnf -y distro-sync
Repository 'fahc' is missing name in configuration, using id.
Error: Failed to synchronize cache for repo 'updates'
The command '/bin/sh -c ./build.sh install_rpms' returned a non-zero code: 1

@dustymabe
Member

* Error: Failed to synchronize cache for repo 'updates'

That's typically an intermittent network failure. If you run it multiple times, does it continue to die in the same place?
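
One way to tell an intermittent failure from a persistent one is to retry the build a few times. A minimal sketch; the `retry` helper is hypothetical and not something coreos-assembler provides:

```shell
# Run a command up to N times, pausing between attempts.
# $1 = maximum attempts, $2 = delay in seconds, remaining args = command.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "failed after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example (not run here):
#   retry 3 10 docker build -t coreos-assembler .
```

If the build dies at the same step on every attempt, the cause is more likely a missing repo or package for the architecture than the network.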

cc @sinnykumari who might be able to try this on some ppc64le hardware she has.

@manojnkumar
Author

I tried this in several places and kept seeing similar errors. Finally I hit a server where I was able to get further:

Installed:
findutils.ppc64le 1:4.6.0-19.fc28

Complete!
++ grep -v '^#' /root/containerbuild/build-deps.txt
+ self_builddeps=golang
+ echo golang
+ grep -v '^#' /root/containerbuild/deps.txt
+ xargs dnf -y install
Repository 'fahc' is missing name in configuration, using id.
Failed to synchronize cache for repo 'dustymabe-ignition', disabling.
Failed to synchronize cache for repo 'walters-buildtools-fedora', disabling.
Last metadata expiration check: 0:00:00 ago on Wed Nov 7 19:05:26 2018.
No match for argument: rpmdistro-gitoverlay
Package python3-gobject-base-3.28.3-1.fc28.ppc64le is already installed, skipping.
Error: Unable to find a match
The command '/bin/sh -c ./build.sh install_rpms' returned a non-zero code: 123

@dustymabe
Member

findutils.ppc64le 1:4.6.0-19.fc28

hmm. this seems odd. Are you trying to build an f28 image? We switched to f29 some time ago.

* Failed to synchronize cache for repo 'dustymabe-ignition', disabling.

There aren't any ppc64le builds of this for f28, but there are for f29.

  Failed to synchronize cache for repo 'walters-buildtools-fedora', disabling.

@cgwalters can you enable ppc64le for your copr repo? https://copr.fedorainfracloud.org/coprs/walters/buildtools-fedora/

@dustymabe
Member

findutils.ppc64le 1:4.6.0-19.fc28

hmm. this seems odd. Are you trying to build an f28 image? We switched to f29 some time ago.

Actually, I was mistaken: this is the container build itself. I have opened #195 for us to switch to f29 for the assembler container.

@dustymabe
Member

For now, once colin has enabled ppc64le builds for his copr (wait to hear back from him here in this issue), you can switch to f29 in the Dockerfile and try again.

@manojnkumar
Author

Switching to f29 in the Dockerfile, hit the same issue:

Complete!

+ dnf -y install /usr/bin/xargs
Repository 'fahc' is missing name in configuration, using id.
fahc 4.5 kB/s | 1.5 kB 00:00
Copr repo for buildtools-fedora owned by walter 980 B/s | 341 B 00:00
Failed to synchronize cache for repo 'walters-buildtools-fedora', ignoring this repo.
Package findutils-1:4.6.0-21.fc29.ppc64le is already installed.
Dependencies resolved.
Nothing to do.
Complete!
++ grep -v '^#' /root/containerbuild/build-deps.txt
+ self_builddeps=golang
+ echo golang
+ grep -v '^#' /root/containerbuild/deps.txt
+ xargs dnf -y install
Repository 'fahc' is missing name in configuration, using id.
Copr repo for buildtools-fedora owned by walter 1.1 kB/s | 341 B 00:00
Failed to synchronize cache for repo 'walters-buildtools-fedora', ignoring this repo.
Last metadata expiration check: 0:00:04 ago on Wed Nov 7 21:22:02 2018.
No match for argument: rpmdistro-gitoverlay
Error: Unable to find a match
The command '/bin/sh -c ./build.sh install_rpms' returned a non-zero code: 123

@cgwalters
Member

This is the same issue as #30 (comment) which was for aarch64.

Although, since COPR does support ppc64le I added that to the buildroots and did the "update revision" dance and started a new build: https://copr.fedorainfracloud.org/coprs/walters/buildtools-fedora/build/820725/

Like I said in that ticket though...it may be simplest to drop rdgo from this container for now.

@dustymabe
Member

now that colin has enabled ppc64le and the build succeeded, you can try it again now and see where it breaks this time :)

@manojnkumar
Author

Thanks @cgwalters @dustymabe . It now fails here:

Package findutils-1:4.6.0-21.fc29.ppc64le is already installed.
Dependencies resolved.
Nothing to do.
Complete!
++ grep -v '^#' /root/containerbuild/build-deps.txt
+ self_builddeps=golang
+ echo golang
+ grep -v '^#' /root/containerbuild/deps.txt
+ xargs dnf -y install
Repository 'fahc' is missing name in configuration, using id.
Last metadata expiration check: 0:00:03 ago on Wed Nov 7 22:22:28 2018.
Error:
 Problem 1: conflicting requests
  - package rpm-ostree-2018.9.38-2c4231b3769a99af36fdd68d03c1b24468f3f5b2.d8a5bf5d7acae1bdd57001f2a39aa8e890f79c0f.fc28.x86_64 does not have a compatible architecture
  - nothing provides libpthread.so.0(GLIBC_2.2.5)(64bit) needed by rpm-ostree-2018.9.38-2c4231b3769a99af36fdd68d03c1b24468f3f5b2.d8a5bf5d7acae1bdd57001f2a39aa8e890f79c0f.fc28.x86_64
  - nothing provides rpm-ostree-libs(x86-64) = 2018.9.38-2c4231b3769a99af36fdd68d03c1b24468f3f5b2.d8a5bf5d7acae1bdd57001f2a39aa8e890f79c0f.fc28 needed by rpm-ostree-2018.9.38-2c4231b3769a99af36fdd68d03c1b24468f3f5b2.d8a5bf5d7acae1bdd57001f2a39aa8e890f79c0f.fc28.x86_64
 Problem 2: package skopeo-1:0.1.32-2.dev.gite814f96.fc29.ppc64le requires libostree-1.so.1()(64bit), but none of the providers can be installed
  - package skopeo-1:0.1.32-2.dev.gite814f96.fc29.ppc64le requires libostree-1.so.1(LIBOSTREE_2016.3)(64bit), but none of the providers can be installed
  - package skopeo-1:0.1.32-2.dev.gite814f96.fc29.ppc64le requires libostree-1.so.1(LIBOSTREE_2016.8)(64bit), but none of the providers can be installed
  - conflicting requests
  - package ostree-libs-2018.8-1.fc29.ppc64le is excluded
 Problem 3: package buildah-1.4-1.dev.git0a7389c.fc29.ppc64le requires libostree-1.so.1()(64bit), but none of the providers can be installed
  - package buildah-1.4-1.dev.git0a7389c.fc29.ppc64le requires libostree-1.so.1(LIBOSTREE_2016.3)(64bit), but none of the providers can be installed
  - package buildah-1.4-1.dev.git0a7389c.fc29.ppc64le requires libostree-1.so.1(LIBOSTREE_2016.8)(64bit), but none of the providers can be installed
  - conflicting requests
  - package ostree-libs-2018.8-1.fc29.ppc64le is excluded
 Problem 4: conflicting requests
  - package podman-1:0.10.1.3-4.gitdb08685.fc29.ppc64le requires libostree-1.so.1()(64bit), but none of the providers can be installed
  - package podman-1:0.10.1.3-4.gitdb08685.fc29.ppc64le requires libostree-1.so.1(LIBOSTREE_2016.3)(64bit), but none of the providers can be installed
  - package podman-1:0.10.1.3-4.gitdb08685.fc29.ppc64le requires libostree-1.so.1(LIBOSTREE_2016.8)(64bit), but none of the providers can be installed
  - package podman-1:0.10.1-1.gite4a1553.fc29.ppc64le requires libostree-1.so.1()(64bit), but none of the providers can be installed
  - package podman-1:0.10.1-1.gite4a1553.fc29.ppc64le requires libostree-1.so.1(LIBOSTREE_2016.3)(64bit), but none of the providers can be installed
  - package podman-1:0.10.1-1.gite4a1553.fc29.ppc64le requires libostree-1.so.1(LIBOSTREE_2016.8)(64bit), but none of the providers can be installed
  - package ostree-libs-2018.8-1.fc29.ppc64le is excluded
The command '/bin/sh -c ./build.sh install_rpms' returned a non-zero code: 123

@cgwalters
Member

The fahc repo is generated by https://ci.centos.org/view/Atomic/job/fahc-rdgo/ which is indeed currently x86_64 only.

It's tricky since today c-a tends to rely on rpm-ostree built from git master. This all loops back to sadly Koji (and to a lesser degree COPR) "owning" the multi-arch hardware; makes it harder for other projects that want to do CI or builds differently to use it too. This is https://github.com/projectatomic/rpmdistro-gitoverlay/blob/master/doc/reworking-fedora-releng.md#blend-upstream-testing-and-downstream-testing

Anyways so...you could probably remove the bits at the top that enable fahc, or we could switch to COPR I guess, although that takes us back to manual integration...

@manojnkumar
Author

manojnkumar commented Nov 7, 2018

@cgwalters: Commenting out fahc does get the build much further. It now fails at:

Step 7/13 : RUN ./build.sh make_and_makeinstall
 ---> Running in 62f640557d8c

++ pwd
+ srcdir=/root/containerbuild
+ make_and_makeinstall
+ test -d .git
+ mkdir -p /usr/app/
+ rsync -rlv /root/containerbuild/ostree-releng-scripts/ /usr/app/ostree-releng-scripts/
sending incremental file list
created directory /usr/app/ostree-releng-scripts
./

sent 39 bytes  received 72 bytes  222.00 bytes/sec
total size is 0  speedup is 0.00
+ test -f mantle/README.md
+ echo 'Run: git submodule update --init'
Run: git submodule update --init
+ exit 1
The command '/bin/sh -c ./build.sh make_and_makeinstall' returned a non-zero code: 1

@manojnkumar
Author

manojnkumar commented Nov 8, 2018

OK, finally I got the docker build to complete. I had to add the git submodule update --init to the Dockerfile. Here are my changes so far:

[root@rhel-ocpinfra1 coreos-assembler]# git diff
diff --git a/Dockerfile b/Dockerfile
index 0599f8c..be5c830 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,4 +1,4 @@
-FROM registry.fedoraproject.org/fedora:28
+FROM registry.fedoraproject.org/fedora:29
 WORKDIR /root/containerbuild
 
 
 # Only need a few of our scripts for the first few steps
@@ -8,6 +8,7 @@ RUN ./build.sh install_rpms
 
 # Ok copy in the rest of them for the next few steps
 COPY ./ /root/containerbuild/
+RUN git submodule update --init
 RUN ./build.sh make_and_makeinstall
 RUN ./build.sh configure_user
 
diff --git a/build.sh b/build.sh
index 64f0703..a0d0299 100755
--- a/build.sh
+++ b/build.sh
@@ -5,19 +5,6 @@ srcdir=$(pwd)
 
 configure_yum_repos() {
 
-    # Enable FAHC https://pagure.io/fedora-atomic-host-continuous
-    # so we have ostree/rpm-ostree git master for our :latest
-    # NOTE: The canonical copy of this code lives in rpm-ostree's CI:
-    # https://github.com/projectatomic/rpm-ostree/blob/d2b0e42bfce972406ac69f8e2136c98f22b85fb2/ci/build.sh#L13
-    # Please edit there first
-    echo -e '[fahc]\nmetadata_expire=1m\nbaseurl=https://ci.centos.org/artifacts/sig-atomic/fahc/rdgo/build/\ngpgcheck=0\n' > /etc/yum.repo
-    # Until we fix https://github.com/rpm-software-management/libdnf/pull/149
-    excludes='exclude=ostree ostree-libs ostree-grub2 rpm-ostree'
-    for repo in /etc/yum.repos.d/fedora*.repo; do
-        cat ${repo} | (while read line; do if echo "$line" | grep -qE -e '^enabled=1'; then echo "${excludes}"; fi; echo $line; done) > ${r
-        mv ${repo}.new ${repo}
-    done
-
     # enable `walters/buildtools-fedora` copr
        # pulled from https://copr.fedorainfracloud.org/coprs/walters/buildtools-fedora/repo/fedora-28/walters-buildtools-fedora-fedora-28.r
     cat > /etc/yum.repos.d/walters-buildtools-fedora-fedora-28.repo  <<'EOF'

@manojnkumar
Author

Also looking at the logs I need to add something here for ppc64le?

+ make
cd mantle && ./build ore kola kolet
Building ore
Building kola
Building amd64/kolet
Building arm64/kolet

@dustymabe
Member

hey @manojnkumar - I edited your earlier comment to make it easier to read. Here is a guide on markdown I've been using for github: https://guides.github.com/features/mastering-markdown/

Also looking at the logs I need to add something here for ppc64le?

I don't know if we've looked at building the mantle codebase on other architectures. @bgilbert, @arithx, can you comment on that?

@manojnkumar, for now you can comment out the cd mantle && ./build ore kola kolet as it's not explicitly needed for build.

@manojnkumar
Author

manojnkumar commented Nov 8, 2018

Thanks @dustymabe. I proceeded with the coreos-assembler init and build steps using the container I built for ppc64le. The build step fails with:

[root@rhel-ocpinfra1 coreos]# coreos-assembler build
Using manifest: /srv/src/config/manifest.yaml
bwrap: Creating new namespace failed, likely because the kernel does not support user namespaces. bwrap must be installed setuid on such systems.
error: bwrap test failed, see <https://github.com/projectatomic/rpm-ostree/pull/429>: Executing bwrap(true): Child process killed by signal 1

Any clues @cgwalters ?

@dustymabe
Member

Did you set up an alias for coreos-assembler ? If so what did you set it to? What is the OS that is on your laptop ?

@cgwalters
Member

What's your host system? Is it RHEL7? You need to enable user namespaces. See...

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html-single/getting_started_with_containers/index#user_namespaces_options

Specifically sysctl -w user.max_user_namespaces=255888 should be all you need on RHEL (or RHELAH) 7.5+.
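
A small sketch for checking the setting before and after the sysctl call. The helper name `userns_enabled` is hypothetical; the path argument exists only so the check can be exercised against a test file, and the `/etc/sysctl.d` file name in the comment is an arbitrary choice.

```shell
# Report whether user namespaces are enabled, based on the sysctl value.
# $1 = path to the sysctl file (defaults to the procfs location).
userns_enabled() {
    f=${1:-/proc/sys/user/max_user_namespaces}
    [ -r "$f" ] || return 1          # missing file: treat as unsupported
    [ "$(cat "$f")" -gt 0 ]          # 0 means user namespaces are disabled
}

# Example (as root; not run here):
#   userns_enabled || sysctl -w user.max_user_namespaces=255888
#   # To keep the setting across reboots:
#   echo 'user.max_user_namespaces=255888' > /etc/sysctl.d/90-userns.conf
```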

@manojnkumar
Author

alias coreos-assembler='docker run --rm --net=host -ti --privileged --userns=host -v $(pwd):/srv --workdir /srv coreos-assembler'

Yes RHEL 7.5.

[root@rhel-ocpinfra1 coreos-assembler]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

@cgwalters
Member

@manojnkumar Please try the sysctl command as root in #194 (comment)

It's quite safe in general, most particularly if you don't have any potentially hostile users with ssh access to the system - user namespaces are disabled by default on RHEL7 out of conservatism.

@manojnkumar
Author

Doesn't seem to help @cgwalters. Have been trying to do the rest of the steps in the link you provided, and I get into a circular issue with the userns setting in /etc/sysconfig/docker and the lines in /etc/subuid

[root@manoj-ocp-service-catalog coreos]# coreos-assembler build
Using manifest: /srv/src/config/manifest.yaml
bwrap: Creating new namespace failed, likely because the kernel does not support user namespaces. bwrap must be installed setuid on such systems.
error: bwrap test failed, see coreos/rpm-ostree#429: Executing bwrap(true): Child process killed by signal 1
[root@manoj-ocp-service-catalog coreos]# sysctl -w user.max_user_namespaces=255888
user.max_user_namespaces = 255888
[root@manoj-ocp-service-catalog coreos]# coreos-assembler build
Using manifest: /srv/src/config/manifest.yaml
error: This command requires root privileges

@bgilbert
Contributor

bgilbert commented Nov 8, 2018

I'm not sure if ore and kola currently build on non-x86, but if not, the fixes should be minor. And yes, we'd need to add a kolet build for ppc64le.

@manojnkumar
Author

manojnkumar commented Nov 9, 2018

@cgwalters : When I try to pull a container built on another system onto a system that has userns enabled, I see this error. I seem to be stuck, with no way to proceed to the build phase:

56bfe5829eb1: Download complete
0bc8291d9890: Download complete
failed to register layer: Error processing tar file(exit status 1): Container ID 1000 cannot be mapped to a host ID

@cgwalters
Member

I haven't played much with userns in Docker...can you try turning it off? To clarify, are you using the upstream Docker or the one in RHEL Extras?

@sinnykumari
Contributor

error: --ex-unified-core requires a bare-user repository

weird.. could you try to delete your working directory (/srv/ for most of us) and try again?

I tried this after cleaning up /srv/ and then got the error. Note that I am using rpm-ostree-2018.9-3.fc28.1 on ppc64le. If I switch back to rpm-ostree-2018.9-3.fc28.1 on x86_64, I get the same error there as well. That's why I believe this issue should disappear on ppc64le once we start using the latest rpm-ostree (coreos/rpm-ostree#1657 looks to me like one of the related changes).

@jlebon
Member

jlebon commented Nov 20, 2018

For the purposes of the exercise for now, I'd recommend just building rpm-ostree from source in the container. I can make a ppc64le scratch build in Koji if you'd like? Let me do that.

@jlebon
Member

jlebon commented Nov 20, 2018

@jlebon
Member

jlebon commented Nov 20, 2018

Ahh heh, it needs a newer OSTree as well. Hmm unless... yes I can patch out that dependency. New build: https://koji.fedoraproject.org/koji/taskinfo?taskID=31022867

@sinnykumari
Contributor

Ahh heh, it needs a newer OSTree as well. Hmm unless... yes I can patch out that dependency. New build: https://koji.fedoraproject.org/koji/taskinfo?taskID=31022867

Thanks a lot!

@sinnykumari
Contributor

I have created a repo with the latest rpm-ostree scratch-build RPMs that @jlebon built; it is available at https://sinnykumari.fedorapeople.org/custom-build/rpm-ostree/f28/ . I will update it with more recent builds if needed (until we have a better solution).

@cgwalters
Member

I'm elaborating on this point here mainly because I want to be able to link to this thread later as a rationale elsewhere.

To build on/rephrase #194 (comment) a bit more:

Loosely coupled components in general are good. I think it's worked out well to have rpm-ostree separate from libostree, which is in turn separate from e.g. grub, and on a higher level an operator/ansible/gnome-software or whatever.

However: sometimes we really do want to - for a period of time - tightly couple selected components and iterate on them quickly together to solve a problem. This definitely has happened a lot with rpm-ostree/libostree, and the fact that rpm-ostree can (if we need to) depend on libostree git master in CI has been absolutely critical.

On the flip side...for example, I think the libostree/grub border is too strong, and if we had the capability to e.g. fork grub for a time and tightly couple it with libostree it would have helped solve some problems.

The same thing will happen with coreos-assembler + [orchestrating pipeline]. Ideally they are loosely coupled, but sometimes you just need to make coordinated changes to both.

What I'm saying here is this isn't just about coreos-assembler + rpm-ostree - it's about how we think about software delivery, packaging, and the Koji model of "components can only use other's releases" is too limiting and also discourages people from hacking on higher level or lower level parts.

@dustymabe
Member

the Koji model of "components can only use other's releases" is too limiting and also discourages people from hacking on higher level or lower level parts.

In coreos/fedora-coreos-tracker#84 I'm advocating for builds of upstream dev branches, not just releases. I think we are agreeing with one another. Am I mistaken?

@sinnykumari
Contributor

Another issue, found during coreos-assembler build in the "Preparing kernel" phase:

dracut: *** Including module: ignition ***
dracut-install: Failed to find module 'qemu_fw_cfg'
dracut: FAILED:  /usr/lib/dracut/dracut-install -D /tmp/dracut/dracut.3ymflN/initramfs --kerneldir /lib/modules/4.19.4-300.fc29.ppc64le/ -m qemu_fw_cfg
dracut: installkernel failed in module ignition
error: Finalizing rootfs: During kernel processing: Executing bwrap(rpmostree-dracut-wrapper): Child process killed by signal 1

From PR coreos/ignition-dracut#25 , it seems install of qemu_fw_cfg module was made explicit. qemu_fw_cfg module is provided by kernel-core sub-package which is available for x86_64 but not for ppc64le .

Is this module essential or we can make it on demand loading depending upon availability?
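
To see whether a given kernel package ships the module at all, one can look for the module file under the kernel's modules directory. A minimal sketch; `has_module` is a hypothetical helper, and it only finds loadable module files, so a driver compiled into the kernel would not show up.

```shell
# Succeed if a kernel module file exists under a modules directory.
# $1 = modules directory (e.g. /lib/modules/$(uname -r)), $2 = module name.
has_module() {
    # Matches plain and compressed modules (.ko, .ko.xz, ...).
    find "$1" -name "$2.ko*" 2>/dev/null | grep -q .
}

# Example (not run here):
#   has_module "/lib/modules/$(uname -r)" qemu_fw_cfg \
#       || echo "qemu_fw_cfg is not shipped as a module for this kernel"
```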

@bgilbert
Contributor

Is this module essential or we can make it on demand loading depending upon availability?

It's currently needed for Ignition to read userdata from QEMU; see coreos/ignition#656.

@dustymabe
Member

@sinnykumari
qemu_fw_cfg module is provided by kernel-core sub-package which is available for x86_64 but not for ppc64le

does that mean the functionality just doesn't exist on ppc64le ?

@bgilbert
It's currently needed for Ignition to read userdata from QEMU; see coreos/ignition#656.

Might be worth investigating if SMBIOS OEM string would work on ppc64le

@sinnykumari
Contributor

@sinnykumari
qemu_fw_cfg module is provided by kernel-core sub-package which is available for x86_64 but not for ppc64le

does that mean the functionality just doesn't exist on ppc64le ?

May have to check with the Fedora kernel folks. https://github.com/torvalds/linux/blob/master/drivers/firmware/qemu_fw_cfg.c#L9 talks about x86_64 and arm but mentions nothing about Power.

@bgilbert
It's currently needed for Ignition to read userdata from QEMU; see coreos/ignition#656.

Might be worth investigating if SMBIOS OEM string would work on ppc64le

@jlebon
Member

jlebon commented Nov 26, 2018

Well, we don't really need qemu_fw_cfg. It just makes it really convenient to test locally, right?

From PR coreos/ignition-dracut#25 , it seems install of qemu_fw_cfg module was made explicit.

Note this logic was tweaked in coreos/ignition-dracut#28 to support the feature baked in. See also specifically coreos/ignition-dracut#28 (comment) re. not necessarily needing it.

Might be worth investigating if SMBIOS OEM string would work on ppc64le

Ahh that'd be neat. Otherwise I guess we could recommend config drives or coreos.config.url for local testing on those platforms? (And e.g. adapt coreos-assembler run as well).

@bgilbert
Contributor

It turns out that SMBIOS is only available on x86 and ARM. fw_cfg is available on more architectures, but not ppc64:

https://github.com/torvalds/linux/blob/ef78e5ec9214376c5cb989f5da70b02d0c117b66/drivers/firmware/Kconfig#L199-L201

Well, we don't really need qemu_fw_cfg. It just makes it really convenient to test locally, right?

At present, if we don't have it, there's no reasonable way to pass an Ignition config to a ppc64 QEMU VM (coreos/ignition#666). (It'd have to be done by netbooting the guest or modifying the disk image.) That would also affect production virtualization.

Otherwise I guess we could recommend config drives or coreos.config.url for local testing on those platforms?

By policy, Ignition doesn't support config drives.
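
For context on what fw_cfg provides where it is available: on x86_64, an Ignition config is typically handed to a QEMU guest through a fw_cfg entry. A sketch, assuming the `opt/com.coreos/config` key that Ignition's QEMU support reads; the helper name and the disk image name are placeholders.

```shell
# Build the QEMU option that exposes an Ignition config through fw_cfg.
# $1 = path to the Ignition config file.
ignition_fw_cfg_arg() {
    printf '%s\n' "-fw_cfg name=opt/com.coreos/config,file=$1"
}

# Example on x86_64 (not run here; fcos.qcow2 is a placeholder image name):
#   qemu-system-x86_64 -m 2048 -drive if=virtio,file=fcos.qcow2 \
#       $(ignition_fw_cfg_arg config.ign)
```

It is this delivery path that has no counterpart on ppc64 guests when the qemu_fw_cfg module is missing.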

@stellirin

qemu_fw_cfg also does not exist in the kernel-core package on ARM, and AFAICT the default ARM kernel does not enable virtualization at all.

I hit the same error last week trying to build armhfp. I haven't reported it yet because I was traveling; I wanted to dig a little deeper once back home.

@cgwalters
Member

modifying the disk image

This is cheap with qcow2 backing files (or reflinks if available). The main thing is to make it not an ergonomic hit, but I see no issue with unilaterally changing e.g. coreos-assembler run to do it. However there are other projects that use it too, like the openshift installer on libvirt.

We may end up needing to ship a script to inject Ignition configs, which would be used by run, and allow projects like the installer to vendor it?
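
The "cheap with qcow2 backing files" point can be sketched as follows: create a copy-on-write overlay on top of a pristine base image and modify only the overlay. The helper `overlay_cmd` is hypothetical and merely prints the qemu-img command; the image names are placeholders, and paths without spaces are assumed.

```shell
# Print the qemu-img command that creates a copy-on-write overlay on top of
# a base image. $1 = base image, $2 = overlay file to create.
overlay_cmd() {
    # -b names the backing file, -F its format, -f the overlay's own format.
    printf 'qemu-img create -f qcow2 -F qcow2 -b %s %s\n' "$1" "$2"
}

# Example (not run here):
#   eval "$(overlay_cmd fcos-base.qcow2 scratch.qcow2)"
#   # inject the config into scratch.qcow2, then boot it; the base stays untouched
```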

@bgilbert
Contributor

We can ship a wrapper script (as we do in Container Linux as a convenience) but that doesn't really help users who want to run VMs directly in existing systems, e.g. libvirt. In CL we've generally felt that the right approach is to introduce proper userdata support into the various virtualization systems, rather than forcing users to go through an extra mangling step between "download VM image" and "run VM in the usual way".

In this instance, see coreos/ignition#656 (comment).

@manojnkumar
Author

@sinnykumari : How far do you get with building for ppc64le? And what changes did you have to make? I am still not getting any further with the changes I had staged. Are any of your changes worth pushing?

@sinnykumari
Contributor

@sinnykumari : How far do you get with building for ppc64le? And what changes did you have to make? I am still not getting any further with the changes I had staged. Are any of your changes worth pushing?

The arch-specific changes that can be pushed upstream have already been made:

Other than that, there are a few changes I made locally to see how far we can get with the build on ppc64le, but these are not meant to be merged officially:

After making the above changes, coreos-assembler build fails during virt-install; the full logs are available here.

Please note that my last attempt was around 1 week back and I didn't get time to try further(hoping to get back to it soon).

@manojnkumar
Author

@sinnykumari : I am also hitting build failures. I could not find your pastebin to compare.

Also your rpm-ostree comment indicates you are trying f28, whereas the ignition comment indicates f29. I am having issues with both.

@cgwalters
Member

qemu_fw_cfg module is provided by kernel-core sub-package which is available for x86_64 but not for ppc64le

The QEMU Ignition config is a very nice to have for development but not currently a targeted production path. Production would be bare metal (PXE HTTP fetch of Ignition config or mounting the /boot partition and looking for Ignition there).

And the other production path would be OpenStack, fetching Ignition over user data just like happens on x86_64.

@bgilbert
Contributor

The QEMU Ignition config is a very nice to have for development but not currently a targeted production path. Production would be bare metal (PXE HTTP fetch of Ignition config or mounting the /boot partition and looking for Ignition there).

And the other production path would be OpenStack, fetching Ignition over user data just like happens on x86_64.

kola support should be a prerequisite for supporting an architecture. Running kola tests in OpenStack meets this requirement, but not in a very satisfactory way; individual developers may not have access to a ppc64le OpenStack cluster.

@cgwalters
Member

This is fixed right?

@dustymabe
Member

This is fixed right?

@jcajka might be able to say

@sinnykumari
Contributor

@jcajka Can you confirm?

@jcajka
Collaborator

jcajka commented May 29, 2019

@cgwalters @sinnykumari I would say that running init, fetch and build should be working. I'm currently working on what comes beyond that: the run PR is #408, and I have started to look at the oscontainer build.
I would defer declaring this fixed to you and @manojnkumar.

@cgwalters
Member

Should be long since fixed.
