jinterface: Build deterministic OtpErlang.jar #5580

Open
wants to merge 1 commit into
base: maint

avtobiff
Contributor

@avtobiff avtobiff commented Jan 5, 2022

Handroll lib/jinterface/priv/OtpErlang.jar to support a deterministic
build.

Method used: Manually craft META-INF/MANIFEST.MF, touch all JAR file
contents with the same timestamp across builds, generate a deterministic
ZIP file (the JAR file).
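
The three steps above could be sketched roughly as follows. This is only an illustration, not the actual Makefile change in this PR: the function name `build_deterministic_jar`, the fixed timestamp, and the use of Info-ZIP `zip` are assumptions made for the example.

```shell
# Hypothetical helper: pack the tree under $1 into the JAR $2 reproducibly.
# Assumes Info-ZIP `zip` and a POSIX `touch`; the timestamp is an arbitrary
# fixed value chosen for this sketch. $2 should have an extension such as
# .jar, because zip appends ".zip" to names without one.
build_deterministic_jar() {
    src=$1
    out=$2
    stamp=202201050000.00          # touch -t format: CCYYMMDDhhmm.SS

    # zip runs inside $src below, so make the output path absolute.
    case $out in
        /*) ;;
        *) out=$PWD/$out ;;
    esac

    # 1. Hand-written manifest containing only the required version-info.
    mkdir -p "$src/META-INF"
    printf 'Manifest-Version: 1.0\r\n\r\n' > "$src/META-INF/MANIFEST.MF"

    # 2. Give every file and directory the same modification time.
    #    ZIP stores local time, so the time zone is pinned as well.
    TZ=UTC find "$src" -exec touch -t "$stamp" {} +

    # 3. Add the files in a locale-independent sorted order; -X leaves out
    #    extra file attributes such as uid/gid and extended timestamps.
    rm -f "$out"
    (cd "$src" && find . -type f | LC_ALL=C sort | TZ=UTC zip -q -X "$out" -@)
}
```

Calling `build_deterministic_jar classes OtpErlang.jar` twice on identical class files should then yield byte-identical archives. Note that step 2 changes the modification times of the files in place.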

Deterministic builds are not supported on the win32 target for now.

See #4417

Signed-off-by: Per Andersson [email protected]

@avtobiff avtobiff changed the base branch from master to maint January 5, 2022 01:51
@avtobiff avtobiff force-pushed the avtobiff/jinterface/build-deterministic-jar branch from a93d9c1 to fb5644f Compare January 5, 2022 04:05
@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Jan 8, 2022
@sverker
Contributor

sverker commented Jan 10, 2022

With my limited java knowledge, it looks a little bit iffy to create your own jar file like this with a hard-coded manifest. Would it not be less hackish to let jar create the .jar file, then unzip it, touch --date all files within, and then (re)zip it?

@avtobiff
Contributor Author

Before commenting on the method suggested in this PR: it is possible
to set the timestamps of the files to a known value instead of
something arbitrary. The environment variable SOURCE_DATE_EPOCH [0]
standardises this.

Is this preferable? If so, reproducible builds should probably be
documented somewhere.

[0] https://reproducible-builds.org/docs/source-date-epoch/
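
A rough sketch of what honouring SOURCE_DATE_EPOCH could look like: use the variable when it is set and fall back to a hardcoded value otherwise. The function names and the fallback value are hypothetical, and `date -d @N` assumes GNU date.

```shell
# Hypothetical fallback used when SOURCE_DATE_EPOCH is not set
# (2022-01-05 00:00:00 UTC, chosen arbitrarily for this sketch).
DEFAULT_EPOCH=1641340800

# Print the epoch that build outputs should be stamped with.
build_epoch() {
    printf '%s\n' "${SOURCE_DATE_EPOCH:-$DEFAULT_EPOCH}"
}

# Convert an epoch to the CCYYMMDDhhmm.SS form accepted by `touch -t`.
# Assumes GNU date; other date implementations use different options.
epoch_to_touch_stamp() {
    date -u -d "@$1" +%Y%m%d%H%M.%S
}
```

A build rule could then run `TZ=UTC touch -t "$(epoch_to_touch_stamp "$(build_epoch)")" FILE` on each file that goes into the archive.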

With my limited java knowledge, it looks a little bit iffy to create your own jar file like this with a hard-coded manifest. Would it not be less hackish to let jar create the .jar file, then unzip it, touch --date all files within, and then (re)zip it?

The manifest generated by jar on my machine is

Manifest-Version: 1.0
Created-By: 18-ea (Debian)

The only additional thing that would be included is Created-By (which is
generated by the jar tool, showing what java implementation was used
to generate the jar) [1].

A default manifest only includes Manifest-Version and Created-By [2].

However, the manifest specification requires only a main-section,
which in turn requires only version-info; all other attributes are
optional. [3]

manifest-file: main-section newline *individual-section
main-section: version-info newline *main-attribute
version-info: Manifest-Version : version-number
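
Under that grammar, a manifest whose only content is a Manifest-Version line is already complete. A minimal sketch of checking that a manifest starts with version-info (the function name is hypothetical, and only the first line is inspected):

```shell
# Hypothetical check: succeed if the first line of manifest file $1 is a
# Manifest-Version header, i.e. the main section starts with version-info.
# A trailing CR is removed first, since manifests may use CR LF line ends.
manifest_has_version_info() {
    head -n 1 "$1" | tr -d '\r' |
        grep -Eq '^Manifest-Version: [0-9]+(\.[0-9]+)*$'
}
```

For example, a file written with `printf 'Manifest-Version: 1.0\r\n\r\n'` passes, while one that starts with `Created-By:` does not.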

My reasoning was that Created-By is not important information to
convey, but I might be wrong. It seemed wasteful to first generate a
jar (i.e. a zip file), unzip it, and then recreate it, when a jar file
can be created from scratch without an extra jar/unzip step.

What do others do?

Created-By is stripped by the reproducible-build-maven-plugin. [4]

The method suggested in this PR was inspired by Gary Rowe's blogpost
on How to create a deterministic JAR. [5]

I don't think creating a JAR file with jar, unzipping, fixing timestamps,
then zipping it again will add much.

Generating the jar file with zip will not add the JAR file magic (0xCAFE),
so the file will effectively be a plain zip file, whereas a file generated
by jar has the magic and is identified as a JAR:

$ file OtpErlang.jar
OtpErlang.jar: Zip archive data, at least v1.0 to extract, compression method=store
$ file OtpErlang.jar.jar
OtpErlang.jar.jar: Java archive data (JAR)
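
The marker can also be checked by hand. The sketch below assumes that it is stored as an extra field with header ID 0xCAFE at the very start of the first entry's extra data, which is where the jar tool is understood to put it; the function name is hypothetical.

```shell
# Hypothetical check: succeed if the first ZIP entry of file $1 starts its
# extra data with header ID 0xCAFE (bytes FE CA, little-endian).
has_jar_magic() {
    f=$1
    # The file must begin with a local file header signature, "PK\003\004".
    [ "$(od -An -tx1 -N4 "$f" | tr -d ' \n')" = 504b0304 ] || return 1

    # Offsets 26..29 hold the name length and the extra-field length,
    # each a little-endian 16-bit value.
    set -- $(od -An -tu1 -j26 -N4 "$f")
    [ $# -eq 4 ] || return 1
    name_len=$(( $1 + $2 * 256 ))
    extra_len=$(( $3 + $4 * 256 ))
    [ "$extra_len" -ge 4 ] || return 1

    # The extra data follows the 30-byte fixed header and the file name.
    set -- $(od -An -tu1 -j$((30 + name_len)) -N2 "$f")
    [ $# -eq 2 ] && [ "$1" -eq 254 ] && [ "$2" -eq 202 ]
}
```

With the two files above, `has_jar_magic OtpErlang.jar.jar` would be expected to succeed and `has_jar_magic OtpErlang.jar` to fail.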

However, this doesn't seem to bother either jar or javac, which both
understand it fine

$ jar -tf OtpErlang.jar
META-INF/
META-INF/MANIFEST.MF
com/ericsson/otp/erlang/
com/ericsson/otp/erlang/OtpMD5.class
(...)
$ cat Test.java
import com.ericsson.otp.erlang.*;
public class Test { public static void main(String args[]) { ; } }
$ javac -classpath OtpErlang.jar Test.java
$ echo $?
0
$ file Test.class
Test.class: compiled Java class data, version 62.0

Trying to use an empty (i.e. corrupted) jar file generates the
following error

$ touch Empty.jar
$ javac -classpath Empty.jar Test.java
error: error reading Empty.jar; zip file is empty

Another option is to use strip-nondeterminism [6] if available,
which produces essentially the same result. It adds another
build dependency, though. The JAR file magic will be present and
the MANIFEST.MF will be kept as generated by jar.

[1] https://docs.oracle.com/en/java/javase/17/docs/specs/jar/jar.html#main-attributes
[2] https://docs.oracle.com/javase/tutorial/deployment/jar/defman.html
[3] https://docs.oracle.com/en/java/javase/17/docs/specs/jar/jar.html#manifest-specification
[4] https://github.com/Zlika/reproducible-build-maven-plugin/blob/master/src/main/java/io/github/zlika/reproducible/ManifestStripper.java#L24-L32
[5] https://gary-rowe.com/2013-08-08-how-to-create-a-deterministic-jar/
[6] https://reproducible-builds.org/tools/

@sverker
Contributor

sverker commented Jan 11, 2022

Ok, I yield about handrolled manifest.

However, I discovered a disadvantage with the current use of touch --date on the existing class files. It disables fast incremental builds. Repeated invocations of make in lib/jinterface/ will recompile all java files as they all look newer than the "old" class files.

@sverker sverker added testing currently being tested, tag is used by OTP internal CI and removed testing currently being tested, tag is used by OTP internal CI labels Jan 11, 2022
@avtobiff
Contributor Author

Ok, I yield about handrolled manifest.

I know it was a handful, but I had to do all this research myself,
so I might as well present it. :)

I would like to keep the jar file magic; I am not particularly content
with the jar file being identified as a zip file. I'll see if there is
another way, perhaps creating a Java program which uses
java.util.jar.

However, I discovered a disadvantage with the current use of touch --date on the existing class files. It disables fast incremental builds. Repeated invocations of make in lib/jinterface/ will recompile all java files as they all look newer than the "old" class files.

I'll investigate if the jar can be assembled so fast incremental builds
are not affected. The jar contents can perhaps be copied to a temporary
build directory where the timestamps are fixed.
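
A sketch of that staging idea, assuming a POSIX `cp`, `find` and `touch`; the function name and argument order are hypothetical.

```shell
# Hypothetical helper: copy tree $1 into a fresh staging directory $2 and
# set every copied entry to timestamp $3 (touch -t format, CCYYMMDDhhmm.SS).
# The originals are never touched, so make still sees their real mtimes.
stage_with_fixed_mtime() {
    src=$1
    stage=$2
    stamp=$3
    rm -rf "$stage"
    mkdir -p "$stage"
    cp -R "$src/." "$stage/"
    TZ=UTC find "$stage" -exec touch -t "$stamp" {} +
}
```

The archive would then be assembled from the staging directory only, for example after `stage_with_fixed_mtime classes jar-stage 202201050000.00`.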

I would also like to raise that I do not have any possibility to test this on
mac, win32, or e.g. any BSD. I know date can be different across
platforms. Is it ok to disable reproducible builds on win32?

What about the outstanding question about SOURCE_DATE_EPOCH?
Should that be used, if set, instead of a hardcoded value? If so, where
should this documentation go?

@sverker
Contributor

sverker commented Jan 11, 2022

I'm leaning towards using strip-nondeterminism for this. It retains the jar file magic and it doesn't touch the class files.

It could be an optional build dependency; if you want deterministic builds of jinterface, install strip-nondeterminism.

@avtobiff
Contributor Author

avtobiff commented Jan 12, 2022

I can redo this PR to use strip-nondeterminism.

Maybe it should still be configurable to use it or not though?
I have it installed but might not want to uninstall it just to skip
a deterministic build.

Is it ok to check if SOURCE_DATE_EPOCH is set, and if it
is then use

strip-nondeterminism --timestamp $SOURCE_DATE_EPOCH OtpErlang.jar
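
As a sketch of how such an optional step might be wired up (the function name is hypothetical, and treating a missing tool as a silent skip is an assumption, not something decided in this thread):

```shell
# Hypothetical build step: normalise JAR $1 only when SOURCE_DATE_EPOCH is
# set and strip-nondeterminism is installed; otherwise leave it untouched.
maybe_strip_nondeterminism() {
    jar_file=$1
    if [ -z "${SOURCE_DATE_EPOCH:-}" ]; then
        return 0                      # deterministic build not requested
    fi
    if ! command -v strip-nondeterminism >/dev/null 2>&1; then
        echo "strip-nondeterminism not found, skipping" >&2
        return 0                      # optional dependency is missing
    fi
    strip-nondeterminism --timestamp "$SOURCE_DATE_EPOCH" "$jar_file"
}
```

Keying the step on SOURCE_DATE_EPOCH means that having the tool installed does not by itself change the build output.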

@sverker
Contributor

sverker commented Jan 13, 2022

Maybe it should still be configurable to use it or not though?
I have it installed but might not want to uninstall it just to skip
a deterministic build.

Why would you want to avoid deterministic build?

I don't really see the point with SOURCE_DATE_EPOCH here. Are the file timestamps in the jar file really used for anything? But if you see some use for it, ok.

@KennethL
Contributor

KennethL commented Jan 14, 2022 via email

@avtobiff
Contributor Author

When I Googled around I saw https://bugs.openjdk.java.net/browse/JDK-8276667, which seems to be an update of the jar command to respect SOURCE_DATE_EPOCH in order to support reproducible builds of jar archives.

We could wait for that and this becomes a documentation
issue instead.

@avtobiff
Contributor Author

It seems like jar will get a new option --date to set timestamp
on archived files. [0] If I understood correctly it will be released
with OpenJDK 17.0.3 in April 2022. [1]

There are more fixes related to [0], e.g. archive file ordering and
zip archive generation which will further help deterministic jar
generation.

I'll rework this PR to use jar --date if the SOURCE_DATE_EPOCH
environment variable is set, once a released OpenJDK jar
supports it.
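
A sketch of what that could look like, assuming a jar tool that already supports `--date` with an ISO-8601 timestamp carrying an explicit offset, and GNU date for the conversion. Both function names are hypothetical.

```shell
# Convert an epoch to an ISO-8601 timestamp with an explicit UTC offset.
# Assumes GNU date.
epoch_to_iso8601() {
    date -u -d "@$1" +%Y-%m-%dT%H:%M:%S+00:00
}

# Hypothetical wrapper: pass --date only when SOURCE_DATE_EPOCH is set.
# Arguments after the output file are handed to jar unchanged
# (e.g. -C classes .).
create_jar() {
    out=$1
    shift
    if [ -n "${SOURCE_DATE_EPOCH:-}" ]; then
        jar --create --file="$out" \
            --date="$(epoch_to_iso8601 "$SOURCE_DATE_EPOCH")" "$@"
    else
        jar --create --file="$out" "$@"
    fi
}
```

For example, `SOURCE_DATE_EPOCH=1641340800 create_jar OtpErlang.jar -C classes .` would pass `--date=2022-01-05T00:00:00+00:00` to jar.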

[0] https://bugs.openjdk.java.net/browse/JDK-8276766
[1] https://wiki.openjdk.java.net/display/JDKUpdates/JDK+17u

@KennethL KennethL added the stalled waiting for input by the Erlang/OTP team label Feb 2, 2023
4 participants