
Expand ARM Architecture Compatibility #5954

Open
halibobo1205 opened this issue Aug 15, 2024 · 44 comments
@halibobo1205
Contributor

halibobo1205 commented Aug 15, 2024

Background

Java-Tron currently supports only the x86 architecture. However, ARM has gained significant traction recently, particularly in cloud computing and mobile devices. ARM processors are known for their energy efficiency and cost-effectiveness, making them increasingly popular in data centers and edge computing. It would be valuable to have the option of running Java-Tron on ARM.

Key developments in ARM architecture:

ARM advantages:

Related Issues and PRs

Scope of Impact

  • Build and deployment processes
  • Core application code
  • Third-party dependencies
  • Development and testing environments
@angrynurd

angrynurd commented Aug 15, 2024

I am totally in favor of extending ARM architecture compatibility. This will allow Java-Tron to run on more platforms and take advantage of the benefits of the ARM architecture, such as higher energy efficiency and lower cost.
In my opinion, we can start with the following:

  1. Prioritize ARM support for key dependencies: For example, RocksDB/LevelDB is an important database component in Java-Tron and it is critical to ensure its compatibility on ARM.
  2. Establish an ARM test environment: We need to establish a dedicated ARM test environment to ensure the stability and performance of Java-Tron on ARM.
  3. Collaborate with the community: We can work with the community to solve ARM compatibility issues and share experiences and best practices.

@tomatoishealthy
Contributor

It sounds great, but I am a novice in ARM architecture and curious about the challenges of supporting it.

Could you provide something like a task list going forward? That would make the current status and remaining challenges easier to follow.

@halibobo1205
Contributor Author

halibobo1205 commented Aug 15, 2024

Here are some common considerations:

Important

  1. JDK version compatibility
    Ensure the JDK version supports the ARM architecture, and consider using ARM-optimized JDK distributions.
  • Linux gained support in JDK 9 (non-LTS) via JEP 237
  • Windows gained support in JDK 16 (non-LTS) via JEP 388
  • macOS gained support in JDK 17 (LTS) via JEP 391
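
For quick verification, the running JVM's architecture can be checked at runtime (a minimal sketch; reported `os.arch` values vary by vendor, e.g. `aarch64` on most ARM64 JDKs, `amd64` or `x86_64` on x86-64):

```java
public class ArchCheck {
    public static void main(String[] args) {
        // "aarch64" on most ARM64 JDKs; "amd64" or "x86_64" on x86-64
        String arch = System.getProperty("os.arch");
        // "64" on 64-bit JVMs (vendor-specific property, widely supported)
        String dataModel = System.getProperty("sun.arch.data.model");
        System.out.println("os.arch = " + arch + ", data model = " + dataModel);
    }
}
```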

Important

2. Native code
Components that use JNI (Java Native Interface) or other native code need to be recompiled or upgraded for the ARM architecture.

  • LevelDBJni
  • RocksDBJni
  • zksnark-java-sdk

Tip

3. Endianness
x86 is little-endian, while ARM processors are bi-endian and may run big-endian in some configurations.
Check whether any operations in the code (such as the TVM) depend on a specific byte order, especially when handling binary data.
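
In practice, pure-Java serialization is largely insulated from this: `ByteBuffer` and `DataInput`/`DataOutput` default to big-endian on every platform, so only code that explicitly opts into the native byte order can diverge between architectures. A minimal sketch illustrating the distinction:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        byte[] raw = {0x00, 0x00, 0x00, 0x01};

        // ByteBuffer defaults to BIG_ENDIAN on every platform,
        // so this reads 1 on x86 and ARM alike.
        int big = ByteBuffer.wrap(raw).getInt();

        // Only code that explicitly opts into a different order
        // can behave differently across architectures.
        int little = ByteBuffer.wrap(raw)
                .order(ByteOrder.LITTLE_ENDIAN).getInt();

        System.out.println(big);    // 1
        System.out.println(little); // 16777216
        System.out.println(ByteOrder.nativeOrder());
    }
}
```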

Tip

4. Memory alignment
ARM may have different memory-alignment requirements than x86.
Check for code (such as the TVM) that assumes specific memory alignments.

Tip

5. Atomic operations and concurrency
Some atomic operations (e.g., in the TVM) may be implemented differently on different architectures.
Review concurrent code to ensure it works correctly on ARM as well.

Caution

6. Floating-point arithmetic
ARM and x86 may have subtle differences in floating-point precision and behavior.
For applications that rely on precise floating-point calculations, comprehensive testing is necessary.
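
A one-ulp difference is enough to break consensus here because the result is ultimately truncated to a long, and adjacent doubles can straddle an integer boundary. A minimal illustration (not java-tron code):

```java
public class UlpDemo {
    public static void main(String[] args) {
        double d = 2.0;
        // The next representable double below 2.0
        double below = Math.nextDown(d); // 1.9999999999999998

        // Truncation to long straddles the integer boundary:
        System.out.println((long) d);     // 2
        System.out.println((long) below); // 1

        // StrictMath is specified to be bit-identical on every
        // platform (fdlibm semantics), so its results never drift.
        System.out.println(Long.toHexString(
                Double.doubleToLongBits(StrictMath.pow(2.0, 10.0))));
    }
}
```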

Tip

7. Performance optimization
x86-specific performance optimizations (e.g., in the TVM) may no longer apply on ARM.
Consider using ARM-specific optimization techniques.

Important

8. Third-party dependencies
Ensure all third-party libraries and dependencies support ARM architecture.
Some incompatible dependencies may need to be updated or replaced.

  • protoc-gen-grpc-java

Important

9. Build and deployment process:

  • Update build scripts to support ARM architecture.
  • Ensure CI/CD pipelines can build and test in ARM environments.
  • Docker support

Tip

10. Hardware feature dependencies:
Check whether the code (such as the TVM) relies on x86-specific hardware features.
Alternatives may need to be found for ARM.

Tip

11. System calls and OS interactions
If the code makes direct system calls, adjustments may be needed for ARM.

Important

12. Cross-platform testing

  • Establish comprehensive test suites to ensure the functionality works correctly on ARM.
  • Conduct performance benchmarking to compare x86 and ARM performance differences.
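
A simple pattern for architecture-gated checks in plain Java (illustrative only; in a JUnit-based suite, `Assume.assumeTrue` would play the same role of skipping architecture-specific tests):

```java
public class ArchGate {
    // True when the JVM reports an ARM64 architecture.
    static boolean isArm64() {
        String arch = System.getProperty("os.arch").toLowerCase();
        return arch.equals("aarch64") || arch.equals("arm64");
    }

    public static void main(String[] args) {
        if (isArm64()) {
            System.out.println("running ARM64-specific checks");
        } else {
            System.out.println("skipping ARM64-specific checks on "
                    + System.getProperty("os.arch"));
        }
    }
}
```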

@317787106
Contributor

317787106 commented Aug 15, 2024

@halibobo1205 Do you want to support the ARM architecture and the latest JVM version at the same time, or just support ARM using JDK 8?

@halibobo1205
Contributor Author

halibobo1205 commented Aug 15, 2024

@317787106 JVM officially supports ARM:

  • Linux got support in JDK 9 by JEP 237
  • Windows got support in JDK 16 by JEP 388
  • Macs got support in JDK 17 by JEP 391

According to the Oracle Java SE Support Roadmap, JDK 9 and JDK 16 are non-LTS, and JDK 17 is LTS. Based on this, I propose JDK 17 as the minimum for ARM support.

Warning

This is the last planned update of JDK 17 under the NFTC. Updates after September 2024 will be licensed under the Java SE OTN License (OTN) and production use beyond the limited free grants of the OTN license will require a fee.

@angrynurd

Here are some common considerations:


Regarding JDK version compatibility: you recommend using an ARM-optimized JDK distribution. Which specific ARM-optimized JDK distributions do you recommend, and what are their performance and stability advantages?

@abn2357

abn2357 commented Aug 15, 2024

When is the expected completion time for this work? It sounds like a big project.

@halibobo1205
Contributor Author

@endiaoekoe I propose JDK 17 as the minimum for ARM support.

@halibobo1205
Contributor Author

@abn2357 Tron currently supports only JDK 8. Based on the above, JDK 17 fully supports ARM, so Tron may need to upgrade to JDK 17 first, which is itself a big project.

@zeusoo001
Contributor

@halibobo1205 It sounds great, and I look forward to your implementation. I see there may be subtle differences in floating-point precision and behavior between ARM and x86; when adding support, be sure to preserve data consistency, and also investigate whether other places may cause inconsistency.

@Murphytron

This issue has been added to the core devs community call #22, welcome to share the latest progress @halibobo1205, and discuss together with @endiaoekoe @tomatoishealthy @317787106 @zeusoo001 @abn2357.

@halibobo1205
Contributor Author

halibobo1205 commented Aug 20, 2024

1. JDK version compatibility
After some brief research, I found ARM 64-bit versions of JDK 8 available. cc @endiaoekoe @317787106 @abn2357

| Provider | Notes |
| --- | --- |
| Oracle | Official support; requires payment for commercial use |
| Eclipse Temurin | Free OpenJDK; regularly updated and supported by the Adoptium community |
| Azul Zulu | Free OpenJDK; full enterprise version requires payment |
| BellSoft Liberica | Free OpenJDK for all users; relatively less well-known |
| Amazon Corretto | Free OpenJDK; long-term support by Amazon; Amazon runs Corretto internally on thousands of production services |

@halibobo1205
Contributor Author

halibobo1205 commented Aug 20, 2024

6. Floating-point arithmetic known issues:

Unfortunately, Tron does use Math.pow() for floating-point calculations for the Bancor trading pair in ExchangeProcessor:

private long exchangeToSupply(long balance, long quant) {
    logger.debug("balance: " + balance);
    long newBalance = balance + quant;
    logger.debug("balance + quant: " + newBalance);

    double issuedSupply = -supply * (1.0 - Math.pow(1.0 + (double) quant / newBalance, 0.0005));
    logger.debug("issuedSupply: " + issuedSupply);
    long out = (long) issuedSupply;
    supply += out;

    return out;
  }

  private long exchangeFromSupply(long balance, long supplyQuant) {
    supply -= supplyQuant;

    double exchangeBalance =
        balance * (Math.pow(1.0 + (double) supplyQuant / supply, 2000.0) - 1.0);
    logger.debug("exchangeBalance: " + exchangeBalance);

    return (long) exchangeBalance;
  }

Test case

 @Test
  public void testPow() {
    double x = 29218;
    double q = 4761432;
    double ret = Math.pow(1.0 + x / q, 0.0005);
    double ret2 = StrictMath.pow(1.0 + x / q, 0.0005);

    System.out.printf("%s%n", doubleToHex(ret)); //  3ff000033518c576
    System.out.printf("%s%n", doubleToHex(ret2)); // 3ff000033518c575
    Assert.assertEquals(0, Double.compare(ret, ret2)); // fail in jdk8_X86, success in jdk8_ARM64
  }

  public static String doubleToHex(double input) {
    // Convert the starting value to the equivalent value in a long
    long doubleAsLong = Double.doubleToRawLongBits(input);
    // and then convert the long to a hex string
    return Long.toHexString(doubleAsLong);
  }

Tron should use StrictMath to avoid cross-platform consistency issues. To help ensure Java-Tron's portability to ARM, I suggest a new proposal to convert Math to StrictMath. cc @zeusoo001

@317787106
Contributor

@halibobo1205 Supporting JDK 8 on macOS ARM first, and then extending to JDK 17 on Linux and macOS ARM, may be the smoothest path.

@tomatoishealthy
Contributor

JDK version compatibility After some brief research, I found ARM 64-bit versions of JDK 8 available. cc @endiaoekoe @317787106 @abn2357

Does this mean that there is no longer a dependency between the ARM architecture upgrade and the JDK upgrade?

Also, TRON only focuses on the Oracle JDK, right?

@halibobo1205
Contributor Author

halibobo1205 commented Aug 21, 2024

2. Native code

Components that use JNI (Java Native Interface) or other native code must be recompiled or upgraded for the ARM64 architecture. They include, but are not limited to, the following.

  • LevelDBJni: fusesource has stopped maintaining this project (no updates since Oct 17, 2013); forked by @halibobor, thanks to @folowing

  • RocksDBJni: RocksJava supports the ARM64 architecture from 6.29.4.1+

    • The current version is 5.15.10; it is recommended to upgrade to 7.7.3+, which has been verified by syncing from block 0.

    • 6.29.4.1+ is incompatible with LevelDB data (org.rocksdb.RocksDBException: bad block contents in file output-directory/database/asset-issue-v2/MANIFEST-000428), while 5.15.10 is compatible with LevelDB.

      • levelDB -> opened directly, or converted to rocksDB by Toolkit.jar db convert without safe mode -> ✅ rocksDB-5.15.10 -> ❌ rocksDB-6.29.4.1+ won't work
      • levelDB -> ✅ converted to rocksDB by DBConvert.jar or Toolkit.jar db convert --safe with safe mode -> ✅ rocksDB-5.15.10 -> ✅ rocksDB-6.29.4.1+ is ok
      • rocksDB (rocksDB-5.15.10) -> ✅ rocksDB-6.29.4.1+ is ok
    • TODO:

      • 1. Disallow RocksDB from opening LevelDB data directly
      • 2. Toolkit.jar db convert should force safe mode.
      • 3. Provide a RocksDB rewrite tool to fix the scenario where RocksDB opened LevelDB data directly, or LevelDB was converted to RocksDB by Toolkit.jar db convert without safe mode.
  • zksnark-java-sdk has been upgraded for the ARM64 architecture since GreatVoyage-v4.7.0.1

@halibobo1205
Contributor Author

8. Third-party dependencies

Ensure all third-party libraries and dependencies support ARM architecture.
Some incompatible dependencies may need to be updated or replaced, including, but not limited to, the following.

@halibobo1205
Contributor Author

Warning

This is the last planned update of JDK 17 under the NFTC. Updates after September 2024 will be licensed under the Java SE OTN License (OTN) and production use beyond the limited free grants of the OTN license will require a fee.

To avoid subsequent charges for commercial use, I recommend switching to OpenJDK.

@halibobo1205
Contributor Author

Caution

Strong data consistency and finality
Blockchains require final data consistency, which is usually guaranteed by a world state. Unfortunately, Java-Tron does not have a world state.
We need to think about how to ensure final data consistency.

@halibobo1205
Contributor Author

1. JDK version compatibility
Maybe try to support OpenJDK on ARM?

@halibobo1205
Contributor Author

A hard fork solution will be introduced in 4.8.0, switching floating-point calculations from Math to StrictMath.

@halibobo1205
Contributor Author

Currently, java-tron supports both LevelDB and RocksDB. On the ARM architecture, we intend to support only RocksDB, mainly due to the following considerations:

  1. Performance Advantages
    RocksDB, built on top of LevelDB, offers enhanced performance, reliability, and advanced features such as multi-threaded execution, compaction optimizations, and support for larger datasets, making it more suitable for high-throughput and low-latency use cases.

  2. Community Support

  • RocksDB has continuous investment and maintenance from Meta (Facebook)
  • Official support for RocksDB Java API
  • RocksDB community is more active in supporting ARM architecture
  • In comparison, LevelDB's community maintenance is relatively less active
  3. Feature Completeness
  • RocksDB offers richer features (e.g., column families, transaction support, TtlDB)
  • Built-in monitoring and performance diagnostic tools are more comprehensive
  • Provides more flexible configuration options for optimization on ARM architecture
  4. Future Development Trends
  • RocksDB is more widely used in the blockchain domain
  • Continuously receives performance optimizations and feature updates
  • More timely support for new hardware features
  5. Ecosystem Integration
  • Better support for RocksDB in cloud-native environments
  • Better integration with modern monitoring tools
  • More mature support for containerized deployment
  6. Hardware Adaptation
  • Better optimization for new storage devices (e.g., NVMe SSD) in RocksDB
  • Better utilization of ARM architecture's specific instruction sets
  • Better support for large memory systems

This will ensure the best database usage experience on ARM architecture.

@halibobo1205
Contributor Author

halibobo1205 commented Nov 21, 2024

Important

On the ARM architecture, we intend to support only RocksDB

When using RocksDB on the CI test, we found that some tests failed due to differences in behavior between LevelDB and RocksDB.

  • JVM core dump (screenshot omitted)
  • RocksDB does not throw an error when operations are performed after it has been closed:
    • putData
    • deleteData
    • getData
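
One way to paper over this behavioral difference is a thin wrapper that tracks the closed state and fails fast, giving RocksDB the same error behavior tests expect from LevelDB. A minimal sketch over a hypothetical key-value interface (`KvStore` and `GuardedStore` are illustrative names, not java-tron APIs):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal key-value interface, for illustration only.
interface KvStore {
    void put(String key, String value);
    String get(String key);
    void close();
}

// Fails fast after close(), mimicking LevelDB's behavior so that
// tests observe the same error on both backends.
class GuardedStore implements KvStore {
    private final Map<String, String> delegate = new HashMap<>();
    private volatile boolean closed = false;

    private void checkOpen() {
        if (closed) {
            throw new IllegalStateException("database already closed");
        }
    }

    @Override public void put(String key, String value) { checkOpen(); delegate.put(key, value); }
    @Override public String get(String key) { checkOpen(); return delegate.get(key); }
    @Override public void close() { closed = true; }
}

public class GuardDemo {
    public static void main(String[] args) {
        GuardedStore store = new GuardedStore();
        store.put("k", "v");
        store.close();
        try {
            store.get("k");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```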

@Murphytron

This issue has been added to the tronprotocol/pm#105, welcome to share the latest progress @halibobo1205 , and discuss together with @endiaoekoe @317787106 @tomatoishealthy @zeusoo001 @abn2357 .

@sYNCHROtime

When can a Tron node run on the ARM architecture? Do you have an approximate timeline?

@halibobo1205
Contributor Author

@sYNCHROtime It's expected that java-tron will be able to run on the ARM platform in Q2 2025, if the corresponding dependency updates and code adaptation go smoothly.

@dkatzan

dkatzan commented Feb 18, 2025

Hi @halibobo1205, my team is building an MPC enterprise wallet, and we are highly anticipating this feature so that we can run private Tron chains on ARM for testing.

Are there any updates on this? Are we still looking at Q2?
Thanks

@halibobo1205
Contributor Author

@dkatzan Yes, most feature adaptations have been completed and are expected to be online in Q2.

@dkatzan

dkatzan commented Feb 19, 2025

@halibobo1205 Thanks, that's great news.
As a temporary mitigation, I'm trying to use the existing image, which seems to run on my M1 Mac, but I'm facing some difficulties that might be a result of compatibility issues or of running under QEMU.

I posted my full issue here; any help would be greatly appreciated.

@halibobo1205
Contributor Author

@dkatzan You can try installing the x86_64 JDK on the M1, which will then run under Rosetta emulation.

@halibobo1205
Contributor Author

halibobo1205 commented Apr 15, 2025

Tracing floating-point arithmetic:

Tron uses Math.pow() for the Bancor trading-pair floating-point calculations in ExchangeProcessor (see the code above).

@halibobo1205
Contributor Author

Tracing floating-point arithmetic:
For historical data (the Bancor trading pair), there are currently two candidate solutions:

  • Hardcoded Special Cases

  • Emulating x86 Math.pow

I'll go over the first option in detail, and then compare the pros and cons of the two options.

Detailed Description of Hardcoded Special Cases:

Option 1 uses partial hardcoding to handle special cases, with the following implementation steps:

  1. Difference Identification: Through transaction replay on the x86 platform, comprehensively identify the set of input values where Math.pow and StrictMath.pow produce inconsistent results.

  2. Mapping Table Creation: Establish a mapping table for the identified special input values and their corresponding Math.pow calculation results from the x86 platform, which can be stored using data structures like hash tables.

  3. Conditional Processing Logic:

    • Before proposal takes effect: ARM platform defaults to using StrictMath.pow for calculations
    • For each input value, first check if it exists in the special case mapping table
    • If it exists, directly return the pre-calculated result from the mapping table (x86 platform's Math.pow result)
    • If not, use the StrictMath.pow calculation result
  4. Logging: Log all special cases handled using the mapping table for subsequent analysis and verification.
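
Steps 1-3 above could be sketched as a lookup keyed on the raw bit patterns of the inputs, falling back to StrictMath.pow (class and method names are illustrative, not the actual java-tron implementation; the sample entry is taken from the thread's diff data):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the "hardcoded special cases" option:
// a table of (base, exp) bit patterns -> x86 Math.pow result bits,
// with StrictMath.pow as the default path.
public class PowCompat {
    private static final Map<String, Long> SPECIAL_CASES = new HashMap<>();

    static {
        // Sample entry from the thread's diff data:
        // base 0x3ff0192278704be3, exp 0.0005 -> 0x3ff000033518c576
        SPECIAL_CASES.put(key(0x3ff0192278704be3L, Double.doubleToRawLongBits(0.0005)),
                0x3ff000033518c576L);
    }

    private static String key(long baseBits, long expBits) {
        return baseBits + ":" + expBits;
    }

    public static double pow(double base, double exp) {
        Long mapped = SPECIAL_CASES.get(
                key(Double.doubleToRawLongBits(base), Double.doubleToRawLongBits(exp)));
        if (mapped != null) {
            // Pre-computed x86 Math.pow result for this historical input
            return Double.longBitsToDouble(mapped);
        }
        // Default: platform-independent StrictMath
        return StrictMath.pow(base, exp);
    }
}
```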

Comparison of the two options:

| Aspect | Hardcoded Special Cases | Emulating x86 Math.pow |
| --- | --- | --- |
| Implementation Complexity | Low to Medium; only requires identifying differences and creating mappings | Very High; requires a deep understanding of the floating-point implementation |
| Code Universality | Lower; a customized solution for specific problems | High; applicable to all scenarios |
| Maintenance Cost | Low; no continuous updates needed once the new proposal is in effect and historical data is fixed | Low; minimal maintenance after initial implementation |
| Performance Impact | Small; only slight overhead during table lookups | Potentially significant; custom floating-point calculations are typically slower than native ones |
| Implementation Risk | Low; simple logic that's easy to test and verify | High; may introduce new precision or performance issues |
| Scalability | Medium; can be extended by updating the mapping table | High; solves all cases at once |
| Cross-platform Compatibility | High; only requires the mapping table to be consistent across platforms | Uncertain; depends on implementation quality |

@halibobo1205
Contributor Author

halibobo1205 commented Apr 18, 2025

For historical data (the Bancor trading pair), before the new proposal 101 took effect, 48 pow calculation instances caused final block-state inconsistencies on Main-Net.

Simulating the x86 (x87) pow instruction requires the following work:

  1. Emulating 80-bit extended precision
  2. Emulating the fyl2x and f2xm1 instructions

JDK 8 pow HotSpot code, for more details see: https://github.com/openjdk/jdk/blob/jdk8-b120/hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp#L2269

void MacroAssembler::increase_precision() {
  subptr(rsp, BytesPerWord);
  fnstcw(Address(rsp, 0));
  movl(rax, Address(rsp, 0));
  orl(rax, 0x300);
  push(rax);
  fldcw(Address(rsp, 0));
  pop(rax);
}

void MacroAssembler::restore_precision() {
  fldcw(Address(rsp, 0));
  addptr(rsp, BytesPerWord);
}

void MacroAssembler::pow_exp_core_encoding() {
  // kills rax, rcx, rdx
  subptr(rsp,sizeof(jdouble));
  // computes 2^X. Stack: X ...
  // f2xm1 computes 2^X-1 but only operates on -1<=X<=1. Get int(X) and
  // keep it on the thread's stack to compute 2^int(X) later
  // then compute 2^(X-int(X)) as (2^(X-int(X)-1+1)
  // final result is obtained with: 2^X = 2^int(X) * 2^(X-int(X))
  fld_s(0);                 // Stack: X X ...
  frndint();                // Stack: int(X) X ...
  fsuba(1);                 // Stack: int(X) X-int(X) ...
  fistp_s(Address(rsp,0));  // move int(X) as integer to thread's stack. Stack: X-int(X) ...
  f2xm1();                  // Stack: 2^(X-int(X))-1 ...
  fld1();                   // Stack: 1 2^(X-int(X))-1 ...
  faddp(1);                 // Stack: 2^(X-int(X))
  // computes 2^(int(X)): add exponent bias (1023) to int(X), then
  // shift int(X)+1023 to exponent position.
  // Exponent is limited to 11 bits if int(X)+1023 does not fit in 11
  // bits, set result to NaN. 0x000 and 0x7FF are reserved exponent
  // values so detect them and set result to NaN.
  movl(rax,Address(rsp,0));
  movl(rcx, -2048); // 11 bit mask and valid NaN binary encoding
  addl(rax, 1023);
  movl(rdx,rax);
  shll(rax,20);
  // Check that 0 < int(X)+1023 < 2047. Otherwise set rax to NaN.
  addl(rdx,1);
  // Check that 1 < int(X)+1023+1 < 2048
  // in 3 steps:
  // 1- (int(X)+1023+1)&-2048 == 0 => 0 <= int(X)+1023+1 < 2048
  // 2- (int(X)+1023+1)&-2048 != 0
  // 3- (int(X)+1023+1)&-2048 != 1
  // Do 2- first because addl just updated the flags.
  cmov32(Assembler::equal,rax,rcx);
  cmpl(rdx,1);
  cmov32(Assembler::equal,rax,rcx);
  testl(rdx,rcx);
  cmov32(Assembler::notEqual,rax,rcx);
  movl(Address(rsp,4),rax);
  movl(Address(rsp,0),0);
  fmul_d(Address(rsp,0));   // Stack: 2^X ...
  addptr(rsp,sizeof(jdouble));
}


void MacroAssembler::fast_pow() {
  // computes X^Y = 2^(Y * log2(X))
  // if fast computation is not possible, result is NaN. Requires
  // fallback from user of this macro.
  // increase precision for intermediate steps of the computation
  increase_precision();
  fyl2x();                 // Stack: (Y*log2(X)) ...
  pow_exp_core_encoding(); // Stack: exp(X) ...
  restore_precision();
}

@NewOF

NewOF commented Apr 18, 2025

Principle of x87 Instruction Simulation

  • From the relevant documentation and source code, the key to calculating $a^b$ in pow lies in two instructions, fyl2x and f2xm1: fyl2x computes $b \cdot \log_2 a$, while f2xm1 computes $2^a - 1$

  • Process:

    • Assume $y = a^b$ and take the base-2 logarithm of both sides: $\log_2 y = b \cdot \log_2 a$
    • That is, $\Large y = 2^{b \cdot \log_2 a}$
    • By combining these two instructions, we can calculate $a^b$
  • At present, the implementation details of the fyl2x and f2xm1 instructions are still unknown. We will attempt to simulate the calculation through Taylor expansion

  • $\log_2 x$

    • $\log_2 x$ expands as $\frac{1}{\ln 2}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}(x-1)^n$
  • $2^x - 1$

    • $2^x - 1$ expands as $\sum_{n=1}^{\infty}\frac{(x\ln 2)^n}{n!}$

Simulation implementation (C++)

  • Using the SoftFloat library, which supports 80-bit extended double precision (http://www.jhauser.us/arithmetic/SoftFloat.html)
  • Note 1: the Float80 class is a wrapper implemented on top of the SoftFloat library
  • Note 2: due to the special nature of the 48 mismatched data points (all with an exponent of 0.0005), the calculation process has been partially simplified
  Float80 ln_coe[] = {
  	Float80(0x3FFF, 0x8000000000000000), // 1.0
  	Float80(0x3FFD, 0xFFFFFFFFFFFFFFFF), // 0.5
  	Float80(0x3FFD, 0xAAAAAAAAAAAAAAAA), // 0.333...
  	Float80(0x3FFC, 0xFFFFFFFFFFFFFFFF), // 0.25
  	Float80(0x3FFC, 0xCCCCCCCCCCCCCCCC), // 0.2
  	Float80(0x3FFC, 0xAAAAAAAAAAAAAAAA), // 0.166...
  };

  Float80 taylor_ln2_float80(Float80 x) {
  	return x -
           x * x * ln_coe[1] +
           x * x * x * ln_coe[2] -
           x * x * x * x * ln_coe[3] +
           x * x * x * x * x * ln_coe[4] -
           x * x * x * x * x * x * ln_coe[5];
  }

  // ln(2)^n/n!
  Float80 exp_coe[] = {
  	Float80(0.69314718055994530942),
  	Float80(0.24022650695910071233),
  	Float80(0.05550410866482157995),
  	Float80(0.00961812910762847716),
  	Float80(0.00133335581464284434),
  };

  Float80 taylor_exp2_float80(Float80 x) {
  	return x * exp_coe[0] +
           x * x * exp_coe[1] +
           x * x * x * exp_coe[2] +
           x * x * x * x * exp_coe[3] +
           x * x * x * x * x * exp_coe[4];
  }

  double taylor_pow2_float80(double x, double y) {
  	Float80 x80(x), y80(y);
  	Float80 ln2(0x3FFE, 0xB17217F7D1CF7BBB);

  	Float80 y_lg2_x = y80 * taylor_ln2_float80(x80 - Float80(1)) / ln2;
    // For the test dataset, since y_lg2_x<1, this step omits the exponentiation of the integer part
  	Float80 exp_y_lg2_x = taylor_exp2_float80(y_lg2_x) + Float80(1);

  	return exp_y_lg2_x.to_double();
  }
  • Test data (base, exp, expected)
  Data("3ff0192278704be3", 0.0005, "3ff000033518c576"); //  4137160
  Data("3ff000002fc6a33f", 0.0005, "3ff0000000061d86"); //  4065476
  Data("3ff00314b1e73ecf", 0.0005, "3ff0000064ea3ef8"); //  4071538
  Data("3ff0068cd52978ae", 0.0005, "3ff00000d676966c"); //  4109544
  Data("3ff0032fda05447d", 0.0005, "3ff0000068636fe0"); //  4123826
  Data("3ff00051c09cc796", 0.0005, "3ff000000a76c20e"); //  4166806
  Data("3ff00bef8115b65d", 0.0005, "3ff0000186893de0"); //  4225778
  Data("3ff009b0b2616930", 0.0005, "3ff000013d27849e"); //  4251796
  Data("3ff00364ba163146", 0.0005, "3ff000006f26a9dc"); //  4257157
  Data("3ff019be4095d6ae", 0.0005, "3ff0000348e9f02a"); //  4260583
  Data("3ff0123e52985644", 0.0005, "3ff0000254797fd0"); //  4367125
  Data("3ff0126d052860e2", 0.0005, "3ff000025a6cde26"); //  4402197
  Data("3ff0001632cccf1b", 0.0005, "3ff0000002d76406"); //  4405788
  Data("3ff0000965922b01", 0.0005, "3ff000000133e966"); //  4490332
  Data("3ff00005c7692d61", 0.0005, "3ff0000000bd5d34"); //  4499056
  Data("3ff015cba20ec276", 0.0005, "3ff00002c84cef0e"); //  4518035
  Data("3ff00002f453d343", 0.0005, "3ff000000060cf4e"); //  4533215
  Data("3ff006ea73f88946", 0.0005, "3ff00000e26d4ea2"); //  4647814
  Data("3ff00a3632db72be", 0.0005, "3ff000014e3382a6"); //  4766695
  Data("3ff000c0e8df0274", 0.0005, "3ff0000018b0aeb2"); //  4771494
  Data("3ff00015c8f06afe", 0.0005, "3ff0000002c9d73e"); //  4793587
  Data("3ff00068def18101", 0.0005, "3ff000000d6c3cac"); //  4801947
  Data("3ff01349f3ac164b", 0.0005, "3ff000027693328a"); //  4916843
  Data("3ff00e86a7859088", 0.0005, "3ff00001db256a52"); //  4924111
  Data("3ff00000c2a51ab7", 0.0005, "3ff000000018ea20"); //  5098864
  Data("3ff020fb74e9f170", 0.0005, "3ff00004346fbfa2"); //  5133963
  Data("3ff00001ce277ce7", 0.0005, "3ff00000003b27dc"); //  5139389
  Data("3ff005468a327822", 0.0005, "3ff00000acc20750"); //  5151258
  Data("3ff00006666f30ff", 0.0005, "3ff0000000d1b80e"); //  5185021
  Data("3ff000045a0b2035", 0.0005, "3ff00000008e98e6"); //  5295829
  Data("3ff00e00380e10d7", 0.0005, "3ff00001c9ff83c8"); //  5380897
  Data("3ff00c15de2b0d5e", 0.0005, "3ff000018b6eaab6"); //  5400886
  Data("3ff00042afe6956a", 0.0005, "3ff0000008892244"); //  5864127
  Data("3ff0005b7357c2d4", 0.0005, "3ff000000bb48572"); //  6167339
  Data("3ff00033d5ab51c8", 0.0005, "3ff0000006a279c8"); //  6240974
  Data("3ff0000046d74585", 0.0005, "3ff0000000091150"); //  6279093
  Data("3ff0010403f34767", 0.0005, "3ff0000021472146"); //  6428736
  Data("3ff00496fe59bc98", 0.0005, "3ff000009650a4ca"); //  6432355,6493373
  Data("3ff0012e43815868", 0.0005, "3ff0000026af266e"); //  6555029
  Data("3ff00021f6080e3c", 0.0005, "3ff000000458d16a"); //  7092933
  Data("3ff000489c0f28bd", 0.0005, "3ff00000094b3072"); //  7112412
  Data("3ff00009d3df2e9c", 0.0005, "3ff00000014207b4"); //  7675535
  Data("3ff000def05fa9c8", 0.0005, "3ff000001c887cdc"); //  7860324
  Data("3ff0013bca543227", 0.0005, "3ff00000286a42d2"); //  8292427
  Data("3ff0021a2f14a0ee", 0.0005, "3ff0000044deb040"); //  8517311
  Data("3ff0002cc166be3c", 0.0005, "3ff0000005ba841e"); //  8763101
  Data("3ff0000cc84e613f", 0.0005, "3ff0000001a2da46"); //  9269124
  Data("3ff000057b83c83f", 0.0005, "3ff0000000b3a640"); //  9631452
  • Comparison of Results
exp:0.0005 base:1.00628495434413 3ff019be4095d6ae
expected: 1.0000031326481 3ff0000348e9f02a
result:   1.0000031326481 3ff0000348e9f029

exp:0.0005 base:1.00805230779141 3ff020fb74e9f170
expected: 1.00000401003852 3ff00004346fbfa2
result:   1.00000401003852 3ff00004346fbfa1
  • At the beginning of testing, direct iterative calculation gave poor results. Later, by directly evaluating the polynomial expansion with pre-calculated coefficients, the accuracy improved significantly. (However, two data points still do not fully match expectations.)
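
The identity behind the two instructions, $a^b = 2^{b \cdot \log_2 a}$, can be checked in Java at ordinary double precision (a sketch of the identity only, not of the 80-bit x87 pipeline; `pow2` is an illustrative name):

```java
public class PowDecomposition {
    // pow(a, b) = 2^(b * log2(a)), the identity behind fyl2x + f2xm1.
    // Computed here at ordinary double precision via StrictMath,
    // so results match the x87 pipeline only approximately.
    static double pow2(double a, double b) {
        double yLog2x = b * (StrictMath.log(a) / StrictMath.log(2.0));
        return StrictMath.exp(yLog2x * StrictMath.log(2.0)); // 2^t = e^(t ln 2)
    }

    public static void main(String[] args) {
        double a = 1.0 + 29218.0 / 4761432.0; // input from the thread's test case
        double b = 0.0005;
        System.out.println(pow2(a, b));
        System.out.println(StrictMath.pow(a, b));
        // Close, but not guaranteed bit-identical: last-ulp rounding
        // differs, which is exactly why exact emulation is hard.
    }
}
```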

@halibobo1205
Contributor Author

The diff data (48 pow calculation instances) is the result of Math versus StrictMath; if algorithmic simulation is pursued, the implementation needs to be fully tested for complete equivalence with the Math library.

@NewOF

NewOF commented Apr 21, 2025

  • Using the simulated pow to recalculate the 48 mismatched floating-point data points, the best result so far is 2 mismatches.
  • Comparing against historical data up to block height 11,000,000 (673,496 data points, including contextual numerical calculations, comparing the final exchange balance), the simulated pow has 58,072 mismatches against Math.pow. Using the C++ standard library pow directly yields 48 mismatches, consistent with StrictMath.pow (also 48 mismatches).
  • From the resources and source code analyzed so far, the logarithm and power operations inside the x87 instructions are likely also polynomial expansions, accelerated with preprocessed coefficients. Without further implementation details, and given the instability of floating-point operations themselves, exact matching is difficult to achieve. By contrast, hardcoding is more feasible.

@halibobo1205
Contributor Author

In the Java HotSpot virtual machine, do_intrinsic is an important concept related to intrinsic functions.

Intrinsics in the HotSpot JVM are special, optimized implementations of commonly used Java methods. When the JVM identifies specific method calls, it may replace the standard Java implementation of these methods with more efficient native code. Math.pow() is one such method that is commonly intrinsically optimized.

Specifically for the Math.pow() method, the HotSpot JVM handles it in the following ways:

  1. The JVM has a function called do_intrinsic that determines whether a method can be replaced with an intrinsic implementation, and how to perform the replacement.

  2. For Math.pow(), when the JIT (Just-In-Time compiler) compiles code containing this method call, the do_intrinsic mechanism examines the call and may replace it with native instructions or optimized algorithms corresponding to the processor architecture.

  3. Typically, the intrinsic implementation of Math.pow() directly utilizes the CPU's floating-point instructions (for pow on x86, fyl2x and f2xm1) or calls into optimized native math routines, thereby avoiding the slower generic implementation.

This optimization mechanism is usually managed by the vmIntrinsics namespace in the HotSpot source code, with relevant implementations distributed across files such as src/hotspot/share/classfile/vmSymbols.hpp, src/hotspot/share/opto/library_call.cpp, and others.

do_intrinsic(_dlog, java_lang_Math, log_name, double_double_signature, F_S)

Through this intrinsic function optimization, the HotSpot JVM allows Java programs to maintain platform independence while achieving performance close to native code, which is particularly beneficial for math-computation-intensive applications.
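The divergence between the intrinsic and the fdlibm path can be probed directly. This is a minimal sketch with an assumed sample input, not a value known to diverge; on x86 JDK8 the intrinsic may differ from StrictMath by one ulp for some inputs, and on ARM the results can differ again, which is exactly what threatens consensus.

```java
// Compare Math.pow (may be intrinsified) against StrictMath.pow (fdlibm,
// reproducible on every platform) bit for bit.
public class PowProbe {

  public static boolean bitsEqual(double a, double b) {
    return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
  }

  public static void main(String[] args) {
    double base = 1.0009985; // hypothetical value near the bancor inputs
    double m = Math.pow(base, 0.0005);
    double s = StrictMath.pow(base, 0.0005);
    System.out.printf("intrinsic=%016x fdlibm=%016x match=%b%n",
        Double.doubleToLongBits(m), Double.doubleToLongBits(s), bitsEqual(m, s));
  }
}
```

Note that the Math.pow javadoc guarantees exact results only for cases like integral arguments whose result is exactly representable; elsewhere it only promises 1-ulp accuracy, which is where platforms can legally disagree.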

Here's the logic for log, and pow is similar:
[screenshot of the HotSpot log intrinsic source omitted]

@317787106
Copy link
Contributor

@halibobo1205 When Math.pow(double, double) is calculated, how do you determine whether the result is inconsistent with the one calculated by StrictMath? What is done when an inconsistency is found? And please specify exactly what gets hardcoded.

@halibobo1205
Copy link
Contributor Author

@317787106

  1. Compute the bancor transaction with both Math.pow and StrictMath.pow in an x86 JDK8 environment, and record the inputs whenever the final buyTokenQuant differs
public class ExchangeCapsule implements ProtoCapsule<Exchange> {

  public long transaction(byte[] sellTokenID, long sellTokenQuant, boolean useStrictMath) {
    long supply = 1_000_000_000_000_000_000L;
    ExchangeProcessor processor = new ExchangeProcessor(supply, useStrictMath);
    ExchangeProcessor strictProcessor = new ExchangeProcessor(supply, true);

    long buyTokenQuant = 0;
    long strictBuyTokenQuant = 0;
    long firstTokenBalance = this.exchange.getFirstTokenBalance();
    long secondTokenBalance = this.exchange.getSecondTokenBalance();

    if (this.exchange.getFirstTokenId().equals(ByteString.copyFrom(sellTokenID))) {
      buyTokenQuant = processor.exchange(firstTokenBalance,
          secondTokenBalance,
          sellTokenQuant);
      strictBuyTokenQuant = strictProcessor.exchange(firstTokenBalance,
          secondTokenBalance,
          sellTokenQuant);
      if (!useStrictMath && buyTokenQuant != strictBuyTokenQuant) {
        logAndRecord("{}\t{}\t{}\t{}\t{}", buyTokenQuant, strictBuyTokenQuant, firstTokenBalance, secondTokenBalance, sellTokenQuant); // logAndRecord pow data
      }
      this.exchange = this.exchange.toBuilder()
          .setFirstTokenBalance(firstTokenBalance + sellTokenQuant)
          .setSecondTokenBalance(secondTokenBalance - buyTokenQuant)
          .build();
    } else {
      buyTokenQuant = processor.exchange(secondTokenBalance,
          firstTokenBalance,
          sellTokenQuant);
      strictBuyTokenQuant = strictProcessor.exchange(secondTokenBalance,
          firstTokenBalance,
          sellTokenQuant);
      if (!useStrictMath && buyTokenQuant != strictBuyTokenQuant) {
        logAndRecord("{}\t{}\t{}\t{}\t{}", buyTokenQuant, strictBuyTokenQuant,secondTokenBalance, firstTokenBalance, sellTokenQuant); // logAndRecord pow data
      }
      this.exchange = this.exchange.toBuilder()
          .setFirstTokenBalance(firstTokenBalance - buyTokenQuant)
          .setSecondTokenBalance(secondTokenBalance + sellTokenQuant)
          .build();
    }
    
    return buyTokenQuant;
  }
}
  2. Based on the data collected in step 1, compute the pow values to be hardcoded for issuedSupply and exchangeBalance:
public class ExchangeProcessor {

  private long supply;
  private final boolean useStrictMath;

  public ExchangeProcessor(long supply, boolean useStrictMath) {
    this.supply = supply;
    this.useStrictMath = useStrictMath;
  }

  private long exchangeToSupply(long balance, long quant) {
    long newBalance = balance + quant;
    double issuedSupply = -supply * (1.0 - Maths.pow(1.0 + (double) quant / newBalance, 0.0005, this.useStrictMath));
    long out = (long) issuedSupply;
    supply += out;
    return out;
  }

  private long exchangeFromSupply(long balance, long supplyQuant) {
    supply -= supplyQuant;
    double exchangeBalance = balance * (Maths.pow(1.0 + (double) supplyQuant / supply, 2000.0, this.useStrictMath) - 1.0);
    return (long) exchangeBalance;
  }

  public long exchange(long sellTokenBalance, long buyTokenBalance, long sellTokenQuant) {
    long relay = exchangeToSupply(sellTokenBalance, sellTokenQuant);
    return exchangeFromSupply(buyTokenBalance, relay);
  }

}
  3. Adjust StrictMathWrapper:
public class StrictMathWrapper {

  private static final Map<Double, Double> powData = Collections.synchronizedMap(new HashMap<>());

  public static double pow(double a, double b) {
    double strictResult = StrictMath.pow(a, b);
    // Serve the recorded x86/JDK8 Math.pow result for known divergent bases,
    // otherwise fall back to the platform-independent StrictMath result.
    return powData.getOrDefault(a, strictResult);
  }
}
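A self-contained sketch of how this lookup-table fallback behaves (the class name, the record helper, and the table entry are invented for the demo; the real table would be filled from the data recorded in step 1):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Demo of the hardcoding strategy: known divergent bases map to the
// historical x86/JDK8 Math.pow result; everything else falls through to
// the reproducible StrictMath.pow.
public class StrictMathWrapperDemo {

  private static final Map<Double, Double> powData =
      Collections.synchronizedMap(new HashMap<>());

  // hypothetical helper for loading recorded divergent results
  public static void record(double base, double recordedMathPowResult) {
    powData.put(base, recordedMathPowResult);
  }

  public static double pow(double a, double b) {
    double strictResult = StrictMath.pow(a, b);
    return powData.getOrDefault(a, strictResult);
  }

  public static void main(String[] args) {
    record(1.5, 42.0); // obviously fake value, standing in for a recorded result
    System.out.println(pow(1.5, 0.875)); // served from the table
    System.out.println(pow(2.0, 10.0));  // falls back to StrictMath
  }
}
```

One design point worth noting: the table is keyed by the base only, which assumes a recorded base never appears with more than one exponent; keying on the (a, b) pair would be the safer general choice.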

@halibobo1205
Copy link
Contributor Author

If there are other ways to implement x87 instruction simulation, please discuss them.
Based on performance, implementation complexity, and verification difficulty, hard-coding the pow data is currently the better solution.
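For reference, the bancor conversion in ExchangeProcessor can be exercised with a minimal standalone sketch. The sample balances and quantity below are hypothetical; StrictMath (fdlibm) makes the result identical on every platform. Since exchangeToSupply adds relay to supply and exchangeFromSupply immediately subtracts it, a single trade can use the constant supply in both pow calls, as done here.

```java
// Standalone sketch of the two-step bancor conversion:
// sell tokens -> intermediate supply units (relay) -> buy tokens.
public class BancorSketch {

  // returns {relay, buyTokenQuant}
  public static long[] exchange(long supply, long sellBalance,
                                long buyBalance, long sellQuant) {
    long newBalance = sellBalance + sellQuant;
    double issued = -supply
        * (1.0 - StrictMath.pow(1.0 + (double) sellQuant / newBalance, 0.0005));
    long relay = (long) issued;
    double out = buyBalance
        * (StrictMath.pow(1.0 + (double) relay / supply, 2000.0) - 1.0);
    return new long[] {relay, (long) out};
  }

  public static void main(String[] args) {
    long[] r = exchange(1_000_000_000_000_000_000L, 1_000_000L, 2_000_000L, 1_000L);
    System.out.println("relay=" + r[0] + " buyTokenQuant=" + r[1]);
  }
}
```

The 0.0005 and 2000.0 exponents are the reciprocal pair from the snippets above, so the only nondeterminism in the whole pipeline comes from the two pow calls.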

@317787106
Copy link
Contributor

@halibobo1205 I noticed that the pow results in exchangeToSupply and exchangeFromSupply are converted to long. Could this precision loss impact the handling of hardcoded special cases?

@halibobo1205
Copy link
Contributor Author

> @halibobo1205 I noticed that the pow results in exchangeToSupply and exchangeFromSupply are converted to long. Could this precision loss impact the handling of hardcoded special cases?

Yes. In fact, that precision loss reduces the amount of pow data that needs to be hard-coded: a small divergence between Math.pow and StrictMath.pow often disappears once the result is truncated to a long.
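A tiny sketch of why truncation masks most divergences: two doubles one ulp apart, which is the typical gap between an intrinsic and the fdlibm result, usually truncate to the same long.

```java
// A 1-ulp divergence (simulated with Math.nextUp) vanishes after the
// (long) cast, so only divergences that straddle an integer boundary
// need to be hard-coded.
public class TruncationDemo {
  public static void main(String[] args) {
    double strict = 100.0;
    double intrinsic = Math.nextUp(strict); // one ulp higher, like a divergent pow
    System.out.println((long) strict + " " + (long) intrinsic);
  }
}
```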
