
Expand ARM Architecture Compatibility #5954

Open
halibobo1205 opened this issue Aug 15, 2024 · 44 comments
@halibobo1205
Contributor

halibobo1205 commented Aug 15, 2024

Background

Java-Tron currently supports only the x86 architecture. However, ARM has gained significant traction recently, particularly in cloud computing and mobile devices. ARM processors are known for their energy efficiency and cost-effectiveness, making them increasingly popular in data centers and edge computing. It would be valuable to have the option of running Java-Tron on ARM.

Key developments in ARM architecture:

ARM advantages:

Related Issues and PRs

Scope of Impact

  • Build and deployment processes
  • Core application code
  • Third-party dependencies
  • Development and testing environments
@angrynurd

angrynurd commented Aug 15, 2024

I am totally in favor of extending ARM architecture compatibility. This will allow Java-Tron to run on more platforms and take advantage of the benefits of the ARM architecture, such as higher energy efficiency and lower cost.
In my opinion, we can start with the following:

  1. Prioritize ARM support for key dependencies: For example, RocksDB/LevelDB is an important database component in Java-Tron and it is critical to ensure its compatibility on ARM.
  2. Establish an ARM test environment: We need to establish a dedicated ARM test environment to ensure the stability and performance of Java-Tron on ARM.
  3. Collaborate with the community: We can work with the community to solve ARM compatibility issues and share experiences and best practices.

@tomatoishealthy
Contributor

It sounds great, but I am a novice in ARM architecture and curious about the challenges of supporting it.

Could you provide something like a task list going forward? That would make the current status and remaining challenges easier to follow.

@halibobo1205
Contributor Author

halibobo1205 commented Aug 15, 2024

Here are some common considerations:

Important

  1. JDK version compatibility
    Ensure the JDK version supports the ARM architecture, and consider using ARM-optimized JDK distributions.
  • Linux gained support in JDK 9 (non-LTS) via JEP 237
  • Windows gained support in JDK 16 (non-LTS) via JEP 388
  • macOS gained support in JDK 17 (LTS) via JEP 391
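
For quick verification, the running JVM's architecture can be checked at runtime (a minimal sketch; reported `os.arch` values vary by vendor, e.g. `aarch64` on most ARM64 JDKs, `amd64` or `x86_64` on x86-64):

```java
public class ArchCheck {
    public static void main(String[] args) {
        // "aarch64" on most ARM64 JDKs; "amd64" or "x86_64" on x86-64
        String arch = System.getProperty("os.arch");
        // "64" on 64-bit JVMs (vendor-specific property, widely supported)
        String dataModel = System.getProperty("sun.arch.data.model");
        System.out.println("os.arch = " + arch + ", data model = " + dataModel);
    }
}
```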

Important

2. Native code
Components that use JNI (Java Native Interface) or other native code need to be recompiled or upgraded for the ARM architecture.

  • LevelDBJni
  • RocksDBJni
  • zksnark-java-sdk

Tip

3. Endianness
x86 is little-endian, while ARM processors are bi-endian and may run big-endian in some configurations.
Check whether any operations in the code (such as the TVM) depend on a specific byte order, especially when handling binary data.
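
In practice, pure-Java serialization is largely insulated from this: `ByteBuffer` and `DataInput`/`DataOutput` default to big-endian on every platform, so only code that explicitly opts into the native byte order can diverge between architectures. A minimal sketch illustrating the distinction:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        byte[] raw = {0x00, 0x00, 0x00, 0x01};

        // ByteBuffer defaults to BIG_ENDIAN on every platform,
        // so this reads 1 on x86 and ARM alike.
        int big = ByteBuffer.wrap(raw).getInt();

        // Only code that explicitly opts into a different order
        // can behave differently across architectures.
        int little = ByteBuffer.wrap(raw)
                .order(ByteOrder.LITTLE_ENDIAN).getInt();

        System.out.println(big);    // 1
        System.out.println(little); // 16777216
        System.out.println(ByteOrder.nativeOrder());
    }
}
```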

Tip

4. Memory alignment
ARM may have different memory-alignment requirements than x86.
Check for code (such as the TVM) that assumes specific memory alignments.

Tip

5. Atomic operations and concurrency
Some atomic operations (e.g., in the TVM) may be implemented differently on different architectures.
Review concurrent code to ensure it works correctly on ARM as well.

Caution

6. Floating-point arithmetic
ARM and x86 may have subtle differences in floating-point precision and behavior.
For applications that rely on precise floating-point calculations, comprehensive testing is necessary.
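
A one-ulp difference is enough to break consensus here because the result is ultimately truncated to a long, and adjacent doubles can straddle an integer boundary. A minimal illustration (not java-tron code):

```java
public class UlpDemo {
    public static void main(String[] args) {
        double d = 2.0;
        // The next representable double below 2.0
        double below = Math.nextDown(d); // 1.9999999999999998

        // Truncation to long straddles the integer boundary:
        System.out.println((long) d);     // 2
        System.out.println((long) below); // 1

        // StrictMath is specified to be bit-identical on every
        // platform (fdlibm semantics), so its results never drift.
        System.out.println(Long.toHexString(
                Double.doubleToLongBits(StrictMath.pow(2.0, 10.0))));
    }
}
```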

Tip

7. Performance optimization
x86-specific performance optimizations (e.g., in the TVM) may no longer apply on ARM.
Consider using ARM-specific optimization techniques.

Important

8. Third-party dependencies
Ensure all third-party libraries and dependencies support ARM architecture.
Some incompatible dependencies may need to be updated or replaced.

  • protoc-gen-grpc-java

Important

9. Build and deployment process:

  • Update build scripts to support ARM architecture.
  • Ensure CI/CD pipelines can build and test in ARM environments.
  • Docker support

Tip

10. Hardware feature dependencies:
Check whether the code (such as the TVM) relies on x86-specific hardware features.
Alternatives may need to be found for ARM.

Tip

11. System calls and OS interactions
If the code makes direct system calls, adjustments may be needed for ARM.

Important

12. Cross-platform testing

  • Establish comprehensive test suites to ensure the functionality works correctly on ARM.
  • Conduct performance benchmarking to compare x86 and ARM performance differences.
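
A simple pattern for architecture-gated checks in plain Java (illustrative only; in a JUnit-based suite, `Assume.assumeTrue` would play the same role of skipping architecture-specific tests):

```java
public class ArchGate {
    // True when the JVM reports an ARM64 architecture.
    static boolean isArm64() {
        String arch = System.getProperty("os.arch").toLowerCase();
        return arch.equals("aarch64") || arch.equals("arm64");
    }

    public static void main(String[] args) {
        if (isArm64()) {
            System.out.println("running ARM64-specific checks");
        } else {
            System.out.println("skipping ARM64-specific checks on "
                    + System.getProperty("os.arch"));
        }
    }
}
```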

@317787106
Contributor

317787106 commented Aug 15, 2024

@halibobo1205 Do you want to support the ARM architecture and the latest JVM version at the same time, or just support ARM using JDK 8?

@halibobo1205
Contributor Author

halibobo1205 commented Aug 15, 2024

@317787106 JVM officially supports ARM:

  • Linux got support in JDK 9 by JEP 237
  • Windows got support in JDK 16 by JEP 388
  • Macs got support in JDK 17 by JEP 391

According to the Oracle Java SE Support Roadmap, JDK 9 and JDK 16 are non-LTS, and JDK 17 is LTS. Based on this, I propose JDK 17 as the minimum for ARM support.

Warning

This is the last planned update of JDK 17 under the NFTC. Updates after September 2024 will be licensed under the Java SE OTN License (OTN) and production use beyond the limited free grants of the OTN license will require a fee.

@angrynurd

Here are some common considerations:


Regarding JDK version compatibility: you recommend using an ARM-optimized JDK distribution. Which specific ARM-optimized JDK distributions do you recommend, and what are their performance and stability advantages?

@abn2357

abn2357 commented Aug 15, 2024

When is the expected completion time for this work? It sounds like a big project.

@halibobo1205
Contributor Author

@endiaoekoe I propose JDK 17 as the minimum for ARM support.

@halibobo1205
Contributor Author

@abn2357 Tron currently supports only JDK 8. Based on the above, JDK 17 fully supports ARM, so Tron may need to upgrade to JDK 17 first, which is itself a big project.

@zeusoo001
Contributor

@halibobo1205 It sounds great, and I look forward to your implementation. I see there may be subtle differences in floating-point precision and behavior between ARM and x86; when adding support, be sure to preserve data consistency, and also investigate whether other places may cause inconsistency.

@Murphytron

This issue has been added to the core devs community call #22, welcome to share the latest progress @halibobo1205, and discuss together with @endiaoekoe @tomatoishealthy @317787106 @zeusoo001 @abn2357.

@halibobo1205
Contributor Author

halibobo1205 commented Aug 20, 2024

1. JDK version compatibility
After some brief research, I found ARM 64-bit versions of JDK 8 available. cc @endiaoekoe @317787106 @abn2357

| Provider | Notes |
| --- | --- |
| Oracle | Official support; requires payment for commercial use |
| Eclipse Temurin | Free OpenJDK; regularly updated and supported by the Adoptium community |
| Azul Zulu | Free OpenJDK; full enterprise version requires payment |
| BellSoft Liberica | Free OpenJDK for all users; relatively less well-known |
| Amazon Corretto | Free OpenJDK; long-term support by Amazon; Amazon runs Corretto internally on thousands of production services |

@halibobo1205
Contributor Author

halibobo1205 commented Aug 20, 2024

6. Floating-point arithmetic known issues:

Unfortunately, Tron does use Math.pow() for floating-point calculations for the Bancor trading pair in ExchangeProcessor:

private long exchangeToSupply(long balance, long quant) {
    logger.debug("balance: " + balance);
    long newBalance = balance + quant;
    logger.debug("balance + quant: " + newBalance);

    double issuedSupply = -supply * (1.0 - Math.pow(1.0 + (double) quant / newBalance, 0.0005));
    logger.debug("issuedSupply: " + issuedSupply);
    long out = (long) issuedSupply;
    supply += out;

    return out;
  }

  private long exchangeFromSupply(long balance, long supplyQuant) {
    supply -= supplyQuant;

    double exchangeBalance =
        balance * (Math.pow(1.0 + (double) supplyQuant / supply, 2000.0) - 1.0);
    logger.debug("exchangeBalance: " + exchangeBalance);

    return (long) exchangeBalance;
  }

Test case

 @Test
  public void testPow() {
    double x = 29218;
    double q = 4761432;
    double ret = Math.pow(1.0 + x / q, 0.0005);
    double ret2 = StrictMath.pow(1.0 + x / q, 0.0005);

    System.out.printf("%s%n", doubleToHex(ret)); //  3ff000033518c576
    System.out.printf("%s%n", doubleToHex(ret2)); // 3ff000033518c575
    Assert.assertEquals(0, Double.compare(ret, ret2)); // fail in jdk8_X86, success in jdk8_ARM64
  }

  public static String doubleToHex(double input) {
    // Convert the starting value to the equivalent value in a long
    long doubleAsLong = Double.doubleToRawLongBits(input);
    // and then convert the long to a hex string
    return Long.toHexString(doubleAsLong);
  }

Tron should use StrictMath to avoid cross-platform consistency issues. To help ensure Java-Tron's portability to ARM, I suggest a new proposal to convert Math to StrictMath. cc @zeusoo001

@317787106
Contributor

@halibobo1205 Supporting JDK 8 on macOS ARM first, and then extending to JDK 17 on Linux and macOS ARM, may be the smoothest path.

@tomatoishealthy
Contributor

JDK version compatibility After some brief research, I found ARM 64-bit versions of JDK 8 available. cc @endiaoekoe @317787106 @abn2357

Does this mean that there is no longer a dependency between the ARM architecture upgrade and the JDK upgrade?

Also, TRON only focuses on the Oracle JDK, right?

@halibobo1205
Contributor Author

halibobo1205 commented Aug 21, 2024

2. Native code

Components that use JNI (Java Native Interface) or other native code must be recompiled or upgraded for the ARM64 architecture. They include, but are not limited to, the following.

  • LevelDBJni: fusesource has stopped maintaining this project (no updates since Oct 17, 2013); forked by @halibobor, thanks to @folowing

  • RocksDBJni: RocksJava supports the ARM64 architecture from 6.29.4.1+

    • The current version is 5.15.10; it is recommended to upgrade to 7.7.3+, which has been verified by syncing from block 0.

    • 6.29.4.1+ is incompatible with LevelDB data (org.rocksdb.RocksDBException: bad block contents in file output-directory/database/asset-issue-v2/MANIFEST-000428), while 5.15.10 is compatible with LevelDB.

      • levelDB -> opened directly, or converted to rocksDB by Toolkit.jar db convert without safe mode -> ✅ rocksDB-5.15.10 -> ❌ rocksDB-6.29.4.1+ won't work
      • levelDB -> ✅ converted to rocksDB by DBConvert.jar or Toolkit.jar db convert --safe with safe mode -> ✅ rocksDB-5.15.10 -> ✅ rocksDB-6.29.4.1+ is ok
      • rocksDB (rocksDB-5.15.10) -> ✅ rocksDB-6.29.4.1+ is ok
    • TODO:

      • 1. Disallow RocksDB from opening LevelDB data directly
      • 2. Toolkit.jar db convert should force safe mode.
      • 3. Provide a RocksDB rewrite tool to fix the scenario where RocksDB opened LevelDB data directly, or LevelDB was converted to RocksDB by Toolkit.jar db convert without safe mode.
  • zksnark-java-sdk has been upgraded for the ARM64 architecture since GreatVoyage-v4.7.0.1

@halibobo1205
Contributor Author

8. Third-party dependencies

Ensure all third-party libraries and dependencies support ARM architecture.
Some incompatible dependencies may need to be updated or replaced, including, but not limited to, the following.

@halibobo1205
Contributor Author

Warning

This is the last planned update of JDK 17 under the NFTC. Updates after September 2024 will be licensed under the Java SE OTN License (OTN) and production use beyond the limited free grants of the OTN license will require a fee.

To avoid subsequent charges for commercial use, I recommend switching to OpenJDK.

@halibobo1205
Contributor Author

Caution

Strong data consistency and finality
Blockchains require final data consistency, which is usually guaranteed by a world state. Unfortunately, Java-Tron does not have a world state.
We need to think about how to ensure final data consistency.

@halibobo1205
Contributor Author

1. JDK version compatibility
Maybe try to support OpenJDK on ARM?

@halibobo1205
Contributor Author

A hard fork solution will be introduced in 4.8.0, switching floating-point calculations from Math to StrictMath.

@halibobo1205
Contributor Author

Currently, java-tron supports both LevelDB and RocksDB. On the ARM architecture, we intend to support only RocksDB, mainly due to the following considerations:

  1. Performance Advantages
    RocksDB, built on top of LevelDB, offers enhanced performance, reliability, and advanced features such as multi-threaded execution, compaction optimizations, and support for larger datasets, making it more suitable for high-throughput and low-latency use cases.

  2. Community Support

  • RocksDB has continuous investment and maintenance from Meta (Facebook)
  • Official support for RocksDB Java API
  • RocksDB community is more active in supporting ARM architecture
  • In comparison, LevelDB's community maintenance is relatively less active
  3. Feature Completeness
  • RocksDB offers richer features (e.g., column families, transaction support, TtlDB)
  • Built-in monitoring and performance diagnostic tools are more comprehensive
  • Provides more flexible configuration options for optimization on ARM architecture
  4. Future Development Trends
  • RocksDB is more widely used in the blockchain domain
  • Continuously receives performance optimizations and feature updates
  • More timely support for new hardware features
  5. Ecosystem Integration
  • Better support for RocksDB in cloud-native environments
  • Better integration with modern monitoring tools
  • More mature support for containerized deployment
  6. Hardware Adaptation
  • Better optimization for new storage devices (e.g., NVMe SSD) in RocksDB
  • Better utilization of ARM architecture's specific instruction sets
  • Better support for large memory systems

This will ensure the best database usage experience on ARM architecture.

@halibobo1205
Contributor Author

halibobo1205 commented Nov 21, 2024

Important

On the ARM architecture, we intend to support only RocksDB

When using RocksDB on the CI test, we found that some tests failed due to differences in behavior between LevelDB and RocksDB.

  • JVM core dump (screenshot omitted)
  • RocksDB does not throw an error when operations are performed after it has been closed:
    • putData
    • deleteData
    • getData
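
One way to paper over this behavioral difference is a thin wrapper that tracks the closed state and fails fast, giving RocksDB the same error behavior tests expect from LevelDB. A minimal sketch over a hypothetical key-value interface (`KvStore` and `GuardedStore` are illustrative names, not java-tron APIs):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal key-value interface, for illustration only.
interface KvStore {
    void put(String key, String value);
    String get(String key);
    void close();
}

// Fails fast after close(), mimicking LevelDB's behavior so that
// tests observe the same error on both backends.
class GuardedStore implements KvStore {
    private final Map<String, String> delegate = new HashMap<>();
    private volatile boolean closed = false;

    private void checkOpen() {
        if (closed) {
            throw new IllegalStateException("database already closed");
        }
    }

    @Override public void put(String key, String value) { checkOpen(); delegate.put(key, value); }
    @Override public String get(String key) { checkOpen(); return delegate.get(key); }
    @Override public void close() { closed = true; }
}

public class GuardDemo {
    public static void main(String[] args) {
        GuardedStore store = new GuardedStore();
        store.put("k", "v");
        store.close();
        try {
            store.get("k");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```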

@Murphytron

This issue has been added to the tronprotocol/pm#105, welcome to share the latest progress @halibobo1205 , and discuss together with @endiaoekoe @317787106 @tomatoishealthy @zeusoo001 @abn2357 .

@sYNCHROtime

When can a Tron node run on the ARM architecture? Do you have an approximate timeline?

@halibobo1205
Contributor Author

@sYNCHROtime It's expected that java-tron will be able to run on the ARM platform in Q2 2025, if the corresponding dependency updates and code adaptation go smoothly.

@dkatzan

dkatzan commented Feb 18, 2025

Hi @halibobo1205, my team is building an MPC enterprise wallet, and we are highly anticipating this feature so that we can run private Tron chains on ARM for testing.

Are there any updates on this? Are we still looking at Q2?
Thanks

@halibobo1205
Contributor Author

@dkatzan Yes, most feature adaptations have been completed and are expected to be online in Q2.

@dkatzan

dkatzan commented Feb 19, 2025

@halibobo1205 Thanks, that's great news.
As a temporary mitigation, I'm trying to use the existing image, which seems to run on my M1 Mac, but I'm facing some difficulties that might be a result of compatibility issues or of running under QEMU.

I posted my full issue here; any help would be greatly appreciated.

@halibobo1205
Contributor Author

@dkatzan You can try installing the x86_64 JDK on the M1, which will then run under Rosetta emulation.

@halibobo1205
Contributor Author

halibobo1205 commented Apr 15, 2025

Tracing floating-point arithmetic:

Tron uses Math.pow() for the Bancor trading-pair floating-point calculations in ExchangeProcessor (see the code above).

@halibobo1205
Contributor Author

Tracing floating-point arithmetic:
For historical data (the Bancor trading pair), there are currently two candidate solutions:

  • Hardcoded Special Cases

  • Emulating x86 Math.pow

I'll go over the first option in detail, and then compare the pros and cons of the two options.

Detailed Description of Hardcoded Special Cases:

Option 1 uses partial hardcoding to handle special cases, with the following implementation steps:

  1. Difference Identification: Through transaction replay on the x86 platform, comprehensively identify the set of input values where Math.pow and StrictMath.pow produce inconsistent results.

  2. Mapping Table Creation: Establish a mapping table for the identified special input values and their corresponding Math.pow calculation results from the x86 platform, which can be stored using data structures like hash tables.

  3. Conditional Processing Logic:

    • Before proposal takes effect: ARM platform defaults to using StrictMath.pow for calculations
    • For each input value, first check if it exists in the special case mapping table
    • If it exists, directly return the pre-calculated result from the mapping table (x86 platform's Math.pow result)
    • If not, use the StrictMath.pow calculation result
  4. Logging: Log all special cases handled using the mapping table for subsequent analysis and verification.
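
Steps 1-3 above could be sketched as a lookup keyed on the raw bit patterns of the inputs, falling back to StrictMath.pow (class and method names are illustrative, not the actual java-tron implementation; the sample entry is taken from the thread's diff data):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the "hardcoded special cases" option:
// a table of (base, exp) bit patterns -> x86 Math.pow result bits,
// with StrictMath.pow as the default path.
public class PowCompat {
    private static final Map<String, Long> SPECIAL_CASES = new HashMap<>();

    static {
        // Sample entry from the thread's diff data:
        // base 0x3ff0192278704be3, exp 0.0005 -> 0x3ff000033518c576
        SPECIAL_CASES.put(key(0x3ff0192278704be3L, Double.doubleToRawLongBits(0.0005)),
                0x3ff000033518c576L);
    }

    private static String key(long baseBits, long expBits) {
        return baseBits + ":" + expBits;
    }

    public static double pow(double base, double exp) {
        Long mapped = SPECIAL_CASES.get(
                key(Double.doubleToRawLongBits(base), Double.doubleToRawLongBits(exp)));
        if (mapped != null) {
            // Pre-computed x86 Math.pow result for this historical input
            return Double.longBitsToDouble(mapped);
        }
        // Default: platform-independent StrictMath
        return StrictMath.pow(base, exp);
    }
}
```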

Comparison of the two options:

| Aspect | Hardcoded Special Cases | Emulating x86 Math.pow |
| --- | --- | --- |
| Implementation Complexity | Low to Medium; only requires identifying differences and creating mappings | Very High; requires a deep understanding of the floating-point implementation |
| Code Universality | Lower; a customized solution for specific problems | High; applicable to all scenarios |
| Maintenance Cost | Low; no continuous updates needed once the new proposal is in effect and historical data is fixed | Low; minimal maintenance after initial implementation |
| Performance Impact | Small; only slight overhead during table lookups | Potentially significant; custom floating-point calculations are typically slower than native ones |
| Implementation Risk | Low; simple logic that's easy to test and verify | High; may introduce new precision or performance issues |
| Scalability | Medium; can be extended by updating the mapping table | High; solves all cases at once |
| Cross-platform Compatibility | High; only requires the mapping table to be consistent across platforms | Uncertain; depends on implementation quality |

@halibobo1205
Contributor Author

halibobo1205 commented Apr 18, 2025

For historical data (the Bancor trading pair), before the new proposal 101 took effect, 48 pow calculation instances caused final block-state inconsistencies on Main-Net.

Simulating the x86 (x87) pow instruction requires the following work:

  1. Emulating 80-bit extended precision
  2. Emulating the fyl2x and f2xm1 instructions

JDK 8 pow HotSpot code, for more details see: https://github.com/openjdk/jdk/blob/jdk8-b120/hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp#L2269

void MacroAssembler::increase_precision() {
  subptr(rsp, BytesPerWord);
  fnstcw(Address(rsp, 0));
  movl(rax, Address(rsp, 0));
  orl(rax, 0x300);
  push(rax);
  fldcw(Address(rsp, 0));
  pop(rax);
}

void MacroAssembler::restore_precision() {
  fldcw(Address(rsp, 0));
  addptr(rsp, BytesPerWord);
}

void MacroAssembler::pow_exp_core_encoding() {
  // kills rax, rcx, rdx
  subptr(rsp,sizeof(jdouble));
  // computes 2^X. Stack: X ...
  // f2xm1 computes 2^X-1 but only operates on -1<=X<=1. Get int(X) and
  // keep it on the thread's stack to compute 2^int(X) later
  // then compute 2^(X-int(X)) as (2^(X-int(X)-1+1)
  // final result is obtained with: 2^X = 2^int(X) * 2^(X-int(X))
  fld_s(0);                 // Stack: X X ...
  frndint();                // Stack: int(X) X ...
  fsuba(1);                 // Stack: int(X) X-int(X) ...
  fistp_s(Address(rsp,0));  // move int(X) as integer to thread's stack. Stack: X-int(X) ...
  f2xm1();                  // Stack: 2^(X-int(X))-1 ...
  fld1();                   // Stack: 1 2^(X-int(X))-1 ...
  faddp(1);                 // Stack: 2^(X-int(X))
  // computes 2^(int(X)): add exponent bias (1023) to int(X), then
  // shift int(X)+1023 to exponent position.
  // Exponent is limited to 11 bits if int(X)+1023 does not fit in 11
  // bits, set result to NaN. 0x000 and 0x7FF are reserved exponent
  // values so detect them and set result to NaN.
  movl(rax,Address(rsp,0));
  movl(rcx, -2048); // 11 bit mask and valid NaN binary encoding
  addl(rax, 1023);
  movl(rdx,rax);
  shll(rax,20);
  // Check that 0 < int(X)+1023 < 2047. Otherwise set rax to NaN.
  addl(rdx,1);
  // Check that 1 < int(X)+1023+1 < 2048
  // in 3 steps:
  // 1- (int(X)+1023+1)&-2048 == 0 => 0 <= int(X)+1023+1 < 2048
  // 2- (int(X)+1023+1)&-2048 != 0
  // 3- (int(X)+1023+1)&-2048 != 1
  // Do 2- first because addl just updated the flags.
  cmov32(Assembler::equal,rax,rcx);
  cmpl(rdx,1);
  cmov32(Assembler::equal,rax,rcx);
  testl(rdx,rcx);
  cmov32(Assembler::notEqual,rax,rcx);
  movl(Address(rsp,4),rax);
  movl(Address(rsp,0),0);
  fmul_d(Address(rsp,0));   // Stack: 2^X ...
  addptr(rsp,sizeof(jdouble));
}


void MacroAssembler::fast_pow() {
  // computes X^Y = 2^(Y * log2(X))
  // if fast computation is not possible, result is NaN. Requires
  // fallback from user of this macro.
  // increase precision for intermediate steps of the computation
  increase_precision();
  fyl2x();                 // Stack: (Y*log2(X)) ...
  pow_exp_core_encoding(); // Stack: exp(X) ...
  restore_precision();
}

@NewOF

NewOF commented Apr 18, 2025

Principle of x87 Instruction Simulation

  • From the relevant documentation and source code, the key to calculating $a^b$ in pow lies in two instructions, fyl2x and f2xm1: fyl2x computes $b \cdot \log_2 a$, while f2xm1 computes $2^a - 1$

  • Process:

    • Assume $y = a^b$ and take the base-2 logarithm of both sides: $\log_2 y = b \cdot \log_2 a$
    • That is, $\Large y = 2^{b \cdot \log_2 a}$
    • By combining these two instructions, we can calculate $a^b$
  • At present, the implementation details of the fyl2x and f2xm1 instructions are still unknown. We will attempt to simulate the calculation through Taylor expansion

  • $\log_2 x$

    • $\log_2 x$ expands as $\frac{1}{\ln 2}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}(x-1)^n$
  • $2^x - 1$

    • $2^x - 1$ expands as $\sum_{n=1}^{\infty}\frac{(x\ln 2)^n}{n!}$

Simulation implementation (C++)

  • Using the SoftFloat library, which supports 80-bit extended double precision (http://www.jhauser.us/arithmetic/SoftFloat.html)
  • Note 1: the Float80 class is a wrapper implemented on top of the SoftFloat library
  • Note 2: due to the special nature of the 48 mismatched data points (all with an exponent of 0.0005), the calculation process has been partially simplified
  Float80 ln_coe[] = {
  	Float80(0x3FFF, 0x8000000000000000), // 1.0
  	Float80(0x3FFD, 0xFFFFFFFFFFFFFFFF), // 0.5
  	Float80(0x3FFD, 0xAAAAAAAAAAAAAAAA), // 0.333...
  	Float80(0x3FFC, 0xFFFFFFFFFFFFFFFF), // 0.25
  	Float80(0x3FFC, 0xCCCCCCCCCCCCCCCC), // 0.2
  	Float80(0x3FFC, 0xAAAAAAAAAAAAAAAA), // 0.166...
  };

  Float80 taylor_ln2_float80(Float80 x) {
  	return x -
           x * x * ln_coe[1] +
           x * x * x * ln_coe[2] -
           x * x * x * x * ln_coe[3] +
           x * x * x * x * x * ln_coe[4] -
           x * x * x * x * x * x * ln_coe[5];
  }

  // ln(2)^n/n!
  Float80 exp_coe[] = {
  	Float80(0.69314718055994530942),
  	Float80(0.24022650695910071233),
  	Float80(0.05550410866482157995),
  	Float80(0.00961812910762847716),
  	Float80(0.00133335581464284434),
  };

  Float80 taylor_exp2_float80(Float80 x) {
  	return x * exp_coe[0] +
           x * x * exp_coe[1] +
           x * x * x * exp_coe[2] +
           x * x * x * x * exp_coe[3] +
           x * x * x * x * x * exp_coe[4];
  }

  double taylor_pow2_float80(double x, double y) {
  	Float80 x80(x), y80(y);
  	Float80 ln2(0x3FFE, 0xB17217F7D1CF7BBB);

  	Float80 y_lg2_x = y80 * taylor_ln2_float80(x80 - Float80(1)) / ln2;
    // For the test dataset, since y_lg2_x<1, this step omits the exponentiation of the integer part
  	Float80 exp_y_lg2_x = taylor_exp2_float80(y_lg2_x) + Float80(1);

  	return exp_y_lg2_x.to_double();
  }
  • Test data (base, exp, expected)
  Data("3ff0192278704be3", 0.0005, "3ff000033518c576"); //  4137160
  Data("3ff000002fc6a33f", 0.0005, "3ff0000000061d86"); //  4065476
  Data("3ff00314b1e73ecf", 0.0005, "3ff0000064ea3ef8"); //  4071538
  Data("3ff0068cd52978ae", 0.0005, "3ff00000d676966c"); //  4109544
  Data("3ff0032fda05447d", 0.0005, "3ff0000068636fe0"); //  4123826
  Data("3ff00051c09cc796", 0.0005, "3ff000000a76c20e"); //  4166806
  Data("3ff00bef8115b65d", 0.0005, "3ff0000186893de0"); //  4225778
  Data("3ff009b0b2616930", 0.0005, "3ff000013d27849e"); //  4251796
  Data("3ff00364ba163146", 0.0005, "3ff000006f26a9dc"); //  4257157
  Data("3ff019be4095d6ae", 0.0005, "3ff0000348e9f02a"); //  4260583
  Data("3ff0123e52985644", 0.0005, "3ff0000254797fd0"); //  4367125
  Data("3ff0126d052860e2", 0.0005, "3ff000025a6cde26"); //  4402197
  Data("3ff0001632cccf1b", 0.0005, "3ff0000002d76406"); //  4405788
  Data("3ff0000965922b01", 0.0005, "3ff000000133e966"); //  4490332
  Data("3ff00005c7692d61", 0.0005, "3ff0000000bd5d34"); //  4499056
  Data("3ff015cba20ec276", 0.0005, "3ff00002c84cef0e"); //  4518035
  Data("3ff00002f453d343", 0.0005, "3ff000000060cf4e"); //  4533215
  Data("3ff006ea73f88946", 0.0005, "3ff00000e26d4ea2"); //  4647814
  Data("3ff00a3632db72be", 0.0005, "3ff000014e3382a6"); //  4766695
  Data("3ff000c0e8df0274", 0.0005, "3ff0000018b0aeb2"); //  4771494
  Data("3ff00015c8f06afe", 0.0005, "3ff0000002c9d73e"); //  4793587
  Data("3ff00068def18101", 0.0005, "3ff000000d6c3cac"); //  4801947
  Data("3ff01349f3ac164b", 0.0005, "3ff000027693328a"); //  4916843
  Data("3ff00e86a7859088", 0.0005, "3ff00001db256a52"); //  4924111
  Data("3ff00000c2a51ab7", 0.0005, "3ff000000018ea20"); //  5098864
  Data("3ff020fb74e9f170", 0.0005, "3ff00004346fbfa2"); //  5133963
  Data("3ff00001ce277ce7", 0.0005, "3ff00000003b27dc"); //  5139389
  Data("3ff005468a327822", 0.0005, "3ff00000acc20750"); //  5151258
  Data("3ff00006666f30ff", 0.0005, "3ff0000000d1b80e"); //  5185021
  Data("3ff000045a0b2035", 0.0005, "3ff00000008e98e6"); //  5295829
  Data("3ff00e00380e10d7", 0.0005, "3ff00001c9ff83c8"); //  5380897
  Data("3ff00c15de2b0d5e", 0.0005, "3ff000018b6eaab6"); //  5400886
  Data("3ff00042afe6956a", 0.0005, "3ff0000008892244"); //  5864127
  Data("3ff0005b7357c2d4", 0.0005, "3ff000000bb48572"); //  6167339
  Data("3ff00033d5ab51c8", 0.0005, "3ff0000006a279c8"); //  6240974
  Data("3ff0000046d74585", 0.0005, "3ff0000000091150"); //  6279093
  Data("3ff0010403f34767", 0.0005, "3ff0000021472146"); //  6428736
  Data("3ff00496fe59bc98", 0.0005, "3ff000009650a4ca"); //  6432355,6493373
  Data("3ff0012e43815868", 0.0005, "3ff0000026af266e"); //  6555029
  Data("3ff00021f6080e3c", 0.0005, "3ff000000458d16a"); //  7092933
  Data("3ff000489c0f28bd", 0.0005, "3ff00000094b3072"); //  7112412
  Data("3ff00009d3df2e9c", 0.0005, "3ff00000014207b4"); //  7675535
  Data("3ff000def05fa9c8", 0.0005, "3ff000001c887cdc"); //  7860324
  Data("3ff0013bca543227", 0.0005, "3ff00000286a42d2"); //  8292427
  Data("3ff0021a2f14a0ee", 0.0005, "3ff0000044deb040"); //  8517311
  Data("3ff0002cc166be3c", 0.0005, "3ff0000005ba841e"); //  8763101
  Data("3ff0000cc84e613f", 0.0005, "3ff0000001a2da46"); //  9269124
  Data("3ff000057b83c83f", 0.0005, "3ff0000000b3a640"); //  9631452
  • Comparison of Results
exp:0.0005 base:1.00628495434413 3ff019be4095d6ae
expected: 1.0000031326481 3ff0000348e9f02a
result:   1.0000031326481 3ff0000348e9f029

exp:0.0005 base:1.00805230779141 3ff020fb74e9f170
expected: 1.00000401003852 3ff00004346fbfa2
result:   1.00000401003852 3ff00004346fbfa1
  • At the beginning of testing, direct iterative calculation gave poor results. Later, by directly evaluating the polynomial expansion with pre-calculated coefficients, the accuracy improved significantly. (However, two data points still do not fully match expectations.)
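
The identity behind the two instructions, $a^b = 2^{b \cdot \log_2 a}$, can be checked in Java at ordinary double precision (a sketch of the identity only, not of the 80-bit x87 pipeline; `pow2` is an illustrative name):

```java
public class PowDecomposition {
    // pow(a, b) = 2^(b * log2(a)), the identity behind fyl2x + f2xm1.
    // Computed here at ordinary double precision via StrictMath,
    // so results match the x87 pipeline only approximately.
    static double pow2(double a, double b) {
        double yLog2x = b * (StrictMath.log(a) / StrictMath.log(2.0));
        return StrictMath.exp(yLog2x * StrictMath.log(2.0)); // 2^t = e^(t ln 2)
    }

    public static void main(String[] args) {
        double a = 1.0 + 29218.0 / 4761432.0; // input from the thread's test case
        double b = 0.0005;
        System.out.println(pow2(a, b));
        System.out.println(StrictMath.pow(a, b));
        // Close, but not guaranteed bit-identical: last-ulp rounding
        // differs, which is exactly why exact emulation is hard.
    }
}
```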

@halibobo1205
Contributor Author

The diff data (48 pow calculation instances) is the result of Math versus StrictMath; if algorithmic simulation is pursued, the implementation needs to be fully tested for complete equivalence with the Math library.

@NewOF

NewOF commented Apr 21, 2025

  • Using the simulated pow to recalculate the 48 mismatched floating-point data points, the best result so far is 2 mismatches.
  • Comparing against historical data up to block height 11,000,000 (673,496 data points, including contextual numerical calculations, comparing the final exchange balance), the simulated pow has 58,072 mismatches against Math.pow. Using the C++ standard library pow directly yields 48 mismatches, consistent with StrictMath.pow (also 48 mismatches).
  • From the resources and source code analyzed so far, the logarithm and power operations inside the x87 instructions are likely also polynomial expansions, accelerated with preprocessed coefficients. Without further implementation details, and given the instability of floating-point operations themselves, exact matching is difficult to achieve. By contrast, hardcoding is more feasible.

@halibobo1205
Contributor Author

In the Java HotSpot virtual machine, do_intrinsic is an important concept related to intrinsic functions.

Intrinsics in the HotSpot JVM are special, optimized implementations of commonly used Java methods. When the JVM identifies specific method calls, it may replace the standard Java implementation of these methods with more efficient native code. Math.pow() is one such method that is commonly intrinsically optimized.

Specifically for the Math.pow() method, the HotSpot JVM handles it in the following ways:

  1. The JVM has a function called do_intrinsic that determines whether a method can be replaced with an intrinsic implementation, and how to perform the replacement.

  2. For Math.pow(), when the JIT (Just-In-Time compiler) compiles code containing this method call, the do_intrinsic mechanism examines the call and may replace it with native instructions or optimized algorithms corresponding to the processor architecture.

  3. Typically, the intrinsic implementation of Math.pow() directly utilizes the CPU's floating-point instructions (for pow on x86, fyl2x and f2xm1) or calls into optimized native math routines, thereby avoiding the slower generic implementation.

This optimization mechanism is usually managed by the vmIntrinsics namespace in the HotSpot source code, with relevant implementations distributed across files such as src/hotspot/share/classfile/vmSymbols.hpp, src/hotspot/share/opto/library_call.cpp, and others.

do_intrinsic(_dlog, java_lang_Math, log_name, double_double_signature, F_S)

Through this intrinsic function optimization, the HotSpot JVM allows Java programs to maintain platform independence while achieving performance close to native code, which is particularly beneficial for math-computation-intensive applications.
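The divergence between the intrinsic and the fdlibm path can be probed directly. This is a minimal sketch with an assumed sample input, not a value known to diverge; on x86 JDK8 the intrinsic may differ from StrictMath by one ulp for some inputs, and on ARM the results can differ again, which is exactly what threatens consensus.

```java
// Compare Math.pow (may be intrinsified) against StrictMath.pow (fdlibm,
// reproducible on every platform) bit for bit.
public class PowProbe {

  public static boolean bitsEqual(double a, double b) {
    return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
  }

  public static void main(String[] args) {
    double base = 1.0009985; // hypothetical value near the bancor inputs
    double m = Math.pow(base, 0.0005);
    double s = StrictMath.pow(base, 0.0005);
    System.out.printf("intrinsic=%016x fdlibm=%016x match=%b%n",
        Double.doubleToLongBits(m), Double.doubleToLongBits(s), bitsEqual(m, s));
  }
}
```

Note that the Math.pow javadoc guarantees exact results only for cases like integral arguments whose result is exactly representable; elsewhere it only promises 1-ulp accuracy, which is where platforms can legally disagree.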

Here's the logic for log, and pow is similar:
[screenshot of the HotSpot log intrinsic source omitted]

@317787106
Copy link
Contributor

@halibobo1205 When Math.pow(double, double) is calculated, how do you determine whether the result is inconsistent with the one calculated by StrictMath? What is done when an inconsistency is found? And please specify exactly what gets hardcoded.

@halibobo1205
Copy link
Contributor Author

@317787106

  1. Compute the bancor transaction with both Math.pow and StrictMath.pow in an x86 JDK8 environment, and record the inputs whenever the final buyTokenQuant differs
public class ExchangeCapsule implements ProtoCapsule<Exchange> {

  public long transaction(byte[] sellTokenID, long sellTokenQuant, boolean useStrictMath) {
    long supply = 1_000_000_000_000_000_000L;
    ExchangeProcessor processor = new ExchangeProcessor(supply, useStrictMath);
    ExchangeProcessor strictProcessor = new ExchangeProcessor(supply, true);

    long buyTokenQuant = 0;
    long strictBuyTokenQuant = 0;
    long firstTokenBalance = this.exchange.getFirstTokenBalance();
    long secondTokenBalance = this.exchange.getSecondTokenBalance();

    if (this.exchange.getFirstTokenId().equals(ByteString.copyFrom(sellTokenID))) {
      buyTokenQuant = processor.exchange(firstTokenBalance,
          secondTokenBalance,
          sellTokenQuant);
      strictBuyTokenQuant = strictProcessor.exchange(firstTokenBalance,
          secondTokenBalance,
          sellTokenQuant);
      if (!useStrictMath && buyTokenQuant != strictBuyTokenQuant) {
        logAndRecord("{}\t{}\t{}\t{}\t{}", buyTokenQuant, strictBuyTokenQuant, firstTokenBalance, secondTokenBalance, sellTokenQuant); // logAndRecord pow data
      }
      this.exchange = this.exchange.toBuilder()
          .setFirstTokenBalance(firstTokenBalance + sellTokenQuant)
          .setSecondTokenBalance(secondTokenBalance - buyTokenQuant)
          .build();
    } else {
      buyTokenQuant = processor.exchange(secondTokenBalance,
          firstTokenBalance,
          sellTokenQuant);
      strictBuyTokenQuant = strictProcessor.exchange(secondTokenBalance,
          firstTokenBalance,
          sellTokenQuant);
      if (!useStrictMath && buyTokenQuant != strictBuyTokenQuant) {
        logAndRecord("{}\t{}\t{}\t{}\t{}", buyTokenQuant, strictBuyTokenQuant,secondTokenBalance, firstTokenBalance, sellTokenQuant); // logAndRecord pow data
      }
      this.exchange = this.exchange.toBuilder()
          .setFirstTokenBalance(firstTokenBalance - buyTokenQuant)
          .setSecondTokenBalance(secondTokenBalance + sellTokenQuant)
          .build();
    }
    
    return buyTokenQuant;
  }
}
  2. Based on the data collected in step 1, compute the pow values to be hardcoded for issuedSupply and exchangeBalance:
public class ExchangeProcessor {

  private long supply;
  private final boolean useStrictMath;

  public ExchangeProcessor(long supply, boolean useStrictMath) {
    this.supply = supply;
    this.useStrictMath = useStrictMath;
  }

  private long exchangeToSupply(long balance, long quant) {
    long newBalance = balance + quant;
    double issuedSupply = -supply * (1.0 - Maths.pow(1.0 + (double) quant / newBalance, 0.0005, this.useStrictMath));
    long out = (long) issuedSupply;
    supply += out;
    return out;
  }

  private long exchangeFromSupply(long balance, long supplyQuant) {
    supply -= supplyQuant;
    double exchangeBalance = balance * (Maths.pow(1.0 + (double) supplyQuant / supply, 2000.0, this.useStrictMath) - 1.0);
    return (long) exchangeBalance;
  }

  public long exchange(long sellTokenBalance, long buyTokenBalance, long sellTokenQuant) {
    long relay = exchangeToSupply(sellTokenBalance, sellTokenQuant);
    return exchangeFromSupply(buyTokenBalance, relay);
  }

}
  3. Adjust StrictMathWrapper:
public class StrictMathWrapper {

  private static final Map<Double, Double> powData = Collections.synchronizedMap(new HashMap<>());

  public static double pow(double a, double b) {
    double strictResult = StrictMath.pow(a, b);
    // Serve the recorded x86/JDK8 Math.pow result for known divergent bases,
    // otherwise fall back to the platform-independent StrictMath result.
    return powData.getOrDefault(a, strictResult);
  }
}
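A self-contained sketch of how this lookup-table fallback behaves (the class name, the record helper, and the table entry are invented for the demo; the real table would be filled from the data recorded in step 1):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Demo of the hardcoding strategy: known divergent bases map to the
// historical x86/JDK8 Math.pow result; everything else falls through to
// the reproducible StrictMath.pow.
public class StrictMathWrapperDemo {

  private static final Map<Double, Double> powData =
      Collections.synchronizedMap(new HashMap<>());

  // hypothetical helper for loading recorded divergent results
  public static void record(double base, double recordedMathPowResult) {
    powData.put(base, recordedMathPowResult);
  }

  public static double pow(double a, double b) {
    double strictResult = StrictMath.pow(a, b);
    return powData.getOrDefault(a, strictResult);
  }

  public static void main(String[] args) {
    record(1.5, 42.0); // obviously fake value, standing in for a recorded result
    System.out.println(pow(1.5, 0.875)); // served from the table
    System.out.println(pow(2.0, 10.0));  // falls back to StrictMath
  }
}
```

One design point worth noting: the table is keyed by the base only, which assumes a recorded base never appears with more than one exponent; keying on the (a, b) pair would be the safer general choice.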

@halibobo1205
Copy link
Contributor Author

If there are other ways to implement x87 instruction simulation, please discuss them.
Based on performance, implementation complexity, and verification difficulty, hard-coding the pow data is currently the better solution.
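For reference, the bancor conversion in ExchangeProcessor can be exercised with a minimal standalone sketch. The sample balances and quantity below are hypothetical; StrictMath (fdlibm) makes the result identical on every platform. Since exchangeToSupply adds relay to supply and exchangeFromSupply immediately subtracts it, a single trade can use the constant supply in both pow calls, as done here.

```java
// Standalone sketch of the two-step bancor conversion:
// sell tokens -> intermediate supply units (relay) -> buy tokens.
public class BancorSketch {

  // returns {relay, buyTokenQuant}
  public static long[] exchange(long supply, long sellBalance,
                                long buyBalance, long sellQuant) {
    long newBalance = sellBalance + sellQuant;
    double issued = -supply
        * (1.0 - StrictMath.pow(1.0 + (double) sellQuant / newBalance, 0.0005));
    long relay = (long) issued;
    double out = buyBalance
        * (StrictMath.pow(1.0 + (double) relay / supply, 2000.0) - 1.0);
    return new long[] {relay, (long) out};
  }

  public static void main(String[] args) {
    long[] r = exchange(1_000_000_000_000_000_000L, 1_000_000L, 2_000_000L, 1_000L);
    System.out.println("relay=" + r[0] + " buyTokenQuant=" + r[1]);
  }
}
```

The 0.0005 and 2000.0 exponents are the reciprocal pair from the snippets above, so the only nondeterminism in the whole pipeline comes from the two pow calls.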

@317787106
Copy link
Contributor

@halibobo1205 I noticed that the pow results in exchangeToSupply and exchangeFromSupply are converted to long. Could this precision loss impact the handling of hardcoded special cases?

@halibobo1205
Copy link
Contributor Author

> @halibobo1205 I noticed that the pow results in exchangeToSupply and exchangeFromSupply are converted to long. Could this precision loss impact the handling of hardcoded special cases?

Yes. In fact, that precision loss reduces the amount of pow data that needs to be hard-coded: a small divergence between Math.pow and StrictMath.pow often disappears once the result is truncated to a long.
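A tiny sketch of why truncation masks most divergences: two doubles one ulp apart, which is the typical gap between an intrinsic and the fdlibm result, usually truncate to the same long.

```java
// A 1-ulp divergence (simulated with Math.nextUp) vanishes after the
// (long) cast, so only divergences that straddle an integer boundary
// need to be hard-coded.
public class TruncationDemo {
  public static void main(String[] args) {
    double strict = 100.0;
    double intrinsic = Math.nextUp(strict); // one ulp higher, like a divergent pow
    System.out.println((long) strict + " " + (long) intrinsic);
  }
}
```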
