Skip to content

Latest commit

 

History

History
96 lines (67 loc) · 5.95 KB

README.md

File metadata and controls

96 lines (67 loc) · 5.95 KB

Investigation

It was first pointed out that a call to get a method generic return type was throwing an exception.

During the investigation it was found that the exception was thrown while gathering the method's data.

And the cause of the exception was that the signature returned by Class#getGenericSignature0 was java/util/concurrent/CompletableFuture while the correct value is <T:Ljava/lang/Object;>Ljava/lang/Object;Ljava/util/concurrent/Future<TT;>;.

Given that Class#getGenericSignature0 is a native method, the next step was to inspect the generated bytecode.

A simplified version of the instrumentation was created. This version did not change anything in the code, it only adds a couple of annotations to the class. Even then the problem persisted. And looking at the bytecode, both instrumented and original contain a line with the proper signature: Signature: #xyz // <T:Ljava/lang/Object;>Ljava/lang/Object;Ljava/util/concurrent/Future<TT;>;Ljava/util/concurrent/CompletionStage<TT;>;.

Bytecodes:

Further comparing the bytecodes, the only differences are the order of the items in the constant pool (which cause the references throughout the class to change), the order of the methods and the order of the tables in each method (LineNumberTable, LocalVariableTable, LocalVariableTypeTable).

NOTE as mentioned before, the instrumented CompletableFuture bytecode was generated with a trimmed down version of the New Relic Java agent. The bytecode that will be generated by this application will have more differences from the original since a full New Relic Java agent is used.

Since the bytecode did not provide any clue, the next step was to debug the JVM itself.

ClassFileParser::apply_parsed_class_attributes is called twice for CompletableFuture. Once before instrumentation and once after. And in both cases, the _generic_signature_index is correct relative to the bytecode being read.

Following the execution to VM_RedefineClasses::merge_cp_and_rewrite:

1794  jvmtiError VM_RedefineClasses::merge_cp_and_rewrite(
1795               InstanceKlass* the_class, InstanceKlass* scratch_class,
1796               TRAPS) {
...
...       // point A
1916      if (!rewrite_cp_refs(scratch_class)) { 
1917        return JVMTI_ERROR_INTERNAL;
1918      }
...       // point B
...
1924      set_new_constant_pool(loader_data, scratch_class, merge_cp, merge_cp_length,
1925                            CHECK_(JVMTI_ERROR_OUT_OF_MEMORY));
...       // point C
...
1931      cp_cleaner.add_scratch_cp(scratch_cp());
...

At point A, scratch_class->generic_signature() is correct and generic_signature_index is 4.

At point B, scratch_class->generic_signature() is wrong, its value changes with each run, and generic_signature_index is 1019.

At point C, scratch_class->generic_signature() is the value observed at runtime and generic_signature_index is 4. Also scratch_class->constants()->symbol_at(1019) has the correct generic signature.

Note that 1019 is the index for the generic signature in the original bytecode and 4 is the index for the generic signature in the instrumented bytecode. (This is likely not relevant as explained later.)

Inside VM_RedefineClasses::rewrite_cp_refs the explanation to what happens between points A and B.

1936  bool VM_RedefineClasses::rewrite_cp_refs(InstanceKlass* scratch_class) {
...
2024    // rewrite class generic signature index:
2025    u2 generic_signature_index = scratch_class->generic_signature_index(); // this is set to 4
2026    if (generic_signature_index != 0) {
2027      u2 new_generic_signature_index = find_new_index(generic_signature_index); // this is set to 1019
2028      if (new_generic_signature_index != 0) {
2029        scratch_class->set_generic_signature_index(new_generic_signature_index);
2030      }
2031    }
...

It looks like VM_RedefineClasses::rewrite_cp_refs writes the generic_signature_index into scratch_cp (the cp inside scratch_class) to be correct index of the generic signature for the constant pool that will be put into scratch_class.

Then merge_cp is passed into VM_RedefineClasses::set_new_constant_pool as scratch_cp. A new constant pool is created (smaller_cp), the constants from the "new" scratch_cp are put in it, along with its generic_signature_index which is not the value updated in VM_RedefineClasses::rewrite_cp_refs.

Looks like there is some confusion with the names here and the generic_signature_index that was updated in VM_RedefineClasses::rewrite_cp_refs gets lost.

Possible explanation/solution

Inside VM_RedefineClasses::merge_cp_and_rewrite, the scratch_cp (inside scratch_class), has its fields updated to reference the items in the final constant pool during the call to VM_RedefineClasses::rewrite_cp_refs.

Just after that, there is a call to VM_RedefineClasses::set_new_constant_pool in which merge_cp is passed as a parameter named scratch_cp. Inside this method, the final constant pool is created (smaller_cp) and there is a call to populate its fields.

smaller_cp->copy_fields(scratch_cp);

but remember that inside VM_RedefineClasses::set_new_constant_pool, scratch_cp is merge_cp from the calling method and does not have the updated fields. So changing this line to

smaller_cp->copy_fields(scratch_class->constants());

copies the right fields to smaller_cp.

This has not been thoroughly tested, but does fix the problem we are seeing. It also does not make the code any less confusing. Renaming scratch_cp inside VM_RedefineClasses::set_new_constant_pool might make things a little clearer. Another possible fix is to update the fields in merge_cp instead of scratch_cp (in VM_RedefineClasses::rewrite_cp_refs).

I don't have a holistic view of everything going on, so this may be completely off.