Discuss linking with libclangInterpreter.a for covering missing symbols in CppInterOpTest.js #519

Open
anutosh491 opened this issue Mar 10, 2025 · 22 comments
Labels
bug Something isn't working

Comments

@anutosh491
Collaborator

Description of bug

Consider the approach currently taken in #483 for exporting symbols.

The undefined symbols for CppInterOpTests.js come from two main places:

  1. llvm/clang static libs
  2. CXCppInterOp.cpp.o (see issue #518, "-fvisibility-inlines-hidden hinders symbol export from CXCppInterOp.cpp for emscripten use case", for details)

What operating system were you using when the bug occurred?

Other

What is the architecture of the CPU on your system?

Other

What did you build CppInterOp against?

Clang-repl (LLVM19)

@anutosh491 anutosh491 added the bug Something isn't working label Mar 10, 2025
@anutosh491
Collaborator Author

What is being done here is the following:

  1. Symbols are exported manually using exports.ld, which means they are propagated to CppInterOpTest.js through libclangCppInterOp.so.
  2. These symbols are required by CppInterOpTest.js, which links against gtest, gmock and libclangCppInterOp.so.
  3. This is how we currently build libclangCppInterOp.so:
  add_llvm_library(clangCppInterOp
    SHARED

    CppInterOp.cpp
    CXCppInterOp.cpp
    DynamicLibraryManager.cpp
    DynamicLibraryManagerSymbol.cpp
    Paths.cpp

    # Additional libraries from Clang and LLD
    LINK_LIBS
    clangInterpreter
  )

So when building clangCppInterOp, emscripten by nature pulls in the symbols coming out of libclangInterpreter.a and propagates them to clangCppInterOp, but the linker gets rid of any symbol that is not used by these 5 source files.

  4. CppInterOpTest.js, in turn, depends on these source files:
add_cppinterop_unittest(CppInterOpTests
  EnumReflectionTest.cpp
  FunctionReflectionTest.cpp
  InterpreterTest.cpp
  JitTest.cpp
  ScopeReflectionTest.cpp
  TypeReflectionTest.cpp
  Utils.cpp
  VariableReflectionTest.cpp
  ${EXTRA_TEST_SOURCE_FILES}
)

Hence it is highly probable that CppInterOpTests requires symbols that are used in these test source files but are not present in clangCppInterOp, simply because clangCppInterOp does not compile these files.

@anutosh491
Collaborator Author

anutosh491 commented Mar 10, 2025

Some options for what can be done:

  1. When we create clangCppInterOp (and we know that the user enabled tests), we can also create a test-intermediate library right there (calling it libtestintermediate.so for now):

  add_llvm_library(testintermediate
    SHARED

    EnumReflectionTest.cpp
    FunctionReflectionTest.cpp
    InterpreterTest.cpp
    JitTest.cpp
    ScopeReflectionTest.cpp
    TypeReflectionTest.cpp
    Utils.cpp
    VariableReflectionTest.cpp

    # Additional libraries from Clang and LLD
    LINK_LIBS
    clangInterpreter
  )

So now we have a library that has all the symbols that are relevant to the test object files and not included in clangCppInterOp, and we can use it while linking.

  2. Or we could take an easier route and link directly against libclangInterpreter.a, since it has the symbols the tests need anyway. Again, emscripten would use whatever the source files need and strip the rest.

So the easier way out is simply to add clangInterpreter to the linking step here:

https://github.com/compiler-research/CppInterOp/pull/483/files#diff-83065e07beb1893613b121d9407df26d6fe57ed08948d6def04986690a22ba37R43
and
https://github.com/compiler-research/CppInterOp/pull/483/files#diff-83065e07beb1893613b121d9407df26d6fe57ed08948d6def04986690a22ba37R69

This should be done because it keeps clangCppInterOp clean (emscripten knows which symbols should be propagated through it, and we don't interfere with that) while CppInterOpTests.js also gets the symbols it needs.

@anutosh491
Collaborator Author

cc @vgvassilev

@mcbarton says he's waiting for your views before applying this suggestion on his PR. This should get rid of all the undefined symbols we are concerned with. Let us know what you think.

@vgvassilev
Contributor

The design of CppInterOp is that it should contain all required symbols. We need to understand why the symbols we need are not there.

@mcbarton
Collaborator

mcbarton commented Mar 12, 2025

@vgvassilev @anutosh491 This is what I believe may be going on, along with a possible solution. If you look at the remaining symbols, one still comes from CppInterOp:

-Wl,--export=_ZN3Cpp11GetOperatorEPvNS_8OperatorERNSt3__26vectorIS0_NS2_9allocatorIS0_EEEENS_13OperatorArityE

So I decided to try and tackle this one (and test my solution). If you go to https://emscripten.org/docs/getting_started/FAQ.html and search, you'll find the statement

If your function is used in other functions, LLVM may inline it and it will not appear 
as a unique function in the JavaScript. Prevent inlining by defining the function with EMSCRIPTEN_KEEPALIVE

By making use of EMSCRIPTEN_KEEPALIVE here (found by applying c++filt -n to the symbol)

void GetOperator(TCppScope_t scope, Operator op,

CppInterOp's shared library was able to export the symbol without me explicitly exporting it through the linker. You'll also find this statement in the FAQ:

EMSCRIPTEN_KEEPALIVE also exports the function, as if it were on EXPORTED_FUNCTIONS.

Using the EXPORTED_FUNCTIONS flag is the Emscripten suggested way to export functions (which we cannot use since we are making a SIDE_MODULE).

So, in summary, it could be the case that during our llvm build these functions are being inlined and removed by Emscripten. By using EMSCRIPTEN_KEEPALIVE on the functions we might be able to keep them alive, so we won't have to export all symbols manually (and won't need to link the interpreter library directly to the tests). For llvm 19 this would require a new patch, but we should be able to get this into llvm 20, assuming you think this solution would get merged. Let me know what you think of the solution, and I will investigate what the exact patch for llvm would be if you think the solution is good.
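As a minimal sketch of the idea in this comment (not the actual LLVM patch): the annotation goes on the function definition, asking the toolchain to keep the symbol even when nothing inside the library calls it. `sketch::CountOperators` is a hypothetical function, and the non-Emscripten fallback macro is an assumption added only so the snippet also compiles with a native GCC or Clang.

```cpp
#include <cassert>

// EMSCRIPTEN_KEEPALIVE is provided by <emscripten.h> when compiling with emcc.
// The fallback definition is an assumption for native GCC/Clang builds only;
// it is not part of Emscripten.
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#else
#define EMSCRIPTEN_KEEPALIVE __attribute__((used))
#endif

namespace sketch {

// Hypothetical stand-in for an API entry point such as Cpp::GetOperator.
// The annotation asks the toolchain to keep this function in the output
// even if no caller inside the library references it.
EMSCRIPTEN_KEEPALIVE int CountOperators(const int* ops, int n) {
  assert(n >= 0);  // precondition: a non-negative element count
  int count = 0;
  for (int i = 0; i < n; ++i) {
    if (ops[i] != 0) ++count;  // count every non-zero entry
  }
  return count;
}

}  // namespace sketch
```

Whether this is preferable to a linker-level export depends on who owns the source: the annotation has to be written into the code that defines the function.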

@anutosh491
Collaborator Author

EMSCRIPTEN_KEEPALIVE also exports the function, as if it were on EXPORTED_FUNCTIONS.
Using the EXPORTED_FUNCTIONS flag is the Emscripten suggested way to export functions (which we cannot use since we are making a SIDE_MODULE).

Well, we might not be able to use EXPORTED_FUNCTIONS, but we only need this during the link step, don't we?
The following should be enough:

  target_link_options(clangCppInterOp PRIVATE
    PUBLIC "SHELL: -s WASM_BIGINT"
+   PUBLIC "SHELL: -Wl,--export=_ZN3Cpp11GetOperatorEPvNS_8OperatorERNSt3__26vectorIS0_NS2_9allocatorIS0_EEEENS_13OperatorArityE"
  )
  1. This symbol should be a part of libclangCppInterOp.so, as it belongs to CppInterOp.cpp, which we compile while building the shared module, so there is nothing wrong with exporting it.
  2. This is the only symbol we have to deal with like this. All the rest come from llvm/clang and don't need to be a part of libclangCppInterOp.so, as they are eventually only needed by the main module CppInterOpTests.js.

@vgvassilev
Contributor

Looks like we need CPPINTEROP_API for GetOperator and GetOperatorArity?

@anutosh491
Collaborator Author

Looks like we need CPPINTEROP_API for GetOperator and GetOperatorArity?

Yes, perfect, that should be enough to take care of those symbols.
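For readers unfamiliar with export macros, here is a rough sketch of what a macro in the spirit of CPPINTEROP_API typically does. `SKETCH_API`, `sketch::Arity` and `sketch::ArityFromOperandCount` are hypothetical names used for illustration; CppInterOp's real macro definition may differ.

```cpp
#include <cassert>

// Hypothetical export macro, assumed to play the same role as CPPINTEROP_API.
// It marks a symbol as visible outside the shared library.
#if defined(_WIN32)
#define SKETCH_API __declspec(dllexport)
#else
#define SKETCH_API __attribute__((visibility("default")))
#endif

namespace sketch {

enum class Arity { Unary = 1, Binary = 2 };

// The declaration carries the macro, so the symbol keeps default visibility
// even when the library is compiled with hidden visibility as the default.
SKETCH_API Arity ArityFromOperandCount(int operands);

Arity ArityFromOperandCount(int operands) {
  assert(operands == 1 || operands == 2);  // only two arities in this sketch
  return operands == 1 ? Arity::Unary : Arity::Binary;
}

}  // namespace sketch
```

An interface that lacks such an annotation can be dropped from the library's exported symbol table, which matches the symptom described for GetOperator and GetOperatorArity.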

@mcbarton
Collaborator

@vgvassilev @anutosh491 After using CPPINTEROP_API I am left with just the clang and llvm specific symbols. I've separated them out in the attached files

llvm_symbols.txt

clang_symbols.txt

Applying c++filt -n to the llvm symbols I get

llvm::raw_ostream::flush_nonempty()
llvm::raw_ostream::SetBufferAndMode(char*, unsigned long, llvm::raw_ostream::BufferKind)
llvm::raw_ostream::write(char const*, unsigned long)
llvm::raw_ostream::~raw_ostream()
llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long)
llvm::allocate_buffer(unsigned long, unsigned long)
llvm::logAllUnhandledErrors(llvm::Error, llvm::raw_ostream&, llvm::Twine)
llvm::llvm_unreachable_internal(char const*, char const*, unsigned int)
llvm::sys::RunningOnValgrind()
llvm::dbgs()
llvm::errs()
llvm::APInt::APInt(unsigned int, llvm::ArrayRef<unsigned long long>)
llvm::APInt::compareSigned(llvm::APInt const&) const
llvm::APInt::sext(unsigned int) const
llvm::APInt::zext(unsigned int) const
llvm::APInt::compare(llvm::APInt const&) const
llvm::Error::fatalUncheckedError() const
vtable for llvm::raw_string_ostream

and applying it to the clang symbols I get

clang::ASTContext::createMangleContext(clang::TargetInfo const*)
clang::DeclContext::classof(clang::Decl const*)
clang::Interpreter::getCompilerInstance()
clang::Interpreter::Parse(llvm::StringRef)
clang::Interpreter::create(std::__2::unique_ptr<clang::CompilerInstance, std::__2::default_delete<clang::CompilerInstance>>)
clang::Interpreter::Execute(clang::PartialTranslationUnit&)
clang::MangleContext::mangleName(clang::GlobalDecl, llvm::raw_ostream&)
clang::MangleContext::shouldMangleDeclName(clang::NamedDecl const*)
clang::IncrementalCompilerBuilder::CreateCpp()
clang::Decl::castToDeclContext(clang::Decl const*)
clang::ASTContext::getLValueReferenceType(clang::QualType, bool) const
clang::DeclContext::decls_begin() const
clang::FunctionDecl::getTemplateSpecializationArgs() const
clang::FunctionDecl::getTemplateInstantiationPattern(bool) const
clang::VarTemplateSpecializationDecl::getSpecializedTemplate() const
clang::Decl::getASTContext() const
clang::Decl::hasDefiningAttr() const
clang::Decl::getAttrs() const
clang::Type::getUnqualifiedDesugaredType() const
clang::TagType::getDecl() const
clang::VarDecl::isThisDeclarationADefinition(clang::ASTContext&) const

Please look over this and let me know if anything stands out about the lists and gives you an idea as to why they are not being exported by CppInterOp's shared library.
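The c++filt step used above can also be reproduced programmatically. The sketch below assumes a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`; the `sketch::Demangle` helper is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

namespace sketch {

// Returns the demangled form of an Itanium C++ ABI symbol name,
// or the input unchanged if demangling fails.
std::string Demangle(const std::string& mangled) {
  assert(!mangled.empty());  // precondition: there is something to demangle
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with
  // malloc, so the caller has to release it with free.
  char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return mangled;
  }
  std::string result(out);
  std::free(out);
  return result;
}

}  // namespace sketch
```

Passing the `_ZN3Cpp11GetOperator...` symbol from earlier in this thread should yield a name beginning with `Cpp::GetOperator(`, which is how the remaining symbols can be sorted into CppInterOp, llvm and clang groups.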

@anutosh491
Collaborator Author

anutosh491 commented Mar 13, 2025

Hey,

I think my approach is exactly what we want here! We just need to link the CppInterOpTests target against libclangInterpreter now. These symbols are not supposed to be exposed through CppInterOp's shared library, and that's what I've been pointing towards. What doubts do you have about the approach I described above?

PS: I've been hacking around with your branch locally and it builds pretty smoothly, just like we want. You might also be interested in seeing the other reviews I gave in this regard. More than half of the tests marked as skipped now work.

@anutosh491
Collaborator Author

anutosh491 commented Mar 13, 2025

As an update: on reading your statement again, I see this

gives you an idea as to why they are not being exported by CppInterOp's shared library.

That's actually what I am pointing towards. These symbols don't necessarily need to come out of cppinterop's shared build, if that's the confusion here. Why should these be exposed through libclangCppInterOp.so?

When building clangCppInterOp, we only care about the symbols coming out of these source files:

    CppInterOp.cpp
    CXCppInterOp.cpp
    DynamicLibraryManager.cpp
    DynamicLibraryManagerSymbol.cpp
    Paths.cpp

So emscripten would make sure to capture all llvm/clang symbols used here and strip the rest. There's nothing wrong with that, correct? That's what libclangCppInterOp.so is responsible for.

But when we get to the tests, we need the symbols used by these source files too:

    EnumReflectionTest.cpp
    FunctionReflectionTest.cpp
    InterpreterTest.cpp
    JitTest.cpp
    ScopeReflectionTest.cpp
    TypeReflectionTest.cpp
    Utils.cpp
    VariableReflectionTest.cpp

And it's CppInterOpTests.js that needs these, but it isn't really the job of clangCppInterOp to expose them, because clangCppInterOp is not responsible for the source files from the test dir.

Hence we just need to do

target_link_libraries(CppInterOpTests
  clangCppInterOp
  clangInterpreter
)

and we're done.

If we really want to expose it through clangCppInterOp, that can be done, but it's not the goal here, I think. Here's how you could do it if you really want to:

# TESTING_ENABLED is a placeholder condition, not a real CppInterOp option
if(TESTING_ENABLED)
  add_llvm_library(testintermediate
    SHARED

    EnumReflectionTest.cpp
    FunctionReflectionTest.cpp
    InterpreterTest.cpp
    JitTest.cpp
    ScopeReflectionTest.cpp
    TypeReflectionTest.cpp
    Utils.cpp
    VariableReflectionTest.cpp

    # Additional libraries from Clang and LLD
    LINK_LIBS
    clangInterpreter
  )
endif()
# Then, besides clangInterpreter, clangCppInterOp also links against this library

target_link_libraries (clangCppInterOp
+ testintermediate
 clangInterpreter)

I don't think this is called for here though :\

I am not sure it is correct to expect symbols from fileA and fileB in libXX.so when we are not even compiling them while building XX with emscripten. Anything unnecessary is stripped, isn't it?

@vgvassilev
Contributor

Hey,

I think my approach is exactly what we want here! We just need to link the CppInterOpTests target against libclangInterpreter now. These symbols are not supposed to be exposed through CppInterOp's shared library, and that's what I've been pointing towards. What doubts do you have about the approach I described above?

PS: I've been hacking around with your branch locally and it builds pretty smoothly, just like we want. You might also be interested in seeing the other reviews I gave in this regard. More than half of the tests marked as skipped now work.

No. This is the opposite of what we want. We don’t want to link to libclangInterpreter. As I mentioned in a previous comment, CppInterOp should be standalone.

@anutosh491
Collaborator Author

Hmm okay maybe I am missing something here. I've pasted a workaround to have it in cppinterop itself in my comment above (#519 (comment))

Something like that might be the only way to organically have these symbols in cppinterop's shared build and not export them forcefully.

@vgvassilev
Contributor

vgvassilev commented Mar 13, 2025

The tests are essentially programs built as use-cases for CppInterOp. They should not need any llvm libraries as all symbols should be provided by our library.

PS: the proposed workaround is not what we want either. I still do not understand why we are debating this PR.

@anutosh491
Collaborator Author

anutosh491 commented Mar 14, 2025

Hmm, okay. I guess something is still unclear to me, but that leaves us with no option but to force these symbols.

  1. Emscripten would simply strip anything apart from what is required by these source files:
    CppInterOp.cpp
    CXCppInterOp.cpp
    DynamicLibraryManager.cpp
    DynamicLibraryManagerSymbol.cpp
    Paths.cpp

So our question isn't really "why are these symbols not found by default in clangCppInterOp?"; that's obvious, and it is what should happen when you build with emscripten. The question changes to "how do we force the symbols?"

  2. @mcbarton, as you said, there is really no difference between the end goal of EMSCRIPTEN_KEEPALIVE and --export....

EMSCRIPTEN_KEEPALIVE also exports the function, as if it were on EXPORTED_FUNCTIONS.

Both force the symbols to stay alive rather than being stripped in the linking step. Now that we have to force the symbols, I think using --export.... as you did, or even --export-if-defined=...., makes more sense than maintaining a patched llvm.

The symbols are only required when you run the tests and not otherwise, aren't they? The best option would probably be to maintain the symbol list in cppinterop itself (not messing with llvm) and only do the linking when tests are enabled (you can see the build time for clangCppInterOp go up after the changes, so I don't think these symbols need to be taken care of if tests are off).

@vgvassilev
Contributor

I don't think these symbols need to be taken care of if tests are off

The tests show that these symbols are probably needed to support some of the exported api, right? That means if the tests are broken, client code using CppInterOp is likely broken too.

@anutosh491
Collaborator Author

anutosh491 commented Mar 14, 2025

The tests show that these symbols are probably needed to support some of the exported api, right?

Yeah, there's nothing wrong with exposing those symbols (obviously needed when tests are enabled; it might not be too obvious when tests are disabled).

That's because, when testing any repo that depends on clangCppInterOp, the symbols put to use only come from the source object files that are compiled while building clangCppInterOp. So the symbols already exposed end up being enough.

Hence I had no issues while running the following on my PR (because these are not dependent on symbols coming from the source files in the test dir)

target_link_libraries(test_xeus_cpp clangCppInterOp)

It's only when we build CppInterOpTests that we would need those symbols. Apart from that, if we test xeus-cpp, clad, cppyy, etc., these symbols shouldn't play a role.

But again, not a big concern; it can be done in both cases (I was just thinking about a faster build above, and whether we can skip those symbols when tests are disabled).

@vgvassilev
Contributor

As I said a week ago, I am not opposed to revisiting this once we get more experience with the current approach.

@mcbarton
Collaborator

@vgvassilev Should I revert the llvm patch solution that I added last night to my automated wasm tests PR (see here for the patch https://github.com/compiler-research/CppInterOp/blob/ffb37b4e038d43eeaa87aa11fd9de32611532dc1/patches/llvm/emscripten-clang19-4-emscripten-keepalive.patch )? Although using CPPINTEROP_API and -Wl,--export are similar in the sense that they make the symbols available from the shared library, the first approach allows the emscripten driver to process the symbols, while the second (the -Wl,--export method) doesn't. Regardless of what is decided, I suggest we move ahead with my automated wasm tests PR as is, and make subsequent improvements in a separate PR.

@vgvassilev
Contributor

We must add CPPINTEROP_API to the interfaces that miss it.

@mcbarton
Collaborator

We must add CPPINTEROP_API to the interfaces that miss it.

These are the symbols in llvm and clang, so they cannot have CPPINTEROP_API applied (or at least I cannot see how I can use it). Should I revert my llvm patch, which uses EMSCRIPTEN_KEEPALIVE, and go back to the -Wl,--export method (taking into consideration the drawback of the latter in this comment #519 (comment))?

@anutosh491
Collaborator Author

I would do that and not mess with the llvm build, as I stated above.
