Skip to content

[RemoveDIs] Print non-intrinsic debug info in textual IR output #79281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 26, 2024

Conversation

SLTozer
Copy link
Contributor

@SLTozer SLTozer commented Jan 24, 2024

This patch adds support for printing the proposed non-instruction debug info ("RemoveDIs") out to textual IR. This patch does not add any bitcode support, parsing support, or documentation (the latter two will be added in 2 further patches to be landed after this one).

Printing of the new format is controlled by a flag added in this patch, --write-experimental-debuginfo, which defaults to false. The new format will be printed iff this flag is true, so whether we use the IR format is completely independent of whether we use non-instruction debug info during LLVM passes (which is controlled by the --try-experimental-debuginfo-iterators flag). This means that for now, no existing tests need to be updated - the current intention is to add new tests for the new format to ensure basic coverage, then when we're ready to turn the flag on by default we update all the existing tests at once (which can be done with sed for 95% of cases) rather than trying to support testing both formats at once.

The design of this new IR format was proposed previously on Discourse[0], and the responses (though small in number) were positive, so we've proceeded with the design more-or-less as proposed, with the exception being a change from #dbgrecord value to #dbg_value, an improvement suggested by others within the thead for improved concision and less clash with existing keywords. For an example of the new format in this review:

Old: call void @llvm.dbg.value(metadata i32 %a, metadata !12, metadata !DIExpression()), !dbg !14
New:   #dbg_value { i32 %a, !12, !DIExpression(), !14 }

Old: call void @llvm.dbg.declare(metadata ptr %b, metadata !13, metadata !DIExpression()), !dbg !16
New:   #dbg_declare { ptr %b, !13, !DIExpression(), !16 }

Old: call void @llvm.dbg.assign(metadata i32 %add, metadata !13, metadata !DIExpression(), metadata !15, metadata ptr %b, metadata !DIExpression()), !dbg !18
New:   #dbg_assign { i32 %add, !13, !DIExpression(), !15, ptr %b, !DIExpression(), !18 }

The key changes here are the indentation, curly-brace-enclosed arguments and #-prefixed names that clearly distinguish these from instructions, and also the removal of metadata-prefixes for the arguments, and the movement of the !dbg metadata attachment to becoming a normal argument (that also does not require the !dbg prefix).

[0] https://discourse.llvm.org/t/rfc-debuginfo-proposed-changes-to-the-textual-ir-representation-for-debug-values/73491

@llvmbot
Copy link
Member

llvmbot commented Jan 24, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-llvm-ir

Author: Stephen Tozer (SLTozer)

Changes

This patch adds support for printing the proposed non-instruction debug info ("RemoveDIs") out to textual IR. This patch does not add any bitcode support, parsing support, or documentation (the latter two will be added in 2 further patches to be landed after this one).

Printing of the new format is controlled by a flag added in this patch, --write-experimental-debuginfo, which defaults to false. The new format will be printed iff this flag is passed, so whether we use the IR format is completely independent of whether we use non-instruction debug info during LLVM passes (which is controlled by the --try-experimental-debuginfo-iterators flag). This means that for now, no existing tests need to be updated - the current intention is to add new tests for the new format to ensure basic coverage, then when we're ready to turn the flag on by default we update all the existing tests at once (which can be done a sed for 95% of cases) rather than trying to support testing both formats at once.

The design of this new IR format was proposed previously on Discourse[0], and the responses (though small in number) were positive, so we've proceeded with the design more-or-less as proposed, with the exception being a change from #dbgrecord value to #dbg_value, an improvement suggested by others within the thead for improved concision and less clash with existing keywords. For an example of the new format in this review:

Old: call void @<!-- -->llvm.dbg.value(metadata i32 %a, metadata !12, metadata !DIExpression()), !dbg !14
New:   #dbg_value { i32 %a, !12, !DIExpression(), !14 }

Old: call void @<!-- -->llvm.dbg.declare(metadata ptr %b, metadata !13, metadata !DIExpression()), !dbg !16
New:   #dbg_declare { ptr %b, !13, !DIExpression(), !16 }

Old: call void @<!-- -->llvm.dbg.assign(metadata i32 %add, metadata !13, metadata !DIExpression(), metadata !15, metadata ptr %b, metadata !DIExpression()), !dbg !18
New:   #dbg_assign { i32 %add, !13, !DIExpression(), !15, ptr %b, !DIExpression(), !18 }

The key changes here are the indentation, curly-brace-enclosed arguments and #-prefixed names that clearly distinguish these from instructions, and also the removal of metadata-prefixes for the arguments, and the movement of the !dbg metadata attachment to becoming a normal argument (that also does not require the !dbg prefix).

[0] https://discourse.llvm.org/t/rfc-debuginfo-proposed-changes-to-the-textual-ir-representation-for-debug-values/73491


Patch is 20.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79281.diff

7 Files Affected:

  • (modified) llvm/include/llvm/IR/Module.h (+4)
  • (modified) llvm/lib/IR/AsmWriter.cpp (+30-40)
  • (modified) llvm/lib/IR/BasicBlock.cpp (-4)
  • (modified) llvm/lib/IR/IRPrintingPasses.cpp (+32-14)
  • (modified) llvm/lib/IR/Module.cpp (+18)
  • (modified) llvm/lib/IRPrinter/IRPrintingPasses.cpp (+35-14)
  • (added) llvm/test/DebugInfo/print-non-instruction-debug-info.ll (+80)
diff --git a/llvm/include/llvm/IR/Module.h b/llvm/include/llvm/IR/Module.h
index 68a89dc45c2834a..1ed456e6f749929 100644
--- a/llvm/include/llvm/IR/Module.h
+++ b/llvm/include/llvm/IR/Module.h
@@ -218,11 +218,15 @@ class LLVM_EXTERNAL_VISIBILITY Module {
   /// \ref BasicBlock.
   bool IsNewDbgInfoFormat;
 
+  void removeDebugIntrinsicDeclarations();
+
   /// \see BasicBlock::convertToNewDbgValues.
   void convertToNewDbgValues() {
     for (auto &F : *this) {
       F.convertToNewDbgValues();
     }
+    // Remove the declarations of the old debug intrinsics, if any exist.
+    removeDebugIntrinsicDeclarations();
     IsNewDbgInfoFormat = true;
   }
 
diff --git a/llvm/lib/IR/AsmWriter.cpp b/llvm/lib/IR/AsmWriter.cpp
index 3c15784a0ed5eba..d56b08336eb5fa7 100644
--- a/llvm/lib/IR/AsmWriter.cpp
+++ b/llvm/lib/IR/AsmWriter.cpp
@@ -860,7 +860,7 @@ class SlotTracker : public AbstractSlotTrackerStorage {
   /// Add all of the metadata from an instruction.
   void processInstructionMetadata(const Instruction &I);
 
-  /// Add all of the metadata from an instruction.
+  /// Add all of the metadata from a DPValue.
   void processDPValueMetadata(const DPValue &DPV);
 };
 
@@ -1138,6 +1138,9 @@ void SlotTracker::processFunctionMetadata(const Function &F) {
 }
 
 void SlotTracker::processDPValueMetadata(const DPValue &DPV) {
+  // Process metadata used by DPValues; we only specifically care about the
+  // DILocalVariable, DILocation, and DIAssignID fields, as the Value and
+  // Expression fields should only be printed inline and so do not use a slot.
   CreateMetadataSlot(DPV.getVariable());
   CreateMetadataSlot(DPV.getDebugLoc());
   if (DPV.isDbgAssign()) {
@@ -2654,6 +2657,7 @@ class AssemblyWriter {
   void writeAttributeSet(const AttributeSet &AttrSet, bool InAttrGroup = false);
   void writeAllAttributeGroups();
 
+
   void printTypeIdentities();
   void printGlobal(const GlobalVariable *GV);
   void printAlias(const GlobalAlias *GA);
@@ -2662,10 +2666,11 @@ class AssemblyWriter {
   void printFunction(const Function *F);
   void printArgument(const Argument *FA, AttributeSet Attrs);
   void printBasicBlock(const BasicBlock *BB);
+  void printDPValueLine(const DPValue &DPV);
   void printInstructionLine(const Instruction &I);
   void printInstruction(const Instruction &I);
   void printDPMarker(const DPMarker &DPI);
-  void printDPValue(const DPValue &DPI);
+  void printDPValue(const DPValue &DPV);
 
   void printUseListOrder(const Value *V, const std::vector<unsigned> &Shuffle);
   void printUseLists(const Function *F);
@@ -3848,9 +3853,6 @@ void AssemblyWriter::printTypeIdentities() {
 
 /// printFunction - Print all aspects of a function.
 void AssemblyWriter::printFunction(const Function *F) {
-  bool ConvertBack = F->IsNewDbgInfoFormat;
-  if (ConvertBack)
-    const_cast<Function *>(F)->convertFromNewDbgValues();
   if (AnnotationWriter) AnnotationWriter->emitFunctionAnnot(F, Out);
 
   if (F->isMaterializable())
@@ -3993,8 +3995,6 @@ void AssemblyWriter::printFunction(const Function *F) {
     Out << "}\n";
   }
 
-  if (ConvertBack)
-    const_cast<Function *>(F)->convertToNewDbgValues();
   Machine.purgeFunction();
 }
 
@@ -4061,6 +4061,8 @@ void AssemblyWriter::printBasicBlock(const BasicBlock *BB) {
 
   // Output all of the instructions in the basic block...
   for (const Instruction &I : *BB) {
+    for (const DPValue &DPV : I.getDbgValueRange())
+      printDPValueLine(DPV);
     printInstructionLine(I);
   }
 
@@ -4091,12 +4093,6 @@ void AssemblyWriter::printInfoComment(const Value &V) {
 
   if (AnnotationWriter) {
     AnnotationWriter->printInfoComment(V, Out);
-  } else if (const Instruction *I = dyn_cast<Instruction>(&V)) {
-    if (I->DbgMarker) {
-      // In the new, experimental DPValue representation of debug-info, print
-      // out which instructions have DPMarkers and where they are.
-      Out << "; dbgmarker @ " << I->DbgMarker;
-    }
   }
 }
 
@@ -4571,12 +4567,10 @@ void AssemblyWriter::printDPMarker(const DPMarker &Marker) {
   return;
 }
 
-void AssemblyWriter::printDPValue(const DPValue &Value) {
-  // There's no formal representation of a DPValue -- print purely as a
-  // debugging aid.
-  Out << "  DPValue ";
-
-  switch (Value.getType()) {
+void AssemblyWriter::printDPValue(const DPValue &DPV) {
+  auto WriterCtx = getContext();
+  Out << "#dbg_";
+  switch (DPV.getType()) {
   case DPValue::LocationType::Value:
     Out << "value";
     break;
@@ -4588,28 +4582,34 @@ void AssemblyWriter::printDPValue(const DPValue &Value) {
     break;
   default:
     llvm_unreachable("Tried to print a DPValue with an invalid LocationType!");
-  }
+  };
   Out << " { ";
-  auto WriterCtx = getContext();
-  WriteAsOperandInternal(Out, Value.getRawLocation(), WriterCtx, true);
+  WriteAsOperandInternal(Out, DPV.getRawLocation(), WriterCtx, true);
   Out << ", ";
-  WriteAsOperandInternal(Out, Value.getVariable(), WriterCtx, true);
+  WriteAsOperandInternal(Out, DPV.getVariable(), WriterCtx, true);
   Out << ", ";
-  WriteAsOperandInternal(Out, Value.getExpression(), WriterCtx, true);
+  WriteAsOperandInternal(Out, DPV.getExpression(), WriterCtx, true);
   Out << ", ";
-  if (Value.isDbgAssign()) {
-    WriteAsOperandInternal(Out, Value.getAssignID(), WriterCtx, true);
+  if (DPV.isDbgAssign()) {
+    WriteAsOperandInternal(Out, DPV.getAssignID(), WriterCtx, true);
     Out << ", ";
-    WriteAsOperandInternal(Out, Value.getRawAddress(), WriterCtx, true);
+    WriteAsOperandInternal(Out, DPV.getRawAddress(), WriterCtx, true);
     Out << ", ";
-    WriteAsOperandInternal(Out, Value.getAddressExpression(), WriterCtx, true);
+    WriteAsOperandInternal(Out, DPV.getAddressExpression(), WriterCtx, true);
     Out << ", ";
   }
-  WriteAsOperandInternal(Out, Value.getDebugLoc().get(), WriterCtx, true);
-  Out << " marker @" << Value.getMarker();
+  WriteAsOperandInternal(Out, DPV.getDebugLoc().getAsMDNode(), WriterCtx, true);
   Out << " }";
 }
 
+/// printDPValueLine - Print a DPValue with indentation and a newline character.
+void AssemblyWriter::printDPValueLine(const DPValue &DPV) {
+  // Print lengthier indentation to bring out-of-line with instructions.
+  Out << "    ";
+  printDPValue(DPV);
+  Out << '\n';
+}
+
 void AssemblyWriter::printMetadataAttachments(
     const SmallVectorImpl<std::pair<unsigned, MDNode *>> &MDs,
     StringRef Separator) {
@@ -4755,19 +4755,11 @@ void BasicBlock::print(raw_ostream &ROS, AssemblyAnnotationWriter *AAW,
 
 void Module::print(raw_ostream &ROS, AssemblyAnnotationWriter *AAW,
                    bool ShouldPreserveUseListOrder, bool IsForDebug) const {
-  // RemoveDIs: always print with debug-info in intrinsic format.
-  bool ConvertAfter = IsNewDbgInfoFormat;
-  if (IsNewDbgInfoFormat)
-    const_cast<Module *>(this)->convertFromNewDbgValues();
-
   SlotTracker SlotTable(this);
   formatted_raw_ostream OS(ROS);
   AssemblyWriter W(OS, SlotTable, this, AAW, IsForDebug,
                    ShouldPreserveUseListOrder);
   W.printModule(this);
-
-  if (ConvertAfter)
-    const_cast<Module *>(this)->convertToNewDbgValues();
 }
 
 void NamedMDNode::print(raw_ostream &ROS, bool IsForDebug) const {
@@ -4875,8 +4867,6 @@ void DPMarker::print(raw_ostream &ROS, ModuleSlotTracker &MST,
 
 void DPValue::print(raw_ostream &ROS, ModuleSlotTracker &MST,
                     bool IsForDebug) const {
-  // There's no formal representation of a DPValue -- print purely as a
-  // debugging aid.
   formatted_raw_ostream OS(ROS);
   SlotTracker EmptySlotTable(static_cast<const Module *>(nullptr));
   SlotTracker &SlotTable =
diff --git a/llvm/lib/IR/BasicBlock.cpp b/llvm/lib/IR/BasicBlock.cpp
index dca528328384701..b2d40e5bb55af3e 100644
--- a/llvm/lib/IR/BasicBlock.cpp
+++ b/llvm/lib/IR/BasicBlock.cpp
@@ -61,10 +61,6 @@ DPMarker *BasicBlock::createMarker(InstListType::iterator It) {
 }
 
 void BasicBlock::convertToNewDbgValues() {
-  // Is the command line option set?
-  if (!UseNewDbgInfoFormat)
-    return;
-
   IsNewDbgInfoFormat = true;
 
   // Iterate over all instructions in the instruction list, collecting dbg.value
diff --git a/llvm/lib/IR/IRPrintingPasses.cpp b/llvm/lib/IR/IRPrintingPasses.cpp
index b19210e776ed5cb..dacec22a4e0c6e0 100644
--- a/llvm/lib/IR/IRPrintingPasses.cpp
+++ b/llvm/lib/IR/IRPrintingPasses.cpp
@@ -23,6 +23,8 @@
 
 using namespace llvm;
 
+extern cl::opt<bool> WriteNewDbgInfoFormat;
+
 namespace {
 
 class PrintModulePassWrapper : public ModulePass {
@@ -39,11 +41,15 @@ class PrintModulePassWrapper : public ModulePass {
         ShouldPreserveUseListOrder(ShouldPreserveUseListOrder) {}
 
   bool runOnModule(Module &M) override {
-    // RemoveDIs: there's no textual representation of the DPValue debug-info,
-    // convert to dbg.values before writing out.
-    bool IsNewDbgInfoFormat = M.IsNewDbgInfoFormat;
-    if (IsNewDbgInfoFormat)
-      M.convertFromNewDbgValues();
+    // RemoveDIs: Regardless of the format we've processed this module in, use
+    // `WriteNewDbgInfoFormat` to determine which format we use to write it.
+    bool ShouldConvert = M.IsNewDbgInfoFormat != WriteNewDbgInfoFormat;
+    if (ShouldConvert) {
+      if (WriteNewDbgInfoFormat)
+        M.convertToNewDbgValues();
+      else
+        M.convertFromNewDbgValues();
+    }
 
     if (llvm::isFunctionInPrintList("*")) {
       if (!Banner.empty())
@@ -62,8 +68,12 @@ class PrintModulePassWrapper : public ModulePass {
       }
     }
 
-    if (IsNewDbgInfoFormat)
-      M.convertToNewDbgValues();
+    if (ShouldConvert) {
+      if (WriteNewDbgInfoFormat)
+        M.convertFromNewDbgValues();
+      else
+        M.convertToNewDbgValues();
+    }
 
     return false;
   }
@@ -87,11 +97,15 @@ class PrintFunctionPassWrapper : public FunctionPass {
 
   // This pass just prints a banner followed by the function as it's processed.
   bool runOnFunction(Function &F) override {
-    // RemoveDIs: there's no textual representation of the DPValue debug-info,
-    // convert to dbg.values before writing out.
-    bool IsNewDbgInfoFormat = F.IsNewDbgInfoFormat;
-    if (IsNewDbgInfoFormat)
-      F.convertFromNewDbgValues();
+    // RemoveDIs: Regardless of the format we've processed this function in, use
+    // `WriteNewDbgInfoFormat` to determine which format we use to write it.
+    bool ShouldConvert = F.IsNewDbgInfoFormat != WriteNewDbgInfoFormat;
+    if (ShouldConvert) {
+      if (WriteNewDbgInfoFormat)
+        F.convertToNewDbgValues();
+      else
+        F.convertFromNewDbgValues();
+    }
 
     if (isFunctionInPrintList(F.getName())) {
       if (forcePrintModuleIR())
@@ -101,8 +115,12 @@ class PrintFunctionPassWrapper : public FunctionPass {
         OS << Banner << '\n' << static_cast<Value &>(F);
     }
 
-    if (IsNewDbgInfoFormat)
-      F.convertToNewDbgValues();
+    if (ShouldConvert) {
+      if (WriteNewDbgInfoFormat)
+        F.convertFromNewDbgValues();
+      else
+        F.convertToNewDbgValues();
+    }
 
     return false;
   }
diff --git a/llvm/lib/IR/Module.cpp b/llvm/lib/IR/Module.cpp
index eeb90a6cb3c465a..e883eead72749d5 100644
--- a/llvm/lib/IR/Module.cpp
+++ b/llvm/lib/IR/Module.cpp
@@ -85,6 +85,24 @@ Module::~Module() {
   IFuncList.clear();
 }
 
+void Module::removeDebugIntrinsicDeclarations() {
+  auto *DeclareIntrinsicFn =
+      Intrinsic::getDeclaration(this, Intrinsic::dbg_declare);
+  assert(DeclareIntrinsicFn->hasZeroLiveUses() &&
+         "Debug declare intrinsic should have had uses removed.");
+  DeclareIntrinsicFn->eraseFromParent();
+  auto *ValueIntrinsicFn =
+      Intrinsic::getDeclaration(this, Intrinsic::dbg_value);
+  assert(ValueIntrinsicFn->hasZeroLiveUses() &&
+         "Debug value intrinsic should have had uses removed.");
+  ValueIntrinsicFn->eraseFromParent();
+  auto *AssignIntrinsicFn =
+      Intrinsic::getDeclaration(this, Intrinsic::dbg_assign);
+  assert(AssignIntrinsicFn->hasZeroLiveUses() &&
+         "Debug assign intrinsic should have had uses removed.");
+  AssignIntrinsicFn->eraseFromParent();
+}
+
 std::unique_ptr<RandomNumberGenerator>
 Module::createRNG(const StringRef Name) const {
   SmallString<32> Salt(Name);
diff --git a/llvm/lib/IRPrinter/IRPrintingPasses.cpp b/llvm/lib/IRPrinter/IRPrintingPasses.cpp
index 52b242b4dcd52a5..d770e0be3e85c7a 100644
--- a/llvm/lib/IRPrinter/IRPrintingPasses.cpp
+++ b/llvm/lib/IRPrinter/IRPrintingPasses.cpp
@@ -22,6 +22,11 @@
 
 using namespace llvm;
 
+cl::opt<bool> WriteNewDbgInfoFormat(
+    "write-experimental-debuginfo",
+    cl::desc("Write debug info in the new non-intrinsic format"),
+    cl::init(false));
+
 PrintModulePass::PrintModulePass() : OS(dbgs()) {}
 PrintModulePass::PrintModulePass(raw_ostream &OS, const std::string &Banner,
                                  bool ShouldPreserveUseListOrder,
@@ -31,11 +36,15 @@ PrintModulePass::PrintModulePass(raw_ostream &OS, const std::string &Banner,
       EmitSummaryIndex(EmitSummaryIndex) {}
 
 PreservedAnalyses PrintModulePass::run(Module &M, ModuleAnalysisManager &AM) {
-  // RemoveDIs: there's no textual representation of the DPValue debug-info,
-  // convert to dbg.values before writing out.
-  bool ShouldConvert = M.IsNewDbgInfoFormat;
-  if (ShouldConvert)
-    M.convertFromNewDbgValues();
+  // RemoveDIs: Regardless of the format we've processed this module in, use
+  // `WriteNewDbgInfoFormat` to determine which format we use to write it.
+  bool ShouldConvert = M.IsNewDbgInfoFormat != WriteNewDbgInfoFormat;
+  if (ShouldConvert) {
+    if (WriteNewDbgInfoFormat)
+      M.convertToNewDbgValues();
+    else
+      M.convertFromNewDbgValues();
+  }
 
   if (llvm::isFunctionInPrintList("*")) {
     if (!Banner.empty())
@@ -63,8 +72,12 @@ PreservedAnalyses PrintModulePass::run(Module &M, ModuleAnalysisManager &AM) {
     Index->print(OS);
   }
 
-  if (ShouldConvert)
-    M.convertToNewDbgValues();
+  if (ShouldConvert) {
+    if (WriteNewDbgInfoFormat)
+      M.convertFromNewDbgValues();
+    else
+      M.convertToNewDbgValues();
+  }
 
   return PreservedAnalyses::all();
 }
@@ -75,11 +88,15 @@ PrintFunctionPass::PrintFunctionPass(raw_ostream &OS, const std::string &Banner)
 
 PreservedAnalyses PrintFunctionPass::run(Function &F,
                                          FunctionAnalysisManager &) {
-  // RemoveDIs: there's no textual representation of the DPValue debug-info,
-  // convert to dbg.values before writing out.
-  bool ShouldConvert = F.IsNewDbgInfoFormat;
-  if (ShouldConvert)
-    F.convertFromNewDbgValues();
+  // RemoveDIs: Regardless of the format we've processed this function in, use
+  // `WriteNewDbgInfoFormat` to determine which format we use to write it.
+  bool ShouldConvert = F.IsNewDbgInfoFormat != WriteNewDbgInfoFormat;
+  if (ShouldConvert) {
+    if (WriteNewDbgInfoFormat)
+      F.convertToNewDbgValues();
+    else
+      F.convertFromNewDbgValues();
+  }
 
   if (isFunctionInPrintList(F.getName())) {
     if (forcePrintModuleIR())
@@ -88,8 +105,12 @@ PreservedAnalyses PrintFunctionPass::run(Function &F,
       OS << Banner << '\n' << static_cast<Value &>(F);
   }
 
-  if (ShouldConvert)
-    F.convertToNewDbgValues();
+  if (ShouldConvert) {
+    if (WriteNewDbgInfoFormat)
+      F.convertFromNewDbgValues();
+    else
+      F.convertToNewDbgValues();
+  }
 
   return PreservedAnalyses::all();
 }
diff --git a/llvm/test/DebugInfo/print-non-instruction-debug-info.ll b/llvm/test/DebugInfo/print-non-instruction-debug-info.ll
new file mode 100644
index 000000000000000..fcb7d22fee6fadc
--- /dev/null
+++ b/llvm/test/DebugInfo/print-non-instruction-debug-info.ll
@@ -0,0 +1,80 @@
+;; Test that we can write in the new debug info format.
+; RUN: opt --passes=verify -S --write-experimental-debuginfo=false < %s \
+; RUN:   | FileCheck %s --check-prefixes=CHECK,OLDDBG --implicit-check-not=llvm.dbg
+; RUN: opt --passes=verify -S --write-experimental-debuginfo=true < %s \
+; RUN:   | FileCheck %s --check-prefixes=CHECK,NEWDBG --implicit-check-not=llvm.dbg
+
+;; Test also that the new flag is independent of the flag that enables use of
+;; these non-instruction debug info during LLVM passes.
+; RUN: opt --passes=verify -S --try-experimental-debuginfo-iterators --write-experimental-debuginfo=false < %s \
+; RUN:   | FileCheck %s --check-prefixes=CHECK,OLDDBG --implicit-check-not=llvm.dbg
+; RUN: opt --passes=verify -S --try-experimental-debuginfo-iterators --write-experimental-debuginfo=true < %s \
+; RUN:   | FileCheck %s --check-prefixes=CHECK,NEWDBG --implicit-check-not=llvm.dbg
+
+; CHECK: @f(i32 %[[VAL_A:[0-9a-zA-Z]+]])
+; CHECK-NEXT: entry:
+; OLDDBG-NEXT: call void @llvm.dbg.value(metadata i32 %[[VAL_A]], metadata ![[VAR_A:[0-9]+]], metadata !DIExpression()), !dbg ![[LOC_1:[0-9]+]]
+; NEWDBG-NEXT: {{^}}    #dbg_value { i32 %[[VAL_A]], ![[VAR_A:[0-9]+]], !DIExpression(), ![[LOC_1:[0-9]+]] }
+; CHECK-NEXT: {{^}}  %[[VAL_B:[0-9a-zA-Z]+]] = alloca
+; OLDDBG-NEXT: call void @llvm.dbg.declare(metadata ptr %[[VAL_B]], metadata ![[VAR_B:[0-9]+]], metadata !DIExpression()), !dbg ![[LOC_2:[0-9]+]]
+; NEWDBG-NEXT: {{^}}    #dbg_declare { ptr %[[VAL_B]], ![[VAR_B:[0-9]+]], !DIExpression(), ![[LOC_2:[0-9]+]] }
+; CHECK-NEXT: {{^}}  %[[VAL_ADD:[0-9a-zA-Z]+]] = add i32 %[[VAL_A]], 5
+; OLDDBG-NEXT: call void @llvm.dbg.value(metadata !DIArgList(i32 %[[VAL_A]], i32 %[[VAL_ADD]]), metadata ![[VAR_A]], metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus)), !dbg ![[LOC_3:[0-9]+]]
+; NEWDBG-NEXT: {{^}}    #dbg_value { !DIArgList(i32 %[[VAL_A]], i32 %[[VAL_ADD]]), ![[VAR_A]], !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus), ![[LOC_3:[0-9]+]] }
+; CHECK-NEXT: {{^}}  store i32 %[[VAL_ADD]]{{.+}}, !DIAssignID ![[ASSIGNID:[0-9]+]]
+; OLDDBG-NEXT: call void @llvm.dbg.assign(metadata i32 %[[VAL_ADD]], metadata ![[VAR_B]], metadata !DIExpression(), metadata ![[ASSIGNID]], metadata ptr %[[VAL_B]], metadata !DIExpression()), !dbg ![[LOC_4:[0-9]+]]
+; NEWDBG-NEXT: {{^}}    #dbg_assign { i32 %[[VAL_ADD]], ![[VAR_B]], !DIExpression(), ![[ASSIGNID]], ptr %[[VAL_B]], !DIExpression(), ![[LOC_4:[0-9]+]] }
+; CHECK-NEXT: {{^}}  ret i32
+
+; OLDDBG-DAG: declare void @llvm.dbg.value
+; OLDDBG-DAG: declare void @llvm.dbg.declare
+; OLDDBG-DAG: declare void @llvm.dbg.assign
+
+; CHECK-DAG: llvm.dbg.cu
+; CHECK-DAG: ![[VAR_A]] = !DILocalVariable(name: "a"
+; CHECK-DAG: ![[VAR_B]] = !DILocalVariable(name: "b"
+; CHECK-DAG: ![[LOC_1]] = !DILocation(line: 3, column: 15
+; CHECK-DAG: ![[LOC_2]] = !DILocation(line: 3, column: 20
+; CHECK-DAG: ![[LOC_3]] = !DILocation(line: 3, column: 25
+; CHECK-DAG: ![[LOC_4]] = !DILocation(line: 3, column: 30
+
+define dso_local i32 @f(i32 %a) !dbg !7 {
+entry:
+  call void @llvm.dbg.value(metadata i32 %a, metadata !20, metadata !DIExpression()), !dbg !30
+  %b = alloca i32, !dbg !30, !DIAssignID !40
+  call void @llvm.dbg.declare(metadata ptr %b, metadata !21, metadata !DIExpression()), !dbg !31
+  %add = add i32 %a, 5, !dbg !31
+  call void @llvm.dbg.value(metadata !DIArgList(i32 %a, i32 %add), metadata !20, metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus)), !dbg !32
+  store i32 %add, ptr %b, !dbg !32, !DIAssignID !40
+  call void @llvm.dbg.assign(metadata i32 %add, metadata !21, metadata !DIExpression(), metadata !40, metadata ptr %b, metadata !DIExpression()), !dbg !33
+  ret i32 %add, !dbg !33
+
+}
+
+declare void @llvm.dbg.value(metadata, metadata, metadata)
+declare void @llvm.dbg.declare(metadata, metadata, metadata)
+declare void @llvm.dbg.assign(metadata, metadata, metadata, metadata, metadata, metadata)
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!3, !4, !5}
+!llvm.ident = !{!6}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 18.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, nameTableKind: None)
+!1 = !DIFile(filename: "print.c", directory: "/tmp")
+!2 = !{}
+!3 = !{i32 2, !"Dwarf Version", i32 5}
+!4 = !{i32 2, !"Debug Info Version", i32 3}
+!5 = !{i32 1, !"wchar_size", i32 4}
+!6 = !{!"clang version 18.0.0"}
+!7 = distinct !DISubprogram(name: "f", scope: !1, file: !1, line: 3, type: !8, isLocal: false, isDefinition: true, scopeLine: 3, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !13)
+!8 = !DISubroutineType(types: !9)
+!9 = !{!12, !12}
+!12 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!13 = !{!20, !21}
+!20 = !DILocalVariable(name: "a", arg: 1, scope: !7, file: !1, line:...
[truncated]

Copy link

github-actions bot commented Jan 24, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@nikic
Copy link
Contributor

nikic commented Jan 24, 2024

Why is the syntax #dbg_value {} rather than #dbg_value()? The arguments are positional, right?

@phyBrackets
Copy link
Member

Why is the syntax #dbg_value {} rather than #dbg_value()? The arguments are positional, right?

Seems to be mentioned that using curly braces {} instead of parentheses () makes the debug information stand out more clearly from regular instruction syntax. And I think even the argument are positional, curly braces can still effectively encapsulate them 🤔

@SLTozer
Copy link
Contributor Author

SLTozer commented Jan 24, 2024

Why is the syntax #dbg_value {} rather than #dbg_value()? The arguments are positional, right?

As @phyBrackets mentioned the goal is purely for visual distinction - it would be possible to use parentheses instead. We do use curly braces in contexts where we expect ordered/positional arguments, as with structs and some non-specialized MDNodes; ultimately though it's a minor detail and can easily be changed if there's a strong preference for either option.

@SLTozer SLTozer requested a review from rnk January 24, 2024 16:00
Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks pretty good to me with a few minor things.

(because I haven't been keeping up with this work, what does the P in DP stand for?)

Comment on lines 91 to 99
// RemoveDIs: Regardless of the format we've processed this function in, use
// `WriteNewDbgInfoFormat` to determine which format we use to write it.
bool ShouldConvert = F.IsNewDbgInfoFormat != WriteNewDbgInfoFormat;
if (ShouldConvert) {
if (WriteNewDbgInfoFormat)
F.convertToNewDbgValues();
else
F.convertFromNewDbgValues();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This idiom (& the inverse conversion at the end of the scope) appears in a few places - could it be generalized/refactored into a scoped tool or something to share between the 3-4 places that use it & make them a bit more terse?

@SLTozer
Copy link
Contributor Author

SLTozer commented Jan 24, 2024

(because I haven't been keeping up with this work, what does the P in DP stand for?)

"Debug Program" - it's the term we used earlier on when our model for RemoveDIs was essentially a "shadow function" that duplicated the original function with placeholders instead of instructions ("debug program markers"), where all the debug info would be inserted (as "debug program values") instead of in the real function. We've changed our approach now, so we're also renaming the classes as part of a more general restructure to support dbg.labels (see: #78252) to have DbgRecord as the base class and DbgValueRecord/DbgDeclareRecord/DbgAssignRecord descending from that.

@dwblaikie
Copy link
Collaborator

(because I haven't been keeping up with this work, what does the P in DP stand for?)

"Debug Program" - it's the term we used earlier on when our model for RemoveDIs was essentially a "shadow function" that duplicated the original function with placeholders instead of instructions ("debug program markers"), where all the debug info would be inserted (as "debug program values") instead of in the real function. We've changed our approach now, so we're also renaming the classes as part of a more general restructure to support dbg.labels (see: #78252) to have DbgRecord as the base class and DbgValueRecord/DbgDeclareRecord/DbgAssignRecord descending from that.

Cool, thanks for the context!

Copy link
Member

@phyBrackets phyBrackets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank You Stephen, LGTM, just got couple of feedback!

@@ -234,6 +238,13 @@ class LLVM_EXTERNAL_VISIBILITY Module {
IsNewDbgInfoFormat = false;
}

void setIsNewDbgInfoFormat(bool NewFlag) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewFlag could be rename to ISNewFormat

@nikic
Copy link
Contributor

nikic commented Jan 29, 2024

Why is the syntax #dbg_value {} rather than #dbg_value()? The arguments are positional, right?

As @phyBrackets mentioned the goal is purely for visual distinction - it would be possible to use parentheses instead. We do use curly braces in contexts where we expect ordered/positional arguments, as with structs and some non-specialized MDNodes; ultimately though it's a minor detail and can easily be changed if there's a strong preference for either option.

If {} is used, I would generally expect something list-like. This may be an ordered (metadata, struct) or an unordered (attributes) list, but I would expect that I can specify an arbitrary number of arbitrary elements. As far as I understand, for #dbg_value and friends, the schema is fixed.

@SLTozer
Copy link
Contributor Author

SLTozer commented Jan 29, 2024

If {} is used, I would generally expect something list-like. This may be an ordered (metadata, struct) or an unordered (attributes) list, but I would expect that I can specify an arbitrary number of arbitrary elements.

That's reasonable, I'm fine with either format - the thought process is that debug records should be understandable and visually distinct from instructions; if using braces hurts understanding (by creating a false expectation about the nature of the parameter list) then parentheses will be fine, since the records are still distinct from instructions (firstly by the # prefix and secondly by indentation) - though if anyone else has thoughts about what the brackets should or shouldn't be, the discussion is still open of course.
Edit: It'd also be entirely reasonable to not use parentheses at all, as we do for non-call instructions.

@dwblaikie
Copy link
Collaborator

Yeah, the argument in favor of parens works for me - and we have attributes with parens for parameters already, and they aren't IR instructions - so I think there's rough precedent here.

@SLTozer
Copy link
Contributor Author

SLTozer commented Feb 5, 2024

I've updated the implementation to use parentheses - change is:

Old:     #dbg_value { i32 %a, !12, !DIExpression(), !14 }
New:     #dbg_value(i32 %a, !12, !DIExpression(), !14)

@SLTozer
Copy link
Contributor Author

SLTozer commented Feb 16, 2024

The review comments so far have been addressed, either with a fix or a comment; pinging to see if anyone has any more comments or requested changes, or else whether this is in a good place to be landed?

Comment on lines 83 to 84
bool UsedNewDbgInfoFormat = F.IsNewDbgInfoFormat;
F.setIsNewDbgInfoFormat(WriteNewDbgInfoFormat);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make a scoped device for this idiom - for the set/reset rather than having them more loosely coupled like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - I chose to keep the comments where they are and pass WriteNewDbgInfoFormat into the constructor rather than folding the comments and variable into the class, just because these are temporary insertions (until we turn off printing the old format) and the behaviour is non-obvious without the comment+variable imo - if you disagree then I can adjust it.

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM in principle with a few nits and some test questions. It looks like it'll only be possible to get this format out of opt and other tools that print through the pass-manager, so things like llvm-dis won't immediately be able to print this format. IMO this is fine as we're looking to incrementally expand what parts of LLVM support this format.

On the topic of the format, I think the one area where we're limiting ourselves a little is if we add or remove operands to "records" (ex-intrinsics) in the future. Previously with the intrinsics being variadic, it's been trivial to mess with them, which won't be true going forwards. However IMO some future parsing complexity is a perfectly good price to pay for having non-intrinsic debug-info.

The new format seems fine to me -- after all it's the same data, we're just explicitly making it "not-an-instruction". (We should probably find some syntax highlighters and update them)

@@ -78,6 +78,18 @@ std::string doSystemDiff(StringRef Before, StringRef After,
StringRef OldLineFormat, StringRef NewLineFormat,
StringRef UnchangedLineFormat);

template <typename T> class ScopedDbgInfoFormatSetter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wants a docu-comment indicating what it's doing -- no need to be detailed as it's unlikely to be a long term facility.

@@ -218,11 +218,15 @@ class LLVM_EXTERNAL_VISIBILITY Module {
/// \ref BasicBlock.
bool IsNewDbgInfoFormat;

void removeDebugIntrinsicDeclarations();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Brief) Docu-comment wanted

@SLTozer
Copy link
Contributor Author

SLTozer commented Feb 19, 2024

It looks like it'll only be possible to get this format out of opt and other tools that print through the pass-manager, so things like llvm-dis won't immediately be able to print this format.

Yeah, the alternative would be to move the conversion step into Module::print, but that would affect debug printing, which isn't desirable I think, and we can easily move functionality piecemeal into tools that want it.

I think the one area where we're limiting ourselves a little is if we add or remove operands to "records" (ex-intrinsics) in the future.

This is a limitation, but one where I decided it wasn't worth implementing any kind of variadic logic because these kind of signature changes happen less than once a year, and in return we can keep the parser logic simple.

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM; I'm not aware of any major objections to moving forwards with the new format, so IMO we're alright to land this and work towards test conversions. We can, after all, revise the format if problems crop up.

(Not sure what the convention is on landing / approving while @phyBrackets still has a request-changes bit on the PR?)

@jryans jryans requested a review from phyBrackets February 22, 2024 12:22
@SLTozer
Copy link
Contributor Author

SLTozer commented Feb 22, 2024

(Not sure what the convention is on landing / approving while @phyBrackets still has a request-changes bit on the PR?)

He's already manually accepted the current state, so no issue there. Patch needs rebasing before merging though!

@SLTozer
Copy link
Contributor Author

SLTozer commented Feb 26, 2024

Rebased with support for #dbg_label added - only notable changes are in AsmWriter.cpp, where printDPLabel now does the right thing, and the dbg_label intrinsic having been added to Module's "remove all debug intrinsics" function.

@OCHyams
Copy link
Contributor

OCHyams commented Feb 26, 2024

Label changes LGTM, ty

@SLTozer SLTozer merged commit 0b39825 into llvm:main Feb 26, 2024
@@ -23,6 +23,8 @@

using namespace llvm;

extern cl::opt<bool> WriteNewDbgInfoFormat;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes llvm/lib/IR depend on llvm/lib/IRPrinter (where this is defined), but the latter already depends on the former. That is, this introduces a cyclic dependency.

The definition of this symbol should probably live in this file and be extern in IRPrinter?

Copy link
Member

@MaskRay MaskRay Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the analysis. If you link the executable with ld.lld --warn-backrefs --no-fatal-warnings, you will get something like

error: backward reference detected: WriteNewDbgInfoFormat in llvm/_objs/Core/IRPrintingPasses.pic.o refers to llvm/_objs/IRPrinter/IRPrintingPasses.pic.o

(This is from a Bazel build).

SLTozer added a commit that referenced this pull request Feb 26, 2024
…ut (#79281)"

Reverted due to failures on buildbots, where a new cl flag was placed
in the wrong file, resulting in link errors.

https://lab.llvm.org/buildbot/#/builders/198/builds/8548

This reverts commit 0b39825.
SLTozer added a commit that referenced this pull request Feb 27, 2024
…put (#79281)"

Fixes the prior issue in which the symbol for a cl-arg was unavailable to
some binaries.

This reverts commit dc06d75.
SLTozer added a commit that referenced this pull request Feb 27, 2024
…l IR output (#79281)""

Reverted due to some test failures on some buildbots.

https://lab.llvm.org/buildbot/#/builders/67/builds/14669

This reverts commit aa43649.
SLTozer added a commit that referenced this pull request Feb 27, 2024
…put (#79281)"

This reapplication changes debug intrinsic declaration removal to only take
place when printing final IR, so that the processing format of the Module
does not affect the output.

This reverts commit d128448.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants