[HLSL][RootSignature] Implement Lexing of DescriptorTables #122981

inbelic · 2025-01-14T22:57:27Z

For the sake of scope, we will let the lexing of floating literals be deferred until needed for Static Samplers. Other than that this pr should allow us to simply define new enumerations/keywords in RootSignatureTokenKinds.def for when they are used in the parser. We could have defined all of these keywords here, but for the sake of correctness in review we will let them be split up.

Define RootSignatureLexer and provide a public LexToken method for external use
Define the file RootSignatureTokenKinds to define required tokens and allow for future custom keywords/enums
Implement the internal methods required to parse the different types of tokens (integers, flag enums, puncuators...)
Add test harness for unit testing and the respective unit tests for lexing the tokens

Resolves #126563

clang/lib/Parse/ParseHLSLRootSignature.cpp

clang/include/clang/Parse/ParseHLSLRootSignature.h

clang/lib/Parse/ParseHLSLRootSignature.cpp

llvmbot · 2025-01-14T23:13:25Z

@llvm/pr-subscribers-hlsl

@llvm/pr-subscribers-clang

Author: Finn Plummer (inbelic)

Changes

Define required tokens to parse a Descriptor Table in TokenKinds.def
Implements a Lexer to handle all of the defined tokens in ParseHLSLRootSignature

This is an initial part of #120472

Full diff: https://github.com/llvm/llvm-project/pull/122981.diff

7 Files Affected:

(added) clang/include/clang/Parse/HLSLRootSignatureTokenKinds.def (+121)
(added) clang/include/clang/Parse/ParseHLSLRootSignature.h (+96)
(modified) clang/lib/Parse/CMakeLists.txt (+1)
(added) clang/lib/Parse/ParseHLSLRootSignature.cpp (+152)
(modified) clang/unittests/CMakeLists.txt (+1)
(added) clang/unittests/Parse/CMakeLists.txt (+26)
(added) clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp (+130)

diff --git a/clang/include/clang/Parse/HLSLRootSignatureTokenKinds.def b/clang/include/clang/Parse/HLSLRootSignatureTokenKinds.def
new file mode 100644
index 00000000000000..5e47af8d4b1364
--- /dev/null
+++ b/clang/include/clang/Parse/HLSLRootSignatureTokenKinds.def
@@ -0,0 +1,121 @@
+//===--- HLSLRootSignature.def - Tokens and Enum Database -------*- C++ -*-===//
+
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the TokenKinds used in the Root Signature DSL. This
+// includes keywords, enums and a small subset of punctuators. Users of this
+// file must optionally #define the TOK, KEYWORD, ENUM or specific ENUM macros
+// to make use of this file.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef TOK
+#define TOK(X)
+#endif
+#ifndef PUNCTUATOR
+#define PUNCTUATOR(X,Y) TOK(pu_ ## X)
+#endif
+#ifndef KEYWORD
+#define KEYWORD(X) TOK(kw_ ## X)
+#endif
+#ifndef ENUM
+#define ENUM(NAME, LIT) TOK(en_ ## NAME)
+#endif
+
+// Defines the various types of enum
+#ifndef DESCRIPTOR_RANGE_OFFSET_ENUM
+#define DESCRIPTOR_RANGE_OFFSET_ENUM(NAME, LIT) ENUM(NAME, LIT)
+#endif
+#ifndef ROOT_DESCRIPTOR_FLAG_ENUM
+#define ROOT_DESCRIPTOR_FLAG_ENUM(NAME, LIT) ENUM(NAME, LIT)
+#endif
+// Note: ON denotes that the flag is unique from the above Root Descriptor
+//  Flags. This is required to avoid token kind enum conflicts.
+#ifndef DESCRIPTOR_RANGE_FLAG_ENUM_OFF
+#define DESCRIPTOR_RANGE_FLAG_ENUM_OFF(NAME, LIT)
+#endif
+#ifndef DESCRIPTOR_RANGE_FLAG_ENUM_ON
+#define DESCRIPTOR_RANGE_FLAG_ENUM_ON(NAME, LIT) ENUM(NAME, LIT)
+#endif
+#ifndef DESCRIPTOR_RANGE_FLAG_ENUM
+#define DESCRIPTOR_RANGE_FLAG_ENUM(NAME, LIT, ON) DESCRIPTOR_RANGE_FLAG_ENUM_##ON(NAME, LIT)
+#endif
+#ifndef SHADER_VISIBILITY_ENUM
+#define SHADER_VISIBILITY_ENUM(NAME, LIT) ENUM(NAME, LIT)
+#endif
+
+// General Tokens:
+TOK(invalid)
+TOK(int_literal)
+
+// Register Tokens:
+TOK(bReg)
+TOK(tReg)
+TOK(uReg)
+TOK(sReg)
+
+// Punctuators:
+PUNCTUATOR(l_paren, '(')
+PUNCTUATOR(r_paren, ')')
+PUNCTUATOR(comma,   ',')
+PUNCTUATOR(or,      '|')
+PUNCTUATOR(equal,   '=')
+
+// RootElement Keywords:
+KEYWORD(DescriptorTable)
+
+// DescriptorTable Keywords:
+KEYWORD(CBV)
+KEYWORD(SRV)
+KEYWORD(UAV)
+KEYWORD(Sampler)
+
+// General Parameter Keywords:
+KEYWORD(space)
+KEYWORD(visibility)
+KEYWORD(flags)
+
+// View Parameter Keywords:
+KEYWORD(numDescriptors)
+KEYWORD(offset)
+
+// Descriptor Range Offset Enum:
+DESCRIPTOR_RANGE_OFFSET_ENUM(DescriptorRangeOffsetAppend, "DESCRIPTOR_RANGE_OFFSET_APPEND")
+
+// Root Descriptor Flag Enums:
+ROOT_DESCRIPTOR_FLAG_ENUM(DataVolatile, "DATA_VOLATILE")
+ROOT_DESCRIPTOR_FLAG_ENUM(DataStaticWhileSetAtExecute, "DATA_STATIC_WHILE_SET_AT_EXECUTE")
+ROOT_DESCRIPTOR_FLAG_ENUM(DataStatic, "DATA_STATIC")
+
+// Descriptor Range Flag Enums:
+DESCRIPTOR_RANGE_FLAG_ENUM(DescriptorsVolatile, "DESCRIPTORS_VOLATILE", ON)
+DESCRIPTOR_RANGE_FLAG_ENUM(DataVolatile, "DATA_VOLATILE", OFF)
+DESCRIPTOR_RANGE_FLAG_ENUM(DataStaticWhileSetAtExecute, "DATA_STATIC_WHILE_SET_AT_EXECUTE", OFF)
+DESCRIPTOR_RANGE_FLAG_ENUM(DataStatic, "DATA_STATIC", OFF)
+DESCRIPTOR_RANGE_FLAG_ENUM(DescriptorsStaticKeepingBufferBoundsChecks, "DESCRIPTORS_STATIC_KEEPING_BUFFER_BOUNDS_CHECKS", ON)
+
+// Shader Visibiliy Enums:
+SHADER_VISIBILITY_ENUM(All, "SHADER_VISIBILITY_ALL")
+SHADER_VISIBILITY_ENUM(Vertex, "SHADER_VISIBILITY_VERTEX")
+SHADER_VISIBILITY_ENUM(Hull, "SHADER_VISIBILITY_HULL")
+SHADER_VISIBILITY_ENUM(Domain, "SHADER_VISIBILITY_DOMAIN")
+SHADER_VISIBILITY_ENUM(Geometry, "SHADER_VISIBILITY_GEOMETRY")
+SHADER_VISIBILITY_ENUM(Pixel, "SHADER_VISIBILITY_PIXEL")
+SHADER_VISIBILITY_ENUM(Amplification, "SHADER_VISIBILITY_AMPLIFICATION")
+SHADER_VISIBILITY_ENUM(Mesh, "SHADER_VISIBILITY_MESH")
+
+#undef SHADER_VISIBILITY_ENUM
+#undef DESCRIPTOR_RANGE_FLAG_ENUM
+#undef DESCRIPTOR_RANGE_FLAG_ENUM_OFF
+#undef DESCRIPTOR_RANGE_FLAG_ENUM_ON
+#undef ROOT_DESCRIPTOR_FLAG_ENUM
+#undef DESCRIPTOR_RANGE_OFFSET_ENUM
+#undef ENUM
+#undef KEYWORD
+#undef PUNCTUATOR
+#undef TOK
diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h b/clang/include/clang/Parse/ParseHLSLRootSignature.h
new file mode 100644
index 00000000000000..6c534411e754a0
--- /dev/null
+++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h
@@ -0,0 +1,96 @@
+//===--- ParseHLSLRootSignature.h -------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+//  This file defines the ParseHLSLRootSignature interface.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_PARSE_PARSEHLSLROOTSIGNATURE_H
+#define LLVM_CLANG_PARSE_PARSEHLSLROOTSIGNATURE_H
+
+#include "clang/Lex/LiteralSupport.h"
+#include "clang/Lex/Preprocessor.h"
+
+#include "llvm/ADT/APInt.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSwitch.h"
+
+namespace llvm {
+namespace hlsl {
+namespace root_signature {
+
+struct RootSignatureToken {
+  enum Kind {
+#define TOK(X) X,
+#include "clang/Parse/HLSLRootSignatureTokenKinds.def"
+  };
+
+  Kind Kind = Kind::invalid;
+
+  // Retain the SouceLocation of the token for diagnostics
+  clang::SourceLocation TokLoc;
+
+  // Retain if the uint32_t bits represent a signed integer
+  bool Signed = false;
+  union {
+    uint32_t IntLiteral = 0;
+    float FloatLiteral;
+  };
+
+  // Constructors
+  RootSignatureToken() {}
+  RootSignatureToken(clang::SourceLocation TokLoc) : TokLoc(TokLoc) {}
+};
+using TokenKind = enum RootSignatureToken::Kind;
+
+class RootSignatureLexer {
+public:
+  RootSignatureLexer(StringRef Signature, clang::SourceLocation SourceLoc,
+                     clang::Preprocessor &PP)
+      : Buffer(Signature), SourceLoc(SourceLoc), PP(PP) {}
+
+  // Consumes the internal buffer as a list of tokens and will emplace them
+  // onto the given tokens.
+  //
+  // It will consume until it successfully reaches the end of the buffer,
+  // or, until the first error is encountered. The return value denotes if
+  // there was a failure.
+  bool Lex(SmallVector<RootSignatureToken> &Tokens);
+
+  // Get the current source location of the lexer
+  clang::SourceLocation GetLocation() { return SourceLoc; };
+
+private:
+  // Internal buffer to iterate over
+  StringRef Buffer;
+
+  // Passed down parameters from Sema
+  clang::SourceLocation SourceLoc;
+  clang::Preprocessor &PP;
+
+  bool LexNumber(RootSignatureToken &Result);
+
+  // Consumes the internal buffer for a single token.
+  //
+  // The return value denotes if there was a failure.
+  bool LexToken(RootSignatureToken &Token);
+
+  // Advance the buffer by the specified number of characters. Updates the
+  // SourceLocation appropriately.
+  void AdvanceBuffer(unsigned NumCharacters = 1) {
+    Buffer = Buffer.drop_front(NumCharacters);
+    SourceLoc = SourceLoc.getLocWithOffset(NumCharacters);
+  }
+};
+
+} // namespace root_signature
+} // namespace hlsl
+} // namespace llvm
+
+#endif // LLVM_CLANG_PARSE_PARSEHLSLROOTSIGNATURE_H
diff --git a/clang/lib/Parse/CMakeLists.txt b/clang/lib/Parse/CMakeLists.txt
index 22e902f7e1bc50..00fde537bb9c60 100644
--- a/clang/lib/Parse/CMakeLists.txt
+++ b/clang/lib/Parse/CMakeLists.txt
@@ -14,6 +14,7 @@ add_clang_library(clangParse
   ParseExpr.cpp
   ParseExprCXX.cpp
   ParseHLSL.cpp
+  ParseHLSLRootSignature.cpp
   ParseInit.cpp
   ParseObjc.cpp
   ParseOpenMP.cpp
diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp b/clang/lib/Parse/ParseHLSLRootSignature.cpp
new file mode 100644
index 00000000000000..baa9950163a5c6
--- /dev/null
+++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp
@@ -0,0 +1,152 @@
+#include "clang/Parse/ParseHLSLRootSignature.h"
+
+namespace llvm {
+namespace hlsl {
+namespace root_signature {
+
+// Lexer Definitions
+
+static bool IsPreprocessorNumberChar(char C) {
+  // TODO: extend for float support with or without hexadecimal/exponent
+  return isdigit(C); // integer support
+}
+
+bool RootSignatureLexer::LexNumber(RootSignatureToken &Result) {
+  // NumericLiteralParser does not handle the sign so we will manually apply it
+  Result.Signed = Buffer.front() == '-';
+  if (Result.Signed)
+    AdvanceBuffer();
+
+  // Retrieve the possible number
+  StringRef NumSpelling = Buffer.take_while(IsPreprocessorNumberChar);
+
+  // Parse the numeric value and so semantic checks on its specification
+  clang::NumericLiteralParser Literal(NumSpelling, SourceLoc,
+                                      PP.getSourceManager(), PP.getLangOpts(),
+                                      PP.getTargetInfo(), PP.getDiagnostics());
+  if (Literal.hadError)
+    return true; // Error has already been reported so just return
+
+  // Retrieve the number value to store into the token
+  if (Literal.isIntegerLiteral()) {
+    Result.Kind = TokenKind::int_literal;
+
+    APSInt X = APSInt(32, Result.Signed);
+    if (Literal.GetIntegerValue(X))
+      return true; // TODO: Report overflow error
+
+    X = Result.Signed ? -X : X;
+    Result.IntLiteral = (uint32_t)X.getZExtValue();
+  } else {
+    return true; // TODO: report unsupported number literal specification
+  }
+
+  AdvanceBuffer(NumSpelling.size());
+  return false;
+}
+
+bool RootSignatureLexer::Lex(SmallVector<RootSignatureToken> &Tokens) {
+  // Discard any leading whitespace
+  AdvanceBuffer(Buffer.take_while(isspace).size());
+
+  while (!Buffer.empty()) {
+    RootSignatureToken Result;
+    if (LexToken(Result))
+      return true;
+
+    // Successfully Lexed the token so we can store it
+    Tokens.push_back(Result);
+
+    // Discard any trailing whitespace
+    AdvanceBuffer(Buffer.take_while(isspace).size());
+  }
+
+  return false;
+}
+
+bool RootSignatureLexer::LexToken(RootSignatureToken &Result) {
+  // Record where this token is in the text for usage in parser diagnostics
+  Result.TokLoc = SourceLoc;
+
+  char C = Buffer.front();
+
+  // Punctuators
+  switch (C) {
+#define PUNCTUATOR(X, Y)                                                       \
+  case Y: {                                                                    \
+    Result.Kind = TokenKind::pu_##X;                                           \
+    AdvanceBuffer();                                                           \
+    return false;                                                              \
+  }
+#include "clang/Parse/HLSLRootSignatureTokenKinds.def"
+  default:
+    break;
+  }
+
+  // Numeric constant
+  if (isdigit(C) || C == '-')
+    return LexNumber(Result);
+
+  // All following tokens require at least one additional character
+  if (Buffer.size() <= 1)
+    return true; // TODO: Report invalid token error
+
+  // Peek at the next character to deteremine token type
+  char NextC = Buffer[1];
+
+  // Registers: [tsub][0-9+]
+  if ((C == 't' || C == 's' || C == 'u' || C == 'b') && isdigit(NextC)) {
+    AdvanceBuffer();
+
+    if (LexNumber(Result))
+      return true;
+
+    // Lex number could also parse a float so ensure it was an unsigned int
+    if (Result.Kind != TokenKind::int_literal || Result.Signed)
+      return true; // Return invalid number literal for register error
+
+    // Convert character to the register type.
+    // This is done after LexNumber to override the TokenKind
+    switch (C) {
+    case 'b':
+      Result.Kind = TokenKind::bReg;
+      break;
+    case 't':
+      Result.Kind = TokenKind::tReg;
+      break;
+    case 'u':
+      Result.Kind = TokenKind::uReg;
+      break;
+    case 's':
+      Result.Kind = TokenKind::sReg;
+      break;
+    default:
+      llvm_unreachable("Switch for an expected token was not provided");
+      return true;
+    }
+    return false;
+  }
+
+  // Keywords and Enums:
+  StringRef TokSpelling =
+      Buffer.take_while([](char C) { return isalnum(C) || C == '_'; });
+
+  // Define a large string switch statement for all the keywords and enums
+  auto Switch = llvm::StringSwitch<TokenKind>(TokSpelling);
+#define KEYWORD(NAME) Switch.Case(#NAME, TokenKind::kw_##NAME);
+#define ENUM(NAME, LIT) Switch.CaseLower(LIT, TokenKind::en_##NAME);
+#include "clang/Parse/HLSLRootSignatureTokenKinds.def"
+
+  // Then attempt to retreive a string from it
+  auto Kind = Switch.Default(TokenKind::invalid);
+  if (Kind == TokenKind::invalid)
+    return true; // TODO: Report invalid identifier
+
+  Result.Kind = Kind;
+  AdvanceBuffer(TokSpelling.size());
+  return false;
+}
+
+} // namespace root_signature
+} // namespace hlsl
+} // namespace llvm
diff --git a/clang/unittests/CMakeLists.txt b/clang/unittests/CMakeLists.txt
index 85d265426ec80b..9b3ce8aa7de739 100644
--- a/clang/unittests/CMakeLists.txt
+++ b/clang/unittests/CMakeLists.txt
@@ -25,6 +25,7 @@ endfunction()
 
 add_subdirectory(Basic)
 add_subdirectory(Lex)
+add_subdirectory(Parse)
 add_subdirectory(Driver)
 if(CLANG_ENABLE_STATIC_ANALYZER)
   add_subdirectory(Analysis)
diff --git a/clang/unittests/Parse/CMakeLists.txt b/clang/unittests/Parse/CMakeLists.txt
new file mode 100644
index 00000000000000..1b7eb4934a46c8
--- /dev/null
+++ b/clang/unittests/Parse/CMakeLists.txt
@@ -0,0 +1,26 @@
+set(LLVM_LINK_COMPONENTS
+  Support
+  )
+
+add_clang_unittest(ParseTests
+  ParseHLSLRootSignatureTest.cpp
+  )
+
+clang_target_link_libraries(ParseTests
+  PRIVATE
+  clangAST
+  clangASTMatchers
+  clangBasic
+  clangFrontend
+  clangParse
+  clangSema
+  clangSerialization
+  clangTooling
+  )
+
+target_link_libraries(ParseTests
+  PRIVATE
+  LLVMTestingAnnotations
+  LLVMTestingSupport
+  clangTesting
+  )
diff --git a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp
new file mode 100644
index 00000000000000..c430b0657dacf5
--- /dev/null
+++ b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp
@@ -0,0 +1,130 @@
+//=== ParseHLSLRootSignatureTest.cpp - Parse Root Signature tests ---------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TargetInfo.h"
+#include "clang/Lex/HeaderSearch.h"
+#include "clang/Lex/HeaderSearchOptions.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/ModuleLoader.h"
+#include "clang/Lex/Preprocessor.h"
+#include "clang/Lex/PreprocessorOptions.h"
+
+#include "clang/Parse/ParseHLSLRootSignature.h"
+#include "gtest/gtest.h"
+
+using namespace llvm::hlsl::root_signature;
+using namespace clang;
+
+namespace {
+
+// The test fixture.
+class ParseHLSLRootSignatureTest : public ::testing::Test {
+protected:
+  ParseHLSLRootSignatureTest()
+      : FileMgr(FileMgrOpts), DiagID(new DiagnosticIDs()),
+        Diags(DiagID, new DiagnosticOptions, new IgnoringDiagConsumer()),
+        SourceMgr(Diags, FileMgr), TargetOpts(new TargetOptions) {
+    TargetOpts->Triple = "x86_64-apple-darwin11.1.0";
+    Target = TargetInfo::CreateTargetInfo(Diags, TargetOpts);
+  }
+
+  Preprocessor *CreatePP(StringRef Source, TrivialModuleLoader &ModLoader) {
+    std::unique_ptr<llvm::MemoryBuffer> Buf =
+        llvm::MemoryBuffer::getMemBuffer(Source);
+    SourceMgr.setMainFileID(SourceMgr.createFileID(std::move(Buf)));
+
+    HeaderSearch HeaderInfo(std::make_shared<HeaderSearchOptions>(), SourceMgr,
+                            Diags, LangOpts, Target.get());
+    Preprocessor *PP =
+        new Preprocessor(std::make_shared<PreprocessorOptions>(), Diags,
+                         LangOpts, SourceMgr, HeaderInfo, ModLoader,
+                         /*IILookup =*/nullptr,
+                         /*OwnsHeaderSearch =*/false);
+    PP->Initialize(*Target);
+    PP->EnterMainSourceFile();
+    return PP;
+  }
+
+  void CheckTokens(SmallVector<RootSignatureToken> &Computed,
+                   SmallVector<TokenKind> &Expected) {
+    ASSERT_EQ(Computed.size(), Expected.size());
+    for (unsigned I = 0, E = Expected.size(); I != E; ++I) {
+      ASSERT_EQ(Computed[I].Kind, Expected[I]);
+    }
+  }
+
+  FileSystemOptions FileMgrOpts;
+  FileManager FileMgr;
+  IntrusiveRefCntPtr<DiagnosticIDs> DiagID;
+  DiagnosticsEngine Diags;
+  SourceManager SourceMgr;
+  LangOptions LangOpts;
+  std::shared_ptr<TargetOptions> TargetOpts;
+  IntrusiveRefCntPtr<TargetInfo> Target;
+};
+
+TEST_F(ParseHLSLRootSignatureTest, LexValidTokensTest) {
+  const llvm::StringLiteral Source = R"cc(
+    -42
+
+    b0 t43 u987 s234
+
+    (),|=
+
+    DescriptorTable
+
+    CBV SRV UAV Sampler
+    space visibility flags
+    numDescriptors offset
+
+    DESCRIPTOR_RANGE_OFFSET_APPEND
+
+    DATA_VOLATILE
+    DATA_STATIC_WHILE_SET_AT_EXECUTE
+    DATA_STATIC
+    DESCRIPTORS_VOLATILE
+    DESCRIPTORS_STATIC_KEEPING_BUFFER_BOUNDS_CHECKS
+
+    shader_visibility_all
+    shader_visibility_vertex
+    shader_visibility_hull
+    shader_visibility_domain
+    shader_visibility_geometry
+    shader_visibility_pixel
+    shader_visibility_amplification
+    shader_visibility_mesh
+  )cc";
+
+  TrivialModuleLoader ModLoader;
+  Preprocessor *PP = CreatePP(Source, ModLoader);
+  auto TokLoc = SourceLocation();
+
+  RootSignatureLexer Lexer(Source, TokLoc, *PP);
+
+  SmallVector<RootSignatureToken> Tokens = {
+      RootSignatureToken() // invalid token for completeness
+  };
+  ASSERT_FALSE(Lexer.Lex(Tokens));
+
+  SmallVector<TokenKind> Expected = {
+#define TOK(NAME) TokenKind::NAME,
+#include "clang/Parse/HLSLRootSignatureTokenKinds.def"
+  };
+
+  CheckTokens(Tokens, Expected);
+
+  delete PP;
+}
+
+} // anonymous namespace

clang/lib/Parse/ParseHLSLRootSignature.cpp

clang/include/clang/Parse/HLSLRootSignatureTokenKinds.def

clang/lib/Parse/ParseHLSLRootSignature.cpp

clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp

clang/include/clang/Parse/ParseHLSLRootSignature.h

V-FEXrt

Personally happy with the code at a high level, though it would be best to have someone with domain knowledge also approve

clang/lib/Parse/ParseHLSLRootSignature.cpp

V-FEXrt · 2025-01-22T17:15:38Z

clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp

+  SmallVector<RootSignatureToken> Tokens = {
+      RootSignatureToken() // invalid token for completeness
+  };
+  ASSERT_FALSE(Lexer.Lex(Tokens));


Would be nice to see a more robust set of failure tests

Yes, I decided to leave out the implementation of generating and testing diagnostics from this pr, just to help with the size. If you feel that should they be here to ensure correctness I could add it.

No, I'm happy to punt it

I'll reiterate, I think this is the wrong tradeoff, because it makes it significantly more difficult to review the follow-up change since we'll need to compare not just the incoming PR but also go back and look at this PR to ensure all the appropriate error states are captured and tested.

clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp

- Defines the RootSignatureLexer class - Defines the test harness required for testing - Implements the punctuator tokens and tests functionality

- Integrate the use of the `NumericLiteralParser` to lex integer literals - Add additional hlsl specific diagnostics messages

inbelic · 2025-01-28T20:30:20Z

Updated to include diagnostics output and the relevant testing.

I have rebased to split the pr into smaller incremental changes. But I have taken care to address all previous comments.

inbelic · 2025-01-29T22:47:47Z

HLSL Build failure is because there is an assertion that DIAG_SIZE_LEX is not big enough (insufficient). I did add some new diag errors under the Lex category so it seems the threshold is passed on some environments.

From here, it seems that we could, and there is trend to, bump this limit when needed. IIUC, I should create a separate pr to bump this.

clang/lib/Parse/ParseHLSLRootSignature.cpp

bogner · 2025-02-03T19:07:37Z

clang/lib/Parse/ParseHLSLRootSignature.cpp

+  // Retrieve the number value to store into the token
+  Result.Kind = TokenKind::int_literal;
+
+  llvm::APSInt X = llvm::APSInt(32, !Signed);


What happens if we have a value like +2147483648? This would fit in an unsigned 32-bit int, but not a signed one. However, we're deciding that the value is signed if the + is present above. Will this error and tell us that there's overflow, treat the value as unsigned because of the range, or give us a garbage negative number?

If it's one of the first two options, does the one we're implementing match DXC's behaviour here?

clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp

- the previous implementation would not have thrown an overflow error as expected. Instead it would have been treated as a negative int - DXC treats all positively signed ints as an unsigned integer, so for compatibility, we will be doing the same. - add better unit tests to demonstrate expected functionality

damyanp

LGTM - I'm surprised we don't have all of the tokens we're going to need for root signatures included in this PR.

clang/lib/Parse/ParseHLSLRootSignature.cpp

clang/include/clang/Basic/DiagnosticLexKinds.td

llvm-beanz · 2025-02-09T21:00:49Z

clang/lib/Parse/ParseHLSLRootSignature.cpp

+
+  // All following tokens require at least one additional character
+  if (Buffer.size() <= 1) {
+    PP.getDiagnostics().Report(Result.TokLoc, diag::err_hlsl_invalid_token);


This error doesn't seem particularly useful to end users who make a mistake. Why not just return false here because it can't construct a token?

Would you also remove the same error message from here?

They were added just to have some indication that lexing failed.

Is it more useful to the user to note a failure in lexing, or is it more useful at the parsing level to be able to recognize that the token was set to invalid when something else was expected?

Clang tends to do the later, which is why you get errors like "expected identifier", because the parser knows the next thing should be an identifier, but the lexer just knows it didn't produce the right result (or any result).

clang/lib/Parse/ParseHLSLRootSignature.cpp

llvm-beanz · 2025-02-09T21:08:18Z

clang/lib/Parse/ParseHLSLRootSignature.cpp

+  return false;
+}
+
+bool RootSignatureLexer::Lex(SmallVector<RootSignatureToken> &Tokens) {


I'm not sure this function should exist.

Most modern compilers don't lex entire source files into allocated lists of tokens then parse, they do the two at the same time never allocating more than a couple tokens at a time. You can see this in Clang where the ConsumeToken API advances the token to the next token, and is called throughout the parser.

I've seen other compilers where the tokenizer is implemented as effectively an iterator where the iterator state tracks the current token.

For context, the reason compilers tend to do this is because lexing isn't always "fast". In particular translation of numeric literals into actual integer values can be quite slow, so if you lex all the tokens, you do all the transformations even if an error would be detected during parsing. If you lex one token at a time, you stop lexing once the error is encountered providing faster user feedback.

Understood, I have updated this to only support a ConsumeToken/PeekNextToken api.

It could be updated further to define and be used as an iterator. Although from how it is used in the parser (usually through PeekNextToken). I think it would essentially be just renaming ConsumeToken to the ++ operator. Maybe there are other benefits I don't see?

- we want to prevent the pre-allocation of all the tokens and rather let the parser allocate the tokens as it goes along - this allows for a quicker cycle to the user if there is a parsing error as we don't need to lex all of the tokens before trying to parse - as such, this change provides a new api for interacting with the lexer through ConsumeToken and PeekNextToken as opposed to generated the entire list of lexed tokens

github-actions · 2025-02-11T15:54:45Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm-beanz

In general I think this is very much moving in the right direction. I have two high level areas of feedback (see my more specific comments inline).

In general, Lexer's don't often have the context to provide actionable error messages. As a result it is usually better for the Lexer to push up an invalid token to the parser and allow the parser to provide a more contextually relevant diagnostic.

The other area is around what is the responsibility of the parser vs the lexer. In general lexers just break the input into tokens, the parser then interprets and validates the tokens. In particular around numbers this is doing a lot of interpretation of the token values. I think it is probably better to have a clearer separation.

clang/lib/Parse/ParseHLSLRootSignature.cpp

clang/include/clang/Basic/DiagnosticLexKinds.td

clang/lib/Parse/ParseHLSLRootSignature.cpp

llvm-beanz · 2025-02-11T17:22:35Z

clang/lib/Parse/ParseHLSLRootSignature.cpp

+
+  // All following tokens require at least one additional character
+  if (Buffer.size() <= 1) {
+    PP.getDiagnostics().Report(Result.TokLoc, diag::err_hlsl_invalid_token);


Is it more useful to the user to note a failure in lexing, or is it more useful at the parsing level to be able to recognize that the token was set to invalid when something else was expected?

Clang tends to do the later, which is why you get errors like "expected identifier", because the parser knows the next thing should be an identifier, but the lexer just knows it didn't produce the right result (or any result).

clang/lib/Parse/ParseHLSLRootSignature.cpp

- instead of reporting the unique end_of_stream error, we can instead pass up an end_of_stream token and let the unexpected diag reporting handles this in the parser with more context

- similarily, we can let the unexpected_token diagnostics reporting handle this error by passing up an invalid token and provide more context

- the current use of invalid is not able to distinguish between a lex error that has been reported to diagnostics and an invalid token that should be reported during parsing - an error token is defined as the default token to denote a lex error that has been reported and shouldn't be handled by the parsing

llvm-beanz · 2025-02-12T20:57:02Z

clang/lib/Parse/ParseHLSLRootSignature.cpp

+
+bool RootSignatureLexer::LexNumber(RootSignatureToken &Result) {
+  // Retrieve the possible number
+  StringRef NumSpelling = Buffer.take_while(IsNumberChar);


If I'm reading this correctly right now if someone writes 1.3 the parser will break the token at the ., call it an integer literal and return 1 for the value. I assume that's not the expected behavior?

Correct. It is the expected behaviour for this pr, as we are not expecting to handle floats until #126565. This was split out because I want to have a pr that will have dedicated documentation and testing to show that the float parsing is equivalent to DXC. Which I think would get bogged down and increase the scope when including it in this pr.

…rror states - as commented, using the `NumericLiteralParser` during lexing of numeric constants falls more under parsing rather than lexing. this commit removes the use of this completely from the lexer - this implies that (after moving the register error to parsing) there are no longer any errors to report during lexing. instead we will just create an `invalid` or `end_of_stream` token and let the parser provide an error with more context - thus we change the api of `ConsumeToken`/`PeekNextToken` so that it no longer needs to denote an error and will just return a token. this greatly simplifies the lexer implementation as we no longer need to pass down the pre-processor

llvm-beanz

My last real question here is whether this belongs in clang/Parse or if it should be in clang/Lex. Since it is now strictly Lexing it shouldn't have any dependencies on the parse or AST libraries right?

llvm-beanz · 2025-02-13T21:37:30Z

clang/include/clang/Parse/ParseHLSLRootSignature.h

+#ifndef LLVM_CLANG_PARSE_PARSEHLSLROOTSIGNATURE_H
+#define LLVM_CLANG_PARSE_PARSEHLSLROOTSIGNATURE_H
+
+#include "clang/AST/APValue.h"


Is this needed?

In general it is also best to prefer forward declarations in headers rather than including headers where possible to reduce compile time overhead.

Removed this and the other no longer needed headers

- remove uneeded headers - rename to new directory

llvm-beanz

One small formatting cleanup in a suggestion. LGTM.

clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def

Co-authored-by: Chris B <[email protected]>

llvm-ci · 2025-02-14T17:32:06Z

LLVM Buildbot has detected a new failure on builder lldb-x86_64-debian running on lldb-x86_64-debian while building clang at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/162/builds/16230

Here is the relevant piece of the build log for the reference

Step 6 (test) failure: build (failure)
...
PASS: lldb-unit :: Utility/./UtilityTests/77/127 (2731 of 2777)
PASS: lldb-shell :: Subprocess/vfork-follow-parent-wp.test (2732 of 2777)
PASS: lldb-unit :: Utility/./UtilityTests/78/127 (2733 of 2777)
PASS: lldb-unit :: Core/./LLDBCoreTests/48/66 (2734 of 2777)
PASS: lldb-shell :: Unwind/basic-block-sections.test (2735 of 2777)
PASS: lldb-unit :: Core/./LLDBCoreTests/50/66 (2736 of 2777)
PASS: lldb-api :: tools/lldb-server/attach-wait/TestGdbRemoteAttachWait.py (2737 of 2777)
PASS: lldb-api :: tools/lldb-server/TestGdbRemoteLaunch.py (2738 of 2777)
PASS: lldb-unit :: Host/./HostTests/2/98 (2739 of 2777)
PASS: lldb-unit :: Host/./HostTests/30/98 (2740 of 2777)
FAIL: lldb-api :: tools/lldb-dap/launch/TestDAP_launch.py (2741 of 2777)
******************** TEST 'lldb-api :: tools/lldb-dap/launch/TestDAP_launch.py' FAILED ********************
Script:
--
/usr/bin/python3 /home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/./lib --env LLVM_INCLUDE_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/include --env LLVM_TOOLS_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/./bin --arch x86_64 --build-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex --lldb-module-cache-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/lldb --compiler /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/clang --dsymutil /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/worker/2.0.1/lldb-x86_64-debian/build/./bin --lldb-obj-root /home/worker/2.0.1/lldb-x86_64-debian/build/tools/lldb --lldb-libs-dir /home/worker/2.0.1/lldb-x86_64-debian/build/./lib -t /home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/tools/lldb-dap/launch -p TestDAP_launch.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision b41b86a907f653f79bab10d4c80b3a41d146c71b)
  clang revision b41b86a907f653f79bab10d4c80b3a41d146c71b
  llvm revision b41b86a907f653f79bab10d4c80b3a41d146c71b
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']
========= DEBUG ADAPTER PROTOCOL LOGS =========
1739554236.560714722 --> 
Content-Length: 344

{
  "arguments": {
    "adapterID": "lldb-native",
    "clientID": "vscode",
    "columnsStartAt1": true,
    "linesStartAt1": true,
    "locale": "en-us",
    "pathFormat": "path",
    "sourceInitFile": false,
    "supportsRunInTerminalRequest": true,
    "supportsStartDebuggingRequest": true,
    "supportsVariablePaging": true,
    "supportsVariableType": true
  },
  "command": "initialize",
  "seq": 1,
  "type": "request"
}
1739554236.563421011 <-- 
Content-Length: 1631

For the sake of scope, we will let the lexing of floating literals be deferred until needed for Static Samplers. Other than that this pr should allow us to simply define new enumerations/keywords in `RootSignatureTokenKinds.def` for when they are used in the parser. We could have defined all of these keywords here, but for the sake of correctness in review we will let them be split up. - Define `RootSignatureLexer` and provide a public `LexToken` method for external use - Define the file `RootSignatureTokenKinds` to define required tokens and allow for future custom keywords/enums - Implement the internal methods required to parse the different types of tokens (integers, flag enums, puncuators...) - Add test harness for unit testing and the respective unit tests for lexing the tokens Resolves llvm#126563 --------- Co-authored-by: Chris B <[email protected]>

inbelic mentioned this pull request Jan 14, 2025

[HLSL][RootSignature] Implement Parsing of Descriptor Tables #122982

Closed

inbelic commented Jan 14, 2025

View reviewed changes

clang/lib/Parse/ParseHLSLRootSignature.cpp Outdated Show resolved Hide resolved

inbelic commented Jan 14, 2025

View reviewed changes

clang/include/clang/Parse/ParseHLSLRootSignature.h Outdated Show resolved Hide resolved

inbelic commented Jan 14, 2025

View reviewed changes

clang/lib/Parse/ParseHLSLRootSignature.cpp Outdated Show resolved Hide resolved

inbelic marked this pull request as ready for review January 14, 2025 23:12

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" HLSL HLSL Language Support labels Jan 14, 2025

inbelic commented Jan 15, 2025

View reviewed changes

clang/lib/Parse/ParseHLSLRootSignature.cpp Outdated Show resolved Hide resolved

inbelic commented Jan 15, 2025

View reviewed changes

clang/include/clang/Parse/HLSLRootSignatureTokenKinds.def Outdated Show resolved Hide resolved

V-FEXrt reviewed Jan 15, 2025

View reviewed changes

inbelic commented Jan 20, 2025

View reviewed changes

clang/include/clang/Parse/ParseHLSLRootSignature.h Outdated Show resolved Hide resolved

V-FEXrt approved these changes Jan 22, 2025

View reviewed changes

damyanp reviewed Jan 23, 2025

View reviewed changes

clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp Outdated Show resolved Hide resolved

[HLSL][RootSignature] Initial Lexer Definition with puncuators

98deff6

- Defines the RootSignatureLexer class - Defines the test harness required for testing - Implements the punctuator tokens and tests functionality

inbelic linked an issue Jan 27, 2025 that may be closed by this pull request

[HLSL] Implement the Root Signature parser #120472

Open

inbelic added 4 commits January 28, 2025 00:02

Add lexing of integer literals

5f43ed8

- Integrate the use of the `NumericLiteralParser` to lex integer literals - Add additional hlsl specific diagnostics messages

Add support for lexing registers

dc0c7ba

Add lexing for example keyword and enum

c5d3881

Add lexing for remaining DescriptorTable keywords and enums

dc784ff

inbelic force-pushed the inbelic/rootsig-dt branch from 0a1df31 to dc784ff Compare January 28, 2025 20:08

bogner reviewed Feb 3, 2025

View reviewed changes

inbelic added 3 commits February 4, 2025 18:22

review: update uses of llvm_unreachable

82c645d

review: update target test triple with comment

1c95e6d

damyanp reviewed Feb 8, 2025

View reviewed changes

clang/lib/Parse/ParseHLSLRootSignature.cpp Outdated Show resolved Hide resolved

llvm-beanz reviewed Feb 9, 2025

View reviewed changes

inbelic removed a link to an issue Feb 10, 2025

[HLSL] Implement the Root Signature parser #120472

Open

inbelic linked an issue Feb 10, 2025 that may be closed by this pull request

[HLSL] Implement Root Signature Lexer for Descriptor Tables #126563

Closed

4 tasks

inbelic added 3 commits February 10, 2025 20:29

review: update diagnostics message

de2f62e

review: update linked todo

f557d5a

clang format

35c1566

llvm-beanz reviewed Feb 11, 2025

View reviewed changes

inbelic added 4 commits February 11, 2025 21:19

review: fix some typos

8ac979c

review: remove unique end_of_stream lexing error

0c372bf

- instead of reporting the unique end_of_stream error, we can instead pass up an end_of_stream token and let the unexpected diag reporting handles this in the parser with more context

review: remove unique invalid_token lexing error

e58c89a

- similarily, we can let the unexpected_token diagnostics reporting handle this error by passing up an invalid token and provide more context

review: punt parsing of a signed integer to the parser

aa7a73d

inbelic mentioned this pull request Feb 12, 2025

[HLSL] Define the HLSLRootSignature Attr #123985

Open

llvm-beanz reviewed Feb 12, 2025

View reviewed changes

llvm-beanz reviewed Feb 13, 2025

View reviewed changes

inbelic added 3 commits February 13, 2025 22:15

review: move to lex dir

5439fa8

review: clean up after moving

cf9788e

- remove uneeded headers - rename to new directory

self-review: whoops missed renaming header guards

6a5545f

llvm-beanz approved these changes Feb 13, 2025

View reviewed changes

clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def Outdated Show resolved Hide resolved

Update clang/include/clang/Lex/HLSLRootSignatureTokenKinds.def

7f27295

Co-authored-by: Chris B <[email protected]>

inbelic merged commit b41b86a into llvm:main Feb 14, 2025
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[HLSL][RootSignature] Implement Lexing of DescriptorTables #122981

[HLSL][RootSignature] Implement Lexing of DescriptorTables #122981

inbelic commented Jan 14, 2025 •

edited

Loading

llvmbot commented Jan 14, 2025 •

edited

Loading

V-FEXrt left a comment

V-FEXrt Jan 22, 2025

inbelic Jan 22, 2025

V-FEXrt Jan 22, 2025

llvm-beanz Jan 23, 2025

inbelic commented Jan 28, 2025

inbelic commented Jan 29, 2025

bogner Feb 3, 2025

damyanp left a comment

llvm-beanz Feb 9, 2025

inbelic Feb 11, 2025

llvm-beanz Feb 11, 2025

llvm-beanz Feb 9, 2025

inbelic Feb 11, 2025

github-actions bot commented Feb 11, 2025 •

edited

Loading

llvm-beanz left a comment

llvm-beanz Feb 11, 2025

llvm-beanz Feb 12, 2025

inbelic Feb 12, 2025

llvm-beanz left a comment

llvm-beanz Feb 13, 2025

inbelic Feb 13, 2025

llvm-beanz left a comment

llvm-ci commented Feb 14, 2025

[HLSL][RootSignature] Implement Lexing of DescriptorTables #122981

[HLSL][RootSignature] Implement Lexing of DescriptorTables #122981

Conversation

inbelic commented Jan 14, 2025 • edited Loading

llvmbot commented Jan 14, 2025 • edited Loading

V-FEXrt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

inbelic commented Jan 28, 2025

inbelic commented Jan 29, 2025

Choose a reason for hiding this comment

damyanp left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Feb 11, 2025 • edited Loading

llvm-beanz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

llvm-beanz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

llvm-beanz left a comment

Choose a reason for hiding this comment

llvm-ci commented Feb 14, 2025

inbelic commented Jan 14, 2025 •

edited

Loading

llvmbot commented Jan 14, 2025 •

edited

Loading

github-actions bot commented Feb 11, 2025 •

edited

Loading