Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HLSL][RootSignature] Implement Lexing of DescriptorTables #122981

Merged
merged 22 commits into from
Feb 14, 2025

Conversation

inbelic
Copy link
Contributor

@inbelic inbelic commented Jan 14, 2025

For the sake of scope, we will let the lexing of floating literals be deferred until needed for Static Samplers. Other than that this pr should allow us to simply define new enumerations/keywords in RootSignatureTokenKinds.def for when they are used in the parser. We could have defined all of these keywords here, but for the sake of correctness in review we will let them be split up.

  • Define RootSignatureLexer and provide a public LexToken method for external use
  • Define the file RootSignatureTokenKinds to define required tokens and allow for future custom keywords/enums
  • Implement the internal methods required to parse the different types of tokens (integers, flag enums, puncuators...)
  • Add test harness for unit testing and the respective unit tests for lexing the tokens

Resolves #126563

@inbelic inbelic marked this pull request as ready for review January 14, 2025 23:12
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" HLSL HLSL Language Support labels Jan 14, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 14, 2025

@llvm/pr-subscribers-hlsl

@llvm/pr-subscribers-clang

Author: Finn Plummer (inbelic)

Changes
  • Define required tokens to parse a Descriptor Table in TokenKinds.def
  • Implements a Lexer to handle all of the defined tokens in ParseHLSLRootSignature

This is an initial part of #120472


Full diff: https://github.com/llvm/llvm-project/pull/122981.diff

7 Files Affected:

  • (added) clang/include/clang/Parse/HLSLRootSignatureTokenKinds.def (+121)
  • (added) clang/include/clang/Parse/ParseHLSLRootSignature.h (+96)
  • (modified) clang/lib/Parse/CMakeLists.txt (+1)
  • (added) clang/lib/Parse/ParseHLSLRootSignature.cpp (+152)
  • (modified) clang/unittests/CMakeLists.txt (+1)
  • (added) clang/unittests/Parse/CMakeLists.txt (+26)
  • (added) clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp (+130)
diff --git a/clang/include/clang/Parse/HLSLRootSignatureTokenKinds.def b/clang/include/clang/Parse/HLSLRootSignatureTokenKinds.def
new file mode 100644
index 00000000000000..5e47af8d4b1364
--- /dev/null
+++ b/clang/include/clang/Parse/HLSLRootSignatureTokenKinds.def
@@ -0,0 +1,121 @@
+//===--- HLSLRootSignature.def - Tokens and Enum Database -------*- C++ -*-===//
+
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the TokenKinds used in the Root Signature DSL. This
+// includes keywords, enums and a small subset of punctuators. Users of this
+// file must optionally #define the TOK, KEYWORD, ENUM or specific ENUM macros
+// to make use of this file.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef TOK
+#define TOK(X)
+#endif
+#ifndef PUNCTUATOR
+#define PUNCTUATOR(X,Y) TOK(pu_ ## X)
+#endif
+#ifndef KEYWORD
+#define KEYWORD(X) TOK(kw_ ## X)
+#endif
+#ifndef ENUM
+#define ENUM(NAME, LIT) TOK(en_ ## NAME)
+#endif
+
+// Defines the various types of enum
+#ifndef DESCRIPTOR_RANGE_OFFSET_ENUM
+#define DESCRIPTOR_RANGE_OFFSET_ENUM(NAME, LIT) ENUM(NAME, LIT)
+#endif
+#ifndef ROOT_DESCRIPTOR_FLAG_ENUM
+#define ROOT_DESCRIPTOR_FLAG_ENUM(NAME, LIT) ENUM(NAME, LIT)
+#endif
+// Note: ON denotes that the flag is unique from the above Root Descriptor
+//  Flags. This is required to avoid token kind enum conflicts.
+#ifndef DESCRIPTOR_RANGE_FLAG_ENUM_OFF
+#define DESCRIPTOR_RANGE_FLAG_ENUM_OFF(NAME, LIT)
+#endif
+#ifndef DESCRIPTOR_RANGE_FLAG_ENUM_ON
+#define DESCRIPTOR_RANGE_FLAG_ENUM_ON(NAME, LIT) ENUM(NAME, LIT)
+#endif
+#ifndef DESCRIPTOR_RANGE_FLAG_ENUM
+#define DESCRIPTOR_RANGE_FLAG_ENUM(NAME, LIT, ON) DESCRIPTOR_RANGE_FLAG_ENUM_##ON(NAME, LIT)
+#endif
+#ifndef SHADER_VISIBILITY_ENUM
+#define SHADER_VISIBILITY_ENUM(NAME, LIT) ENUM(NAME, LIT)
+#endif
+
+// General Tokens:
+TOK(invalid)
+TOK(int_literal)
+
+// Register Tokens:
+TOK(bReg)
+TOK(tReg)
+TOK(uReg)
+TOK(sReg)
+
+// Punctuators:
+PUNCTUATOR(l_paren, '(')
+PUNCTUATOR(r_paren, ')')
+PUNCTUATOR(comma,   ',')
+PUNCTUATOR(or,      '|')
+PUNCTUATOR(equal,   '=')
+
+// RootElement Keywords:
+KEYWORD(DescriptorTable)
+
+// DescriptorTable Keywords:
+KEYWORD(CBV)
+KEYWORD(SRV)
+KEYWORD(UAV)
+KEYWORD(Sampler)
+
+// General Parameter Keywords:
+KEYWORD(space)
+KEYWORD(visibility)
+KEYWORD(flags)
+
+// View Parameter Keywords:
+KEYWORD(numDescriptors)
+KEYWORD(offset)
+
+// Descriptor Range Offset Enum:
+DESCRIPTOR_RANGE_OFFSET_ENUM(DescriptorRangeOffsetAppend, "DESCRIPTOR_RANGE_OFFSET_APPEND")
+
+// Root Descriptor Flag Enums:
+ROOT_DESCRIPTOR_FLAG_ENUM(DataVolatile, "DATA_VOLATILE")
+ROOT_DESCRIPTOR_FLAG_ENUM(DataStaticWhileSetAtExecute, "DATA_STATIC_WHILE_SET_AT_EXECUTE")
+ROOT_DESCRIPTOR_FLAG_ENUM(DataStatic, "DATA_STATIC")
+
+// Descriptor Range Flag Enums:
+DESCRIPTOR_RANGE_FLAG_ENUM(DescriptorsVolatile, "DESCRIPTORS_VOLATILE", ON)
+DESCRIPTOR_RANGE_FLAG_ENUM(DataVolatile, "DATA_VOLATILE", OFF)
+DESCRIPTOR_RANGE_FLAG_ENUM(DataStaticWhileSetAtExecute, "DATA_STATIC_WHILE_SET_AT_EXECUTE", OFF)
+DESCRIPTOR_RANGE_FLAG_ENUM(DataStatic, "DATA_STATIC", OFF)
+DESCRIPTOR_RANGE_FLAG_ENUM(DescriptorsStaticKeepingBufferBoundsChecks, "DESCRIPTORS_STATIC_KEEPING_BUFFER_BOUNDS_CHECKS", ON)
+
+// Shader Visibiliy Enums:
+SHADER_VISIBILITY_ENUM(All, "SHADER_VISIBILITY_ALL")
+SHADER_VISIBILITY_ENUM(Vertex, "SHADER_VISIBILITY_VERTEX")
+SHADER_VISIBILITY_ENUM(Hull, "SHADER_VISIBILITY_HULL")
+SHADER_VISIBILITY_ENUM(Domain, "SHADER_VISIBILITY_DOMAIN")
+SHADER_VISIBILITY_ENUM(Geometry, "SHADER_VISIBILITY_GEOMETRY")
+SHADER_VISIBILITY_ENUM(Pixel, "SHADER_VISIBILITY_PIXEL")
+SHADER_VISIBILITY_ENUM(Amplification, "SHADER_VISIBILITY_AMPLIFICATION")
+SHADER_VISIBILITY_ENUM(Mesh, "SHADER_VISIBILITY_MESH")
+
+#undef SHADER_VISIBILITY_ENUM
+#undef DESCRIPTOR_RANGE_FLAG_ENUM
+#undef DESCRIPTOR_RANGE_FLAG_ENUM_OFF
+#undef DESCRIPTOR_RANGE_FLAG_ENUM_ON
+#undef ROOT_DESCRIPTOR_FLAG_ENUM
+#undef DESCRIPTOR_RANGE_OFFSET_ENUM
+#undef ENUM
+#undef KEYWORD
+#undef PUNCTUATOR
+#undef TOK
diff --git a/clang/include/clang/Parse/ParseHLSLRootSignature.h b/clang/include/clang/Parse/ParseHLSLRootSignature.h
new file mode 100644
index 00000000000000..6c534411e754a0
--- /dev/null
+++ b/clang/include/clang/Parse/ParseHLSLRootSignature.h
@@ -0,0 +1,96 @@
+//===--- ParseHLSLRootSignature.h -------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+//  This file defines the ParseHLSLRootSignature interface.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_PARSE_PARSEHLSLROOTSIGNATURE_H
+#define LLVM_CLANG_PARSE_PARSEHLSLROOTSIGNATURE_H
+
+#include "clang/Lex/LiteralSupport.h"
+#include "clang/Lex/Preprocessor.h"
+
+#include "llvm/ADT/APInt.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSwitch.h"
+
+namespace llvm {
+namespace hlsl {
+namespace root_signature {
+
+struct RootSignatureToken {
+  enum Kind {
+#define TOK(X) X,
+#include "clang/Parse/HLSLRootSignatureTokenKinds.def"
+  };
+
+  Kind Kind = Kind::invalid;
+
+  // Retain the SouceLocation of the token for diagnostics
+  clang::SourceLocation TokLoc;
+
+  // Retain if the uint32_t bits represent a signed integer
+  bool Signed = false;
+  union {
+    uint32_t IntLiteral = 0;
+    float FloatLiteral;
+  };
+
+  // Constructors
+  RootSignatureToken() {}
+  RootSignatureToken(clang::SourceLocation TokLoc) : TokLoc(TokLoc) {}
+};
+using TokenKind = enum RootSignatureToken::Kind;
+
+class RootSignatureLexer {
+public:
+  RootSignatureLexer(StringRef Signature, clang::SourceLocation SourceLoc,
+                     clang::Preprocessor &PP)
+      : Buffer(Signature), SourceLoc(SourceLoc), PP(PP) {}
+
+  // Consumes the internal buffer as a list of tokens and will emplace them
+  // onto the given tokens.
+  //
+  // It will consume until it successfully reaches the end of the buffer,
+  // or, until the first error is encountered. The return value denotes if
+  // there was a failure.
+  bool Lex(SmallVector<RootSignatureToken> &Tokens);
+
+  // Get the current source location of the lexer
+  clang::SourceLocation GetLocation() { return SourceLoc; };
+
+private:
+  // Internal buffer to iterate over
+  StringRef Buffer;
+
+  // Passed down parameters from Sema
+  clang::SourceLocation SourceLoc;
+  clang::Preprocessor &PP;
+
+  bool LexNumber(RootSignatureToken &Result);
+
+  // Consumes the internal buffer for a single token.
+  //
+  // The return value denotes if there was a failure.
+  bool LexToken(RootSignatureToken &Token);
+
+  // Advance the buffer by the specified number of characters. Updates the
+  // SourceLocation appropriately.
+  void AdvanceBuffer(unsigned NumCharacters = 1) {
+    Buffer = Buffer.drop_front(NumCharacters);
+    SourceLoc = SourceLoc.getLocWithOffset(NumCharacters);
+  }
+};
+
+} // namespace root_signature
+} // namespace hlsl
+} // namespace llvm
+
+#endif // LLVM_CLANG_PARSE_PARSEHLSLROOTSIGNATURE_H
diff --git a/clang/lib/Parse/CMakeLists.txt b/clang/lib/Parse/CMakeLists.txt
index 22e902f7e1bc50..00fde537bb9c60 100644
--- a/clang/lib/Parse/CMakeLists.txt
+++ b/clang/lib/Parse/CMakeLists.txt
@@ -14,6 +14,7 @@ add_clang_library(clangParse
   ParseExpr.cpp
   ParseExprCXX.cpp
   ParseHLSL.cpp
+  ParseHLSLRootSignature.cpp
   ParseInit.cpp
   ParseObjc.cpp
   ParseOpenMP.cpp
diff --git a/clang/lib/Parse/ParseHLSLRootSignature.cpp b/clang/lib/Parse/ParseHLSLRootSignature.cpp
new file mode 100644
index 00000000000000..baa9950163a5c6
--- /dev/null
+++ b/clang/lib/Parse/ParseHLSLRootSignature.cpp
@@ -0,0 +1,152 @@
+#include "clang/Parse/ParseHLSLRootSignature.h"
+
+namespace llvm {
+namespace hlsl {
+namespace root_signature {
+
+// Lexer Definitions
+
+static bool IsPreprocessorNumberChar(char C) {
+  // TODO: extend for float support with or without hexadecimal/exponent
+  return isdigit(C); // integer support
+}
+
+bool RootSignatureLexer::LexNumber(RootSignatureToken &Result) {
+  // NumericLiteralParser does not handle the sign so we will manually apply it
+  Result.Signed = Buffer.front() == '-';
+  if (Result.Signed)
+    AdvanceBuffer();
+
+  // Retrieve the possible number
+  StringRef NumSpelling = Buffer.take_while(IsPreprocessorNumberChar);
+
+  // Parse the numeric value and so semantic checks on its specification
+  clang::NumericLiteralParser Literal(NumSpelling, SourceLoc,
+                                      PP.getSourceManager(), PP.getLangOpts(),
+                                      PP.getTargetInfo(), PP.getDiagnostics());
+  if (Literal.hadError)
+    return true; // Error has already been reported so just return
+
+  // Retrieve the number value to store into the token
+  if (Literal.isIntegerLiteral()) {
+    Result.Kind = TokenKind::int_literal;
+
+    APSInt X = APSInt(32, Result.Signed);
+    if (Literal.GetIntegerValue(X))
+      return true; // TODO: Report overflow error
+
+    X = Result.Signed ? -X : X;
+    Result.IntLiteral = (uint32_t)X.getZExtValue();
+  } else {
+    return true; // TODO: report unsupported number literal specification
+  }
+
+  AdvanceBuffer(NumSpelling.size());
+  return false;
+}
+
+bool RootSignatureLexer::Lex(SmallVector<RootSignatureToken> &Tokens) {
+  // Discard any leading whitespace
+  AdvanceBuffer(Buffer.take_while(isspace).size());
+
+  while (!Buffer.empty()) {
+    RootSignatureToken Result;
+    if (LexToken(Result))
+      return true;
+
+    // Successfully Lexed the token so we can store it
+    Tokens.push_back(Result);
+
+    // Discard any trailing whitespace
+    AdvanceBuffer(Buffer.take_while(isspace).size());
+  }
+
+  return false;
+}
+
+bool RootSignatureLexer::LexToken(RootSignatureToken &Result) {
+  // Record where this token is in the text for usage in parser diagnostics
+  Result.TokLoc = SourceLoc;
+
+  char C = Buffer.front();
+
+  // Punctuators
+  switch (C) {
+#define PUNCTUATOR(X, Y)                                                       \
+  case Y: {                                                                    \
+    Result.Kind = TokenKind::pu_##X;                                           \
+    AdvanceBuffer();                                                           \
+    return false;                                                              \
+  }
+#include "clang/Parse/HLSLRootSignatureTokenKinds.def"
+  default:
+    break;
+  }
+
+  // Numeric constant
+  if (isdigit(C) || C == '-')
+    return LexNumber(Result);
+
+  // All following tokens require at least one additional character
+  if (Buffer.size() <= 1)
+    return true; // TODO: Report invalid token error
+
+  // Peek at the next character to deteremine token type
+  char NextC = Buffer[1];
+
+  // Registers: [tsub][0-9+]
+  if ((C == 't' || C == 's' || C == 'u' || C == 'b') && isdigit(NextC)) {
+    AdvanceBuffer();
+
+    if (LexNumber(Result))
+      return true;
+
+    // Lex number could also parse a float so ensure it was an unsigned int
+    if (Result.Kind != TokenKind::int_literal || Result.Signed)
+      return true; // Return invalid number literal for register error
+
+    // Convert character to the register type.
+    // This is done after LexNumber to override the TokenKind
+    switch (C) {
+    case 'b':
+      Result.Kind = TokenKind::bReg;
+      break;
+    case 't':
+      Result.Kind = TokenKind::tReg;
+      break;
+    case 'u':
+      Result.Kind = TokenKind::uReg;
+      break;
+    case 's':
+      Result.Kind = TokenKind::sReg;
+      break;
+    default:
+      llvm_unreachable("Switch for an expected token was not provided");
+      return true;
+    }
+    return false;
+  }
+
+  // Keywords and Enums:
+  StringRef TokSpelling =
+      Buffer.take_while([](char C) { return isalnum(C) || C == '_'; });
+
+  // Define a large string switch statement for all the keywords and enums
+  auto Switch = llvm::StringSwitch<TokenKind>(TokSpelling);
+#define KEYWORD(NAME) Switch.Case(#NAME, TokenKind::kw_##NAME);
+#define ENUM(NAME, LIT) Switch.CaseLower(LIT, TokenKind::en_##NAME);
+#include "clang/Parse/HLSLRootSignatureTokenKinds.def"
+
+  // Then attempt to retreive a string from it
+  auto Kind = Switch.Default(TokenKind::invalid);
+  if (Kind == TokenKind::invalid)
+    return true; // TODO: Report invalid identifier
+
+  Result.Kind = Kind;
+  AdvanceBuffer(TokSpelling.size());
+  return false;
+}
+
+} // namespace root_signature
+} // namespace hlsl
+} // namespace llvm
diff --git a/clang/unittests/CMakeLists.txt b/clang/unittests/CMakeLists.txt
index 85d265426ec80b..9b3ce8aa7de739 100644
--- a/clang/unittests/CMakeLists.txt
+++ b/clang/unittests/CMakeLists.txt
@@ -25,6 +25,7 @@ endfunction()
 
 add_subdirectory(Basic)
 add_subdirectory(Lex)
+add_subdirectory(Parse)
 add_subdirectory(Driver)
 if(CLANG_ENABLE_STATIC_ANALYZER)
   add_subdirectory(Analysis)
diff --git a/clang/unittests/Parse/CMakeLists.txt b/clang/unittests/Parse/CMakeLists.txt
new file mode 100644
index 00000000000000..1b7eb4934a46c8
--- /dev/null
+++ b/clang/unittests/Parse/CMakeLists.txt
@@ -0,0 +1,26 @@
+set(LLVM_LINK_COMPONENTS
+  Support
+  )
+
+add_clang_unittest(ParseTests
+  ParseHLSLRootSignatureTest.cpp
+  )
+
+clang_target_link_libraries(ParseTests
+  PRIVATE
+  clangAST
+  clangASTMatchers
+  clangBasic
+  clangFrontend
+  clangParse
+  clangSema
+  clangSerialization
+  clangTooling
+  )
+
+target_link_libraries(ParseTests
+  PRIVATE
+  LLVMTestingAnnotations
+  LLVMTestingSupport
+  clangTesting
+  )
diff --git a/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp
new file mode 100644
index 00000000000000..c430b0657dacf5
--- /dev/null
+++ b/clang/unittests/Parse/ParseHLSLRootSignatureTest.cpp
@@ -0,0 +1,130 @@
+//=== ParseHLSLRootSignatureTest.cpp - Parse Root Signature tests ---------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TargetInfo.h"
+#include "clang/Lex/HeaderSearch.h"
+#include "clang/Lex/HeaderSearchOptions.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/ModuleLoader.h"
+#include "clang/Lex/Preprocessor.h"
+#include "clang/Lex/PreprocessorOptions.h"
+
+#include "clang/Parse/ParseHLSLRootSignature.h"
+#include "gtest/gtest.h"
+
+using namespace llvm::hlsl::root_signature;
+using namespace clang;
+
+namespace {
+
+// The test fixture.
+class ParseHLSLRootSignatureTest : public ::testing::Test {
+protected:
+  ParseHLSLRootSignatureTest()
+      : FileMgr(FileMgrOpts), DiagID(new DiagnosticIDs()),
+        Diags(DiagID, new DiagnosticOptions, new IgnoringDiagConsumer()),
+        SourceMgr(Diags, FileMgr), TargetOpts(new TargetOptions) {
+    TargetOpts->Triple = "x86_64-apple-darwin11.1.0";
+    Target = TargetInfo::CreateTargetInfo(Diags, TargetOpts);
+  }
+
+  Preprocessor *CreatePP(StringRef Source, TrivialModuleLoader &ModLoader) {
+    std::unique_ptr<llvm::MemoryBuffer> Buf =
+        llvm::MemoryBuffer::getMemBuffer(Source);
+    SourceMgr.setMainFileID(SourceMgr.createFileID(std::move(Buf)));
+
+    HeaderSearch HeaderInfo(std::make_shared<HeaderSearchOptions>(), SourceMgr,
+                            Diags, LangOpts, Target.get());
+    Preprocessor *PP =
+        new Preprocessor(std::make_shared<PreprocessorOptions>(), Diags,
+                         LangOpts, SourceMgr, HeaderInfo, ModLoader,
+                         /*IILookup =*/nullptr,
+                         /*OwnsHeaderSearch =*/false);
+    PP->Initialize(*Target);
+    PP->EnterMainSourceFile();
+    return PP;
+  }
+
+  void CheckTokens(SmallVector<RootSignatureToken> &Computed,
+                   SmallVector<TokenKind> &Expected) {
+    ASSERT_EQ(Computed.size(), Expected.size());
+    for (unsigned I = 0, E = Expected.size(); I != E; ++I) {
+      ASSERT_EQ(Computed[I].Kind, Expected[I]);
+    }
+  }
+
+  FileSystemOptions FileMgrOpts;
+  FileManager FileMgr;
+  IntrusiveRefCntPtr<DiagnosticIDs> DiagID;
+  DiagnosticsEngine Diags;
+  SourceManager SourceMgr;
+  LangOptions LangOpts;
+  std::shared_ptr<TargetOptions> TargetOpts;
+  IntrusiveRefCntPtr<TargetInfo> Target;
+};
+
+TEST_F(ParseHLSLRootSignatureTest, LexValidTokensTest) {
+  const llvm::StringLiteral Source = R"cc(
+    -42
+
+    b0 t43 u987 s234
+
+    (),|=
+
+    DescriptorTable
+
+    CBV SRV UAV Sampler
+    space visibility flags
+    numDescriptors offset
+
+    DESCRIPTOR_RANGE_OFFSET_APPEND
+
+    DATA_VOLATILE
+    DATA_STATIC_WHILE_SET_AT_EXECUTE
+    DATA_STATIC
+    DESCRIPTORS_VOLATILE
+    DESCRIPTORS_STATIC_KEEPING_BUFFER_BOUNDS_CHECKS
+
+    shader_visibility_all
+    shader_visibility_vertex
+    shader_visibility_hull
+    shader_visibility_domain
+    shader_visibility_geometry
+    shader_visibility_pixel
+    shader_visibility_amplification
+    shader_visibility_mesh
+  )cc";
+
+  TrivialModuleLoader ModLoader;
+  Preprocessor *PP = CreatePP(Source, ModLoader);
+  auto TokLoc = SourceLocation();
+
+  RootSignatureLexer Lexer(Source, TokLoc, *PP);
+
+  SmallVector<RootSignatureToken> Tokens = {
+      RootSignatureToken() // invalid token for completeness
+  };
+  ASSERT_FALSE(Lexer.Lex(Tokens));
+
+  SmallVector<TokenKind> Expected = {
+#define TOK(NAME) TokenKind::NAME,
+#include "clang/Parse/HLSLRootSignatureTokenKinds.def"
+  };
+
+  CheckTokens(Tokens, Expected);
+
+  delete PP;
+}
+
+} // anonymous namespace

Copy link
Contributor

@V-FEXrt V-FEXrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally happy with the code at a high level, though it would be best to have someone with domain knowledge also approve

SmallVector<RootSignatureToken> Tokens = {
RootSignatureToken() // invalid token for completeness
};
ASSERT_FALSE(Lexer.Lex(Tokens));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to see a more robust set of failure tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I decided to leave out the implementation of generating and testing diagnostics from this pr, just to help with the size. If you feel that should they be here to ensure correctness I could add it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm happy to punt it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll reiterate, I think this is the wrong tradeoff, because it makes it significantly more difficult to review the follow-up change since we'll need to compare not just the incoming PR but also go back and look at this PR to ensure all the appropriate error states are captured and tested.

- Defines the RootSignatureLexer class
- Defines the test harness required for testing

- Implements the punctuator tokens and tests functionality
@inbelic inbelic linked an issue Jan 27, 2025 that may be closed by this pull request
- Integrate the use of the `NumericLiteralParser` to lex integer
literals
- Add additional hlsl specific diagnostics messages
@inbelic inbelic force-pushed the inbelic/rootsig-dt branch from 0a1df31 to dc784ff Compare January 28, 2025 20:08
@inbelic
Copy link
Contributor Author

inbelic commented Jan 28, 2025

Updated to include diagnostics output and the relevant testing.

I have rebased to split the pr into smaller incremental changes. But I have taken care to address all previous comments.

@inbelic
Copy link
Contributor Author

inbelic commented Jan 29, 2025

HLSL Build failure is because there is an assertion that DIAG_SIZE_LEX is not big enough (insufficient). I did add some new diag errors under the Lex category so it seems the threshold is passed on some environments.

From here, it seems that we could, and there is trend to, bump this limit when needed. IIUC, I should create a separate pr to bump this.

// Retrieve the number value to store into the token
Result.Kind = TokenKind::int_literal;

llvm::APSInt X = llvm::APSInt(32, !Signed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we have a value like +2147483648? This would fit in an unsigned 32-bit int, but not a signed one. However, we're deciding that the value is signed if the + is present above. Will this error and tell us that there's overflow, treat the value as unsigned because of the range, or give us a garbage negative number?

If it's one of the first two options, does the one we're implementing match DXC's behaviour here?

- the previous implementation would not have thrown an overflow error as
expected. Instead it would have been treated as a negative int
- DXC treats all positively signed ints as an unsigned integer, so for
compatibility, we will be doing the same.
- add better unit tests to demonstrate expected functionality
Copy link
Contributor

@damyanp damyanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I'm surprised we don't have all of the tokens we're going to need for root signatures included in this PR.


// All following tokens require at least one additional character
if (Buffer.size() <= 1) {
PP.getDiagnostics().Report(Result.TokLoc, diag::err_hlsl_invalid_token);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error doesn't seem particularly useful to end users who make a mistake. Why not just return false here because it can't construct a token?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you also remove the same error message from here?

They were added just to have some indication that lexing failed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more useful to the user to note a failure in lexing, or is it more useful at the parsing level to be able to recognize that the token was set to invalid when something else was expected?

Clang tends to do the later, which is why you get errors like "expected identifier", because the parser knows the next thing should be an identifier, but the lexer just knows it didn't produce the right result (or any result).

return false;
}

bool RootSignatureLexer::Lex(SmallVector<RootSignatureToken> &Tokens) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this function should exist.

Most modern compilers don't lex entire source files into allocated lists of tokens then parse, they do the two at the same time never allocating more than a couple tokens at a time. You can see this in Clang where the ConsumeToken API advances the token to the next token, and is called throughout the parser.

I've seen other compilers where the tokenizer is implemented as effectively an iterator where the iterator state tracks the current token.

For context, the reason compilers tend to do this is because lexing isn't always "fast". In particular translation of numeric literals into actual integer values can be quite slow, so if you lex all the tokens, you do all the transformations even if an error would be detected during parsing. If you lex one token at a time, you stop lexing once the error is encountered providing faster user feedback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, I have updated this to only support a ConsumeToken/PeekNextToken api.

It could be updated further to define and be used as an iterator. Although from how it is used in the parser (usually through PeekNextToken). I think it would essentially be just renaming ConsumeToken to the ++ operator. Maybe there are other benefits I don't see?

@inbelic inbelic linked an issue Feb 10, 2025 that may be closed by this pull request
4 tasks
- we want to prevent the pre-allocation of all the tokens and rather let
the parser allocate the tokens as it goes along
- this allows for a quicker cycle to the user if there is a parsing
error as we don't need to lex all of the tokens before trying to parse

- as such, this change provides a new api for interacting with the lexer
through ConsumeToken and PeekNextToken as opposed to generated the
entire list of lexed tokens
Copy link

github-actions bot commented Feb 11, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@llvm-beanz llvm-beanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think this is very much moving in the right direction. I have two high level areas of feedback (see my more specific comments inline).

In general, Lexer's don't often have the context to provide actionable error messages. As a result it is usually better for the Lexer to push up an invalid token to the parser and allow the parser to provide a more contextually relevant diagnostic.

The other area is around what is the responsibility of the parser vs the lexer. In general lexers just break the input into tokens, the parser then interprets and validates the tokens. In particular around numbers this is doing a lot of interpretation of the token values. I think it is probably better to have a clearer separation.


// All following tokens require at least one additional character
if (Buffer.size() <= 1) {
PP.getDiagnostics().Report(Result.TokLoc, diag::err_hlsl_invalid_token);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more useful to the user to note a failure in lexing, or is it more useful at the parsing level to be able to recognize that the token was set to invalid when something else was expected?

Clang tends to do the later, which is why you get errors like "expected identifier", because the parser knows the next thing should be an identifier, but the lexer just knows it didn't produce the right result (or any result).

- instead of reporting the unique end_of_stream error, we can instead
pass up an end_of_stream token and let the unexpected diag reporting
handles this in the parser with more context
- similarily, we can let the unexpected_token diagnostics reporting
handle this error by passing up an invalid token and provide more
context
- the current use of invalid is not able to distinguish between a lex
error that has been reported to diagnostics and an invalid token that
should be reported during parsing
- an error token is defined as the default token to denote a lex error
that has been reported and shouldn't be handled by the parsing

bool RootSignatureLexer::LexNumber(RootSignatureToken &Result) {
// Retrieve the possible number
StringRef NumSpelling = Buffer.take_while(IsNumberChar);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly right now if someone writes 1.3 the parser will break the token at the ., call it an integer literal and return 1 for the value. I assume that's not the expected behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. It is the expected behaviour for this pr, as we are not expecting to handle floats until #126565. This was split out because I want to have a pr that will have dedicated documentation and testing to show that the float parsing is equivalent to DXC. Which I think would get bogged down and increase the scope when including it in this pr.

…rror states

- as commented, using the `NumericLiteralParser` during lexing of
numeric constants falls more under parsing rather than lexing. this
commit removes the use of this completely from the lexer

- this implies that (after moving the register error to parsing) there
are no longer any errors to report during lexing. instead we will just
create an `invalid` or `end_of_stream` token and let the parser provide
an error with more context

- thus we change the api of `ConsumeToken`/`PeekNextToken` so that it no
longer needs to denote an error and will just return a token. this
greatly simplifies the lexer implementation as we no longer need to pass
down the pre-processor
Copy link
Collaborator

@llvm-beanz llvm-beanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My last real question here is whether this belongs in clang/Parse or if it should be in clang/Lex. Since it is now strictly Lexing it shouldn't have any dependencies on the parse or AST libraries right?

#ifndef LLVM_CLANG_PARSE_PARSEHLSLROOTSIGNATURE_H
#define LLVM_CLANG_PARSE_PARSEHLSLROOTSIGNATURE_H

#include "clang/AST/APValue.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

In general it is also best to prefer forward declarations in headers rather than including headers where possible to reduce compile time overhead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this and the other no longer needed headers

Copy link
Collaborator

@llvm-beanz llvm-beanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small formatting cleanup in a suggestion. LGTM.

@inbelic inbelic merged commit b41b86a into llvm:main Feb 14, 2025
9 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 14, 2025

LLVM Buildbot has detected a new failure on builder lldb-x86_64-debian running on lldb-x86_64-debian while building clang at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/162/builds/16230

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-unit :: Utility/./UtilityTests/77/127 (2731 of 2777)
PASS: lldb-shell :: Subprocess/vfork-follow-parent-wp.test (2732 of 2777)
PASS: lldb-unit :: Utility/./UtilityTests/78/127 (2733 of 2777)
PASS: lldb-unit :: Core/./LLDBCoreTests/48/66 (2734 of 2777)
PASS: lldb-shell :: Unwind/basic-block-sections.test (2735 of 2777)
PASS: lldb-unit :: Core/./LLDBCoreTests/50/66 (2736 of 2777)
PASS: lldb-api :: tools/lldb-server/attach-wait/TestGdbRemoteAttachWait.py (2737 of 2777)
PASS: lldb-api :: tools/lldb-server/TestGdbRemoteLaunch.py (2738 of 2777)
PASS: lldb-unit :: Host/./HostTests/2/98 (2739 of 2777)
PASS: lldb-unit :: Host/./HostTests/30/98 (2740 of 2777)
FAIL: lldb-api :: tools/lldb-dap/launch/TestDAP_launch.py (2741 of 2777)
******************** TEST 'lldb-api :: tools/lldb-dap/launch/TestDAP_launch.py' FAILED ********************
Script:
--
/usr/bin/python3 /home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/./lib --env LLVM_INCLUDE_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/include --env LLVM_TOOLS_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/./bin --arch x86_64 --build-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex --lldb-module-cache-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/lldb --compiler /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/clang --dsymutil /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/worker/2.0.1/lldb-x86_64-debian/build/./bin --lldb-obj-root /home/worker/2.0.1/lldb-x86_64-debian/build/tools/lldb --lldb-libs-dir /home/worker/2.0.1/lldb-x86_64-debian/build/./lib -t /home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/tools/lldb-dap/launch -p TestDAP_launch.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision b41b86a907f653f79bab10d4c80b3a41d146c71b)
  clang revision b41b86a907f653f79bab10d4c80b3a41d146c71b
  llvm revision b41b86a907f653f79bab10d4c80b3a41d146c71b
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']
========= DEBUG ADAPTER PROTOCOL LOGS =========
1739554236.560714722 --> 
Content-Length: 344

{
  "arguments": {
    "adapterID": "lldb-native",
    "clientID": "vscode",
    "columnsStartAt1": true,
    "linesStartAt1": true,
    "locale": "en-us",
    "pathFormat": "path",
    "sourceInitFile": false,
    "supportsRunInTerminalRequest": true,
    "supportsStartDebuggingRequest": true,
    "supportsVariablePaging": true,
    "supportsVariableType": true
  },
  "command": "initialize",
  "seq": 1,
  "type": "request"
}
1739554236.563421011 <-- 
Content-Length: 1631


joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025
For the sake of scope, we will let the lexing of floating literals be
deferred until needed for Static Samplers. Other than that this pr
should allow us to simply define new enumerations/keywords in
`RootSignatureTokenKinds.def` for when they are used in the parser. We
could have defined all of these keywords here, but for the sake of
correctness in review we will let them be split up.

- Define `RootSignatureLexer` and provide a public `LexToken` method for
external use
- Define the file `RootSignatureTokenKinds` to define required tokens
and allow for future custom keywords/enums
- Implement the internal methods required to parse the different types
of tokens (integers, flag enums, puncuators...)
- Add test harness for unit testing and the respective unit tests for
lexing the tokens

Resolves llvm#126563

---------

Co-authored-by: Chris B <[email protected]>
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
For the sake of scope, we will let the lexing of floating literals be
deferred until needed for Static Samplers. Other than that this pr
should allow us to simply define new enumerations/keywords in
`RootSignatureTokenKinds.def` for when they are used in the parser. We
could have defined all of these keywords here, but for the sake of
correctness in review we will let them be split up.

- Define `RootSignatureLexer` and provide a public `LexToken` method for
external use
- Define the file `RootSignatureTokenKinds` to define required tokens
and allow for future custom keywords/enums
- Implement the internal methods required to parse the different types
of tokens (integers, flag enums, puncuators...)
- Add test harness for unit testing and the respective unit tests for
lexing the tokens

Resolves llvm#126563

---------

Co-authored-by: Chris B <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category HLSL HLSL Language Support
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

[HLSL] Implement Root Signature Lexer for Descriptor Tables
7 participants