src: implement whatwg's URLPattern spec #56452

Open
wants to merge 1 commit into base: main

@anonrig
Member

anonrig commented Jan 3, 2025

Co-authored-by: Daniel Lemire (@lemire)

Blocked

This is blocked from landing due to the old macOS machines we use in our infrastructure (cc @nodejs/build)

19:50:46 ../deps/ada/ada.h:8457:10: fatal error: 'ranges' file not found
19:50:46 #include <ranges>
19:50:46          ^~~~~~~~
19:50:46   /usr/local/bin/ccache cc -o /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-

Notes:

  • Ada now requires C++20
  • URLPattern is now a global class.
  • URLPattern is also exposed in node:url module

TODOs

  • Removed exception flag requirement
  • Pass all web-platform tests
  • Release Ada v3 before landing this PR
  • Make sure to split all changes into multiple commits
  • Add @lemire as co-author to all commits
  • Land the upstream pull request: implement URLPattern (ada-url/ada#785)
  • Add documentation for global and node:url module declarations.

cc @nodejs/cpp-reviewers

Fixes #40844

@anonrig anonrig requested review from jasnell and RafaelGSS January 3, 2025 16:07
@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/security-wg
  • @nodejs/startup
  • @nodejs/url
  • @nodejs/web-standards

@nodejs-github-bot nodejs-github-bot added the lib / src and needs-ci labels Jan 3, 2025
@targos targos added the semver-major label Jan 3, 2025
@anonrig anonrig added the macos, blocked, and build-agenda labels Jan 3, 2025
@targos
Member

targos commented Jan 3, 2025

> Ada now enables exceptions just like UV and V8

Can you elaborate? libuv is a C library so I don't think exceptions exist there, and I'm pretty sure V8 is built with exceptions disabled.

@anonrig
Member Author

anonrig commented Jan 3, 2025

> > Ada now enables exceptions just like UV and V8
>
> Can you elaborate? libuv is a C library so I don't think exceptions exist there, and I'm pretty sure V8 is built with exceptions disabled.

My bad, UV does not enable exceptions. I was referring to this target in the v8.gyp file:

{
  'target_name': 'torque_base',
  'type': 'static_library',
  'toolsets': ['host', 'target'],
  'sources': [
    '<!@pymod_do_main(GN-scraper "<(V8_ROOT)/BUILD.gn"  "\\"torque_base.*?sources = ")',
  ],
  'dependencies': [
    'v8_shared_internal_headers',
    'v8_libbase',
  ],
  'defines!': [
    '_HAS_EXCEPTIONS=0',
    'BUILDING_V8_SHARED=1',
  ],
  'cflags_cc!': ['-fno-exceptions'],
  'cflags_cc': ['-fexceptions'],
  'xcode_settings': {
    'GCC_ENABLE_CPP_EXCEPTIONS': 'YES',  # -fexceptions
  },
  'msvs_settings': {
    'VCCLCompilerTool': {
      'RuntimeTypeInfo': 'true',
      'ExceptionHandling': 1,
    },
  },
}

@targos
Member

targos commented Jan 3, 2025

This is not really V8. It's a build-time executable (torque) used to generate code for V8.

@anonrig anonrig requested a review from Qard January 3, 2025 16:27

MaybeLocal<Value> URLPattern::Hash() const {
  auto context = env()->context();
  return ToV8Value(context, url_pattern_.get_hash());
Member

I think the key challenge here is that this will copy the string on every call. Any chance of memoizing the string once it has been created?
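A minimal sketch of the memoization idea, kept independent of V8 so it compiles on its own. `FakePattern` and `PatternWrapper` are hypothetical stand-ins (not the types in this PR), and the cache is a plain `std::string`; in the real binding the cached value would presumably be a persistent V8 handle, which is not shown here.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the underlying pattern object. The getter
// returns by value, so every call produces a fresh copy of the string.
struct FakePattern {
  std::string hash = "#section";
  mutable int copies = 0;  // counts how often get_hash() was called
  std::string get_hash() const {
    ++copies;
    return hash;
  }
};

// Hypothetical wrapper that caches the getter result on first access.
class PatternWrapper {
 public:
  explicit PatternWrapper(FakePattern pattern)
      : pattern_(std::move(pattern)) {}

  // The first call copies the string into the cache; later calls reuse it.
  const std::string& Hash() const {
    if (!hash_cache_) hash_cache_ = pattern_.get_hash();
    return *hash_cache_;
  }

  int copies() const { return pattern_.copies; }

 private:
  FakePattern pattern_;
  mutable std::optional<std::string> hash_cache_;
};
```

This only works because the pattern components do not change after construction; a mutable pattern would need cache invalidation.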

URLPattern::URLPattern(Environment* env,
                       Local<Object> object,
                       ada::url_pattern&& url_pattern)
    : BaseObject(env, object) {
Member

We likely should introduce this as experimental in the first release, even if it graduates from experimental quickly. There should likely be a warning emitted on the first construction.
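A rough sketch of "warn once on first construction, unless suppressed", written as standalone C++. `ExperimentalNotice`, its `suppressed` flag, and the `sink` callback are all hypothetical names for illustration; none of them are Node.js APIs, and how a real warning would be routed through the process warning machinery is not shown.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper: emits a message at most once per instance, and not
// at all when the embedder has asked for the warning to be suppressed.
class ExperimentalNotice {
 public:
  ExperimentalNotice(std::string message, bool suppressed)
      : message_(std::move(message)), suppressed_(suppressed) {}

  // Returns true only for the single call that actually emitted the message.
  bool MaybeEmit(const std::function<void(const std::string&)>& sink) {
    if (suppressed_) return false;
    // exchange() returns the previous value, so only the first caller
    // observes false and goes on to emit.
    if (emitted_.exchange(true)) return false;
    sink(message_);
    return true;
  }

 private:
  std::string message_;
  bool suppressed_;
  std::atomic<bool> emitted_{false};
};
```

A constructor would call `MaybeEmit` on a shared notice object, so only the first instance triggers the warning.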

Member Author

AFAIK, there is no easy way to emit an experimental warning from C++ that can be suppressed with a CLI flag. For now, I have marked it as experimental in the Node.js docs.

@jasnell
Member

jasnell commented Jan 3, 2025

Can you also include a fairly simple benchmark?

codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 78.29181% with 122 lines in your changes missing coverage. Please review.

Project coverage is 89.18%. Comparing base (f2001e3) to head (eb005da).
Report is 10 commits behind head on main.

Files with missing lines   Patch %   Lines
src/node_url_pattern.cc    78.41%    58 Missing and 62 partials ⚠️
src/node_url_pattern.h     0.00%     2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #56452      +/-   ##
==========================================
- Coverage   89.21%   89.18%   -0.03%     
==========================================
  Files         663      665       +2     
  Lines      192001   192564     +563     
  Branches    36921    37045     +124     
==========================================
+ Hits       171286   171733     +447     
- Misses      13582    13642      +60     
- Partials     7133     7189      +56     
Files with missing lines        Coverage Δ
lib/internal/url.js             97.55% <100.00%> (-0.13%) ⬇️
lib/url.js                      100.00% <100.00%> (ø)
src/node_binding.cc             83.66% <ø> (ø)
src/node_errors.h               85.00% <ø> (ø)
src/node_external_reference.h   100.00% <ø> (ø)
src/node_url_pattern.h          0.00% <0.00%> (ø)
src/node_url_pattern.cc         78.41% <78.41%> (ø)

... and 32 files with indirect coverage changes

Member

@mcollina mcollina left a comment

I don’t think this is a good pattern to land in Node.js. Specifically, a server using this will create one URLPattern per route and test them one by one in a loop. This will be slow, especially when the matching route is the last one in the list.

(This feedback was provided when URLPattern was standardized and essentially ignored).

For this to be useful, we would need a Node.js-specific API that organizes these URLPattern instances in a radix (prefix) trie and matches against all of them at once.
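To make the trie idea concrete, here is a minimal sketch under loud assumptions: it is a plain per-segment prefix trie (not a compressed radix trie), it only understands static segments and `:name` parameters, and `RouteTrie` and `SplitPath` are hypothetical names. It does not implement URLPattern syntax; regular-expression groups and wildcards would need more work to index this way.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Splits "/users/42" into {"users", "42"}; empty segments are skipped.
std::vector<std::string_view> SplitPath(std::string_view path) {
  std::vector<std::string_view> segments;
  std::size_t start = 0;
  while (start < path.size()) {
    std::size_t end = path.find('/', start);
    if (end == std::string_view::npos) end = path.size();
    if (end > start) segments.push_back(path.substr(start, end - start));
    start = end + 1;
  }
  return segments;
}

class RouteTrie {
 public:
  // Registers a route such as "/users/:id"; ":name" matches any one segment.
  void Add(std::string_view pattern, int route_id) {
    Node* node = &root_;
    for (std::string_view segment : SplitPath(pattern)) {
      if (segment.front() == ':') {
        if (!node->param) node->param = std::make_unique<Node>();
        node = node->param.get();
      } else {
        auto& child = node->children[std::string(segment)];
        if (!child) child = std::make_unique<Node>();
        node = child.get();
      }
    }
    node->route_id = route_id;
  }

  // Walks the trie instead of testing every registered route in turn.
  std::optional<int> Match(std::string_view path) const {
    return MatchFrom(root_, SplitPath(path), 0);
  }

 private:
  struct Node {
    std::map<std::string, std::unique_ptr<Node>, std::less<>> children;
    std::unique_ptr<Node> param;    // child reached through a ":name" segment
    std::optional<int> route_id;    // set when a route ends at this node
  };

  // Static children are tried first; the parameter child is the fallback.
  static std::optional<int> MatchFrom(
      const Node& node,
      const std::vector<std::string_view>& segments,
      std::size_t index) {
    if (index == segments.size()) return node.route_id;
    auto it = node.children.find(segments[index]);
    if (it != node.children.end()) {
      if (auto hit = MatchFrom(*it->second, segments, index + 1)) return hit;
    }
    if (node.param) return MatchFrom(*node.param, segments, index + 1);
    return std::nullopt;
  }

  Node root_;
};
```

With routes `/users/:id`, `/users/me`, and `/posts/:id/comments` registered, a lookup follows one branch per path segment, with at most one fallback to the parameter child at each level, rather than looping over the whole route list.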

I can possibly be persuaded that we need this for Web platform compatibility, but it’s not that popular either (unlike fetch()).

@mcollina
Member

mcollina commented Jan 3, 2025

@jasnell I’ll try to build this and get a benchmark going against the ecosystem routers.

@anonrig
Member Author

anonrig commented Jan 3, 2025

> @jasnell I’ll try to build this and get a benchmark going against the ecosystem routers.

Right now, this pull request does not pass WPT and is not optimized at all, so benchmarks would not be meaningful yet.

@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch 2 times, most recently from ad32b4d to cef45f0 Compare January 27, 2025 20:36
@anonrig
Member Author

anonrig commented Jan 27, 2025

@targos I've removed the exception requirement from this pull request. We no longer use std::regex; I've put it behind a flag that is not enabled for Node.js builds: ada-url/ada#853

@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch from cef45f0 to 72cb00f Compare January 27, 2025 20:46
@StefanStojanovic
Contributor

@nodejs/platform-windows @lemire can you help me identify and fix the build error on windows?

From what I see, it's all Ada-related. Since we have collaborators who are actively working on that project, I'll leave it to them, but feel free to reach out if any assistance is needed.

    input = ada::url_pattern_init{};
  } else if (args[0]->IsString()) {
    Utf8Value input_value(env->isolate(), args[0].As<String>());
    input_base = input_value.ToString();
Member

small suggestion: We could just move the Utf8Value into the outer scope as a std::optional<Utf8Value> and skip the intermediate std::string then?

Similar recommendation elsewhere :)

Member Author

Nice catch. Thank you for the recommendation and review!

Member Author

Somehow the following change causes a "free(): invalid pointer" abort:

void URLPattern::Test(const FunctionCallbackInfo<Value>& args) {
  URLPattern* url_pattern;
  ASSIGN_OR_RETURN_UNWRAP(&url_pattern, args.This());
  auto env = Environment::GetCurrent(args);

  ada::url_pattern_input input;
  std::optional<Utf8Value> input_base{};
  std::optional<Utf8Value> base_url{};
  if (args.Length() == 0) {
    input = ada::url_pattern_init{};
  } else if (args[0]->IsString()) {
    input_base = Utf8Value(env->isolate(), args[0].As<String>());
    input = input_base->ToStringView();
  } else if (args[0]->IsObject()) {
    input = URLPatternInit::FromJsObject(env, args[0].As<Object>());
  } else {
    THROW_ERR_INVALID_ARG_TYPE(
        env, "URLPattern input needs to be a string or an object");
    return;
  }

  if (args.Length() > 1) {
    if (!args[1]->IsString()) {
      THROW_ERR_INVALID_ARG_TYPE(env, "baseURL must be a string");
      return;
    }
    base_url = Utf8Value(env->isolate(), args[1].As<String>());
  }

  std::optional<std::string_view> base_url_opt =
      base_url ? std::optional(base_url->ToStringView()) : std::nullopt;
  args.GetReturnValue().Set(url_pattern->Test(env, input, base_url_opt));
}

Member

Yeah, we should probably add = delete for the copy/move operators/constructors to MaybeStackValue to prevent people from running into this.

Using input_base.emplace() instead of input_base = Utf8Value(...) should work just fine.
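The general hazard behind this suggestion can be sketched in standalone C++. `SmallBuffer` below is a hypothetical stand-in for a small-buffer type whose data pointer may point at its own inline storage; it is not the actual MaybeStackBuffer code. If such a type were copied member by member, the copy's pointer would refer to the source's inline storage, so deleting copy/move and constructing in place with `std::optional::emplace()` avoids that particular problem. Whether this is the root cause of the abort reported in this thread is not established here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <optional>
#include <string_view>
#include <type_traits>

// Hypothetical small-buffer type: short inputs live in inline storage,
// longer ones on the heap. data_ may point into the object itself.
class SmallBuffer {
 public:
  explicit SmallBuffer(std::string_view text) : length_(text.size()) {
    if (length_ >= sizeof(inline_)) {
      data_ = static_cast<char*>(std::malloc(length_ + 1));
      if (data_ == nullptr) std::abort();
    }
    if (length_ > 0) std::memcpy(data_, text.data(), length_);
    data_[length_] = '\0';
  }

  ~SmallBuffer() {
    // Only heap storage is freed; calling free() on inline_ would abort.
    if (data_ != inline_) std::free(data_);
  }

  // Copies and moves are deleted so the self-pointer can never dangle.
  SmallBuffer(const SmallBuffer&) = delete;
  SmallBuffer& operator=(const SmallBuffer&) = delete;
  SmallBuffer(SmallBuffer&&) = delete;
  SmallBuffer& operator=(SmallBuffer&&) = delete;

  std::string_view view() const { return {data_, length_}; }
  bool uses_inline_storage() const { return data_ == inline_; }

 private:
  char inline_[16];
  char* data_ = inline_;
  std::size_t length_;
};

static_assert(!std::is_copy_constructible_v<SmallBuffer>);
static_assert(!std::is_move_constructible_v<SmallBuffer>);

// Constructs the buffer directly inside the optional; no temporary is made.
std::string_view Hold(std::optional<SmallBuffer>& slot,
                      std::string_view text) {
  return slot.emplace(text).view();
}
```

With the copy and move operations deleted, `slot = SmallBuffer(text)` no longer compiles, which turns the runtime abort into a compile-time error.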

Member Author

Using emplace still aborts with "free(): invalid pointer":

frame #9: 0x00005555563c92a4 node`node::url_pattern::URLPattern::Test(v8::FunctionCallbackInfo<v8::Value> const&) + 772
node`node::url_pattern::URLPattern::Test:
->  0x5555563c92a4 <+772>: movzbl -0x888(%rbp), %eax
    0x5555563c92ab <+779>: subl   $0x1, %eax
    0x5555563c92ae <+782>: cmpb   $-0x3, %al
    0x5555563c92b0 <+784>: ja     0x5555563c92ba ; <+794>

  bool Test(
      Environment* env,
      const ada::url_pattern_input& input,
      std::optional<std::string_view>& baseURL);  // NOLINT (runtime/references)
Member

Any reason to prefer NOLINT over the recommendation here?

(Although I feel like in this case this maybe should be a const reference anyway?)

Member Author

If I make it a const parameter, then it doesn't compile due to this line:

if (auto result = url_pattern_.exec(input, baseURL ? &*baseURL : nullptr)) {

The exec() method in Ada expects std::string_view* base_url. ada::url has a similar API surface, so making the parameter const would make the url_pattern signature diverge from the url one in Ada.

I'm not sure what the best approach is, other than adding NOLINT and keeping this non-const.

Member

> If I make it a const parameter, then it doesn't compile due to this line:

So ... what if that line becomes

std::string_view tmp;
if (auto result = url_pattern_.exec(input, baseURL ? &(tmp = *baseURL) : nullptr)) {

? Might look a bit odd but it's perfectly fine and it is imho a bit clearer anyway about why we're passing a std::string_view* here (notably, the fact that url_pattern_.exec() does not perform relevant modifications on the underlying characters, which the presence of a non-const pointer could indicate)
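The temporary-copy trick can be tried in isolation. In this sketch `ExecLike` is a hypothetical stand-in for an API that takes a non-const `std::string_view*` (it is not Ada's `exec()`), and `TestWithBase` is a hypothetical caller that keeps its own parameter const.

```cpp
#include <optional>
#include <string_view>

// Hypothetical callee with a pointer parameter, similar in shape to the
// signature discussed above. It only reads through the pointer.
bool ExecLike(std::string_view input, std::string_view* base_url) {
  if (base_url == nullptr) return false;
  return input.substr(0, base_url->size()) == *base_url;
}

// The caller accepts a const reference and hands the callee a pointer to a
// local copy of the view. Copying a string_view copies only a pointer and a
// length, never the characters.
bool TestWithBase(std::string_view input,
                  const std::optional<std::string_view>& base_url) {
  std::string_view tmp;
  // Assignment yields an lvalue reference to tmp, so its address is valid
  // for the duration of the call.
  return ExecLike(input, base_url ? &(tmp = *base_url) : nullptr);
}
```

The local `tmp` outlives the call, so the pointer stays valid, and the caller's `baseURL` argument can be a const reference without a NOLINT comment.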

Member

blech... but yeah, that's better even if it looks ugly ;-)

@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch from 72cb00f to 2609e54 Compare January 28, 2025 16:54
@addaleax addaleax added the c++ and whatwg-url labels Jan 28, 2025
@jasnell jasnell added the semver-major label Jan 29, 2025
@jasnell
Member

jasnell commented Jan 29, 2025

PR has to be semver-major since it adds a new global

@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch from 2609e54 to 333b7b3 Compare January 29, 2025 01:06
@anonrig anonrig removed the semver-major label Jan 29, 2025
@anonrig
Member Author

anonrig commented Jan 29, 2025

> PR has to be semver-major since it adds a new global

Removing semver-major. I've removed the global assignment to make this backportable to older release lines. I'll open another PR with semver-major label once this lands.

@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch from 333b7b3 to e81311a Compare January 29, 2025 01:14
@richardlau richardlau added the semver-minor label Jan 29, 2025
@lemire
Member

lemire commented Jan 29, 2025

@anonrig At a glance, the Windows issues have to do with predefined macros clashing with variable names...

See ada-url/ada#854 which should help.

@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch 2 times, most recently from 72fe0bd to b4de80e Compare January 29, 2025 14:38
@anonrig
Member Author

anonrig commented Jan 29, 2025

> @anonrig At a glance, the Windows issues have to do with predefined macros clashing with variable names...
>
> See ada-url/ada#854 which should help.

Thanks lemire. I rebased and updated Ada. Re-running tests now...

@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch from b4de80e to eb005da Compare January 29, 2025 16:28
@nodejs-github-bot
Collaborator

@anonrig anonrig added the notable-change label Jan 30, 2025
Contributor

The notable-change label has been added by @anonrig.

Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section.

Labels
blocked, build-agenda, c++, lib / src, macos, needs-ci, notable-change, semver-minor, whatwg-url
Projects
None yet
Development

Successfully merging this pull request may close these issues.

implement URLPattern