feat: add generative recommendation tokenizer. #317

magicheng0816 · 2025-11-04T13:39:39Z

No description provided.

yq33victor · 2025-11-05T08:06:16Z

xllm/core/common/version_singleton.h

@@ -0,0 +1,96 @@
+#pragma once


nit: add header like this:

/* Copyright 2025 The xLLM Authors. All Rights Reserved.
Copyright 2024 The ScaleLLM Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://github.com/jd-opensource/xllm/blob/main/LICENSE

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/

yq33victor · 2025-11-05T08:30:58Z

xllm/core/common/version_singleton.h

+      std::lock_guard<std::mutex> lock(instance_map_mutex_);
+
+      auto it = instance_map_.find(version);
+      if (it == instance_map_.end()) {


Why do we need to look up instance_map_ twice? Line 25 above already does this.
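For illustration, a minimal sketch of a version-keyed singleton that does one lookup per call. It assumes every access goes through the lock, so there is no lock-free fast path. `VersionSingletonSketch` and its helper functions are hypothetical stand-ins, not the code in version_singleton.h.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the class under review; names and types are assumed.
template <typename T>
class VersionSingletonSketch {
 public:
  // Returns the instance for `version`, creating it on first use.
  static T* GetInstance(const std::string& version) {
    std::lock_guard<std::mutex> lock(Mutex());
    // operator[] finds the slot or inserts an empty one, so the map is
    // searched exactly once per call.
    std::unique_ptr<T>& slot = Instances()[version];
    if (!slot) {
      slot = std::make_unique<T>();
    }
    return slot.get();
  }

 private:
  // Function-local statics avoid out-of-class definitions for template statics.
  static std::mutex& Mutex() {
    static std::mutex m;
    return m;
  }
  static std::unordered_map<std::string, std::unique_ptr<T>>& Instances() {
    static std::unordered_map<std::string, std::unique_ptr<T>> map;
    return map;
  }
};
```

If the first lookup in the PR is meant as an unlocked fast path, the second lookup under the lock would be needed, and that should probably be stated in a comment.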

yq33victor · 2025-11-05T08:36:06Z

xllm/core/common/version_singleton.h

+
+  static void DestroyInstance(const std::string& version) {
+    std::lock_guard<std::mutex> lock(instance_map_mutex_);
+    instance_map_.erase(version);


Should we also erase version from instance_version_list_?
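A rough sketch of what keeping the two containers in sync could look like. The member names follow the review comment, but the container types (`std::unordered_map` and `std::vector<std::string>`), the `int` payload, and the class `VersionRegistrySketch` are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical registry; container types are assumed, not taken from the PR.
class VersionRegistrySketch {
 public:
  void Add(const std::string& version, int payload) {
    std::lock_guard<std::mutex> lock(instance_map_mutex_);
    // Record the version in the list only when the map insert succeeded.
    if (instance_map_.emplace(version, payload).second) {
      instance_version_list_.push_back(version);
    }
  }

  void Destroy(const std::string& version) {
    std::lock_guard<std::mutex> lock(instance_map_mutex_);
    // Copy the key first in case `version` aliases an element being erased.
    const std::string key = version;
    instance_map_.erase(key);
    // Erase-remove idiom: drop every matching entry from the list as well.
    instance_version_list_.erase(
        std::remove(instance_version_list_.begin(),
                    instance_version_list_.end(), key),
        instance_version_list_.end());
  }

  std::size_t MapSize() const {
    std::lock_guard<std::mutex> lock(instance_map_mutex_);
    return instance_map_.size();
  }

  std::size_t ListSize() const {
    std::lock_guard<std::mutex> lock(instance_map_mutex_);
    return instance_version_list_.size();
  }

 private:
  mutable std::mutex instance_map_mutex_;
  std::unordered_map<std::string, int> instance_map_;
  std::vector<std::string> instance_version_list_;
};
```

Erasing from both containers under the same lock keeps the list from reporting versions whose instances are already gone.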

yq33victor · 2025-11-05T09:16:21Z

xllm/core/framework/hf_model_loader.cpp

+        path.append(tokenizer_args_.vocab_file()).string();
+
+    LOG(INFO) << "model_version:" << model_version;
+    LOG(INFO) << "vocab_full_path:" << vocab_full_path;


nit: :)

LOG(INFO) << "model_version:" << model_version << ", vocab_full_path:" << vocab_full_path;

yq33victor · 2025-11-05T10:24:56Z

xllm/core/framework/hf_model_loader.cpp

+              ->initialize(vocab_full_path))
+        << "Failed to initialize vocab dict from " << vocab_full_path;
+  } else {
+    LOG(INFO) << "vocab file is not set";


nit: maybe use LOG(ERROR) or LOG(FATAL) here, or return false?

yq33victor · 2025-11-05T10:33:06Z

xllm/core/framework/state_dict/rec_vocab_dict.cpp

@@ -0,0 +1,138 @@
+#include "rec_vocab_dict.h"


nit: add xllm header

yq33victor · 2025-11-05T12:04:37Z

xllm/core/framework/state_dict/rec_vocab_dict.cpp

+}
+
+bool RecVocabDict::get_items_by_tokens(const RecTokenTriple& rec_token_triple,
+                                       std::vector<int64_t>* item_ids) const {


nit: a reference (&) may be better here.

std::vector<int64_t>* item_ids -> const std::vector<int64_t>& item_ids
CHECK(!item_ids.empty());
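To make the trade-off concrete, here is a small sketch of the reference form. It assumes the function writes its results into item_ids; in that case the parameter would have to be a non-const reference, since a `const std::vector<int64_t>&` could not be appended to. `RecVocabDictSketch`, `RecTokenTripleSketch`, and the `std::map` storage are hypothetical and only mirror the names in the diff.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Assumed shape of the token triple; the real RecTokenTriple may differ.
using RecTokenTripleSketch =
    std::tuple<std::int32_t, std::int32_t, std::int32_t>;

// Hypothetical stand-in for RecVocabDict.
class RecVocabDictSketch {
 public:
  void Add(const RecTokenTripleSketch& triple, std::int64_t item_id) {
    items_[triple].push_back(item_id);
  }

  // Reference out-parameter: the caller cannot pass null, so no pointer
  // check is needed. Returns false when the triple is unknown.
  bool GetItemsByTokens(const RecTokenTripleSketch& triple,
                        std::vector<std::int64_t>& item_ids) const {
    auto it = items_.find(triple);
    if (it == items_.end()) {
      return false;
    }
    // Append the matching items to the caller's vector.
    item_ids.insert(item_ids.end(), it->second.begin(), it->second.end());
    return true;
  }

 private:
  std::map<RecTokenTripleSketch, std::vector<std::int64_t>> items_;
};
```

Some style guides prefer a pointer for output parameters so that call sites show `&out`; which convention applies here depends on the project's own guidelines.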

yq33victor · 2025-11-05T12:09:08Z

xllm/core/framework/state_dict/rec_vocab_dict.h

@@ -0,0 +1,94 @@
+#pragma once


nit: add xllm header

yq33victor · 2025-11-05T12:12:07Z

xllm/core/framework/state_dict/rec_vocab_dict.h

+
+  /**
+   * @brief initialize instance, parse vocab file
+   * @param vocab_file vocab file, need full path


nit: maybe we can align the comment format.

// @brief initialize instance, parse vocab file
// @param vocab_file vocab file, need full path
// ...

yq33victor · 2025-11-05T12:12:37Z

xllm/core/framework/tokenizer/rec_tokenizer.cpp

@@ -0,0 +1,52 @@
+#include "rec_tokenizer.h"


nit: add xllm header

liutongxuan · 2025-11-06T01:56:09Z

xllm/core/framework/hf_model_loader.cpp

  std::sort(model_weights_files_.begin(), model_weights_files_.end());
+
+  //@todo: 'false' will be replaced with generative recommendation judgment
+  if (false) {


If it doesn't work yet, we shouldn't add this logic for now.

feat: add generative recommendation tokenizer.

5c97614

magicheng0816 requested review from Clement-Wang26, RobbieLeung, walsonyang and yq33victor November 4, 2025 13:39

yq33victor reviewed Nov 5, 2025


liutongxuan reviewed Nov 6, 2025
