Test PR, ignore #671

Closed
guoyi8 wants to merge 8 commits into xerrors:main from guoyi8:feature/cusc-202605

Conversation

guoyi8 commented May 1, 2026

Summary

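  • Fix the language model selection on the knowledge base create/edit pages not being displayed after selection and being submitted as empty (the colon separator in the V2 spec was not parsed correctly)
  • Fix general parser chunks exceeding the token limit, which caused LightRAG indexing to fail
  • Resolve tiktoken download failures in offline environments and improve model compatibility
  • Update the CUSC embedding model and reranker configuration
  • Docker configuration changes: official npm registry, PostgreSQL port mapping, Tsinghua PyPI mirror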

Test plan

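  • Create a new LightRAG knowledge base, select a language model, and verify that the selection is displayed correctly
  • Edit an existing LightRAG knowledge base, switch the language model, and verify that the selection is displayed and saved
  • Verify that the language model is displayed correctly when editing an existing knowledge base that uses the old llm_info format
  • Upload a large document and verify that no chunk exceeds the limit
  • Verify that tiktoken loads correctly in an offline environment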

郭诣 added 8 commits May 1, 2026 16:45
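- Implement an offline patch for tiktoken so that BPE files are not downloaded from the internet at runtime
- Add a character-level tokenizer as a fallback when tiktoken is unavailable
- Inject offline tokenizer support into LightRAG instances
- Adapt to the output format of reasoning models by extracting JSON content wrapped in reasoning text
- Fix chunk_size and chunk_overlap not being passed through in the LightRAG indexing parameter configuration
- Unify the display logic for the chunk size and overlap parameters in the file table component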
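A minimal sketch of the character-level fallback described in the commit above, not the code from this PR. The names `CharTokenizer` and `load_tokenizer`, and the idea of passing the primary loader as a callable, are assumptions made for illustration. The sketch deliberately avoids importing tiktoken so that it runs without network access.

```python
from typing import Callable, List, Protocol


class Tokenizer(Protocol):
    """Minimal interface assumed for this sketch."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, tokens: List[int]) -> str: ...


class CharTokenizer:
    """Hypothetical fallback: one token per Unicode code point."""

    def encode(self, text: str) -> List[int]:
        return [ord(ch) for ch in text]

    def decode(self, tokens: List[int]) -> str:
        return "".join(chr(t) for t in tokens)


def load_tokenizer(primary_loader: Callable[[], Tokenizer]) -> Tokenizer:
    """Return the primary tokenizer, or CharTokenizer if loading fails."""
    try:
        return primary_loader()
    except Exception:
        # For example: the BPE file is not cached and the host is offline.
        return CharTokenizer()
```

A character count usually overestimates the token count of a BPE tokenizer, so chunk limits enforced with this fallback tend to err on the conservative side.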
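naive_merge does not guarantee that output chunks stay within the token limit.
When a single line exceeds chunk_token_num, it produces an oversized chunk, and LightRAG fails with
"Chunk token length 3140 exceeds chunk_token_size 1200".

- nlp.py: add hard_split_by_token_limit, a shared hard-split function
- general.py: add _ensure_chunk_token_limit as a fallback safeguard
- laws.py: remove the duplicated local function and use the nlp version (DRY)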
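The hard-split safeguard in this commit could look roughly like the sketch below. The signatures of the functions in the PR are not shown on this page, so `hard_split` and `ensure_chunk_limit` are hypothetical stand-ins, and the `encode`/`decode` callables are an assumed way of supplying the tokenizer.

```python
from typing import Callable, List

Encode = Callable[[str], List[int]]
Decode = Callable[[List[int]], str]


def hard_split(text: str, max_tokens: int, encode: Encode, decode: Decode) -> List[str]:
    """Cut text into pieces of at most max_tokens tokens each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = encode(text)
    if len(tokens) <= max_tokens:
        return [text]  # already within the limit, keep as-is
    return [
        decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


def ensure_chunk_limit(
    chunks: List[str], max_tokens: int, encode: Encode, decode: Decode
) -> List[str]:
    """Post-process merged chunks so that none exceeds max_tokens."""
    result: List[str] = []
    for chunk in chunks:
        result.extend(hard_split(chunk, max_tokens, encode, decode))
    return result
```

With a real BPE tokenizer, cutting at arbitrary token boundaries can split a multi-byte character across two pieces, so a production version may want to back off to the nearest clean boundary.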
# Conflicts:
#	docker-compose.yml
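The V2 model spec format is provider_id:model_id (colon-separated),
but handleLLMSelect split on a slash, so both provider and model_name came out empty.
Add a model_spec field that stores the full spec directly, and extract the parseModelSpec/buildDisplaySpec
utility functions to handle both the V1 and V2 formats in one place, removing 3 copies of duplicated logic.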
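To illustrate the parsing problem, here is a rough Python sketch of a parser that accepts both formats. The helpers in the PR are frontend functions whose code is not shown on this page, so `parse_model_spec` is a hypothetical analogue. It assumes V1 specs look like provider/model_name, and the example provider and model names are made up.

```python
from typing import NamedTuple


class ModelSpec(NamedTuple):
    provider: str
    model_name: str


def parse_model_spec(spec: str) -> ModelSpec:
    """Parse 'provider_id:model_id' (V2) or 'provider/model_name' (assumed V1)."""
    spec = (spec or "").strip()
    head, colon, tail = spec.partition(":")
    # Heuristic: treat as V2 only if the colon comes before any slash,
    # so that a V1 model name containing ':' is not misread as V2.
    if colon and "/" not in head:
        return ModelSpec(head, tail)
    if "/" in spec:
        provider, model_name = spec.split("/", 1)
        return ModelSpec(provider, model_name)
    # No separator found: no provider can be recovered.
    return ModelSpec("", spec)
```

Splitting only on the first separator matters because a model id can itself contain a slash, for example `parse_model_spec("provider_a:org/model-x")` yields provider `provider_a` and model name `org/model-x`.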

guoyi8 commented May 1, 2026

Opened by mistake; closing and will resubmit.

guoyi8 closed this May 1, 2026
guoyi8 deleted the feature/cusc-202605 branch May 1, 2026 17:11
guoyi8 changed the title from "fix: CUSC environment adaptation and knowledge base fixes" to "Test PR, ignore" May 1, 2026

guoyi8 commented May 1, 2026

This pull request was opened in error and contained unintended internal configuration details.
The PR has been closed and the source branch deleted.

The correct and sanitized change is available in #672.
Please ignore this PR.
