diff --git a/examples/LOCAL_MODELS_GUIDE.md b/examples/LOCAL_MODELS_GUIDE.md
new file mode 100644
index 00000000..276484f7
--- /dev/null
+++ b/examples/LOCAL_MODELS_GUIDE.md
@@ -0,0 +1,160 @@
+# Local Models Guide
+
+This guide shows how to configure KGGen to use a local model instead of an OpenAI GPT model.
+
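+## Supported Local Model Types
+
+### 1. Ollama (recommended)
+
+**Pros**: Easy to install and use, supports many open-source models, moderate resource usage
+**Cons**: Requires installing Ollama separately
+
+#### Installation: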
+```bash
+# 1. Install Ollama
+curl -fsSL https://ollama.ai/install.sh | sh
+
+# 2. Start the Ollama service
+ollama serve
+
+# 3. Pull a model (choose one)
+ollama pull llama3.2        # recommended; balances quality and resource usage
+ollama pull llama3.1:8b     # larger model, better quality
+ollama pull qwen2.5:7b      # better Chinese support
+ollama pull mistral:7b      # lightweight option
+```
+
+#### Configuration:
+```python
+LOCAL_MODEL_CONFIG = {
+    "model": "ollama/llama3.2",
+    "api_base": "http://localhost:11434",
+    "api_key": None,
+}
+```
+
+### 2. HuggingFace Transformers
+
+**Pros**: Uses the transformers library directly, no separate service required
+**Cons**: Needs more memory; the model must be downloaded on first use
+
+#### Installation:
+```bash
+pip install transformers torch
+```
+
+#### Configuration:
+```python
+import os  # needed for os.environ and os.getenv below
+os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # optional: download through the hf-mirror.com mirror
+
+HF_MODEL_CONFIG = {
+    "model": "huggingface/microsoft/DialoGPT-medium",
+    "api_key": os.getenv("HF_TOKEN"),  # optional
+}
+```
+
+### 3. vLLM (high performance)
+
+**Pros**: High-throughput inference with batching support
+**Cons**: More involved installation; requires a GPU
+
+#### Installation:
+```bash
+pip install vllm
+```
+
+#### Configuration:
+```python
+VLLM_MODEL_CONFIG = {
+    "model": "vllm/meta-llama/Llama-2-7b-chat-hf",
+    "api_base": "http://localhost:8000/v1",  # assumes a vLLM server is already listening at this address
+}
+```
+
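+## Using the modified basic.py
+
+1. **Choose a model configuration**: `basic.py` only reads `LOCAL_MODEL_CONFIG`, so edit that dict or copy the values from one of the commented-out configurations into it
+2. **Run the script**: `python examples/basic.py`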
+
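+## Troubleshooting
+
+### Ollama issues
+
+**Problem**: `Connection refused` error
+**Solution**: Make sure the Ollama service is running: `ollama serve`
+
+**Problem**: Model not found
+**Solution**: Make sure the model has been pulled: `ollama pull llama3.2`
+
+**Problem**: Port conflict
+**Solution**: Start Ollama on a different port and update `api_base` to match, or stop the program occupying port 11434
+
+### HuggingFace issues
+
+**Problem**: Slow downloads
+**Solution**: The example scripts set the mirror `HF_ENDPOINT=https://hf-mirror.com` whenever the configured model name contains `huggingface`
+
+**Problem**: Out of memory
+**Solution**: Choose a smaller model or add system memory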
+
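+### Performance tuning
+
+1. **Choose an appropriate model size** (figures assume FP16 weights, about 2 bytes per parameter; quantized builds need less):
+   - 7B model: about 14 GB of memory
+   - 13B model: about 26 GB of memory
+   - 70B model: about 140 GB of memory
+
+2. **Tune parameters**:
+   - `chunk_size`: smaller values reduce memory usage
+   - `temperature`: set to 0.0 for deterministic results
+   - `max_tokens`: adjust to the output length you need
+
+3. **Hardware requirements**:
+   - CPU: at least 4 cores
+   - Memory: at least 16 GB (32 GB recommended)
+   - Storage: at least 20 GB of free space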
+
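+## Server environment setup
+
+If you are running on one of the following servers:
+
+### Server 10.8.71.126
+```bash
+# Activate the Python 3.12 virtual environment
+source /path/to/py312/bin/activate
+```
+
+### Server 10.8.71.44
+```bash
+# Activate the Python 3.10 virtual environment
+source /path/to/py310/bin/activate
+# Enable the C++ development toolset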
+source /opt/rh/devtoolset-9/enable
+```
+
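+## Model recommendations
+
+| Use case | Recommended model | Memory required | Quality | max_tokens limit |
+|----------|-------------------|-----------------|---------|------------------|
+| Quick testing | ollama/llama3.2 | 8GB | Medium | 4000+ |
+| Chinese text | ollama/qwen2.5:7b | 14GB | High | 4000+ |
+| High-quality output | ollama/llama3.1:8b | 16GB | High | 4000+ |
+| Limited resources | ollama/mistral:7b | 8GB | Medium | 4000+ |
+| Chinese dialogue | deepseek/deepseek-chat | Cloud | High | **8192** |
+
+### Important: max_tokens limits
+
+Models impose different limits on `max_tokens`:
+
+- **Deepseek models**: at most 8192 tokens
+- **GPT-4/GPT-3.5**: typically 4000-8000 tokens
+- **Open-source models (Llama, Qwen, etc.)**: typically 2000-4000 tokens
+- **GPT-5 series**: at least 16000 tokens
+
+If you hit a `max_tokens` error:
+1. Check the model's token limit
+2. Adjust `max_tokens` in the configuration accordingly
+3. Consider using the `chunk_size` parameter to process long text in chunks
+
+Pick a model that fits your hardware and needs, and you are ready to go!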
\ No newline at end of file
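Review note on the guide's sizing figures: the 14 GB / 26 GB / 140 GB numbers are consistent with a rule of thumb of about 2 bytes per parameter (FP16 weights). The sketch below makes that arithmetic explicit. The function name `estimate_weight_memory_gb` and its `bytes_per_param` parameter are hypothetical, not part of KGGen, and the estimate covers model weights only (no KV cache or runtime overhead).

```python
def estimate_weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Rough memory needed to hold model weights, in GB (10^9 bytes).

    Assumption: memory ~= parameter count * bytes per parameter.
    The default of 2.0 bytes corresponds to FP16 weights.
    """
    if params_billions <= 0 or bytes_per_param <= 0:
        raise ValueError("params_billions and bytes_per_param must be positive")
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes-per-GB
    return params_billions * bytes_per_param


if __name__ == "__main__":
    for size in (7, 13, 70):
        print(f"{size}B model at FP16: about {estimate_weight_memory_gb(size):.0f} GB")
```

Passing `bytes_per_param=0.5` approximates 4-bit quantized weights, which is one reason a quantized 7B model can fit in far less than 14 GB.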
diff --git a/examples/basic.py b/examples/basic.py
index 7c164d2f..d0a7b58b 100644
--- a/examples/basic.py
+++ b/examples/basic.py
@@ -26,21 +26,120 @@
 nose over the matter.
 """
 
-kg = KGGen()
-# with open("tests/data/kingkiller_chapter_one.txt", "r", encoding="utf-8") as f:
-#     text = f.read()
+# ========== Local model configuration options ==========
+# Choose one of the following; the script reads only LOCAL_MODEL_CONFIG, so copy your choice into it:
+
+# Option 1: Ollama local model (recommended)
+# Install and start Ollama first: https://ollama.ai/
+# Then pull a model: ollama pull llama3.2
+LOCAL_MODEL_CONFIG = {
+    "model": "ollama/llama3.2",  # alternatives: llama3.1, qwen2.5:7b, mistral:7b, etc.
+    "api_base": "http://localhost:11434",  # default Ollama port
+    "api_key": None,  # Ollama does not need an API key
+    "max_tokens": 4000,  # general-purpose default
+}
+
+# Option 1a: Deepseek model
+# DEEPSEEK_MODEL_CONFIG = {
+#     "model": "deepseek/deepseek-chat",
+#     "api_key": os.getenv("DEEPSEEK_API_KEY"),
+#     "max_tokens": 4000,  # Deepseek supports up to 8192; 4000 is the suggested setting
+# }
+
+# Option 2: HuggingFace model (needs more memory)
+# HF_MODEL_CONFIG = {
+#     "model": "huggingface/microsoft/DialoGPT-medium",
+#     "api_key": os.getenv("HF_TOKEN"),  # optional; some models require it
+# }
+
+# Option 3: vLLM local deployment (high performance)
+# Install vllm first: pip install vllm
+# VLLM_MODEL_CONFIG = {
+#     "model": "vllm/meta-llama/Llama-2-7b-chat-hf",
+#     "api_base": "http://localhost:8000/v1",
+# }
+
+# Option 4: original OpenAI GPT model (requires an API key)
+# OPENAI_MODEL_CONFIG = {
+#     "model": "openai/gpt-4o",
+#     "api_key": os.getenv("OPENAI_API_KEY"),
+# }
+
+print("Initializing KGGen with a local model...")
+print(f"Model: {LOCAL_MODEL_CONFIG['model']}")
 
-graph = kg.generate(
-    input_data=text,
-    model="openai/gpt-4o",
-    api_key=os.getenv("OPENAI_API_KEY"),
-    chunk_size=1000,
-    cluster=True,
+# Use the hf-mirror.com mirror when the model is downloaded from HuggingFace
+if "huggingface" in LOCAL_MODEL_CONFIG.get("model", ""):
+    os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
+
+# Pick a max_tokens value suited to the model type
+model_name = LOCAL_MODEL_CONFIG["model"].lower()
+if "deepseek" in model_name:
+    max_tokens = min(LOCAL_MODEL_CONFIG.get("max_tokens", 4000), 8192)
+elif "gpt-5" in model_name:
+    max_tokens = max(LOCAL_MODEL_CONFIG.get("max_tokens", 16000), 16000)
+else:
+    max_tokens = LOCAL_MODEL_CONFIG.get("max_tokens", 4000)
+
+print(f"Using max_tokens: {max_tokens}")
+
+kg = KGGen(
+    model=LOCAL_MODEL_CONFIG["model"],
+    api_key=LOCAL_MODEL_CONFIG.get("api_key"),
+    api_base=LOCAL_MODEL_CONFIG.get("api_base"),
     temperature=0.0,
-    context="Kingkiller Chronicles",
-    output_folder="./examples/",
+    max_tokens=max_tokens,
 )
+
+# with open("tests/data/kingkiller_chapter_one.txt", "r", encoding="utf-8") as f:
+#     text = f.read()
+
+print("Generating knowledge graph...")
+try:
+    graph = kg.generate(
+        input_data=text,
+        chunk_size=1000,
+        cluster=True,
+        context="Kingkiller Chronicles",
+        output_folder="./examples/",
+    )
+    
+    print("Knowledge graph generated successfully!")
+    print(f"Entities: {len(graph.entities)}")
+    print(f"Relations: {len(graph.relations)}")
+    
+except Exception as e:
+    error_msg = str(e)
+    print(f"Error while generating the knowledge graph: {e}")
+    print("\nTroubleshooting tips:")
+    if "max_tokens" in error_msg:
+        print("❌ max_tokens is out of range!")
+        print("Fix:")
+        print("  - Deepseek models: set max_tokens <= 8192")
+        print("  - Other models: set max_tokens <= 4000")
+        print("  - Add to the configuration: 'max_tokens': 4000")
+    elif "Connection" in error_msg or "refused" in error_msg:
+        print("❌ Connection error!")
+        print("Fix:")
+        print("  - If using Ollama, make sure it is installed and running: ollama serve")
+        print("  - Check that the api_base address is correct")
+        print("  - Check firewall settings")
+    elif "model" in error_msg.lower() and "not found" in error_msg.lower():
+        print("❌ Model not found!")
+        print("Fix:")
+        print("  - If using Ollama, pull the model: ollama pull llama3.2")
+        print("  - Check that the model name is correct")
+    else:
+        print("1. If using Ollama, make sure it is installed and running: ollama serve")
+        print("2. If using Ollama, make sure the model has been pulled: ollama pull llama3.2")
+        print("3. If using a HuggingFace model, make sure there is enough memory")
+        print("4. Check the network connection and firewall settings")
+        print("5. For max_tokens errors, lower the max_tokens value")
+    raise SystemExit(1)  # graph is undefined after a failure, so stop before visualize()
 # with open("./examples/graph.json", "r") as f:
 #     graph = Graph(**json.load(f))
 
+# Generate the visualization
+print("Generating visualization file...")
 KGGen.visualize(graph, "./examples/basic-graph.html", True)
+print("Visualization saved to: ./examples/basic-graph.html")
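Review note on the `max_tokens` selection in `basic.py`: the inline if/elif chain could be factored into a small, testable helper. The sketch below mirrors the same rules (Deepseek capped at 8192, GPT-5 raised to at least 16000, otherwise the requested value or 4000). The name `resolve_max_tokens` and the constants are hypothetical, and the limits are the ones stated in this PR, not values checked against each provider.

```python
from typing import Optional

# Limits as stated in this PR (assumptions, not verified provider limits).
DEEPSEEK_MAX_TOKENS = 8192
GPT5_MIN_TOKENS = 16000
DEFAULT_MAX_TOKENS = 4000


def resolve_max_tokens(model: str, requested: Optional[int] = None) -> int:
    """Return a max_tokens value adjusted to the model family.

    Matching is by substring on the lower-cased model name,
    the same way basic.py does it.
    """
    name = model.lower()
    if requested is None:
        requested = DEFAULT_MAX_TOKENS
    if "deepseek" in name:
        # Never exceed the Deepseek cap.
        return min(requested, DEEPSEEK_MAX_TOKENS)
    if "gpt-5" in name:
        # Never go below the GPT-5 floor.
        return max(requested, GPT5_MIN_TOKENS)
    return requested


if __name__ == "__main__":
    for model in ("ollama/llama3.2", "deepseek/deepseek-chat", "openai/gpt-5"):
        print(model, resolve_max_tokens(model))
```

With a helper like this, the script body would reduce to a single call such as `resolve_max_tokens(LOCAL_MODEL_CONFIG["model"], LOCAL_MODEL_CONFIG.get("max_tokens"))`.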
diff --git a/examples/basic_local.py b/examples/basic_local.py
new file mode 100644
index 00000000..25433d25
--- /dev/null
+++ b/examples/basic_local.py
@@ -0,0 +1,107 @@
+from kg_gen.models import Graph  # noqa: F401
+from kg_gen import KGGen
+import json  # noqa: F401
+import os
+
+text = """
+A Place for Demons
+IT WAS FELLING NIGHT, and the usual crowd had gathered at the
+Waystone Inn. Five wasn't much of a crowd, but five was as many as the
+Waystone ever saw these days, times being what they were.
+Old Cob was filling his role as storyteller and advice dispensary. The
+men at the bar sipped their drinks and listened. In the back room a young
+innkeeper stood out of sight behind the door, smiling as he listened to the
+details of a familiar story.
+"When he awoke, Taborlin the Great found himself locked in a high
+tower. They had taken his sword and stripped him of his tools: key, coin,
+and candle were all gone. But that weren't even the worst of it, you see…"
+Cob paused for effect, "…cause the lamps on the wall were burning blue!"
+Graham, Jake, and Shep nodded to themselves. The three friends had
+grown up together, listening to Cob's stories and ignoring his advice.
+Cob peered closely at the newer, more attentive member of his small
+audience, the smith's prentice. "Do you know what that meant, boy?"
+Everyone called the smith's prentice "boy" despite the fact that he was a
+hand taller than anyone there. Small towns being what they are, he would
+most likely remain "boy" until his beard filled out or he bloodied someone's
+nose over the matter.
+"""
+
+# ========== Local model configuration options ==========
+# Choose one of the following; the script reads only LOCAL_MODEL_CONFIG, so copy your choice into it:
+
+# Option 1: Ollama local model (recommended)
+# Install and start Ollama first: https://ollama.ai/
+# Then pull a model: ollama pull llama3.2
+LOCAL_MODEL_CONFIG = {
+    "model": "ollama/llama3.2",  # alternatives: llama3.1, qwen2.5:7b, mistral:7b, etc.
+    "api_base": "http://localhost:11434",  # default Ollama port
+    "api_key": None,  # Ollama does not need an API key
+}
+
+# Option 2: HuggingFace model (needs more memory)
+# HF_MODEL_CONFIG = {
+#     "model": "huggingface/microsoft/DialoGPT-medium",
+#     "api_key": os.getenv("HF_TOKEN"),  # optional; some models require it
+# }
+
+# Option 3: vLLM local deployment (high performance)
+# Install vllm first: pip install vllm
+# VLLM_MODEL_CONFIG = {
+#     "model": "vllm/meta-llama/Llama-2-7b-chat-hf",
+#     "api_base": "http://localhost:8000/v1",
+# }
+
+# Option 4: original OpenAI GPT model (requires an API key)
+# OPENAI_MODEL_CONFIG = {
+#     "model": "openai/gpt-4o",
+#     "api_key": os.getenv("OPENAI_API_KEY"),
+# }
+
+# ========== Initialize KGGen ==========
+print("Initializing KGGen with a local model...")
+print(f"Model: {LOCAL_MODEL_CONFIG['model']}")
+
+# Use the hf-mirror.com mirror when the model is downloaded from HuggingFace
+if "huggingface" in LOCAL_MODEL_CONFIG.get("model", ""):
+    os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
+
+kg = KGGen(
+    model=LOCAL_MODEL_CONFIG["model"],
+    api_key=LOCAL_MODEL_CONFIG.get("api_key"),
+    api_base=LOCAL_MODEL_CONFIG.get("api_base"),
+    temperature=0.0,
+    max_tokens=4000,
+)
+
+# with open("tests/data/kingkiller_chapter_one.txt", "r", encoding="utf-8") as f:
+#     text = f.read()
+
+print("Generating knowledge graph...")
+try:
+    graph = kg.generate(
+        input_data=text,
+        chunk_size=1000,
+        cluster=True,
+        context="Kingkiller Chronicles",
+        output_folder="./examples/",
+    )
+    
+    print("Knowledge graph generated successfully!")
+    print(f"Entities: {len(graph.entities)}")
+    print(f"Relations: {len(graph.relations)}")
+    
+    # Generate the visualization
+    print("Generating visualization file...")
+    KGGen.visualize(graph, "./examples/basic-graph.html", True)
+    print("Visualization saved to: ./examples/basic-graph.html")
+    
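+except Exception as e:
+    print(f"Error while generating the knowledge graph: {e}")
+    print("\nTroubleshooting tips:")
+    print("1. If using Ollama, make sure it is installed and running: ollama serve")
+    print("2. If using Ollama, make sure the model has been pulled: ollama pull llama3.2")
+    print("3. If using a HuggingFace model, make sure there is enough memory")
+    print("4. Check the network connection and firewall settings")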
+
+# with open("./examples/graph.json", "r") as f:
+#     graph = Graph(**json.load(f))
\ No newline at end of file
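Review note on error handling: `basic_local.py` prints the same four tips for every failure, while `basic.py` branches on the error message. One way to share that logic between the two scripts is a small classifier like the sketch below. The names `classify_error` and `HINTS` are hypothetical, and the substring checks are copied from the branches in `basic.py`, so they are heuristics over error text rather than a reliable mapping of exception types.

```python
from typing import Dict, List

# Tips per category, taken from the messages printed in basic.py.
HINTS: Dict[str, List[str]] = {
    "max_tokens": [
        "Deepseek models: set max_tokens <= 8192",
        "Other models: set max_tokens <= 4000",
    ],
    "connection": [
        "If using Ollama, make sure it is installed and running: ollama serve",
        "Check that the api_base address is correct",
        "Check firewall settings",
    ],
    "model_not_found": [
        "If using Ollama, pull the model: ollama pull llama3.2",
        "Check that the model name is correct",
    ],
    "unknown": [
        "Check the network connection and firewall settings",
    ],
}


def classify_error(error_msg: str) -> str:
    """Map an error message to a troubleshooting category by substring match."""
    lowered = error_msg.lower()
    if "max_tokens" in error_msg:
        return "max_tokens"
    if "Connection" in error_msg or "refused" in error_msg:
        return "connection"
    if "model" in lowered and "not found" in lowered:
        return "model_not_found"
    return "unknown"


if __name__ == "__main__":
    category = classify_error("Connection refused")
    for tip in HINTS[category]:
        print(f"  - {tip}")
```

Inside an `except Exception as e:` block, either script could then call `classify_error(str(e))` and print the matching entries from `HINTS`.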