Convert the Python version of the runtime health check center to Go #67
base: develop
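- Add a complete health check system written in Go
- Include the original Python implementation
- Provide complete documentation and an installation guide
- Support service discovery, metric monitoring, anomaly detection, and alerting
- Include test scripts and run scripts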
CodeAgent PR Review
Thanks for submitting this health center monitoring system! I've conducted a comprehensive review of the code quality, architecture, and potential improvements. Review Progress:
🏗️ Architecture & Design
Strengths:
Areas for Improvement:
🔍 Code Quality Review
Go Implementation (
Summary of Changes
Hello @wqyhahaha, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
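This pull request migrates the existing Python health check center system entirely to Go. The core goal of the refactor is to take advantage of Go's strengths in performance, concurrency, and ease of deployment while preserving all existing functionality. With this change the system should become more robust and efficient, and better able to monitor service metrics and detect anomalies.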
Highlights
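- Python version rewritten in Go: the original Python runtime health check center has been fully rewritten in Go to improve performance and concurrency.
- New Go project management files: Go modules (go.mod, go.sum) are introduced for dependency management, and a Makefile and a run.bat script are added to simplify building and running.
- Core functionality implemented in Go: the core modules, including the health check logic, mock server, startup script, and test script, have all been reimplemented in Go.
- Thorough documentation: a detailed Go installation guide (INSTALL.md) and a comprehensive project summary (PROJECT_SUMMARY.md) are provided to help users get started and understand the project.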
Code Review
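This review focuses on the migration from Python to Go. Overall, the Go code is clearly structured and functionally complete, and it successfully reproduces the behavior of the Python version. However, there is room for improvement in the implementation, project structure, and testing approach.
Main suggestions:
- Go code robustness: add context to network requests to support timeouts and cancellation.
- Performance: use goroutines in the core workflow HealthCheckWorkflow to run checks in parallel, which improves efficiency when many services are checked.
- Conventions and best practices: avoid defining the same data structures in multiple files; use Go's standard testing package instead of custom test scripts; follow the Go 1.20+ usage of math/rand.
- Scripts and documentation: fix platform-specific commands in the Makefile and documentation to ensure cross-platform compatibility; fix the orphaned-process problem in the startup script.
- Python code issues: fixed a bug in health_check_workflow that could raise a TypeError, and pointed out other possible improvements.
These changes will help improve code quality, performance, and maintainability.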
if not services:
    logger.error("No services discovered; aborting check")
    return
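When get_all_services returns an empty list, health_check_workflow implicitly returns None. However, the caller (the main function) expects a dictionary and tries to access result['total_checks'], which raises TypeError: 'NoneType' object is not subscriptable. The function should return a consistent type on every branch: even when there are no services, it should return a dictionary containing total_checks: 0.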
Suggested change:
-if not services:
-    logger.error("No services discovered; aborting check")
-    return
+if not services:
+    logger.error("No services discovered; aborting check")
+    return {
+        'total_checks': 0,
+        'anomaly_count': 0,
+        'services': []
+    }
def health_check_workflow(self):
    """Main workflow of the runtime health check center."""
    logger.info("🏥 Starting runtime health check...")

    # 1. Service discovery
    services = self.get_all_services()
    if not services:
        logger.error("No services discovered; aborting check")
        return

    # 2. Iterate over services and metrics
    total_checks = 0
    anomaly_count = 0

    for service in services:
        logger.info(f"🔍 Checking service: {service}")

        for metric in self.metrics_to_check:
            total_checks += 1
            logger.info(f" 📊 Checking metric: {metric}")

            # 3. Fetch metric data
            metric_data = self.fetch_metric_data(service, metric)
            if metric_data is None:
                continue

            # 4. Anomaly detection
            is_anomaly = self.anomaly_detection(metric_data)

            # 5. Alert handling
            if is_anomaly:
                anomaly_count += 1
                self.trigger_alert(service, metric, metric_data)

    # 6. Log the check summary
    logger.info(f"✅ Check complete: {total_checks} metrics checked, {anomaly_count} anomalies found")

    return {
        'total_checks': total_checks,
        'anomaly_count': anomaly_count,
        'services': services
    }
def test_health_center():
    """Test the health check center."""
    print("=" * 60)
    print("🧪 Testing the runtime health check center")
    print("=" * 60)

    # Check that the mock server is reachable
    if not check_mock_server():
        return False

    # Create a health check center instance
    health_center = HealthCheckCenter()

    # Test service discovery
    print("\n1️⃣ Testing service discovery...")
    services = health_center.get_all_services()
    print(f" Discovered services: {services}")

    # Test metric data retrieval
    print("\n2️⃣ Testing metric data retrieval...")
    if services:
        service = services[0]
        metric = 'latency'
        data = health_center.fetch_metric_data(service, metric)
        if data:
            print(f" ✅ Fetched {service}/{metric} data")
            print(f" Sample data: {data['data']['result'][0]['values'][:2]}...")
        else:
            print(f" ❌ Failed to fetch {service}/{metric} data")

    # Test the full workflow
    print("\n3️⃣ Testing the full workflow...")
    result = health_center.health_check_workflow()

    print("\n📊 Test results:")
    print(f" Total checks: {result['total_checks']}")
    print(f" Anomalies: {result['anomaly_count']}")
    print(f" Services checked: {', '.join(result['services'])}")

    return True
func (h *HealthCheckCenter) GetAllServices() ([]string, error) {
	url := fmt.Sprintf("%s/v1/servers", h.BaseURL)

	resp, err := h.HTTPClient.Get(url)
func (h *HealthCheckCenter) HealthCheckWorkflow() (*CheckResult, error) {
	h.Logger.Info("🏥 Starting runtime health check...")

	// 1. Service discovery
	services, err := h.GetAllServices()
	if err != nil {
		h.Logger.WithError(err).Error("No services discovered; aborting check")
		return nil, err
	}

	if len(services) == 0 {
		h.Logger.Error("No services discovered; aborting check")
		return nil, fmt.Errorf("no services discovered")
	}

	// 2. Iterate over services and metrics
	totalChecks := 0
	anomalyCount := 0

	for _, service := range services {
		h.Logger.WithField("service", service).Info("🔍 Checking service")

		for _, metric := range h.MetricsToCheck {
			totalChecks++
			h.Logger.WithFields(logrus.Fields{
				"service": service,
				"metric":  metric,
			}).Info("📊 Checking metric")

			// 3. Fetch metric data
			metricData, err := h.FetchMetricData(service, metric, 1)
			if err != nil {
				h.Logger.WithError(err).WithFields(logrus.Fields{
					"service": service,
					"metric":  metric,
				}).Error("Failed to fetch metric data")
				continue
			}

			// 4. Anomaly detection
			isAnomaly := h.AnomalyDetection(metricData)

			// 5. Alert handling
			if isAnomaly {
				anomalyCount++
				h.TriggerAlert(service, metric, metricData)
			}
		}
	}

	// 6. Log the check summary
	h.Logger.WithFields(logrus.Fields{
		"total_checks":  totalChecks,
		"anomaly_count": anomalyCount,
	}).Info("✅ Check complete")

	result := &CheckResult{
		TotalChecks:  totalChecks,
		AnomalyCount: anomalyCount,
		Services:     services,
	}

	return result, nil
}
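The review summary suggests running the checks in HealthCheckWorkflow in parallel with goroutines. One possible shape for that is sketched below, assuming each (service, metric) check is independent; runChecks, checkFunc, checkResult, the maxParallel limit, and the "error_rate" metric name are hypothetical stand-ins, and the real fetch, detection, and alert calls are collapsed into a single callback.

```go
package main

import (
	"fmt"
	"sync"
)

// checkResult mirrors the two counters kept by the sequential workflow.
type checkResult struct {
	TotalChecks  int
	AnomalyCount int
}

func (r checkResult) String() string {
	return fmt.Sprintf("checks=%d anomalies=%d", r.TotalChecks, r.AnomalyCount)
}

// checkFunc is a hypothetical stand-in for fetching one metric and
// deciding whether it is anomalous.
type checkFunc func(service, metric string) (bool, error)

// runChecks runs one goroutine per (service, metric) pair, with at most
// maxParallel running at a time, and aggregates counters under a mutex.
func runChecks(services, metrics []string, check checkFunc, maxParallel int) checkResult {
	if maxParallel < 1 {
		maxParallel = 1
	}
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		res checkResult
	)
	sem := make(chan struct{}, maxParallel) // bounds concurrency

	for _, service := range services {
		for _, metric := range metrics {
			wg.Add(1)
			sem <- struct{}{} // blocks while maxParallel checks are running
			go func(service, metric string) {
				defer wg.Done()
				defer func() { <-sem }()

				anomalous, err := check(service, metric)

				mu.Lock()
				defer mu.Unlock()
				res.TotalChecks++ // counted even when the fetch fails
				if err == nil && anomalous {
					res.AnomalyCount++
				}
			}(service, metric)
		}
	}
	wg.Wait()
	return res
}

func main() {
	services := []string{"service-a", "service-b"}
	metrics := []string{"latency", "error_rate"}
	check := func(service, metric string) (bool, error) {
		return service == "service-b" && metric == "latency", nil
	}
	fmt.Println(runChecks(services, metrics, check, 4)) // prints "checks=4 anomalies=1"
}
```

The semaphore channel keeps the number of concurrent requests bounded, which matters if the metrics backend cannot absorb one request per service and metric at once.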
            "values": generate_time_series()
        },
        {
            "metric": {
                "__name__": name,
                "service": service,
                "version": version,
                "instance": "localhost:8081"
            },
            "values": generate_time_series()
In get_metrics, generate_time_series() is called twice. This is inefficient and also causes the metric data for the two versions to be identical. Consider calling it once, storing the result in a variable, and reusing that variable in the response.
Suggested change:
-            "values": generate_time_series()
-        },
-        {
-            "metric": {
-                "__name__": name,
-                "service": service,
-                "version": version,
-                "instance": "localhost:8081"
-            },
-            "values": generate_time_series()
+            "values": time_series_data
+        },
+        {
+            "metric": {
+                "__name__": name,
+                "service": service,
+                "version": version,
+                "instance": "localhost:8081"
+            },
+            "values": time_series_data

if __name__ == '__main__':
    print("Mock server starting on http://localhost:8080")
    app.run(host='0.0.0.0', port=8080, debug=True)
# Clean build artifacts
clean:
	@echo "🧹 Cleaning build artifacts..."
	rm -rf bin/
set GOOS=linux
set GOARCH=amd64
go build -o health-center-linux main.go health_check_center.go
	//     return true
	// }

	return false
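This is a runtime health check center system rewritten in Go. It periodically checks system runtime metrics.
Features
Project structure
Quick start
1. Requirements
2. Install dependencies
3. Start the mock server
The server starts at http://localhost:8080 and exposes the following API endpoints:
- GET /v1/servers - list servers
- GET /v1/metrics/:service/:name - get metric data
- GET /health - health check
- GET / - service information
4. Run the health check
Option 1: use the startup script (recommended)
Then choose a run mode: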