✍️ Add: translate Agent-Leaderboard #111

jiminAn · 2025-11-17T11:26:07Z

Changes
New post written: Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios
Original blog post: https://huggingface.co/blog/pratikbhavsar/agent-leaderboard

_posts/2025-11-17-agent-leaderboard.md

Jwaminju

Hello! I'm sorry I missed the presentation, but thank you for the great write-up!
I left a few review comments.
Feel free to apply them selectively!

_posts/2025-11-17-agent-leaderboard.md

Jwaminju · 2025-11-20T13:06:37Z

I think this PR needs to be rebased once before the build will pass!

sim-so

Overall, the sentences were easy to read! I mainly checked for typos and left comments.

_posts/2025-11-17-agent-leaderboard.md

hyeonseo2

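Thank you for the great translation and presentation! In particular, having the model performance section organized into tables made it much more readable 👍 I left a few review comments below about translation terminology.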

_posts/2025-11-17-agent-leaderboard.md

eehyo

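The post is cleanly written and was easy to read! I also liked that it is split into sections with supplementary explanations. I only reviewed the overall consistency of terminology :)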

eehyo · 2025-11-23T07:03:22Z

_posts/2025-11-17-agent-leaderboard.md

+
+3. **평가 지표 정의(Metric Definition)**: 도구 선택의 정확성과 매개변수 사용의 질을 모두 평가하는 **도구 선택 품질(TSQ)**을 주요 지표로 설정합니다. 이 지표는 실제 환경에서 요구되는 성능을 포착하도록 신중하게 설계되었습니다.
+
+4. **데이터셋 구성(Dataset Curation)**: 기존 벤치마크 데이터셋을 전략적으로 샘플링하여 균형 잡히고, 다중 도메인을 아우르는 평가용 데이터셋을 만들빈다. 이 데이터셋은 기본 함수 호출부터 복잡한 다중 턴 상호작용까지 모두 테스트할 수 있어 에이전트 능력을 포괄적으로 평가합니다.


Suggested change

-4. **데이터셋 구성(Dataset Curation)**: 기존 벤치마크 데이터셋을 전략적으로 샘플링하여 균형 잡히고, 다중 도메인을 아우르는 평가용 데이터셋을 만들빈다. 이 데이터셋은 기본 함수 호출부터 복잡한 다중 턴 상호작용까지 모두 테스트할 수 있어 에이전트 능력을 포괄적으로 평가합니다.
+4. **데이터셋 구성(Dataset Curation)**: 기존 벤치마크 데이터셋을 전략적으로 샘플링하여 균형 잡히고, 다중 도메인을 아우르는 평가용 데이터셋을 만들빈다. 이 데이터셋은 기본 함수 호출부터 복잡한 멀티 턴 상호작용까지 모두 테스트할 수 있어 에이전트 능력을 포괄적으로 평가합니다.

I made this consistent with Minju's change from 다중 턴 to 멀티 턴!

_posts/2025-11-17-agent-leaderboard.md

Co-authored-by: wony617 <[email protected]>

Co-authored-by: Sohyun Sim <[email protected]>

Co-authored-by: Hyeonseo Yun <[email protected]>

Co-authored-by: eehyo <[email protected]>

안지민 added 4 commits November 16, 2025 11:40

✍️ Add: Agent-Leaderboard

546a3ed

✍️ update: draft translation

c87c348

✍️ update: polish translation draft

31e7c25

✍️ update: final revision

5544251

Jwaminju reviewed Nov 20, 2025

_posts/2025-11-17-agent-leaderboard.md (Outdated)

Jwaminju reviewed Nov 20, 2025

Jwaminju force-pushed the main branch 4 times, most recently from 657a070 to 3ddf137 Compare November 20, 2025 14:53

sim-so requested changes Nov 22, 2025

hyeonseo2 reviewed Nov 22, 2025

eehyo requested changes Nov 23, 2025

jiminAn and others added 16 commits November 28, 2025 21:13

Update _posts/2025-11-17-agent-leaderboard.md

9e1d855

Co-authored-by: wony617 <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

0b91d4c

Co-authored-by: wony617 <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

e2c564a

Co-authored-by: wony617 <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

c986e65

Co-authored-by: wony617 <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

ca537a4

Co-authored-by: wony617 <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

5952706

Co-authored-by: wony617 <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

6ba74b3

Co-authored-by: wony617 <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

83a30bc

Co-authored-by: wony617 <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

3d4529a

Co-authored-by: Sohyun Sim <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

8c1e1b8

Co-authored-by: Sohyun Sim <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

622f4df

Co-authored-by: Sohyun Sim <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

32780a2

Co-authored-by: Sohyun Sim <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

05dde2e

Co-authored-by: Sohyun Sim <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

8ea7fb2

Co-authored-by: Sohyun Sim <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

ba2a49d

Co-authored-by: Sohyun Sim <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

1246e0a

Co-authored-by: Hyeonseo Yun <[email protected]>

jiminAn and others added 9 commits November 28, 2025 21:18

Update _posts/2025-11-17-agent-leaderboard.md

858517d

Co-authored-by: Hyeonseo Yun <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

47935c0

Co-authored-by: Hyeonseo Yun <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

645fcf9

Co-authored-by: Hyeonseo Yun <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

0739d71

Co-authored-by: eehyo <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

2c159e0

Co-authored-by: eehyo <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

301cfe4

Co-authored-by: eehyo <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

ed03f43

Co-authored-by: eehyo <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

65c4dd2

Co-authored-by: eehyo <[email protected]>

Update _posts/2025-11-17-agent-leaderboard.md

08f19f0

Co-authored-by: eehyo <[email protected]>

jiminAn requested a review from sim-so November 28, 2025 12:20

jiminAn merged commit b92b877 into main Nov 28, 2025


6 participants