Member

@kyujin-cho kyujin-cho commented Dec 20, 2024

This PR improves the model service reloading experience by letting users restart only the model process rather than the whole container. Because this only stops and reloads the process itself, changes to the container spec (e.g. image, resource requests, environment variables) will not be reflected.

What's changed

Backend.AI Agent

  • Added a new restart_model_service() RPC function
  • Split the model definition dictionary builder (AbstractAgent.load_model_definition()) out of AbstractAgent into a new class, ModelServiceManager
  • Updated AbstractKernel to retain its model service information (model_service_info), which was previously treated as volatile
  • Updated Agent to restart the model service process while keeping the kernel runner running (AbstractAgent.restart_model_service())
    • Uses the newly introduced model_service_info and ModelServiceManager to shut down the existing model service process and repeat the creation steps, which until now ran only at kernel start
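
The restart flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `ModelServiceInfo`, `FakeKernel`, their fields, and the signatures of `load_model_definition()` and `restart_model_service()` are simplified assumptions made for the example. Only the names `model_service_info`, `ModelServiceManager`, `shutdown_model_service`, and `restart_model_service` come from the description.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ModelServiceInfo:
    """Hypothetical shape of what a kernel remembers about its model service."""
    model_name: str
    start_command: list[str]
    port: int


@dataclass
class FakeKernel:
    """Stand-in for a kernel whose runner (and container) outlives the model process."""
    model_service_info: ModelServiceInfo
    runner_generation: int = 1   # would change only if the container were recreated
    service_generation: int = 0  # incremented each time the model process starts
    service_running: bool = False
    events: list[str] = field(default_factory=list)

    async def start_model_service(self, definition: dict[str, Any]) -> None:
        self.service_generation += 1
        self.service_running = True
        self.events.append(f"start:{definition['name']}")

    async def shutdown_model_service(self) -> None:
        self.service_running = False
        self.events.append("shutdown")


class ModelServiceManager:
    """Builds the model definition dictionary (simplified, assumed signature)."""

    def load_model_definition(self, info: ModelServiceInfo) -> dict[str, Any]:
        return {
            "name": info.model_name,
            "command": list(info.start_command),
            "port": info.port,
        }


async def restart_model_service(kernel: FakeKernel, manager: ModelServiceManager) -> None:
    # 1. Stop only the model process; the kernel runner is left untouched.
    if kernel.service_running:
        await kernel.shutdown_model_service()
    # 2. Rebuild the definition from the information the kernel retained.
    definition = manager.load_model_definition(kernel.model_service_info)
    # 3. Repeat the start step that previously ran only at kernel start.
    await kernel.start_model_service(definition)
```

In this sketch, the container-level state (`runner_generation`) never changes during a restart, which mirrors the caveat that container spec changes are not picked up.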

Backend.AI Kernel

  • Added a new shutdown_model_service() function
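
For readers unfamiliar with stopping a single process inside a running environment, one common pattern is to request termination and escalate to a kill after a grace period. The sketch below shows that general pattern with `asyncio`; it is not the implementation of shutdown_model_service() in this PR, and `shutdown_process`, `grace_period`, and `demo` are hypothetical names introduced for illustration.

```python
import asyncio
import sys


async def shutdown_process(
    proc: asyncio.subprocess.Process,
    grace_period: float = 5.0,  # hypothetical default, not taken from the PR
) -> int:
    """Ask a child process to exit, escalating to kill() after grace_period seconds."""
    if proc.returncode is not None:
        # Already exited; nothing to do.
        return proc.returncode
    proc.terminate()
    try:
        return await asyncio.wait_for(proc.wait(), timeout=grace_period)
    except asyncio.TimeoutError:
        # The process ignored the termination request; force it to stop.
        proc.kill()
        return await proc.wait()


async def demo() -> int:
    # A long-sleeping child stands in for a model server process.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(60)"
    )
    return await shutdown_process(proc)
```

Calling `asyncio.run(demo())` returns the child's exit status, which is non-zero because the child was terminated rather than exiting on its own.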

Backend.AI Manager

  • Added a restart_model_service REST API and a restart_model_service registry function

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version

📚 Documentation preview 📚: https://sorna--3282.org.readthedocs.build/en/3282/


📚 Documentation preview 📚: https://sorna-ko--3282.org.readthedocs.build/ko/3282/

@kyujin-cho kyujin-cho added the type:feature Add new features label Dec 20, 2024
@kyujin-cho kyujin-cho added this to the 24.12 milestone Dec 20, 2024
@kyujin-cho kyujin-cho self-assigned this Dec 20, 2024
@github-actions github-actions bot added area:docs Documentations comp:manager Related to Manager component comp:agent Related to Agent component size:L 100~500 LoC labels Dec 20, 2024
@kyujin-cho kyujin-cho marked this pull request as draft December 20, 2024 16:48
@github-actions github-actions bot added size:XL 500~ LoC and removed size:L 100~500 LoC labels Dec 22, 2024
@kyujin-cho kyujin-cho changed the title feature: restart model service process feat: restart model service process Dec 22, 2024
@kyujin-cho kyujin-cho marked this pull request as ready for review December 22, 2024 07:52
@kyujin-cho kyujin-cho added the urgency:blocker IT SHOULD BE RESOLVED BEFORE NEXT RELEASE! label Dec 23, 2024
@kyujin-cho kyujin-cho changed the title feat: restart model service process feat(BA-441): restart model service process Jan 2, 2025
@HyeockJinKim HyeockJinKim force-pushed the main branch 4 times, most recently from 1a10632 to 2d8c9ea Compare November 23, 2025 14:45
@HyeockJinKim HyeockJinKim force-pushed the main branch 2 times, most recently from 9552aac to 4af738e Compare December 31, 2025 15:41
Labels

area:docs Documentations comp:agent Related to Agent component comp:manager Related to Manager component size:XL 500~ LoC type:feature Add new features urgency:blocker IT SHOULD BE RESOLVED BEFORE NEXT RELEASE!

4 participants