Recommendations to avoid model thrashing? #7937
Unanswered
TimothySeah asked this question in Q&A
I have a cluster of Triton servers. Each server loads a different model depending on the request it receives. However, because these models are large, there is a lot of "model thrashing": we waste time loading and unloading models that are too large to all fit in GPU memory at once. Is there a general/canonical solution to this? For example, is there an easy way to route requests requiring a specific model to pods that already have that model loaded? Thanks.
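A minimal sketch of the routing idea above, in case it helps frame the question: hash the model name to a stable pod index so every request for a given model lands on the same pod, which then keeps that model resident. This is not a built-in Triton feature; the pod addresses and the `pod_for_model` helper are hypothetical names for illustration.

```python
import hashlib

# Hypothetical Triton pod endpoints; replace with real cluster addresses.
TRITON_PODS = [
    "triton-0.triton.svc:8000",
    "triton-1.triton.svc:8000",
    "triton-2.triton.svc:8000",
]

def pod_for_model(model_name: str) -> str:
    """Deterministically map a model name to one pod.

    Every request for the same model hashes to the same pod, so that
    pod keeps the model loaded and the other pods never touch it.
    """
    digest = hashlib.sha256(model_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(TRITON_PODS)
    return TRITON_PODS[index]

# Example: all "resnet50" traffic goes to one pod, "bert" to another.
print(pod_for_model("resnet50"))
print(pod_for_model("bert"))
```

One caveat with plain modulo hashing: adding or removing a pod reshuffles most assignments and triggers a round of reloads. A consistent-hash ring would limit that churn to the models owned by the changed pod.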