System design
Design an inference batching system for a single GPU that can handle up to 100 inputs per batch while users wait synchronously, maximizing utilization under compute constraints.
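One common way to approach this prompt is dynamic batching: queue incoming requests, flush a batch once it reaches 100 inputs or a short wait deadline expires, and hand each caller its own result. The sketch below illustrates that idea under stated assumptions and is not a definitive design. Only the cap of 100 comes from the prompt. The names `DynamicBatcher`, `fake_model`, and `demo`, and the 10 ms wait budget, are hypothetical. `fake_model` stands in for a real batched GPU forward pass.

```python
import asyncio

MAX_BATCH_SIZE = 100       # cap taken from the problem statement
MAX_WAIT_SECONDS = 0.010   # assumed latency budget for filling a batch


class DynamicBatcher:
    """Hypothetical batcher: groups concurrent requests into batches."""

    def __init__(self, run_model, max_batch_size=MAX_BATCH_SIZE,
                 max_wait=MAX_WAIT_SECONDS):
        self.run_model = run_model          # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size
        self.max_wait = max_wait
        self.queue = asyncio.Queue()
        self.batch_sizes = []               # recorded for inspection only

    async def infer(self, item):
        """Called once per user request; resolves when its batch finishes."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def worker(self):
        """Single consumer, so only one batch occupies the GPU at a time."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]        # block until work exists
            deadline = loop.time() + self.max_wait
            while len(batch) < self.max_batch_size:
                try:
                    # Take anything already queued without waiting.
                    batch.append(self.queue.get_nowait())
                    continue
                except asyncio.QueueEmpty:
                    pass
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break                           # deadline hit: flush a partial batch
                try:
                    batch.append(
                        await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            inputs = [item for item, _ in batch]
            # Run the blocking model call in a thread so the event loop
            # keeps accepting requests while the batch executes.
            outputs = await loop.run_in_executor(None, self.run_model, inputs)
            self.batch_sizes.append(len(batch))
            for (_, fut), out in zip(batch, outputs):
                if not fut.done():
                    fut.set_result(out)


def fake_model(inputs):
    """Stand-in for a batched GPU forward pass (assumption)."""
    return [x * 2 for x in inputs]


async def demo(n_requests=250):
    batcher = DynamicBatcher(fake_model)
    worker = asyncio.create_task(batcher.worker())
    results = await asyncio.gather(
        *(batcher.infer(i) for i in range(n_requests)))
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return results, batcher.batch_sizes


if __name__ == "__main__":
    results, sizes = asyncio.run(demo())
    print(len(results), sizes)
```

The wait budget is the main tuning knob in this sketch: a longer wait fills batches more fully and raises utilization, while a shorter one lowers the latency seen by users who are waiting synchronously. A fuller answer would also need to address error handling, backpressure when the queue grows, and variable input sizes, none of which the sketch covers.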
Top community answer
No community answers yet.