Track 3 – 2024
Presenter: Max De Jong
Abstract: With the explosion of interest in machine learning, model weights are available for many custom architectures trained on very specific tasks. Often, the repositories storing these models were created to demonstrate performance on a standard task-specific benchmark, at the expense of the practical considerations required for these models to be useful in real-world applications. Common issues include managing dependencies (particularly CUDA with deep learning models), exposing models as endpoints, orchestrating multiple microservices around different models, scaling up web servers to handle concurrent requests, and scaling down GPU instances for cost optimization. In this talk, I will address solutions to these common problems using technology supported by AWS as well as AWS-native solutions. These challenges and their solutions will be grounded in a recent project exploring 3D pose estimation in videos. I will share the trajectory of this work, from unconnected code repositories to an on-prem service and finally to a scalable service hosted on AWS. I will document specific solutions to the problems encountered and draw out broader takeaways to help interested community members avoid common pitfalls in hosting machine learning solutions.
AWS Services: S3, EC2, Fargate, Gateway, Lambda, ECR, ECS
Audience: Beginner