How to Run Production LLMs Efficiently on GPUs
December 17, 2024
14:45 – 15:15
Hall B
English | Advanced | Accelerating LLM Inference with TensorRT and NIM

In this session, we will explore how to achieve state-of-the-art inference performance using TensorRT-LLM and NIM. Attendees will dive into advanced TensorRT-LLM features, such as in-flight batching and KV caching, designed to accelerate large-scale LLM production systems. We’ll also review the unique challenges of LLM inference, including high computational demands, latency, and throughput. Discover how TensorRT-LLM optimizes LLM performance on NVIDIA GPUs and integrates seamlessly with NIM, an easy-to-use inference microservice that accelerates the deployment of foundation models on any cloud platform.
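For readers who want a concrete starting point before the session, the snippet below is a minimal sketch (not part of the session materials) of offline generation with TensorRT-LLM’s high-level Python LLM API; the model checkpoint and sampling values are illustrative assumptions. When several prompts are submitted together, the runtime handles scheduling, in-flight batching, and KV-cache management internally.

# Minimal sketch, assuming a recent TensorRT-LLM release that ships the
# high-level LLM API. The checkpoint and sampling values are illustrative,
# not taken from the session.
from tensorrt_llm import LLM, SamplingParams

prompts = [
    "Explain KV caching in one sentence.",
    "Explain in-flight (continuous) batching in one sentence.",
]
sampling = SamplingParams(temperature=0.8, top_p=0.95)

# Builds (or loads) a TensorRT engine for the given Hugging Face checkpoint,
# then decodes the whole batch on the GPU.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)

In production, a deployed NIM typically exposes the same kind of model behind an OpenAI-compatible HTTP endpoint, so client code usually issues a POST to /v1/chat/completions rather than calling an in-process API.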

Assaf Nahum
Senior Solutions Architect
NVIDIA

Assaf joined NVIDIA in 2021 as a Senior AI Solutions Architect, where he helps Israel’s leading organizations and startups build and implement AI technologies in fields such as vision, simulation, natural language processing, and healthcare. Assaf holds a B.Sc. and an M.Sc. in Information Systems Engineering from Ben-Gurion University of the Negev. His thesis focused on image processing and time-series analysis of live 3D microscopy images of cells to study their unique communication patterns. Recently, he has been concentrating on generative AI, NLP applications, and the metaverse.

Cancellation Policy

Sponsor Cancellation:

In case of cancellation of the event, we will offer a full refund to all attendees and sponsors.

Attendee cancellations:

Up to 30 days prior to the event – 100% refund.
Between 14 and 30 days prior to the event – 50% refund.
No refund will be offered within 14 days of the event.