14:45 - 15:15
In this session, we will explore how to achieve state-of-the-art inference with TensorRT-LLM and NIM. Attendees will dive into advanced TensorRT-LLM features, such as in-flight batching and KV caching, designed to accelerate large-scale LLM production systems. We’ll also review the unique challenges of LLM inference, including high computational demands and strict latency and throughput requirements. Discover how TensorRT-LLM optimizes LLM performance on NVIDIA GPUs and integrates seamlessly with NIM, an easy-to-use inference microservice that accelerates the deployment of foundation models across any cloud platform.
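As a taste of the session's topics, here is a minimal sketch using TensorRT-LLM's high-level Python API, assuming a recent release that ships the `LLM` entry point; the model checkpoint and KV-cache settings below are illustrative assumptions, not values from the talk.

```python
# Minimal sketch, assuming a recent TensorRT-LLM release with the
# high-level Python "LLM" API. Model name and settings are illustrative.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import KvCacheConfig

llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example HF checkpoint
    # Paged KV cache: cap the fraction of free GPU memory used for cache
    # and reuse cache blocks shared across requests (e.g. a common prefix).
    kv_cache_config=KvCacheConfig(
        free_gpu_memory_fraction=0.85,
        enable_block_reuse=True,
    ),
)

# In-flight (continuous) batching is handled by the runtime scheduler:
# requests join and leave the batch per decoding step instead of being
# padded into one fixed-shape batch.
prompts = [
    "What is in-flight batching?",
    "Why does a KV cache speed up decoding?",
]
outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
for out in outputs:
    print(out.outputs[0].text)
```

Once packaged as a NIM, the same model is typically served behind an OpenAI-compatible HTTP endpoint, so client code reduces to a standard chat-completions request.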
Assaf joined NVIDIA in 2021 as a Senior AI Solutions Architect, where he helps Israel’s leading organizations and startups build and implement AI technologies in fields such as vision, simulation, natural language processing, and healthcare. Assaf holds a B.Sc. and M.Sc. in Information Systems Engineering from Ben-Gurion University of the Negev. His thesis focused on image processing and time-series analysis of live 3D microscopy images of cells to study their unique communication patterns. Recently, he has been concentrating on generative AI, NLP applications, and the metaverse.