Toward a Collaborative Approach for Future Media
The AOMedia Research Workshop offers a platform for AOMedia member companies and prominent academics to present their most recent research findings and explore novel concepts in media codecs, media processing, and their industry applications.
Building on the success of our first workshop, co-located with QoMEX 2023, the AOM Research Workshop Europe 2025 aims to further cultivate collaboration between AOMedia member companies and the international academic community. This continued interaction between media technology and quality of experience experts will foster new perspectives within both fields, ultimately driving the evolution of future media standards and formats.
This year’s program will feature speakers from both AOM member companies and academia. AOM members will be encouraged to discuss open problems that invite academic contributions, while academic speakers will be invited to highlight the research questions they believe will have a long-lasting impact on the industry.
Participating organizations include Amazon, Google (YouTube), Meta, Netflix, Samsung, and Trinity College Dublin.
The workshop will consist of 20-minute presentations followed by a panel discussion. The schedule is as follows:
- 9:00-9:10h – Welcome & introduction
- 9:10-9:30h – Introduction to IAMF Technology, Content Creation and Playback (Jani Huoponen, YouTube & Woohyun Nam, Samsung)
- 9:30-9:50h – Speed Meets Quality: How AWS Elemental Built a Broadcast-Quality Live AV1 Encoder (Ramzi Khsib, Amazon AWS)
- 9:50-10:10h – AV1 Film Grain Synthesis at Netflix Scale, and What’s Next (Li-Heng Chen, Netflix)
- 10:10-10:30h – Beyond Streaming: Applying AV1 to Cinema-Grade Virtual Production (François Pitié & Vibhoothi, Trinity College Dublin)
- 10:30-10:50h – Coffee break
- 10:50-11:10h – AOMedia Video Compression Research Beyond AV1: AVM Video Codec Architecture (Andrey Norkin, AOM Coding Working Group & Netflix)
- 11:10-11:30h – Latest AVM Coding Gain Result (Ryan Lei, Meta)
- 11:30-11:50h – CSFs and VDPs – Psychophysical Models for Optimizing Media (Rafał Mantiuk, University of Cambridge)
- 11:50-12:10h – Learned Image Compression – Recent Achievements and Opportunities (Jona Ballé, New York University)
- 12:10-12:40h – Panel Discussion: What Should We Innovate On for Future Media? Open and Important Problems (Jona Ballé, Rafał Mantiuk, Patrick Le Callet, Ramzi Khsib; moderated by Zhi Li)
- 12:40-14:00h – Lunch
Organizers: Zhi Li (Netflix), Ioannis Katsavounidis (Meta)
Please reach out to zli@netflix.com for questions and general inquiries.
Presentation titles and abstracts
Each entry below lists the presentation title, speakers, abstract, and speaker bios.
Introduction to IAMF Technology, Content Creation and Playback
Speakers: Jani Huoponen (YouTube) & Woohyun Nam (Samsung)
Abstract: IAMF (Immersive Audio Model & Formats) is an audio container specification designed to revolutionize immersive audio experiences across a wide range of applications. This presentation gives a technical overview of IAMF’s features and of content creation with open-source tools. We will also provide an overview of how IAMF enables enhanced spatial audio effects on TVs and mobile devices, focusing on its support for dynamic down-mixing technology that adapts spatial rendering based on content genre and object audio distribution.
Bio: Woohyun Nam – With nearly two decades of experience in media technology, Woohyun Nam specializes in AI-driven signal processing across image, video, and audio domains. Since joining Samsung Electronics in 2013, he has led numerous innovations bridging intelligent media and immersive user experiences. As Head of Spatial Audio at Samsung Research, Woohyun drives the development and standardization of Eclipsa Audio, a next-generation spatial audio technology powering lifelike sound across Samsung’s products and services. He also represents Samsung in AOMedia, contributing to the STF WG, the IAMF subgroup, and the Audio Codec WG. His work is guided by a clear vision: realizing the philosophy of Spatial Audio for Everyone.
Bio: Jani Huoponen – With more than 25 years in media industry product development, Jani Huoponen is a seasoned expert in developing cutting-edge audio and video technologies for consumer devices and streaming systems. Since joining Google in 2010, he has served as a product manager across key multimedia initiatives within Chrome, YouTube, and Stadia. Currently, Jani is the product lead for Eclipsa Audio on the Chrome Open Media team, driving significant advancements in immersive audio technology. He also plays a crucial role in the AOMedia Audio Codec WG as the renderer subgroup lead and is an active contributor to the AOMedia IAMF sub-WG, focusing on the next generation of immersive audio.
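For readers unfamiliar with down-mixing, the sketch below shows a conventional static 5.1-to-stereo down-mix with ITU-style -3 dB gains for the centre and surround channels. It is purely illustrative; the dynamic, content-adaptive down-mixing described in the abstract is what IAMF adds on top of this kind of fixed baseline.

```python
import numpy as np

# Conventional static 5.1 -> stereo down-mix with ITU-style -3 dB (0.707) gains
# for the centre and surround channels. Channel order assumed: [L, R, C, LFE, Ls, Rs].
# IAMF's dynamic down-mixing adapts such gains to the content; this sketch does not.
DOWNMIX = np.array([
    # L    R    C      LFE  Ls     Rs
    [1.0, 0.0, 0.707, 0.0, 0.707, 0.000],  # stereo left
    [0.0, 1.0, 0.707, 0.0, 0.000, 0.707],  # stereo right
])

def downmix_51_to_stereo(pcm_51: np.ndarray) -> np.ndarray:
    """pcm_51: float array of shape (num_samples, 6) -> (num_samples, 2)."""
    stereo = pcm_51 @ DOWNMIX.T
    return np.clip(stereo, -1.0, 1.0)  # guard against clipping after summation
```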
Speed Meets Quality: How AWS Elemental Built a Broadcast-Quality Live AV1 Encoder
Speaker: Ramzi Khsib (Amazon AWS)
Abstract: In this talk, we’ll explore how AWS Elemental doubled the frame rate of an existing open-source AV1 implementation to meet broadcast requirements for real-time encoding, including low latency, high throughput, HRD conformance, and superior perceptual quality. We’ll detail both compute-oriented techniques, such as a novel frame-parallelization approach that maintains deterministic output, and algorithmic changes, including tile-aware deblocking/CDEF filters and HEVC-inspired quantization improvements that preserve texture detail. The end result is a production-ready live AV1 encoder that delivers HEVC-like visual quality with 10-15% bitrate savings and significantly lower latency than existing solutions. We will also share recommendations for future open-source codec development to better support real-time applications from the outset, including considerations for parallelization strategies, rate-control architecture, and quality tuning options.
Bio: Ramzi Khsib (He/Him) is a Principal Software Development Engineer at AWS Elemental, where he leads compression and science initiatives. With over two decades of expertise in video processing and compression, Ramzi has been recognized with five Technology & Engineering Emmy Awards for his groundbreaking contributions to the field. As the lead architect for video compression and applied science at AWS Elemental, Ramzi has been instrumental in establishing the company as a global leader in video quality. His technical expertise spans pixel processing, video filtering, perceptual quality enhancement, and machine learning applications in video technology. Ramzi’s research focuses on the intersection of video compression, computer vision, and machine learning, with a particular emphasis on achieving superior compression efficiency while minimizing computational costs. His innovative approaches have helped shape the future of video delivery at scale.
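As background on the HRD conformance requirement mentioned in the abstract, the following is a minimal leaky-bucket buffer simulation of the kind a live encoder’s rate control must satisfy. It is an illustrative sketch under simplified assumptions (constant channel rate, instantaneous frame removal), not AWS Elemental’s implementation.

```python
def hrd_conforms(frame_bits, fps, bitrate_bps, buffer_bits, initial_bits):
    """Minimal CBR-style leaky-bucket check: bits arrive at the channel rate
    and each coded frame is removed from the buffer in one step.

    Returns True if no decoder-buffer underflow occurs. Illustrative only;
    real HRD models also track overflow rules, timing offsets, and VBR modes.
    """
    fullness = float(initial_bits)
    per_frame_fill = bitrate_bps / fps
    for bits in frame_bits:
        fullness = min(fullness + per_frame_fill, buffer_bits)  # fill, capped at buffer size
        fullness -= bits                                        # decoder drains one frame
        if fullness < 0:
            return False  # underflow: the frame was not fully delivered in time
    return True

# Example: a 5 Mb/s stream with a 1-second buffer that starts half full.
sizes = [200_000, 80_000, 85_000, 90_000, 300_000, 80_000]
print(hrd_conforms(sizes, fps=30, bitrate_bps=5_000_000,
                   buffer_bits=5_000_000, initial_bits=2_500_000))
```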
AV1 Film Grain Synthesis at Netflix Scale, and What’s Next
Speaker: Li-Heng Chen (Netflix)
Abstract: Film grain is a crucial artistic element, yet its stochastic nature poses significant challenges for efficient video compression. The AV1 codec’s film grain synthesis (FGS) tool offers a solution, preserving artistic intent while yielding substantial bitrate savings. This presentation will detail the unique complexities and pipeline solutions involved in deploying and optimizing AV1 film grain synthesis across Netflix’s massive content library. We’ll discuss the methodologies developed to integrate FGS into a large-scale production pipeline, demonstrating its quantifiable benefits in terms of both bitrate efficiency and faithful preservation of creative intent. Finally, we’ll explore key operational insights gained from this deployment and discuss the future evolution of film grain preservation in next-generation video coding standards.
Bio: Li-Heng Chen is a Software Engineer on the Video Algorithms team at Netflix, contributing to the video encoding pipeline. He completed his Ph.D. in 2022 at The University of Texas at Austin’s Laboratory for Image and Video Engineering (LIVE), following his B.S. and M.S. degrees in electronics and communication engineering from National Chiao Tung University (2012) and National Taiwan University (2014). Before his doctoral studies, he was a Senior Engineer on the Video Encoder Team at MediaTek Inc. (2014-2018). His research background encompasses perceptual image and video quality assessment, image and video compression, and machine learning.
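To make the FGS idea concrete: in AV1 the decoder regenerates grain from a small set of signalled parameters rather than receiving the grain as pixels. The sketch below is a heavily simplified rendition of that idea (a causal autoregressive noise field, scaled by a luma-dependent lookup table and added back to the decoded frame); the actual AV1 grain model has additional structure, such as chroma handling, block-wise template placement, and overlap, and its parameters come from the bitstream rather than the placeholders used here.

```python
import numpy as np

def synthesize_grain(height, width, ar_coeffs, lag=2, noise_std=1.0, seed=0):
    """Simplified autoregressive grain field in the spirit of AV1 FGS.

    A causal AR filter runs over white Gaussian noise. In a real AV1
    bitstream the lag, AR coefficients, and scaling LUT are signalled by
    the encoder; the values used here are placeholders.
    """
    rng = np.random.default_rng(seed)
    pad = lag
    g = rng.normal(0.0, noise_std, size=(height + pad, width + 2 * pad))
    # Causal neighbourhood: rows above, plus already-generated pixels on this row.
    taps = [(dy, dx) for dy in range(-lag, 1) for dx in range(-lag, lag + 1)
            if dy < 0 or dx < 0]
    assert len(ar_coeffs) == len(taps)
    for y in range(pad, height + pad):
        for x in range(pad, width + pad):
            g[y, x] += sum(c * g[y + dy, x + dx]
                           for c, (dy, dx) in zip(ar_coeffs, taps))
    return g[pad:, pad:width + pad]

def add_grain(decoded_luma, grain, scaling_lut):
    """Scale grain by a 256-entry luma-dependent lookup table and add it back."""
    scale = scaling_lut[np.clip(decoded_luma, 0, 255).astype(int)]
    return np.clip(decoded_luma + scale * grain, 0, 255)
```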
Beyond Streaming: Applying AV1 to Cinema-Grade Virtual Production
Speakers: François Pitié & Vibhoothi (Trinity College Dublin)
Abstract: While the playbook for video compression in internet streaming is now well established, an underexplored area for video compression lies within cinema production. In particular, On-Set Virtual Production (OSVP), a video production technique popularised by “The Mandalorian”, is a different beast, crying out for a modern compression solution. OSVP has revolutionised filmmaking by replacing traditional green screens with massive LED walls that display photorealistic backdrops. These digital environments must appear flawless when captured by the camera during filming. Because cinema professionals are notoriously allergic to compression, the industry defaults to near-lossless codecs like HAP or NotchLC, resulting in a huge bitrate of 2-3 Gb/s for UHD content. This demands complex, costly infrastructure with a high energy footprint.
This talk shows that AV1 could be very effective in this application, presenting a clear case for its adoption in high-end post-production workflows. Our core investigation started with a simple question: can modern codecs match the quality of industry-standard intermediate codecs when captured in-camera? The answer is a clear yes. Our experiments show that AV1 delivers the same perceptual quality as HAP but with a 24x bitrate reduction. However, these experiments also revealed the real barrier to adoption: in a cinema environment where quality is king, you must hit your VMAF target perfectly on the first try. Standard methods for quality assurance are simply far too slow for on-set workflows, making producers understandably hesitant to switch.
To make AV1 a truly viable and reliable option for VP, we developed a novel, lightweight QP prediction model. This tool allows an encoder to hit a specific quality target efficiently, removing the guesswork and risk that has held back the adoption of modern codecs. In this talk, I’ll walk you through our case for AV1, from the initial quality validation to the development of this enabling technology. You’ll see a practical path for adopting advanced, energy-efficient compression in the demanding world of Virtual Production.
Bio: François Pitié is an Assistant Professor at Trinity College Dublin. He has 100+ publications with 2000+ citations in video processing and computer vision. He has secured €1.5M+ in funding, received a Google Faculty Research Award, holds several patents, and has developed algorithms used worldwide by Google, Foundry, and Weta Digital.
Bio: Vibhoothi is a PhD student at Trinity College Dublin (Sigmedia) researching optimized transcoding and HDR compression. He serves on the technical review committees of multiple IEEE/ACM journals and conferences. He is currently co-chair of the Extended Colour-Format Focus Group in AOMedia for the next-generation AOM Video Model (AVM) codec. He is also a member of the VideoLAN non-profit organization and the Xiph.Org Foundation.
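The QP prediction idea mentioned above can be framed, in its simplest form, as a regression from cheap content features plus a target quality to the quantizer expected to reach that quality. The sketch below illustrates only that general framing; the features, model form, and training setup are placeholders and not the model presented in the talk.

```python
import numpy as np
from sklearn.linear_model import Ridge

def content_features(frames):
    """Cheap complexity proxies (illustrative stand-ins, not the talk's features)."""
    f = frames.astype(np.float32)
    spatial = float(np.mean([np.std(np.diff(frame, axis=1)) for frame in f]))   # texture proxy
    temporal = float(np.mean(np.abs(np.diff(f, axis=0)))) if len(f) > 1 else 0.0  # motion proxy
    return np.array([spatial, temporal, float(f.mean())])

class QPPredictor:
    """Regression from (content features, target VMAF) to a QP, trained on past encodes."""

    def __init__(self):
        self.model = Ridge(alpha=1.0)

    def fit(self, feature_rows, target_vmaf, measured_qp):
        X = np.column_stack([feature_rows, target_vmaf])
        self.model.fit(X, measured_qp)

    def predict_qp(self, features, target_vmaf):
        x = np.concatenate([features, [target_vmaf]])[None, :]
        qp = round(float(self.model.predict(x)[0]))
        return int(np.clip(qp, 0, 63))  # clamp to libaom's cq-level range
```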
AOMedia Video Compression Research Beyond AV1: AVM Video Codec Architecture
Speaker: Andrey Norkin (AOM Coding Working Group & Netflix)
Abstract: After the successful finalization of the AV1 video codec, the Alliance for Open Media (AOMedia) has embarked on advancing video compression technologies beyond AV1. This ongoing collaborative effort is based on a shared reference software known as the AOMedia Video Model (AVM). Thanks to the efforts of AOMedia participants to improve AVM’s performance over AV1, the latest results show a significant improvement across a diverse set of video sequences. The presentation will cover the main architectural and coding-tool changes that bring these improvements in codec performance. At present, AVM employs the hybrid video codec architecture that has been instrumental in the development of video codecs for several decades. Nevertheless, a number of important changes relative to the AV1 architecture have made AVM’s enhanced performance possible; these modifications and their impact on performance will also be part of the presentation.
Bio: Andrey Norkin is a research scientist at Netflix, working on video compression algorithms, encoding techniques for OTT video streaming, and High Dynamic Range (HDR) video. Andrey has been actively contributing to the Alliance for Open Media (AOM), where he is a co-chair of the Video Codec Working Group developing new video codec technologies. He previously contributed to the development of other video codecs and standards, such as AV1 and HEVC, and worked on video encoders for broadcasting during his previous job at Ericsson. Andrey holds a Doctor of Science degree from Tampere University of Technology.
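For readers new to the term, “hybrid” here refers to combining prediction with transform coding of the residual inside a reconstruction loop. The sketch below shows that generic loop for a single block, with every stage passed in as a placeholder function; it is a conceptual illustration only, not AVM’s partitioning, transforms, or entropy coding.

```python
import numpy as np

def encode_block_hybrid(block, predict, transform, quantize,
                        dequantize, inverse_transform, entropy_encode):
    """Generic hybrid coding loop for one block (conceptual sketch only).

    All stages are injected as placeholder callables; a real codec such as
    AVM adds block partitioning, many prediction modes, multiple transform
    types, in-loop filtering, and arithmetic entropy coding.
    """
    prediction = predict(block)                  # intra or inter prediction
    residual = block - prediction
    coeffs = transform(residual)                 # e.g. a DCT-like transform
    levels = quantize(coeffs)                    # the lossy step
    bitstream = entropy_encode(levels)
    # Reconstruct exactly as the decoder would, so encoder and decoder
    # predict from the same reference pixels.
    reconstruction = prediction + inverse_transform(dequantize(levels))
    return bitstream, np.asarray(reconstruction)
```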
Latest AVM Coding Gain Result
Speaker: Ryan Lei (Meta)
Abstract: This talk will give an overview of the AVM common test condition design, the coding-gain progress across past AVM releases, and the latest coding-gain and subjective evaluation results for AVM.
Bio: Dr. Ryan Lei is currently working as a Tech Lead and video codec specialist in the Media Foundation Core Video team at Meta. His focus is on algorithms and architecture for optimizing cloud-based video processing, transcoding, and delivery at large scale for various Meta products. Ryan is also the co-chair of the Alliance for Open Media (AOM) testing subgroup and is actively contributing to the standardization of AV1 and AVM. Before joining Meta, Ryan worked at Intel as a principal engineer and codec architect, where he worked on algorithm implementation and architecture definition for multiple generations of hardware-based video codecs, such as AVC, VP9, HEVC, and AV1. Ryan received his Ph.D. in Computer Science from the University of Ottawa. His research interests include image/video processing, compression, and parallel computing.
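Coding-gain numbers of this kind are usually reported as Bjøntegaard delta rate (BD-rate) over a common test set. The following is a minimal, textbook-style BD-rate computation with illustrative numbers; the exact scripts and metrics used for AVM common-test-condition reporting may differ.

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjøntegaard delta rate (%) of a test codec against an anchor.

    Fits log-rate as a cubic polynomial of quality (e.g. PSNR) and
    integrates the gap over the overlapping quality range. A negative
    value means the test codec needs less bitrate for the same quality.
    """
    log_ra, log_rt = np.log(rate_anchor), np.log(rate_test)
    poly_a = np.polyfit(quality_anchor, log_ra, 3)
    poly_t = np.polyfit(quality_test, log_rt, 3)
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Illustrative numbers only (four rate points per codec, PSNR in dB):
# the test codec reaches the same PSNR at 70% of the anchor rate, so BD-rate is about -30%.
print(bd_rate([1000, 2000, 4000, 8000], [34.0, 36.5, 39.0, 41.5],
              [700, 1400, 2800, 5600], [34.0, 36.5, 39.0, 41.5]))
```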
CSFs and VDPs – Psychophysical Models for Optimizing Media
Speaker: Rafał Mantiuk (University of Cambridge)
Abstract: The CSF (Contrast Sensitivity Function) explains when patterns are visible to the human eye. VDPs (Visual Difference Predictors) build on CSFs to predict the visibility of differences (or quality degradation) between a pair of images. I will briefly introduce two such models, castleCSF and ColorVideoVDP, and demonstrate how they can be applied to optimize performance in a range of media applications.
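As a tiny worked example of what a CSF provides, the sketch below uses the classic Mannos-Sakrison (1974) CSF approximation, scaled by an assumed peak sensitivity, to decide whether a grating of a given contrast and spatial frequency is above the visibility threshold. This is a simple stand-in for illustration only; castleCSF, discussed in the talk, additionally models luminance, colour direction, stimulus size, and temporal frequency.

```python
import numpy as np

PEAK_SENSITIVITY = 200.0  # assumed photopic peak sensitivity; a rough placeholder value

def csf_mannos_sakrison(f_cpd):
    """Normalized Mannos-Sakrison (1974) CSF as a function of spatial frequency
    in cycles per degree; peaks at roughly 1.0 near 8 cyc/deg."""
    f = np.asarray(f_cpd, dtype=float)
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def is_visible(contrast, f_cpd):
    """A grating is (roughly) visible when its contrast exceeds the reciprocal
    of the contrast sensitivity at that frequency."""
    threshold = 1.0 / (PEAK_SENSITIVITY * csf_mannos_sakrison(f_cpd))
    return contrast > threshold

print(is_visible(0.01, 4.0))   # True: mid frequencies are easy to see at 1% contrast
print(is_visible(0.01, 30.0))  # False: sensitivity drops sharply at high frequencies
```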
Learned Image Compression – Recent Achievements and Opportunities
Speaker: Jona Ballé (New York University)
Abstract: Since its emergence roughly eight years ago, the field of learned image compression has attracted considerable attention from the machine learning, information theory, and signal processing communities. Data-driven coding promises better adaptation to novel types of media as well as to improved models of human perception. In this talk, I will review some of the achievements so far, such as the nonlinear transform coding (NTC) framework, which has led to the JPEG AI standard. I will then shift my focus to perceptual modeling, a field of growing importance, and to the relatively novel area of overfitted compression. The latter has been gaining traction in 3D scene compression and is in some sense analogous to the hybrid coding framework found in many commercial codecs. I will demonstrate how overfitted codecs, together with powerful perceptual models, can deliver large gains in lossy compression performance with a relatively modest increase in decoding complexity over hybrid codecs.
Bio: Jona Ballé is an Associate Professor at New York University, studying data compression, information theory, and models of visual perception. They defended their master’s and doctoral theses on signal processing and image compression under the supervision of Jens-Rainer Ohm at RWTH Aachen University in 2007 and 2012, respectively. This was followed by a brief collaboration with Javier Portilla at CSIC in Madrid, Spain, and a postdoctoral fellowship at New York University’s Center for Neural Science with Eero P. Simoncelli, where Jona studied the relationship between perception and image statistics. While there, they pioneered the use of machine learning for end-to-end optimized image compression; this work ultimately led to the JPEG AI standard, finalized in 2025. From 2017 to 2024, Jona deepened their ties to industry as a Research Scientist at Google before returning to NYU. Jona has served as a reviewer for top-tier publications in both machine learning and image processing, such as NeurIPS, ICLR, ICML, the Picture Coding Symposium, and several IEEE Transactions journals. They have been a co-organizer of the annual Challenge on Learned Image Compression (CLIC) since 2018 and have served on the program committee of the Data Compression Conference (DCC) since 2022.
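To give a flavour of the NTC framework mentioned above, the sketch below is a deliberately tiny PyTorch model trained with the usual rate-distortion objective: an analysis transform, a uniform-noise quantization proxy, a very crude Gaussian entropy model, and a synthesis transform. It is a simplified illustration of the general idea only, with placeholder layer sizes and a cruder rate surrogate than the box-convolved likelihoods used in the literature; it is not the JPEG AI design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNTC(nn.Module):
    """Minimal nonlinear transform coding sketch (illustrative only)."""

    def __init__(self, ch=64):
        super().__init__()
        self.g_a = nn.Sequential(                      # analysis transform
            nn.Conv2d(3, ch, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2),
        )
        self.g_s = nn.Sequential(                      # synthesis transform
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1), nn.GELU(),
            nn.ConvTranspose2d(ch, 3, 5, stride=2, padding=2, output_padding=1),
        )
        self.log_scale = nn.Parameter(torch.zeros(ch))  # per-channel prior scale

    def forward(self, x, lmbda=0.01):
        y = self.g_a(x)
        y_hat = y + torch.empty_like(y).uniform_(-0.5, 0.5)   # quantization proxy during training
        x_hat = self.g_s(y_hat)
        # Rate surrogate: -log2 of a zero-mean Gaussian density, in bits per pixel
        # (cruder than the box-convolved likelihood used in the literature).
        scale = self.log_scale.exp().view(1, -1, 1, 1)
        nats = -torch.distributions.Normal(0.0, scale).log_prob(y_hat).sum()
        bpp = nats / torch.log(torch.tensor(2.0)) / (x.shape[0] * x.shape[2] * x.shape[3])
        mse = F.mse_loss(x_hat, x)
        return bpp + lmbda * mse, x_hat, bpp, mse       # lmbda sets the rate-distortion trade-off

# One optimization step on a random batch, just to show the shape of the training loop.
model = TinyNTC()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss, *_ = model(torch.rand(2, 3, 64, 64))
loss.backward()
opt.step()
```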