Seminar on September 30, 2021
Title
New vision architectures beyond CNNs
Speaker
Dr. Alexander Kolesnikov, Google Brain
When and Where
September 30, 2021, 17:00-18:00
Online; for registration, see the IARAI website
Abstract
Convolutional Neural Networks (CNNs) have been dominating the field of computer vision for almost a decade. In this talk I will present two recent papers that propose new and highly competitive architecture classes for computer vision. In the first part I will present the Vision Transformer model (ViT), which is almost identical to the standard transformer model used in natural language processing, but happens to work surprisingly well for vision applications. In the second part of the talk, I will present the MLP-Mixer model: an all-MLP architecture for vision. It can be seen as a simplified ViT model without self-attention layers. Nevertheless, it also demonstrates strong results across a wide range of vision applications.
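To illustrate the idea described in the abstract, the following is a minimal sketch (not the speaker's implementation) of a single MLP-Mixer block: a ViT-like model in which self-attention is replaced by a plain MLP that mixes information across image patches. Layer normalization is omitted for brevity, and all layer sizes are illustrative assumptions.

```python
# Minimal, illustrative MLP-Mixer block in NumPy (layer norm omitted).
# A Mixer block alternates a "token-mixing" MLP (across patches) with a
# "channel-mixing" MLP (across features), each with a residual connection.
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer MLP with a tanh-approximated GELU nonlinearity."""
    h = x @ w1 + b1
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2 + b2

def mixer_block(x, params):
    """x: (num_patches, channels) patch embeddings for one image."""
    # Token mixing: transpose so the MLP acts along the patch dimension.
    y = x + mlp(x.T, *params["token"]).T
    # Channel mixing: a standard per-patch MLP along the channel dimension.
    return y + mlp(y, *params["channel"])

# Illustrative sizes: 196 patches (14x14 grid), 128 channels, hidden widths 64/256.
rng = np.random.default_rng(0)
P, C, Dt, Dc = 196, 128, 64, 256
params = {
    "token":   (rng.normal(0, 0.02, (P, Dt)), np.zeros(Dt),
                rng.normal(0, 0.02, (Dt, P)), np.zeros(P)),
    "channel": (rng.normal(0, 0.02, (C, Dc)), np.zeros(Dc),
                rng.normal(0, 0.02, (Dc, C)), np.zeros(C)),
}
x = rng.normal(size=(P, C))
print(mixer_block(x, params).shape)  # (196, 128)
```

Dropping the token-mixing MLP and inserting self-attention in its place recovers the overall structure of a ViT encoder block, which is why the abstract describes MLP-Mixer as a simplified ViT without self-attention layers.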
Speaker’s Short Bio
Dr. Alexander Kolesnikov is a researcher in the Google Brain team. His current research interests include visual representation learning and data-efficient adaptation algorithms. Previously, Alexander obtained an MSc degree in applied mathematics and programming from Moscow State University and a PhD degree in computer science from IST Austria.