Computer Science, Control and Geoinformation Doctorate

Seminar on September 30, 2021

Title

New vision architectures beyond CNNs

Speaker

Dr. Alexander Kolesnikov, Google Brain

When and Where

September 30, 2021, 17:00-18:00
Online; for registration, see the IARAI website

Abstract

Convolutional Neural Networks (CNNs) have dominated the field of computer vision for almost a decade. In this talk I will present two recent papers that propose new and highly competitive architecture classes for computer vision. In the first part, I will present the Vision Transformer model (ViT), which is almost identical to the standard transformer model used in natural language processing, yet happens to work surprisingly well for vision applications. In the second part of the talk, I will present the MLP-Mixer model: an all-MLP architecture for vision. It can be seen as a simplified ViT model without self-attention layers. Nevertheless, it also demonstrates strong results across a wide range of vision applications.
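As a rough illustration of the ideas mentioned above, the JAX sketch below shows the patch embedding step shared by ViT and MLP-Mixer, followed by a single Mixer block in which a token-mixing MLP (applied across patches) takes the place of self-attention and a channel-mixing MLP acts on each patch independently. This is a minimal sketch for intuition only; all layer sizes, parameter names, and initializations are illustrative and are not taken from the papers.

import jax.numpy as jnp
from jax import random
from jax.nn import gelu

def to_patches(image, patch=16):
    # Split an (H, W, C) image into flattened patches: the shared first
    # step of both ViT and MLP-Mixer.
    h, w, c = image.shape
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * c)           # (num_patches, patch*patch*C)

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP with GELU, applied along the last axis.
    return gelu(x @ w1 + b1) @ w2 + b2

def mixer_block(tokens, p):
    # Token-mixing MLP across the patch dimension (via transpose) replaces
    # self-attention; the channel-mixing MLP then acts per patch.
    tokens = tokens + mlp(tokens.T, *p["token"]).T    # mix across patches
    tokens = tokens + mlp(tokens, *p["channel"])      # mix across channels
    return tokens

# Illustrative sizes (not the paper's): a 32x32 RGB image, 8x8 patches, width 64.
key = random.PRNGKey(0)
patches = to_patches(random.normal(key, (32, 32, 3)), patch=8)   # (16, 192)
S, D, hid = patches.shape[0], 64, 128

def dense(k, d_in, d_out):
    # Random weight matrix and zero bias for one linear layer.
    return random.normal(k, (d_in, d_out)) * 0.02, jnp.zeros(d_out)

keys = random.split(key, 5)
embed_w, embed_b = dense(keys[0], patches.shape[1], D)
params = {
    "token":   (*dense(keys[1], S, hid), *dense(keys[2], hid, S)),
    "channel": (*dense(keys[3], D, hid), *dense(keys[4], hid, D)),
}
tokens = patches @ embed_w + embed_b                  # per-patch linear embedding
out = mixer_block(tokens, params)
print(out.shape)                                      # (16, 64)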

Speaker’s Short Bio

Dr. Alexander Kolesnikov is a researcher in the Google Brain team. His current research interests include visual representation learning and data-efficient adaptation algorithms. Previously, Alexander obtained an MSc degree in applied mathematics and programming from Moscow State University and a PhD degree in computer science from IST Austria.