Hi there, my name is

Lucas Beyer

I'm a self-taught hacker and studied scientist dedicated to the creation of awesomeness,
currently living, working, loving and playing in Zürich, Switzerland.
Awesomeness (noun): helping computers and robots understand the world.


«Studies, the country's standard weapon that's hard to get to fire, so you educate yourself on your own.» - Alonzo

Things I've done

Academic

I'm currently a Staff Research Scientist at Google DeepMind (formerly Brain) in Zürich, co-leading our multimodal research effort and codebase.

I work closely with Xiaohua Zhai and Alexander Kolesnikov.

Publications

I have a growing list of publications at top-tier conferences such as CVPR, NeurIPS, ICCV, ... See my Google Scholar or Semantic Scholar pages for the full list of over 50. However, here are a few of my favourite publications that you may have heard of, with a one-sentence TL;DR:
  • PaliGemma: A small sota VLM (SigLIP + Gemma-2B) for transfer.
  • NoFilter: Don't filter image-text data; filtering makes the resulting model less culturally diverse.
  • CapPa: Captioning pre-training fixes all of contrastive pre-training's issues (like binding).
  • RL-tuning: Fine-tuning modern vision models with a short RL stage significantly improves them (concurrent to RLHF in NLP).
  • SigLIP: A sigmoid instead of a softmax for CLIP-style training is more scalable; we open-sourced the best vision encoder and image-text model. (A minimal sketch of the sigmoid loss follows this list.)
  • FlexiViT: Randomize patch-size during ViT training to have many model sizes in one.
  • Efficiency Misnomer: Don't plot only one of #params, FLOPs, or wall-clock. Show all three!
  • Scaling ViT: ViTs scale very well, sota on ImageNet, largest vision model by far at the time.
  • Patient and Consistent Distillation: Distilling for super long, with consistent inputs, works like magic: the best-ever ResNet50 with 83% on ImageNet.
  • MLP-Mixer: sota architecture based solely on MLPs. No attention, no convolution.
  • Vision Transformer (ViT): if you're here, you know this.
  • Are we done with ImageNet?: ImageNet ReaL labels: completely re-annotated validation-set labels, with multiple labels per image possible.
  • Big Transfer (BiT): Breakthrough in scaling vision models: you need to scale model, dataset, and duration "diagonally" (together) to reap all benefits.
  • S4L: Very simply combining self-sup learning with semi-sup leads to sota semi-sup models.
  • Revisiting SSL: Self Supervised Learning alone is kinda doomed. Everything is too brittle.
  • In Defense of the Triplet loss: Triplet loss is actually great for ReID.
  • Biternion Nets: Use a normalized 2D output layer and a von Mises loss to get great continuous orientation estimates from rough discrete labels.
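
Since the SigLIP TL;DR above is rather terse, here is the core idea of the pairwise sigmoid loss in a few lines of NumPy. This is a toy sketch rather than the released code; the function name, shapes, and constants are just for illustration.

import numpy as np

def sigmoid_pairwise_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    # img_emb, txt_emb: (N, D) L2-normalized embeddings of N matching (image, text) pairs.
    logits = t * img_emb @ txt_emb.T + b       # (N, N) scaled pairwise similarities
    labels = 2.0 * np.eye(len(img_emb)) - 1.0  # +1 on the diagonal (matches), -1 elsewhere
    # Every (image, text) pair is an independent binary classification problem;
    # -log sigmoid(label * logit) == log1p(exp(-label * logit)). Normalized per example.
    return np.sum(np.log1p(np.exp(-labels * logits))) / len(img_emb)

# Toy usage with random unit vectors.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8)); x /= np.linalg.norm(x, axis=1, keepdims=True)
y = rng.normal(size=(4, 8)); y /= np.linalg.norm(y, axis=1, keepdims=True)
print(sigmoid_pairwise_loss(x, y))

Unlike a softmax over the batch, nothing here needs a normalizer computed over all logits, which is what makes the loss easy to scale and to chunk across devices.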

Talks and lectures

I regularly give invited talks on anything vision and multimodal, and teach lectures on Transformers and multimodal models. Reach out if you'd like me to give a talk or lecture. In case you're curious, I keep a list of talks, lectures, and orals I've given.

Projects

  • ca 2021-now: Co-leading the multimodal (vision-language) research team and infra at Google Brain and DeepMind.
  • ca 2018-2020: Research on pre-training and architectures for vision models at Google Brain Zürich.
  • Early 2018: Writing up my PhD thesis while chilling (erm.. melting) in Bangkok, Thailand.
  • 2014-2017: Strands: A service robot for security and elderly care trying to learn long-term patterns.
  • 2014-2017: Spencer: A robot trying to navigate and understand airports.
  • Summer 2017: Disentangling representations of identity and incidentals in the features learned by FaceNet for improving various downstream prediction tasks at Google.
  • Fall 2016: Robots learning from human demonstration at Kindred.
  • Summer 2016: Successfully solved image-gaze (predict what people in an image are looking at) at Google. (Link for Googlers: go/image-gaze.)
  • Winter 15/16: DROW: A detector in laser-data, for robots, using CNNs. (Preprint, more to follow.)
  • Summer 2015: Biternion Nets: Weakly Supervised Continuous Head Pose Regression using CNNs. (Video, GCPR 2015 preprint, DLSS 2015 poster, code, dataset mirror.)
  • August 2013: Paper about "GWAS on GPUs: Streaming Data from HDD for Sustained Performance". (EuroPar 2013 preprint and slides.)
  • July 2012: Diploma thesis: Exploiting Graphics Adapters for Computational Biology. (Code)
  • June 2012: Talk about high-performance genome studies (GWAS) at SIAM conference. (Extended slides with intro to genetics and high performance computing.)
  • April 2011: Project work (German) about Data-driven modelling of protein-protein interaction or, more precisely, analysis of the docking behaviour of PARP10 and PARP14.

Short Bio

For copy-pasting into talk announcements and the like, since I've been asked for one quite frequently:

Lucas grew up in Belgium wanting to make video games and their AI. He went on to study mechanical engineering at RWTH Aachen in Germany, then did a PhD in robotic perception and computer vision there too. Now, he is a staff research scientist at Google DeepMind (formerly Brain) in Zürich, leading multimodal vision-language research.

And here's a picture, in case you need one.

Hobby

This is all very old, as most of my hobby time since my studies/PhD has gone into one of: time with my kid, more research, DOTA2, or blogging (see below).

Libraries

  • DeepFried2: a Torch7-inspired deep-learning library on top of Theano.
  • PyDenseCRF: Python wrapper to Philipp Krähenbühl's dense (fully connected) CRFs with Gaussian edge potentials.
  • Bouge: C++ skeletal animation library.
  • Go Colorful: A Go (golang) library for working with colors.
  • libheatmap: A high-performance C heatmap creation library. Has been used in at least 4 commercial products. Colorschemes.
  • PickerInputView: An iOS library for replacing the keyboard by a picker (selection wheel).
  • D3 BoundingBox: A D3.js component for making any element draggable and resizable. Demo.
  • CherryPy Spam Protector: A drop-in CherryPy tool protecting a handler from spam.
  • aTest: Lightweight C++ unit-testing framework. (Currently included in Arkana-FTS; to be released standalone. Go for Catch instead!)

Applications

  • Arkana-FTS: RTS game in the making; a big, slow, long-term project. Currently on ice.
  • Memory Which Does Not Suck: Offspring of a hackathon project, which Vladislav Supalov (aka th4t) and I keep polishing from time to time.
  • DotA-heroes: A web-app for choosing the right hero in DotA. (Note: The hero database is not complete yet. Try, for example, Magina the anti-mage.)
  • DotA-wards: A web-app teaching where to place wards in DotA. (Note: Database unfinished, I can't keep up with the game's map changes!)
  • My Ludum Dare 25 compo entry. Of course it sucks; I only had 24 hours to work on it due to real-life stuff. I'm glad people still had some short-lived fun with it.

Labs / microtools

  • 2DVis: Interactive visualization of image datasets embedded into 2D. Currently available demos show model-free (pixel) embeddings of the QMUL head-pose dataset using RGB t-SNE, RGB MDS, or Luv MDS.
    Outcome: t-SNE always shows you pretty structure, even when there really isn't any.
  • Quaternion converter: Convert quaternion wxyz into axis-angle, euler, and matrix representations.
  • QMUL most similar: My very first foray into detecting near-duplicates between the train and test splits of the QMUL dataset.
    Outcome: This dataset is too easy: most test images have near-duplicates in train, because it's a random split of frames from a video. This kicked off my obsession with detecting and removing near-duplicates.
  • Confusion Matrix Viewer that also (lazily) shows all images from an entry of the matrix.
    Outcome: I don't think I learned anything from this.
  • cprod1 viz: My very first attempt at visualizing a text (products) dataset. Nothing remarkable.
  • Hero distances: Visualize the distance between heroes over time throughout a DOTA2 game. This was fun to make: parsing replay files and projecting distances to 1D live with my FastMap.js, using two fixed pivot points: the ancients. (See the small sketch after this list.)
    Outcome: Just a fun visualization. I did not spend more time making it pretty. You can clearly see ganks and Pudge hooks.
  • timeupdate: A demo for tracking and syncing timestamps in an HTML/JS video player. This was the start of a video annotation tool.
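
For the curious: the live 1D projection in the hero-distances toy above is just the classic FastMap formula with two fixed pivots (the ancients). Here is a minimal Python sketch of that idea; the names are made up for illustration, the real thing lives in FastMap.js.

def fastmap_1d(d_ao, d_bo, d_ab):
    # Classic FastMap projection of object o onto the line through pivots a and b,
    # using only pairwise distances: x = (d(a,o)^2 + d(a,b)^2 - d(b,o)^2) / (2 d(a,b)).
    return (d_ao**2 + d_ab**2 - d_bo**2) / (2.0 * d_ab)

# Toy usage: the two ancients are the pivots; a hero's coordinate on the "ancient axis"
# follows from its distances to both ancients.
d_ab = 100.0                         # distance between the two ancients (map units)
print(fastmap_1d(30.0, 80.0, d_ab))  # a hero much closer to pivot a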

Contributions

I also try to contribute back to most of the open-source projects that I use.

Writing

Articles

Articles are lengthy write-ups meant to teach you something if you take the time to read them fully.

Snips

Snips are short, roughly screen-sized solutions to problems I encountered, which I share in the hope that they'll save someone time.

Curriculum Vitae

This is a modern CV: click on any section of my life to learn more about it, or expand all sections now.

However, it might be outdated. I also have a classical CV as a document, which you can download as a PDF in case you need one.

Education

PhD Student in Computer Vision

  • Subject: Deep Learning for Computer Vision on mobile robots, with a focus on low annotation effort.
  • Supervised by Prof. Dr. Bastian Leibe at the VCI, RWTH Aachen University.
  • Research projects: STRANDS and SPENCER.
  • Key technologies: Deep Learning in PyTorch, Theano, and TensorFlow; ROS (Python and C++); LaTeX; OpenCV.

PhD Student in High-Performance Computing

  • Subject: High-performance Density Functional Theory.
  • Jointly supervised by Prof. Paolo Bientinesi, PhD and Prof. Dr. Stefan Bluegel at the AICES, RWTH Aachen University and the PGI-1/IAS-1, Forschungszentrum Jülich.
  • The aim was to implement a flexible DFT simulation system which allows researchers to quickly prototype new simulations on their laptop and then run those on a supercomputer without much additional overhead.
  • I didn't finish this PhD and switched to the field of computer vision, simply because quantum physics wasn't fun for me.

Dipl.Ing. Student in Computational Engineering Science

  • At RWTH Aachen University (Germany)
  • Graduated as a Dipl.Ing. with a grade of 1.3.
  • Obtained a state scholarship in the years 2010-2011.
  • Project thesis: Data-based modelling of protein-protein interaction, graded 1.3.
    • This involved highly optimized brute-force search for patterns as well as optimization based on genetic-algorithms.
  • Final thesis: Exploiting Graphics Accelerators for Computational Biology, graded 1.0.
    • This was about solving a generalized least-squares problem (ANOVA) for huge amounts of data on the GPU.
    • This necessitated tricks such as triple-buffered processing and the use of multiple GPUs.

Schoolkid at Athénée César Franck (Belgium)

Employment History

Staff Research Scientist at Google Brain/DeepMind, Zürich, Switzerland

  • Fundamental research on representation learning, computer vision, architectures, ...
  • Many, many publications and otherwise impactful works.

Intern doing research at Google, Venice, Los Angeles

  • Disentangling representations learned by FaceNet to improve prediction tasks (go/2fn).

AI Intern at Kindred, Toronto

  • Worked on robots learning to autonomously solve a task demonstrated by a human.
  • Got my hands dirty with all of robotics, data gathering, and learning deep models.
  • Since the startup is very secretive, there's not much more to say.

Intern doing research at Google, Venice, Los Angeles

  • Worked on image-gaze: determining what people in an image are looking at.
  • Gathered and orchestrated annotation of a huge amount of data.
  • Designed, implemented and trained a complex deep network using TensorFlow.
  • Internally disseminated the results and pipeline for use in many applications.
  • Googlers can view the impressive results at go/image-gaze.

Student research assistant at AICES, RWTH Aachen University

  • Took care of a virtual reality 3D projector with glasses, sensors and 3D interaction devices.
  • Key technologies: C++, modern OpenGL, ZeroMQ, Python, Solaris.
  • The setup consisted of 8 workstations, each projecting a screen and controlled by a 9th master.
  • Wrote a library (from scratch) facilitating the use of plain OpenGL for virtual reality.
  • Got a headache whenever my math wasn't correct ☺
  • Wrote a multi-head video player.

Intern programmer at Mint medical GmbH, Heidelberg

  • Key technologies: C++, Qt, MITK (includes ITK and VTK), CMake.
  • Implemented a state-of-the-art segmentation algorithm for volumetric data such as CT and MRI.
  • Incorporated into the mint Lesion™ product.

Student research assistant at LFB, RWTH Aachen University

  • Computer-assisted diagnosis for early stage pleural mesothelioma.
  • Key technologies: C++, Qt, MITK (includes ITK and VTK), OpenCL.
  • Turned research prototypes into a robust user-friendly application for oncologists.
  • See the project's website, though at the time of writing the screenshots posted there are of the old research prototype, not of the end-user application!

Coach of the RWTH Aachen University ice-hockey team

  • Coached the University's competitive ice-hockey team.
  • Successfully managed up to 25 people.

Programmer at Digatron Power Electronics GmbH, Aachen

  • Worked on control systems for test and formation equipment for all kinds of batteries, ranging from batteries for mobile phones to automotive batteries to huge submarine batteries.
  • Key technologies: C++, MFC, RogueWave Stingray, .Net and MSSQL.
  • Designed and implemented an application from the ground up to the final deployment phase.
  • Designed and implemented new features in the core product.
  • Designed and implemented internal tooling.

Summer camp counselor, La Calamine

Awards

Skills and Qualifications

Programming Languages

  • Proficient in C++: more than 15 years of experience in open-source, industry and academia. In-depth knowledge of the STL, object-oriented concepts, design patterns, template metaprogramming, …
  • Proficient in Python: almost 10 years of experience in open-source and academia. Deep understanding of the LISP-y parts and, of course, duck-typing.
  • Very good knowledge of Julia, {Java|Coffee}script and HTML+CSS.
  • Past practical experience with Go, PHP, D, Clojure, Mathematica, Prolog, Matlab, C# and unfortunately Java and M[Sy]SQL.

Frameworks and Tools

These are tools, libraries and frameworks I have extensive practical experience using and actually enjoy(ed) using, grouped by domain. The many tools I've used only once or found painful are not listed.

  • Tooling: Vim, Git, SVN, CMake, Make, gcc, msvc++, valgrind, bash, fish, tmux, …
  • Gamedev: OpenGL (modern, GLSL), SDL, SFML, OpenAL, Löve, blender.
  • Number crunching: BLAS, LAPACK, OpenMP, CUDA, OpenCL, NumPy, Theano, TensorFlow, PyTorch.
  • Application development: Qt, MFC, .Net.
  • Webdev: Docker, CherryPy, MongoDB, MySQL (MariaDB), Nginx, Apache.
  • Most awesomest: Jupyter (ex IPython notebook).

Natural Languages

  • French and German: Mother tongues.
  • English: Proficient (RWTH Aachen University, CEFR level C1, Grade 1.7).
  • Dutch: Basic spoken and written knowledge.
  • Thai: Basic spoken knowledge.

Teaching

Teaching Assistant for "Advanced Machine Learning."

  • At RWTH Aachen University, Germany.
  • Employed by the Computer Vision research group, part of VCI.
  • Co-designed the deep-learning part from scratch.

Seminar Organizer for "Image Processing."

  • At RWTH Aachen University, Germany.
  • Employed by the Computer Vision research group, part of VCI.
  • Guided students through understanding and presenting the basics of image processing and computer vision.
  • With the help of Alexander Hermans.

Seminar Co-organizer for "3D Computer Vision with Kinect."

  • At RWTH Aachen University, Germany.
  • Employed by the Computer Vision research group, part of VCI.
  • Guided students through a wide variety of recent research papers about the usage of 3D cameras.
  • Together with Alexander Hermans and Umer Rafi.

Seminar Co-organizer for "Topics in High-Performance and Scientific Computing."

  • At RWTH Aachen University, Germany.
  • Employed by the HPAC research group, part of AICES.
  • Chose seminar topics and guided students together with Prof. Paolo Bientinesi, PhD.

Teaching Assistant for "Languages for Scientific Computing."

  • At RWTH Aachen University, Germany.
  • Employed by the HPAC research group, part of AICES.
  • Taught students about:
    • Prototyping numerical matrix-computations in MATLAB.
    • Symbolic, functional and logic programming in Mathematica.
    • Python and NumPy as an open alternative to MATLAB.
    • Number-crunching in C, including the low-level details of floating-point numbers.

Tutor for "Systematic Software Engineering."

  • At RWTH Aachen University, Germany.
  • Employed by the SWC research group.
  • Taught students how to design and implement robust and maintainable software.
  • Topics ranged from requirements analysis (use-case diagrams) to modelling UML class-diagrams to various kinds of testing, down to actually writing the code.

Tutor for "Simulation technology for mechanical engineers."

  • At RWTH Aachen University, Germany.
  • Employed by both the AVT and the CATS research groups.
  • Taught students to describe the dynamics of mechanical and flow systems.
  • Taught students to simulate those in both MATLAB and OpenFOAM.

Tutor for "Simulation technology 2."

  • At RWTH Aachen University, Germany.
  • Employed by the AVT research group.
  • Taught students to obtain partial differential equations describing the dynamics of physical systems.
  • Taught students to implement those PDEs in both MATLAB and Simulink.

Tutor for "Datastructures and algorithms."

  • At RWTH Aachen University, Germany.
  • Employed by the Computer Graphics and Multimedia research group.
  • Explained all kinds of, well, datastructures and algorithms to students.
  • Topics ranged from lists/trees/graphs to geometric algorithms like convex hulls and segment intersections.

Tutor for "Introduction to programming."

  • At RWTH Aachen University, Germany.
  • Employed by the Programming Languages and Verification research group.
  • Taught programming to students using Java☹ and C++.
  • Topics covered basics as well as object-oriented and functional programming and recursion.

Contact

You can contact me by appending gmail's domain to this hostname, or running the following Julia statement:

join(["lucasb", "eyer", "be", join(["gmail", "com"], '.')], '.', '\u40')

Or, if you're more of a Pythonista (or just lazy), the following one is pretty similar:

'.'.join(['lucasb', 'eyer', '\x40'.join(['be', 'gmail']), 'com'])

In case you wonder what I look like, so we can sit down for a (Belgian) beer should we ever meet: this is me.