Things I've done
Academic
I'm currently a Staff Research Scientist at Google DeepMind (formerly Brain) in Zürich, co-leading our multimodal research effort and codebase.
I work closely with Xiaohua Zhai and Alexander Kolesnikov.
Publications ¶
I have a growing list of publications at top-tier conferences such as CVPR, NeurIPS, ICCV, and more.
See my Google Scholar or Semantic Scholar pages for the full list of over 50.
However, here are a few of my favourite publications that you may have heard of, each with a one-sentence TL;DR:
- PaliGemma: A small sota VLM (SigLIP + Gemma-2B) for transfer.
- NoFilter: Don't filter image-text data; doing so makes the resulting model less culturally diverse.
- CapPa: Captioning pre-training fixes all of contrastive pre-training's issues (like binding).
- RL-tuning: Fine-tuning modern vision models with a short RL stage significantly improves them (concurrent with RLHF in NLP).
- SigLIP: A sigmoid loss instead of CLIP's softmax is more scalable; we open-sourced the best vision encoder and image-text model (the loss is sketched right after this list).
- FlexiViT: Randomize the patch size during ViT training to get many model sizes in one.
- Efficiency Misnomer: Don't plot only one of #params, FLOPs, or wall-clock. Show all three!
- Scaling ViT: ViTs scale very well: sota on ImageNet, and by far the largest vision model at the time.
- Patient and Consistent Distillation: Distilling for a very long time and with consistent inputs works like magic: the best-ever ResNet-50, at 83% on ImageNet.
- MLP-Mixer: sota architecture based solely on MLPs. No attention, no convolution.
- Vision Transformer (ViT): if you're here, you know this.
- Are we done with ImageNet?: ImageNet ReaL labels: thoroughly corrected validation-set labels, with multiple labels per image possible.
- Big Transfer (BiT): Breakthrough in scaling vision models: one needs to scale model, dataset, and training duration together ("diagonally") to reap all the benefits.
- S4L: Very simply combining self-supervised with semi-supervised learning leads to sota semi-supervised models.
- Revisiting SSL: Self-supervised learning alone is kinda doomed; everything is too brittle.
- In Defense of the Triplet loss: Triplet loss is actually great for ReID.
- Biternion Nets: Use a normalized 2D output layer and a von Mises loss to get great continuous orientation estimates from rough discrete labels.
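Since people often ask what "sigmoid instead of softmax" means concretely, here is a minimal NumPy sketch of my reading of the SigLIP loss; the names and the exact normalisation over pairs are illustrative, see the paper for the real details:

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, t, b):
    """Pairwise sigmoid loss, sketched. img_emb and txt_emb are
    L2-normalised embeddings of a batch of matching image-text pairs,
    shape (n, d); t and b are learnable temperature and bias scalars."""
    logits = t * img_emb @ txt_emb.T + b     # (n, n) scores for all pairs
    z = 2.0 * np.eye(len(img_emb)) - 1.0     # +1 for matching pairs, -1 otherwise
    # -log(sigmoid(z * logits)), computed stably as softplus(-z * logits).
    return np.mean(np.logaddexp(0.0, -z * logits))
```

Unlike with softmax, every image-text pair contributes independently, so there is no all-to-all normalisation across the batch; that independence is what makes it scale so well.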
Talks and lectures ¶
I regularly give invited talks on anything vision and multimodal, and teach lectures on Transformers and multimodal models.
Reach out if you'd like me to give a talk or lecture.
In case you're curious, I keep a list of talks, lectures, and orals I've held.
Projects ¶
- ca 2021-now: Co-leading the multimodal (vision-language) research team and infra at Google Brain and DeepMind.
- ca 2018-2020: Research on pre-training and architectures for vision models at Google Brain Zürich.
- Early 2018: Writing up my PhD thesis while chilling (erm.. melting) in Bangkok, Thailand.
- 2014-2017: Strands: A service robot for security and elderly care trying to learn long-term patterns.
- 2014-2017: Spencer: A robot trying to navigate and understand airports.
- Summer 2017: Disentangling representations of identity and incidentals in the features learned by FaceNet, to improve various downstream prediction tasks at Google.
- Fall 2016: Robots learning from human demonstration at Kindred.
- Summer 2016: Successfully solved image-gaze (predict what people in an image are looking at) at Google. (Link for Googlers: go/image-gaze.)
- Winter 15/16: DROW: A CNN-based detector for robots in laser data. (Preprint, more to follow.)
- Summer 2015: Biternion Nets: Weakly Supervised Continuous Head Pose Regression using CNNs. (Video, GCPR 2015 preprint, DLSS 2015 poster, code, dataset mirror.)
- August 2013: Paper about "GWAS on GPUs: Streaming Data from HDD for Sustained Performance". (EuroPar 2013 preprint and slides.)
- July 2012: Diploma thesis: Exploiting Graphics Adapters for Computational Biology. (Code)
- June 2012: Talk about high-performance genome studies (GWAS) at SIAM conference. (Extended slides with intro to genetics and high performance computing.)
- April 2011: Project work (German) about Data-driven modelling of protein-protein interaction or, more precisely, analysis of the docking behaviour of PARP10 and PARP14.
Short Bio ¶
For copy-pasting into talk announcements and similar, since I've been asked for one quite frequently:
Lucas grew up in Belgium wanting to make video games and their AI.
He went on to study mechanical engineering at RWTH Aachen in Germany,
then did a PhD in robotic perception and computer vision there too.
Now, he is a staff research scientist at Google DeepMind (formerly Brain) in Zürich,
leading multimodal vision-language research.
And here's a picture, in case you need one.
Hobby
This is all very old, as most of my hobby time after studies/PhD has gone into one of: time with my kid, more research, DOTA2, or blogging (see below).
Libraries ¶
- DeepFried2: a Torch7-inspired deep-learning library on top of Theano.
- PyDenseCRF: Python wrapper for Philipp Krähenbühl's dense (fully connected) CRFs with Gaussian edge potentials.
- Bouge: C++ skeletal animation library.
- Go Colorful: A Go (golang) library for working with colors.
- libheatmap: A high-performance C heatmap creation library. Has been used in at least 4 commercial products. Colorschemes.
- PickerInputView: An iOS library for replacing the keyboard by a picker (selection wheel).
- D3 BoundingBox: A D3.js component for making any element draggable and resizable. Demo.
- CherryPy Spam Protector: A drop-in CherryPy tool protecting a handler from spam.
- aTest: Lightweight C++ unit-testing framework. (Currently included in Arkana-FTS; to be released standalone. Go for Catch instead!)
Applications ¶
- Arkana-FTS: An RTS game in the making; a big, slow, long-term project. Currently on ice.
- Memory Which Does Not Suck: Offspring of a hackathon project, which Vladislav Supalov (aka th4t) and I keep polishing from time to time.
- DotA-heroes: A web-app for choosing the right hero in DotA. (Note: The hero database is not complete yet. Try, for example, Magina the anti-mage.)
- DotA-wards: A web-app teaching where to place wards in DotA. (Note: Database unfinished, I can't keep up with the game's map changes!)
- My Ludum Dare 25 compo entry. Of course it sucks; I only had 24 hours to work on it due to real-life stuff. I'm glad people still had some short-lived fun with it.
Labs / microtools ¶
- 2DVis: Interactive visualization of image datasets embedded into 2D.
Currently available demos with model-free (pixel) embeddings of the QMUL head-pose dataset using RGB t-SNE, RGB MDS, or Luv MDS.
Outcome: t-SNE always shows you pretty structure, even when there really isn't any.
- Quaternion converter: Convert a quaternion (wxyz) into axis-angle, Euler, and matrix representations (the axis-angle conversion is sketched after this list).
- QMUL most similar: My very first foray into detecting near-duplicates between the train and test splits of the QMUL dataset.
Outcome: This dataset is too easy: most test images have near-duplicates in train, because it's a random split of frames from a video. This kicked off my obsession with detecting and removing near-duplicates.
- Confusion Matrix Viewer that also (lazily) shows all images from an entry of the matrix.
Outcome: I don't think I learned anything from this.
- cprod1 viz: My very first attempt at visualizing a text (products) dataset. Nothing remarkable.
- Hero distances: Visualize the distance between heroes over time throughout a DOTA2 game. This was fun to make: parse replay files, and project distances to 1D live with my FastMap.js and two fixed pivot points: the ancients (the formula is sketched after this list).
Outcome: Just a fun visualization; I didn't spend more time making it pretty. We can clearly see ganks and Pudge hooks.
- timeupdate: A demo for tracking and syncing timestamps in an HTML/JS video player. This was the start of a video annotation tool.
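For the quaternion converter, the wxyz-to-axis-angle direction is the simplest one; here is a minimal NumPy sketch of the standard formula it implements (assuming a unit quaternion; the helper name is mine):

```python
import numpy as np

def quat_wxyz_to_axis_angle(q):
    """Convert a unit quaternion (w, x, y, z) into a rotation axis and angle."""
    w, v = q[0], np.asarray(q[1:], dtype=float)
    angle = 2.0 * np.arccos(np.clip(w, -1.0, 1.0))
    s = np.sqrt(max(1.0 - w * w, 0.0))  # |sin(angle / 2)|, i.e. the norm of v
    if s < 1e-8:                        # (near-)zero rotation: axis is arbitrary
        return np.array([1.0, 0.0, 0.0]), 0.0
    return v / s, angle
```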
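And for the hero-distances visualization, the core of the 1D projection is the classic FastMap first-coordinate formula with the two ancients as pivots; a Python sketch of that formula (FastMap.js itself differs in detail):

```python
import numpy as np

def fastmap_1d(d_a, d_b, d_ab):
    """First FastMap coordinate for each point, from its distances d_a and
    d_b to the two pivot points and the pivot-to-pivot distance d_ab."""
    d_a, d_b = np.asarray(d_a, dtype=float), np.asarray(d_b, dtype=float)
    return (d_a**2 + d_ab**2 - d_b**2) / (2.0 * d_ab)
```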
Contributions ¶
I also try to contribute back to most open-source projects that I use.