S055 - Learning Retinal Representations from Multi-modal Imaging via Contrastive Pre-training

Emese Sükei, Elisabeth Rumetshofer, Niklas Schmidinger, Ursula Schmidt-Erfurth, Günter Klambauer, Hrvoje Bogunović

Contrastive representation learning techniques trained on large multi-modal datasets, such as CLIP and CLOOB, have demonstrated impressive capabilities of producing highly transferable representations for different downstream tasks. In the field of ophthalmology, large multi-modal datasets are conveniently accessible as retinal imaging scanners acquire both 2D fundus images and 3D optical coherence tomography to evaluate the disease. Motivated by this, we propose a CLIP/CLOOB objective-based model to learn joint representations of the two retinal imaging modalities. We evaluate our model\'s capability to accurately retrieve the appropriate OCT based on a fundus image belonging to the same eye. Furthermore, we showcase the transferability of the obtained representations by conducting linear probing and fine-tuning on several prediction tasks from OCT.
Hide abstract

Short paper

Schedule: Monday, July 10: Posters — 11:00–12:00 & 15:00–16:00
Wednesday, July 12: Virtual poster session - 8:00–9:00
Poster location: M43

Download slides