O191 - Frozen Language Model Helps ECG Zero-Shot Learning

Jun Li, Che Liu, Sibo Cheng, Rossella Arcucci, Shenda Hong

The electrocardiogram (ECG) is one of the most commonly used non-invasive, convenient medical monitoring tools for assisting the clinical diagnosis of heart diseases. Recently, deep learning (DL) techniques, particularly self-supervised learning (SSL), have demonstrated great potential in ECG classification. After fine-tuning, SSL pre-training has achieved competitive performance with only a small amount of annotated data. However, current SSL methods still rely on the availability of annotated data and cannot predict labels that are absent from the fine-tuning datasets. To address this challenge, we propose Multimodal ECG-Text Self-supervised pre-training (METS), the first work to utilize auto-generated clinical reports to guide ECG SSL pre-training. We use a trainable ECG encoder and a frozen language model to embed paired ECGs and machine-generated clinical reports separately; each ECG embedding and its paired report embedding are then compared with the other, unpaired embeddings. In downstream classification tasks, METS achieves around 10% improvement in performance via zero-shot classification, without using any annotated data, compared to other supervised and SSL baselines that rely on annotated data. Furthermore, METS achieves the highest recall and F1 scores on the MIT-BIH dataset, even though MIT-BIH contains different ECG classes from the pre-training dataset. Extensive experiments demonstrate the advantages of ECG-Text multimodal self-supervised learning in terms of generalizability and effectiveness.
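The pairing objective and the zero-shot step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not name the loss, so an InfoNCE-style contrastive loss over cosine similarities is assumed here; the function names, the temperature value, the two-dimensional toy embeddings, and the class prompts are all hypothetical and stand in for the outputs of a real ECG encoder and a frozen language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(ecg_embs, text_embs, temperature=0.1):
    """Assumed InfoNCE-style loss: entry i of each list is a matched pair.

    In training, only the ECG encoder would receive gradients; the text
    embeddings come from a frozen language model and stay fixed.
    """
    total = 0.0
    for i, ecg in enumerate(ecg_embs):
        logits = [cosine(ecg, t) / temperature for t in text_embs]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax of the paired report
    return total / len(ecg_embs)


def zero_shot_classify(ecg_emb, prompt_embs):
    """Return the label whose text-prompt embedding is most similar."""
    return max(prompt_embs, key=lambda label: cosine(ecg_emb, prompt_embs[label]))


# Toy, made-up embeddings (hypothetical; not taken from the paper).
ecg_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[0.9, 0.1], [0.1, 0.9]]
aligned = contrastive_loss(ecg_embs, text_embs)
shuffled = contrastive_loss(ecg_embs, text_embs[::-1])

# Hypothetical class prompts embedded by the frozen language model.
prompts = {"normal ECG": [1.0, 0.0], "atrial fibrillation": [0.0, 1.0]}
predicted = zero_shot_classify([0.2, 0.8], prompts)
```

In this toy setup the loss is lower when each ECG is matched with its own report than when the reports are shuffled, and an unseen class can be scored simply by embedding a new text prompt, which is what lets zero-shot classification work without annotated ECGs.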

Oral presentation

Schedule: Wednesday, July 12: Oral session 8 - Computer-assisted diagnosis — 14:00–15:00
Wednesday, July 12: Posters — 10:15–12:00 & 15:00–16:00
Poster location: W04