About

Welcome to my personal page! I am a Research Fellow at the National Institute of Informatics under the guidance of Prof. K. Inoue, and I am pursuing a Ph.D. in CS at Imperial College London, advised by Prof. A. Russo and Prof. M. Shanahan.

My research focuses on understanding Deep Learning models to improve their safety and robustness. Recently, I have been most excited by approaches that reverse-engineer the internal representations of Large Language Models using Mechanistic Interpretability techniques. To this end, I co-lead the UnSearch Research Team, which aims to develop a systematic understanding of whether and how transformers learn to represent goals, and how they “reason” with respect to those goals.

Alongside my research, I am deeply passionate about teaching, having contributed to the development and instruction of several courses at Imperial College London, most notably ‘Maths for Machine Learning’ and ‘Deep Learning’.

I previously completed an MSc in AI and ML at Imperial, and an MPhys in Physics with Theoretical Physics at the University of Manchester. I am happy to answer questions about these courses and to offer general advice on career progression and applications (though it may take me a few weeks to respond!).

Alex Spies