BSc CS @ RWTH Aachen · ML Researcher · Exploring diffusion models for language, but open to all interesting ideas
Wrapping up my bachelor's degree in CS at RWTH Aachen while doing research in the Machine Learning and Human Language Technology group with Prof. Ralf Schlüter, Prof. Hermann Ney, and Dr. Albert Zeyer. Currently figuring out how discrete diffusion language models can help with speech recognition. Got a first-author paper under review, and I am looking forward to starting my master's degree with a lot more to learn and build.
Diffusion language models are becoming a real alternative to autoregressive ones: they can attend bidirectionally and generate text in parallel. In this work we look at how to actually use them for speech recognition. We put together a practical guide for rescoring ASR hypotheses with masked diffusion LMs (MDLM) and uniform-state diffusion models (USDM), and also propose a new joint-decoding approach that combines CTC acoustic information with USDM language knowledge at each step. Both USDM and MDLM noticeably improve recognition accuracy.
RWTH Aachen University
KAIST (Korea Advanced Institute of Science and Technology)
HKUST (Guangzhou)
Machine Learning and Human Language Technology Group, RWTH Aachen
Eggersmann Gruppe
orionic UG
ICoM, RWTH Aachen
Experimented with two separate transformer models, Swin and ViT, to classify skin cancer from dermatoscopic images. A fun dive into medical imaging and vision transformers.
Took CLIP and adapted it for classifying plant diseases from leaf photos, comparing fine-tuning vs. prompt-tuning to see which works better.
Built this for an EdTech client at orionic. It handles ~70% of support tickets on its own and cut response times by 98%. Probably the most useful thing I've shipped.
Got 3rd on the HRT case with a Sharpe of 2.7. Our pipeline stacked Ridge regression, LSTM for temporal patterns, a delayed stream model, and sentiment features.
36 hours of building market-making and arbitrage strategies at hackaTUM. Placed 2nd out of ~40 teams (1,000+ participants overall).
Built an AI voice agent for portfolio management: you talk to it, it gives you market insights and manages trades. Used Claude API and ElevenLabs.
Made a financial dashboard that pulls in market news and auto-generates reports for clients. Placed 2nd in the case.
A visual, intuitive walkthrough of how diffusion models work for text.