Member of Technical Staff, AI Infrastructure Team
About Verda
Verda is reimagining cloud infrastructure for AI workloads. We are a full-stack AI cloud company, meaning we install, operate, and optimize our compute for training and inference of AI models.
Join Verda while it’s still being built - not once it’s finished!
Your responsibilities
In this role, you will focus on improving the networking and communication layer behind large-scale LLM training workloads. You will optimize collective communication performance across distributed GPU clusters, helping improve throughput, utilization, and reliability for communication-bound workloads.
You will debug and analyze bottlenecks across the networking stack, and build tooling and infrastructure for benchmarking, profiling, and regression testing of distributed training performance.
You will work closely with training, infrastructure, hardware, and networking teams to improve how workloads scale across clusters, contributing to both system reliability and overall training efficiency.
This role is highly collaborative and research-adjacent, requiring curiosity, initiative, and willingness to go deep into low-level communication systems and distributed training infrastructure.
Your key competencies
Experience with distributed systems, networking, or large-scale ML training infrastructure
Experience with communication libraries such as NCCL, MPI, NVSHMEM, or similar technologies
Experience with profiling and debugging tools such as Nsight Systems, NCCL logs, PyTorch Profiler, or perf
Strong systems thinking and the ability to analyze performance bottlenecks across distributed environments
Self-starter mindset with the ability to independently define and drive technical projects
Strong curiosity about low-level systems, networking, and large-scale AI infrastructure
Representative projects
Build tools to identify NCCL bottlenecks, slow ranks, and communication tail latency
Build dashboards and regression infrastructure for training network health and performance
Implement fault-tolerance mechanisms to reduce cluster idle time and improve training efficiency
Practicalities
Location: Helsinki, Finland or London, UK
Hybrid mode: Three days a week in our Helsinki or London office
Employment type: Full-time and permanent
What's next
We’re building fast and this role needs the right person behind it. There’s no artificial deadline, but when we find who we’re looking for, we move.
If this sounds like your next move, apply now.
Please submit your application through our Careers page. We don’t accept applications sent by email.
Department: AI Lab
Role: Member of Technical Staff
Locations: Helsinki, London
Remote status: Hybrid
About Verda
Verda (formerly DataCrunch) is a technology company building the next generation of cloud infrastructure for AI – compute that's instant, on-demand and at scale. Headquartered in Helsinki, the company operates globally across Europe, the US and Asia. Verda employs over 100 people from nearly 30 nationalities and has raised over $200M in total funding from investors including Lifeline Ventures, byFounders, J12 Ventures, Skaala, Varma and Tesi, alongside leading financial institutions.