Machine Learning Engineer, ML Ops
Imagine a future where anyone can train and run large-scale AI workloads instantly - without worrying about infrastructure bottlenecks.
At Verda, we’re building a fully featured European cloud computing platform designed for high-performance AI workloads. Our mission is to make powerful compute accessible, scalable, and efficient for the teams building the future of AI.
We’re ambitious, curious, and pragmatic builders. We operate with low hierarchy, high ownership, and a strong bias for action. We’ve already achieved a lot, but we’re just getting started.
Now it’s your chance to join the ride. Join Verda while it’s still being built - not once it’s finished.
Your responsibilities
In this role, you will build and scale our internal ML platform, focusing on training and inference infrastructure running on Kubernetes. This includes developing systems for job scheduling, workflow orchestration, and reliable execution of large-scale workloads.
You will improve and maintain our inference stack, working on model packaging, serving frameworks, and performance optimization to ensure efficient and scalable deployments.
You will collaborate closely with infrastructure and engineering teams to maximize hardware performance and reliability, continuously improving how workloads run across our platform.
A key part of your role will be translating internal needs and platform insights into robust, reusable features that improve developer experience, scalability, and system efficiency.
Your key competencies
Strong ML engineering background with experience in both training and inference deployments
Experience with software or infrastructure engineering, including CI/CD or GitOps workflows
Strong programming skills in Python (additional languages such as Golang are a plus)
Comfortable working in Linux environments, including debugging GPU performance issues (CUDA, drivers, networking, filesystems)
Strong systems thinking and ability to design scalable, reliable infrastructure
Experience with Kubernetes (operators, CRDs, job scheduling, GPU scheduling)
Nice to have
Familiarity with systems such as Kueue, Flyte, Ray, or Slurm
Proficiency with PyTorch (JAX is a plus)
Experience deploying inference workloads using vLLM, SGLang, TensorRT-LLM, or Triton
Knowledge of GPU networking and performance tuning (e.g., InfiniBand, NVLink, NCCL)
Experience working in high-performance computing or large-scale distributed systems
Practicalities
Location: Helsinki (hybrid, three days a week in the office) or remote EU
Employment type: Full-time and permanent
Why Verda
Cash + equity compensation, along with various fringe benefits (e.g., healthcare, lunch, and wellbeing)
Profitable operations with rapid, sustained growth
31 nationalities represented, 6 of them on the management team
An opportunity to work at the core of AI infrastructure, building systems that power large-scale training and inference workloads
What's next
We’re building fast, and this role needs the right person behind it. There’s no artificial deadline, but when we find the person we’re looking for, we move.
If this sounds like your next move, apply now.
Please submit your application through our Careers page. We don’t accept applications sent by email.
Department: Research & Development
Role: Machine Learning Engineer
Locations: Helsinki
Remote status: Hybrid
About Verda
Verda (formerly DataCrunch) is a technology company building the next generation of cloud infrastructure for AI – compute that's instant, on-demand and at scale. Headquartered in Helsinki, the company operates globally across Europe, the US and Asia. Verda employs over 100 people from nearly 30 nationalities and has raised over $200M in total funding from investors including Lifeline Ventures, byFounders, J12 Ventures, Skaala, Varma and Tesi, alongside leading financial institutions.