Machine Learning Engineer, ML Ops
Imagine a future where anyone can train and run large-scale AI workloads instantly - without worrying about infrastructure bottlenecks.
At Verda, we’re building a fully featured European cloud computing platform designed for high-performance AI workloads. Our mission is to make powerful compute accessible, scalable, and efficient for the teams building the future of AI.
We’re ambitious, curious, and pragmatic builders. We operate with low hierarchy, high ownership, and a strong bias for action. We’ve already achieved a lot, but we’re just getting started.
Now it’s your chance to join the ride. Join Verda while it’s still being built - not once it’s finished.
Your responsibilities
In this role, you will build and scale our internal ML platform, focusing on training and inference infrastructure running on Kubernetes. This includes developing systems for job scheduling, workflow orchestration, and reliable execution of large-scale workloads.
You will improve and maintain our inference stack, working on model packaging, serving frameworks, and performance optimization to ensure efficient and scalable deployments.
You will collaborate closely with infrastructure and engineering teams to maximize hardware performance and reliability, continuously improving how workloads run across our platform.
A key part of your role will be translating internal needs and platform insights into robust, reusable features that improve developer experience, scalability, and system efficiency.
Your key competencies
Strong ML engineering background with experience in both training and inference deployments
Experience with software or infrastructure engineering, including CI/CD or GitOps workflows
Strong programming skills in Python (additional languages such as Golang are a plus)
Comfortable working in Linux environments, including debugging GPU performance issues (CUDA, drivers, networking, filesystems)
Strong systems thinking and ability to design scalable, reliable infrastructure
Experience with Kubernetes (operators, CRDs, job scheduling, GPU scheduling)
Nice to have
Familiarity with systems such as Kueue, Flyte, Ray, or Slurm
Proficiency with PyTorch (JAX is a plus)
Experience deploying inference workloads using vLLM, SGLang, TensorRT-LLM, or Triton
Knowledge of GPU networking and performance tuning (e.g., InfiniBand, NVLink, NCCL)
Experience working in high-performance computing or large-scale distributed systems
Practicalities
Location: Helsinki (hybrid, three days a week in the office) or remote EU
Employment type: Full-time and permanent
Why Verda
Cash + equity compensation, along with various fringe benefits (e.g., healthcare, lunch, and wellbeing)
Profitable operations with rapid, sustained growth
31 nationalities represented, 6 of them on the management team
An opportunity to work at the core of AI infrastructure, building systems that power large-scale training and inference workloads
What's next
We’re building fast, and this role needs the right person behind it. There’s no artificial deadline, but when we find the person we’re looking for, we move.
If this sounds like your next move, apply now.
Please submit your application through our Careers page. We don’t accept applications sent by email.
Department: Research & Development
Role: Machine Learning Engineer
Locations: Helsinki
Remote status: Hybrid
About Verda
Verda (formerly DataCrunch) is a technology company building the next generation of cloud infrastructure for AI – compute that's instant, on-demand and at scale. Headquartered in Helsinki, the company operates globally across Europe, the US and Asia. Verda employs over 100 people from nearly 30 nationalities and has raised over $200M in total funding from investors including Lifeline Ventures, byFounders, J12 Ventures, Skaala, Varma and Tesi, alongside leading financial institutions.