Performance Engineer, Containers/Serverless
Verda is a technology company building the next generation of cloud infrastructure for AI. We operate GPU clusters across Europe, the US, and Asia, and we run some of the most demanding AI/ML workloads in production today — from frontier-model training to latency-sensitive inference at scale. We're a low-hierarchy team that ships pragmatically and gives engineers real ownership of the systems they build.
About the Role
Running ML/AI workloads in containers and serverless environments looks simple on a slide and is anything but in production. Between the object store and the GPU sit a dozen layers — image pulls, network filesystems, page cache, model loaders, runtime initialization — and each one quietly contributes to how long a workload takes to become useful and how fast it runs once it is.
We're hiring a Performance Engineer to own that surface for Verda's container and serverless GPU platforms. You'll characterize where time and throughput actually go across the stack, and turn that understanding into concrete platform improvements. Cold-start latency is one of the more visible expressions of the problem — a 70B model that takes ninety seconds to load is a ninety-second outage from the user's perspective — but the same fundamentals shape steady-state inference throughput, training step time, and checkpoint behavior. We want someone who understands how all of those parts interact.
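To make the arithmetic concrete (illustrative numbers, not a benchmark): a 70B-parameter model in bf16 is roughly 70 × 10⁹ parameters × 2 bytes ≈ 140 GB of weights. At an effective ~1.5 GB/s from a network filesystem, that is ~93 seconds just to move the bytes; at ~10 GB/s from well-laid-out local NVMe, it is ~14 seconds. Much of this role lives in the gap between numbers like those.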
What you’ll do
Profile and optimize the end-to-end path for containerized ML/AI workloads: image distribution, runtime startup, weight loading, inference hot path (a minimal sketch of this kind of phase accounting follows this list).
Design and tune the storage layer that sits between S3-compatible object stores and GPU nodes — prefetchers, caching tiers, network filesystems and local NVMe layouts.
Drive measurable wins on time-to-first-token, training step time, and cold-start latency across both internal services and customer workloads.
Benchmark and characterize real workloads, not synthetic ones, and turn the findings into platform changes.
Work across compute, networking, and platform teams to remove bottlenecks end to end, not just at one layer.
Publish internal (and occasionally external) write-ups so the rest of the org — and our customers — understand the trade-offs.
Stay current with the evolving ML/AI ecosystem: new runtimes, model formats, and storage and hardware developments that change where the bottlenecks sit.
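To give a concrete flavor of the first bullet: the core of the work is attributing wall-clock time to each phase of the path before optimizing any of them. A minimal sketch of that kind of phase accounting, with sleeps standing in for the real steps (all names here are illustrative; none of this is Verda's actual tooling):

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def phase(name: str):
    """Record wall-clock time for one cold-start phase."""
    start = time.monotonic()
    try:
        yield
    finally:
        timings[name] = time.monotonic() - start

# Stand-ins for the real steps (image pull, runtime start, weight load,
# first token); here they just sleep so the sketch runs end to end.
def pull_image():    time.sleep(0.30)
def start_runtime(): time.sleep(0.10)
def load_weights():  time.sleep(0.50)
def first_token():   time.sleep(0.05)

with phase("image_pull"):    pull_image()
with phase("runtime_start"): start_runtime()
with phase("weight_load"):   load_weights()
with phase("first_token"):   first_token()

total = sum(timings.values())
for name, seconds in timings.items():
    print(f"{name:>13}: {seconds:5.2f}s ({100 * seconds / total:4.1f}%)")
```

In production this shape of instrumentation would hang off real tracing spans rather than prints, but the discipline is the same: no phase gets optimized until it is measured in context.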
What we’re looking for
Production experience making ML/AI workloads measurably faster.
Strong grasp of Linux storage and container runtime internals.
Hands-on experience with at least one distributed or network filesystem and with S3-compatible object storage, including the performance-killing cases (small-object overhead, range-request patterns, eventual consistency, multipart tuning); a toy sketch of one such pattern follows this list.
Comfort with model-serving runtimes (vLLM, SGLang, etc.) and the formats they consume (safetensors, GGUF, sharded checkpoints).
An end-to-end view: comfortable reasoning about NIC, switch, filesystem, cache, container runtime, and GPU as one system rather than separate silos.
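On the object-storage point above: the gap between one large sequential GET and parallel ranged GETs is often the first thing worth measuring. A toy sketch of the parallel-range pattern (hypothetical bucket and key, boto3 assumed available; a real loader would write parts at their offsets rather than buffer them all in memory):

```python
import concurrent.futures as cf
import boto3

# Hypothetical bucket/key; the pattern is what matters: one large GET
# streams at the throughput of a single connection, while parallel
# ranged GETs can saturate the NIC on S3-compatible stores.
BUCKET, KEY = "example-models", "llama-70b/model-00001.safetensors"
PART_SIZE = 64 * 1024 * 1024  # 64 MiB ranges; tune per store and NIC

s3 = boto3.client("s3")
size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]

def fetch_range(offset: int) -> bytes:
    end = min(offset + PART_SIZE, size) - 1
    resp = s3.get_object(Bucket=BUCKET, Key=KEY,
                         Range=f"bytes={offset}-{end}")
    return resp["Body"].read()

# Issue ranged GETs concurrently; map() preserves order, so the parts
# can be concatenated directly.
with cf.ThreadPoolExecutor(max_workers=16) as pool:
    parts = list(pool.map(fetch_range, range(0, size, PART_SIZE)))

data = b"".join(parts)
assert len(data) == size
```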
Nice to have
Experience with systems-level programming (Go, Rust, or Python).
Experience with checkpoint/restore of CPU and GPU workloads.
Experience with container image acceleration (eStargz, SOCI, Nydus) or building a caching layer (FUSE-based, in-kernel, or sidecar) for ML weights and datasets; a toy sketch of the read-through core follows this list.
Familiarity with RDMA, GPUDirect Storage, or NVMe-oF.
Prior work on serverless GPU platforms, model registries, or Kubernetes-based ML infrastructure.
Contributions to open-source storage, ML runtime, container, or kernel projects.
Background in performance engineering on bare metal — not just clouds where someone else owns the hardware.
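On the caching-layer item: stripped to its essentials, a sidecar-style weight cache is a read-through cache keyed on object identity, with atomic publication so concurrent loaders never observe a partially written file. A toy sketch (all names illustrative; a real layer adds eviction, integrity checks, and request coalescing):

```python
import hashlib
import os
import tempfile
from pathlib import Path

# Stand-in for a local NVMe mount (e.g., /var/cache/weights in production).
CACHE_DIR = Path(tempfile.gettempdir()) / "weights-cache"

def cached_fetch(key: str, fetch_remote) -> Path:
    """Return a local path for `key`, downloading via
    `fetch_remote(key) -> bytes` only on a cache miss."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    local = CACHE_DIR / hashlib.sha256(key.encode()).hexdigest()
    if local.exists():
        return local  # hit: served from local disk, object store untouched
    data = fetch_remote(key)
    # Write to a temp file, then rename: os.replace is atomic within a
    # filesystem, so readers never see a half-written cache entry.
    fd, tmp = tempfile.mkstemp(dir=CACHE_DIR)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, local)
    return local

# Toy usage: the "remote" is just bytes in memory here.
path = cached_fetch("llama-70b/shard-00001", lambda k: b"\x00" * 4096)
print(path, path.stat().st_size)
```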
Practicalities
Location: Helsinki, Finland
Department: Research & Development
Hybrid mode: This role requires presence in our Helsinki office at least 3 days per week.
Employment type: Full-time and permanent
What we offer
Cash + equity compensation along with various fringe benefits (e.g., healthcare, lunch, wellbeing). A team that takes performance seriously, real GPUs to test against, and a problem space where the wins are visible to every customer.
About Verda
Verda (formerly DataCrunch) is a technology company building the next generation of cloud infrastructure for AI – compute that's instant, on-demand and at scale. Headquartered in Helsinki, the company operates globally across Europe, the US and Asia. Verda employs over 100 people from nearly 30 nationalities and has raised over $200M in total funding from investors including Lifeline Ventures, byFounders, J12 Ventures, Skaala, Varma and Tesi, alongside leading financial institutions.