ML Infrastructure Engineer

Maven Robotics

Software Engineering, Other Engineering, Data Science

Posted on Jun 22, 2026
Role Description

We are hiring an exceptional Infrastructure Engineer to own and build the backend systems that power machine learning at Maven Robotics. In this role, you will design and scale the core infrastructure our AI and robotics teams use to manage data, run compute workloads, store artifacts, monitor systems, and support rapidly growing engineering workflows.

You should be excited about distributed systems, backend services, data infrastructure, GPU compute, and high-reliability internal platforms. The ideal candidate has successfully built and operated similar systems before and can independently drive complex infrastructure projects from architecture through production operation. The underlying systems may be sophisticated, but the interfaces and workflows they expose should be reliable, intuitive, and easy for engineers to use.

In this role you will:

  • Own the architecture, implementation, reliability, and evolution of Maven’s machine learning infrastructure.
  • Build backend services and platforms for managing data, artifacts, jobs, logs, metadata, and compute resources across cloud and on-premise environments.
  • Design scalable systems for workload orchestration, storage, observability, security, and infrastructure automation.
  • Build intuitive internal tools and abstractions that make complex infrastructure easy for engineers to use.
  • Lead technical and commercial discussions with cloud and ML compute providers, including capacity planning, performance, reliability, and cost.

Qualifications

    Must-have:

    • Significant experience designing, building, and operating production backend, distributed, or compute infrastructure.
    • A track record of independently owning complex infrastructure projects from architecture through deployment and ongoing operation.
    • Strong programming ability in Python, Go, Rust, C++, or a similar backend or systems language.
    • Experience operating GPU compute infrastructure and orchestrating distributed workloads using Kubernetes, Ray, ZenML, or similar systems.
    • Experience designing and operating storage systems, observability platforms, infrastructure-as-code, and secure access controls.
    • Experience managing large-scale GPU fleets or hybrid cloud and on-premise compute environments.
    • Experience building internal developer platforms, CLIs, SDKs, or other self-service infrastructure tools.
    • Strong technical judgment, leadership, and communication skills, with the ability to drive decisions across teams and external partners.
    • Self-starter attitude with the ability to identify priorities and deliver durable solutions in a fast-paced startup environment.

    Nice-to-have:

    • Familiarity with GPU architecture, accelerator-aware software design, and profiling compute-intensive workloads.
    • Exposure to infrastructure supporting large-scale robot learning workloads, including policy training, simulation, and multimodal data pipelines.
    • Familiarity with SOC 2 controls, security practices, and audit readiness.