Senior/Staff DevOps Engineer

Learn to Win

Software Engineering

Remote

Posted on May 19, 2026

About Ethos

Ethos is on a mission to bridge the human readiness gap by transforming how training is developed, consumed, and aligned with strategic business outcomes. As a well-funded Series A startup ($40M+ raised), we’re a trusted partner to 150+ enterprise customers across the U.S. military, life sciences, manufacturing, supply chain, and professional sports.

We’re expanding our engineering team to deliver a best-in-class learning platform that is smarter, faster, and more optimized. We’ve gone all-in on AI tooling in our development process, and we’re adopting and building on the best new practices for creating software in this era.

About the Role

You’ll lead the deployment and operationalization of our SaaS products across Commercial Cloud, government networks, and bespoke/air-gapped customer environments. As a Senior engineer, you’ll own end-to-end infrastructure delivery, elevate DevOps practices, and collaborate closely with Software and Product. As a Staff engineer, you’ll additionally shape platform engineering strategy, set technical direction for distributed systems at scale, and influence design patterns that enable AI workloads and complex data pipelines. You’ll treat AI tooling as core to your daily workflow — for IaC, pipelines, incident response, and toil reduction — and help shape the agentic operations patterns and AI workloads our platform runs.

If you love solving hard deployment problems, care deeply about security and reliability, can scale modern cloud platforms with rigor, and embrace AI-augmented operations as the way forward, this role is for you.

What You’ll Do

  • Design & Operate the Platform: Architect, implement, and run secure, scalable, multi-tenant infrastructure (infrastructure as code, immutable artifacts, GitOps).
  • AI-Augmented Operations & Platform Work: Use AI coding and agentic tools (Claude Code, Cursor, Copilot, MCP-based ops agents) for IaC authoring, pipeline development, log/trace analysis, postmortem drafting, and toil reduction; build and improve agentic workflows for the team.
  • CI/CD & Release Engineering: Build and harden pipelines (build, test, scan, sign, promote, deploy) for multi-environment delivery—including disconnected/air-gapped workflows.
  • Observability & Reliability: Establish SLOs; instrument systems for metrics/logs/traces; drive incident response and postmortems; reduce MTTR and change failure rate.
  • Security & Compliance by Design: Integrate supply-chain security (SBOMs, signing, provenance), secrets management, and baseline hardening (CIS/STIG-aligned).
  • Cost & Performance: Optimize infrastructure spend and performance (capacity planning, autoscaling, right-sizing, storage/egress strategies).
  • Technical Leadership: Lead design reviews, author RFCs, mentor engineers, and raise the quality bar for platform changes.
  • Gov/Constrained Deployments: Support IL-4/IL-5-aligned patterns, RMF documentation, and offline artifact promotion processes where needed.
  • (Staff) Strategy & Standards: Define platform roadmaps, establish consistent deployment and infrastructure patterns, and guide cross-team adoption of best practices.

Measures of Success (First 6–12 Months)

  • Availability & Reliability: Meet or exceed service SLOs; reduce MTTR by ≥30%.
  • Delivery Velocity: Increase deployment frequency by ≥2× while keeping change failure rate ≤15%.
  • Pipeline Efficiency: Cut CI pipeline duration by ≥25% and reduce flaky tests significantly.
  • Security Posture: Achieve ≥95% pass rate for supply-chain/security gates (image signing, SBOM scans, vulnerability thresholds); reduce mean time to remediate high-severity CVEs to ≤14 days.
  • Cost & Drift: Deliver ≥15% infra cost savings without performance regressions; keep infra drift near zero via GitOps and policy as code.
  • Gov/Offline Readiness: Stand up an artifact promotion flow (build → scan → sign → export) suitable for disconnected deployments with documented runbooks.

30/60/90 Day Plan

First 30 Days — Map & Baseline

  • Deep-dive on current cloud topology, CI/CD, observability, security controls, and on-call.
  • Inventory build and runtime artifacts; document deployment environments and promotion paths.
  • Baseline reliability and delivery metrics (SLOs, MTTR, deploy frequency, change failure rate, pipeline timing).
  • Establish and prove the effectiveness of your personal workflow with AI tooling.

60 Days — Design & Deliver

  • Harden CI/CD: add SBOM generation, signing (e.g., Cosign/Sigstore), and policy gates.
  • Implement or refine infrastructure modules (Terraform) and Helm/Kustomize charts with GitOps flows.
  • Establish service SLOs and golden signals; wire alerts and dashboards for top services.
  • Pilot artifact export/import flow for air-gapped/disconnected deployments; write runbooks.

90 Days — Scale & Standardize

  • Standardize CI/CD pipelines and infrastructure modules across existing services.
  • Migrate priority services to hardened delivery paths; deprecate legacy workflows.
  • Land cost/performance wins (e.g., autoscaling policies, instance/storage class right-sizing).

Basic Qualifications

  • 5+ years building and operating cloud platforms; 3+ years deploying SaaS in production.
  • Strong with Terraform, Helm/Kustomize, and containers (Docker, Kubernetes).
  • Deep AWS experience (e.g., VPC, EKS, EC2, S3, RDS, ECR, IAM/KMS, Route 53; CloudFront desirable).
  • CI/CD expertise (e.g., GitHub Actions, CircleCI, or Argo Workflows) and GitOps (Argo CD or Flux).
  • Observability across metrics, logs, and traces (e.g., Prometheus/Grafana, OpenTelemetry, ELK).
  • Proven track record in IaC, scalable system design, and quality tooling (automated tests, canaries/blue-green, feature flags).
  • Excellent communication; comfortable partnering with Product, Security, and Customer teams.
  • Thrives in a startup environment—ownership, autonomy, and pragmatic delivery.
  • Active, fluent use of AI development/operations tools as part of your daily workflow.
  • Secret clearance, or eligibility for and willingness to obtain one.

Preferred Qualifications

  • Supply-chain security (SBOMs, SLSA concepts, image signing, provenance) and vulnerability management (e.g., Trivy/Grype, Snyk; Chainguard experience a plus).
  • Experience identifying/mitigating CVEs and setting policy thresholds.
  • Background with DoD/regulated customers; familiarity with IL-4/IL-5, Platform One patterns, and RMF documentation workflows.
  • Knowledge of STIG/CIS hardening, air-gapped architectures, and offline update mechanisms.
  • Experience operating AI/ML workloads in production (GPU scheduling, model artifact management, inference serving, vector DBs, queuing/streaming) or building agentic ops workflows / MCP-based integrations (alert triage, runbook automation, IaC review agents).

Tooling you might touch

We build our products with some of the following technologies and others like them:

  • AI development tools (Claude Code, Cursor, GitHub Copilot, MCP servers); Terraform modules; Helm/Kustomize; Kubernetes (EKS); GitHub Actions/Workflows; Argo CD/Flux; Docker/OCI; Prometheus/Grafana, Datadog, OpenTelemetry; Loki/ELK; LaunchDarkly/Flagsmith; Cosign/Sigstore, Trivy/Grype/Snyk; AWS (VPC, EKS, EC2, S3, RDS, ECR, IAM/KMS, Route 53, CloudFront); HashiCorp Vault/Parameter Store/Secrets Manager.

Compensation & Benefits

  • Competitive base salary (Senior: $150k-$190k; Staff: $170k-$210k), based on location and experience, plus significant equity upside.
  • Subsidized health insurance, 401(k), life insurance, and cell phone stipend.
  • Remote-first culture with up to 10% travel for offsites.
  • Work eligibility: Applicants must be authorized to work in the U.S.

One Final Note

We’re committed to building a diverse, inclusive, and authentic workplace. If you’re excited about this role but your experience doesn’t perfectly align with every qualification, please apply—you may be just the right candidate.

EEO & accommodations: Ethos is an Equal Opportunity Employer. We welcome applicants of all backgrounds and provide reasonable accommodations throughout the hiring process.