Site Reliability Engineer
What we’re looking for
We need someone with 3+ years of experience in SRE, Production Engineering, or Infrastructure roles who has built and owned automation, observability, and tooling systems end-to-end in production. You should be comfortable working across a multi-cloud environment with strong distributed systems instincts and a track record of improving platform reliability and reducing operational burden. Bonus points if you have exposure to GPU/AI-ML infrastructure or accelerated compute workloads.
What you’ll do
- Build and own the observability stack – dashboards, alerts, and distributed tracing using tools like OpenTelemetry, Prometheus, and Grafana – to provide high-granularity visibility into Mithril’s multi-cloud GPU orchestration platform
- Define and implement SLIs and SLOs across Mithril’s API layer and internal orchestration services, partnering with Product and Platform teams to ensure new features are designed for operability from the start
- Develop automation in Python (or Go) to eliminate repetitive operational tasks – from provider API reconciliation to automated health checks and capacity rebalancing
- Maintain and extend Terraform/Pulumi modules and Kubernetes configurations to manage a growing multi-cloud provider footprint
- Participate in on-call rotation, drive rigorous root cause analysis for production incidents, and implement durable fixes to prevent recurrence
- Work directly with the founding engineering team to shape how infrastructure engineering operates as the company scales – this is a greenfield opportunity to build the playbook, not inherit a rigid system
By applying to this role, you acknowledge that we may collect, store, and process your personal data on our systems.
For more information, please refer to our Privacy Notice.