Senior Production Engineer (Reliability)
CoreWeave | GPU Cloud company
Livingston | $182,000 - $242,000 | Senior
Software Engineering
About the role
Senior Production Engineer to own and improve critical reliability tooling for CoreWeave's AI cloud.
- Production Engineering ensures CoreWeave’s cloud delivers world-class reliability, performance, and operational excellence.
- We are hiring a Senior Production Engineer to take direct, hands-on ownership of critical tooling that drives reliability and delivery success.
Key Responsibilities
- Take hands-on ownership of critical systems and frameworks, driving their architecture, implementation, and long-term evolution.
- Lead end-to-end delivery of engineering projects that improve availability, scalability, operational automation, and failure recovery.
- Build and maintain observability, alerting, automated remediation, and resilience testing for the systems you support.
- Participate in incident response as a subject-matter expert; drive deep root-cause investigations and implement lasting fixes.
- Ship production code regularly in Python, Go, or similar languages, and participate in on-call rotations.
Requirements
- 7+ years of engineering experience building and operating distributed systems or cloud platforms.
- Demonstrated ability to debug complex production issues end-to-end, across services, infrastructure layers, and automation.
- Strong programming or scripting ability (Python, Go, or similar), with experience shipping and operating production services and tools.
Tech stack
Python, Go, Kubernetes, CI/CD, Docker, Linux, Git, REST API, gRPC