
Senior Site Reliability Engineer
Jobgether • US • Canada
No Relocation
Posted: May 12, 2026
Job Description
- This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States. This is an opportunity for a highly skilled Site Reliability Engineer to help build and scale the reliability foundation of an AI-driven platform. In this role, you will lead strategic reliability initiatives across complex cloud infrastructure, AI workloads, and developer enablement systems. You will work at the intersection of platform engineering, observability, automation, and AI operations, helping teams deliver resilient and scalable services with confidence. The environment is fast-paced, collaborative, and innovation-focused, offering strong technical ownership and leadership influence. Ideal candidates are passionate about cloud-native infrastructure, operational excellence, and enabling high-performing engineering teams. This role is fully remote within the United States and offers the chance to shape reliability practices for next-generation AI-powered systems.
- Accountabilities:
  - Own and drive platform reliability initiatives, including defining and managing SLIs, SLOs, and error budgets across production services and AI-driven workloads.
  - Design and implement resilient infrastructure patterns for AI pipelines, including observability, failure detection, graceful degradation, and workload isolation.
  - Lead incident response processes, disaster recovery planning, and post-incident reviews focused on long-term operational improvements.
  - Partner closely with Software Engineering and AI Engineering teams to establish reliability standards, deployment best practices, and scalable CI/CD workflows.
  - Develop and maintain observability solutions using monitoring, tracing, logging, and telemetry tools to ensure visibility across services and AI operations.
  - Manage infrastructure as code, cloud cost optimization initiatives, and automation strategies to improve operational efficiency and scalability.
  - Build and enhance Internal Developer Platforms (IDP), service catalogs, and self-service tooling that empower engineering teams.
  - Mentor junior and intermediate engineers, contributing to technical growth, knowledge sharing, and engineering excellence across the organization.
- Requirements:
  - Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  - 6–8 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps with demonstrated technical leadership responsibilities.
  - Deep expertise with AWS services, Kubernetes, Docker, Terraform, GitOps methodologies, and cloud-native infrastructure patterns.
  - Strong experience with observability platforms, distributed tracing, monitoring systems, and operational tooling.
  - Proficiency in Python and/or Bash scripting, along with experience supporting microservices architectures and CI/CD pipelines.
  - Familiarity with Internal Developer Platform tools such as Backstage or similar solutions is highly desirable.
  - Experience supporting AI/ML infrastructure, LLM integrations, or agentic systems is considered a major asset.
  - Excellent analytical, communication, mentoring, and problem-solving skills, with the ability to navigate complex technical environments.
  - Experience with FinOps, disaster recovery planning, policy-as-code, or regulated environments is a plus.
- Benefits:
  - Competitive salary range of approximately $149,100 – $157,800 USD
  - Comprehensive medical, dental, and vision coverage
  - 401(k) matching program
  - Flexible vacation policy
  - Company-sponsored training and professional development opportunities
  - Annual wellness and fitness reimbursement programs
  - Inclusive and collaborative remote work environment
  - Opportunities for community involvement and charitable engagement
  - Access to wellness resources and employee support initiatives
  - Occasional travel opportunities for collaboration and team engagement
- How Jobgether works: We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!
- Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
- We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.