Site Reliability Engineer
About the job
• Develop and enhance software applications and configuration to better align with operational needs. Collaborate closely with the development team to achieve the company’s overarching goals.
• Deploy, maintain, and optimize our comprehensive observability stack, including metrics, logs, and traces. Design and refine alerting strategies to shift from reactive to proactive monitoring.
• Manage and provision cloud infrastructure using modern Infrastructure as Code tools.
• Use GenAI tools to improve SRE efficiency. This involves developing and maintaining systems that apply AI to in-depth data analysis, automated incident diagnostics, and deployment reliability checks.
• Participate in the on-call rotation to ensure production reliability.
Requirements
- A Bachelor’s degree in Computer Science or Engineering, or 1+ years of experience in a relevant technical operations or platform role.
- A solid understanding of core SRE concepts and cloud computing principles.
- Skill in at least one modern programming or scripting language (e.g., Python, Java, Bash) for automation and tooling development.
- Experience working within Windows, Linux, or Unix environments.
- Proven ability to approach complex, ambiguous production issues with a systematic, data-driven methodology.
🔍 ATS Optimization Keywords
Below are skills and terms extracted directly from this job posting to improve Applicant Tracking System (ATS) visibility.
Hard Skills
- software development
- cloud infrastructure management
- Infrastructure as Code
- programming
- scripting
- data analysis
- automated incident diagnostics
- deployment reliability
- observability
- alerting strategies
Soft Skills
- collaboration
- problem-solving
- systematic approach
- data-driven methodology
- adaptability
- communication
- critical thinking
- proactive monitoring
- efficiency improvement
- production reliability
Certifications & Qualifications
- Bachelor’s degree in Computer Science
- Bachelor’s degree in Engineering
