Job Description
Role Overview
We are seeking a Site Reliability Engineer (SRE) with strong Azure infrastructure expertise to support cloud, automation, and reliability engineering initiatives. The role suits someone who balances systems engineering, automation, and operational reliability.
Key Responsibilities
- Design, build, and support scalable and reliable infrastructure solutions.
- Perform integration, validation, and performance testing.
- Troubleshoot and resolve infrastructure and system issues.
- Improve system reliability using SLIs, SLOs, SLAs, and automation.
- Integrate and optimize monitoring/observability tools.
- Participate in incident response and drive post-mortem improvements.
- Collaborate with cross-functional teams and maintain clear documentation.
Required Skills
- Azure Infrastructure (3-5 years).
- Linux (RHEL 7+).
- Terraform & Ansible.
- Scripting: Python, Go, or Bash.
- Networking fundamentals.
- DNS, LDAP, Kerberos, Centrify.
- NFS, SAN, NAS.
- Windows Server 2019+.
- Strong grasp of SRE principles and an automation mindset.
- Excellent communication and teamwork skills.