Site Reliability Engineer
Satisfy your curiosity by digging deep into distributed systems to fully understand how they work and how to improve their resiliency and reliability.
Troubleshoot, firefight, and stabilize service incidents when they occur.
Perform incident analysis to gather findings and identify follow-up actions that lead to more reliable products.
Help implement proven reliability practices such as circuit breakers, retries, and caching strategies, either by partnering with teams or by coding them yourself and submitting a pull request.
Experience in a Software, Infrastructure, Systems, and/or Site Reliability Engineering role
A successful track record of troubleshooting distributed systems during service incidents while remaining level-headed
Knowledge of Kubernetes, NGINX, and networking
Experience in a software development environment
A strong curiosity about the unknown and the drive to keep digging until you have a solid understanding
An understanding of the stages that make up the incident lifecycle