Site Reliability Engineer
We proactively help our customers understand active risks and shut them down. When all else fails, we are there for them financially and with services that help them mitigate damage and come back stronger after an incident.
Help us protect the world against cyber risk and give business owners a trusted support system and a fighting chance.
We have over 25,000 customers, ranging from small and mid-sized businesses to Fortune 500 companies. Founded in 2017, Coalition has raised $125M from a number of top-tier global investment firms, including Ribbit Capital, Greenoaks Capital, Valor Equity Partners, Felicis Ventures, and Vy Capital. Headquartered in San Francisco, Coalition’s team is distributed across more than 15 locations globally, including Austin, Washington DC, Denver, Canada, and Portugal.
We are looking for a Site Reliability Engineer (Remote) who has the experience, ability, and mental fortitude to instrument and monitor the breadth of our full system stack (hosts, applications, and performance). In this role, you will work closely with our engineering and information security teams to enhance the automated system provisioning and deployment subsystems within codified infrastructure. You will work with developers to create more robust and scalable services independent of cloud implementations. You will help to isolate, trap, and respond to inevitable system failures, and develop strategies for continuous monitoring and analysis to reduce both downtime and required manual intervention.
3+ years of combined experience in SRE/DevOps roles in a full stack engineering environment
2+ years of experience in automated system provisioning, configuration, and Infrastructure as Code (CloudFormation, Terraform, Ansible, etc.)
Demonstrated proficiency with containerization and orchestration tools such as Kubernetes, Swarm, or ECS
Experience with CI/CD systems such as Jenkins, Travis, or CircleCI
Demonstrated proficiency in Python, Go, or other scripting and systems languages
Experience working with fault-tolerant services and the iterative development of highly available systems
Some experience with one or more Infrastructure as a Service cloud providers (AWS/Azure/DigitalOcean/Google Cloud)
Excellent organizational, verbal, and written communication skills
Bachelor’s or Master’s degree in Computer Science, related field, or equivalent experience
Skills considered a plus
Experience with converting monolithic applications to microservices and service discovery technology
Prior experience with full-stack monitoring from system level metrics to SLOs, failure-based testing approaches, and monitoring strategies
Understanding of networking, systems engineering, hardware, and data center architecture
Exposure to systems security requirements and basic information assurance techniques
Exposure to Kafka, AMQP, Kinesis, job queues, and other pub/sub or queuing systems
Exposure to vulnerability scan results and reports
Exposure to the information security domain and data breaches
Knowledge of Scrum & Agile Methodologies
Enjoy a highly fulfilling, mission-driven culture
Health, dental, and vision benefits for you and your family
Life insurance and disability benefits
Paid Parental Leave
Wellness and commuter benefits
Flexible working hours
Open vacation days
We embrace distributed work; some benefits will vary by location
You are an owner! We offer stock options to each of our employees