28 day(s) ago

Senior/Staff Site Reliability Engineer (Golang, Kubernetes)

Negotiable Salary


United States
English: Advanced, Upper Intermediate, Native Speaker
Experience: 7+ years
Employment: Full-time

Fair.com

Santa Monica
Fair is a FinTech company that provides a new way to shop, get approved and pay for your next car—all on your phone. It gives customers the freedom to drive the car they want for as long as they want, and the flexibility to turn it in at any time. Fair is headquartered in Santa Monica, California.

Our name pretty much says it all. It's our culture. It's the way we treat our customers, our network of dealers and our fast-growing family of employees. We believe in hard work, and we believe hard work should be rewarded. That's why we offer equity incentives, 100% coverage of medical, vision and dental premiums for employees and their families, 100% paid parental leave for 4 months, cellphone reimbursement, 401(k) retirement plans and free lunch 5 days a week for every employee. The way we see it, better to be more than fair than not to live up to our name.

General overview of the project(s)

The Platform team provides the foundation that empowers Fair’s engineers to build incredible things. We are expanding our team with a remote Senior Site Reliability Engineer position.

Reporting to the Platform Engineering Manager, you are customer-centric, have demonstrated expertise in multiple areas of software engineering, and are passionate about building operational excellence across our platform and delivering “5 Nines Availability” (99.999% uptime). This is a great opportunity for you if you have experience dealing with issues of scale, debugging low-level production problems, and improving the availability of systems.

Responsibilities

Oversee the site reliability and operation of our infrastructure and platform
Design and champion SRE best practices from idea conception to delivery
Maintain and evolve our Golang-based applications to provide a great experience for our customers (other engineers in the organization)
Help grow the SRE wing of our platform team
Maximally automate processes to promote human-free operations
Improve metrics on quality of service, incidents and availability
Participate in our follow-the-sun on-call rotation while focusing on reducing incidents and the need for help through creative technological solutions or processes
Help troubleshoot and debug production issues across all of our services

Requirements

Bachelor's degree in computer science, applied mathematics, or a related field, or 5 years of equivalent work experience
8+ years of relevant work experience
Relevant certifications in AWS, Kubernetes, Linux, database administration, networking, security, or Six Sigma are welcome
5+ years of experience in reliability engineering, software engineering, systems engineering, platform engineering, SRE, ops, or similar fields
Expert knowledge of Go
Expert knowledge of Linux
Deep systems, cloud and infrastructure knowledge
Experience with Python, Ruby, and Bash scripting
Familiarity with AWS, Docker, Kubernetes, and Terraform
Experience with integration and end-to-end testing in microservices environments
Experience with microservices, distributed logging and tracing
Experience managing monitoring systems
Experience building CI/CD pipelines
Familiarity with security concepts and best practices
Experience with computer networks

Benefits

100% coverage of medical, dental and vision benefits for employees AND their families
Equity incentives
Unlimited vacation package
Up to four months of 100% paid parental leave
Cellphone reimbursement
401(k)
Employee referral rewards
Diverse and inclusive culture
Leadership, mentorship, and learning programs
