To further drive our vision of premier stability and rapid feature delivery, we are looking for a Senior Site Reliability Engineer to join our team. As a Senior SRE, you should feel exceptionally comfortable bringing architectural design proposals to the table for consideration among your colleagues on our platform and infrastructure development teams. You will be one of the principal technical designers helping push our cloud-native platform toward the future. You will be responsible for driving the implementation of flexible cloud architectures with an automation-first emphasis; manual user intervention likely makes you uneasy and maybe even a little twitchy. We would expect a successful candidate for this position to be a self-starter with the ability to complete tasks independently. Though you will have technical leadership and senior engineers at your disposal, you should be comfortable tackling complex problems without significant oversight.
Observability is everything and you know this all too well. If you can't measure it, you can't prove it works; if you can't prove it works, you must assume it doesn't. This is a philosophy you love (and maybe obsess over). If you can't observe how a new feature is behaving, you feel excited to dive into the application code and make the necessary improvements yourself. You have honed your craft to be proficient at integrating multiple technologies to form a single, coherent view of platform health.
Security is not an afterthought and must be part of every design decision, from prototype to production. You have an adept understanding of cloud and microservice security best practices, and you're not afraid to call out and propose solutions for security policy violations as you see them.
When challenged with building out a new feature in the infrastructure, you are confident in your designs, ready to defend them in a room with many other senior technical minds. You also recognize that the best designs come from collaboration, not dictation, and are willing to bring proposals to the table with an expectation that there will likely be collaborative changes to your initial design.
This position will require you to carry a company-paid mobile device and participate in 24/7 on-call rotations alongside your engineering colleagues.
· Collaborate with our team of DevOps engineers, providing technical guidance and helping establish best practices
· Drive the implementation of modern observability philosophies throughout the stack
· Propose forward-thinking technical designs for our cloud platform, with an emphasis on security and reliability
· Work alongside technical leadership to organize technical roadmaps into achievable work
· Lead observability instrumentation of services throughout the stack
· Be a championing voice for the security of resource access and implementation
· Design and write tooling that aids developers in build management and rapid deployment
· Participate in after-hours on-call support rotations
· Minimum of 5 years of extensive hands-on experience with a wide variety of AWS technologies; multi-cloud experience is preferred
· Minimum of 4 years of experience with containers and infrastructure as code, preferably Docker and Terraform
· Minimum of 4 years of experience in disciplined software engineering with a focus on design, development, and implementation of highly scalable, highly available applications
· Deep understanding of observability stack management (monitoring, alerting, structured logging, APM, etc.)
· Extensive professional experience in one or more of the following languages: Go, Python, Ruby
· Intimate understanding of developing and supporting production systems built on cloud services, using high-availability best practices
· Hands-on experience developing and maintaining CI/CD pipelines, preferably in Git/GitLab
· Deep understanding of RESTful and WebSocket-based APIs
· Bachelor's degree in computer science, related field, or equivalent training and professional experience
· Excellent teamwork skills, flexibility, and ability to handle multiple tasks
· Comfortable communicator, able to clearly detail designs and implementations on an individual level and in large group settings
Bonus Points for
· Familiarity with a variety of monitoring stacks (InfluxDB, Grafana, SolarWinds, NetFlow, syslog, Wireshark)
· Familiarity with HashiCorp Consul
· Familiarity with Datadog
· Familiarity with Perforce
· Familiarity with Atlassian products (Opsgenie, Bamboo, Jira, Confluence)
· Experience working with developers in an agile environment
· Experience in the games industry, preferably launching multiple online-enabled AAA titles
· Knowledge about Gearbox-owned IPs