Intermediate Site Reliability Engineer, Environment Automation

GitLab is an open core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world.

As a Site Reliability Engineer at GitLab, you are responsible for keeping all user-facing services and other GitLab production systems running smoothly. Our SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our operating environments and the GitLab codebase.

What you’ll do
  • Automate operational tasks for Environment Automation SRE.
  • Develop a dependable warning system that supports reliable maintenance tasks.
  • Plan monitoring and alerting systems based on customer usage patterns.
  • Respond to user emergencies and support requests.
  • Implement new security measures for GitLab infrastructure.
  • Collaborate with engineering stakeholders to resolve architectural bottlenecks.

What you’ll bring
  • Experience in running and operating production workloads.
  • Strong programming skills, preferably in Ruby and/or Go.
  • Strong background with Infrastructure as Code technologies like Terraform and Ansible.
  • Ability to reason about large systems and their operations.
  • Enthusiasm for working collaboratively across teams.