Middle Site Reliability Engineer, Online Retailer

DataArt
Mid
Online interview
B2B
Wrocław, Lublin

Project description

About the vacancy

Our client is one of the biggest online retailers worldwide, with annual revenue of £1 billion. Over the years, we have helped the client develop web portals, mobile apps, delivery control systems, staff management tools, data storage, and much more. The systems we’ve built together are in operation 24/7, contributing to the client’s success.

Site Reliability Engineering is a new role, first introduced by Google, that combines the skills of developers and ops to deliver more reliable, scalable software. The goal is to analyze a diverse set of applications (primarily built using Java, Oracle, AWS, Google Cloud services, and several other technologies) and bind them into a reliable self-healing suite, working within defined reliability requirements. This requires proactive work to ensure observability, analyze potential bottlenecks, and suggest fixes before they turn into production incidents.

This position may be of interest to DevOps engineers who would like to get closer to the code or gain a valuable specialization with a focus on the JVM stack. The position may also appeal to developers who are interested in how large-scale systems operate and what happens to the code after it is compiled.

Responsibilities

  • Analyze and improve the availability, latency, performance, and efficiency of the applications
  • Provide proactive support for production applications (both in-office and out of hours) across a range of domains; these applications are mainly written in Java and use Oracle databases
  • Improve the monitoring and alerting of the applications
  • Capacity planning and provisioning
  • Improve and standardize build pipelines; identify areas of manual toil and reduce them through automation
  • Consult in areas of reliability and scalability for the development of new applications
  • Work together with teams in other departments to find solutions
  • Conduct periodic on-call duties

Who we're looking for

Must have

  • Experience in analyzing and troubleshooting production systems
  • Experience with modern software development, preferably in Java
  • Deep understanding of Linux and UNIX-based systems
  • Familiarity with Agile software development practices
  • Understanding of TDD principles
  • Solid knowledge of SQL and modern databases
  • Experience with CI/CD systems
  • Experience with networking (TCP/UDP, ICMP, DNS, etc.), OSI layers, infrastructure services, and security
  • Experience with software monitoring and alerting systems
  • Good English communication and problem-solving skills

Would be a plus

  • Familiarity with cloud technologies
  • Experience with Docker and Kubernetes
  • Experience with NoSQL databases

Healthcare
  • Healthcare package
  • Healthcare package for families
Leisure package
  • Leisure package
  • Leisure package for families
Kitchen
  • Cold beverages
  • Hot beverages
  • Fruits
  • Snacks
Training
  • Conferences
  • Trainings
Parking
  • Car parking
  • Bicycle parking
Other
  • Shower
  • Chill room
  • Integration events

Our company

DataArt

Wrocław, Lublin 3000+
Tech skills
  • JavaScript
  • .NET
  • Java
