What you will do:
- Devise innovative solutions to hard technical problems involving distributed systems, scale, and security, and translate these ideas into designs and implementations; propose novel approaches and encourage the team to keep thinking beyond conventional solutions.
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
- Ensure the security of critical systems through the use of best-in-class cloud security solutions.
- Work with R&D teams to plan a CI/CD process that delivers predictable builds of measurable quality, and to characterize and optimize overall system performance.
Who are you?
- Experience deploying, running, and operating complex, highly available, large-scale web applications on Amazon Web Services, Google Cloud Platform, or similar.
- Good understanding of reliability engineering.
- Good understanding of monitoring, observability and analysis principles.
- Good understanding of scalable distributed systems and microservices architecture patterns.
- Deep understanding of Linux OS internals and troubleshooting.
- Experience with containerization and container orchestration.
- Good understanding of deployment strategies.
- Good understanding of security-related technologies and concepts.
- Experience in one or more of the following: Python, Go, Perl or Ruby.
- Experience with “infrastructure as code” tooling.
- Good understanding of, and experience with, databases.