Site Reliability Engineer (SRE)
We’re revolutionizing the way humanity eats, and there’s a lot of room for optimization and growth. That’s where you come in. Your ingenuity will help us continue to drive innovation, making an impact on the reliability, performance, and scalability of Skip’s industry-leading technology.
As a Site Reliability Engineer (SRE), you will contribute to the critical team that ensures the stability, recoverability, and observability of our platform. As part of those responsibilities, you will also assist with follow-up work after occasional major incidents: helping recover the platform and pointing the Engineering group to areas for improvement that can prevent such incidents or reduce their overall impact. Our team gives you a considerable amount of autonomy and a unique opportunity for technical development.
What you will be doing:
- Tool stability: support the tools that SRE owns and that Engineering and other operational teams use
- Infrastructure stability: work on initiatives to ensure that the group implements the plan to keep the platform stable and scalable
- Automate and eliminate toil: identify manual tasks and recurring errors that Operations teams handle, and optimize the platform to reduce them
- Production incidents: investigate options for addressing high-impact or urgent incidents
- Platform monitoring, alerts, and dashboards: implement, maintain, and monitor the tools (and their outputs) that provide our platform's observability
- Help drive infrastructure mapping and dependability and risk assessments
- Embody Skip’s values, culture, and dedication
- Support the SRE team in bringing its tools, knowledge, and experience to the product teams
- Take on assignments implementing the technical roadmap of the overall platform, with a focus on SRE-related initiatives
What you bring:
- Strong knowledge of AWS services and how to leverage them at scale
- Experience implementing, baselining, and maintaining monitoring tools
- Knowledge of Infrastructure-as-Code (IaC) driven deployments, preferably with Terraform
- Practical experience with Git and Git workflows
- Understanding of how a cloud-based Disaster Recovery environment operates, and the ability to set it up and keep it maintainable
- Software-centric mindset, preferably with some application development background
- Agile mindset: we deliver often, and we deliver fast
- Proven aptitude for working with large-scale and cloud solutions
- Problem-solving mindset, seeing problems as opportunities for improvement
- Experience implementing automation is desirable
What It's Like To Work At Skip:
Picture this: you, dressed in your fave casual attire, amongst a team of friendly and passionate colleagues. You take pride knowing that your input and individuality are not only embraced but also make an impact on a major Canadian company and its satisfied customers. As the company grows, so do you: you meet and surpass new challenges every day.
That’s just a small taste of what it’s like to work at one of Canada’s leading tech companies. If you’re hungry for opportunity, growth, and something meaningful in a dynamic, yet casual environment, we’d love to hear from you.
Note: All employees will be asked to sign a Consent for Disclosure of Personal Information in order to complete a background check. Job offers will be conditional upon results that the Company determines to be satisfactory.