Sr. Service Reliability Manager
The Service Reliability Management (SRM) team operates globally across all platforms and markets. You’ll report to the team’s Tech Manager and work alongside the existing mixed-seniority team.
You’ll be working with a web-scale architecture that handles a very high number of transactions per minute and needs to be highly scalable and resilient. We also push a high volume of change to our platforms, approaching 2,000 changes per week.
We want to increase the speed at which we deliver change while staying within the guardrails of what we define as commercial availability, using and building data that enables us to make better decisions. With that in mind, if you’re reaching for the Google SRE book instead of your ITIL volumes, you could be who we are looking for.
- Basic analytical and data analysis skills using Microsoft Excel or Google Sheets
- Experience in using:
  - Agile development processes (primarily scrum and kanban)
  - AWS Cloud Services
  - Relational and non-relational databases
- Good understanding of:
  - Service-oriented architectures and Microservices
  - Service Reliability Engineering
  - Continuous Integration / Deployment
  - Caches and caching strategies
  - Payment Service Providers
  - PCI-DSS and GDPR
  - Contact Centre operations
- Be part of the Major Incident Team. For us, that means managing senior business and tech stakeholders during incidents, driving and owning follow-up activities over the following day(s) until we are safe, facilitating and chairing tech-wide post-mortems in which you deep dive across existing processes and tech solutions to identify areas needing improvement, and finally closing out with a business report.
- We own Availability Management, Major Incident Management, the Incident process, and the production of impactful operational Management Information, and we make sure engineering staff are trained in our global operational processes via OnCall boot camps. We are constantly identifying opportunities for incremental improvement and driving them across these processes.
- Some of those changes require working with and influencing other teams or individuals, including senior tech leadership.
- Understanding and creating useful Management Information (MI) that helps us build a picture of how reliable we are, and being the “point of truth” for understanding incident impact
- Using that MI either to influence behavior or to spot trends and call out areas that need extra love and effort to reduce risk.
- To improve all of that, we aim to embed SLIs and SLOs across the entire tech stack to provide better guardrails and insights than our current data sources.
- A focus on the Canadian market operation, but heavily involved in all parts of the global business
What It’s Like To Work At Skip
Picture this: you, dressed in your fave casual attire, amongst a team of friendly and passionate colleagues. You feel pride knowing your input and uniqueness are not only embraced, but make an impact on a major Canadian company and its satisfied customers. As the company grows, so do you: you meet and surpass new challenges every day.
That’s just a taste of what it’s like to work at one of Canada’s leading tech companies. If you’re hungry for opportunity, growth, and something meaningful in a dynamic yet casual environment, we’d love to hear from you.
Note: All employees will be asked to sign a Consent for Disclosure of Personal Information in order to complete a background check. Job offers will be conditional upon results that the Company determines to be satisfactory.