Site Reliability Engineer at Tecsys Inc.
Location: Toronto, Ontario, Canada
Date Posted: December 23, 2024
**Job Posting: Site Reliability Engineer**
**About Us**
Tecsys is a fast-growing innovator offering supply chain solutions to organizations ranging from industry-leading healthcare systems, hospitals, and pharmacy businesses to distributors, retailers, and 3PLs. We work with industry leaders to transform their supply chains through technology. If you thrive on tackling interesting challenges with continuous learning opportunities, then Tecsys could be a good fit for you!
**About The Role**
We are looking for a Site Reliability Engineer to work within our Network and Security Operations Center department. Our NOC team is focused on improving the reliability and uptime of our platform and applications in a data-driven way to support internal and external customers' needs.
**Your Responsibilities**
- Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms, capacity planning, and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Develop tools and automation on Azure and AWS to continuously reduce the need for manual intervention.
- Scale systems sustainably through automation and advocate for changes that improve reliability and velocity.
- Participate in on-call rotations and practice sustainable incident response and blameless postmortems.
- Implement automated solutions for continuous integration and delivery (CI/CD).
- Implement monitoring, logging, alerting, and SLA reporting.
- Create and maintain technical documentation.
- Apply SRE best practices and take command of high-severity incidents to facilitate their resolution.
- Support planning and deployment teams to enable stability, predictability, and scale in our continued growth.
- Collaborate with the Platform Engineering team and work cross-functionally with internal teams and vendors to manage growth globally, focusing on performance, availability, and reliability.
**Requirements**
- Bachelor’s degree in computer science or a related technical discipline.
- At least 5 years of systems engineering experience with demonstrable technical skills in new platform development, orchestration, product ownership, and iterative design and deployment.
- Experience designing and deploying large-scale systems, multi-vendor platforms, and globally distributed infrastructure.
- Strong knowledge of system design, high-performance computing, and file, block, and storage technologies.
- High proficiency in executing projects with full stack automation.
- Ability to self-organize, collaborate, and manage efforts across diverse teams.
- Self-starter attitude with a willingness to ask questions and challenge the status quo.
- Familiarity with Datadog and Rapid7 Insight preferred.
- Experience with AWS or Azure required.
- Basic knowledge of Java or .NET-based development required.
- Knowledge of GitLab or Jenkins required.
- Experience with a SaaS company is preferred.
- Knowledge of FedRAMP compliance is a strong asset.
- Strong English communication skills, both written and spoken.
**Additional Requirements**
- Participate in an on-call escalation rotation.
- Occasional travel of less than 10% (quarterly offsites and conferences).
At Tecsys, we are committed to fostering a diverse and inclusive workplace where all employees feel valued, respected, and empowered. We welcome and encourage applicants from all backgrounds, experiences, and perspectives to join our team. Tecsys is an equal opportunity employer. Accommodation is available for applicants selected for an interview.
**Please Note:**
To apply for this position, you must be a Canadian Citizen, Permanent Resident of Canada, or have a valid Canadian work permit.