Site Reliability Engineer (SRE) – Containerized Microservices
Aylo · Montréal
Job description
About the role
We are looking for a skilled Site Reliability Engineer to ensure the reliability, scalability, and performance of our production systems built on containerized microservices. You will join a dynamic, international team that values diversity, inclusion, and innovative problem‑solving.
Key responsibilities
- Own reliability, availability, and performance of production systems in a Kubernetes‑based environment.
- Monitor system health with Grafana dashboards, alerts, and observability tools.
- Manage and operate Kubernetes clusters via Rancher, handling deployments, scaling, and troubleshooting.
- Lead incident management using OpsGenie, including on‑call rotations and post‑incident reviews.
- Troubleshoot across application, infrastructure, messaging, database, and container layers.
- Develop automation scripts with Bash, Go, and Python to improve operational efficiency.
- Support and optimize CI/CD pipelines in GitLab for smooth releases.
- Collaborate with development teams to enhance application reliability and observability.
- Monitor and resolve performance issues in MySQL, Redis, Kafka, and RabbitMQ.
- Maintain operational documentation, runbooks, and knowledge bases in Jira and Confluence.
- Perform root‑cause analysis and implement preventative measures while ensuring security and compliance.
- Leverage AI‑powered engineering tools to accelerate troubleshooting and documentation.
Required profile
- 3+ years of experience in Site Reliability Engineering, DevOps, Production Support, or Systems Engineering.
- Bachelor’s degree in Computer Science or a related field.
- Hands‑on experience with Grafana, Kubernetes, Docker, and Rancher.
- Proven incident‑management experience using OpsGenie.
- Strong background with GitLab/Git, CI/CD pipelines, and release processes.
Required skills
- Grafana
- Kubernetes
- Docker
- Rancher
- OpsGenie
- Bash
- Go
- Python
- GitLab
- Git
- CI/CD pipelines
- MySQL
- Redis
- Kafka
- RabbitMQ
- Jira
- Confluence
What we offer
- Hybrid work environment with flexibility for remote collaboration.
- Opportunity to work on cutting‑edge AI‑assisted tooling.
- Collaborative international team across Montreal, Austin, and Nicosia.