Site Reliability Engineer (SRE) – Containerized Microservices
Aylo · Montréal
Job description
About the role
We are looking for a highly skilled Site Reliability Engineer to ensure the reliability, scalability, and performance of our production systems. You will work in a containerized microservices environment, handling incident response, root‑cause analysis, and continuous improvement.
Key responsibilities
- Own reliability, availability, and performance of production systems in a Kubernetes‑based environment.
- Monitor system health with Grafana dashboards, alerts, and observability tools.
- Manage and operate Kubernetes clusters via Rancher, including deployments, scaling, and troubleshooting.
- Lead incident management using OpsGenie, and participate in on‑call rotations, escalations, and post‑incident reviews.
- Troubleshoot across application, infrastructure, messaging, database, and container layers.
- Develop automation scripts and tools using Bash, Go, and/or Python to improve operational efficiency.
- Support and optimize CI/CD pipelines in GitLab, ensuring smooth deployments.
- Collaborate with development teams to enhance application reliability, performance, and observability.
- Monitor and resolve issues in MySQL and Redis databases.
- Support distributed messaging systems such as Kafka and RabbitMQ.
- Maintain operational documentation, runbooks, and knowledge bases using Jira and Confluence.
- Perform root‑cause analysis and implement preventative measures.
- Ensure compliance with security, data privacy, and regulatory standards.
- Leverage AI‑powered engineering tools to accelerate troubleshooting and documentation.
Required profile
- Minimum 3 years of experience in Site Reliability Engineering, DevOps, Production Support, or Systems Engineering.
- Bachelor’s degree in Computer Science or a related field.
- Hands‑on experience with Grafana, Kubernetes, Docker, and Rancher.
- Experience with OpsGenie for incident management and on‑call coordination.
- Strong background with GitLab/Git, including CI/CD pipeline creation and maintenance.
Required skills
- Grafana
- Kubernetes
- Docker
- Rancher
- OpsGenie
- GitLab
- Git
- CI/CD pipelines
- Bash
- Go
- Python
- MySQL
- Redis
- Kafka
- RabbitMQ
- Jira
- Confluence
Posted 1 hour ago
Expires in 1 month