Site Reliability Engineering Consultant

Category: Cyber Security
Main location: Canada, Nova Scotia, Halifax
Alternate Location(s): Canada, New Brunswick; Canada, Newfoundland and Labrador; Canada, Prince Edward Island
Position ID: J0324-1056
Employment Type: Full Time

Position Description:

As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, performance, and availability of our systems. Your expertise in managing infrastructure, automating processes, and implementing best practices will help keep operations running smoothly. You’ll collaborate with cross-functional teams, aligning development and operations around shared goals.

Your future duties and responsibilities:

Infrastructure Management:
• Run and maintain infrastructure using tools like Terraform, Kubernetes, and Helm.
• Deploy and manage services on Google Cloud Platform (GCP).
• Implement SRE best practices, document improvements, and drive infrastructure enhancements.
Monitoring and Alerting:
• Improve monitoring and alerting systems to detect incidents promptly and reduce false positives.
• Monitor service health using SRE principles, including defining Service Level Indicators (SLIs), Service Level Objectives (SLOs), and tracking error budgets.
Tech Stack:
• Programming Languages: Proficiency in scripting languages such as Python.
• Big Data Technologies: Familiarity with Apache Hadoop, Spark, and Kafka.
• Database Management: Experience with both relational databases (e.g., PostgreSQL, MySQL, Oracle) and NoSQL databases (e.g., MongoDB, Cassandra).
• Cloud Platforms, specifically Google Cloud Platform (GCP):
- Deploy scalable, fault-tolerant systems using GCP services (e.g., Google App Engine, Kubernetes Engine, Compute Engine).
- Set up and manage virtual machines, containers, and serverless functions.
- Use managed databases (Cloud SQL, Firestore) for data storage.
- Implement authentication and authorization using GCP Identity and Access Management (IAM).
- Monitor application performance with Google Cloud Monitoring (formerly Stackdriver).
Collaboration:
• Work closely with development teams to align on shared goals.
• Troubleshoot incidents using integrated tools such as Cloud Build.
• Provide real-time observability across logs, metrics, and events.

Required qualifications to be successful in this role:

Education: Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
Experience:
• 3 to 5 years of experience in SRE or related roles.
• Strong IT knowledge and skills.
• Familiarity with data analysis, CI/CD pipelines, and database management.
• Exposure to cloud services, particularly GCP, is highly desirable.
Desired Skills:
• Automation: Ability to automate end-to-end processes.
• Security: Understanding of data security best practices.
• Troubleshooting: Proficiency in incident resolution and troubleshooting.
• Communication: Effective communication and collaboration skills.


Skills:

• Systems Engineering
• Continuous Integration

What you can expect from us:

Together, as owners, let’s turn meaningful insights into action.

Life at CGI is rooted in ownership, teamwork, respect and belonging. Here, you’ll reach your full potential because…

You are invited to be an owner from day one as we work together to bring our Dream to life. That’s why we call ourselves CGI Partners rather than employees. We benefit from our collective success and actively shape our company’s strategy and direction.

Your work creates value. You’ll develop innovative solutions and build relationships with teammates and clients while accessing global capabilities to scale your ideas, embrace new opportunities, and benefit from expansive industry and technology expertise.

You’ll shape your career by joining a company built to grow and last. You’ll be supported by leaders who care about your health and well-being and provide you with opportunities to deepen your skills and broaden your horizons.

Come join our team at one of the largest IT and business consulting services firms in the world.