Senior Site Reliability Engineer

Job Title:
Senior Site Reliability Engineer
Salary:
Not specified
Location:
Lagos, Nigeria
Travel Requirements
No travel
Educational Specialization
Computer Science
Work Options
On-site
Company Size
500+ employees
Experience Level
Senior-Level
Educational Level
Bachelor's degree
Skills
Site Reliability Engineering, Kubernetes, AWS/GCP/Azure, Monitoring & Alerting, Incident Management, Automation, Distributed Systems, Problem-Solving
Job Type
Full-time

Senior Site Reliability Engineer • Moniepoint Inc. • Lagos, Nigeria • via MyJobMag

We are seeking an experienced Site Reliability Engineer (SRE) responsible for ensuring our systems run smoothly and efficiently while engineering solutions to improve visibility, eliminate repetitive tasks, and increase system resilience.

The ideal candidate will balance real-time on-call responsibilities with strategic engineering work to achieve sustainable and scalable service reliability.

  • Participate in on-call rotations as the primary technical lead for detecting, triaging, and resolving service degradation, outages, or reliability issues across all environments.
  • Act as the Incident Commander during major incidents: initiate war room or bridge calls, coordinate cross-functional teams, and provide timely, clear status updates to all stakeholders. Lead and document blameless Root Cause Analyses (RCAs) that drive long-term fixes.
  • Develop automation to eliminate manual, repetitive operational tasks (toil) across both applications and infrastructure, improving efficiency and system resilience.
  • Create and maintain dashboards and alerts that track application and infrastructure health.
  • Participate in feature development discussions to ensure services are built with observability from the ground up.
  • Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) in collaboration with Product and Engineering teams.
  • Investigate and resolve customer complaints escalated beyond L1 and L2 support, especially those involving performance, reliability, or complex system behavior.

Minimum of 4 years of experience supporting enterprise applications in an SRE or similar role.

  • Knowledge of distributed systems, microservices architecture, and software design patterns.
  • Experience with cloud platforms such as AWS, GCP, or Azure.
  • Strong knowledge of Kubernetes and container orchestration tools.
  • Experience using application performance monitoring tools, OpenTelemetry, and observability platforms such as New Relic, Datadog, ELK, or SigNoz.
  • Excellent problem-solving and troubleshooting skills as an on-call engineer, with the ability to resolve complex infrastructure and application issues.
  • Proficient in setting up and maintaining monitoring dashboards and alerts using Grafana and Prometheus.

Disclaimer: This job description has been formatted by AI for readability. Please verify all details with the employer before applying.