The Site Reliability Engineer will play a critical role in driving innovation and growth for the Banking Solutions, Payments and Capital Markets business. In this role, you will have the opportunity to make a lasting impact on the company's transformation journey, drive customer-centric innovation and automation, and position the organization as a leader in the competitive banking, payments, and investment landscape.

Lead the design and evolution of observability, monitoring, and alerting systems to ensure end-to-end visibility and proactive issue detection.
Implement scalable automation frameworks for infrastructure provisioning, deployment pipelines, and operational tasks.
Ensure application reliability, availability, and performance, minimizing downtime and optimizing response times.
Own incident management processes, including high-severity incident response, root cause analysis, and continuous improvement initiatives.
Mentor and guide colleagues, fostering a culture of ownership, resilience, and operational excellence.
Collaborate with architecture, security, and product leadership to align reliability goals with business objectives.
Lead capacity planning and performance optimization efforts across distributed systems and cloud-native environments.
Champion disaster recovery and business continuity planning, ensuring readiness for large-scale events.
Participate in on-call rotations and provide 24/7 support for critical incidents.

What You Bring:

10+ years of proven experience in a Lead SRE/DevOps/Infrastructure Engineering role within complex, high-availability environments.
Deep expertise in cloud platforms (AWS, Azure, or GCP) and Infrastructure as Code (Terraform, CloudFormation, etc.).
Proven expertise in defining SLOs and SLIs and in using error budgets.
Experience containerizing monolithic legacy applications and migrating databases.
Strong background in monitoring tools (Prometheus, Grafana, Datadog) and logging frameworks (Splunk, ELK Stack).
Advanced proficiency in scripting and automation (Python, Bash, Ansible).
Hands-on experience with CI/CD pipelines (Jenkins, GitLab CI/CD, Azure DevOps).
Demonstrated leadership in incident response and post-mortem culture.
Ability to take incident command and lead a Severity 1 call with a calm, data-driven approach.
A change-agent mindset, with the ability to influence cross-functional teams and drive change at scale.
Excellent communication, negotiation, and stakeholder management skills.

What We Offer You:

A work environment built on collaboration, flexibility, and respect.
A competitive salary and an attractive range of benefits designed to support your lifestyle and wellbeing.
Varied and challenging work to help you grow your technical skillset.

Privacy Statement

FIS is committed to protecting the privacy and security of all personal information that we process in order to provide services to our clients. For specific information on how FIS protects personal information online, please see the Online Privacy Notice.
