Site Reliability Engineer
ref nr: 368/2/2026/WM/94614
At Antal, we have worked in recruitment for over 20 years. Operating across 10 specialised divisions gives us a strong grasp of current industry trends. We define the specific nature of each job precisely, identifying the key skills and qualifications it requires. Our mission is not only to find a candidate whose competences fit the requirements of a given job advertisement but, first and foremost, to find a position that meets the candidate’s expectations. Employment agency registration number: 496.
📄 Contract role
🏢 Department: Market Securities & Services
For our Client – a leading international financial institution and one of the largest investment banks globally – we are looking for a Site Reliability Engineer to join the Market Securities & Services IT division.
The role sits within the Counterparty Credit Risk (CCR) Technology team, responsible for delivering critical risk calculation platforms used globally. The team is currently building the next generation of Counterparty Credit Risk Engines, including cloud migration and development of in-house analytical libraries to replace vendor solutions.
This is a unique opportunity to join a growing engineering team in Kraków and contribute to a strategic, multi-year transformation programme.
About the Team & Technology Landscape
The new CCR platform is based on a microservices architecture and leverages modern open-source technologies. It runs across Google Cloud Platform and on-premise infrastructure.
Technologies include:
Java SE, Spring Boot, Spring Cloud, Apache Beam, Apache Flink, GCP, Redis, REST APIs, Ansible, Jenkins.
The organisation is investing heavily in Agile ways of working, DevOps practices, CI/CD pipelines, and cloud technologies.
Your Responsibilities
- Manage application support operations with a focus on resiliency, availability and performance
- Coordinate production incident resolution and conduct post-mortems / root cause analysis
- Investigate and resolve complex production issues across distributed systems
- Contribute to continuous service improvement and knowledge base documentation
- Actively engage in Incident, Problem and Service Management processes
- Apply SRE principles to enhance reliability, scalability and observability
- Develop and improve monitoring, alerting and incident detection mechanisms
- Support hybrid cloud environments and automation initiatives
- Work in a 2-shift rotation (8:00 AM start / 4:00 PM start)
- Participate in weekend and on-call rotations
What We Are Looking For
- 4+ years of experience supporting and/or developing distributed systems (Java-based environments)
- Strong troubleshooting and analytical skills
- Experience with disaster recovery processes
- Hands-on experience with application lifecycle and CI/CD tooling (JIRA, Confluence, Jenkins, Ansible)
- Experience supporting complex, cross-platform systems (Java / Python environments)
- Knowledge of Agile/Kanban delivery models
- Experience implementing monitoring and logging frameworks (e.g. Grafana, InfluxDB, Prometheus, Splunk, Loki or similar)
- Basic knowledge of relational databases (Oracle, PostgreSQL)
- Understanding of cloud platforms (preferably GCP)
- Familiarity with Unix/Linux environments
- Ability to lead technical discussions with global support teams
- Strong communication skills and ability to work across regions
Technical Requirements
- Core Java knowledge
- Application support experience
- Monitoring tools (Grafana, InfluxDB, Prometheus or similar)
- Basic cloud knowledge (GCP preferred)
- Automation tools (Jenkins, Ansible)
- Knowledge of relational databases (Oracle, PostgreSQL)
Why Apply?
- Opportunity to work on business-critical global risk platforms
- Participation in a large-scale cloud and architecture transformation
- Modern technology stack and DevOps culture
- Hybrid working model (2 days per week in Kraków office)
- Long-term project within a stable, global financial environment