Description
Our client, a leading global top Fortune company, is offering a unique opportunity to join its Global Shared Services operation in Puerto Rico. This start-up offers a once-in-a-lifetime professional opportunity to gain global experience and exposure while working from Puerto Rico. Be part of the brain and heart of this multinational operation’s mission of offering solutions through cutting-edge technology to its worldwide client base. Contact Careers to become part of an exciting and attractive global company without the need to relocate from Puerto Rico.
The Director of Site Reliability Engineering (SRE) Shared Services provides executive-level leadership for the strategy, operations, and technical excellence of enterprise data infrastructure. This role is responsible for ensuring highly available, secure, and resilient platforms that support critical data services. The successful candidate will build and mentor a multidisciplinary team, drive operational improvements, and implement automation practices that embed reliability into engineering workflows.
Responsibilities:
- Define and execute the vision for SRE and infrastructure operations, ensuring alignment with organizational objectives.
- Lead a team of engineers and specialists focused on compute, network, cloud, security, observability, and disaster recovery.
- Establish and enforce best practices for high availability, performance, and operational resilience across hybrid environments.
- Oversee cloud and on-premises infrastructure, integrating Kubernetes orchestration, monitoring, and automation tools.
- Implement Infrastructure-as-Code, CI/CD pipelines, and automated workflows to improve service reliability.
- Manage identity and access governance, information security operations, and vulnerability management programs.
- Drive observability and telemetry initiatives to proactively detect, prevent, and resolve incidents.
- Collaborate with cross-functional teams to ensure seamless delivery of critical data services.
- Foster a culture of reliability, accountability, and continuous improvement across engineering and operations teams.
- Oversee the security operations center to ensure effective threat detection and security.
- Ensure audit readiness and compliance with the company’s certifications.
Requirements:
- Bachelor’s or Master’s degree in Computer Science, Information Systems, or related technical field.
- 10+ years of IT operations or infrastructure experience.
- 5+ years leading SRE or reliability-focused teams.
- Hands-on experience with hybrid cloud environments (Azure, AWS, GCP) and production Kubernetes deployments.
- Proficiency in Linux, Windows Server, or Unix systems, networking, and cloud-native infrastructure.
- Experience with Infrastructure-as-Code (Terraform, Helm, Ansible, ArgoCD) and CI/CD pipelines.
- Deep knowledge of identity governance, including IAM, RBAC, and ABAC.
- Strong expertise in observability and monitoring tools (Grafana, Prometheus, Datadog, ELK, Azure Monitor).
- Proven experience leading security operations, incident response, and vulnerability management.
- English proficiency required.
- Preferred certifications: Certified Kubernetes Administrator (CKA) or Certified Kubernetes Security Specialist (CKS), AWS DevOps Engineer, Azure Solutions Architect Expert, CISSP or CISM, ITIL 4 Foundation.
Careers Inc. job postings are legally privileged and may not be copied, reproduced, displayed, modified, transmitted, used for misrepresentation, and/or distributed through any website, social media, network, database, platform, or related. Failure to comply will result in legal action.