Specialize in developing scalable methods for building, deploying, and supporting cloud, on-prem, and store-focused enterprise services and systems. Work closely with Software Engineers to deploy and operate solutions; automate and streamline processes; build and maintain tools for deployment and platform monitoring; and troubleshoot and resolve issues in all environments while guiding and mentoring other members of the team. Demonstrate the company's core values of respect, honesty, integrity, diversity, inclusion and safety.
Skills and Experience:
Azure and GCP Implementation: Minimum 2 years of hands-on experience with Azure and GCP.
System Operations and Observability: Proficiency in system operations and observability practices.
CI/CD Pipeline Design: Strong background in designing and implementing CI/CD pipelines.
Application Deployment: Experience with releasing and deploying both mobile and web applications.
Software Development or SysAdmin/Networking: Familiarity with either software development or system administration/networking.
APM Tool Knowledge: Practical use of Application Performance Monitoring (APM) tools.
Large System Design: Experience with designing and implementing large-scale systems.
High-Level Individual Contributor: Proven track record of performing at a high level individually for 5+ years.
Existing Team Lead: Alternatively, at least 2 years of experience as an existing team lead.
Nice-to-Have: Knowledge of Singlestore, LaunchDarkly, and internal developer platforms (IDP).
Minimum
Bachelor's Degree in Computer Science or equivalent related experience (12+ yrs) and strong theoretical fundamentals (data structures, algorithms, lock-free data structures, multi-threaded architectures)
Any experience with Nginx, HAProxy, Squid
3+ years of experience managing system observability (ELK, Datadog, New Relic, Azure Monitor, Grafana)
3+ years of experience with technologies such as Kafka, RabbitMQ, SQS, Ansible, Terraform, Docker and Kubernetes, Jenkins, Spinnaker, Azure DevOps, TeamCity
Any experience with always-on and high-volume web server stack, Azure/GCP PaaS and Azure/Google networking, provisioning native Managed Apps & CI/CD pipelines
6+ years of experience in cloud SRE/DevOps/Infrastructure
Proven knowledge of technology to support omnichannel experiences
Solid understanding of SSH, VPN, TCP/IP, DNS, HTTP(S), network routing and subnets
Fluent in Shell Scripting with experience implementing automation and monitoring using shell scripting and other related tools
Knowledge of Linux architecture, security, administration, performance monitoring/tuning, troubleshooting, and production operations
Proven knowledge of service-oriented architecture/Cloud
Desired
Master's Degree in computer science, information systems, or related technical field
3+ years of experience configuring and managing cloud infrastructure (AWS, GCP, Azure)
6+ years of experience in designing/working in high volume eCommerce applications
Advocate Industry Best Practices: Stay at the forefront of DevSecOps thinking and implementation, both internally and externally. Be the driving force behind adopting cutting-edge practices.
Cloud Administration and Engineering: Manage Azure and GCP environments, ensuring smooth operations, maintenance, and engineering tasks.
Development Standards: Develop, document, and enforce development standards across cloud platforms (Azure, GCP, and On-Premises).
Self-Service Tools: Design self-service tools for observability, monitoring, and alerting.
High-Level Representation: Represent the DevOps domain at a high level, collaborating with cloud enablement, networking, and other relevant teams.
Solution Design and Review: Collaborate with product and engineering teams to design and review solutions. Break down tasks and optimize development and release processes.
Production Environment Changes: Apply broad changes in the production environment, coordinating with development team releases.
Incident Escalation: Serve as an escalation point for major incidents in production.
Subject Matter Expertise: Be an expert in both technical and business domains.
Design and build infrastructure & systems that provide high levels of scalability, reliability, and performance for Kroger's stack, while balancing security, maintainability, and operational excellence
Work with the engineering team to continuously implement and improve reliable and speedy build environments for DEV & QA; provide timely build status updates; automate as much as possible to improve efficiency and quality
Ensure traceability, observability, and retrievability of system behavior
Build logging, monitoring, and alerting systems to identify bottlenecks and assist with debugging, analysis, and optimization in cloud, on-prem & store environments
Improve operational efficiency through automation and deployment or development of new tools
Experiment with and recommend new technologies that simplify or improve Kroger's stack
Craft solid and clearly explained designs, playbooks, and documentation for consumption by teammates and the larger engineering organization
Determine methods and procedures on new assignments and, where needed, coordinate the activities of other personnel
Participate in an off-hours on-call rotation, and perform periodic off-hours work during maintenance windows
Must be able to perform the essential job functions of this position with or without reasonable accommodation