DeVry University strives to close our society’s opportunity gap and address emerging talent needs by preparing learners to thrive in careers shaped by continuous technological change. Through innovative programs, relevant partnerships, and exceptional care, we empower students to meaningfully improve their lives, communities, and workplaces.
Our colleague experience is an area of obsessive focus. At DeVry University, we care about you, because only through you can we deliver our unique Care Formula to our learners and partners.
Opportunity
The Senior IT Reliability Analyst is responsible for improving the overall reliability and resilience of university technology to ensure minimal disruption to the University. The Senior IT Reliability Analyst will drive continual improvement in operational maturity as measured by how prepared the university is to detect, triage, mobilize, respond to, and resolve outages or system failures. The Senior IT Reliability Analyst will focus especially on deploying automation to reduce manual effort and prevent operations incidents.
Responsibilities:
Drive continuous improvement in software quality and infrastructure reliability and resilience.
Perform analytics on previous incidents to understand root causes and better predict and prevent future issues.
Create dashboards and reports to communicate key metrics.
Deploy technology to improve performance, scalability, and stability of systems.
Track performance against SLOs in partnership with monitoring teams or other stakeholders, and ensure systems continue to meet SLOs over time.
Remain current with site reliability engineering methods and trends such as observability-driven development and chaos engineering.
May oversee, design, implement, and manage DevOps capabilities using continuous integration/continuous delivery toolsets and automation.
Collaborate with development teams to promote the concept of reliability engineering during all phases of the software development lifecycle to detect and correct performance issues and meet availability goals.
Deliver software to automate manual operational work (i.e., “toil”).
Work with stakeholders such as product owners to define service level objectives (SLOs) for system operations such as mean time to detect (MTTD), mean time to triage (MTTT), mean time to mobilize (MTTM), mean time to acknowledge (MTTA), and mean time to resolve (MTTR).
Participate in operational support, including major incidents (MI), and on-call rotation shifts for supported systems and products.
Conduct blameless postmortems to troubleshoot priority incidents.
Use automation to reduce the probability and/or impact of problem recurrence.
Identify, evaluate, and recommend monitoring tools and diagnostic techniques to improve system observability.
Participate in system design consulting, platform management, capacity planning and launch reviews.
Collaborate and share lessons learned regarding performance and reliability issues with all stakeholders including developers, other SMEs, operations teams, and project management teams.
Participate in communities of practice to share knowledge and foster continuous improvement.
Qualifications
Bachelor’s degree (or equivalent years of experience).
Minimum of 5 years of IT experience, with 3+ years of relevant work experience. SRE (Site Reliability Engineering) experience preferred.
Prior experience in a corporate IT environment.
Experience with incident and response management.
Strong problem-solving and analytical skills, and strong interpersonal, written, and verbal communication skills.
Highly adaptable to changing circumstances. Interest in continuously learning new skills and technologies.
Experience with IT Service Management (ITSM) tools (e.g. ServiceNow, PagerDuty).
Experience with IT enterprise architecture tools (e.g. LeanIX).
Experience with working in cloud ecosystems (e.g. AWS, Microsoft Azure).
Experience with monitoring and observability tools (e.g. Splunk, Nagios, SmartBear).
Experience with programming and scripting languages (e.g. Java, C#, C++, Python, Bash, PowerShell).
Experience with Agile and DevOps development methodologies.
Experience with container technologies and supporting tools.
Experience with configuration management systems (e.g. Puppet, Ansible, Chef, Salt, Terraform).
Experience working with continuous integration/continuous deployment tools (e.g. Git, Jenkins).
DeVry University offers competitive wages and benefit options, including:
401(k) and Roth Plan w/match
Medical, Dental and Vision Coverage
Paid Parental Leave
Health Advocacy Service
Family and Domestic Partner Coverage
Tax Savings Accounts (FSA and HSA)
Short-Term/Long-Term Disability Coverage
Life, Accident, AD&D, Critical Illness Insurance
Fertility Coverage
Wellness Programs
Volunteer Time Off
Remote and Flex Work Options
Technology Stipend
Paid Tuition Program
Auto/Homeowners, Pet and Legal Insurance
Exclusive Discount Programs
Adoption Assistance
Career Development Programs
Mental Health Care Programs
Family Care Services
2nd.MD, a virtual expert medical consultation service