Site Reliability Engineering Manager
Nuance Communications, Inc. is the pioneer and leader in conversational AI innovations that bring intelligence to everyday work and life. The company delivers solutions that understand, analyze and respond to human language, amplifying human intelligence. With decades of domain and artificial intelligence expertise, Nuance works with thousands of organizations – in healthcare, telecommunications, automotive, financial services, retail, and more – to create stronger relationships and better experiences for their customers.
Join our Healthcare team... caring for clinicians the way they care for patients. Beyond words. We create technology that lets clinicians capture and document care quickly and easily so they can focus their attention on their patients.
Summary: The Site Reliability team keeps the Dragon Medical One cloud and our customers protected. The team innovates by continually developing and improving features that enable early detection, automated probable root cause identification, and rapid resolution. We partner with Product Teams to help them build resilient systems, and we provide the support, engagement, and contribution needed to identify and close resiliency gaps so that service availability remains optimal.
- Keep the customer-facing services available at top performance by maintaining the constant health of the supporting systems.
- Incident Management - Technical support role during major incidents
- Ability to operate in a high-pressure environment and troubleshoot complex issues quickly, while successfully handling multiple priorities.
- Enhance our monitoring systems by developing new software systems to promote early detection, full-stack and cross-platform correlation, and automated probable root cause identification
- Automate Site Reliability Engineering operations by developing software applications and API Integrations to connect disparate systems
- Participate in the technical review of incidents, partnering with R&D teams
- Develop debugging tools to assist engineers in diagnosing production service problems
- Participate in on-call rotation
- Apply software development workflows to operational environments
- Partner with application teams on new services/features and capacity planning per business needs.
- Perform tasks related to securing, and keeping secure, the products, tools, and processes that you are responsible for.
Number of Years of Work Experience: 8. Minimum 5 years of hands-on DevOps experience. At least 3 years of experience managing teams.
- Lead a team of Site Reliability Engineers on projects for users and be directly responsible for uptime.
- Develop stakeholder relationships, working with teams across the business to ensure upstream and downstream processes exist to support meeting SLAs and OLAs.
- Own end-to-end availability and performance of key services and build automation to prevent problem recurrence.
- Lead by example, mentor the team and establish credibility through quality technical execution.
- Manage on-call rotations across continents, using a follow-the-sun model.
- Design, write and deliver software to improve the availability, scalability, latency and efficiency of Azure's services.
- Foster a culture that leverages metrics to make informed decisions and guide change.
- Experience in support of distributed systems with Linux & Windows knowledge.
- Experience in a role with hands-on, complex technical problem solving as a daily duty
- Self-motivated
- Ability to work independently and as part of a team
- Excellent Communication Skills
- Be curious and ask questions
- Systems/Network administration background
- Knowledge of administrative tools and protocols
- Knowledge of Infrastructure as Code tools such as Azure ARM Templating or Terraform
- Knowledge of Configuration Management tools such as SaltStack, Puppet or Ansible
- Understanding and experience in cloud infrastructure and platforms, such as Azure
- Agile development experience/understanding
- Python/PowerShell or other scripting experience
- Experience implementing, administering, and maintaining monitoring systems
- Experience working on Federal projects.
- Experience with the Security Content Automation Protocol (SCAP)
- Experience with Security Technical Implementation Guides (STIG)
Education: BS in computer science or related discipline
Nuance offers a compelling and rewarding work environment. We offer market-competitive salaries, bonus, equity, benefits, meaningful growth and development opportunities, and a casual yet technically challenging work environment. Join our dynamic, entrepreneurial team and become part of our continuing success.
Nuance Communications, Inc. is an equal opportunity employer. We evaluate qualified applicants without regard to race, age, color, religion, sex, national origin, disability, veteran status, gender identity, sexual orientation and other legally protected characteristics. The EEO is the Law poster and its supplement are available here. If you need a reasonable accommodation because of a disability for any part of the employment process, please call 781-565-5086 – Human Resources Department and let us know the nature of your request and your contact information.