Manager - Site Reliability Engineering
Employment Type: Full-Time
Location: Irving, TX (Onsite)
Major Activities
- Lead, mentor, and grow a team of SREs, fostering effective collaboration and a high-performance culture focused on reliability, innovation, and continuous improvement.
- Oversee the design, implementation, and monitoring of the reliability and performance of GCP-hosted services.
- Define and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) for all services. Establish and improve incident management processes, including post-mortem analysis and root cause investigation, to prevent recurrence of failures.
- Lead efforts to automate operational processes, reducing manual work and improving system reliability. Ensure systems are architected with fault tolerance, scalability, and recovery in mind.
- Oversee the deployment and maintenance of cloud infrastructure. Drive continuous improvement in cost efficiency by optimizing GCP resource usage and scaling strategies. Ensure infrastructure is built to handle scale, resiliency, and security requirements. Partner with engineering teams to align infrastructure needs with application development.
- Ensure comprehensive monitoring of systems, applications, and infrastructure with appropriate alerting mechanisms. Define and track key metrics to evaluate the health and performance of GCP-hosted services. Implement dashboards and reporting mechanisms for stakeholders to track system performance and reliability.
- Lead the effort to create clear, actionable incident reports and improve reporting processes for transparency. Work closely with development and product teams to ensure reliable application delivery and troubleshooting. Influence application architecture design decisions to ensure reliability and operational scalability. Advocate for a strong DevOps culture with an emphasis on automation and continuous integration/deployment (CI/CD).
- Other duties as assigned.
Minimum Education
- Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
Minimum Type of Experience the Job Requires
- 7+ years of experience in Site Reliability Engineering, with familiarity with DevOps or a similar area, including at least 2-3 years in a managerial or leadership capacity.
- Solid understanding of cloud platforms such as Google Cloud and Oracle Cloud, and experience with infrastructure-as-code tools (Terraform, CloudFormation).
- Proven experience with monitoring, observability, and logging platforms and tools (Prometheus, Grafana, ELK Stack, Datadog, GCP Cloud Observability, etc.).
- Good understanding of containerization and orchestration tools such as Docker, Kubernetes, and Helm, as well as CI/CD pipelines and development and deployment strategies.
- Experience in leading incident response and disaster recovery efforts. Expertise in managing large-scale distributed systems and microservices architecture.
- Experience in the retail industry with a good understanding of e-commerce applications.
Other
- Excellent leadership and team management skills with the ability to foster a collaborative, inclusive, and productive environment.
- Strong problem-solving and troubleshooting skills.
- In-depth knowledge of security best practices and vulnerability management.
- Ability to balance technical depth with strategic decision-making to drive business outcomes.
- Exceptional communication skills, both verbal and written.
Preferred Education
- Master’s degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
Preferred Type of Experience the Job Requires
- Google Cloud Professional DevOps Engineer, Google Cloud Professional Cloud Architect
- Kubernetes Certification (CKA/CKAD), Terraform Associate, or similar certifications.
Applicants in the U.S. must satisfy federal, state, and local legal requirements of the job.
At The Michaels Companies, Inc., our purpose is to fuel the joy of creativity. As the leading creative destination in North America, we operate over 1,300 stores in 49 states and Canada, as well as online. The Michaels Companies, Inc. also owns Artistree, a manufacturer of custom and specialty framing merchandise, and a dedicated handmade goods marketplace. Founded in 1973 and headquartered in Irving, Texas, Michaels is the best place for all things creative.
At Michaels, we prioritize the wellbeing of our teams by providing robust benefits for both full-time and part-time Team Members. Our benefits include health insurance (medical, dental, and vision), paid time off, tuition assistance, generous employee discounts, and much more.
Michaels is an Equal Opportunity Employer. We are here for all Team Members and all Makers to create, innovate and be better together.
Michaels is committed to the full inclusion of all qualified individuals. In keeping with this commitment, Michaels will assure that people with disabilities are provided reasonable accommodations. Accordingly, if a reasonable accommodation is required to fully participate in the job application or interview process, to perform the essential functions of the job, and/or to receive all other benefits and privileges of employment, please contact Customer Care at