About Metaphysic
Metaphysic is the industry leader in developing AI technologies and machine learning research to create photorealistic content at internet scale. We were recently named one of the TIME100 Most Influential Companies of 2023 and are focused on the ethical development of AI to support the genius of human performance. We’re run by experienced founders and backed by some of the top investors in the world. We’re only just getting started in defining the next generation of content creation. Join our fast-growing team and help bring our groundbreaking vision to life.
The Role: Metaphysic is seeking a Site Reliability Engineer who will be integral to maintaining and enhancing the reliability and efficiency of our cutting-edge IT infrastructure. This role requires a unique blend of technical expertise, keen insight into agile and DevOps practices, and a passion for AI. As an SRE at Metaphysic, you will collaborate closely with our talented team and users to ensure the highest level of performance and reliability of our hybrid infrastructure, which is vital for our groundbreaking AI-driven content production.
Your Mission:
- Dive deep into implementing new tools, optimizations, and automation that make Metaphysic stand out.
- Be the technical focal point for infrastructure, both within the team and for stakeholders, connecting developers, management, and artists.
- Ensure our systems are compliant and airtight against cybersecurity threats.
- Implement and maintain systems that ensure optimal performance and reliability of our IT infrastructure.
- Proactively identify and resolve issues to increase efficiency, minimize downtime, and improve user experience.
- Embrace transparency – document, peer-review code, and collaborate with the team.
Key Responsibilities:
- Monitor and maintain the health of our hybrid infrastructure, encompassing both on-premise and cloud-based environments.
- Develop automation tools and processes to improve system reliability and efficiency.
- Collaborate with developers, engineers, and external partners to optimize infrastructure operations.
- Manage capacity planning, availability management, and disaster recovery strategies.
- Uphold compliance standards and ensure stringent network and system security.
- Develop, manage, and enhance infrastructure both on-premise and in the cloud to ensure systems are scalable, resilient, and reliable.
- Build and improve tools and automation to efficiently utilize infrastructure.
- Lead the team through complex changes, incidents, postmortems, and root cause analyses to help us continuously improve.
Must-Haves:
- Team player, willing to learn and share knowledge and experience with others.
- Previous experience as a DevOps/SRE engineer or a similar software engineering role.
- Proficiency with Git and GitHub workflows (including GitHub Actions and GitLab CI/CD).
- Ansible is your superhero tool, especially on-prem.
- A solid grasp of CI/CD pipelining.
- Working knowledge of databases, both SQL and NoSQL solutions.
- Working experience with network and security solutions.
- Proven track record delivering Linux solutions (Red Hat-based and Debian-based distributions, such as Ubuntu).
- Containers are your friends – whether it’s Podman, Docker, or Kubernetes.
- Experience with infrastructure monitoring, including collecting, processing, and analyzing metrics from tools such as CloudWatch, Prometheus, and Zabbix.
- Effective communication and collaboration skills, with a focus on user relations and stakeholder management.
- Experience writing technical documentation.
Bonus Points for:
- A Bachelor’s degree in IT, Computer Science, or a related field.
- Experience with IaC tools like Terraform.
- Experience in remote work environments, showcasing self-reliance and proactive engagement.
- A passion for AI and an interest in contributing to ethical, groundbreaking technology.
As part of our team, you’ll enjoy:
- The hustle of a startup with the impact of a global business.
- A tremendous opportunity to join one of the best and fastest-growing AI companies in the world.
- Working with an extraordinary team of smart, creative, fun, and highly motivated people.
- A fantastic culture and a team that is highly supportive, collaborative, transparent, and passionate about our tech and mission.
- Flexible working hours and remote work - this role is fully remote :)
Metaphysic is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.