Job Description
We are seeking a Senior AI Infrastructure Engineer to architect the foundation of our 2026 generative AI roadmap. We need a technical leader who can bridge the gap between cutting-edge machine learning research and robust, scalable production systems.
In this role, you will be responsible for building the high-performance compute clusters and inference pipelines that power our next-generation LLMs. If you are passionate about pushing the boundaries of what is possible in AI and thrive in a fast-paced, innovative environment, we want to meet you.
What you will do:
- Architect and manage large-scale GPU clusters for high-throughput inference.
- Optimize deep learning models for edge deployment and latency-sensitive applications.
- Implement Kubernetes-based orchestration strategies for AI workloads.
- Collaborate with research teams to translate theoretical models into scalable software.
- Ensure 99.99% uptime and data integrity across distributed AI systems.
Requirements:
- 5+ years of experience in systems engineering, DevOps, or MLOps.
- Expert proficiency in Python, PyTorch, and TensorFlow.
- Strong experience with cloud platforms (AWS/GCP/Azure) and containerization (Docker/Kubernetes).
- Deep understanding of distributed systems, networking, and database technologies.
- Experience with Terraform and Infrastructure as Code (IaC).
- Bachelor’s degree in Computer Science or equivalent technical field.
Responsibilities
- Design and deploy scalable GPU clusters for high-throughput inference.
- Implement CI/CD pipelines for AI model training and deployment.
- Monitor system performance and troubleshoot complex infrastructure issues.
- Drive automation initiatives to reduce manual overhead in the ML lifecycle.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related field.
- Proven track record of deploying large-scale AI systems.
- Strong knowledge of Linux internals and shell scripting.
- Excellent problem-solving skills and the ability to work in a collaborative team environment.
- Experience with security best practices in cloud infrastructure.