Information Technology · Full Time · Verified

Senior AI Infrastructure Engineer 2026

Nebula Dynamics
San Francisco
Estimated Salary
USD 180,000 – USD 250,000
Live Update
17 May 2026
Deadline
17 May 2027

Job Description

We are seeking a visionary Senior AI Infrastructure Engineer to architect the foundation of our 2026 generative AI roadmap. As we stand on the precipice of a new era in artificial intelligence, we need a technical leader who can bridge the gap between cutting-edge machine learning research and robust, scalable production systems.


In this role, you will be responsible for building the high-performance compute clusters and inference pipelines that power our next-generation LLMs. If you are passionate about pushing the boundaries of what is possible in AI and thrive in a fast-paced, innovative environment, we want to meet you.


What you will do:

  • Architect and manage large-scale GPU clusters for high-throughput inference.
  • Optimize deep learning models for edge deployment and latency-sensitive applications.
  • Implement Kubernetes-based orchestration strategies for AI workloads.
  • Collaborate with research teams to translate theoretical models into scalable software.
  • Ensure 99.99% uptime and data integrity across distributed AI systems.

Requirements:

  • 5+ years of experience in systems engineering, DevOps, or MLOps.
  • Expert proficiency in Python, PyTorch, and TensorFlow.
  • Strong experience with cloud platforms (AWS/GCP/Azure) and containerization (Docker/Kubernetes).
  • Deep understanding of distributed systems, networking, and database technologies.
  • Experience with Terraform and Infrastructure as Code (IaC).
  • Bachelor’s degree in Computer Science or equivalent technical field.

Responsibilities

  • Design and deploy scalable GPU clusters for high-throughput inference.
  • Optimize deep learning models for edge deployment.
  • Implement CI/CD pipelines for AI model training and deployment.
  • Monitor system performance and troubleshoot complex infrastructure issues.
  • Drive automation initiatives to reduce manual overhead in the ML lifecycle.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or related field.
  • Proven track record of deploying large-scale AI systems.
  • Strong knowledge of Linux internals and shell scripting.
  • Excellent problem-solving skills and ability to work in a team-oriented, collaborative environment.
  • Experience with security best practices in cloud infrastructure.

Required Skills

Python, Kubernetes, PyTorch, AWS, Docker, Terraform, Machine Learning, Deep Learning, MLOps, Scalable Systems, Linux

Ready to Take This Challenge?

Make sure your resume is ready. Submit your application now before the deadline.
