Machine Learning Engineer Job at Evolve Group, Santa Rosa, CA

  • Evolve Group
  • Santa Rosa, CA

Job Description

Machine Learning Engineer

Tech start-up

San Francisco based

We’ve partnered with one of the most ambitious and technically rigorous AI research labs in the world. Based in San Francisco, this team is building foundation models entirely from scratch.

They are now hiring ML Infrastructure Engineers to design and scale the systems that power large-scale, distributed model training. If you’ve built infrastructure that runs across hundreds of GPUs, thrive under technical complexity, and want to work side-by-side with elite AI researchers — this is the role.

Key Responsibilities:

  • Build and scale distributed training systems for large-scale model training across LLMs, vision, and robotics.
  • Set up and run large-scale training across many GPUs using tools like Kubernetes, DeepSpeed, and FSDP.
  • Troubleshoot system issues (GPU errors, network problems) and build tools to monitor and recover from failures.
  • Optimize PyTorch pipelines, sharding, and sampling strategies.
  • Collaborate closely with researchers to support novel model training at scale.

Requirements:

  • 3–15 years in ML infrastructure, systems, or research engineering roles.
  • Proven experience scaling distributed training for large models.
  • Strong skills in PyTorch, CUDA, NCCL, and Kubernetes.
  • Familiar with setting up distributed training clusters.
  • Deep understanding of PyTorch dataloaders, data sharding, and sampling.
  • Strong communicator with a collaborative, mission-driven mindset.

This is a fully in-person role based in San Francisco. It's ideal for engineers excited to build at the edge of what's possible in AI.

Job Tags

Immediate start
