Machine Learning Engineer Job at Evolve Group, San Jose, CA

  • Evolve Group
  • San Jose, CA

Job Description

Machine Learning Engineer

Tech start-up

San Francisco-based

We’ve partnered with one of the most ambitious and technically rigorous AI research labs in the world. Based in San Francisco, this team is building foundation models entirely from scratch.

They are now hiring ML Infrastructure Engineers to design and scale the systems that power large-scale, distributed model training. If you’ve built infrastructure that runs across hundreds of GPUs, thrive under technical complexity, and want to work side by side with elite AI researchers, this is the role.

Key Responsibilities:

  • Build and scale distributed training systems for large-scale model training across LLMs, vision, and robotics.
  • Set up and run large-scale training across many GPUs using tools like Kubernetes, DeepSpeed, and FSDP.
  • Troubleshoot system issues (GPU errors, network problems) and build tools to monitor and recover from failures.
  • Optimize PyTorch pipelines, sharding, and sampling strategies.
  • Collaborate closely with researchers to support novel model training at scale.

Requirements:

  • 3–15 years in ML infrastructure, systems, or research engineering roles.
  • Proven experience scaling distributed training for large models.
  • Strong skills in PyTorch, CUDA, NCCL, and Kubernetes.
  • Familiar with setting up distributed training clusters.
  • Deep understanding of PyTorch dataloaders, data sharding, and sampling.
  • Strong communicator with a collaborative, mission-driven mindset.

This is a fully in-person role based in San Francisco. It's ideal for engineers excited to build at the edge of what's possible in AI.

Job Tags

Immediate start
