AI Infrastructure Engineer

Full Time

14 Sep 2025

Verified by Turrior

Content + Source + Freshness • 12 Dec 2025 • 95% confidence

84 / 100

Offer value

A compelling, high-impact AI infrastructure role at a Series A startup, with the chance to shape foundational backend systems.

High-demand role in AI infrastructure at a Series A startup
Opportunity to shape foundational backend systems
Collaborative, dynamic work environment
Requires strong backend engineering skills (Python)

Pros

Joining a growing company with significant potential
Possibility to influence the development of core backend systems
Collaborative work environment with cross-functional teams

Cons

Early-stage startup may entail more uncertainty
Demanding requirements for technical proficiency and experience
Limited structure compared to more established organizations

Who it's for

Mid to Senior • On-site

Good fit

Mid-level backend engineers
Candidates eager to influence foundational systems
Tech professionals interested in startups

Not recommended for

New graduates or inexperienced developers
Individuals seeking only stable, corporate environments
Those resistant to rapid changes and innovation

Motivation fit

Desire to be part of a foundational phase in a startup
Interest in scaling backend systems for AI
Willingness to adapt and innovate amid challenges

Key skills

Backend development (Python)
Distributed systems architecture
Job orchestration expertise
Collaboration across multidisciplinary teams

About the job

About the Role

We’re hiring an AI Infrastructure Engineer to shape and scale the backend systems that power our AI platform. We are a Series A company, so your work will be foundational, enabling safe, efficient, and reliable AI workflows from end to end.

What You’ll Do

Design and implement scalable backend architectures for AI workloads (inference, orchestration, monitoring).
Own distributed job orchestration with Temporal and related systems.
Improve data pipeline performance by designing smarter caching strategies (e.g., file deduplication, hot/cold storage, Redis caching layers) to reduce redundant compute and API calls.
Build observability, monitoring, retries, and fault tolerance into all workflows.
Manage infrastructure reliability, incident response, and performance.
Develop tooling and platform infrastructure to support rapid growth.
Partner with ML engineers to bring models to production at scale.

What We’re Looking For

4+ years of backend engineering (Python is a must).
Strong background in distributed systems, job orchestration, and task queues.
Deep knowledge of concurrency, parallelism, and multithreading is a must, including async/await, event loops, thread pools, synchronization primitives, deadlocks, and race conditions. You should know how to design systems that maximize throughput without sacrificing correctness or safety.
Hands-on experience with Temporal, Redis, Airflow, Celery, RabbitMQ (or similar).
Experience with LLM serving and routing fundamentals (rate limiting, streaming, load balancing, budgets).
Comfortable with containers & orchestration: Docker, Kubernetes.
Familiarity with cloud platforms (AWS/GCP) and IaC (Terraform).
Experience with multiple storage systems: S3, Postgres, MongoDB, Redis, and Elasticsearch.
Track record of scaling systems in startups or fast-paced environments.
Understanding of deploying, monitoring, and optimizing AI/ML systems in production with strong CI/CD practices.

Why You’ll Love Working Here

Play a foundational role at a fast-growing Series A startup that is shaping the future of AI in enterprise workflows.
Collaborate across Product, ML, and Platform teams, acting as the bridge between AI logic and scalable execution.
Build infrastructure that enables real value for large enterprises: low-code, secure, and scalable AI workflows.
Join a company that’s scaling thoughtfully and values developer experience.