Senior AI Engineer

ethoslife, Remote US


No Relocation

Posted: January 22, 2026

Job Description

About the Role

We’re building several LLM-powered copilots across critical workflows (e.g., underwriting productivity, agent enablement, customer support, operations/compliance, fraud). We need an AI engineer to own the LLM + retrieval + context layer that makes these copilots accurate, auditable, fast, and cost-efficient. 

Typical stack: Python/FastAPI, Postgres + vector store (pgvector/Pinecone/Weaviate), OpenSearch, optional graph DB, Kubernetes + GPUs, OTEL/Datadog

Duties and Responsibilities:

  • Production RAG: indexing, retrieval, hybrid search, reranking, query rewriting, grounding, citations
  • Context Graph: entity resolution + linking + provenance; graph + vector retrieval; support for multi-hop context
  • LLM orchestration: tool/function calling, structured outputs, routing across model tiers, failure modes
  • GPU/inference cost optimization: batching, caching/KV reuse, quantization, autoscaling; optimize $/session + latency
  • Safety + compliance: PII/PHI handling, redaction, audit logs, deterministic replay, hallucination mitigation
  • LLMOps: eval harness (golden sets, regression, adversarial), monitoring for quality/cost/drift
  • Design/ship the end-to-end pipeline: retrieve → assemble context → generate → cite → log/monitor
  • Improve quality and trust via evaluation, feedback loops, and clear evidence-backed outputs
  • Partner with product, security, and domain teams; write crisp design docs; raise the engineering bar
  • Ship RAG v1 with citations + measurable quality metrics
  • Deliver Context Graph v1 that improves retrieval on real copilot tasks
  • Reduce cost/latency with a concrete inference optimization plan shipped to prod

Qualifications and Skills:

  • 7+ years building production systems; 2+ years hands-on with LLMs/RAG
  • Proven RAG experience (embeddings, vector DBs, hybrid search, reranking, eval)
  • Strong backend/distributed systems + observability
  • Track record shipping in high-stakes environments with auditability/correctness
  • Knowledge graph / entity resolution / provenance systems
  • GPU inference optimization (vLLM/TGI/TensorRT-LLM, AWQ/GPTQ quantization, batching)
  • Regulated domain experience (insurance/fintech/healthcare)

The US national base salary range for this full-time position is $146,000 - $236,000. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. 

Please note that the compensation details listed in US role postings reflect the base salary only and do not include applicable bonus, equity, or benefits. 

You can find further details of our US benefits at https://www.ethoslife.com/careers/
