
Senior AI Engineer
ethoslife • Remote US
Posted: January 22, 2026
Job Description
About the Role
We’re building several LLM-powered copilots across critical workflows (e.g., underwriting productivity, agent enablement, customer support, operations/compliance, fraud). We need an AI engineer to own the LLM + retrieval + context layer that makes these copilots accurate, auditable, fast, and cost-efficient.
Typical stack: Python/FastAPI, Postgres + vector store (pgvector/Pinecone/Weaviate), OpenSearch, optional graph DB, Kubernetes + GPUs, OpenTelemetry (OTel)/Datadog.
Duties and Responsibilities:
- Production RAG: indexing, retrieval, hybrid search, reranking, query rewriting, grounding, citations
- Context Graph: entity resolution + linking + provenance; graph + vector retrieval; support for multi-hop context
- LLM orchestration: tool/function calling, structured outputs, routing across model tiers, handling of failure modes
- GPU/inference cost optimization: batching, caching/KV reuse, quantization, autoscaling; optimize cost per session ($/session) and latency
- Safety + compliance: PII/PHI handling, redaction, audit logs, deterministic replay, hallucination mitigation
- LLMOps: eval harness (golden sets, regression, adversarial), monitoring for quality/cost/drift
- Design/ship the end-to-end pipeline: retrieve → assemble context → generate → cite → log/monitor
- Improve quality and trust via evaluation, feedback loops, and clear evidence-backed outputs
- Partner with product, security, and domain teams; write crisp design docs; raise the engineering bar
- Ship RAG v1 with citations + measurable quality metrics
- Deliver Context Graph v1 that improves retrieval on real copilot tasks
- Reduce cost/latency with a concrete inference optimization plan shipped to prod
Qualifications and Skills:
- 7+ years building production systems; 2+ years hands-on LLMs/RAG
- Proven RAG experience (embeddings, vector DBs, hybrid search, reranking, eval)
- Strong backend/distributed systems + observability
- Track record shipping in high-stakes environments with auditability/correctness
- Knowledge graph / entity resolution / provenance systems
- GPU inference optimization (vLLM/TGI/TensorRT-LLM, AWQ/GPTQ quantization, batching)
- Regulated domain experience (insurance/fintech/healthcare)
The US national base salary range for this full-time position is $146,000 - $236,000. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.
Please note that the compensation details listed in US role postings reflect the base salary only and do not include applicable bonus, equity, or benefits.
You can find further details of our US benefits at https://www.ethoslife.com/careers/