Engineering Expert (PhD) - AI Systems Evaluation

Weekday AI, United States


No Relocation

Posted: February 25, 2026

Job Description

This role is for one of our clients

Compensation: $73.29 per hour

PhD-level engineers are sought to support high-impact collaborations with advanced AI research teams. This role focuses on improving the accuracy, rigor, and reliability of general-purpose conversational AI systems, particularly in engineering-related contexts.

AI systems used in professional engineering scenarios must demonstrate strong applied reasoning, quantitative accuracy, and alignment with real-world systems. This project centers on evaluating and enhancing how models interpret, reason about, and explain engineering concepts across multiple disciplines.

Key Responsibilities

  • Develop and refine prompts to guide AI behavior in engineering-specific scenarios
  • Evaluate model-generated responses for technical correctness, applied reasoning, completeness, and practical relevance
  • Fact-check technical claims using authoritative public sources and domain expertise
  • Annotate outputs by identifying conceptual gaps, flawed assumptions, and factual inaccuracies
  • Assess clarity, structure, and appropriateness of explanations for various audiences
  • Ensure responses align with expected conversational standards and system-level guidelines
  • Apply structured evaluation frameworks, taxonomies, and benchmarking standards consistently

Required Qualifications

  • PhD in Engineering or a closely related field
  • Deep expertise in one or more of the following domains:
    • Mechanical & Physical Systems Engineering
    • Electrical, Electronic & Computer Engineering
    • Chemical, Materials & Process Engineering
    • Civil, Environmental & Infrastructure Engineering
  • Strong familiarity with large language models (LLMs) and their practical applications
  • Excellent written communication skills with the ability to clearly explain complex technical concepts
  • High attention to detail and ability to detect subtle technical inaccuracies
  • Experience reviewing, editing, or critiquing technical or academic writing

Preferred Experience

  • Applied research, industry engineering workflows, or systems design
  • Experience with reinforcement learning from human feedback (RLHF), model evaluation, or structured data annotation
  • Teaching, mentoring, or explaining engineering concepts to non-expert audiences
  • Familiarity with structured evaluation rubrics, benchmarks, or quality assurance frameworks

What Success Looks Like

  • You consistently identify technical inaccuracies, incomplete reasoning, or flawed assumptions in engineering-related AI outputs
  • Your structured feedback measurably improves the rigor, clarity, and correctness of model responses
  • You produce consistent, reproducible evaluation artifacts that strengthen model performance over time
  • Engineering-focused AI systems demonstrate greater reliability and trustworthiness as a result of your evaluations

Contract & Payment Terms

  • Engagement will be structured as an independent contractor agreement
  • Fully remote with flexible scheduling
  • Projects may be extended, shortened, or concluded early based on performance and evolving needs
  • Assignments will not require access to confidential or proprietary information from any employer, client, or institution
  • Payments are processed weekly via Stripe or Wise based on services rendered
  • Visa sponsorship is not available; H-1B and STEM OPT candidates cannot be supported at this time
