Project Manager - R01559594

AI overview

Lead cross-functional projects with a focus on AI evaluation frameworks and responsible AI practices, ensuring agents meet high standards of safety and effectiveness.
Project Manager Primary Skills
  • Project Delivery Management, Manage Fixed Price Delivery, Estimations and Metrics, Client Management, Communications Management, Agile, Agile Metrics and Reporting, Professional Scrum Master-CSM/ PSM1, Manage Outcome Based Delivery, Requirements Creation/User Stories, Digital Accumen, Agile Coaching, Change Management (Project Management), Backlog Grooming
  • Job requirements
  • Job Title: AI Agent Evaluation Engineer JD: We are seeking a highly motivated and technically proficient AI Agent Evaluation Engineer to join our growing AI team. This crucial role will be responsible for defining, developing, and executing robust Agent evaluation frameworks and test strategies, with a significant focus on Responsible AI and Safety Evals, for our agents built using the Google Agent Development Kit (ADK). The ideal candidate will bridge the gap between AI development and reliable deployment, ensuring our agents are safe, ethical, effective, and meet high-quality performance standards. The role will be of 70% Automation and 30% Manual Testing Key Responsibilities ● Evaluation (Evals) Development: ○ Develop synthetic testing environments and simulation strategies to stress-test agents under various real-world conditions. ○ Design, implement, and maintain scalable and repeatable evaluation datasets and metrics to test agent performance, robustness, safety, and alignment (e.g., faithfulness, hallucination, prompt injection). ○ Specifically focus on building Evals for agents utilizing the Google Agent Development Kit (ADK) and related Google AI/ML services (e.g., Vertex AI, Gemini models). ● Responsible AI and Safety Evals (New Focus): ○ Develop and execute adversarial testing, jailbreaking, and red-teaming methodologies to identify potential harm, bias, toxicity, and unauthorized behavior in agent responses. ○ Implement and measure adherence to established ethical guidelines, safety policies, and content filtering mechanisms. ○ Work with policy and legal teams to ensure agent evaluations cover regulatory compliance and fairness objectives. ● Test Strategy & Execution: ○ Define comprehensive QA strategies, including functional, integration, regression, and user acceptance testing (UAT) specifically for conversational and goal-oriented AI agents. ○ Develop and execute detailed Test artefacts such as test plans,test cases, test Scenarios for agent features, tool use, memory, and reasoning capabilities. ● Bug Detection & Management: ○ Identify, document, prioritize, and track bugs using Jira, performance degradations, and alignment failures in agent behavior. ○ Collaborate closely with AI/ML Engineers and Researchers to analyze root causes and validate fixes. ● Automation & Tools: ○ Integrate evaluation pipelines into the CI/CD process to enable continuous quality assurance and fast iteration cycles. ● Reporting & Insights: ○ Analyze and interpret evaluation results, providing clear, actionable insights and quality reports to stakeholders and development teams, with a specific focus on safety metrics and risk mitigation. Required Skills & Qualifications ● Experience: 6+ years in Software QA, with at least 2 years focused on testing or evaluating AI/ML systems, conversational agents, or Large Language Models (LLMs). ● Safety Evals Expertise (Mandatory): Direct experience in designing and executing safety evaluations (red teaming, adversarial testing), bias detection, and measuring toxicity/harmful content in generative AI models. ● Agent/LLM Evals: Proven experience developing and running general evaluations (Evals) for LLM-powered applications knowing libraries like PyTest (Must) ● Google ADK Familiarity (Mandatory): Direct or strong conceptual understanding of the Google Agent Development Kit (ADK) and its components. ● Programming: Strong proficiency in Python is mandatory for script development, data processing, and automation. ● Cloud & MLOps: Familiarity with Google Cloud Platform (GCP) services relevant to AI/ML (e.g., Vertex AI) and integrating testing into MLOps workflows. ● Tools and Libraries: Langsmith, DeepEval, Ragas, Giskard, Hugging face.
  • Brillio is a global leader in Enterprise Digital Transformation Solutions, partnering with companies to drive business improvement and competitiveness through innovative technology solutions.

    View all jobs
    Ace your job interview

    Understand the required skills and qualifications, anticipate the questions you may be asked, and study well-prepared answers using our sample responses.

    Project Manager Q&A's
    Report this job
    Apply for this job