We are looking for an Agentic Development & Evals Ops Engineer to design and build AI agents and to run evaluation operations for AI systems. The role involves building agentic workflows, monitoring AI performance, and creating evaluation frameworks that ensure the reliability and accuracy of AI models.
Key Responsibilities:
- Design and develop AI agents and agentic workflows using modern AI/LLM frameworks.
- Build and maintain evaluation pipelines (Evals Ops) to test AI agent performance, accuracy, and reliability.
- Implement automated evaluation frameworks for LLM outputs and agent behaviours (a minimal sketch follows this list).
- Collaborate with engineering and product teams to improve AI model quality and performance.
- Develop monitoring systems for AI agents to track metrics such as response quality, latency, and accuracy.
- Create and maintain documentation, evaluation reports, and testing datasets.
- Work on continuous improvement loops by integrating feedback into agent training and system updates.
- Ensure scalability, reliability, and governance of AI systems.
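As a minimal sketch of the kind of evaluation pipeline described above, assuming a hypothetical `run_agent` callable and a tiny golden dataset (both placeholders, not tied to any specific stack named in this posting):

```python
import time
from dataclasses import dataclass

def run_agent(prompt: str) -> str:
    # Hypothetical stand-in for the agent under test; any callable
    # mapping a prompt string to a response string fits here.
    return "placeholder response"

@dataclass
class EvalResult:
    case_id: str
    passed: bool
    latency_s: float

def exact_match(expected: str, actual: str) -> bool:
    # Simplest possible grader; real suites add fuzzy matching or
    # rubric-based LLM grading per task.
    return expected.strip().lower() == actual.strip().lower()

def evaluate(dataset: list[dict]) -> list[EvalResult]:
    # Run every golden case through the agent, scoring correctness
    # and recording per-case latency.
    results = []
    for case in dataset:
        start = time.perf_counter()
        output = run_agent(case["prompt"])
        latency = time.perf_counter() - start
        results.append(EvalResult(case["id"], exact_match(case["expected"], output), latency))
    return results

if __name__ == "__main__":
    golden = [{"id": "capital-fr", "prompt": "Capital of France?", "expected": "Paris"}]
    results = evaluate(golden)
    print(f"accuracy={sum(r.passed for r in results) / len(results):.2%}")
    print(f"avg_latency={sum(r.latency_s for r in results) / len(results):.4f}s")
```

The per-case accuracy and latency numbers from a harness like this are exactly what the monitoring responsibility above tracks over time.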
Required Skills:
- Experience with Agentic Development frameworks (LangChain, AutoGen, CrewAI, etc.).
- Knowledge of LLMs and model providers such as OpenAI, Anthropic, or similar platforms.
- Experience with evaluation frameworks for AI/LLMs (Evals Ops).
- Strong programming skills in Python.
- Experience with prompt engineering and AI workflow automation.
- Knowledge of data analysis and model performance evaluation.
- Familiarity with CI/CD pipelines and experimentation workflows (see the eval-gate sketch after this list).
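For the CI/CD item above, a common pattern is to run the eval suite as a test gate so quality regressions block a merge. A sketch using pytest, assuming the `evaluate` helper and a `GOLDEN_DATASET` are importable from a hypothetical `evals` module (both names are illustrative):

```python
# test_agent_quality.py -- run by pytest in a CI job on every change.
from evals import evaluate, GOLDEN_DATASET  # hypothetical module from the sketch above

ACCURACY_GATE = 0.90  # illustrative threshold, tuned per project

def test_agent_accuracy_gate():
    results = evaluate(GOLDEN_DATASET)
    accuracy = sum(r.passed for r in results) / len(results)
    # A failing assertion fails the CI job, blocking the regression.
    assert accuracy >= ACCURACY_GATE, f"accuracy {accuracy:.2%} below gate"
```

Hooked into a pipeline runner (e.g. a CI job that invokes `pytest` on every pull request), the same suite also supports experimentation workflows: run it against two prompt or model variants and compare scores.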
Preferred Skills:
- Experience with AI observability tools.
- Knowledge of vector databases (Pinecone, Weaviate, FAISS).
- Experience working with RAG (Retrieval-Augmented Generation) systems (see the FAISS sketch after this list).
- Understanding of AI safety, evaluation metrics, and benchmarking.
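On the preferred-skills side, the retrieval half of a RAG system is nearest-neighbour search over embedding vectors. A minimal FAISS sketch, using random vectors as stand-ins for real embeddings (a production system would embed document chunks with an embedding model and store the chunk texts alongside):

```python
import faiss
import numpy as np

dim, num_docs = 384, 1000  # 384 is a typical sentence-embedding size; illustrative

# Stand-in document embeddings; FAISS expects float32 arrays.
doc_vectors = np.random.rand(num_docs, dim).astype("float32")

index = faiss.IndexFlatL2(dim)  # exact L2 search; swap in IVF/HNSW indexes at scale
index.add(doc_vectors)

query = np.random.rand(1, dim).astype("float32")  # embedded user question
distances, ids = index.search(query, 5)  # top-5 nearest document chunks
print(ids[0])  # chunk indices to splice into the LLM prompt
```

Evaluating such a system typically scores both stages: retrieval quality (e.g. recall of the gold chunk in the top-k) and the faithfulness of the grounded final answer.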
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, AI, Data Science, or related field.
- 2+ years of experience in AI/ML development or related roles.