We are looking for an Agentic Development & Evals Ops Engineer to design and build AI agents and to run evaluation operations for AI systems. The role involves building agentic workflows, monitoring AI performance, and creating evaluation frameworks that ensure the reliability and accuracy of AI models.
Key Responsibilities:
- Design and develop AI agents and agentic workflows using modern AI/LLM frameworks.
- Build and maintain evaluation pipelines (Evals Ops) to test AI agent performance, accuracy, and reliability.
- Implement automated evaluation frameworks for LLM outputs and agent behaviours (a minimal sketch follows this list).
- Collaborate with engineering and product teams to improve AI model quality and performance.
- Develop monitoring systems for AI agents to track metrics such as response quality, latency, and accuracy.
- Create and maintain documentation, evaluation reports, and testing datasets.
- Work on continuous improvement loops by integrating feedback into agent training and system updates.
- Ensure scalability, reliability, and governance of AI systems.
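As a minimal sketch of the kind of evaluation pipeline described above, assuming a hypothetical `run_agent` callable and a tiny golden dataset (both placeholders, not tied to any specific stack named in this posting):

```python
import time
from dataclasses import dataclass

def run_agent(prompt: str) -> str:
    # Hypothetical stand-in for the agent under test; any callable
    # mapping a prompt string to a response string fits here.
    return "placeholder response"

@dataclass
class EvalResult:
    case_id: str
    passed: bool
    latency_s: float

def exact_match(expected: str, actual: str) -> bool:
    # Simplest possible grader; real suites add fuzzy matching or
    # rubric-based LLM grading per task.
    return expected.strip().lower() == actual.strip().lower()

def evaluate(dataset: list[dict]) -> list[EvalResult]:
    # Run every golden case through the agent, scoring correctness
    # and recording per-case latency.
    results = []
    for case in dataset:
        start = time.perf_counter()
        output = run_agent(case["prompt"])
        latency = time.perf_counter() - start
        results.append(EvalResult(case["id"], exact_match(case["expected"], output), latency))
    return results

if __name__ == "__main__":
    golden = [{"id": "capital-fr", "prompt": "Capital of France?", "expected": "Paris"}]
    results = evaluate(golden)
    print(f"accuracy={sum(r.passed for r in results) / len(results):.2%}")
    print(f"avg_latency={sum(r.latency_s for r in results) / len(results):.4f}s")
```

The per-case accuracy and latency numbers from a harness like this are exactly what the monitoring responsibility above tracks over time.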
Required Skills:
- Experience with Agentic Development frameworks (LangChain, AutoGen, CrewAI, etc.).
- Knowledge of LLMs and model providers such as OpenAI, Anthropic, or similar platforms.
- Experience with evaluation frameworks for AI/LLMs (Evals Ops).
- Strong programming skills in Python.
- Experience with prompt engineering and AI workflow automation.
- Knowledge of data analysis and model performance evaluation.
- Familiarity with CI/CD pipelines and experimentation workflows (see the eval-gate sketch after this list).
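For the CI/CD item above, a common pattern is to run the eval suite as a test gate so quality regressions block a merge. A sketch using pytest, assuming the `evaluate` helper and a `GOLDEN_DATASET` are importable from a hypothetical `evals` module (both names are illustrative):

```python
# test_agent_quality.py -- run by pytest in a CI job on every change.
from evals import evaluate, GOLDEN_DATASET  # hypothetical module from the sketch above

ACCURACY_GATE = 0.90  # illustrative threshold, tuned per project

def test_agent_accuracy_gate():
    results = evaluate(GOLDEN_DATASET)
    accuracy = sum(r.passed for r in results) / len(results)
    # A failing assertion fails the CI job, blocking the regression.
    assert accuracy >= ACCURACY_GATE, f"accuracy {accuracy:.2%} below gate"
```

Hooked into a pipeline runner (e.g. a CI job that invokes `pytest` on every pull request), the same suite also supports experimentation workflows: run it against two prompt or model variants and compare scores.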
Preferred Skills:
- Experience with AI observability tools.
- Knowledge of vector databases (Pinecone, Weaviate, FAISS).
- Experience working with RAG (Retrieval-Augmented Generation) systems (see the FAISS sketch after this list).
- Understanding of AI safety, evaluation metrics, and benchmarking.
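On the preferred-skills side, the retrieval half of a RAG system is nearest-neighbour search over embedding vectors. A minimal FAISS sketch, using random vectors as stand-ins for real embeddings (a production system would embed document chunks with an embedding model and store the chunk texts alongside):

```python
import faiss
import numpy as np

dim, num_docs = 384, 1000  # 384 is a typical sentence-embedding size; illustrative

# Stand-in document embeddings; FAISS expects float32 arrays.
doc_vectors = np.random.rand(num_docs, dim).astype("float32")

index = faiss.IndexFlatL2(dim)  # exact L2 search; swap in IVF/HNSW indexes at scale
index.add(doc_vectors)

query = np.random.rand(1, dim).astype("float32")  # embedded user question
distances, ids = index.search(query, 5)  # top-5 nearest document chunks
print(ids[0])  # chunk indices to splice into the LLM prompt
```

Evaluating such a system typically scores both stages: retrieval quality (e.g. recall of the gold chunk in the top-k) and the faithfulness of the grounded final answer.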
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, AI, Data Science, or related field.
- 2+ years of experience in AI/ML development or related roles.