This agent evaluates your agent against several criteria, including whether the output matches the expected format, whether it contains toxic content, and whether it actually did what the prompt instructed.
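As a rough illustration of the idea, and not the evaluator's actual implementation, each criterion can be modeled as a function that takes the prompt and the output and returns pass or fail. Everything named here is hypothetical: `evaluate`, `DEFAULT_CRITERIA`, the JSON format assumption, the tiny word blocklist standing in for a real toxicity check, and the `city` key standing in for a real instruction-following check.

```python
import json
from typing import Callable, Dict

# A criterion receives the prompt and the agent's output and returns pass/fail.
Criterion = Callable[[str, str], bool]

# Placeholder blocklist; a real toxicity check would likely use a classifier.
BLOCKLIST = {"idiot", "stupid"}


def is_valid_json(prompt: str, output: str) -> bool:
    """Format check, assuming the expected output format is JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


def is_non_toxic(prompt: str, output: str) -> bool:
    """Toxicity check: passes when no blocklisted word appears in the output."""
    words = {w.strip(".,!?").lower() for w in output.split()}
    return not (words & BLOCKLIST)


def has_city_key(prompt: str, output: str) -> bool:
    """Instruction check for one hypothetical prompt that asks for a 'city' key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "city" in data


DEFAULT_CRITERIA: Dict[str, Criterion] = {
    "format": is_valid_json,
    "toxicity": is_non_toxic,
    "instruction": has_city_key,
}


def evaluate(prompt: str, output: str, criteria: Dict[str, Criterion]) -> Dict[str, bool]:
    """Run every criterion and return a name -> passed report."""
    return {name: check(prompt, output) for name, check in criteria.items()}


if __name__ == "__main__":
    prompt = "Return a JSON object with a 'city' key."
    print(evaluate(prompt, '{"city": "Paris"}', DEFAULT_CRITERIA))
    print(evaluate(prompt, "You idiot, no.", DEFAULT_CRITERIA))
```

Keeping the criteria in a plain dictionary means a check can be swapped or added without touching `evaluate`.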