A Re-examination of Chatbot Evaluation Metrics