HaRMoNEE at SemEval-2024 Task 6: Tuning-based Approaches to Hallucination Recognition

Published in Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), 2024

This paper presents the Hallucination Recognition Model for New Experiment Evaluation (HaRMoNEE) team’s winning (#1) and tenth-place (#10) submissions to the two subtasks of SemEval-2024 Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. The task challenged participants to design systems that detect hallucinations in Large Language Model (LLM) output. Team HaRMoNEE proposes two architectures: (1) fine-tuning an off-the-shelf transformer-based model and (2) prompt tuning large-scale LLMs. One submission from the fine-tuning approach outperformed all other submissions on the model-aware subtask; one submission from the prompt-tuning approach ranked tenth on the leaderboard for the model-agnostic subtask. Our systems also include pre-processing to prune irrelevant data fields, several fine-tuning configurations and prompts, per-system post-processing, and evaluation.
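As a rough illustration of approach (1), the sketch below fine-tunes an off-the-shelf transformer as a binary hallucination classifier over (source, model output) pairs. This is a minimal sketch assuming a standard Hugging Face sequence-classification setup; the model name (`roberta-base`), field names (`src`, `hyp`), label scheme, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of the fine-tuning approach: a binary classifier that
# judges whether a model output ("hyp") hallucinates relative to its
# source ("src"). All names and hyperparameters are assumptions.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL = "roberta-base"  # assumption: any off-the-shelf encoder could be used
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

class HallucinationDataset(Dataset):
    """Encodes each (source, output) pair as a single classification input."""
    def __init__(self, examples):
        self.examples = examples  # list of dicts: {"src", "hyp", "label"}

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        ex = self.examples[i]
        enc = tokenizer(ex["src"], ex["hyp"], truncation=True,
                        padding="max_length", max_length=256)
        item = {k: torch.tensor(v) for k, v in enc.items()}
        item["labels"] = torch.tensor(ex["label"])  # 1 = Hallucination
        return item

# Toy examples standing in for the SHROOM training data.
train_data = HallucinationDataset([
    {"src": "The capital of France is Paris.",
     "hyp": "Paris is the capital of France.", "label": 0},
    {"src": "The capital of France is Paris.",
     "hyp": "Lyon is the capital of France.", "label": 1},
])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="harmonee-ft",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_data,
)
trainer.train()
```

Framing the subtask as pair classification lets the encoder attend jointly to the source and the suspect output, which is one natural way to realize the fine-tuning architecture the abstract describes.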

Timothy Obiso, Jingxuan Tu, and James Pustejovsky. 2024. HaRMoNEE at SemEval-2024 Task 6: Tuning-based Approaches to Hallucination Recognition. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1322–1331, Mexico City, Mexico. Association for Computational Linguistics.