Robust Document Distance with Wasserstein-Fisher-Rao Metric

Zihao Wang (BNRist, Department of Computer Science and Technology, RIIT, Institute of Internet Industry, Tsinghua University)
Datong Zhou (Department of Mathematical Sciences, Tsinghua University)
Ming Yang (Department of Computer Science and Technology, Tsinghua University)
Yong Zhang (BNRist, Department of Computer Science and Technology, RIIT, Institute of Internet Industry, Tsinghua University)
Chenglong Bao (Yau Mathematical Sciences Center, Tsinghua University)
Hao Wu (Department of Mathematical Sciences, Tsinghua University)

Machine Learning mathscidoc:2206.43017

ACML, 2020.11
Computing the distance between linguistic objects is an essential problem in natural language processing. The word mover's distance (WMD) has been successfully applied to measure document distance by combining low-level word similarity with the framework of optimal transport (OT). However, because OT transports mass globally, the WMD may overestimate semantic dissimilarity when documents contain unequal semantic details. In this paper, we propose to address this overestimation with a novel Wasserstein-Fisher-Rao (WFR) document distance grounded in unbalanced optimal transport theory. Compared to the WMD, the WFR document distance provides a trade-off between global transportation and local truncation, which yields a better similarity measure for documents with unequal semantic details. Moreover, we design an efficient pruning strategy tailored to the WFR document distance to accelerate top-k queries over large document collections. Extensive experimental results show that the WFR document distance achieves higher accuracy than the WMD and even its supervised variant, s-WMD.
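To make the contrast between global transportation and local truncation concrete, here is a minimal NumPy sketch, not the authors' implementation. It compares a balanced entropic OT (Sinkhorn) stand-in for the WMD with a KL-relaxed unbalanced OT that uses the Hellinger-Kantorovich (Wasserstein-Fisher-Rao) entropy-transport cost -log cos²(min(d/η, π/2)). The functions `entropic_wmd` and `entropic_wfr`, the toy 2-D embeddings, and the parameters `eps` (entropic regularization), `eta` (length scale), and `rho` (marginal penalty weight) are hypothetical choices for illustration; the paper's exact formulation, constants, solver, and pruning strategy are not reproduced here.

```python
import numpy as np

def pairwise_dist(X, Y):
    # Euclidean distances between word embeddings (rows of X and rows of Y).
    return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)

def kl(p, q):
    # Generalized KL divergence for nonnegative vectors, using 0 * log(0) = 0.
    pos = p > 0
    return float(np.sum(p[pos] * np.log(p[pos] / q[pos])) - p.sum() + q.sum())

def entropic_wmd(a, b, X, Y, eps=0.05, iters=500):
    # Balanced entropic OT (Sinkhorn) as a stand-in for the WMD:
    # all mass of one document must be transported onto the other.
    D = pairwise_dist(X, Y)
    K = np.exp(-D / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)  # ending on the v-update makes column sums equal b
    P = u[:, None] * K * v[None, :]
    return float(np.sum(P * D))

def entropic_wfr(a, b, X, Y, eta=1.0, rho=1.0, eps=0.05, iters=500):
    # Hypothetical sketch: KL-relaxed (unbalanced) entropic OT with the
    # WFR-style cost c = -log cos^2(min(d / eta, pi / 2)). Word pairs farther
    # apart than eta * pi / 2 cannot exchange mass, so unmatched detail words
    # are truncated locally instead of being transported.
    D = pairwise_dist(X, Y)
    cosd = np.cos(np.minimum(D / eta, np.pi / 2))  # stays > 0 in float64
    C = -2.0 * np.log(cosd)
    K = cosd ** (2.0 / eps)                        # equals exp(-C / eps)
    p = rho / (rho + eps)                          # exponent of the scaling updates
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        Kv = K @ v
        u = np.where(Kv > 0, (a / np.where(Kv > 0, Kv, 1.0)) ** p, 0.0)
        Ktu = K.T @ u
        v = np.where(Ktu > 0, (b / np.where(Ktu > 0, Ktu, 1.0)) ** p, 0.0)
    P = u[:, None] * K * v[None, :]
    # Unregularized objective: transport cost plus marginal KL penalties.
    return float(np.sum(P * C) + rho * (kl(P.sum(1), a) + kl(P.sum(0), b)))

# Toy 2-D "embeddings" (hypothetical): document B repeats A's two words
# and adds one extra detail word far away from both of them.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
a = np.full(2, 1 / 2)  # normalized bag-of-words weights of A
b = np.full(3, 1 / 3)  # normalized bag-of-words weights of B

wmd = entropic_wmd(a, b, X, Y)
wfr = entropic_wfr(a, b, X, Y)
print(f"entropic WMD: {wmd:.3f}, WFR-style objective: {wfr:.3f}")
```

In this toy case the balanced model must move B's weight 1/3 on the far-away word across a distance of at least 9, so the WMD stand-in is at least 9 × 1/3 = 3. Under the unbalanced model that word lies beyond the cutoff η·π/2, so its mass is truncated and contributes exactly its weight 1/3 through the KL penalty. For much smaller `eps`, a log-domain Sinkhorn solver would be the safer choice numerically.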
@inproceedings{zihao2020robust,
  title={Robust Document Distance with Wasserstein-Fisher-Rao Metric},
  author={Wang, Zihao and Zhou, Datong and Yang, Ming and Zhang, Yong and Bao, Chenglong and Wu, Hao},
  url={http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20220616151909232488376},
  booktitle={ACML},
  year={2020},
}
Zihao Wang, Datong Zhou, Ming Yang, Yong Zhang, Chenglong Bao, and Hao Wu. Robust Document Distance with Wasserstein-Fisher-Rao Metric. In ACML, 2020. http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20220616151909232488376.