Communications on Applied Mathematics and Computation, 2021, Vol. 3, Issue (2): 337-356. doi: 10.1007/s42967-020-00084-4

ORIGINAL PAPER

A Non-intrusive Correction Algorithm for Classification Problems with Corrupted Data

Jun Hou, Tong Qin, Kailiang Wu, Dongbin Xiu   

  1. Department of Mathematics, The Ohio State University, Columbus, OH 43210, USA
  • Received: 2020-02-10 Revised: 2020-06-02 Online: 2021-06-20 Published: 2021-05-26
  • Contact: Dongbin Xiu, Jun Hou, Tong Qin, Kailiang Wu E-mail: xiu.16@osu.edu; hou.345@osu.edu; qin.428@osu.edu; wu.3423@osu.edu

Abstract: A novel correction algorithm is proposed for multi-class classification problems with corrupted training data. The algorithm is non-intrusive, in the sense that it post-processes a trained classification model by adding a correction procedure to the model prediction. The correction procedure can be coupled with any approximator, such as logistic regression or neural networks of various architectures. When the training dataset is sufficiently large, we theoretically prove (in the limiting case) and numerically show that the corrected models deliver correct classification results, as if there were no corruption in the training data. For datasets of finite size, the corrected models produce significantly better recovery results than the models without the correction algorithm. All of the theoretical findings in the paper are verified by our numerical examples.

Key words: Data corruption, Deep neural network, Cross-entropy, Label corruption, Robust loss
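The abstract does not spell out the correction step itself. As a hedged illustration only, the sketch below assumes that label corruption follows a known, invertible class-transition matrix `T`, with `T[i, j]` the probability that true class `i` is recorded as class `j`. Under that assumption, a model trained on corrupted labels approximates `p_noisy = p_clean @ T`, and a post-processing step can invert `T` to recover clean class probabilities. The function name `corrected_prediction` and this corruption model are assumptions made for illustration; this is not necessarily the exact procedure of the paper.

```python
import numpy as np


def corrected_prediction(p_noisy, T):
    """Post-process probabilities from a model trained on corrupted labels.

    Hypothetical helper (illustrative assumption, not the paper's API).
    p_noisy: (n, K) array, assumed to approximate P(noisy label = j | x).
    T: (K, K) transition matrix, T[i, j] = P(noisy label = j | true label = i),
       assumed known and invertible.
    Returns (p_clean, labels): estimated P(true label = i | x) and argmax classes.
    """
    p_noisy = np.asarray(p_noisy, dtype=float)
    T = np.asarray(T, dtype=float)
    # Assumed model: p_noisy = p_clean @ T, hence T.T @ p_clean.T = p_noisy.T.
    p_clean = np.linalg.solve(T.T, p_noisy.T).T
    # Finite-sample estimates can leave the probability simplex:
    # clip negatives and renormalize each row (guarding against zero sums).
    p_clean = np.clip(p_clean, 0.0, None)
    sums = p_clean.sum(axis=1, keepdims=True)
    p_clean = p_clean / np.where(sums > 0, sums, 1.0)
    return p_clean, p_clean.argmax(axis=1)


# Usage: asymmetric corruption that flips the naive prediction.
T = np.array([[0.6, 0.4, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
p_true = np.array([[0.5, 0.4, 0.1]])
p_noisy = p_true @ T                      # [[0.3, 0.6, 0.1]]
print(p_noisy.argmax(axis=1))             # naive prediction: [1]
p_hat, labels = corrected_prediction(p_noisy, T)
print(p_hat, labels)                      # recovers [[0.5, 0.4, 0.1]] and [0]
```

The correction acts only on the trained model's output probabilities, so the underlying classifier (logistic regression, a neural network, and so on) is trained unchanged; this is the sense in which such a step is non-intrusive.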
