Existing domain adaptation methods aim to learn features that generalize across domains. These methods typically require updating the source classifier to adapt to the target domain and do not properly handle the trade-off between source-domain and target-domain performance. In this work, instead of training a classifier to adapt to the target domain, we use a separable component called a data calibrator to help the fixed source classifier recover its discriminative power in the target domain while preserving performance in the source domain. When the difference between the two domains is small, the source classifier's representation is sufficient to perform well in the target domain, and our method outperforms GAN-based methods on digit datasets. Otherwise, the proposed method can leverage synthetic images generated by GANs to boost performance, achieving state-of-the-art results on digit datasets and in driving-scene semantic segmentation. Our method also empirically suggests a potential connection between domain adaptation and adversarial attacks.
Code is available at https://github.com/yeshaokai/Calibrator-Domain-Adaptation
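To make the mechanism concrete, the sketch below treats the calibrator as a small network that adds a bounded perturbation to inputs before they reach the frozen source classifier. The architecture, the budget eps, and the placeholder classifier are illustrative assumptions rather than the paper's exact design; the point is that only the calibrator is trained while the source classifier stays fixed.

import torch.nn as nn

class Calibrator(nn.Module):
    # Hypothetical calibrator: a small net producing a bounded additive
    # perturbation, echoing the noted link to adversarial attacks.
    def __init__(self, channels=3, eps=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1), nn.Tanh(),
        )
        self.eps = eps  # perturbation budget (assumed hyper-parameter)

    def forward(self, x):
        return x + self.eps * self.net(x)  # calibrated input for the frozen classifier

# The source classifier stays fixed; only the calibrator is trained on target data.
source_classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder
for p in source_classifier.parameters():
    p.requires_grad_(False)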
Linfeng Zhang (Tsinghua University; Institute for Interdisciplinary Information Core Technology), Muzhou Yu (Institute for Interdisciplinary Information Core Technology; Xi'an Jiaotong University), Tong Chen (Tsinghua University), Zuoqiang Shi (Tsinghua University), Chenglong Bao (Tsinghua University), Kaisheng Ma (Tsinghua University)
The training process is crucial for deploying networks in applications that place strict requirements on both accuracy and robustness. However, most existing approaches face a dilemma: accuracy and robustness form an awkward trade-off, where improving one degrades the other. The challenge remains when we try to improve accuracy and robustness simultaneously. In this paper, we propose a novel training method that introduces auxiliary classifiers trained on corrupted samples, while clean samples are trained normally with the primary classifier. In the training stage, a novel distillation method named input-aware self distillation is proposed to help the primary classifier learn robust information from the auxiliary classifiers. Along with it, a new normalization method, selective batch normalization, is proposed to prevent the model from being negatively influenced by corrupted images. At the end of the training period, an L2-norm penalty is applied to the weights of the primary and auxiliary classifiers so that their weights become asymptotically identical. At inference, only the primary classifier is used, so no extra computation or storage is needed. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet show that the proposed auxiliary training yields noticeable improvements in both accuracy and robustness. On average, auxiliary training improves accuracy by 2.21% and robustness (measured by corruption error) by 21.64% over traditional training on CIFAR-100. Code has been released on GitHub.
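A minimal sketch of the training objective follows, assuming a shared backbone with one primary head for clean inputs and one auxiliary head for corrupted inputs. The function names and loss weights are hypothetical, and the paper's input-aware self distillation and selective batch normalization are reduced here to a plain KL distillation term and an L2 weight-tying penalty.

import torch.nn.functional as F

def auxiliary_training_step(backbone, primary_head, aux_head,
                            x_clean, x_corrupt, y, alpha=1.0, beta=1e-3):
    logits_clean = primary_head(backbone(x_clean))    # clean samples -> primary classifier
    logits_corrupt = aux_head(backbone(x_corrupt))    # corrupted samples -> auxiliary classifier
    ce = F.cross_entropy(logits_clean, y) + F.cross_entropy(logits_corrupt, y)
    # distillation: the primary classifier learns robust information from the auxiliary one
    distill = F.kl_div(F.log_softmax(logits_clean, dim=1),
                       F.softmax(logits_corrupt.detach(), dim=1),
                       reduction="batchmean")
    # L2 penalty pulling the two heads' weights toward each other
    l2 = sum((p - q).pow(2).sum()
             for p, q in zip(primary_head.parameters(), aux_head.parameters()))
    return ce + alpha * distill + beta * l2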
Zonghan Yang (Institute for Artificial Intelligence, Beijing National Research Center for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University), Yang Liu (Institute for Artificial Intelligence, Beijing National Research Center for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University), Chenglong Bao (Yau Mathematical Sciences Center, Tsinghua University), Zuoqiang Shi (Department of Mathematical Sciences, Tsinghua University)
Although ordinary differential equations (ODEs) provide insights for designing network architectures, their relationship with non-residual convolutional neural networks (CNNs) remains unclear. In this paper, we present a novel ODE model by adding a damping term. We show that the proposed model can recover both a ResNet and a CNN by adjusting an interpolation coefficient. The damped ODE model therefore provides a unified framework for interpreting residual and non-residual networks. A Lyapunov analysis reveals better stability of the proposed model, which in turn yields robustness improvements in the learned networks. Experiments on a number of image classification benchmarks show that the proposed model substantially improves the accuracy of ResNet and ResNeXt on inputs perturbed by both stochastic noise and adversarial attack methods. Moreover, a loss-landscape analysis demonstrates the improved robustness of our method along the attack direction.
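One natural reading of the interpolation is an explicit Euler discretization of an ODE with a damping term; the block below sketches that discrete form under this assumption, not necessarily the paper's exact formulation, with lam = 0 recovering a residual update and lam = 1 a plain feed-forward update.

import torch.nn as nn

class DampedBlock(nn.Module):
    # Explicit Euler step of dx/dt = f(x) - lam * x:
    #   x_{n+1} = (1 - lam) * x_n + f(x_n)
    # lam = 0 gives a ResNet block; lam = 1 gives a non-residual CNN block.
    def __init__(self, f: nn.Module, lam: float = 0.5):
        super().__init__()
        self.f = f      # any convolutional sub-network with matching shapes
        self.lam = lam  # interpolation/damping coefficient

    def forward(self, x):
        return (1.0 - self.lam) * x + self.f(x)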
Jinxiu Liang (South China University of Technology, Guangzhou 510006, China), Yong Xu (South China University of Technology, Guangzhou 510006, China; Peng Cheng Laboratory, Shenzhen 510852, China), Chenglong Bao (Tsinghua University, Beijing 100084, China), Yuhui Quan (South China University of Technology, Guangzhou 510006, China), Hui Ji (National University of Singapore, Singapore 117543, Singapore)
The learning rate is arguably the most important hyper-parameter to tune when training a neural network. As manually setting the right learning rate remains a cumbersome process, adaptive learning rate algorithms aim to automate it. Motivated by the success of the Barzilai-Borwein (BB) step size in many gradient descent methods for solving convex problems, this paper investigates the potential of the BB method for training neural networks. With strong motivation from related convergence analysis, the BB method is generalized to compute adaptive learning rates for mini-batch gradient descent. Experiments show that, in contrast to many existing methods, the proposed BB method is highly insensitive to the initial learning rate, especially in terms of generalization performance. The BB method also shows advantages in both learning speed and generalization performance over other available methods.
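For reference, the classical BB1 step size that motivates the method can be written in a few lines over flattened parameter and gradient vectors; the paper's mini-batch generalization adds stabilization that is omitted in this sketch, and the absolute value and eps guard are illustrative choices.

import torch

def bb_step_size(x_prev, x_curr, g_prev, g_curr, eps=1e-10):
    # Classical BB1 step size:
    #   alpha_k = <s_k, s_k> / <s_k, y_k>,
    # with s_k = x_k - x_{k-1} and y_k = g_k - g_{k-1}.
    s = x_curr - x_prev   # parameter difference
    y = g_curr - g_prev   # gradient difference
    return s.dot(s) / (s.dot(y).abs() + eps)  # abs/eps guard against instability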
Tangjun Wang (Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China), Zehao Dou (Department of Statistics and Data Science, Yale University), Chenglong Bao (Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China, and Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China), Zuoqiang Shi (Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China, and Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China)
Diffusion, a fundamental internal mechanism emerging in many physical processes, describes the interaction among different objects. In many learning tasks with limited training samples, diffusion connects the labeled and unlabeled data points and is a critical component for achieving high classification accuracy. Many existing deep learning approaches directly impose a diffusion loss when training neural networks. In this work, inspired by convection-diffusion ordinary differential equations (ODEs), we propose a novel diffusion residual network (Diff-ResNet) that internally introduces diffusion into the architecture of neural networks. Under a structured-data assumption, we prove that the proposed diffusion block increases the distance-diameter ratio, which improves the separability of inter-class points and reduces the distance among local intra-class points. Moreover, this property is easily adopted by residual networks for constructing separable hyperplanes. Extensive experiments on synthetic binary classification, semi-supervised graph node classification, and few-shot image classification across various datasets validate the effectiveness of the proposed method.
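A minimal sketch of one diffusion step among sample features is given below, assuming a nonnegative similarity matrix W over the batch; this is a simplification of the Diff-ResNet block, which interleaves such steps with residual layers, and the step size gamma is an assumed hyper-parameter.

import torch

def diffusion_step(X, W, gamma=0.1):
    # One diffusion step  x_i <- x_i + gamma * sum_j W_ij * (x_j - x_i),
    # written via the graph Laplacian: W @ X - deg * X.
    # X: (n, d) feature matrix; W: (n, n) nonnegative similarity weights.
    deg = W.sum(dim=1, keepdim=True)      # node degrees
    return X + gamma * (W @ X - deg * X)  # pulls local intra-class features together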