The purpose of this paper is to propose methodologies for statistical inference of low-dimensional parameters with high-dimensional data. We focus on constructing confidence intervals for individual coefficients and linear combinations of several of them in a linear regression model, although our ideas are applicable in a much broader context. The theoretical results presented provide sufficient conditions for the asymptotic normality of the proposed estimators, along with a consistent estimator for their finite-dimensional covariance matrices. These sufficient conditions allow the number of variables to exceed the sample size and allow the presence of many small non-zero coefficients. Our methods and theory apply to interval estimation of a preconceived regression coefficient or contrast, as well as to simultaneous interval estimation of many regression coefficients. Moreover, the proposed method turns the regression data into an approximately Gaussian sequence of point estimators of individual regression coefficients, which can be used to select variables after proper thresholding. The simulation results presented demonstrate the accuracy of the coverage probability of the proposed confidence intervals, as well as other desirable properties, strongly supporting the theoretical results.
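Once each coefficient has an approximately Gaussian point estimator with a consistent standard error, the interval and thresholding steps described above reduce to simple arithmetic. The sketch below assumes the estimates `b_hat` and standard errors `se_hat` are already available (how the paper computes them is not reproduced here). The function names are hypothetical, and the default threshold sqrt(2 log p) is one common choice, not necessarily the one used by the authors.

```python
import math
from statistics import NormalDist

def wald_intervals(estimates, std_errors, level=0.95):
    """Two-sided intervals b_j +/- z * se_j for approximately Gaussian estimates."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return [(b - z * se, b + z * se) for b, se in zip(estimates, std_errors)]

def threshold_select(estimates, std_errors, t=None):
    """Indices j with |b_j / se_j| > t; default t = sqrt(2 log p) is an assumed choice."""
    p = len(estimates)
    if t is None:
        t = math.sqrt(2 * math.log(p))
    return [j for j, (b, se) in enumerate(zip(estimates, std_errors)) if abs(b / se) > t]

# Hypothetical estimates and standard errors for p = 3 coefficients.
b_hat = [1.2, 0.05, -0.8]
se_hat = [0.2, 0.2, 0.2]
intervals = wald_intervals(b_hat, se_hat)
selected = threshold_select(b_hat, se_hat)
```

With p = 3 the default threshold is about 1.48, so the standardized estimates 6.0 and -4.0 are kept while 0.25 is discarded.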
We propose a deep learning-based method for object detection in UAV-borne thermal images, which can capture scenes both by day and at night. Compared with visible images, thermal images are less dependent on illumination conditions, but they typically have blurred edges and low contrast. To improve distinguishability, we use a boundary-aware salient object detection network to extract saliency maps from the thermal images. The thermal images are then augmented with their corresponding saliency maps through channel replacement and pixel-level weighted fusion. Considering the limited computing power of UAV platforms, a lightweight combinational neural network, ComNet, is used as the core object detector. The YOLOv3 model trained on the original images serves as a benchmark for comparison with the proposed method. In the experiments, we analyze the detection performance of ComNet models with different image fusion schemes. The experimental results show that the average precisions (APs) for pedestrian and vehicle detection improve by 2% to 5% compared with the benchmark without saliency map fusion and MobileNetv2. The detection speed increases by over 50%, while the model size is reduced by 58%. The results demonstrate that the proposed method offers a compromise between detection accuracy and efficiency, with application potential in UAV-borne detection tasks.
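The two fusion schemes can be illustrated on arrays. This is a minimal sketch, assuming the thermal image and its saliency map are single-channel arrays of the same height and width scaled to [0, 1]. Which channel is replaced (`channel`) and the weight `alpha` are hypothetical parameters chosen for illustration, not values reported in the abstract.

```python
import numpy as np

def channel_replacement(thermal, saliency, channel=2):
    """Replicate the thermal image into 3 channels, then overwrite one channel with the saliency map."""
    fused = np.stack([thermal, thermal, thermal], axis=-1).astype(np.float32)
    fused[..., channel] = saliency  # assumed: which channel to replace is a free choice
    return fused

def weighted_fusion(thermal, saliency, alpha=0.7):
    """Pixel-wise convex combination alpha * thermal + (1 - alpha) * saliency."""
    t = thermal.astype(np.float32)
    s = saliency.astype(np.float32)
    return alpha * t + (1.0 - alpha) * s

# Tiny hypothetical example: a flat thermal patch and a checkerboard saliency map.
thermal = np.full((2, 2), 0.5)
saliency = np.array([[0.0, 1.0], [1.0, 0.0]])
cr = channel_replacement(thermal, saliency)
wf = weighted_fusion(thermal, saliency, alpha=0.5)
```

Channel replacement keeps the thermal intensities in the untouched channels, which lets a detector expecting 3-channel input consume the result directly, while weighted fusion produces a single channel in which salient regions are brightened.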
We demonstrate how path integrals, often used in problems of theoretical physics, can be adapted to provide a machinery for performing Bayesian inference in function spaces. Such inference arises naturally in the study of inverse problems of recovering continuous (infinite-dimensional) coefficient functions from ordinary or partial differential equations, problems that are typically ill-posed. Regularizing these problems with L^2 function-space norms (Tikhonov regularization) is equivalent to Bayesian probabilistic inference with a Gaussian prior. The Bayesian interpretation of inverse-problem regularization is useful because it allows one to quantify and characterize the error and degree of precision in the solution of inverse problems, and to examine the assumptions made in solving the problem, namely whether the subjective choice of regularization is compatible with prior knowledge. Using the path-integral formalism, Bayesian inference can be explored through various perturbative techniques, such as the semiclassical approximation, which we use in this manuscript. Perturbative path-integral approaches, while offering alternatives to computational approaches such as Markov chain Monte Carlo (MCMC), also provide natural starting points for MCMC methods that can be used to refine the approximations. In this manuscript, we illustrate a path-integral formulation for inverse problems and demonstrate it on an inverse problem in membrane biophysics as well as on inverse problems in potential theory involving the Poisson equation.
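The equivalence between Tikhonov regularization and a Gaussian prior can be checked numerically on a discretized problem. This sketch assumes a finite-dimensional linear forward operator `A` (a random matrix standing in for a discretized differential operator), Gaussian noise with variance `sigma2`, and a white Gaussian prior with variance `tau2`; these are illustrative assumptions, not the paper's setup. The Tikhonov minimizer with `lam = sigma2 / tau2` coincides with the posterior mean, and the posterior covariance supplies the error bars the regularized solution alone does not.

```python
import numpy as np

def tikhonov(A, b, lam):
    """Minimizer of ||A x - b||^2 + lam * ||x||^2 (normal equations)."""
    m = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(m), A.T @ b)

def gaussian_posterior(A, b, sigma2, tau2):
    """Posterior for b = A x + noise, noise ~ N(0, sigma2 I), prior x ~ N(0, tau2 I)."""
    m = A.shape[1]
    precision = A.T @ A / sigma2 + np.eye(m) / tau2  # Hessian of the negative log-posterior
    cov = np.linalg.inv(precision)
    mean = cov @ A.T @ b / sigma2
    return mean, cov

# Hypothetical discretized problem: 20 observations of a 5-component unknown.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
x_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
b = A @ x_true + 0.1 * rng.normal(size=20)

sigma2, tau2 = 0.01, 1.0
mean, cov = gaussian_posterior(A, b, sigma2, tau2)
x_tik = tikhonov(A, b, lam=sigma2 / tau2)
posterior_std = np.sqrt(np.diag(cov))  # per-component uncertainty
```

Because the forward model here is linear and both noise and prior are Gaussian, the posterior is exactly Gaussian, so a semiclassical (Laplace-type) expansion around the maximum a posteriori point is exact; for nonlinear forward models it is only an approximation.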
In this paper we introduce a smooth version of local linear regression estimators and discuss their advantages. The MSE and MISE of the estimators are computed explicitly. It turns out that local linear regression smoothers have good sampling properties and high minimax efficiency: they are not only efficient in rates but also nearly efficient in constant factors. In the nonparametric regression context, the asymptotic minimax lower bound is developed via the "hardest one-dimensional subproblem" heuristic of Donoho and Liu. Connections between the minimax risk and the modulus of continuity are established. The lower bound also applies to estimating the conditional mean (regression) and conditional quantiles in both fixed-design and random-design regression problems.
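For reference, the basic local linear estimator fits a kernel-weighted straight line around each evaluation point and reports its intercept. The sketch below shows this standard estimator only, not the smooth version introduced in the paper. It assumes a Gaussian kernel and a fixed bandwidth `h` for simplicity; the kernel choice and the function name are illustrative.

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate of m(x0) using a Gaussian kernel with bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # kernel weights K((x_i - x0) / h)
    X = np.column_stack([np.ones_like(x), x - x0])  # local design centered at x0
    W = X * w[:, None]                              # rows of X scaled by their weights
    beta = np.linalg.solve(X.T @ W, W.T @ y)        # weighted least squares
    return beta[0]                                  # intercept = fitted value at x0

# Noise-free linear data: the local linear fit reproduces the line exactly,
# including at the boundary point x0 = 0.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x
m_mid = local_linear(0.5, x, y, h=0.2)
m_edge = local_linear(0.0, x, y, h=0.2)
```

The exact reproduction of linear trends, even at the edge of the design, illustrates why local linear fits avoid the first-order boundary bias that a local constant (Nadaraya-Watson) fit would show there.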
Yongkun Li (Department of Mathematical Sciences, Tsinghua University, Beijing 100084, PR China); Kun Tian (Department of Mathematical Sciences, Tsinghua University, Beijing 100084, PR China); Changchuan Yin (Department of Mathematics, Statistics and Computer Science, The University of Illinois at Chicago, Chicago, IL 60607-7045, USA); Rong Lucy He (Department of Biological Sciences, Chicago State University, Chicago, IL 60628, USA); Stephen S.-T. Yau (Department of Mathematical Sciences, Tsinghua University, Beijing 100084, PR China)
Statistics Theory and Methods, mathscidoc:1611.33001
Molecular Phylogenetics and Evolution, 2016, (99), 10, 2016.3
Due to the vast sequence divergence among different viral groups, sequence alignment is not directly applicable to genome-wide comparative analysis of viruses. Increasing attention has therefore been paid to alignment-free methods for whole-genome comparison and phylogenetic tree reconstruction. Among alignment-free methods, the recently proposed “Natural Vector (NV) representation” has been used successfully to study the phylogeny of multi-segmented viruses based on a 12-dimensional genome space derived from the nucleotide sequence structure. However, whether proteomes are preferable to genomes for determining viral phylogeny has not been thoroughly investigated. As the translated products of genes, proteins directly form viral structures and are vital for all metabolic pathways. In this study, using the NV representation of protein sequences together with the Hausdorff distance, which is suitable for comparing point sets, we construct a 60-dimensional protein space to analyze the evolutionary relationships of 4021 viruses by their whole proteomes in the current NCBI Reference Sequence Database (RefSeq). We also take advantage of the previously developed natural graphical representation to recover viral phylogeny. Our results demonstrate that the proposed method is efficient and accurate for classifying viruses. For Baltimore class II viruses, for example, the prediction accuracy is as high as 95.9% for family labels, 95.7% for subfamily labels and 96.5% for genus labels. Finally, we find that proteomes lead to better viral classification when reliable protein sequences are abundant. In other cases, the accuracy obtained with proteomes is still comparable to that obtained with genomes.
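A 60-dimensional protein space arises naturally from 20 amino acids times three statistics each. The sketch below assumes the commonly cited natural-vector definition: for each amino acid k, its count n_k, the mean of its 1-based positions mu_k, and the normalized second central moment D_k = sum_i (p_i - mu_k)^2 / (n_k * n), where n is the sequence length. Absent residues contribute zeros, and non-standard letters are ignored. The paper's exact normalization, component order and handling of ambiguous residues may differ. A proteome then becomes a finite set of such vectors, and two proteomes are compared with the Hausdorff distance under the Euclidean metric.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues; other letters are ignored

def natural_vector(seq):
    """Assumed 60-dim natural vector: counts, mean positions, normalized 2nd central moments."""
    n = len(seq)
    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        pos = [i + 1 for i, c in enumerate(seq) if c == aa]  # 1-based positions of aa
        nk = len(pos)
        mu = sum(pos) / nk if nk else 0.0
        d2 = sum((p - mu) ** 2 for p in pos) / (nk * n) if nk else 0.0
        counts.append(nk)
        means.append(mu)
        moments.append(d2)
    return counts + means + moments

def hausdorff(P, Q):
    """Hausdorff distance between two finite point sets under Euclidean distance."""
    def directed(S, T):
        # Largest distance from a point of S to its nearest neighbour in T.
        return max(min(math.dist(s, t) for t in T) for s in S)
    return max(directed(P, Q), directed(Q, P))

# Hypothetical toy proteomes, each a list of protein sequences.
proteome_a = [natural_vector("AAC"), natural_vector("MKV")]
proteome_b = [natural_vector("AAC")]
d_ab = hausdorff(proteome_a, proteome_b)
```

Because the Hausdorff distance is defined on sets, it compares proteomes that contain different numbers of proteins without requiring any alignment or one-to-one matching between them.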