Lishan Yu: School of Biomedical Informatics, UTHealth, Houston, TX, USA; Yau Mathematical Sciences Center, Tsinghua University, Beijing, China; Beijing Institute of Mathematical Sciences and Applications, Beijing, China. The majority of this work was conducted during Lishan Yu's internship at UTHealth.
Hamisu M. Salihu: Department of Family and Community Medicine, Baylor College of Medicine, Houston, TX, USA; Center of Excellence in Health Equity, Training, and Research, Baylor College of Medicine, Houston, TX, USA
Deepa Dongarwar: Center of Excellence in Health Equity, Training, and Research, Baylor College of Medicine, Houston, TX, USA
Luyao Chen: School of Biomedical Informatics, UTHealth, Houston, TX, USA
Xiaoqian Jiang: School of Biomedical Informatics, UTHealth, Houston, TX, USA
Journal of Biomedical Informatics, 125, (103974), 2022.1
In this paper, we developed a feasible and efficient deep-learning-based framework to combine United States (US) natality data from the last five decades, whose variables and factors changed over time, into a consistent database. We constructed a graph based on the properties and elements of the databases, including variables, and applied a graph convolutional network (GCN) to learn embeddings of the variables on the constructed graph, where the learned embeddings reflect the similarity of variables. Specifically, we devised a loss function with a slack margin and a banlist mechanism (for a random walk) to learn the desired structure (two nodes sharing more information are more similar to each other), and developed an active learning mechanism to conduct the harmonization. For a total of 9,321 variables from 49 databases (i.e., 783 stemmed variables, from 1970 to 2018), we applied our model iteratively together with human review for four rounds and obtained 323 hyperchains of variables. During the harmonization, the first round of our model achieved a recall of 87.56% and a precision of 57.70%. Our harmonized graph neural network (HGNN) method provides a feasible and efficient way to connect relevant databases at a meta-level. By adapting to the properties and characteristics of the databases, HGNN can learn patterns globally, which makes it powerful for discovering similarities between variables across databases. Our proposed method provides an effective way to reduce the manual effort in database harmonization and to integrate fragmented data into useful databases for future research.
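The abstract names the ingredients of HGNN (GCN embeddings of variables, a random walk with a banlist, and a loss with a slack margin) but not their exact form. The Python sketch below illustrates each ingredient under stated assumptions: the propagation rule is the standard symmetric-normalized GCN layer, while the banlist walk and the hinge-style margin loss are hypothetical stand-ins. The names `normalized_adjacency`, `gcn_embed`, `random_walk` and `margin_loss`, and the toy graph, are illustrative and are not the authors' code.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_embed(adj, features, w1, w2):
    """Two-layer GCN forward pass returning unit-length node embeddings."""
    a_norm = normalized_adjacency(adj)
    h = np.tanh(a_norm @ features @ w1)  # tanh hidden layer (ReLU is another common choice)
    z = a_norm @ h @ w2
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

def random_walk(adj, start, length, banlist, rng):
    """Hypothetical banlist walk: never step onto a node listed in `banlist`."""
    walk = [start]
    for _ in range(length):
        nbrs = [int(j) for j in np.flatnonzero(adj[walk[-1]]) if int(j) not in banlist]
        if not nbrs:
            break  # dead end once banned nodes are removed
        walk.append(int(rng.choice(nbrs)))
    return walk

def margin_loss(z, anchor, positive, negative, margin=0.2):
    """Hypothetical hinge loss: positive similarity must beat negative by `margin`."""
    pos = float(z[anchor] @ z[positive])
    neg = float(z[anchor] @ z[negative])
    return max(0.0, margin - pos + neg)

# Toy variable graph: a path 0-1-2-3 (hypothetical).
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
z = gcn_embed(adj, rng.normal(size=(4, 3)), rng.normal(size=(3, 8)), rng.normal(size=(8, 4)))
walk = random_walk(adj, start=0, length=10, banlist={2}, rng=rng)
loss = margin_loss(z, anchor=walk[0], positive=walk[1], negative=3)
```

Nodes visited together on a walk could serve as positive pairs, and the margin asks each anchor to be closer (in cosine similarity) to its positive than to a sampled negative by at least `margin`; how the authors actually select pairs and define the slack margin is not stated in the abstract.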
Guillaume BalDepartment of Applied Physics and Applied Mathematics, Columbia University, S.W. Mudd Building Room 206, 500 W. 120th Street, New York, NY 10027, USAWenjia JingDepartment of Applied Physics and Applied Mathematics, Columbia University, S.W. Mudd Building Room 206, 500 W. 120th Street, New York, NY 10027, USA
Journal of Quantitative Spectroscopy and Radiative Transfer, 112, (4), 660-670, 2011.3
We consider the effect of small-scale random fluctuations of the constitutive coefficients on boundary measurements of solutions to radiative transfer equations. As the correlation length of the random oscillations tends to zero, the transport solution is well approximated by a deterministic, averaged solution. In this paper, we analyze the random fluctuations around the averaged solution, which may be interpreted as a central limit correction to homogenization.
With the inverse transport problem in mind, we characterize the random structure of the singular components of the transport measurement operator. In regimes of moderate scattering, such components provide stable reconstructions of the constitutive parameters in the transport equation. We show that the random fluctuations strongly depend on the decorrelation properties of the random medium.
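The central limit correction can be seen in a much simpler setting than the one studied in the paper. The Python sketch below assumes a one-dimensional, purely absorbing medium (no scattering) on [0, 1] whose attenuation coefficient is piecewise constant, with independent uniform perturbations on cells of width `eps`; the transport solution at x = 1 is then exp(-∫σ). This toy model, the function name `attenuation_samples`, and all parameter values are hypothetical.

```python
import numpy as np

def attenuation_samples(eps, sigma_bar=1.0, amp=0.5, n_samples=2000, seed=0):
    """Samples of the optical depth int_0^1 sigma(x) dx for a random medium."""
    rng = np.random.default_rng(seed)
    n_cells = int(round(1.0 / eps))
    # Piecewise-constant medium: independent perturbations on cells of width eps.
    sigma = sigma_bar + amp * rng.uniform(-1.0, 1.0, size=(n_samples, n_cells))
    return sigma.sum(axis=1) * eps

for eps in (1e-1, 1e-2, 1e-3):
    tau = attenuation_samples(eps)
    u = np.exp(-tau)                     # ballistic transport solution at x = 1
    fluct = (tau - 1.0) / np.sqrt(eps)   # central-limit rescaling of the fluctuation
    print(f"eps={eps:g}  mean u={u.mean():.4f}  std of rescaled fluctuation={fluct.std():.3f}")
```

Because the cells are independent, the variance of ∫σ is exactly `eps * amp**2 / 3`, so the mean of u approaches the averaged solution exp(-σ̄) ≈ 0.368 as `eps` shrinks, while the rescaled fluctuations keep a standard deviation near amp/√3 ≈ 0.289. A medium with slowly decaying correlations would not in general follow this √eps scaling, which illustrates why the fluctuations depend on the decorrelation properties of the medium.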
Pathric Hägglund: Swedish National Audit Office and SOFI, Stockholm, Sweden
Per Johansson: Uppsala University, Sweden; Institute for Evaluation of Labour Market and Education Policy (IFAU), Uppsala, Sweden; Tsinghua University, Beijing, China
Lisa Laun: Institute for Evaluation of Labour Market and Education Policy (IFAU), Uppsala, Sweden
This article analyses the effect of cognitive behavioral therapy (CBT) for individuals with mild or moderate mental illness. We study the effects on sick leave, health care consumption, and drug prescriptions. We find that CBT improved health and prevented sick leave for individuals who were not on sick leave when treatment was initiated but had no effect for individuals who were on sick leave when the treatment was initiated.
Rerandomization is a strategy for improving balance on observed covariates in randomized controlled trials. It has been both advocated and advised against by renowned scholars of experimental design. However, the relationship and differences between stratification, rerandomization, and the combination of the two have not been previously investigated. In this paper, we show that stratified designs can be recreated by rerandomization and explain why, in most cases, stratification on binary covariates followed by rerandomization on continuous covariates is more efficient than rerandomization on all covariates at the same time.
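A minimal Python sketch contrasts the two designs, assuming the common rerandomization rule that accepts an assignment only when the Mahalanobis distance between treated and control covariate means falls below a threshold. The threshold of 1.0, the synthetic covariates, and all function names are hypothetical, and the sketch does not reproduce the paper's efficiency comparison.

```python
import numpy as np

def mahalanobis_balance(x, treat):
    """Mahalanobis distance between treated and control covariate means."""
    n1, n0 = treat.sum(), (~treat).sum()
    diff = x[treat].mean(axis=0) - x[~treat].mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    return float(n1 * n0 / (n1 + n0) * diff @ np.linalg.solve(cov, diff))

def complete_randomization(n, rng):
    """Assign exactly n // 2 units to treatment at random."""
    treat = np.zeros(n, dtype=bool)
    treat[rng.choice(n, n // 2, replace=False)] = True
    return treat

def stratified_randomization(strata, rng):
    """Complete randomization within each stratum (half of each stratum treated)."""
    treat = np.zeros(len(strata), dtype=bool)
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        treat[rng.choice(idx, len(idx) // 2, replace=False)] = True
    return treat

def rerandomize(x, draw, threshold, rng, max_tries=10_000):
    """Redraw assignments until the Mahalanobis balance falls below threshold."""
    for _ in range(max_tries):
        treat = draw(rng)
        if mahalanobis_balance(x, treat) < threshold:
            return treat
    raise RuntimeError("no acceptable assignment within max_tries")

rng = np.random.default_rng(1)
n = 100
b = rng.integers(0, 2, size=n)   # hypothetical binary covariate
c = rng.normal(size=(n, 3))      # hypothetical continuous covariates

# Design A: rerandomize on all four covariates at once.
t_all = rerandomize(np.column_stack([b, c]),
                    lambda r: complete_randomization(n, r), 1.0, rng)
# Design B: stratify on the binary covariate, rerandomize on the continuous ones.
t_strat = rerandomize(c, lambda r: stratified_randomization(b, r), 1.0, rng)
```

In design B the binary covariate is balanced by construction (up to rounding in odd-sized strata), so the acceptance criterion only has to handle the continuous covariates; in design A the binary covariate is balanced only approximately, through the same acceptance criterion as the continuous ones.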
Per Johansson: Statistics, Uppsala University, Uppsala, Sweden; YMSC, Tsinghua University, Beijing, China
Paulina Jonéus: Department of Statistics, Uppsala University, Uppsala, Sweden
Sophie Langenskiöld: Department of Medical Sciences, Uppsala University, Uppsala, Sweden
Introduction. This paper presents a study protocol for a comparative effectiveness evaluation, in clinical practice, of abiraterone acetate versus enzalutamide, two cancer drugs given to patients with advanced prostate cancer.
Method and analysis. The protocol designs a comparative-effectiveness analysis of abiraterone acetate against enzalutamide. Because of the large number of covariates, a two-step procedure is proposed for choosing the relevant covariates in the matching design. In the first step, an exploratory factor analysis reduces a large set of continuous covariates to nine factors. In the second step, propensity score estimation is used to reduce the dimension of the covariates, their interactions, and the second-order terms of the continuous covariates. The final design uses a genetic matching algorithm. The study protocol provides a detailed statistical analysis plan for the analysis sample derived from the matching design. The analysis will use linear regression and robust inference adjusted for multiple significance testing.
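The two-step covariate reduction followed by matching can be sketched in Python with NumPy only. Two substitutions are made and should be kept in mind: principal components stand in for exploratory factor analysis, and greedy 1:1 nearest-neighbour matching on the estimated propensity logit stands in for genetic matching (which, as in R's `Matching::GenMatch`, searches for covariate weights that optimize balance). The data are synthetic, and the function names and the particular squared and interaction terms are hypothetical.

```python
import numpy as np

def reduce_dimension(x_cont, n_factors):
    """Stand-in for exploratory factor analysis: leading principal-component scores."""
    z = (x_cont - x_cont.mean(axis=0)) / x_cont.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_factors].T

def fit_propensity(x, treat, n_iter=25):
    """Logistic regression of treatment on x by Newton-Raphson; returns the logit."""
    design = np.column_stack([np.ones(len(x)), x])
    t = treat.astype(float)
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (t - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return design @ beta

def greedy_match(score, treat):
    """Stand-in for genetic matching: greedy 1:1 nearest neighbour, no replacement."""
    controls = list(np.flatnonzero(~treat))
    pairs = []
    for i in np.flatnonzero(treat):
        j = min(controls, key=lambda c: abs(score[c] - score[i]))
        pairs.append((int(i), int(j)))
        controls.remove(j)
    return pairs

# Hypothetical synthetic data: 20 continuous covariates, treatment depends on the first.
rng = np.random.default_rng(0)
n = 400
x_cont = rng.normal(size=(n, 20))
treat = rng.random(n) < 1.0 / (1.0 + np.exp(-(x_cont[:, 0] - 1.0)))

factors = reduce_dimension(x_cont, n_factors=9)  # step 1: nine factors
terms = np.column_stack([factors, factors[:, :2] ** 2, factors[:, 0] * factors[:, 1]])
logit_ps = fit_propensity(terms, treat)           # step 2: propensity score model
pairs = greedy_match(logit_ps, treat)             # matched (treated, control) pairs
print(len(pairs), "matched pairs")
```

In the protocol, the propensity-score step is used to decide which interactions and second-order terms to keep; the sketch simply includes a few of them in the model.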
Discussion. As in a randomised experiment, the focus is on the design of the treatment assignment. This allows the preanalysis plan to be published before there is access to outcome data, which means that the p values will be correct if the maintained assumption of unconfoundedness is valid. Given that p-hacking is a substantial problem in empirical research, this is a considerable strength of this study. However, while the design yields balance on the observed covariates, one cannot rule out the possibility that unobserved confounders are imbalanced. For that reason, sensitivity tests for the maintained assumption of unconfoundedness are presented.
Ethics and dissemination. The study was approved by the Regional Ethical Review Board in Uppsala, Sweden (Dnr 2017/482). Results will be published in a peer-reviewed journal and distributed to relevant stakeholders in healthcare.