
# Nonparametric Modeling of Longitudinal Covariance Structure in Functional Mapping of Quantitative Trait Loci

**Article**

*in* Biometrics 65(4):1068-77 · March 2009


DOI: 10.1111/j.1541-0420.2009.01222.x · Source: PubMed

## Abstract

Estimation of the covariance structure of longitudinal processes is a fundamental prerequisite for the practical deployment of functional mapping designed to study the genetic regulation and network of quantitative variation in dynamic complex traits. We present a nonparametric approach for estimating the covariance structure of a quantitative trait measured repeatedly at a series of time points. Specifically, we adopt Huang et al.'s (2006, Biometrika 93, 85-98) approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized covariance estimator is obtained using a normal penalized likelihood with an L2 penalty. This approach, embedded within a mixture likelihood framework, leads to enhanced accuracy, precision, and flexibility of functional mapping while preserving its biological relevance. Simulation studies are performed to reveal the statistical properties and advantages of the proposed method. A real example from a mouse genome project is analyzed to illustrate the utilization of the methodology. The new method will provide a useful tool for genome-wide scanning for the existence and distribution of quantitative trait loci underlying a dynamic trait important to agriculture, biology, and health sciences.
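The core idea in the abstract, turning covariance estimation into a sequence of L2-penalized regressions via the modified Cholesky decomposition, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the fixed ridge penalty `lam`, and the simulated AR(1) data are assumptions of this example.

```python
import numpy as np

def cholesky_ridge_cov(Y, lam=1.0):
    """Regularized covariance via the modified Cholesky decomposition.

    Each column t of Y is regressed on the preceding columns with an
    L2 (ridge) penalty; the fitted coefficients fill the unit lower-
    triangular matrix T and the residual variances the diagonal D, so
    that T Sigma T' = D, i.e. Sigma = inv(T) D inv(T)'.
    """
    n, m = Y.shape
    Yc = Y - Y.mean(axis=0)              # centre each time point
    T = np.eye(m)
    d = np.empty(m)
    d[0] = Yc[:, 0].var()
    for t in range(1, m):
        X, y = Yc[:, :t], Yc[:, t]
        # ridge solution (X'X + lam I)^{-1} X'y
        phi = np.linalg.solve(X.T @ X + lam * np.eye(t), X.T @ y)
        T[t, :t] = -phi
        d[t] = np.mean((y - X @ phi) ** 2)
    Tinv = np.linalg.inv(T)
    return Tinv @ np.diag(d) @ Tinv.T

rng = np.random.default_rng(0)
true_cov = 0.5 ** np.abs(np.subtract.outer(range(5), range(5)))  # AR(1), rho = 0.5
Y = rng.multivariate_normal(np.zeros(5), true_cov, size=200)
S = cholesky_ridge_cov(Y, lam=1.0)
```

Because the decomposition is unconstrained, the reconstructed estimate is automatically symmetric and positive definite, which is what makes it convenient to embed inside the mixture likelihood of functional mapping.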

- ... The columns labeled "Yap (nonpara)" and "Yap (autoreg)" are derived from Tables 1 and 2 of Yap et al. (2009). They refer to the results of using the likelihood-based methods of Yap et al. (2009) with an estimated regularized covariance and an autocorrelated covariance, respectively. The columns labeled "EE (Wald)" and "EE (residual)" refer to our estimating equations approach with the Wald statistic and the residual error statistic, respectively. Yap et al. (2009) performed 100 simulation replicates, whereas we used 10,000 replicates, which gave stable estimates. For each method we report the mean position of the genome scan maximum ("mean") over simulation replicates, the standard deviation ("SD"), and the root mean squared error ("rmse"). Note that Yap et al. (2009) reported standard error, which we converted to standard deviation as the latter is independent of the number of simulation replicates. The simulations were performed with three error structures corresponding to an autocorrelated covariance (S1), an equicorrelated covariance (S2), and an unstructured covariance (S3). ... In genetic studies, many interesting traits, including growth curves and skeletal shape, have temporal or spatial structure. They are better treated as curves or function-valued traits. 
Identification of genetic loci contributing to such traits is facilitated by specialized methods that explicitly address the function-valued nature of the data. Current methods for mapping function-valued traits are mostly likelihood-based, requiring specification of the distribution and error structure. However, such specification is difficult or impractical in many scenarios. We propose a general functional regression approach based on estimating equations that is robust to misspecification of the covariance structure. Estimation is based on a two-step least-squares algorithm, which is fast and applicable even when the number of time points exceeds the number of samples. It is also flexible due to a general linear functional model; changing the number of covariates does not necessitate a new set of formulas and programs. In addition, many meaningful extensions are straightforward. For example, we can accommodate incomplete genotype data, and the algorithm can be trivially parallelized. The framework is an attractive alternative to likelihood-based methods when the covariance structure of the data is not known. It provides a good compromise between model simplicity, statistical efficiency, and computational speed. We illustrate our method and its advantages using circadian mouse behavioral data.
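Of the three simulation error structures mentioned in the snippet above, the first two have simple closed forms (the unstructured matrix S3 is tabulated in the papers and not reproduced here). A minimal sketch, where the variance and correlation values are illustrative assumptions rather than the papers' settings:

```python
import numpy as np

def ar1_cov(m, sigma2=1.0, rho=0.6):
    """Autocorrelated (AR(1)) covariance: cov(y_s, y_t) = sigma2 * rho**|s - t|."""
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(np.subtract.outer(idx, idx))

def equicorr_cov(m, sigma2=1.0, rho=0.5):
    """Equicorrelated (compound symmetry): all off-diagonal correlations equal rho."""
    return sigma2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))

S1 = ar1_cov(10)       # autocorrelated error structure (S1)
S2 = equicorr_cov(10)  # equicorrelated error structure (S2)
```

Phenotype curves for a given QTL genotype are then drawn from a multivariate normal with one of these matrices as the residual covariance, which is how the robustness comparison between the likelihood-based and estimating-equations methods is set up.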
- ... Growth Equations (8-11) establish the biological foundation of functional mapping by which one can estimate the temporal pattern of biomass growth at different stages of plant development. There is also a statistical strength of functional mapping, which lies in the parsimonious and flexible modeling of the covariance structure R [32,33]. ...... The SAD model of different orders has been used in functional mapping [35]. In Yap et al. [32,33], several general approaches are discussed for modeling the covariance by parametric or nonparametric approaches. ...... Since this study aims to provide a new idea for genetic mapping during the entire ontogeny, we assumed a simple AR(1) structure (specified by two parameters, variance σ² and correlation ρ) for the covariance matrix. More complex patterns of the covariance matrix, with or without structure, have been studied in the previous literature [32,33,35] and can be integrated into the current model with no technical difficulty. To facilitate computing, we assume no correlation between different stages. ... Article · Full-text available
- Dec 2011

All organisms face the problem of how to perform a sequence of developmental changes and transitions during ontogeny. We revise functional mapping, a statistical model originally derived to map genes that determine developmental dynamics, to take into account the entire process of ontogenetic growth from embryo to adult and from the vegetative to reproductive phase. The revised model provides a framework that reconciles the genetic architecture of development at different stages and elucidates a comprehensive picture of the genetic control mechanisms of growth that change gradually from a simple to a more complex level. We use an annual flowering plant as an example to demonstrate our model, by which to map genes and their interactions involved in embryo and postembryonic growth. The model provides a useful tool to study the genetic control of ontogenetic growth in flowering plants and any other organisms through proper modifications based on their biological characteristics. - ... To compare approaches in the context of a single QTL, we considered the simulation setting described in Yap et al. (2009), though exploring a range of QTL effects. We simulated an intercross with sample sizes of 100, 200, or 400, and a single chromosome of length 100 cM, with six equally spaced markers and with a QTL at 32 cM. ...... = 0.5, or (3) an "unstructured" covariance matrix, as given in Yap et al. (2009) (also shown in Table S2 of Kwak et al. 2014). The parameter c was given a range of values, which define the percent phenotypic variance explained by the QTL (the heritability). ...... Many other methods have been developed for QTL mapping with function-valued traits. However, most focus on single-QTL models (e.g., Ma et al. 2002; Yang et al. 2009; Yap et al. 2009). Bayesian methods for multiple-QTL mapping with function-valued traits have been proposed (Min et al. 2011; Sillanpää et al. 2012; Li and Sillanpää 2013), but these methods are computationally intensive, and software is not available. ... Article · Full-text available
- Nov 2015

We previously proposed a simple regression-based method to map quantitative trait loci underlying function-valued phenotypes. In order to better handle the case of noisy phenotype measurements and accommodate the correlation structure among time points, we propose an alternative approach that maintains much of the simplicity and speed of the regression-based method. We overcome noisy measurements by replacing the observed data with a smooth approximation. We then apply functional principal component analysis, replacing the smoothed phenotype data with a small number of principal components. Quantitative trait locus mapping is applied to these dimension-reduced data, either with a multi-trait method or by considering the traits individually and then taking the average or maximum LOD score across traits. We apply these approaches to root gravitropism data on Arabidopsis recombinant inbred lines and further investigate their performance in computer simulations. Our methods have been implemented in the R package, funqtl. - ... The internal correlation of traits measured at different longitudinal points is described using the covariance matrix, which is essential in the likelihood calculation. We included a comprehensive set of 13 covariance matrices, which include those employed by SPSS software and first-order ante-dependence (Yap, Fan & Wu, 2009). Funmap2 also implements an automated way to select a covariance matrix using the AIC or MLE method. ...... The longitudinal traits tend to correlate strongly between time points (time-dependent) or reaction norms (environment-dependent). Functional mapping models this internal relation using a covariance matrix, which may increase the statistical power for QTL detection (Ma, Casella & Wu, 2002). 
Whereas previous publications on functional mapping recommended the use of the most parsimonious covariance matrix (Yap, Fan & Wu, 2009), such as autoregressive, ante-dependence, or autoregressive moving average (Li et al., 2010a), Funmap2 also provides other covariance matrices implemented in IBM SPSS software, such as Compound Symmetry, Factor Analytic, Huynh-Feldt, and Toeplitz. Although a parsimonious covariance matrix can be efficient computationally, nonparsimonious covariance structures contain more parameters and, hence, richer structures, which may potentially lead to better data fitting while minimizing the pitfall of overfitting when guided by information criteria (Zimmerman et al., 2001). ... Article · Full-text available
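The smoothing-plus-FPCA pipeline described in the funqtl abstract above can be sketched compactly; here the smoothing step is omitted for brevity and the raw curves are decomposed directly. The function name and the simulated single-mode curves are assumptions of this illustration.

```python
import numpy as np

def fpca_scores(Y, n_pc=2):
    """Functional PCA via SVD of the centred phenotype matrix (individuals x
    time points); returns per-individual PC scores, which can then be mapped
    as ordinary univariate traits, plus the variance explained by each PC."""
    Yc = Y - Y.mean(axis=0)
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)
    return U[:, :n_pc] * s[:n_pc], var_explained[:n_pc]

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 30)
# curves dominated by one functional mode, plus measurement noise
Y = np.outer(rng.normal(size=80), np.sin(2 * np.pi * t)) \
    + 0.1 * rng.normal(size=(80, 30))
scores, ve = fpca_scores(Y, n_pc=2)
```

A QTL scan then treats each column of `scores` as a univariate trait, combining evidence across components by a multi-trait statistic or by averaging or taking the maximum LOD score, as the abstract describes.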
- May 2019

Quantitative trait locus (QTL) mapping has been used as a powerful tool for inferring the complexity of the genetic architecture that underlies phenotypic traits. This approach has shown its unique power to map the developmental genetic architecture of complex traits by implementing longitudinal data analysis. Here, we introduce the R package Funmap2 based on the functional mapping framework, which integrates prior biological knowledge into the statistical model. Specifically, the functional mapping framework is engineered to include longitudinal curves that describe the genetic effects and the covariance matrix of the trait of interest. Funmap2 chooses the type of longitudinal curve and covariance matrix automatically using information criteria. Funmap2 is available for download at https://github.com/wzhy2000/Funmap2 . - ... Simulations In order to investigate the performance of our proposed approaches and compare them to existing methods, we performed several computer simulation studies. While numerous methods for QTL mapping with function-valued traits have been described, we were unsuccessful, despite considerable effort, to employ the software for Yang et al. (2009), Yap et al. (2009), Min et al. (2011), or Sillanpää et al. (2012) ...... Methods for the genetic analysis of function-valued phenotypes have mostly focused on single-QTL models (Ma et al. 2002; Yang et al. 2009; Yap et al. 2009; Xiong et al. 2011). ...Most statistical methods for QTL mapping focus on a single phenotype. However, multiple phenotypes are commonly measured, and recent technological advances have greatly simplified the automated acquisition of numerous phenotypes, including function-valued phenotypes, such as growth measured over time. While there exist methods for QTL mapping with function-valued phenotypes, they are generally computationally intensive and focus on single-QTL models. 
We propose two simple, fast methods that maintain high power and precision and are amenable to extensions with multiple-QTL models using a penalized likelihood approach. After identifying multiple QTL by these approaches, we can view the function-valued QTL effects to provide a deeper understanding of the underlying processes. Our methods have been implemented as a package for R, funqtl.
- ... This unbalanced genetic relatedness requires careful statistical modeling to avoid a large number of false-positive findings. The functional mapping idea is not new in the statistical genetics community (Ma et al. 2002; Wu et al. 2002, 2004; Lin and Wu 2006; Yang et al. 2009). However, this article is the first one that develops the functional mapping method for the RIX data and specifically models the unique genetic structure of RIX samples. ...... One advantage of using B-splines is that the smoother matrix {B_k(t_i)} is independent of the responses. Unlike other nonparametric approaches, how to determine the smoothness is still an open question, although the choice of the number of knots is generally not critical (Yang et al. 2009). Our simulation results (for example, Figures 1 and 2) show that the estimated functional effects are not very sensitive to the choices of d and n_j. ... Article
- Feb 2012
- GENETICS

There has been a great deal of interest in the development of methodologies to map quantitative trait loci (QTL) using experimental crosses in the last 2 decades. Experimental crosses in animal and plant sciences provide important data sources for mapping QTL through linkage analysis. The Collaborative Cross (CC) is a renewable mouse resource that is generated from eight genetically diverse founder strains to mimic the genetic diversity in humans. The recombinant inbred intercrosses (RIX) generated from CC recombinant inbred (RI) lines share similar genetic structures of F(2) individuals but with up to eight alleles segregating at any one locus. In contrast to F(2) mice, genotypes of RIX can be inferred from the genotypes of their RI parents and can be produced repeatedly. Also, RIX mice typically do not share the same degree of relatedness. This unbalanced genetic relatedness requires careful statistical modeling to avoid false-positive findings. Many quantitative traits are inherently complex with genetic effects varying with other covariates, such as age. For such complex traits, if phenotype data can be collected over a wide range of ages across study subjects, their dynamic genetic patterns can be investigated. Parametric functions, such as sigmoidal or logistic functions, have been used for such purpose. In this article, we propose a flexible nonparametric time-varying coefficient QTL mapping method for RIX data. Our method allows the QTL effects to evolve with time and naturally extends classical parametric QTL mapping methods. We model the varying genetic effects nonparametrically with the B-spline bases. Our model investigates gene-by-time interactions for RIX data in a very flexible nonparametric fashion. Simulation results indicate that the varying coefficient QTL mapping has higher power and mapping precision compared to parametric models when the assumption of constant genetic effects fails. 
We also apply a modified permutation procedure to control the overall significance level. - ... We used a cost-effective method to model the longitudinal covariance matrix by a particular parameter set. To date, several methods, such as autoregressive, antedependence, and nonparametric methods, have been used to describe this matrix (Ma et al., 2002; Zhao et al., 2005; Yap et al., 2009). Of these, autoregression may be the most parsimonious as it uses fewer parameters to capture the complex structure of a matrix and exemplifies the statistical power of QTL mapping. ... Article · Full-text available
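The nonparametric time-varying coefficient model above, with genetic effects expanded in B-spline bases whose smoother matrix {B_k(t_i)} depends only on the time grid, can be sketched as a least-squares fit. The knot placement, genotype coding, and simulated effect curve are assumptions of this illustration, not the article's settings.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(t, n_basis=6, degree=3):
    """B-spline design matrix {B_k(t_i)} on a clamped knot vector; it is
    built from the time grid alone, independent of the responses."""
    lo, hi = t.min(), t.max()
    inner = np.linspace(lo, hi, n_basis - degree + 1)
    knots = np.concatenate([[lo] * degree, inner, [hi] * degree])
    return BSpline.design_matrix(t, knots, degree).toarray()

rng = np.random.default_rng(2)
n, m = 120, 25
t = np.linspace(0.0, 1.0, m)
g = rng.integers(0, 3, size=n).astype(float)   # genotype codes 0/1/2
beta_true = np.sin(np.pi * t)                  # genetic effect varying with time
Y = 1.0 + np.outer(g, beta_true) + 0.2 * rng.normal(size=(n, m))

B = bspline_basis(t)                           # m x K basis matrix
# per-individual design [B | g_i * B] for baseline mu(t) and effect beta(t)
X = np.vstack([np.hstack([B, gi * B]) for gi in g])
coef, *_ = np.linalg.lstsq(X, Y.ravel(), rcond=None)
K = B.shape[1]
beta_hat = B @ coef[K:]                        # estimated time-varying effect
```

Because the spline coefficients enter linearly, the model "naturally extends classical parametric QTL mapping": a constant genetic effect is the special case in which beta(t) is flat.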
- Jun 2019

Covariation between organ growth and biomass accumulation plays an important role in plants. Plants capture optimal fitness in nature, which depends on the coordination and interaction of distinct organs such as leaves, stems, and roots. Although many studies have focused on plant growth or biomass allocation, detailed information on the genetic mechanism of coordinated variation is lacking. Here, we develop a new mapping model based on functional mapping to detect covariation quantitative trait loci (QTLs) that govern development of plant organs and whole biomass, which, via a series of hypothesis tests, allows quantification of how QTLs regulate covariation between organ growth and biomass accumulation. The model was implemented to analyze leaf number data and the whole dry weight of recombinant inbred lines (RILs) of Arabidopsis. Two key QTLs related to growth and biomass allocation that reside within biologically meaningful genes, CRA1 and HIPP25, are characterized. These two genes may control covariation between the two traits. The new model will enable the elucidation of the genetic architecture underlying growth and biomass accumulation, which may enhance our understanding of fitness development in plants. - ... Next, we need to model the covariance structure by using a parsimonious and flexible approach such as an autoregressive, antedependence, autoregressive moving average, or nonparametric and semiparametric approach. Yap et al. [49] provided a discussion of how to choose a general approach for covariance structure modeling. In likelihood (1), the conditional probabilities of functional genotypes given marker genotypes can be expressed as a function of recombination fractions for an experimental cross population or linkage disequilibria for a natural population [48,50]. 
...Mathematical models of viral dynamics in vivo provide incredible insights into the mechanisms for the nonlinear interaction between virus and host cell populations, the dynamics of viral drug resistance, and the way to eliminate virus infection from individual patients by drug treatment. The integration of these mathematical models with high-throughput genetic and genomic data within a statistical framework will raise a hope for effective treatment of infections with HIV virus through developing potent antiviral drugs based on individual patients' genetic makeup. In this opinion article, we will show a conceptual model for mapping and dictating a comprehensive picture of genetic control mechanisms for viral dynamics through incorporating a group of differential equations that quantify the emergent properties of a system.
- ... Therefore, even though the computation with the AR(1) covariance structure is more expensive due to the presence of nonconjugacy in the posterior, it might be a more suitable choice, especially when the heritabilities of the dynamic traits under study are low. Other more complicated covariance structures, such as some nonstationary parametric structures (Liu and Wu 2009) or nonparametric structures (Yap et al. 2009), can possibly be incorporated if needed, but they require the development of more specific algorithms for the computation of those newly involved parameters. Furthermore, it is necessary to point out that due to the approximative nature of the VB algorithm, the uncertainty estimates for markers may still be underestimated even by using an appropriate residual covariance structure or, by other means, the estimated Wald statistic might be upward biased. ... In biology, many quantitative traits are dynamic in nature. They can often be described by some smooth functions or curves. A joint analysis of all the repeated measurements of the dynamic traits by functional quantitative trait loci (QTL) mapping methods has the benefits to (1) understand the genetic control of the whole dynamic process of the quantitative traits, and (2) improve the statistical power to detect QTLs. One crucial issue in functional QTL mapping is how to correctly describe the smoothness of trajectories of function-valued traits. We develop an efficient Bayesian nonparametric multiple-loci procedure for mapping dynamic traits. The method uses Bayesian P-splines with (nonparametric) B-spline bases to specify the functional form of a QTL trajectory, and a random-walk prior to automatically determine its degree of smoothness. An efficient deterministic variational Bayes algorithm is used to implement both (1) searching for an optimal subset of QTLs among large marker panels, and (2) estimating the genetic effects of the selected QTLs changing over time. 
Our method can be fast even on some large-scale data sets. The advantages of our method are illustrated on both simulated and real data sets.
- ... There is a body of literature on the mathematical resolution of differential equations [59,60], which can be incorporated into systems mapping to confront various complexities. Next, we need to model the covariance structure by using a parsimonious and flexible approach, such as autoregressive, antedependence, autoregressive moving average, or nonparametric and semiparametric approaches [61]. Zimmerman and Nunez-Anton [62] discussed the choice of an optimal approach for covariance structuring based on several model selection criteria. The first-order structured antedependence [24] was shown to be powerful for modeling the longitudinal covariance structure between multiple variables using a small number of dependence parameters. ... Article · Full-text available
- Feb 2013

The recent availability of high-throughput genetic and genomic data allows the genetic architecture of complex traits to be systematically mapped. The application of these genetic results to design and breed new crop types can be made possible through systems mapping. Systems mapping is a computational model that dissects a complex phenotype into its underlying components, coordinates different components in terms of biological laws through mathematical equations, and maps specific genes that mediate each component and its connection with other components. Here, we present a new direction of systems mapping by integrating this tool with carbon economy. With an optimal spatial distribution of carbon fluxes between sources and sinks, plants tend to maximize whole-plant growth and competitive ability under limited availability of resources. We argue that such an economical strategy for plant growth and development, once integrated with systems mapping, will not only provide mechanistic insights into plant biology, but also help to spark a renaissance of interest in ideotype breeding in crops and trees. - ... Mathematical solutions for delay differential equations have been discussed and used to map clock genes for a biological system (Fu et al., 2011). For longitudinal data, we can use structural approaches to model the covariance matrix for longitudinal traits (Zimmerman and Nunez-Anton, 2001; Zhao et al., 2005; Yap et al., 2009). These approaches include (1) parametric stationary, (2) parametric non-stationary, (3) non-parametric, and (4) semiparametric models. ... Article · Full-text available
- May 2012

The growing evidence that cancer originates from stem cells (SC) holds a great promise to eliminate this disease by designing specific drug therapies for removing cancer SC. Translation of this knowledge into predictive tests for the clinic is hampered due to the lack of methods to discriminate cancer SC from non-cancer SC. Here, we address this issue by describing a conceptual strategy for identifying the genetic origins of cancer SC. The strategy incorporates a high-dimensional group of differential equations that characterizes the proliferation, differentiation, and reprogramming of cancer SC in a dynamic cellular and molecular system. The deployment of robust mathematical models will help uncover and explain many still unknown aspects of cell behavior, tissue function, and network organization related to the formation and division of cancer SC. The statistical method developed allows biologically meaningful hypotheses about the genetic control mechanisms of carcinogenesis and metastasis to be tested in a quantitative manner. - ... The Runge–Kutta fourth-order algorithm with step size h = 0.1 is used to approximate the solution with high accuracy given a trial set of parameter values and initial conditions. Next, we need to model the covariance structure by using a parsimonious and flexible approach such as an autoregressive, antedependence, autoregressive moving average, or nonparametric and semiparametric approach [74]. In likelihood (B1), the conditional probabilities of QTL genotypes given marker genotypes can be expressed as a function of recombination fractions for an experimental cross population or linkage disequilibria for a natural population [58]. The estimation of the recombination fractions or linkage disequilibria can be implemented with the EM algorithm. ... Article · Full-text available
- Aug 2012

The formation of phenotypic traits, such as biomass production, tumor volume and viral abundance, undergoes a complex process in which interactions between genes and developmental stimuli take place at each level of biological organization from cells to organisms. Traditional studies emphasize the impact of genes by directly linking DNA-based markers with static phenotypic values. Functional mapping, derived to detect genes that control developmental processes using growth equations, has proven powerful for addressing questions about the roles of genes in development. By treating phenotypic formation as a cohesive system using differential equations, a different approach-systems mapping-dissects the system into interconnected elements and then maps genes that determine a web of interactions among these elements, facilitating our understanding of the genetic machineries for phenotypic development. Here, we argue that genetic mapping can play a more important role in studying the genotype-phenotype relationship by filling the gaps in the biochemical and regulatory process from DNA to end-point phenotype. We describe a new framework, named network mapping, to study the genetic architecture of complex traits by integrating the regulatory networks that cause a high-order phenotype. Network mapping makes use of a system of differential equations to quantify the rule by which transcriptional, proteomic and metabolomic components interact with each other to organize into a functional whole. The synthesis of functional mapping, systems mapping and network mapping provides a novel avenue to decipher a comprehensive picture of the genetic landscape of complex phenotypes that underlie economically and biomedically important traits. - ... Since the publication of the pioneering work by Laird and Ware [10], random effects models have been extensively used for longitudinal data analysis [11]. 
All these statistical approaches have been incorporated into functional mapping [12, 13], aiming to provide the most parsimonious estimates of QTL effects for a given data set. A Bayesian algorithm for functional mapping has been proposed recently by Liu and Wu [14]. ... The most powerful and comprehensive approach of study in modern biology is to understand the whole process of development and all events of importance to development which occur in the process. As a consequence, joint modeling of developmental processes and events has become one of the most demanding tasks in statistical research. Here, we propose a joint modeling framework for functional mapping of specific quantitative trait loci (QTLs) that control developmental processes and the timing of development and their causal correlation over time. The joint model contains two submodels, one for a developmental process, known as a longitudinal trait, and the other for a developmental event, known as the time to event, which are connected through a QTL mapping framework. A nonparametric approach is used to model the mean and covariance function of the longitudinal trait, while the traditional Cox proportional hazard (PH) model is used to model the event time. The joint model is applied to map QTLs that control whole-plant vegetative biomass growth and time to first flower in soybeans. Results show that this model should be broadly useful for detecting genes controlling physiological and pathological processes and other events of interest in biomedicine.
- ... Much statistical analysis of such high-dimensional data involves the estimation of a covariance matrix or its inverse (the precision matrix). Examples include portfolio management and risk assessment (Fan, Fan and Lv, 2008), high-dimensional classification such as the Fisher discriminant (Hastie, Tibshirani and Friedman, 2009), graphical models (Meinshausen and Bühlmann, 2006), statistical inference such as controlling false discoveries in multiple testing (Leek and Storey, 2008; Efron, 2010), finding quantitative trait loci based on longitudinal data (Yap, Fan and Wu, 2009; Xiong et al., 2011), and testing the capital asset pricing model (Sentana, 2009), among others. See Section 4 for some of those applications. ... Article
- Sep 2013
- J R STAT SOC B

This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented. - ... If the stationarity assumptions do not hold, we will need to use a nonstationary approach, such as a structured antedependence (SAD) model [32,33] or autoregressive moving average (ARMA) [34], for the covariance structure. In some cases, nonparametric or semiparametric approaches are a better choice [35]. Thus, instead of estimating all elements in the covariance matrix, we estimate the parameters that model the covariance structure. ...
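The POET construction described above can be sketched compactly: keep the leading principal components of the sample covariance as the factor part, then soft-threshold the "principal orthogonal complement". The fixed threshold and factor number below are illustrative assumptions; the paper chooses both adaptively.

```python
import numpy as np

def poet(Y, n_factors=2, thresh=0.1):
    """Principal Orthogonal complEment Thresholding (sketch).

    The top principal components of the sample covariance form the
    low-rank factor part; the remaining ("orthogonal complement")
    entries are soft-thresholded, leaving the diagonal untouched."""
    S = np.cov(Y, rowvar=False)
    vals, vecs = np.linalg.eigh(S)          # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]  # sort descending
    lead = vecs[:, :n_factors]
    factor_part = (lead * vals[:n_factors]) @ lead.T
    resid = S - factor_part
    soft = np.sign(resid) * np.maximum(np.abs(resid) - thresh, 0.0)
    np.fill_diagonal(soft, np.diag(resid))  # keep residual variances
    return factor_part + soft

rng = np.random.default_rng(3)
n, p, k = 300, 20, 2
loadings = rng.normal(size=(p, k))
factors = rng.normal(size=(n, k))
Y = factors @ loadings.T + rng.normal(size=(n, p))  # approximate factor model
Sigma_hat = poet(Y, n_factors=k, thresh=0.1)
```

With `thresh=0` and `n_factors=0` the sketch reduces to the sample covariance, mirroring the abstract's point that POET nests the sample, factor-based, and thresholding estimators as special cases.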
- ... [8,9]. Recently, applications involving estimation in very high-dimensional settings have attracted resurgent attention, in particular for genomic data, e.g. [10][11][12][13]. However, there has been little interest in regularized estimation when estimating genetic parameters. ... Estimation of genetic covariance matrices for multivariate problems comprising more than a few traits is inherently problematic, since sampling variation increases dramatically with the number of traits. This paper investigates the efficacy of regularized estimation of covariance components in a maximum likelihood framework, imposing a penalty on the likelihood designed to reduce sampling variation. In particular, penalties that "borrow strength" from the phenotypic covariance matrix are considered. An extensive simulation study was carried out to investigate the reduction in average 'loss', i.e. the deviation of estimated matrices from the population values, and the accompanying bias for a range of parameter values and sample sizes. A number of penalties are examined, penalizing either the canonical eigenvalues or the genetic covariance or correlation matrices. In addition, several strategies to determine the amount of penalization to be applied, i.e. to estimate the appropriate tuning factor, are explored. It is shown that substantial reductions in loss for estimates of genetic covariance can be achieved for small to moderate sample sizes. While no penalty performed best overall, penalizing the variance among the estimated canonical eigenvalues on the logarithmic scale or shrinking the genetic towards the phenotypic correlation matrix appeared most advantageous. Estimating the tuning factor using cross-validation resulted in a loss reduction 10 to 15% less than that obtained if population values were known. 
Applying a mild penalty, chosen so that the deviation in likelihood from the maximum was non-significant, performed as well as, if not better than, cross-validation and can be recommended as a pragmatic strategy. Penalized maximum likelihood estimation provides the means to 'make the most' of limited and precious data and facilitates more stable estimation for multi-dimensional analyses. It should become part of our everyday toolkit for multivariate estimation in quantitative genetics.
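One of the penalties above shrinks the genetic matrix toward its phenotypic counterpart. The underlying linear-shrinkage idea can be sketched in a few lines; the function name and the fixed tuning factor `lam` are illustrative assumptions, since in practice the tuning factor would be estimated, e.g. by cross-validation or the mild-penalty rule just described.

```python
import numpy as np

def shrink_toward(G, P, lam):
    """Linear shrinkage of a genetic covariance estimate G toward the
    phenotypic covariance P; lam in [0, 1] controls the strength
    (lam = 0 leaves G unchanged, lam = 1 returns P)."""
    G, P = np.asarray(G, float), np.asarray(P, float)
    return (1.0 - lam) * G + lam * P

G = np.array([[2.0, 1.5], [1.5, 3.0]])  # noisy genetic estimate
P = np.array([[2.5, 0.8], [0.8, 3.5]])  # better-estimated phenotypic matrix
G_shrunk = shrink_toward(G, P, lam=0.3)
```

Because the phenotypic matrix is estimated with far less sampling variation, pulling the genetic estimate toward it trades a small bias for a large variance reduction, which is the "borrowing strength" the abstract refers to.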
- ... Though nonparametric modeling of the time-dependent mean functions has been extensively studied, research on modeling the covariance structure via nonparametric approaches is rarely reported due to various difficulties [18]. In the original functional mapping [5], a stationary covariance function such as the first-order autoregressive (AR(1)) model was applied. ...Functional mapping has been a powerful tool in mapping quantitative trait loci (QTL) underlying dynamic traits of agricultural or biomedical interest. In functional mapping, multivariate normality is often assumed for the underlying data distribution, partially due to the ease of parameter estimation. The normality assumption, however, can easily be violated in real applications for reasons such as heavy tails or extreme observations. Departure from normality has a negative effect on testing power and inference for QTL identification. In this work, we relax the normality assumption and propose a robust multivariate t-distribution mapping framework for QTL identification in functional mapping. Simulation studies show increased mapping power and precision with the t distribution compared with the normal distribution. The utility of the method is demonstrated through a real data analysis.
- ... This unconstrained reparameterization and its statistical interpretability make it easy to incorporate covariates in covariance modeling and to cast the joint modeling of mean and covariance into the generalized linear model framework. The methodology has proved to be useful in recent literature; see, for example, Pourahmadi and Daniels (2002), Pan and MacKenzie (2003), Ye and Pan (2006), Daniels (2006), Huang et al. (2006), Levina et al. (2008), Yap et al. (2009), and Lin and Wang (2009). ...Missing data in longitudinal studies can create enormous challenges in data analysis when coupled with the positive-definiteness constraint on a covariance matrix. For complete balanced data, the Cholesky decomposition of a covariance matrix makes it possible to remove the positive-definiteness constraint and use a generalized linear model setup to jointly model the mean and covariance using covariates (Pourahmadi, 2000). However, this approach may not be directly applicable when the longitudinal data are unbalanced, as coherent regression models for the dependence across all times and subjects may not exist. Within the existing generalized linear model framework, we show how to overcome this and other challenges by embedding the covariance matrix of the observed data for each subject in a larger covariance matrix and employing the familiar EM algorithm to compute the maximum likelihood estimates of the parameters and their standard errors. We illustrate and assess the methodology using real data sets and simulations.
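The modified Cholesky decomposition behind these joint mean-covariance models (Pourahmadi, 2000) rewrites a covariance matrix as T Σ T' = D, where the unit lower-triangular T holds negated autoregressive coefficients and the diagonal D holds innovation variances. A minimal numpy sketch for a complete balanced covariance matrix; the AR(1) demo matrix at the end is illustrative, not data from any of the cited studies.

```python
import numpy as np

def modified_cholesky(Sigma):
    """Modified Cholesky decomposition T Sigma T' = D.

    Row t of the unit lower-triangular T holds the negated coefficients
    from regressing y_t on its predecessors y_1, ..., y_{t-1}; the
    diagonal of D holds the corresponding innovation variances.
    """
    m = Sigma.shape[0]
    T = np.eye(m)
    d = np.empty(m)
    d[0] = Sigma[0, 0]
    for t in range(1, m):
        # Least-squares regression of y_t on y_1, ..., y_{t-1}.
        phi = np.linalg.solve(Sigma[:t, :t], Sigma[:t, t])
        T[t, :t] = -phi
        d[t] = Sigma[t, t] - Sigma[:t, t] @ phi
    return T, np.diag(d)

# For an AR(1)-type covariance, the subdiagonal of T recovers -rho and
# the innovation variances after the first equal 1 - rho**2.
rho = 0.6
idx = np.arange(4)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
T, D = modified_cholesky(Sigma)
```

Because the regression coefficients and log innovation variances are unconstrained, they can be modeled with covariates without risking a non-positive-definite covariance, which is the interpretability advantage the snippet above describes.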
- ... For a longitudinal covariance matrix R i , it is suggested that an appropriate statistical model be used to model its structure. A number of models have been available to model the covariance structure within the functional mapping framework [32, 33]. In a real example for QTL mapping using an RIL population of soybeans (Figure 1), it appears that time-varying variability in seed developmental trajectories can be modeled by a simple autoregressive model of order one [AR(1)] for the covariance structure. ...Article (full-text available)
- Mar 2013

Despite a tremendous effort to map quantitative trait loci (QTLs) responsible for agriculturally and biologically important traits in plants, our understanding of how a QTL governs the developmental process of plant seeds remains elusive. In this article, we address this issue by describing a model for functional mapping of seed development through the incorporation of the relationship between vegetative and reproductive growth. The time difference of reproductive from vegetative growth is described by Reeve and Huxley's allometric equation. Thus, the implementation of this equation into the framework of functional mapping allows dynamic QTLs for seed development to be identified more precisely. By estimating and testing the mathematical parameters that define Reeve and Huxley's allometric equations of seed growth, the dynamic pattern of the genetic effects of the QTLs identified can be analyzed. We used the model to analyze a soybean data set, leading to the detection of QTLs that control the growth of seed dry weight. Three dynamic QTLs, located in two different linkage groups, were detected to affect growth curves of seed dry weight. The QTLs detected may be used to improve seed yield with marker-assisted selection by altering the pattern of seed development, in the hope of achieving a maximum seed size at harvest time. - ... The Fan et al. model's advantage lies in combining the flexibility of nonparametric modeling with the parsimony of parametric modeling. The establishment of a robust estimation procedure and asymptotic properties of the estimators will make this semiparametric model useful in the practical estimation of the covariance function [190]. ...Article
- Feb 2015
- PHYS LIFE REV

Despite increasing emphasis on the genetic study of quantitative traits, we are still far from being able to chart a clear picture of their genetic architecture, given the inherent complexity involved in trait formation. A competing theory for studying such complex traits has emerged by viewing their phenotypic formation as a "system" in which a high-dimensional group of interconnected components act and interact across different levels of biological organization, from molecules through cells to whole organisms. This system is initiated by a machinery of DNA sequences that regulate a cascade of biochemical pathways to synthesize endophenotypes and further assemble these endophenotypes toward the end-point phenotype by virtue of various developmental changes. This review focuses on a conceptual framework for genetic mapping of complex traits by which to delineate the underlying components, interactions and mechanisms that govern the system according to biological principles and understand how these components function synergistically under the control of quantitative trait loci (QTLs) to comprise a unified whole. This framework is built by a system of differential equations that quantifies how alterations of different components lead to the global change of trait development and function, and provides a quantitative and testable platform for assessing the multiscale interplay between QTLs and development. The method will enable geneticists to shed light on the genetic complexity of any biological system and predict, alter or engineer its physiological and pathological states. - ... For a longitudinal covariance matrix R i , it is suggested that an appropriate statistical model be used to model its structure. A number of models have been available to model the covariance structure within the functional mapping framework [32, 33].
In a real example for QTL mapping using an RIL population of soybeans (Figure 1), it appears that time-varying variability in seed developmental trajectories can be modeled by a simple autoregressive model of order one [AR(1)] for the covariance structure. ...Data (full-text available)
- Mar 2014

- ... Much statistical analysis of such high dimensional data requires estimating a covariance matrix or its inverse. Several applications in numerous domains, such as portfolio management and risk assessment (Ledoit and Wolf, 2003, 2004; Jagannathan and Ma, 2003; Kourtis et al., 2012; Fan et al., 2013; Xue et al., 2012; Lai et al., 2011; Deng and Tsui, 2013), high dimensional classification (Guo et al., 2007; Witten and Tibshirani, 2011; Tibshirani et al., 2004), analysis of independence and conditional independence relationships between components in graphical models, statistical inference like controlling false discoveries in multiple testing (Leek and Storey, 2008; Efron, 2010), finding quantitative trait loci based on longitudinal data (Yap et al., 2009; Xiong et al., 2011), and testing the capital asset pricing model (Sentana, 2009), have reported success stories of using covariance matrix estimation. For instance, principal component analysis (PCA) applies the eigen-decomposition of the covariance matrix for dimension reduction. ...Article
- Sep 2017
- NEURAL COMPUT

This letter proposes a novel approach using ℓ0-norm regularization for the sparse covariance matrix estimation (SCME) problem. The objective function of the SCME problem is composed of a nonconvex part and the ℓ0 term, which is discontinuous and difficult to tackle. Appropriate DC (difference of convex functions) approximations of the ℓ0-norm are used that result in approximate SCME problems that are still nonconvex. DC programming and DCA (DC algorithm), powerful tools in the nonconvex programming framework, are investigated. Two DC formulations are proposed and corresponding DCA schemes developed. Two applications of the SCME problem are considered: classification via sparse quadratic discriminant analysis and portfolio optimization. A careful empirical experiment is performed on simulated and real data sets to study the performance of the proposed algorithms. Numerical results showed their efficiency and their superiority compared with seven state-of-the-art methods. - ... Second, functional mapping chooses and uses a cost-effective statistical model to structure the longitudinal covariance Σ by a particular set of parameters. Such a model can be parametric, such as autoregressive (Ma et al., 2002), antedependence (Zhao et al., 2005) and autoregressive moving average (Li et al., 2010), nonparametric (Yap et al., 2009), or semiparametric (Das et al., 2012), depending on data types. Estimating fewer parameters, these models display increasing statistical power for QTL detection. ...Article (full-text available)
- Mar 2016

Phase change plays a prominent role in determining the form of growth and development. Although considerable attention has been focused on identifying the regulatory control mechanisms of phase change, a detailed understanding of the genetic architecture of this phenomenon is still lacking. We address this issue by deriving a computational model. The model is founded on the framework of functional mapping, aimed at characterizing the interplay between quantitative trait loci (QTLs) and development through biologically meaningful mathematical equations. A multiphasic growth equation was implemented into functional mapping, which, via a series of hypothesis tests, allows the quantification of how QTLs regulate the timing and pattern of vegetative phase transition between independently regulated, temporally coordinated processes. The model was applied to analyze stem radial growth data of an interspecific hybrid family derived from two Populus species during the first 24 yr of ontogeny. Several key QTLs related to phase change have been characterized, most of which were observed to be in the adjacent regions of candidate genes. The identification of phase transition QTLs, whose expression is regulated by endogenous and environmental signals, may enhance our understanding of the evolution of development in changing environments. - ... The statistical power of systems mapping partly results from structural modeling of the covariance matrix (15). There are many approaches available to model the covariance structure, including autoregressive (Ma et al., 2002), antedependent (Zhao et al., 2005), autoregressive moving average (Li et al., 2010), nonparametric (Yap et al., 2009), and semiparametric (Das et al., 2013). These approaches have their own advantages and disadvantages in terms of efficiency, flexibility, and parsimony. ...
- Article
- Jan 2014

Cancer can be controlled effectively by using chemotherapeutic drugs to inhibit cancer stem cells, but there is considerable inter-patient variability regarding how these cells respond to drug intervention. Here, we describe a statistical framework for mapping genes that control tumor responses to chemotherapeutic drugs as well as the efficacy of treatments in arresting tumor growth. The framework integrates the mathematical aspects of the cancer stem cell hypothesis into genetic association studies, equipped with a capacity to quantify the magnitude and pattern of genetic effects on the kinetic decline of cancer stem cells in response to therapy. By quantifying how specific genes and their interactions govern drug response, the model provides essential information to tailor personalized drugs for individual patients. - Article (full-text available)
- Feb 2012

The identification of imprinted genes is becoming a standard procedure in searching for quantitative trait loci (QTL) underlying complex traits. When a developmental characteristic such as growth or drug response is observed at multiple time points, understanding the dynamics of gene function governing the underlying feature should provide more biological information regarding the genetic control of an organism. Recognizing that differential imprinting can be development-specific, mapping imprinted genes while considering the dynamic imprinting effect can provide additional biological insights into the epigenetic control of a complex trait. In this study, we proposed a Bayesian imprinted QTL (iQTL) mapping framework that considers the dynamics of imprinting effects and models multiple iQTLs with an efficient Bayesian model selection procedure. The method overcomes the limitations of likelihood-based mapping procedures and can simultaneously identify multiple iQTLs with different gene action modes across the whole genome with high computational efficiency. An inference procedure using Bayes factors to distinguish different imprinting patterns of an iQTL was proposed. Monte Carlo simulations were conducted to evaluate the performance of the method. The utility of the approach was illustrated through an analysis of a body weight growth data set in an F(2) family derived from LG/J and SM/J mouse strains. The proposed Bayesian mapping method provides an efficient and computationally feasible framework for genome-wide multiple iQTL inference with complex developmental traits. - Article
- Mar 2010
- J Biopharm Stat

Tremendous progress has been made in recent years on developing statistical methods for mapping quantitative trait loci (QTL) from crosses of inbred lines. Most of the recent research is focused on strategies for mapping multiple QTL and associated model selection procedures and criteria. We review the progress of research in this area on one trait and multiple traits by maximum likelihood and Bayesian methods. - Obtaining accurate estimates of the genetic covariance matrix Sigma(G) for multivariate data is a fundamental task in quantitative genetics and important for both evolutionary biologists and plant or animal breeders. Classical methods for estimating Sigma(G) are well known to suffer from substantial sampling errors; importantly, its leading eigenvalues are systematically overestimated. This article proposes a framework that exploits information in the phenotypic covariance matrix Sigma(P) in a new way to obtain more accurate estimates of Sigma(G). The approach focuses on the "canonical heritabilities" (the eigenvalues of Sigma(P)(-1)Sigma(G)), which may be estimated with more precision than those of Sigma(G) because Sigma(P) is estimated more accurately. Our method uses penalized maximum likelihood and shrinkage to reduce bias in estimates of the canonical heritabilities. This in turn can be exploited to get substantial reductions in bias for estimates of the eigenvalues of Sigma(G) and a reduction in sampling errors for estimates of Sigma(G). Simulations show that improvements are greatest when sample sizes are small and the canonical heritabilities are closely spaced. An application to data from beef cattle demonstrates the efficacy of this approach and the effect on estimates of heritabilities and correlations. Penalized estimation is recommended for multivariate analyses involving more than a few traits or problems with limited data.
- The identification of genes or quantitative trait loci that are expressed in response to different environmental factors, such as temperature and light, through functional mapping critically relies on precise modeling of the covariance structure. Previous work used separable parametric covariance structures, such as a Kronecker product of first-order autoregressive [AR(1)] matrices, that do not account for interaction effects of different environmental factors. We implement a more robust nonparametric covariance estimator to model these interactions within the framework of functional mapping of reaction norms to two signals. Our results from Monte Carlo simulations show that this estimator can be useful in modeling interactions that exist between two environmental signals. The interactions are simulated using nonseparable covariance models with spatio-temporal structural forms that mimic interaction effects. The nonparametric covariance estimator has an advantage over separable parametric covariance estimators in the detection of QTL location, thus extending the breadth of use of functional mapping in practical settings.
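The separable structure mentioned above, a Kronecker product of AR(1) matrices over two environmental signals, is straightforward to construct. A short numpy sketch with illustrative correlation parameters (the grid sizes and rho values are assumptions, not values from the cited study); by construction, no interaction between the two signals enters this model, which is the limitation the nonparametric estimator addresses.

```python
import numpy as np

def ar1_cov(n, rho, sigma2=1.0):
    """Stationary AR(1) covariance matrix: sigma2 * rho**|i - j|."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Separable covariance over a 3-level and a 4-level signal grid: each
# entry is a product of the two marginal correlations, so the two
# signals cannot interact in this structure.
Sigma = np.kron(ar1_cov(3, 0.7), ar1_cov(4, 0.4))
```

The resulting 12 x 12 matrix is symmetric positive definite, with entry (0, 1) equal to 0.4 (one step along the second signal) and entry (0, 4) equal to 0.7 (one step along the first).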
- Functional mapping is a statistical method for mapping quantitative trait loci (QTLs) that regulate the dynamic pattern of a biological trait. This method integrates mathematical aspects of biological complexity into a mixture model for genetic mapping and tests the genetic effects of QTLs by comparing genotype-specific curve parameters. As a way of quantitatively specifying the dynamic behavior of a system, differential equations have proven to be powerful for modeling and unraveling the biochemical, molecular, and cellular mechanisms of a biological process, such as biological rhythms. The equipment of functional mapping with biologically meaningful differential equations provides new insights into the genetic control of any dynamic processes. We formulate a new functional mapping framework for a dynamic biological rhythm by incorporating a group of ordinary differential equations (ODE). The Runge-Kutta fourth order algorithm was implemented to estimate the parameters that define the system of ODE. The new model will find its implications for understanding the interplay between gene interactions and developmental pathways in complex biological rhythms.
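The classical fourth-order Runge-Kutta scheme used above to solve the ODE system can be sketched in a few lines. The logistic-growth example below is illustrative only, not one of the biological-rhythm equations from the cited work.

```python
import numpy as np

def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def rk4_solve(f, t0, y0, h, n_steps):
    """Integrate y' = f(t, y) from t0 with fixed step size h,
    returning the full trajectory (n_steps + 1 rows)."""
    t, y = t0, np.asarray(y0, dtype=float)
    traj = [y.copy()]
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
        traj.append(y.copy())
    return np.array(traj)

# Illustrative one-equation system: logistic growth dy/dt = r*y*(1 - y/K),
# which has the closed form y(t) = K / (1 + (K/y0 - 1) * exp(-r*t)).
r, K = 1.0, 10.0
logistic = lambda t, y: r * y * (1 - y / K)
traj = rk4_solve(logistic, t0=0.0, y0=[0.5], h=0.1, n_steps=100)
```

In functional mapping, a solver like this is called inside the likelihood evaluation to turn candidate ODE parameters into genotype-specific mean curves, so its per-step accuracy directly affects parameter estimation.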
- Article
- Feb 2011

Increasing evidence shows that genes have a pivotal role in affecting the dynamic pattern of viral loads in the body of a host. By viewing the biochemical interactions between a virus and host cells as a dynamic system, we outline a computational approach for mapping the genetic control of virus dynamics. The approach integrates differential equations (DEs) to quantify the dynamic origin and behavior of a viral infection system. It enables geneticists to generate various testable hypotheses about the genetic control mechanisms for virus dynamics and infection. The experiment designed according to this approach will also enable researchers to gain insight into the role of genes in limiting virus abundance and the dynamics of viral drug resistance, facilitating the development of personalized medicines to eliminate viral infections. - Thesis (full-text available)
- Dec 2013

Many complex traits and human diseases, such as blood pressure and body weight, are known to change over time. The genetic basis of such traits can be better understood by repeatedly collecting data over time. The resulting longitudinal data provide useful resources for studying the joint action of multiple time-dependent genetic factors. In the first part of the dissertation, we extend two existing Bayesian multiple quantitative trait loci (QTL) mapping methods from univariate traits to longitudinal traits. Our first approach focuses on mapping genes with main effects and two-way gene-gene and gene-environment interactions. Multiple QTL are selected by a variable selection procedure based on the composite model space framework. Our second approach presents a Bayesian Gaussian process method to map multiple QTL without restricting to pairwise interactions. Rather than modeling each main and interaction term explicitly, the nonparametric Bayesian method measures the importance of each QTL, regardless of whether it is mostly due to a main effect or some interaction effect(s), via an unspecified function. We assign a Gaussian process prior to this unknown function. For the unstructured covariance matrix, both approaches employ a modified Cholesky decomposition. For data where phenotype measurements are not collected at a fixed set of time points across all samples, we propose a grid-based approach which parsimoniously approximates the covariance matrix of each subject as a function of a covariance matrix defined on a set of pre-selected time points. For most genome-wide association studies (GWAS), the power to detect an association between a single genetic variant, such as a single nucleotide polymorphism (SNP), and a complex trait is extremely low. Alternative strategies, such as regional SNP-set analysis, have overcome some of the limitations of the standard single-SNP analysis.
Our third topic develops a Bayesian regional SNP-set analysis which extends the nonparametric Gaussian process model and simultaneously models multiple groups of rare and/or common SNP variants. Instead of assigning each SNP a hyperparameter, we assign a common hyperparameter to every SNP within each set to measure the cumulative effect of all SNPs in that set. - Article
- Mar 2019
- DRUG DISCOV TODAY

The personalized therapy for hypertension needs comprehensive knowledge about how blood pressures (BPs; systolic and diastolic) and their pulsatile and steady components are controlled by genetic factors. Here, we propose a unified pharmacodynamic (PD) functional mapping framework for identifying specific quantitative trait loci (QTLs) that mediate multivariate response-dose curves of BP. This framework can characterize how QTLs govern pulsatile and steady components through jointly regulating systolic and diastolic pressures. The model can quantify the genetic effects of individual QTLs on maximal drug effect, the maximal rate of drug response, and the dose window of maximal drug response. This unified mapping framework provides a tool for identifying pharmacological genes potentially useful to design the right medication and right dose for patients. - Portfolio management and integrated risk management are increasingly applied toward Enterprise Risk Management (ERM), requiring multivariate risk measures that capture the dependence among many risk factors. In this paper we propose nonparametric estimators for multivariate value at risk (MVaR) and multivariate average value at risk (MAVaR) based on the multivariate geometric quantile approach and derive the asymptotic properties of the proposed estimators for MVaR. We also present their performance on both simulated data and high-frequency financial data from the New York Stock Exchange. In addition, we compare our method with the delta normal approach and the order statistics approach. The overall empirical results confirm that the multivariate geometric quantile approach significantly improves the risk management performance of MVaR and MAVaR.
- Article
- Mar 2012
- Meth Mol Biol

Functional mapping is a statistical tool for mapping quantitative trait loci (QTLs) that control the developmental pattern and process of a complex trait. Functional mapping has two significant advantages beyond traditional QTL mapping approaches. First, it integrates biological principles of trait formation into the model, enabling the biological interpretation of QTLs detected. Second, functional mapping is based on parsimonious modeling of mean-covariance structures, which enhances the statistical power of QTL detection. Here, we review the basic theory of functional mapping and describe one of its applications to plant genetics. We pinpoint several areas in which functional mapping can be integrated with systems biology to further our understanding of the genetic and genetic regulatory underpinnings of development. - Article
- Dec 2013
- Pharmacogenomics

Clinical pharmacogenomics, aimed at integrating genomic information with clinical practices to facilitate the prediction of drug response, has recently emerged as a vital area of public health. To make clinical pharmacogenomics a success, we need a comprehensive understanding of how genes singly or interactively affect patients' response to a particular drug. In this chapter, we review statistical designs for mapping the genetic architecture of drug response using molecular markers. Genes that affect a pharmacological response, their number, genomic distribution, and genetic actions and interactions can be estimated and tested. Functional mapping that integrates genetic mapping with pharmacodynamic and pharmacokinetic machineries of drug response can improve the precision of mapping results and their clinical interpretations. Genome-wide association studies (GWASes), beyond traditional genetic mapping approaches, provide an unprecedented opportunity to chart a complete picture of the genetic control of drug response. The implementation of GWASes by functional mapping leads to the birth of a dynamic model, fGWAS, for studying and characterizing clinical pharmacogenomics toward personalized medicine. - Although genome-wide association studies (GWAS) have proven powerful for comprehending the genetic architecture of complex traits, they are challenged by a high dimension of single-nucleotide polymorphisms (SNPs) as predictors, the presence of complex environmental factors, and longitudinal or functional natures of many complex traits or diseases. To address these challenges, we propose a high-dimensional varying-coefficient model for incorporating functional aspects of phenotypic traits into GWAS to formulate a so-called functional GWAS or fGWAS. The Bayesian group lasso and the associated MCMC algorithms are developed to identify significant SNPs and estimate how they affect longitudinal traits through time-varying genetic actions. 
The model is generalized to analyze the genetic control of complex traits using subject-specific sparse longitudinal data. The statistical properties of the new model are investigated through simulation studies. We use the new model to analyze a real GWAS data set from the Framingham Heart Study, leading to the identification of several significant SNPs associated with age-specific changes of body mass index. The fGWAS model, equipped with the Bayesian group lasso, will provide a useful tool for genetic and developmental analysis of complex traits or diseases.
- Chapter
- Jul 2015

Precise identification of biological samples remains the most important proof in forensic science. Illegal logging has become an urgent issue in Poland during the last decades, and conventional methods of investigation often turn out to be insufficient. Recently, DNA-based markers (SSR and cytoplasmic genes) have remarkably helped the forensic botany performed by the Forest Service Guards and the Police in investigating illegal logging of timber. The identification method relies on comparison of the piece of evidence (i.e., stolen wood fragments) with the piece of reference (e.g., tree parts remaining in the forest). We present the usefulness of neutral DNA markers (i.e., microsatellite loci) and cytoplasmic genes in forensic botany, based on several case studies of illegal wood identification in Poland concerning the most economically important coniferous tree species, such as Pinus sylvestris L., Picea abies (L.) Karst., Abies alba Mill., and Larix decidua (L.). Thanks to DNA profiles established on the basis of a minimum of four microsatellite nuclear DNA loci and at least one cytoplasmic organelle (mitochondrial or chloroplast) DNA marker, the determination of DNA profiles provided fast and reliable comparison between the material of evidence (including wood and needles) and the material of reference (first of all, tree stumps) in the forest. These data strongly supported decisions taken by several District Courts in Poland, as the identification of wood samples was proved with high probability (approximately 98-99%). The aim of this publication is to present Polish case studies on the use of DNA to fight illegal logging, which has become very successful among foresters. - Cells with the same genotype growing under the same conditions can show different phenotypes, which is known as "population heterogeneity". The heterogeneity of hematopoietic progenitor cells has an effect on their differentiation potential and lineage choices.
However, the genetic mechanisms governing population heterogeneity remain unclear. Here, we present a statistical model for mapping the quantitative trait locus (QTL) that affects hematopoietic cell heterogeneity. This strategy, termed systems mapping, integrates a system of differential equations into the framework for systems mapping, allowing hypotheses regarding the interplay between genetic actions and cell heterogeneity to be tested. A simulation approach based on cell heterogeneity dynamics has been designed to test the statistical properties of the model. This model not only considers the traditional QTLs, but also indicates the methylated QTLs that can illustrate non-genetic individual differences. It has significant implications for probing the molecular, genetic and epigenetic mechanisms of hematopoietic progenitor cell heterogeneity.
- Understanding the genetic machinery of plant growth and development is of fundamental importance in agriculture and biology. Recently, a novel statistical framework, coined functional mapping, has been developed to study the genetic architecture of the dynamic pattern of phenotypic development at different levels of organization. By integrating mathematical aspects of cellular and biological processes, functional mapping provides a quantitative platform in which a seemingly unlimited number of hypotheses about the interplay between genes and development can be asked and tested. However, plant development involves a series of multi-hierarchical, sequential pathways from DNA to mRNA to proteins to metabolites and finally to high-order phenotypes, and thus it is unlikely that the control mechanisms of plant development can be understood using genetic knowledge alone. Here, we describe a network biology approach for functional mapping of phenotypic formation and progression through their underlying biochemical pathways. The integration of functional mapping with information-rich spectroscopic data sets including transcriptome, proteome, and metabolome can be used to model and predict physiological variation and plant development, and will pave the way for future genetic studies capable of addressing the complex nature of growth and development.
- Phenotypic traits, such as seed development, are a consequence of complex biochemical interactions among genes, proteins and metabolites, but the underlying mechanisms that operate in a coordinated and sequential manner remain elusive. Here, we address this issue by developing a computational algorithm to monitor proteome changes during the course of trait development. The algorithm is built within the mixture-model framework in which each mixture component is modeled by a specific group of proteins that display a similar temporal pattern of expression in trait development. A nonparametric approach based on Legendre orthogonal polynomials was used to fit dynamic changes of protein expression, increasing the power and flexibility of protein clustering. By analyzing a dataset of proteomic dynamics during early embryogenesis of the Chinese fir, the algorithm has successfully identified several distinct types of proteins that coordinate with each other to determine seed development in this forest tree, which is commercially and environmentally important to China. The algorithm will find immediate applications in characterizing the mechanistic underpinnings of any other biological processes in which protein abundance plays a key role.
- Article
- Mar 2013
- STAT APPL GENET MOL

Knowledge of genes influencing longitudinal patterns may offer information about predicting disease progression. We developed a systematic procedure for testing association between SNP genotypes and longitudinal phenotypes. We evaluated false positive rates and statistical power to localize genes for disease progression. We used genome-wide SNP data from the Framingham Heart Study. With longitudinal data from two real studies unrelated to Framingham, we estimated three trajectory curves from each study. We performed simulations by randomly selecting 500 individuals. In each simulation replicate, we assigned each individual to one of the three trajectory groups based on the underlying hypothesis (null or alternative), and generated corresponding longitudinal data. Individual Bayesian posterior probabilities (BPPs) for belonging to a specific trajectory curve were estimated. These BPPs were treated as a quantitative trait and tested (using the Wald test) for genome-wide association. Empirical false positive rates and power were calculated. Our method maintained the expected false positive rate for all simulation models. Also, our method achieved high empirical power for most simulations. Our work presents a method for disease progression gene mapping. This method is potentially clinically significant as it may allow doctors to predict disease progression based on genotype and determine treatment accordingly.
- Article
- Mar 2013

The latest developments of pharmacology in the post-genomic era foster the emergence of new biomarkers that represent the future of drug targets. To identify these biomarkers, we need a major shift from traditional genomic analyses alone, moving the focus towards systems approaches to elucidating genetic variation in biochemical pathways of drug response. Is there any general model that can accelerate this shift via a merger of systems biology and pharmacogenomics? Here we describe a statistical framework for mapping dynamic genes that affect drug response by incorporating its pharmacokinetic and pharmacodynamic pathways. This framework is expanded to shed light on the mechanistic and therapeutic differences of drug response based on pharmacogenetic information, coupled with genomic, proteomic and metabolic data, allowing novel therapeutic targets and genetic biomarkers to be characterized and utilized for drug discovery.
- Article
- Mar 2013

As a basis of personalized medicine, pharmacogenetics and pharmacogenomics that aim to study the genetic architecture of drug response critically rely on dynamic modeling of how a drug is absorbed and transported to target tissues where the drug interacts with body molecules to produce drug effects. Systems mapping provides a general framework for integrating systems pharmacology and pharmacogenomics through robust ordinary differential equations. In this chapter, we extend systems mapping to more complex and more heterogeneous structures of drug response by implementing stochastic differential equations (SDE). We argue that SDE-implemented systems mapping provides a computational tool for pharmacogenetic or pharmacogenomic research towards personalized medicine.
- Article
- May 2014
- Mol Biol Evol

Heterochrony, the phylogenetic change in the time of developmental events or rate of development, has been thought to play an important role in producing phenotypic novelty during evolution. Increasing evidence suggests that specific genes are implicated in heterochrony, guiding the process of developmental divergence, but no quantitative models have been instrumented to map such heterochrony genes. Here we present a computational framework for genetic mapping by which to characterize and locate quantitative trait loci (QTLs) that govern heterochrony described by four parameters: the timing of the inflection point, the timing of maximum acceleration of growth, the timing of maximum deceleration of growth, and the length of linear growth. The framework was developed from functional mapping, a dynamic model derived to map QTLs for the overall process and pattern of development. By integrating an optimality algorithm, the framework allows the so-called heterochrony QTLs (hQTLs) to be tested and quantified. Specific pipelines are given for testing how hQTLs control the onset and offset of developmental events, the rate of development, and duration of a particular developmental stage. Computer simulation was performed to examine the statistical properties of the model and demonstrate its utility to characterize the effect of hQTLs on population diversification due to heterochrony. By analyzing genetic mapping data in rice, the framework identified an hQTL that controls the timing of maximum growth rate and duration of the linear growth stage in plant height growth. The framework provides a tool to study how genetic variation translates into phenotypic innovation, leading a lineage to evolve through heterochrony.
- Article
- Apr 2013

This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods: estimating large covariance matrices for understanding correlation structure, inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes, proteins, and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed.
- Article
- Nov 2009
- TRENDS GENET

One of the fundamental tasks in biology is the identification of genes that control the structure and developmental pattern of complex traits and their responses to the environment during trait development. Functional mapping provides a statistical means for detecting quantitative trait loci (QTLs) that underlie developmental traits, such as growth trajectories, and for testing the interplay between gene action and development. Here we describe how functional mapping and studies of plant ontology can be integrated so as to elucidate the expression mechanisms of QTLs that control plant growth, morphology, development, and adaptation to changing environments. This approach can also be used to construct an evo-devo framework for inferring the evolution of developmental traits.

- Article
- Feb 1989
- GENETICS

The advent of complete genetic linkage maps consisting of codominant DNA markers [typically restriction fragment length polymorphisms (RFLPs)] has stimulated interest in the systematic genetic dissection of discrete Mendelian factors underlying quantitative traits in experimental organisms. We describe here a set of analytical methods that modify and extend the classical theory for mapping such quantitative trait loci (QTLs). These include: (i) a method of identifying promising crosses for QTL mapping by exploiting a classical formula of SEWALL WRIGHT; (ii) a method (interval mapping) for exploiting the full power of RFLP linkage maps by adapting the approach of LOD score analysis used in human genetics, to obtain accurate estimates of the genetic location and phenotypic effect of QTLs; and (iii) a method (selective genotyping) that allows a substantial reduction in the number of progeny that need to be scored with the DNA markers. In addition to the exposition of the methods, explicit graphs are provided that allow experimental geneticists to estimate, in any particular case, the number of progeny required to map QTLs underlying a quantitative trait.
- Article
- Jan 1990

The EM algorithm is a popular approach to maximum likelihood estimation but has not been much used for penalized likelihood or maximum a posteriori estimation. This paper discusses properties of the EM algorithm in such contexts, concentrating on rates of convergence, and presents an alternative that is usually more practical and converges at least as quickly.
- We investigate power transformations in nonlinear regression problems when there is a physical model for the response but little understanding of the underlying error structure. In such circumstances, and unlike the ordinary power transformation model, both the response and the model must be transformed simultaneously and in the same way. We show by an asymptotic theory and a small Monte Carlo study that for estimating the model parameters there is little cost for not knowing the correct transform a priori; this is in dramatic contrast to the results for the usual case where only the response is transformed. Possible applications of the theory are illustrated by examples.
- Article
- Mar 1989
- BIOMETRICS

An abstract is not available.
- Article
- Jul 2001
- J Chemometr

In this paper, penalized regression using the L1 norm on the estimated parameters is proposed for chemometric calibration. The algorithm is of the lasso type, introduced by Tibshirani in 1996 as a linear regression method with a bound on the absolute length of the parameters, but a modification is suggested to cope with the singular design matrix most often seen in chemometric calibration. Furthermore, the proposed algorithm may be generalized to all convex norms ∑|βj|^γ with γ ≥ 1, i.e. a method that continuously varies from ridge regression to the lasso. The lasso is applied both directly as a calibration method and as a method to select important variables/wavelengths. It is demonstrated that the lasso algorithm, in general, leads to parameter estimates of which some are zero while others are quite large (compared to e.g. the traditional PLS or RR estimates). By using several benchmark data sets, it is shown that both the direct lasso method and the regression where the lasso acts as a wavelength selection method most often outperform the PLS and RR methods. Copyright © 2001 John Wiley & Sons, Ltd.
- Article
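The lasso machinery this entry builds on can be sketched compactly. Below is an illustrative pure-Python cyclic coordinate-descent solver for the penalty λ∑|βj| — a generic sketch, not the paper's modified algorithm for singular designs — assuming predictor columns of unit norm so each coordinate update is exact:

```python
# Cyclic coordinate descent for the lasso: minimize
#   0.5 * ||y - X beta||^2 + lam * sum(|beta_j|)
# assuming each column of X has unit norm (no rescaling needed).

def soft(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(X, y, lam, sweeps=50):
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Partial residual excluding predictor j.
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            zj = sum(X[i][j] * r[i] for i in range(n))
            beta[j] = soft(zj, lam)
    return beta

# Orthonormal design: the lasso solution is soft-thresholded OLS.
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
y = [3.0, 0.5, 0.0]
print(lasso(X, y, lam=1.0))   # [2.0, 0.0]: 3 shrunk to 2, 0.5 killed
```

With an orthonormal design the solver reproduces exactly the "some coefficients zero, others large" behavior the abstract describes: the large OLS coefficient is shrunk by λ, the small one is set to zero.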
- Mar 2007
- J COMPUT GRAPH STAT

The major difficulties in estimating a large covariance matrix are the high dimensionality and the positive definiteness constraint. To overcome these difficulties, we propose to apply smoothing-based regularization and utilize the modified Cholesky decomposition of the covariance matrix. In our proposal, the covariance matrix is diagonalized by a lower triangular matrix, whose subdiagonals are treated as smooth functions. These functions are approximated by splines and estimated by maximizing the normal likelihood. In our framework, the mean and the covariance of the longitudinal data can be modeled simultaneously and missing data can be handled in a natural way using the EM algorithm. We illustrate the proposed method via simulation and by applying it to two real data examples, which involve estimation of 11 by 11 and 102 by 102 covariance matrices.
- Article
- Apr 2012
- TECHNOMETRICS

In multiple regression it is shown that parameter estimates based on minimum residual sum of squares have a high probability of being unsatisfactory, if not incorrect, if the prediction vectors are not orthogonal. Proposed is an estimation procedure based on adding small positive quantities to the diagonal of X′X. Introduced is the ridge trace, a method for showing in two dimensions the effects of nonorthogonality. It is then shown how to augment X′X to obtain biased estimates with smaller mean square error.
- Article
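The ridge estimator described above has the closed form β̂ = (X′X + kI)⁻¹X′y. A minimal pure-Python sketch for illustration (the `solve` helper is a generic Gaussian-elimination routine, not part of the original paper):

```python
# Ridge regression: beta_hat = (X'X + k I)^{-1} X'y.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridge(X, y, k):
    """Add k to the diagonal of X'X, then solve the normal equations."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) + (k if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(A, rhs)

# With orthonormal columns, ridge shrinks OLS by the factor 1 / (1 + k).
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
y = [3.0, 0.5, 0.0]
print(ridge(X, y, k=1.0))   # OLS (3, 0.5) shrunk to [1.5, 0.25]
```

Sweeping `k` and plotting the resulting coefficients against it is precisely the "ridge trace" the abstract introduces.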
- Sep 1977

A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
- Article
- Jan 1977
- J R STAT SOC B

- Genetic interactions or epistasis may play an important role in the genetic etiology of drug response. With the availability of large-scale, high-density single nucleotide polymorphism markers, a great challenge is how to associate haplotype structures and complex drug response through its underlying pharmacodynamic mechanisms. We have derived a general statistical model for detecting an interactive network of DNA sequence variants that encode pharmacodynamic processes based on the haplotype map constructed by single nucleotide polymorphisms. The model was validated by a pharmacogenetic study for two predominant beta-adrenergic receptor (betaAR) subtypes expressed in the heart, beta1AR and beta2AR. Haplotypes from these two receptors trigger significant interaction effects on the response of heart rate to different dose levels of dobutamine. This model will have implications for pharmacogenetic and pharmacogenomic research and drug discovery. A computer program written in Matlab can be downloaded from the webpage of statistical genetics group at the University of Florida. Supplementary data are available at Bioinformatics online.
- Article
- Jun 2009
- ANN APPL STAT

Graphical models are frequently used to explore networks, such as genetic networks, among a set of variables. This is usually carried out via exploring the sparsity of the precision matrix of the variables under consideration. Penalized likelihood methods are often used in such explorations. Yet, positive-definiteness constraints of precision matrices make the optimization problem challenging. We introduce non-concave penalties and the adaptive LASSO penalty to attenuate the bias problem in the network estimation. Through the local linear approximation to the non-concave penalty functions, the problem of precision matrix estimation is recast as a sequence of penalized likelihood problems with a weighted L1 penalty and solved using the efficient algorithm of Friedman et al. (2008). Our estimation schemes are applied to two real datasets. Simulation experiments and asymptotic theory are used to justify our proposed methods.
- This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori in the covariance matrix, its inverse, or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order (s_n log p_n / n)^{1/2}, where s_n is the number of nonzero elements, p_n is the size of the covariance matrix, and n is the sample size. This explicitly spells out that the contribution of high dimensionality is merely a logarithmic factor. The conditions on the rate at which the tuning parameter λ_n goes to 0 have been made explicit and compared under different penalties. As a result, for the L1 penalty, to guarantee sparsistency and the optimal rate of convergence, the number of nonzero elements should be small: s_n′ = O(p_n) at most, among O(p_n²) parameters, for estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix, or a sparse Cholesky factor, where s_n′ is the number of nonzero off-diagonal entries. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such restriction.
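The soft (L1) and hard thresholding rules contrasted in this entry can be written out directly; illustrative scalar versions:

```python
# Two thresholding rules for a scalar estimate z and threshold lam.

def soft_threshold(z, lam):
    # Shrinks everything toward zero by lam: continuous, but biased for large z.
    return (abs(z) - lam) * (1 if z > 0 else -1) if abs(z) > lam else 0.0

def hard_threshold(z, lam):
    # Keeps large values untouched: unbiased for large z, but discontinuous.
    return z if abs(z) > lam else 0.0

print(soft_threshold(3.0, 1.0), hard_threshold(3.0, 1.0))   # 2.0 3.0
print(soft_threshold(0.5, 1.0), hard_threshold(0.5, 1.0))   # 0.0 0.0
```

The bias of soft thresholding on large entries is exactly what nonconvex penalties such as SCAD are designed to remove, which is why the restriction on the number of nonzero elements disappears for them.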
- Two major reasons for the popularity of the EM algorithm are that its maximum step involves only complete-data maximum likelihood estimation, which is often computationally simple, and that its convergence is stable, with each iteration increasing the likelihood. When the associated complete-data maximum likelihood estimation itself is complicated, EM is less attractive because the M-step is computationally unattractive. In many cases, however, complete-data maximum likelihood estimation is relatively simple when conditional on some function of the parameters being estimated. We introduce a class of generalized EM algorithms, which we call the ECM algorithm, for Expectation/Conditional Maximization (CM), that takes advantage of the simplicity of complete-data conditional maximum likelihood estimation by replacing a complicated M-step of EM with several computationally simpler CM-steps. We show that the ECM algorithm shares all the appealing convergence properties of EM, such as always increasing the likelihood, and present several illustrative examples.
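The E- and M-steps these EM/ECM entries discuss can be illustrated with a deliberately simplified two-component Gaussian mixture: variances are held fixed at one (an assumption made here for brevity), so the means and mixing weight have closed-form M-step updates:

```python
# Minimal EM sketch for a two-component Gaussian mixture with known unit
# variances: E-step computes posterior responsibilities, M-step updates
# the component means and the mixing weight in closed form.
import math

def em_two_gaussians(x, iters=50):
    mu1, mu2, w = min(x), max(x), 0.5      # deterministic initial values
    for _ in range(iters):
        # E-step: responsibility of component 1 for each observation.
        r = []
        for xi in x:
            p1 = w * math.exp(-0.5 * (xi - mu1) ** 2)
            p2 = (1 - w) * math.exp(-0.5 * (xi - mu2) ** 2)
            r.append(p1 / (p1 + p2))
        # M-step: responsibility-weighted means and mixing proportion.
        s = sum(r)
        mu1 = sum(ri * xi for ri, xi in zip(r, x)) / s
        mu2 = sum((1 - ri) * xi for ri, xi in zip(r, x)) / (len(x) - s)
        w = s / len(x)
    return mu1, mu2, w

data = [-0.1, 0.0, 0.1, 4.9, 5.0, 5.1]
mu1, mu2, w = em_two_gaussians(data)
print(round(mu1, 3), round(mu2, 3), round(w, 3))
```

Each iteration provably increases the likelihood, which is the monotonicity property the abstracts emphasize; functional mapping embeds the same E/M structure inside a mixture over QTL genotypes.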
- Article
- Jun 2000
- BIOMETRIKA

The positive-definiteness constraint is the most awkward stumbling block in modelling the covariance matrix. Pourahmadi's (1999) unconstrained parameterisation models covariance using covariates in a similar manner to mean modelling in generalised linear models. The new covariance parameters have statistical interpretation as the regression coefficients and logarithms of prediction error variances corresponding to regressing a response on its predecessors. In this paper, the maximum likelihood estimators of the parameters of a generalised linear model for the covariance matrix, their consistency and their asymptotic normality are studied when the observations are normally distributed. These results along with the likelihood ratio test and penalised likelihood criteria such as BIC for model and variable selection are illustrated using a real dataset.
- Article
- Jan 1965
- COMPUT J

A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point. The simplex adapts itself to the local landscape, and contracts on to the final minimum. The method is shown to be effective and computationally compact. A procedure is given for the estimation of the Hessian matrix in the neighbourhood of the minimum, needed in statistical estimation problems.
- Article
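A textbook version of the simplex search described above can be sketched in a few lines. This is a generic pure-Python implementation with the classical reflection, expansion, and contraction coefficients (1, 2, 1/2), not the original Algol procedure:

```python
# Simplified Nelder-Mead simplex minimization of f: R^n -> R.

def nelder_mead(f, x0, step=1.0, iters=300):
    n = len(x0)
    # Initial simplex: x0 plus a step along each coordinate axis.
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=f)                     # best first, worst last
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        refl = [2 * centroid[j] - worst[j] for j in range(n)]   # reflect
        if f(refl) < f(best):
            exp = [3 * centroid[j] - 2 * worst[j] for j in range(n)]  # expand
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            contr = [0.5 * (centroid[j] + worst[j]) for j in range(n)]  # contract
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:                               # shrink toward the best vertex
                simplex = [best] + [
                    [(p[j] + best[j]) / 2 for j in range(n)] for p in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0]

xmin = nelder_mead(lambda v: (v[0] - 1.0) ** 2 + (v[1] - 2.0) ** 2, [0.0, 0.0])
print(xmin)   # converges near the minimizer (1, 2)
```

Derivative-free searches of this kind are what functional-mapping implementations often fall back on when the likelihood surface over curve parameters has no closed-form maximizer.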
- Sep 1999
- BIOMETRIKA

We provide unconstrained parameterisation for and model a covariance using covariates. The Cholesky decomposition of the inverse of a covariance matrix is used to associate a unique unit lower triangular and a unique diagonal matrix with each covariance matrix. The entries of the lower triangular and the log of the diagonal matrix are unconstrained and have meaning as regression coefficients and prediction variances when regressing a measurement on its predecessors. An extended generalised linear model is introduced for joint modelling of the vectors of predictors for the mean and covariance subsuming the joint modelling strategy for mean and variance heterogeneity, Gabriel's antedependence models, Dempster's covariance selection models and the class of graphical models. The likelihood function and maximum likelihood estimators of the covariance and the mean parameters are studied when the observations are normally distributed. Applications to modelling nonstationary dependence structures and multivariate data are discussed and illustrated using real data. A graphical method, similar to that based on the correlogram in time series, is developed and used to identify parametric models for nonstationary covariances.
- Article
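The modified Cholesky parameterisation described above can be computed by sequentially regressing each measurement on its predecessors, giving a unit lower-triangular T and a diagonal D with T S T′ = D. A small illustrative sketch (`lin_solve` is a generic Gaussian-elimination helper, not from the paper):

```python
# Modified Cholesky decomposition of a covariance matrix S:
# row t of T holds the negated coefficients from regressing measurement t
# on measurements 1..t-1; D holds the prediction error variances.

def lin_solve(A, b):
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def modified_cholesky(S):
    m = len(S)
    T = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    D = [S[0][0]] + [0.0] * (m - 1)
    for t in range(1, m):
        A = [[S[i][j] for j in range(t)] for i in range(t)]
        b = [S[i][t] for i in range(t)]
        phi = lin_solve(A, b)            # regression coefficients on predecessors
        for j in range(t):
            T[t][j] = -phi[j]
        D[t] = S[t][t] - sum(phi[j] * b[j] for j in range(t))
    return T, D

# AR(1)-like covariance with correlation 0.5: T S T' should be diagonal.
S = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
T, D = modified_cholesky(S)
print(D)   # prediction error variances: [1.0, 0.75, 0.75]
```

Because the subdiagonal entries of T and log D are unconstrained, they can be modeled freely (e.g. penalized or smoothed) and the reconstructed covariance is automatically positive definite — the point of this parameterisation.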
- Jun 2009

Functional mapping of dynamic traits measured in a longitudinal study was originally derived within the maximum likelihood (ML) context and implemented with the EM algorithm. Although ML-based functional mapping possesses many favorable statistical properties in parameter estimation, it may be computationally intractable for analyzing longitudinal data with high dimensions and high measurement errors. In this article, we derive a general functional mapping framework for quantitative trait locus mapping of dynamic traits within the Bayesian paradigm. Markov chain Monte Carlo techniques were implemented for functional mapping to estimate biologically and statistically sensible parameters that model the structures of time-dependent genetic effects and covariance matrix. The Bayesian approach is useful to handle difficulties in constructing confidence intervals as well as the identifiability problem, enhancing the statistical inference of functional mapping. We have undertaken simulation studies to investigate the statistical behavior of Bayesian-based functional mapping and used a real example with F2 mice to validate the utilization and usefulness of the model.
- Article
- Jun 1998
- GENET MOL BIOL

Strain intercross experiments provide a powerful means for mapping genes affecting complex quantitative traits. We report on the genetic variability of the intercross of the Large (LG/J) and Small (SM/J) inbred mouse strains as a guide to gene mapping studies. Ten SM/J males were crossed to 10 LG/J females, after which animals were randomly mated to produce F1, F2, and F3 intercross generations. The 1632 F3 animals from 200 full-sib families were used to estimate heritabilities and genetic correlations of the traits measured. A subset of families was cross-fostered at birth to allow measurement of the importance of post-natal maternal effects. Data was collected on weekly body weight from one to 10 weeks and on organ weights, body weight, reproductive fat pad weight, and tail length at necropsy in the intercross generations. There was no heterosis for age-specific weights or necropsy traits, except that one-week weight was the highest in the F2 generation, indicating heterosis for maternal effect in the F1 mothers. We found moderate to high heritability for most age-specific weights and necropsy traits. Maternal effects were significant for age-specific weights from one to four weeks but disappeared completely at ten-week weight. Maternal effects for necropsy traits were low and not statistically significant. Age-specific weights showed a typical correlation pattern, with correlation declining as the difference in ages increased. Among necropsy traits, reproductive fat pad and body weights were very highly genetically correlated. Most other genetic correlations were low to moderate. The intercross between SM/J and LG/J inbred mouse strains provides a valuable resource for mapping quantitative trait loci for body size, composition, and morphology.
- In the past two decades a parametric multivariate regression modelling approach for analyzing growth curve data has achieved prominence.
The approach, which has several advantages over classical analysis-of-variance and general multivariate approaches, consists of postulating, fitting, evaluating, and comparing parametric models for the data's mean structure and covariance structure. This article provides an overview of the approach, using unified terminology and notation. Well-established models and some developed more recently are described, with emphasis given to those models that allow for nonstationarity and for measurement times that differ across subjects and are unequally spaced. Graphical diagnostics that can assist with model postulation and evaluation are discussed, as are more formal methods for fitting and comparing models. Three examples serve to illustrate the methodology and to reveal the relative strengths and weaknesses of the various parametric models.
- The genetic architecture of growth traits plays a central role in shaping the growth, development, and evolution of organisms. While a limited number of models have been devised to estimate genetic effects on complex phenotypes, no model has been available to examine how gene actions and interactions alter the ontogenetic development of an organism and transform the altered ontogeny into descendants. In this article, we present a novel statistical model for mapping quantitative trait loci (QTL) determining the developmental process of complex traits. Our model is constructed within the traditional maximum-likelihood framework implemented with the EM algorithm. We employ biologically meaningful growth curve equations to model time-specific expected genetic values and the AR(1) model to structure the residual variance-covariance matrix among different time points. Because of a reduced number of parameters being estimated and the incorporation of biological principles, the new model displays increased statistical power to detect QTL exerting an effect on the shape of ontogenetic growth and development. The model allows for the tests of a number of biological hypotheses regarding the role of epistasis in determining biological growth, form, and shape and for the resolution of developmental problems at the interface with evolution. Using our newly developed model, we have successfully detected significant additive × additive epistatic effects on stem height growth trajectories in a forest tree.
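The AR(1) residual structure used in this model has a simple closed form, Σij = σ²ρ^|i−j|; a one-line builder for illustration:

```python
# First-order autoregressive (AR(1)) covariance matrix for m time points:
# variance sigma2 on the diagonal, correlation decaying geometrically
# with the lag between measurements.

def ar1_cov(m, sigma2, rho):
    return [[sigma2 * rho ** abs(i - j) for j in range(m)] for i in range(m)]

S = ar1_cov(4, sigma2=1.0, rho=0.6)
print(S[0][3])   # 1.0 * 0.6**3 ≈ 0.216
```

Only two parameters (σ², ρ) describe the whole matrix, which is exactly the parameter reduction the abstract credits for the model's increased statistical power; the nonparametric Cholesky approach of the main article relaxes this rigid structure.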
- The detection of genes that control quantitative characters is a problem of great interest to the genetic mapping community. Methods for locating these quantitative trait loci (QTL) relative to maps of genetic markers are now widely used. This paper addresses an issue common to all QTL mapping methods, that of determining an appropriate threshold value for declaring significant QTL effects. An empirical method is described, based on the concept of a permutation test, for estimating threshold values that are tailored to the experimental data at hand. The method is demonstrated using two real data sets derived from F(2) and recombinant inbred plant populations. An example using simulated data from a backcross design illustrates the effect of marker density on threshold values.
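The permutation-threshold idea can be sketched as follows. This is a toy single-marker version for illustration (a real genome scan would maximise the statistic over all markers before taking the percentile, and the statistic here is a simple mean difference rather than a LOD score):

```python
# Empirical threshold by permutation: shuffle phenotypes relative to
# genotypes, recompute the statistic each time, take the 95th percentile.
import random

def mean_diff(pheno, geno):
    """Absolute difference in mean phenotype between genotype groups."""
    g1 = [p for p, g in zip(pheno, geno) if g == 1]
    g0 = [p for p, g in zip(pheno, geno) if g == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

def perm_threshold(pheno, geno, n_perm=999, alpha=0.05, seed=0):
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = pheno[:]
        rng.shuffle(shuffled)                 # break genotype-phenotype link
        null.append(mean_diff(shuffled, geno))
    null.sort()
    return null[int((1 - alpha) * n_perm)]    # empirical (1 - alpha) quantile

geno = [0, 1] * 10
pheno = [g * 5.0 + 0.1 * i for i, g in enumerate(geno)]   # strong simulated effect
obs = mean_diff(pheno, geno)
thr = perm_threshold(pheno, geno)
print(obs, thr)   # observed statistic should far exceed the threshold
```

Because the threshold is derived from the data at hand, it automatically adapts to the trait distribution and marker density, which is the method's central selling point.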
- Article
- May 1994
- GENETICS

Adequate separation of effects of possible multiple linked quantitative trait loci (QTLs) on mapping QTLs is the key to increasing the precision of QTL mapping. A new method of QTL mapping is proposed and analyzed in this paper by combining interval mapping with multiple regression. The basis of the proposed method is an interval test in which the test statistic on a marker interval is made to be unaffected by QTLs located outside a defined interval. This is achieved by fitting other genetic markers in the statistical model as a control when performing interval mapping. Compared with the current QTL mapping method (i.e., the interval mapping method which uses a pair or two pairs of markers for mapping QTLs), this method has several advantages. (1) By confining the test to one region at a time, it reduces a multiple dimensional search problem (for multiple QTLs) to a one dimensional search problem. (2) By conditioning linked markers in the test, the sensitivity of the test statistic to the position of individual QTLs is increased, and the precision of QTL mapping can be improved. (3) By selectively and simultaneously using other markers in the analysis, the efficiency of QTL mapping can be also improved. The behavior of the test statistic under the null hypothesis and appropriate critical value of the test statistic for an overall test in a genome are discussed and analyzed. A simulation study of QTL mapping is also presented which illustrates the utility, properties, advantages and disadvantages of the method.
- The problem of detecting minor quantitative trait loci (QTL) responsible for genetic variation not explained by major QTL is of importance in the complete dissection of quantitative characters. Two extensions of the permutation-based method for estimating empirical threshold values are presented.
These methods, the conditional empirical threshold (CET) and the residual empirical threshold (RET), yield critical values that can be used to construct tests for the presence of minor QTL effects while accounting for effects of known major QTL. The CET provides a completely nonparametric test through conditioning on markers linked to major QTL. It allows for general nonadditive interactions among QTL, but its practical application is restricted to regions of the genome that are unlinked to the major QTL. The RET assumes a structural model for the effect of major QTL, and a threshold is constructed using residuals from this structural model. The search space for minor QTL is unrestricted, and RET-based tests may be more powerful than the CET-based test when the structural model is approximately true.
- Body size is an archetypal quantitative trait with variation due to the segregation of many gene loci, each of relatively minor effect, and the environment. We examine the effects of quantitative trait loci (QTLs) on age-specific body weights and growth in the F2 intercross of the LG/J and SM/J strains of inbred mice. Weekly weights (1-10 wk) and 75 microsatellite genotypes were obtained for 535 mice. Interval mapping was used to locate and measure the genotypic effects of QTLs on body weight and growth. QTL effects were detected on 16 of the 19 autosomes with several chromosomes carrying more than one QTL. The number of QTLs for age-specific weights varied from seven at 1 week to 17 at 10 wk. The QTLs were each of relatively minor, subequal effect. QTLs affecting early and late growth were generally distinct, mapping to different chromosomal locations indicating separate genetic and physiological systems for early and later murine growth.
- A new statistical method for mapping quantitative trait loci (QTL), called multiple interval mapping (MIM), is presented. It uses multiple marker intervals simultaneously to fit multiple putative QTL directly in the model for mapping QTL. The MIM model is based on Cockerham's model for interpreting genetic parameters and the method of maximum likelihood for estimating genetic parameters. With the MIM approach, the precision and power of QTL mapping could be improved. Also, epistasis between QTL, genotypic values of individuals, and heritabilities of quantitative traits can be readily estimated and analyzed. Using the MIM model, a stepwise selection procedure with likelihood ratio test statistic as a criterion is proposed to identify QTL. This MIM method was applied to a mapping data set of radiata pine on three traits: brown cone number, tree diameter, and branch quality scores. Based on the MIM result, seven, six, and five QTL were detected for the three traits, respectively. The detected QTL individually contributed from approximately 1 to 27% of the total genetic variation. Significant epistasis between four pairs of QTL in two traits was detected, and the four pairs of QTL contributed approximately 10.38 and 14.14% of the total genetic variation. The asymptotic variances of QTL positions and effects were also provided to construct the confidence intervals. The estimated heritabilities were 0.5606, 0.5226, and 0.3630 for the three traits, respectively. With the estimated QTL effects and positions, the best strategy of marker-assisted selection for trait improvement for a specific purpose and requirement can be explored. The MIM FORTRAN program is available on the worldwide web (http://www.stat.sinica.edu.tw/chkao/).
- Over 20 years ago, D. S. Falconer and others launched an important avenue of research into the quantitative genetics of body size growth in mice. This study continues in that tradition by locating quantitative trait loci (QTLs) responsible for murine growth, such as age-specific weights and growth periods, and examining the genetic architecture for body weight. We identified a large number of potential QTLs in an earlier F2 intercross (Intercross I) of the SM/J and LG/J inbred mouse strains. Many of these QTLs are replicated in a second F2 intercross (Intercross II) between the same two strains. These replicated regions provide candidate regions for future fine-mapping studies. We also examined body size and growth QTLs using the combined data set from these two intercrosses, resulting in 96 microsatellite markers being scored for 1045 individuals. An examination of the genetic architecture for age-specific weight and growth periods resulted in locating 20 separate QTLs, which were mainly additive in nature, although dominance was found to affect early growth and body size. QTLs affecting early and late growth were generally distinct, mapping to separate chromosome locations. This QTL pattern indicates largely separate genetic and physiological systems for early and later murine growth, as Falconer suggested. We also found sex-specific QTLs for body size with implications for the evolution of sexual dimorphism.
- Several equations have been proposed to describe ontogenetic growth trajectories for organisms justified primarily on the goodness of fit rather than on any biological mechanism. Here, we derive a general quantitative model based on fundamental principles for the allocation of metabolic energy between maintenance of existing tissue and the production of new biomass. We thus predict the parameters governing growth curves from basic cellular properties and derive a single parameterless universal curve that describes the growth of many diverse species. The model provides the basis for deriving allometric relationships for growth rates and the timing of life history events.
- Unlike a character measured at a finite set of landmark points, function-valued traits are those that change as a function of some independent and continuous variable. These traits, also called infinite-dimensional characters, can be described as the character process and include a number of biologically, economically, or biomedically important features, such as growth trajectories, allometric scalings, and norms of reaction. Here we present a new statistical infrastructure for mapping quantitative trait loci (QTL) underlying the character process. This strategy, termed functional mapping, integrates mathematical relationships of different traits or variables within the genetic mapping framework. Logistic mapping proposed in this article can be viewed as an example of functional mapping. Logistic mapping is based on a universal biological law that, for each and every living organism, growth over time follows a sigmoid (e.g., logistic or S-shaped) curve. A maximum-likelihood approach based on a logistic-mixture model, implemented with the EM algorithm, is developed to provide the estimates of QTL positions, QTL effects, and other model parameters responsible for growth trajectories. Logistic mapping displays a tremendous potential to increase the power of QTL detection, the precision of parameter estimation, and the resolution of QTL localization due to the small number of parameters to be estimated, the pleiotropic effect of a QTL on growth, and/or residual correlations of growth at different ages. More importantly, logistic mapping allows for testing numerous biologically important hypotheses concerning the genetic basis of quantitative variation, thus gaining an insight into the critical role of development in shaping plant and animal evolution and domestication.
The power of logistic mapping is demonstrated by an example of a forest tree, in which one QTL affecting stem growth processes is detected on a linkage group using our method, whereas it cannot be detected using current methods. The advantages of functional mapping are also discussed.
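The logistic curve underlying logistic mapping can be stated concretely. The sketch below uses the common three-parameter form g(t) = a / (1 + b·e^(−rt)); the parameterization and the numeric values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def logistic_growth(t, a, b, r):
    """Logistic growth curve g(t) = a / (1 + b * exp(-r * t)).

    a: asymptotic size; b: initial-condition parameter; r: relative growth
    rate. This three-parameter form is one of several equivalent
    parameterizations used in functional mapping.
    """
    return a / (1.0 + b * np.exp(-r * t))

# Hypothetical parameter values
a, b, r = 30.0, 5.0, 0.6

# The inflection point (time of maximal growth rate) is t* = ln(b) / r,
# where the curve reaches half its asymptote a / 2.
t_star = np.log(b) / r
half_size = logistic_growth(t_star, a, b, r)  # equals a / 2
```

The timing of maximal growth rate, t* = ln(b)/r, is the kind of curve-derived quantity whose genetic control the hypothesis tests in these models address.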
- Article
- Sep 2002
- J THEOR BIOL

Most previous models of populations mixed for reproductive mode have omitted important local interactions between sexual and asexual individuals. We propose a cellular automaton model where local rules focus on fertilization and colonization. This model produces rich sets of data which are then studied by means of spatial statistics. Results point to the fixation of one of the two reproductive modes in the landscape. However, some examples of coexistence of sexual and asexual conspecifics over long periods of time are also found. This model is an example of a CA that diverges from its mean field approximation. The formation of sexual and asexual clusters reduces effective colonization rate in the CA and may account for this behaviour.
- Many biological processes, from cellular metabolism to population dynamics, are characterized by particular allometric scaling (power-law) relationships between size and rate. Although such allometric relationships may be under genetic determination, their precise genetic mechanisms have not been clearly understood due to a lack of a statistical analytical method. In this paper, we present a basic statistical framework for mapping quantitative genes (or quantitative trait loci, QTL) responsible for universal quarter-power scaling laws of organic structure and function with the entire body size. Our model framework allows the testing of whether a single QTL affects the allometric relationship of two traits or whether more than one linked QTL is segregating. Like traditional multi-trait mapping, this new model can increase the power to detect the underlying QTL and the precision of its localization on the genome. Beyond the traditional method, this model is integrated with pervasive scaling laws to take advantage of the mechanistic relationships of biological structures and processes.
Simulation studies indicate that the estimation precision of the QTL position and effect can be improved when the scaling relationship of the two traits is considered. The application of our model in a real example from forest trees leads to successful detection of a QTL governing the allometric relationship of third-year stem height with third-year stem biomass. The model proposed here has implications for genetic, evolutionary, biomedicinal and breeding research.
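The allometric power law y = α·x^β at the heart of these models becomes a linear relation on the log-log scale, which is how its parameters are commonly estimated. A minimal sketch with noise-free hypothetical data (note that a later paper in this list uses model II regression; ordinary least squares is used here for brevity):

```python
import numpy as np

# Hypothetical noise-free data following the power law y = alpha * x**beta,
# with beta = 0.75 (quarter-power scaling).
alpha, beta = 2.0, 0.75
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = alpha * x ** beta

# On the log-log scale the power law is linear:
#   log y = log alpha + beta * log x
# np.polyfit(degree 1) returns [slope, intercept].
beta_hat, log_alpha_hat = np.polyfit(np.log(x), np.log(y), 1)
```

With noise-free data the fit recovers β and log α exactly; with real measurements the choice between model I and model II regression matters because both axes carry error.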
- The genetic architecture of growth traits plays a central role in shaping the growth, development, and evolution of organisms. While a limited number of models have been devised to estimate genetic effects on complex phenotypes, no model has been available to examine how gene actions and interactions alter the ontogenetic development of an organism and transform the altered ontogeny into descendants. In this article, we present a novel statistical model for mapping quantitative trait loci (QTL) determining the developmental process of complex traits. Our model is constructed within the traditional maximum-likelihood framework implemented with the EM algorithm. We employ biologically meaningful growth curve equations to model time-specific expected genetic values and the AR(1) model to structure the residual variance-covariance matrix among different time points. Because of a reduced number of parameters being estimated and the incorporation of biological principles, the new model displays increased statistical power to detect QTL exerting an effect on the shape of ontogenetic growth and development. The model allows for the tests of a number of biological hypotheses regarding the role of epistasis in determining biological growth, form, and shape and for the resolution of developmental problems at the interface with evolution. Using our newly developed model, we have successfully detected significant additive x additive epistatic effects on stem height growth trajectories in a forest tree.
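The AR(1) residual covariance used in the model above has a simple closed form, Σ[i, j] = σ²·ρ^|i−j|; a minimal sketch (function name and numeric values are illustrative):

```python
import numpy as np

def ar1_cov(m, sigma2, rho):
    """AR(1) covariance matrix for m equally spaced time points:
    Sigma[i, j] = sigma2 * rho**|i - j| (stationary variance on the
    diagonal, geometric decay of correlation with time lag).
    """
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Illustrative values: 4 time points, unit variance, lag-1 correlation 0.5
S = ar1_cov(4, 1.0, 0.5)
```

Only two parameters (σ², ρ) are estimated regardless of the number of time points, which is the source of the parsimony these models exploit.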
- Most organisms display remarkable differences in morphological, anatomical, and developmental features between the two sexes. It has been recognized that these sex-dependent differences are controlled by an array of specific genetic factors, mediated through various environmental stimuli. In this paper, we present a unifying statistical model for mapping quantitative trait loci (QTL) that are responsible for sexual differences in growth trajectories during ontogenetic development. This model is derived within the maximum likelihood context, incorporated by sex-stimulated differentiation in growth form that is described by mathematical functions. A typical structural model is implemented to approximate time-dependent covariance matrices for longitudinal traits. This model allows for a number of biologically meaningful hypothesis tests regarding the effects of QTL on overall growth trajectories or particular stages of development. It is particularly powerful to test whether and how the genetic effects of QTL are expressed differently in different sexual backgrounds. Our model has been employed to map QTL affecting body mass growth trajectories in both male and female mice of an F2 population derived from the large (LG/J) and small (SM/J) mouse strains. We detected four growth QTL on chromosomes 6, 7, 11, and 15, two of which trigger different effects on growth curves between the two sexes. All the four QTL display significant genotype-sex interaction effects on the timing of maximal growth rate in the ontogenetic growth of mice. The implications of our model for studying the genetic architecture of growth trajectories and its extensions to some more general situations are discussed.
- Article
- Oct 2004
- BIOMETRICS

The incorporation of developmental control mechanisms of growth has proven to be a powerful tool in mapping quantitative trait loci (QTL) underlying growth trajectories. A theoretical framework for implementing a QTL mapping strategy with growth laws has been established. This framework can be generalized to an arbitrary number of time points, where growth is measured, and becomes computationally more tractable, when the assumption of variance stationarity is made. In practice, however, this assumption is likely to be violated for age-specific growth traits due to a scale effect. In this article, we present a new statistical model for mapping growth QTL, which also addresses the problem of variance stationarity, by using a transform-both-sides (TBS) model advocated by Carroll and Ruppert (1984, Journal of the American Statistical Association 79, 321-328). The TBS-based model for mapping growth QTL can not only maintain the original biological properties of a growth model, but also increase the accuracy and precision of parameter estimation and the power to detect a QTL responsible for growth differentiation. Using the TBS-based model, we successfully map a QTL governing growth trajectories to a linkage group in an example of forest trees. The statistical and biological properties of the estimates of this growth QTL position and effect are investigated using Monte Carlo simulation studies. The implications of our model for understanding the genetic architecture of growth are discussed.
- Article
- Oct 2004
- STAT MED

Are there specific genes that control the pathogenesis of HIV infection? This question, which is of fundamental importance in designing personalized strategies of gene therapy to control HIV infection, can be examined by genetic mapping approaches. In this article, we present a new statistical model for unravelling the genetic mechanisms for the dynamic change of HIV that causes AIDS by marker-based linkage disequilibrium (LD) analyses. This new model is the extension of our functional mapping theory to integrate viral load trajectories within a genetic mapping framework. Earlier studies of HIV dynamics have led to various mathematical functions for modelling the kinetic curves of plasma virions and CD4 lymphocytes in HIV patients. Through incorporating these functions into the LD-based mapping procedure, we can identify and map individual quantitative trait loci (or QTL) responsible for viral pathogenesis. We derive a closed-form solution for estimating QTL allele frequency and marker-QTL linkage disequilibrium in the context of EM algorithm and implement the simplex algorithm to estimate the mathematical parameters describing the curve shapes of HIV pathogenesis. We performed different simulation scenarios based on currently used clinical designs in AIDS/HIV research to illustrate the utility and power of our model for genetic mapping of HIV dynamics. The implications of our model for genetic and genomic research into AIDS pathogenesis are discussed.
- Understanding the genetic control of growth is fundamental to agricultural, evolutionary and biomedical genetic research. In this article, we present a statistical model for mapping quantitative trait loci (QTL) that are responsible for genetic differences in growth trajectories during ontogenetic development. This model is derived within the maximum likelihood context, implemented with the expectation-maximization algorithm.
We incorporate mathematical aspects of growth processes to model the mean vector and structured antedependence models to approximate time-dependent covariance matrices for longitudinal traits. Our model has been employed to map QTL that affect body mass growth trajectories in both male and female mice of an F2 population derived from the Large and Small mouse strains. The results from this model are compared with those from the autoregressive-based functional mapping approach. Based on results from computer simulation studies, we suggest that these two models are alternative to one another and should be used simultaneously for the same dataset.
- Article
- Oct 2006
- J THEOR BIOL

A general growth model derived from basic cellular properties can be used to describe the dynamic process of cancer growth with mathematical equations. It has been recognized that cancer growth is under genetic control, with a multitude of interacting genes each segregating in a Mendelian fashion and displaying environmental sensitivity. In this article, we integrate the mathematical aspects of the pervasive growth model into a statistical framework for the identification of quantitative trait nucleotides that underlie cancer growth. This integrative framework is constructed with a single nucleotide polymorphism-based haplotype blocking analysis. Simulation studies have been performed to demonstrate the usefulness of the model. The proposed model provides a generic platform for testing and detecting specific DNA sequence variants that regulate the timing of cancer emergence, growth and differentiation.
- Many biological processes, from cellular metabolism to population dynamics, are characterized by particular allometric scaling relationships between rate and size (power laws). A statistical model for mapping specific quantitative trait loci (QTLs) that are responsible for allometric scaling laws has been developed. We present an improved model for allometric mapping of QTLs based on a more general allometry equation. This improved model includes two steps: (1) use model II regression analysis to estimate the parameters underlying universal allometric scaling laws, and (2) substitute the estimated allometric parameters in the mixture-based mapping model to obtain the estimation of QTL position and effects. This model has been validated by a real example for a mouse F2 progeny, in which two QTLs were detected on different chromosomes that determine the allometric relationship between growth rate and body weight.
- Genes that control circadian rhythms in organisms have been recognized, but have been difficult to detect because circadian behavior comprises periodically dynamic traits and is sensitive to environmental changes. We present a statistical model for mapping and characterizing specific genes or quantitative trait loci (QTL) that affect variations in rhythmic responses. This model integrates a system of differential equations into the framework for functional mapping, allowing hypotheses about the interplay between genetic actions and periodic rhythms to be tested. A simulation approach based on sustained circadian oscillations of the clock proteins and their mRNAs has been designed to test the statistical properties of the model. The model has significant implications for probing the molecular genetic mechanism of rhythmic oscillations through the detection of the clock QTL throughout the genome.
- Whether and how thermal reaction norm is under genetic control is fundamental to understanding the mechanistic basis of adaptation to novel thermal environments. However, the genetic study of thermal reaction norm is difficult because it is often expressed as a continuous function or curve. Here we derive a statistical model for dissecting thermal performance curves into individual quantitative trait loci (QTL) with the aid of a genetic linkage map. The model is constructed within the maximum likelihood context and implemented with the EM algorithm. It integrates the biological principle of responses to temperature into a framework for genetic mapping through rigorous mathematical functions established to describe the pattern and shape of thermal reaction norms. The biological advantages of the model lie in the decomposition of the genetic causes for thermal reaction norm into its biologically interpretable modes, such as hotter-colder, faster-slower and generalist-specialist, as well as the formulation of a series of hypotheses at the interface between genetic actions/interactions and temperature-dependent sensitivity. The model is also meritorious in statistics because the precision of parameter estimation and power of QTL detection can be increased by modeling the mean-covariance structure with a small set of parameters. The results from simulation studies suggest that the model displays favorable statistical properties and can be robust in practical genetic applications. The model provides a conceptual platform for testing many ecologically relevant hypotheses regarding organismic adaptation within the Eco-Devo paradigm.
- Functional mapping has emerged as a powerful tool for mapping quantitative trait loci (QTL) that control developmental patterns of complex dynamic traits. Original functional mapping has been constructed within the context of simple interval mapping, without consideration of separate multiple linked QTL for a dynamic trait. In this article, we present a statistical framework for mapping QTL that affect dynamic traits by capitalizing on the strengths of functional mapping and composite interval mapping. Within this so-called composite functional-mapping framework, functional mapping models the time-dependent genetic effects of a QTL tested within a marker interval using a biologically meaningful parametric function, whereas composite interval mapping models the time-dependent genetic effects of the markers outside the test interval to control the genome background using a flexible nonparametric approach based on Legendre polynomials. Such a semiparametric framework was formulated by a maximum-likelihood model and implemented with the EM algorithm, allowing for the estimation and the test of the mathematical parameters that define the QTL effects and the regression coefficients of the Legendre polynomials that describe the marker effects. Simulation studies were performed to investigate the statistical behavior of composite functional mapping and compare its advantage in separating multiple linked QTL as compared to functional mapping. We used the new mapping approach to analyze a genetic mapping example in rice, leading to the identification of multiple QTL, some of which are linked on the same chromosome, that control the developmental trajectory of leaf age.
- Article
- Mar 2006
- Biometrika

We propose a nonparametric method for identifying parsimony and for producing a statistically efficient estimator of a large covariance matrix. We reparameterise a covariance matrix through the modified Cholesky decomposition of its inverse or the one-step-ahead predictive representation of the vector of responses and reduce the nonintuitive task of modelling covariance matrices to the familiar task of model selection and estimation for a sequence of regression models. The Cholesky factor containing these regression coefficients is likely to have many off-diagonal elements that are zero or close to zero. Penalised normal likelihoods in this situation with L1 and L2 penalties are shown to be closely related to Tibshirani's (1996) LASSO approach and to ridge regression. Adding either penalty to the likelihood helps to produce more stable estimators by introducing shrinkage to the elements in the Cholesky factor, while, because of its singularity, the L1 penalty will set some elements to zero and produce interpretable models. An algorithm is developed for computing the estimator and selecting the tuning parameter. The proposed maximum penalised likelihood estimator is illustrated using simulation and a real dataset involving estimation of a 102 × 102 covariance matrix. Copyright 2006, Oxford University Press.
- Estimation of an unstructured covariance matrix is difficult because of its positive-definiteness constraint. This obstacle is removed by regressing each variable on its predecessors, so that estimation of a covariance matrix is shown to be equivalent to that of estimating a sequence of varying-coefficient and varying-order regression models. Our framework is similar to the use of increasing-order autoregressive models in approximating the covariance matrix or the spectrum of a stationary time series.
As an illustration, we adopt Fan & Zhang's (2000) two-step estimation of functional linear models and propose nonparametric estimators of covariance matrices which are guaranteed to be positive definite. For parsimony a suitable order for the sequence of (auto)regression models is found using penalised likelihood criteria like AIC and BIC. Some asymptotic results for the local polynomial estimators of components of a covariance matrix are established. Two longitudinal datasets are analysed to illustrate the methodology. A simulation study reveals the advantage of the nonparametric covariance estimator over the sample covariance matrix for large covariance matrices. Copyright Biometrika Trust 2003, Oxford University Press.
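The modified Cholesky decomposition that these covariance-estimation approaches share can be computed directly: regressing each response on its predecessors yields a unit lower-triangular matrix T and a diagonal matrix D of innovation variances with T·Σ·T′ = D. A minimal sketch, assuming a known small Σ rather than regressions estimated from data:

```python
import numpy as np

def modified_cholesky(Sigma):
    """Modified Cholesky decomposition T Sigma T' = D.

    Row t of the unit lower-triangular T holds the negated coefficients
    from regressing variable t on its predecessors 1..t-1; the diagonal
    of D holds the corresponding innovation (prediction-error) variances.
    """
    m = Sigma.shape[0]
    T = np.eye(m)
    d = np.empty(m)
    d[0] = Sigma[0, 0]
    for t in range(1, m):
        phi = np.linalg.solve(Sigma[:t, :t], Sigma[:t, t])  # regression coefficients
        T[t, :t] = -phi
        d[t] = Sigma[t, t] - Sigma[:t, t] @ phi             # innovation variance
    return T, np.diag(d)

# Small illustrative covariance (AR(1)-like, lag-1 correlation 0.5)
Sigma = np.array([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])
T, D = modified_cholesky(Sigma)
```

Because Σ = T⁻¹·D·T⁻ᵀ with D positive on the diagonal, any estimates of the regression coefficients and innovation variances yield a positive-definite covariance estimate, which is the point of the reparameterisation.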
- Article
- Feb 2002
- J Am Stat Assoc

This article proposes a data-driven method to identify parsimony in the covariance matrix of longitudinal data and to exploit any such parsimony to produce a statistically efficient estimator of the covariance matrix. The approach parameterizes the covariance matrix through the Cholesky decomposition of its inverse. For longitudinal data, this is a one-step-ahead predictive representation, and the Cholesky factor is likely to have off diagonal elements that are zero or close to zero. A hierarchical Bayesian model is used to identify any such zeros in the Cholesky factor, similar to approaches that have been successful in Bayesian variable selection. The model is estimated using a Markov chain Monte Carlo sampling scheme that is computationally efficient and can be applied to covariance matrices of high dimension. It is demonstrated through simulations that the proposed method compares favorably in terms of statistical efficiency with a highly regarded competing approach. The estimator is applied to three real examples in which the dimension of the covariance matrix is large relative to the sample size. The first two examples are from biometry and electricity demand modeling and are longitudinal. The third example is from finance and highlights the potential of our method for estimating cross-sectional covariance matrices.
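The L2-penalised likelihood used in the main article's regularized covariance estimator corresponds, entry-wise in the Cholesky factor, to ridge regression of each response on its predecessors. A minimal sketch on hypothetical data showing the shrinkage effect of the penalty (the design, coefficients, and penalty values are illustrative):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge (L2-penalised) least squares: (X'X + lam * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Hypothetical data: 50 observations, 3 predictors, known coefficients
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, 0.0, -1.0]) + 0.1 * rng.standard_normal(50)

# A heavier penalty shrinks the coefficient vector toward zero; applied to
# the Cholesky-factor regressions, this is what stabilises the covariance
# estimate when the matrix dimension is large relative to the sample size.
b_small = ridge(X, y, 0.1)
b_large = ridge(X, y, 1e6)
```

An L1 penalty in place of the L2 term would instead set some coefficients exactly to zero, giving the sparse, interpretable Cholesky factors described above.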