University of Georgia Department of Statistics
Overview
Works:  48 works in 48 publications in 1 language and 53 library holdings 

Genres:  Academic theses 
Classifications:  TD365
Publication Timeline
Most widely held works by University of Georgia
Development of confidence intervals and monthly design values for low streamflows by William P McCormick (Book)
1 edition published in 1986 in English and held by 7 WorldCat member libraries worldwide
Dimension reduction and multisource fusion for big data with applications in bioinformatics by Yiwen Liu
1 edition published in 2018 in English and held by 1 WorldCat member library worldwide
With the rapid development of technology, an increasing amount of data has been produced in many fields of science, such as biology, neuroscience, and engineering. Inadequate sample size is no longer the bottleneck of modern statistical research. More often, we face data of extremely high dimensionality or data coming from remarkably different sources. How to effectively extract information from large-scale, high-dimensional data, or from data with various types and formats, poses new statistical challenges. In this thesis, I develop novel statistical methods and theory to address the various issues in analyzing high-dimensional or multi-source big data. More specifically, I propose a model-free variable screening method for high-dimensional regression, a data-level fusion method, and a feature-level fusion method to integrate multiple data sources for improved knowledge discovery. The consistency property in screening redundant variables and the asymptotic properties of the fused data are established, respectively, to provide theoretical underpinnings. The proposed methods are widely applied to many scientific investigations, including genomic, epigenetic, and metabolomic studies, and greatly help scientific development in other disciplines
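The thesis's model-free screening method is not reproduced here, but the general idea of variable screening in high dimensions can be illustrated with a marginal-correlation (SIS-style) sketch; the data, dimensions, and active set below are entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical high-dimensional regression: n = 100 samples, p = 1000
# predictors, of which only the first three are truly active.
n, p = 100, 1000
X = rng.normal(size=(n, p))
y = 2.0 * (X[:, 0] + X[:, 1] + X[:, 2]) + rng.normal(size=n)

# Marginal screening: rank predictors by absolute sample correlation with
# the response and keep the top d. The thesis's model-free criterion is a
# more refined replacement for this simple correlation score.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
yc = (y - y.mean()) / y.std()
score = np.abs(Xc.T @ yc) / n
keep = np.argsort(score)[::-1][:20]
print(sorted(keep[:5].tolist()))
```

Screening reduces p from 1000 to 20 before any model is fit; consistency results of the kind established in the thesis guarantee the active variables survive this step with high probability.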
Assessing UGA transportation and parking services data collection using RouteMatch Software by Mariya Nadein
1 edition published in 2017 in English and held by 1 WorldCat member library worldwide
Determining the best data management techniques for an enterprise is industry-specific, but generally involves the standardization and documentation of data collection procedures. The University of Georgia's Transportation and Parking Services (UGA Transit) is in the process of searching for an intelligent transportation system (ITS) software vendor in order to enhance the efficacy of the transit network and tap into cost-cutting opportunities. We use raw data exported from their current ITS provider, RouteMatch Software, to check its validity, provide some insight into bus patterns in relation to ridership and transit time, and recommend strategies for obtaining more reliable data in the future. We found that opportunities exist to collect better information, such as creating code logic that flags improbable observations before they are recorded, and to explore ways to manage ridership while decreasing travel time between stops. Additionally, we recommend a method for the next ITS provider to decrease headway fluctuation
Modelling precipitation volumes using a Weibull mixture and the gamma generalized linear model by Vineet Aswin Vora
1 edition published in 2018 in English and held by 1 WorldCat member library worldwide
A novel approach is used to model the distribution of precipitation volumes using a Weibull mixture model and a gamma model as a function of Convective Available Potential Energy (CAPE). A seasonal Weibull mixture model is fit to precipitation volumes to determine the distributions of convective and stratiform precipitation for Lakewood, Fort Collins, and Boulder, Colorado. This was achieved by implementing the Nelder-Mead algorithm to minimize the negative log-likelihood. We find that season is a significant factor in determining the mixture distribution. In addition, seasonal gamma regressions with a log link were estimated to model the precipitation volumes as a function of CAPE and location. The models accurately predict rainfall/snowfall events with low or medium amounts of precipitation in general. The fall model also predicts events with high precipitation
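As a rough illustration of the estimation strategy described in this abstract, a two-component Weibull mixture can be fit by minimizing the negative log-likelihood with Nelder-Mead; the data, starting values, and component parameters below are entirely synthetic, not the thesis's.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(42)
# Hypothetical precipitation volumes: a mix of light (stratiform) and
# heavy (convective) events, each Weibull-distributed.
x = np.concatenate([
    weibull_min.rvs(0.8, scale=2.0, size=300, random_state=rng),
    weibull_min.rvs(2.0, scale=10.0, size=200, random_state=rng),
])

def neg_log_lik(theta, x):
    """Negative log-likelihood of a two-component Weibull mixture."""
    k1, s1, k2, s2, logit_p = theta
    if min(k1, s1, k2, s2) <= 0:
        return np.inf                       # keep shapes/scales positive
    p = 1.0 / (1.0 + np.exp(-logit_p))      # mixing weight in (0, 1)
    dens = (p * weibull_min.pdf(x, k1, scale=s1)
            + (1 - p) * weibull_min.pdf(x, k2, scale=s2))
    return -np.sum(np.log(dens + 1e-300))

# Nelder-Mead is derivative-free, matching the estimation strategy named
# in the abstract.
fit = minimize(neg_log_lik, x0=[1.0, 1.0, 1.5, 5.0, 0.0],
               args=(x,), method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-6})
print(fit.success, np.round(fit.x[:4], 2))
```

The logit parameterization of the mixing weight keeps the optimization unconstrained, which suits the simplex-based Nelder-Mead search.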
A statistical analysis of crime in San Luis Obispo (2009-2017) by Courtney Patterson
1 edition published in 2018 in English and held by 1 WorldCat member library worldwide
In this thesis, police reports from the city of San Luis Obispo, California (2009-2017) are explored and analyzed in order to identify various trends and to forecast future criminal activity. In particular, the distributions of the different groups of crimes are visualized over various time periods, and several autoregressive integrated moving average (ARIMA) time series models are considered. The graphics are important for recognizing trends and patterns over certain periods of time
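ARIMA models are normally fit with dedicated time-series software; purely as an illustration of the family, the AR(1) special case can be estimated by conditional least squares on simulated, hypothetical monthly counts.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical monthly crime counts following an AR(1) process around a
# mean level mu.
n, phi_true, mu = 200, 0.6, 50.0
y = np.empty(n)
y[0] = mu
for t in range(1, n):
    y[t] = mu + phi_true * (y[t - 1] - mu) + rng.normal(scale=5.0)

# Conditional least squares for AR(1): regress y_t on y_{t-1}.
X = np.column_stack([np.ones(n - 1), y[:-1]])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
c, phi_hat = beta
forecast = c + phi_hat * y[-1]          # one-step-ahead forecast
print(round(phi_hat, 2), round(forecast, 1))
```

A full ARIMA(p, d, q) fit adds differencing and moving-average terms, but the forecast recursion has the same flavor as this one-step-ahead AR(1) prediction.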
Model comparison with squared Sharpe ratios of mimicking portfolios by Jihyeon Kim
1 edition published in 2018 in English and held by 1 WorldCat member library worldwide
There are various asset pricing models proposed in the field of finance using different traded and non-traded factors. In this paper, a variety of statistical methodologies are introduced for comparisons based on differences between the squared Sharpe ratios of such models. In particular, to compare mimicking portfolios with non-traded factors, different computations of the squared Sharpe ratios are used depending on whether two or more models are nested or non-nested. For the empirical analysis, five asset pricing models with traded factors are used: the Fama-French 3-factor model of Fama and French (1992), the Fama-French 5-factor model of Fama and French (2017), the Fama-French models with a momentum factor of Jegadeesh and Titman (1993) and Carhart (1997), and the betting-against-beta model of Frazzini and Pedersen (2014). For the mimicking portfolios, four different non-traded factors, which are proxies for consumption, are compared with those five models
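The quantity being compared can be sketched directly: the maximum squared Sharpe ratio attainable from a set of factors is θ² = μ′Σ⁻¹μ. Below is a toy computation on simulated factors; the factor means, volatilities, and dimensions are hypothetical, not estimates from the cited models.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical monthly excess returns for two nested factor sets
# (3 factors vs. the same 3 plus 2 more); columns are factors.
T = 600
f3 = rng.normal(loc=[0.5, 0.3, 0.2], scale=[3.0, 2.5, 2.0], size=(T, 3))
extra = rng.normal(loc=[0.25, 0.2], scale=[2.0, 1.5], size=(T, 2))
f5 = np.column_stack([f3, extra])

def squared_sharpe(f):
    """Maximum squared Sharpe ratio of the factors: mu' Sigma^{-1} mu."""
    mu = f.mean(axis=0)
    sigma = np.cov(f, rowvar=False)
    return float(mu @ np.linalg.solve(sigma, mu))

th3, th5 = squared_sharpe(f3), squared_sharpe(f5)
print(round(th3, 4), round(th5, 4), round(th5 - th3, 4))
```

For nested models the sample difference th5 - th3 is non-negative by construction (the smaller factor set spans a subset of the attainable portfolios), which is why the nested and non-nested comparisons in the paper require different sampling theory.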
Maximum monthly rainfall behavior along the Front Range of Colorado by Jeremy Marcus Mulcahey
1 edition published in 2016 in English and held by 1 WorldCat member library worldwide
This work investigates long-term trends in observed monthly maximum 24 h precipitation for Boulder, Colorado, which experienced a historic 24 h rainfall in September 2013. The long-term precipitation trends for four cities (Fort Collins, Evergreen, Lakewood, and Greeley) in the area around Boulder are also analyzed; the historic event observed in Boulder was not observed in those cities. The maximum precipitation trends for Fort Collins, Lakewood, and Boulder, which have presumably similar geographic features, showed increases in the mean, variance, and probability of exceeding certain rainfall thresholds over time. Each of the aforementioned values was decreasing over time in the city of Evergreen, which appears to be geographically dissimilar from the other cities investigated. Lastly, the trend lines for Greeley were highly variable, yet show an increasing trend since the mid-1900s. The trend lines for individual months showed that these increasing trends are not spread evenly among the 12 months. Certain months are experiencing greater maximum rainfall on average, while others are experiencing less
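A minimal sketch of the kind of trend estimation described here, on a synthetic series with a hypothetical trend magnitude (none of these numbers come from the thesis):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical annual series of maximum 24 h precipitation (mm) with a
# slowly increasing mean, mimicking the trends described for Boulder.
years = np.arange(1950, 2015)
precip = (30.0 + 0.15 * (years - years[0])
          + rng.gamma(shape=2.0, scale=4.0, size=years.size))

# Least-squares trend line: slope in mm per year.
slope, intercept = np.polyfit(years, precip, deg=1)
print(round(slope, 3))
```

The same fit applied month-by-month would reproduce the thesis's observation that an overall increase need not be spread evenly across the 12 months.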
The periodic solution and the global asymptotic stability for northeastern Puerto Rico ecosystem by Qihu Zhang
1 edition published in 2018 in English and held by 1 WorldCat member library worldwide
The paper deals with the following northeastern Puerto Rico ecosystem model:
\begin{equation*}
\left\{
\begin{array}{l}
\dfrac{dN}{dt}=\left( I-N\right) v-\dfrac{\alpha _{N}}{d}\dfrac{[Chl]N}{K_{N}+\left\vert N\right\vert },\\[6pt]
\dfrac{d[Chl]}{dt}=[Chl]\left[ \mu \left( \dfrac{PAR^{\gamma }}{PAR_{\min }^{\gamma }+PAR^{\gamma }}\dfrac{\left\vert N_{B}\right\vert ^{\alpha -1}N_{B}}{N_{B\min }^{\alpha }+\left\vert N_{B}\right\vert ^{\alpha }}-\xi \right) -D\right],\\[6pt]
\dfrac{dN_{B}}{dt}=\dfrac{\alpha _{N}N}{K_{N}+\left\vert N\right\vert }-\mu \left( \dfrac{PAR^{\gamma }}{PAR_{\min }^{\gamma }+PAR^{\gamma }}\dfrac{\left\vert N_{B}\right\vert ^{\alpha -1}N_{B}}{N_{B\min }^{\alpha }+\left\vert N_{B}\right\vert ^{\alpha }}-\xi \right) N_{B}.
\end{array}
\right.
\end{equation*}
Our aim is to establish the existence of periodic solutions under a nonautonomous assumption using the Leray-Schauder degree method, and global asymptotic stability under an autonomous assumption. Our global asymptotic stability results partly generalize [Patrick De Leenheer, Simon A. Levin, Eduardo D. Sontag, Christopher A. Klausmeier, Global stability in a chemostat with multiple nutrients, J. Math. Biol. 52 (2006), 419-438] from $v=D$ to $v\geq D$
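The three-compartment nutrient-chlorophyll-biomass system can be explored numerically under one possible reading of the model; every parameter value and initial condition below is a hypothetical illustration, not taken from the paper, and the sign placements follow the usual chemostat structure.

```python
import numpy as np
from scipy.integrate import solve_ivp

# All parameter values are hypothetical illustrations.
I, v, alpha_N, d, K_N = 1.0, 0.5, 0.8, 1.0, 0.5
mu, PAR, PAR_min, gamma = 1.2, 1.0, 0.5, 2.0
N_Bmin, alpha, xi, D = 0.3, 2.0, 0.1, 0.4

def rhs(t, y):
    """Right-hand side of the nutrient / chlorophyll / biomass system."""
    N, Chl, NB = y
    light = PAR**gamma / (PAR_min**gamma + PAR**gamma)
    nutrient = abs(NB)**(alpha - 1) * NB / (N_Bmin**alpha + abs(NB)**alpha)
    growth = mu * (light * nutrient - xi)
    dN = (I - N) * v - (alpha_N / d) * Chl * N / (K_N + abs(N))
    dChl = Chl * (growth - D)
    dNB = alpha_N * N / (K_N + abs(N)) - growth * NB
    return [dN, dChl, dNB]

sol = solve_ivp(rhs, (0.0, 50.0), [0.5, 0.1, 0.2], rtol=1e-8)
print(sol.success, np.round(sol.y[:, -1], 3))
```

Such simulations only suggest long-run behavior; the paper's contribution is proving periodicity and global asymptotic stability analytically.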
Nonparametric methods for big and complex datasets under a reproducing kernel Hilbert space framework by Xiaoxiao Sun
1 edition published in 2018 in English and held by 1 WorldCat member library worldwide
Large and complex data have been generated routinely from various sources, for instance, time-course biological studies and social media. Classic nonparametric models, such as smoothing spline ANOVA models, are not well equipped to analyze such large and complex data. To overcome these challenges, I propose novel nonparametric methods under a reproducing kernel Hilbert space framework to (1) significantly reduce the daunting computational costs of selecting smoothing parameters for smoothing spline ANOVA models; (2) model data with a functional response and a functional predictor; (3) accurately identify differentially expressed genes in time-course RNA-seq data. To validate the proposed methods, I conduct simulation studies and apply the methods to real data studies. In the end, I present derivations and theoretical proofs
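The smoothing-parameter selection problem mentioned above can be made concrete with a small generalized cross-validation (GCV) example on a penalized spline; the basis, penalty, and data here are illustrative stand-ins, not the thesis's algorithm, whose point is precisely to avoid the O(n³) cost of computing the smoother matrix below at scale.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

# Penalized regression on a truncated-power cubic spline basis; the
# roughness penalty acts only on the knot coefficients.
knots = np.linspace(0.05, 0.95, 20)
X = np.column_stack([np.ones(n), x, x**2, x**3,
                     np.clip(x[:, None] - knots[None, :], 0, None) ** 3])
P = np.diag([0.0] * 4 + [1.0] * knots.size)

def gcv(lam):
    """GCV score: n * RSS / (n - tr(H))^2 for smoother matrix H."""
    A = X.T @ X + lam * P
    H = X @ np.linalg.solve(A, X.T)         # smoother ("hat") matrix
    resid = y - H @ y
    edf = np.trace(H)                       # effective degrees of freedom
    return n * np.sum(resid**2) / (n - edf) ** 2

lams = 10.0 ** np.arange(-8, 3)
best = lams[int(np.argmin([gcv(l) for l in lams]))]
print(best)
```

Each GCV evaluation requires the trace of the n-by-n smoother matrix, which is what becomes prohibitive for large n and motivates the computational contribution described in the abstract.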
Predictive biomarker reproducibility modeling with censored data by Qian Kuang
1 edition published in 2016 in English and held by 1 WorldCat member library worldwide
Breast cancer is the most commonly diagnosed cancer among women. A great amount of research has focused on discovering and evaluating predictive biomarkers. In our research, we investigate the interaction between a biomarker and treatment effects (the true Δ, which is the decrease in the population event rate under marker-based treatment versus a standard of care) based on the assumptions of a Cox regression model, and we then conduct a simulation to calculate the estimated Δ over the range of ICC from 0 to 1. We plot the curve of estimated Δ vs. ICC under four different settings. We then conduct a random-effects simulation for the biomarker Ki67 and obtain the ICC of Ki67. We conclude that a biomarker is better able to detect the treatment effect when its ICC is greater. We can recover the true decrease in the event rate under marker-based treatment for a particular biomarker if we know its estimated value and its ICC from experiments. Our study is informative for evaluating the detection of treatment effects by predictive biomarkers in cancer
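The ICC at the heart of this analysis can be illustrated with a small one-way random-effects computation; the sample sizes, variance components, and "Ki67-like" readings below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical biomarker readings: 50 patients, 3 replicates each.
n_sub, n_rep = 50, 3
sigma_b, sigma_w = 2.0, 1.0            # between- and within-patient SDs
truth = rng.normal(10.0, sigma_b, size=n_sub)
obs = truth[:, None] + rng.normal(0.0, sigma_w, size=(n_sub, n_rep))

# One-way random-effects ICC from the ANOVA mean squares.
grand = obs.mean()
msb = n_rep * np.sum((obs.mean(axis=1) - grand) ** 2) / (n_sub - 1)
msw = np.sum((obs - obs.mean(axis=1, keepdims=True)) ** 2) / (n_sub * (n_rep - 1))
icc = (msb - msw) / (msb + (n_rep - 1) * msw)
print(round(icc, 2))
```

With these variance components the true ICC is σ_b²/(σ_b² + σ_w²) = 0.8, matching the abstract's point that a more reproducible biomarker (higher ICC) retains more of the true treatment effect in estimation.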
Regularized aggregation approaches for complex data by Liyou Wang
1 edition published in 2016 in English and held by 1 WorldCat member library worldwide
The objective of this dissertation is to develop penalized aggregation methods that improve prediction and/or representative summaries in various scenarios. We first introduce predictive models for both regression and classification problems when data batches are collected in a sequential manner. With streaming data, information is constantly being updated, and a major statistical challenge for these types of data is that the underlying distribution and/or the true input-output dependency might change over time, a phenomenon known as concept drift. Concept drift complicates the learning process because a predictive model constructed on past data is no longer consistent with new examples. In order to effectively track concept drift, we propose several novel model-combining methods using constrained and penalized regression that possess grouping properties. The new learning methods enable us to select data batches as groups that are relevant to the current one, reduce the effects of irrelevant batches, and adaptively reflect the degree of concept drift emerging in data streams. We study theoretical properties and finite-sample performance using simulated and real examples. The analytical and empirical results indicate that the proposed methods can effectively adapt to various types of concept drift and outperform existing methods. Second, we build an aggregation method on statistical parameter maps (SPMs) to improve graphical representation in functional magnetic resonance imaging (fMRI) data analysis. Combining SPMs from individual subjects is an important step in a group analysis. They are usually combined as a simple average, which can be affected by outlying subjects. For example, a $t$ test is prone to false detection and/or non-detection when extreme values are observed.
We propose a regularized unsupervised aggregation method for SPMs that finds an optimal weight for aggregation, detects possible outlying subjects, and mitigates their effect. We also develop a bootstrap-based weighted $t$ test using the optimal weight to construct an activation map robust to outlying subjects. Studying the proposed methods numerically, we demonstrate that the proposed approach can effectively detect outlying subjects by lowering their weights, and produces robust SPMs
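This is not the dissertation's estimator, but a toy version of penalized aggregation: combine batch-specific predictors with simplex-constrained, ridge-penalized weights so that a drifted batch is down-weighted. All batches, noise levels, and the penalty size are simulated or assumed.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
# Hypothetical setting: predictions from 5 earlier data batches on the
# current batch; batch index 3 has drifted and predicts poorly.
n, k = 80, 5
y = rng.normal(size=n)
preds = y[:, None] + rng.normal(scale=[0.3, 0.4, 0.5, 3.0, 0.6],
                                size=(n, k))

def objective(w, lam=0.1):
    """Squared loss of the combined prediction plus a ridge penalty on
    the combining weights."""
    return np.mean((y - preds @ w) ** 2) + lam * np.sum(w ** 2)

# Constrain weights to the simplex (non-negative, summing to one).
cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
res = minimize(objective, x0=np.full(k, 1.0 / k),
               bounds=[(0, 1)] * k, constraints=cons, method="SLSQP")
print(np.round(res.x, 2))
```

The drifted batch receives a weight near zero while the accurate batches share the rest, which is the qualitative behavior the dissertation's grouping penalties are designed to produce adaptively.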
Bt cotton and spatial autocorrelation of yield by Ran Huo
1 edition published in 2019 in English and held by 1 WorldCat member library worldwide
The spatial autocorrelation of crop yield violates the independence assumption of conventional methods such as ordinary least squares (OLS). Therefore, a model of crop yield must incorporate spatial structure. Often, crop yields are available across space as well as over time, and the additional dimension allows the estimation of the full spatial covariance matrix using the time dimension. However, the stationarity of the spatial pattern does not necessarily hold over time. In this study, we present empirical evidence that the spatial autocorrelation pattern of cotton yield can be fundamentally changed by the adoption of Bt cotton seeds. This finding provides a cautionary note that spatial autocorrelation may vary over time due to technological change
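Spatial autocorrelation of yields is commonly summarized with Moran's I; here is a self-contained toy computation on a synthetic grid (not the study's cotton data, and the smooth surface is purely hypothetical).

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical county yields on a 10x10 grid with smooth spatial structure.
g = 10
xx, yy = np.meshgrid(np.arange(g), np.arange(g))
field = np.sin(xx / 3.0) + np.cos(yy / 3.0) + rng.normal(scale=0.2,
                                                         size=(g, g))
z = (field - field.mean()).ravel()      # centered yields

# Rook-contiguity weight matrix: 1 for cells sharing a grid edge.
coords = np.column_stack([xx.ravel(), yy.ravel()])
diff = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=2)
W = (diff == 1).astype(float)

n = z.size
moran_I = (n / W.sum()) * (z @ W @ z) / (z @ z)
print(round(moran_I, 2))
```

Computing the statistic separately for years before and after an intervention such as Bt adoption is one simple way to see the kind of change in spatial pattern the study documents.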
A mixed effect model with feature extraction for functional magnetic resonance imaging (fMRI) data by Sooyoung Kim
1 edition published in 2017 in English and held by 1 WorldCat member library worldwide
The primary goal of this research is the application of feature extraction and a mixed model to functional magnetic resonance imaging (fMRI) data. The goal of the study is a comparison of multiple groups of subjects as they conduct a cognitive task. Since the fMRI data of interest record stimulus-response reactions of human brain activity over time, they show repeated patterns in the signals. Therefore, we use a feature extraction method that collects the characteristics or patterns of the data. Then, we apply a mixed model that includes both fixed and random effects to find any group difference. Through a simulation study, we find the mixed model with feature extraction effective for detecting a difference between groups. Finally, we applied the approach to the 11 regions of interest in the human brain from the cognitive-task fMRI data and found that the region called the striatum shows a significant difference
Towards understanding the interplay between cellular stresses and cancer development by Sha Cao
1 edition published in 2017 in English and held by 1 WorldCat member library worldwide
The development of neoplastic cells is hypothesized to be the result of cells responding to a stressful microenvironment such as chronic hypoxia, increased ROS, and persistent immune attacks. Distinct levels of oxidative stress, estimated by gene markers of ROS-generating processes, are found to explain well the differences in disease incidence rates associated with different cancer types in different regions of the world. Further, increased levels of ROS could force cells to induce higher antioxidant synthesis. This process could compete for sulfur resources with the SAM synthesis used for DNA methylation, and eventually lead to a globally reduced level of DNA methylation. In metastatic cancer, oxidized cholesterol and its further metabolized derivatives are found to be a key driver of the explosive growth of post-metastatic cancers. My work suggests that it is the change in the O2 level between the metastasized and the primary sites, i.e., from O2-poor to O2-rich, that leads to the substantially increased uptake and de novo synthesis of cholesterol, as well as the oxidation and further metabolism of cholesterol towards the production of oxysterols and steroidal hormones, all powerful growth signals. To understand how various stress types may drive the unique biology of cancer, we need to study cancer tissues rather than cancer cell line data, since the former contain all the relevant information but the latter do not. In contrast to cell-based omic data, observed tissue-based gene-expression data are the result of gene-expression levels summed over all cell types, such as cancer cells, multiple immune cell types, fat cells, and normal cells in the tissues.
A novel algorithm for deconvoluting tissue-based data into cell-type-specific contributions is developed based on the following information: (1) genes in each cell type are expressed in a coordinated manner; specifically, they are grouped into pathways whose genes are co-expressed; and (2) different cell types tend to have different sets of pathways activated
Bayesian framework for developing and evaluating medical screening tests for early disease detection with applications in
oncology by Alexei C Ionan(
)
1 edition published in 2017 in English and held by 1 WorldCat member library worldwide
We build a comprehensive Bayesian framework for medical screening test development and evaluation. The hallmarks of our approach are accurate quantification and propagation of uncertainty with optimal decision making under that uncertainty, as well as explicit inclusion of a clinical context and a decision-theoretic perspective. We determine the characteristics necessary for a clinically relevant screening test, evaluate methods for credible interval calculation in small samples, provide guidance to researchers developing a screening test, and address test reproducibility and design challenges. To our knowledge, this is the first comprehensive treatment of the problem; the existing literature only touches the surface and does not provide concrete advice to researchers. Our work fills this gap by contributing to the individual parts of the process and, more importantly, by explicitly connecting the various aspects of medical test design. Although this framework was developed primarily for medical screening tests, some results generalize readily to other settings, such as rare-event surveillance and sequential decision making
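A hedged sketch of one ingredient of such a framework: a conjugate Beta-Binomial posterior for a screening test's sensitivity, with a Monte Carlo credible interval drawn using Python's standard library. This is a generic illustration, not the dissertation's model; the prior parameters and counts are hypothetical.

```python
import random

def posterior_sensitivity(tp, fn, a=1.0, b=1.0, draws=20000, seed=1):
    """Beta(a + tp, b + fn) posterior for sensitivity given tp true
    positives and fn false negatives under a Beta(a, b) prior; returns
    the posterior mean and a Monte Carlo 95% credible interval."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a + tp, b + fn) for _ in range(draws))
    mean = (a + tp) / (a + tp + b + fn)
    return mean, (samples[int(0.025 * draws)], samples[int(0.975 * draws)])
```

For example, 45 true positives and 5 false negatives under a uniform prior give a posterior mean of 46/52 with an interval roughly (0.79, 0.95); with small samples the interval's width, not just the point estimate, drives the screening decision.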
Data mining to identify gene regulatory elements by Lexiang Ji(
)
1 edition published in 2019 in English and held by 1 WorldCat member library worldwide
Gene regulatory elements are essential for the survival and development of all organisms, but only a handful have been discovered thus far in plants. Unfortunately, traditional methods of identifying regulatory elements are ineffective because these regions are typically short (4–12 base pairs) and can lie thousands of base pairs away from their target genes. My project aimed to identify regulatory elements in plant genomes by mining a large and diverse set of histone modification, chromatin modification, and chromatin accessibility data with statistical approaches including K-means clustering and smoothing spline clustering. The results revealed an abundance of distal accessible chromatin regions in the maize genome that contain distinct combinations of chromatin modifications. Results from this project provide valuable clues for the future improvement of economically important crop traits
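K-means clustering, one of the approaches named above, can be sketched in a few lines of pure Python (Lloyd's algorithm on toy signal vectors; the data are hypothetical, and a real chromatin analysis would use an optimized library implementation):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate assigning points to the nearest
    center and recomputing each center as its cluster's mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # Keep the old center if a cluster empties out.
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters
```

On two well-separated groups of signal vectors, the centers converge to the two group means; for genomic data each point would be a region's vector of chromatin-mark signals.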
The effect of CEO political ideology on executive succession following firm misconduct by
Ugyŏng Yi(
)
1 edition published in 2019 in English and held by 1 WorldCat member library worldwide
When a firm experiences a negative event, the board often replaces the firm's CEO to contain negative reactions. Although research is rich in examinations of incoming CEOs' characteristics, little is known about how the nature of the crisis itself shapes the board's preference for particular attributes. This study examines how different types of infraction affect the selection of CEOs with specific political orientations. More precisely, I hypothesize that a board is more likely to appoint conservative CEOs following competence failures and liberal CEOs after integrity violations. I also hypothesize that outside evaluators will perceive positively the congruence between the nature of the crisis and the political orientation of the incoming CEO. These hypotheses are tested using a two-stage treatment effects model to correct for the potential endogeneity induced by omitted variables. Accordingly, this study contributes to the management literature by examining the antecedents that lead to the selection of CEOs with certain attributes
Statistical inference and learning for topological data analysis by
Chʻŏl Mun(
)
1 edition published in 2018 in English and held by 1 WorldCat member library worldwide
Topological data analysis (TDA) is a rapidly developing collection of methods for studying the shape of data. Persistent homology is a prominent branch of TDA which analyzes the dynamics of topological features of a data set. We introduce statistical inference and learning methods for persistent homology of three types of data: point clouds, fingerprints, and rock images. First, we illustrate a topological inference plot for point cloud data, called the persistence terrace. The suggested plot allows robust and scale-free inference on the size and point density of topological features. Second, we suggest a new interface between persistent homology and machine learning algorithms and apply it to the problem of sorting fingerprints into predetermined groups. We achieve near state-of-the-art classification accuracy rates by applying TDA to minutiae points and ink-roll images. Last, we present a statistical model for analysis of porous materials using persistent homology. Our model enables us to predict the geophysical properties of rocks based on their geometry and connectivity
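Zero-dimensional persistent homology, the simplest case of the persistence machinery described above, tracks when connected components of a point cloud merge as the scale parameter grows. A minimal union-find sketch over the Vietoris-Rips filtration (a generic illustration, not the dissertation's code):

```python
from itertools import combinations

def zero_dim_persistence(points):
    """Death scales of connected components (H0) in the Vietoris-Rips
    filtration: process edges by increasing length; each merge of two
    components kills one feature at that edge's length."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)  # a component born at scale 0 dies at scale d
    return deaths
```

Three collinear points at 0, 1, and 3 yield component deaths at scales 1 and 2; large death scales correspond to well-separated clusters, which is the kind of feature the persistence terrace summarizes.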
Optimal p-value weighting with independent information by Mohamad Shakil Hasan(
)
1 edition published in 2017 in English and held by 1 WorldCat member library worldwide
The large-scale multiple testing inherent to high-throughput biological data necessitates very high statistical stringency, so true effects are difficult to detect unless they have large effect sizes. One solution to this problem is to use independent information to prioritize the most promising features of the data and thus increase the power to detect them. Weighted p-values provide a general framework for doing this in a statistically rigorous fashion. However, calculating weights that incorporate the independent information and optimize statistical power remains a challenging problem despite recent advances in this area. Existing methods tend to perform poorly in the common situation where true positive features are rare and of low effect size. We introduce Covariate Rank Weighting, a method for calculating approximately optimal weights conditioned on the ranking of tests by an external covariate. This approach uses the probabilistic relationship between covariate ranking and test effect size to calculate more informative weights that are not diluted by null effects, as is common with group-based methods. This relationship can be calculated theoretically for normally distributed covariates or estimated empirically in other cases. We show via simulations and applications to data that this method outperforms existing methods by a large margin in the rare/low-effect-size scenario and has at least comparable performance in all scenarios
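The weighted p-value idea can be sketched with the classical weighted Bonferroni rule: weights average to 1 across the m tests, and test i is rejected when p_i <= w_i * alpha / m, so an informative covariate can up-weight promising tests without inflating the family-wise error rate. This is a generic baseline, not the Covariate Rank Weighting method itself:

```python
def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Weighted Bonferroni procedure: weights must average to 1, and
    test i is rejected when pvals[i] <= weights[i] * alpha / m."""
    m = len(pvals)
    assert abs(sum(weights) / m - 1.0) < 1e-9, "weights must average to 1"
    return [p <= w * alpha / m for p, w in zip(pvals, weights)]
```

For instance, with weights (2, 1, 0.5, 0.5) a covariate-favored test faces a doubled threshold while the total weight budget, and hence the error guarantee, is unchanged.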
Efficient genotyping by sampling extreme individuals in a genome-wide association study in plants by Wenqian Kong(
)
1 edition published in 2018 in English and held by 1 WorldCat member library worldwide
We evaluated the statistical power of selective genotyping strategies based on sampling extreme individuals in a genome-wide association study (GWAS). Simulation with a theoretical setup, and with application to an actual dataset, provides guidance on determining the minimum number of individuals from the extremes needed to detect causal variants with 80% statistical power. We compared the power and false discovery rates of three different methods in a real-world sorghum diversity panel: Fisher's exact test, analysis of variance (ANOVA), and the software GAPIT, which applies a mixed model for variant detection and controls for population structure. Our simulation results also show that the power to detect causal SNP markers under selective genotyping depends on the initial population size. This strategy is particularly helpful in genetic studies for reducing genotyping costs for variant detection and validation
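Fisher's exact test, one of the three methods compared, reduces in the selective-genotyping setting to a 2x2 table of allele counts in the high and low phenotypic tails. A self-contained two-sided implementation using the hypergeometric distribution (the counts in the example are hypothetical):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]],
    e.g. reference-allele counts (a, b) vs alternate-allele counts
    (c, d) in the high and low phenotypic tails. Sums the hypergeometric
    probabilities of all tables no more likely than the observed one."""
    n = a + b + c + d
    r1, c1 = a + b, a + c          # fixed row and column margins

    def hyper(k):
        return comb(c1, k) * comb(n - c1, r1 - k) / comb(n, r1)

    p_obs = hyper(a)
    lo = max(0, r1 - (n - c1))
    hi = min(r1, c1)
    # Tiny tolerance guards against float ties at exactly p_obs.
    return sum(hyper(k) for k in range(lo, hi + 1)
               if hyper(k) <= p_obs * (1 + 1e-12))
```

With such small tables, exact enumeration is cheap, which is one reason Fisher's test suits tail-sampling designs where per-tail counts are modest.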
Related Identities
 University of Georgia
 University of Georgia Electronic Theses and Dissertations database
 Park, Cheolwoo
 Georgia Institute of Technology Environmental Resources Center
 McCormick, William P. Author
 Geological Survey (U.S.). Water Resources Division
 Reeves, Jaxk H.
 Schliekelman, Paul
 Kaplan, Jennifer
 Mandal, Abhyuday
Alternative Names
University of Georgia. Dept. of Statistics
University of Georgia. Statistics Department
Languages