Thursday, 05-12-2019 - 14:15, 603
Statistical challenges in mass spectrometry data analysis: shared peptides
Mass spectrometry (MS) is one of the most important technologies for the study of proteins. MS experiments generate massive amounts of complex data, which require advanced pre-processing and careful statistical analysis.
In the bottom-up approach to MS, peptides (smaller segments of proteins) enter the mass spectrometer, so measurements are made at the peptide level.
Because of this, one of the problems in protein quantification based on MS is the presence of peptides that can be assigned to multiple proteins.
Such peptides are referred to as shared or degenerate peptides.
Since it is not obvious how to assign the abundance of shared peptides to proteins, they are often discarded from the analysis. This leads to a loss of a substantial amount of data.
In this talk, I will first present the basics of mass spectrometry data analysis. Then, I will review existing methods for handling shared peptides.
I will finish with a summary of our progress on improving the methodology of protein quantification with shared peptides and the related statistical challenges.
The talk is based on an ongoing collaboration with Tomasz Burzykowski (Hasselt University) and Jurgen Claesen (Belgian Nuclear Research Centre).
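As a minimal illustration of the shared-peptide problem described above, the following sketch splits peptides into unique and shared ones and shows what a summary based on unique peptides alone loses. The peptide names, protein names, and intensities are hypothetical, and summing intensities is only a placeholder for a real quantification method; it is not the methodology presented in the talk.

```python
from collections import defaultdict

# Hypothetical peptide-to-protein assignments and intensities (illustrative only).
peptide_to_proteins = {
    "PEPTIDEA": {"P1"},
    "PEPTIDEB": {"P1", "P2"},  # shared between P1 and P2
    "PEPTIDEC": {"P2"},
    "PEPTIDED": {"P2", "P3"},  # shared between P2 and P3
}
intensity = {"PEPTIDEA": 10.0, "PEPTIDEB": 8.0, "PEPTIDEC": 6.0, "PEPTIDED": 4.0}


def split_unique_shared(mapping):
    """Return (unique, shared) peptide lists; shared means assigned to more than one protein."""
    unique = sorted(p for p, prots in mapping.items() if len(prots) == 1)
    shared = sorted(p for p, prots in mapping.items() if len(prots) > 1)
    return unique, shared


def summarize_unique_only(mapping, intensities):
    """Naive protein summary: sum intensities of unique peptides, discarding shared ones."""
    totals = defaultdict(float)
    for peptide, prots in mapping.items():
        if len(prots) == 1:
            (protein,) = prots
            totals[protein] += intensities[peptide]
    return dict(totals)


unique, shared = split_unique_shared(peptide_to_proteins)
protein_totals = summarize_unique_only(peptide_to_proteins, intensity)
discarded_fraction = len(shared) / len(peptide_to_proteins)
```

In this toy example half of the peptides are discarded, and protein P3, which is supported only by a shared peptide, is not quantified at all.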
Thursday, 14-11-2019 - 14:15, 603
On the Model Selection Properties and Uniqueness of the Lasso and Related Estimators
Ulrike Schneider (Vienna University of Technology)
We investigate the model selection properties of the Lasso estimator in finite samples with no conditions on the regressor matrix X. We show that the set of covariates the Lasso estimator may potentially choose in high dimensions (where the number of explanatory variables p exceeds the sample size n) depends only on X and the given penalization weights. This set of potential covariates can be determined through a geometric condition on X and may be small (of cardinality less than or equal to n). Related to the geometric conditions in our considerations, we also provide a necessary and sufficient condition for uniqueness of the Lasso solutions. Finally, we discuss how these results carry over to other model selection procedures such as SLOPE.
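For readers who want to experiment with Lasso model selection when p exceeds n, here is a minimal sketch of a generic Lasso solver based on cyclic coordinate descent. It assumes equal penalization weights and the objective (1/(2n))||y - Xb||^2 + lam ||b||_1. The function names and the simulated data are illustrative assumptions; the sketch does not implement the geometric condition from the talk.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=500):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()      # residual y - X @ beta (beta starts at zero)
    col_sq = (X ** 2).sum(axis=0) / n   # x_j' x_j / n for every column j
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes beta_j.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


def lasso_objective(X, y, beta, lam):
    """Value of the penalized least-squares objective at beta."""
    n = X.shape[0]
    return 0.5 * np.sum((y - X @ beta) ** 2) / n + lam * np.sum(np.abs(beta))


# Simulated high-dimensional data (hypothetical): p = 30 covariates, n = 10 observations.
rng = np.random.default_rng(0)
n, p = 10, 30
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam_max = np.max(np.abs(X.T @ y)) / n   # above this value the Lasso solution is zero
beta_hat = lasso_cd(X, y, lam=0.5 * lam_max)
support = np.flatnonzero(beta_hat)      # indices of the selected covariates
```

Varying `lam` and inspecting `support` shows which covariates this particular X allows the estimator to select.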
Thursday, 07-11-2019 - 14:15, 603
Selection of colored saturated Gaussian models
Piotr Graczyk (Université d'Angers)
Tuesday, 29-10-2019 - 14:15, 605
Analysis of HDX-MS data: a pristine land for bioinformatics
Michał Burdukiewicz (MI2 DataLab, PW)
Hydrogen-deuterium exchange monitored by mass spectrometry (HDX-MS) has recently become a staple tool in studies of protein structure. The main application of this technique is to compare the structure of a protein altered by several factors (so-called states). The statistical frameworks introduced so far address the screening part of the analysis, i.e., the search for significant differences between states, but miss the post-screening phase. We critically evaluate existing models and point out their strengths and weaknesses. Additionally, we provide a novel solution to the multi-state comparison problem where the region of interest inside the protein structure is already well-defined.
Thursday, 24-10-2019 - 14:15, 603
Counting faces of random polytopes and applications
Abstract in the attachment
Thursday, 17-10-2019 - 14:15, 603
Statistical inference with missing values
Missing data exist in almost all areas of empirical research. There are various reasons why missing data may occur, including survey non-response, unavailability of measurements, and lost data. In this presentation, I will share my experience on how to perform parametric estimation with missing covariates, based on likelihood methods and the Expectation-Maximization algorithm. Then I will focus on recent results in a supervised learning setting, for performing logistic regression with missing values. We illustrate the method on a dataset of severely traumatized patients from Paris hospitals to predict the occurrence of hemorrhagic shock, a leading cause of early preventable death in severe trauma cases. The methodology is implemented in the R package misaem.
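To make the idea of likelihood-based estimation with missing values concrete, here is a minimal sketch of the classical Expectation-Maximization algorithm for the mean and covariance of a multivariate Gaussian with entries missing at random (coded as NaN). This is a simpler textbook setting than the logistic regression discussed in the talk and is not the misaem implementation. The function name and the simulated data are assumptions, and the sketch assumes every column has at least one observed value with non-zero variance.

```python
import numpy as np


def em_gaussian_missing(X, n_iter=50):
    """EM estimates of the mean and covariance of Gaussian rows with NaN-coded missing entries."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)               # start from observed-data moments
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        x_hat = np.where(miss, 0.0, X)       # will hold E[x_i | observed part]
        cov_corr = np.zeros((d, d))          # summed conditional covariances
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if o.any():
                # E-step: regress the missing coordinates on the observed ones.
                coef = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
                x_hat[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
                cond_cov = sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
            else:
                x_hat[i, m] = mu[m]
                cond_cov = sigma[np.ix_(m, m)]
            cov_corr[np.ix_(m, m)] += cond_cov
        # M-step: update the moments using the expected sufficient statistics.
        mu = x_hat.mean(axis=0)
        centered = x_hat - mu
        sigma = (centered.T @ centered + cov_corr) / n
    return mu, sigma


# Simulated data (hypothetical): bivariate Gaussian with about 20% of column 1 hidden.
rng = np.random.default_rng(1)
true_mu = np.array([1.0, -2.0])
true_sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
data = rng.multivariate_normal(true_mu, true_sigma, size=500)
data_missing = data.copy()
data_missing[rng.random(500) < 0.2, 1] = np.nan
mu_hat, sigma_hat = em_gaussian_missing(data_missing)
```

With no missing entries the function reduces to the usual sample mean and maximum-likelihood covariance, which is a convenient sanity check.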
Wednesday, 07-11-2018 - 14:15, 711/712
Topics on stochastic optimization and long-time approximation of stochastic processes
Stochastic optimization is a way of approximating minima of deterministic functions by a stochastic approach. I will begin my talk with some background on this topic and on the Robbins-Monro algorithm. Then, I will state some recent non-asymptotic results about the Ruppert-Polyak algorithm, which is an averaged version of the Robbins-Monro algorithm. In the last part, I will briefly introduce the problem of long-time approximation of diffusion processes and its link with the approximation of Gibbs distributions. I will conclude with some statistical applications of these methods. This talk is based on collaborations with Sébastien Gadat and Gilles Pagès.
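The relation between the two algorithms mentioned above can be sketched in a few lines: the Robbins-Monro recursion takes noisy gradient steps with decreasing step sizes, and the Ruppert-Polyak estimate is the running average of its iterates. The step-size schedule gamma0 / k^alpha, the function names, and the quadratic test problem are illustrative assumptions, not taken from the talk.

```python
import numpy as np


def robbins_monro_with_averaging(noisy_grad, theta0, n_steps, gamma0=1.0, alpha=0.75, seed=0):
    """Run Robbins-Monro and return (last iterate, Ruppert-Polyak average of the iterates)."""
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    theta_bar = 0.0
    for k in range(1, n_steps + 1):
        step = gamma0 / k ** alpha             # decreasing steps, alpha in (1/2, 1)
        theta -= step * noisy_grad(theta, rng)
        theta_bar += (theta - theta_bar) / k   # running mean of theta_1, ..., theta_k
    return theta, theta_bar


def noisy_grad(theta, rng):
    """Noisy gradient of the hypothetical objective f(theta) = 0.5 * (theta - 2)^2."""
    return (theta - 2.0) + rng.standard_normal()


last_iterate, averaged = robbins_monro_with_averaging(noisy_grad, theta0=0.0, n_steps=20000)
```

Both outputs should be close to the minimizer 2; comparing them over several seeds gives a feel for the effect of averaging.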