I am a senior statistician at Sandia National Laboratories. My research interests broadly lie in the fields of computational statistics, simulation-based statistical learning (particle methods, Monte Carlo approaches, Approximate Bayesian Computation) of stochastic processes and Bayesian nonparametrics. Other areas of interest include temporal-spatial statistics, extreme value analysis and Bayesian variable selection. Recently, I have developed an interest in physics-informed statistical learning. Applications of my work include bioinformatics and biological imaging, cyber-security, climatology and the social sciences.
Download my resumé.
PhD in Statistics, 2019
Imperial College London
MSci in Mathematics, 2015
Imperial College London
A key difficulty that arises from real event data is imprecision in the recording of event time-stamps. In many cases, retaining event times with a high precision is expensive due to the sheer volume of activity. Together with practical limits on the accuracy of measurements, this makes binned data common. In order to use point processes to model such event data, tools for handling parameter estimation are essential. Here we consider parameter estimation of the Hawkes process, a type of self-exciting point process that has found application in the modeling of financial stock markets, earthquakes and social media cascades. We develop a novel optimization approach to parameter estimation of binned Hawkes processes using a modified Expectation-Maximization algorithm, referred to as Binned Hawkes Expectation Maximization (BH-EM). Through a detailed simulation study, we demonstrate that existing methods are capable of producing severely biased and highly variable parameter estimates and that our novel BH-EM method significantly outperforms them in all studied circumstances. We further illustrate the performance on network flow (NetFlow) data between devices in a real large-scale computer network, to characterize triggering behavior. These results highlight the importance of correct handling of binned data. Supplementary materials for this article are available online.
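To make the binned-data setting concrete, the following is a minimal sketch of how exact event times from a Hawkes process collapse into bin counts. It is not the BH-EM algorithm. It assumes an exponential excitation kernel, so that the intensity is mu + sum of alpha * beta * exp(-beta * (t - t_i)) over past events t_i, and simulates it with Ogata's thinning method. The function names and parameter values are hypothetical and chosen for illustration only.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, rng):
    """Simulate event times on (0, horizon) by Ogata's thinning method.

    Assumed intensity: mu + sum_i alpha * beta * exp(-beta * (t - t_i)),
    where alpha < 1 is the branching ratio (an illustrative parameterization).
    """
    events = []
    t = 0.0
    excitation = 0.0  # self-exciting part of the intensity at time t
    while True:
        # The intensity only decays until the next event, so its current
        # value is a valid upper bound for proposing the next candidate.
        upper = mu + excitation
        wait = rng.expovariate(upper)
        if t + wait >= horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= mu + excitation:
            events.append(t)
            excitation += alpha * beta  # jump caused by the new event
    return events


def bin_counts(events, horizon, width):
    """Discard exact time-stamps and keep only the number of events per bin."""
    n_bins = math.ceil(horizon / width)
    counts = [0] * n_bins
    for t in events:
        counts[min(int(t // width), n_bins - 1)] += 1
    return counts


rng = random.Random(0)
events = simulate_hawkes(mu=0.5, alpha=0.6, beta=2.0, horizon=100.0, rng=rng)
counts = bin_counts(events, horizon=100.0, width=1.0)
```

Only `counts` would be available to an estimator in the binned setting; the ordering and spacing of events inside each bin is lost, which is what makes parameter estimation harder than with exact time-stamps.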
Motivation: Many recent advancements in single-molecule localization microscopy exploit the stochastic photoswitching of fluorophores to reveal complex cellular structures beyond the classical diffraction limit. However, this same stochasticity makes counting the number of molecules to high precision extremely challenging, preventing key insight into the cellular structures and processes under observation.
Results: Modelling the photoswitching behaviour of a fluorophore as an unobserved continuous time Markov process transitioning between a single fluorescent and multiple dark states, and fully accounting for missed blinks and false positives, we present a method for computing the exact probability distribution for the number of observed localizations from a single photoswitching fluorophore. This is then extended to provide the probability distribution for the number of localizations in a direct stochastic optical reconstruction microscopy experiment involving an arbitrary number of molecules. We demonstrate that when training data are available to estimate photoswitching rates, the unknown number of molecules can be accurately recovered from the posterior mode of the number of molecules given the number of localizations. Finally, we demonstrate the method on experimental data by quantifying the number of molecules of the adapter protein linker for activation of T cells (LAT) on the cell surface of the T-cell immunological synapse.
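The step from a single-fluorophore distribution to a posterior over the number of molecules can be sketched as follows. This is an illustration under stated assumptions, not the method of the paper: molecules are assumed to blink independently (so the total count is a repeated convolution), the prior over the number of molecules is assumed flat on 1 to `max_molecules`, and `single_pmf` is a made-up placeholder rather than a distribution derived from photoswitching rates.

```python
def convolve(p, q):
    """Distribution of the sum of two independent non-negative integer counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def posterior_over_molecules(single_pmf, n_obs, max_molecules):
    """Return P(M = m | N = n_obs) for m = 1..max_molecules under a flat prior.

    single_pmf[k] is the (hypothetical) probability that one molecule
    produces k localizations.
    """
    likelihoods = []
    total_pmf = [1.0]  # total count for zero molecules is 0 with certainty
    for _ in range(max_molecules):
        total_pmf = convolve(total_pmf, single_pmf)
        likelihoods.append(total_pmf[n_obs] if n_obs < len(total_pmf) else 0.0)
    norm = sum(likelihoods)
    if norm == 0.0:
        raise ValueError("n_obs is impossible for every candidate molecule count")
    return [lik / norm for lik in likelihoods]


# Placeholder single-molecule distribution over 0, 1, 2 or 3 localizations.
single_pmf = [0.1, 0.3, 0.4, 0.2]
posterior = posterior_over_molecules(single_pmf, n_obs=17, max_molecules=30)
# Index 0 corresponds to one molecule, so shift by one to get the mode.
posterior_mode = 1 + max(range(len(posterior)), key=posterior.__getitem__)
```

In practice the single-molecule distribution would come from the fitted photoswitching model rather than a hand-written list, and an informative prior could replace the flat one.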
Fluorescing molecules (fluorophores) that stochastically switch between photon-emitting and dark states underpin some of the most celebrated advancements in super-resolution microscopy. While this stochastic behavior has been heavily exploited, full characterization of the underlying models can potentially drive forward further imaging methodologies. Under the assumption that fluorophores move between fluorescing and dark states as continuous time Markov processes, the goal is to use a sequence of images to select a model and estimate the transition rates. We use a hidden Markov model to relate the observed discrete time signal to the hidden continuous time process. With imaging involving several repeat exposures of the fluorophore, we show the observed signal depends on both the current and past states of the hidden process, producing emission probabilities that depend on the transition rate parameters to be estimated. To tackle this unusual coupling of the transition and emission probabilities, we conceive transmission (transition-emission) matrices that capture all dependencies of the model. We provide a scheme for computing these matrices and adapt the forward-backward algorithm to compute a likelihood that is readily optimized to provide rate estimates. When confronted with several model proposals, combining this procedure with the Bayesian Information Criterion provides accurate model selection.
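As a rough illustration of the likelihood-plus-BIC workflow, the sketch below uses a standard hidden Markov model rather than the transmission matrices described above. It makes a simplifying assumption that the paper explicitly does not: the emission at each frame depends only on the state at the end of that frame. The two-state model (dark and on), the rates, the frame length and the emission probabilities are all hypothetical.

```python
import math


def two_state_transition(rate_on, rate_off, delta):
    """Transition matrix over one frame of length delta for a two-state
    continuous time Markov chain (state 0 = dark, state 1 = on)."""
    total = rate_on + rate_off
    decay = math.exp(-total * delta)
    return [
        [(rate_off + rate_on * decay) / total, rate_on * (1.0 - decay) / total],
        [rate_off * (1.0 - decay) / total, (rate_on + rate_off * decay) / total],
    ]


def forward_log_likelihood(obs, init, trans, emit):
    """Scaled forward recursion; emit[state][symbol] is an emission probability."""
    n_states = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)  # rescale each step to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate a preferred model."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


# Hypothetical values: rates per second, a 0.05 s frame, and noisy detection.
trans = two_state_transition(rate_on=2.0, rate_off=5.0, delta=0.05)
init = [0.5, 0.5]
emit = [[0.95, 0.05],  # dark state: 5% false positives
        [0.10, 0.90]]  # on state: 10% missed detections
obs = [0, 0, 1, 1, 0, 1, 0, 0]  # 1 = signal detected in the frame
log_lik = forward_log_likelihood(obs, init, trans, emit)
score = bic(log_lik, n_params=2, n_obs=len(obs))
```

To estimate rates, `forward_log_likelihood` would be maximized over `rate_on` and `rate_off`, and competing models would be compared through their `bic` scores at the fitted parameters.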