In the previous blog post, we talked about the hazard function - the probability that the event happens at time $t$ given that it hasn’t happened up to time $t$. In the Cox proportional hazards model, we assume the hazard function is:
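(This is the standard Cox form, written in the usual notation where $h_0(t)$ is the baseline hazard and $x$ the covariates; the post’s own notation may differ.)

$$
h(t \mid x) = h_0(t)\, e^{\beta^\top x}
$$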
I recently gave a short intro to survival models to the team as part of a knowledge share session. The goal was to motivate why we should care about censored models.
I did a little bit of work recently to help Wake County in North Carolina allocate poll observers to polling stations. One of the reasons poll observers cancel is that the polling station ends up being too far to travel to. Now, if we think about this before we send them an allocation, we can find an optimal allocation using one of the many matching algorithms.
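For a flavour of what such a matching looks like, here is a minimal sketch using `scipy.optimize.linear_sum_assignment` on a made-up distance matrix; the numbers are placeholders and this isn’t the exact setup used for Wake County.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical travel distances (miles): rows are observers, columns are polling stations.
distances = np.array([
    [ 4.0, 12.5,  9.1],
    [ 8.2,  3.3, 15.0],
    [11.7,  7.8,  2.4],
])

# Find the assignment of observers to stations that minimises total travel distance.
rows, cols = linear_sum_assignment(distances)
for observer, station in zip(rows, cols):
    print(f"observer {observer} -> station {station} ({distances[observer, station]:.1f} miles)")
print("total distance:", distances[rows, cols].sum())
```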
We’ve been doing some work with Delhi on COVID response and thinking a lot about positivity rate and optimal testing. Say you want to catch the maximum number of positive cases, but you have no idea what the positivity rate is in each ward of the city; you do expect wards close to each other to have similar rates. You have a limited number of tests. How do you optimally allocate these to each ward to maximise the number of positive cases you catch?
State space models (SSM) are a tonne of fun. I sneaked one into this post I did a while ago. In that post, I was recreating an analysis but using a state space model where the hidden states, the true $\beta$s, followed a Gaussian random walk and the observations were GDP growth. In this post, I’m going to explore a generalised version of the model - the linear-Gaussian SSM (LG-SSM).
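As a reference point, the generic LG-SSM can be written as (my notation here; the post may parameterise it differently):

$$
\begin{aligned}
z_t &= A z_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, Q) \\
y_t &= C z_t + \delta_t, \qquad \delta_t \sim \mathcal{N}(0, R)
\end{aligned}
$$

where $z_t$ is the hidden state and $y_t$ the observation; the Gaussian-random-walk $\beta$s above are the special case $A = I$.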
In part 1, we looked at Empirical Bayes Large-Scale Testing where we defined the data generating process as a mixture model. In this post, instead of empirically estimating $S(z)$, we assume it’s a mixture of two Gaussians and define a mixture model in pymc3. We finish by considering the local false discovery rate, which has a much cleaner Bayesian interpretation.
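A minimal pymc3 sketch of the kind of two-component Gaussian mixture described here (the simulated z-values, variable names, and priors are my own placeholders, not the exact model in the post):

```python
import numpy as np
import pymc3 as pm

# Stand-in z-values: a large null component plus a smaller "interesting" component.
z = np.concatenate([np.random.normal(0, 1, 900), np.random.normal(2.5, 1, 100)])

with pm.Model() as model:
    # Mixture weights: null vs non-null component.
    w = pm.Dirichlet("w", a=np.ones(2))
    # Component means and spreads (in practice you might pin the null at N(0, 1)).
    mu = pm.Normal("mu", mu=0.0, sigma=5.0, shape=2)
    sigma = pm.HalfNormal("sigma", sigma=5.0, shape=2)
    # Observed z-values modelled as a two-component Gaussian mixture.
    obs = pm.NormalMixture("obs", w=w, mu=mu, sigma=sigma, observed=z)
    trace = pm.sample(1000, tune=1000)
```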
We take a short detour from Bayesian methods to talk about large-scale hypothesis testing. You all are probably quite familiar with the p-hacking controversy and the dangers of multiple testing. This post isn’t about that. What if you are not confirming a single hypothesis but want to find a few interesting “statistically significant” estimates in your data to direct your research?
In frequentist statistics, you want to know how seriously you should take your estimates. That’s easy if you’re doing something straightforward like averaging:
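(Presumably the excerpt leads into the sample mean and its standard error; in the usual notation:)

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \widehat{\mathrm{se}}(\bar{x}) = \frac{\hat{\sigma}}{\sqrt{n}}
$$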
In the last two posts, we explored some features of the logit choice model. In the first, we looked at systematic taste variation and how that can be accounted for in the model. In the second, we explored one of the nice benefits of the IIA assumption - we provided a random subset of alternatives of varying size to each decision maker and were able to use that to estimate the parameters.
In the last post, we talked about how this property of Independence from Irrelevant Alternatives (IIA) may not be realistic (see the red bus / blue bus example). But if you are comfortable with it, and with the proportional substitution it implies, you get to use some nice tricks.
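For context, IIA falls straight out of the logit choice probabilities: the ratio of the probabilities of any two alternatives doesn’t depend on what else is in the choice set (my notation here, roughly following Train):

$$
P_{ni} = \frac{e^{V_{ni}}}{\sum_{j} e^{V_{nj}}}, \qquad \frac{P_{ni}}{P_{nk}} = \frac{e^{V_{ni}}}{e^{V_{nk}}}
$$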
I’ve been working my way through Kenneth Train’s “Discrete Choice Methods with Simulation” and playing around with the models and examples as I go. Kind of what I did with Mackay’s book. This post and the next are some key takeaways, with code, from chapter 3 - Logit.
We use things without knowing how they work. Last time my fridge stopped working, I turned it off and on again to see if that fixed it. When it didn’t, I promptly called the “fridge guy”. If you don’t know how things work, you don’t know when and how they break, and you definitely don’t know how to fix them.
I just wanted to put up a few animations of HMC and slice samplers that I have been playing around with.
In this post, I just implement a Gibbs and a slice sampler for a non-totally-trivial distribution. Both of these are vanilla versions – no overrelaxation for Gibbs, and no elliptical slice sampling, rectangular hyper-boxes, etc. for the slice sampler. I am hoping you never use these IRL. It is a good intro though.
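For a flavour of how little code the vanilla slice sampler needs, here is a minimal univariate sketch with stepping-out and shrinkage; the target density is a toy stand-in, not the distribution used in the post.

```python
import numpy as np

def slice_sample(logp, x0, n_samples, w=1.0, rng=None):
    """Univariate slice sampler with stepping-out and shrinkage."""
    rng = rng or np.random.default_rng()
    x, samples = x0, []
    for _ in range(n_samples):
        # Auxiliary "height" variable drawn under the (log) density.
        log_y = logp(x) + np.log(rng.random())
        # Step out to find an interval containing the slice.
        left = x - w * rng.random()
        right = left + w
        while logp(left) > log_y:
            left -= w
        while logp(right) > log_y:
            right += w
        # Shrinkage: sample uniformly, shrinking the interval on each rejection.
        while True:
            x_new = rng.uniform(left, right)
            if logp(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return np.array(samples)

# Example: sample from a standard normal (log density up to a constant).
draws = slice_sample(lambda x: -0.5 * x**2, x0=0.0, n_samples=2000)
```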
Following David Mackay’s book along with his videos online has been a real joy. In lecture 11, as an example of an inference problem, he goes over many variations of the k-means algorithm. Let’s check these out.
There are two ways of learning and building intuition. From the top down, like fast.ai believes, and the bottom up, like Andrew Ng’s deep learning course on coursera. I’m not sure what my preferred strategy is.
Last month, I did a post on how you could setup your HMM in pymc3. It was beautiful, it was simple. It was a little too easy. The inference button makes setting up the model a breeze. Just define the likelihoods and let pymc3 figure out the rest.
A colleague of mine came across an interesting problem on a project. The client wanted an alarm raised when the number of problem tickets coming in increased “substantially”, indicating some underlying failure. So there is some standard rate at which tickets are raised, and when something has failed or there is a serious problem, a tonne more tickets are raised. Sounds like a perfect problem for a Hidden Markov Model.
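Here is a minimal simulation of that kind of data generating process: a two-state Markov chain (normal vs failure) with ticket counts drawn from a state-dependent Poisson. The rates and transition probabilities are made up for illustration, not taken from the project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state HMM: state 0 = normal operation, state 1 = underlying failure.
transition = np.array([[0.95, 0.05],   # mostly stay in the normal state
                       [0.20, 0.80]])  # failures persist for a while
rates = np.array([5.0, 40.0])          # mean tickets per period in each state

n_periods = 200
states = np.zeros(n_periods, dtype=int)
for t in range(1, n_periods):
    states[t] = rng.choice(2, p=transition[states[t - 1]])

# Observed ticket counts: Poisson with the rate of the hidden state.
tickets = rng.poisson(rates[states])
```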
If you want to measure the causal effect of a treatment, what you need is a counterfactual: what would have happened to the units if they had not received the treatment? Unless your unit is Gwyneth Paltrow in Sliding Doors, you only observe one state of the world. So the key to causal inference is to reconstruct the untreated state of the world. Athey et al. in their paper show how matrix completion can be used to estimate this unobserved counterfactual world. You can treat the unobserved (untreated) states of the treated units as missing and use a penalized SVD to reconstruct these from the rest of the dataset. If you are familiar with the econometric literature on synthetic controls, fixed effects, or unconfoundedness, you should definitely read the paper; it shows these as special cases of matrix completion with missing data of a specific form. Actually, you should read the paper anyway. Most of it is quite approachable and it’s very insightful.
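To make the “penalized SVD” idea concrete, here is a minimal soft-impute-style sketch (iterative SVD with soft-thresholding of the singular values) for filling in missing entries; it is only illustrative, not Athey et al.’s exact estimator, and the data below are random placeholders.

```python
import numpy as np

def soft_impute(Y, mask, lam=1.0, n_iters=100):
    """Fill missing entries of Y (mask == False) by iteratively
    soft-thresholding the singular values of the completed matrix."""
    X = np.where(mask, Y, 0.0)                       # start with zeros in the missing cells
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s_thresh = np.maximum(s - lam, 0.0)          # penalize the singular values
        X_low_rank = (U * s_thresh) @ Vt
        X = np.where(mask, Y, X_low_rank)            # keep observed entries fixed
    return X_low_rank

# Example: outcomes for units x time, with the treated units' post-treatment
# cells marked as missing (the unobserved untreated counterfactuals).
Y = np.random.default_rng(1).normal(size=(50, 30))
mask = np.ones_like(Y, dtype=bool)
mask[:10, 20:] = False                               # "treated from period 20 onwards"
Y_hat = soft_impute(Y, mask, lam=5.0)
```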
David MacKay’s Information Theory, Inference, and Learning Algorithms, in addition to being very well written and insightful, has exercises that read like a book of puzzles. Here’s one I came across in chapter 2:
Hat tip to @mkessler_DC for the clickbaitey title.
I’ve been reading Efron & Hastie’s Computer Age Statistical Inference (CASI) in my downtime. Actually, I’m doing better than reading. I don’t know why I didn’t think of this earlier - the best way to truly understand the material is to have your favourite statistical package open and actually play around with the examples as you go.
I have been slowly working my way through Efron & Hastie’s Computer Age Statistical Inference (CASI). An electronic copy is freely available and so far it has been a great read, though at times I get lost in the derivations.
This post gives the Fader and Hardie (2005) model the full Bayesian treatment. You can check out the notebook here.
In chapter 2 of BDA3, the authors provide an example where they regularize the cancer rates in counties in the US using an empirical Bayesian model. In this post, I repeat the exercise using county-level data on suicides by firearms and by other means.
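Roughly, the BDA3 setup is a gamma-Poisson model for the county rates, and the suicide version here follows the same structure (my sketch of the notation, with $n_j$ the county’s exposure):

$$
y_j \sim \mathrm{Poisson}(n_j \theta_j), \qquad \theta_j \sim \mathrm{Gamma}(\alpha, \beta)
$$

with $(\alpha, \beta)$ estimated from the data, so each county’s posterior rate is $\theta_j \mid y_j \sim \mathrm{Gamma}(\alpha + y_j,\, \beta + n_j)$, shrinking small counties toward the overall rate.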
Anyone else feel that US mass shootings have increased over the past few years? My wife thinks that it’s just the availability heuristic at play. Well, luckily there is data out there that we can use to test it. The analysis in this post uses the dataset from Mother Jones. I did some minor cleaning that you can see in the notebook.
I did a quick intro to Gaussian processes a little while back. Check that out if you haven’t.
This is an implementation of SGDR based on this paper by Loshchilov and Hutter. Though cosine annealing, which handles the learning rate (LR) decay, is now built into PyTorch, the restart schedule, and with it the update to the decay cycle, is not (though PyTorch 0.4 came out yesterday and I haven’t played with it yet). The notebook that generates the figures in this post can be found here.
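A minimal sketch of what the restart logic looks like on top of PyTorch’s built-in `CosineAnnealingLR`: decay within each cycle, then recreate the scheduler (and stretch the cycle length) at each restart. The model, cycle lengths, and LR below are placeholders, not the settings from the notebook.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Linear(10, 1)                          # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

cycle_len, t_mult = 10, 2                         # epochs per cycle, cycle-length multiplier
scheduler = CosineAnnealingLR(optimizer, T_max=cycle_len)
epochs_in_cycle = 0

for epoch in range(70):
    # ... one epoch of training with `optimizer` goes here ...
    scheduler.step()                              # cosine decay within the cycle
    epochs_in_cycle += 1
    if epochs_in_cycle == cycle_len:              # end of cycle: warm restart
        cycle_len *= t_mult
        epochs_in_cycle = 0
        scheduler = CosineAnnealingLR(optimizer, T_max=cycle_len)  # LR jumps back up
```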
This post is an intro to Gaussian Processes.
I recently had to create a bunch of maps for work. I did a bunch of maps of India in d3.js a while back for the CEA’s office, and some (in non-interactive form) were included in the Indian Economic Survey.
I imagine most of you have some idea of Monte Carlo (MC) methods. Here we’ll try and quantify it a little bit.
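The quantification, roughly: the MC estimate of an expectation and the $1/\sqrt{N}$ rate at which its error shrinks,

$$
\hat{I}_N = \frac{1}{N}\sum_{i=1}^{N} f(x_i), \qquad x_i \sim p, \qquad \mathrm{se}(\hat{I}_N) = \frac{\sigma_f}{\sqrt{N}}
$$

where $\sigma_f^2 = \mathrm{Var}_p[f(X)]$.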
A lot of this material is from Larry Wasserman’s All of Statistics. I love how the title makes such a bold claim and then quickly hedges by adding the subtitle “A Concise Course in Statistical Inference” (the italics are mine).
If you didn’t see Part 1, check that out first.
I was going to dive straight into it but thought I should go over Simulated Annealing (SA) first before connecting them. SA is a heuristic optimization algorithm for finding the global minimum of some complex function $f(X)$ which may have a bunch of local ones. Note that $X$ can be a vector of length $N$: $X = [x_1, x_2, \dots, x_N]$
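Here is a minimal SA sketch: propose a random neighbour, always accept downhill moves, accept uphill moves with probability $e^{-\Delta / T}$, and slowly lower the temperature. The toy objective and cooling schedule are placeholders, not the exact setup used later in the post.

```python
import numpy as np

def simulated_annealing(f, x0, n_iters=10000, step=0.5, t0=1.0, cooling=0.999, rng=None):
    """Minimise f by random proposals, accepting uphill moves with prob exp(-delta / T)."""
    rng = rng or np.random.default_rng()
    x, fx, temp = np.asarray(x0, dtype=float), f(x0), t0
    best_x, best_f = x.copy(), fx
    for _ in range(n_iters):
        x_new = x + rng.normal(scale=step, size=x.shape)   # random neighbour
        f_new = f(x_new)
        delta = f_new - fx
        if delta < 0 or rng.random() < np.exp(-delta / temp):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        temp *= cooling                                     # cool down
    return best_x, best_f

# Toy multi-modal objective in 2D.
f = lambda x: np.sum(x**2) + 3 * np.sin(3 * x).sum()
x_best, f_best = simulated_annealing(f, x0=np.array([4.0, -3.0]))
```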