Recently we uploaded the paper
- Josef Dick, Daniel Rudolf and Houying Zhu, Discrepancy bounds for uniformly ergodic Markov chain quasi-Monte Carlo. ArXiv, 19 March 2013. Submitted, 2013.
In a sense, this paper is an extension of the paper
- Su Chen, Josef Dick, Art Owen, Consistency of Markov Chain quasi-Monte Carlo for continuous state spaces. Ann. Stat., 39, 679–701, 2011. For a blog entry and preprint version see here.
In this paper we prove bounds on the discrepancy of Markov chains which are generated by a deterministic driver sequence . We also prove a Koksma-Hlawka inequality. The main assumption which we use is uniform ergodicity of the transition (Markov) kernel. We describe the essential ingredients and results in the following.
The transition kernel of a Markov chain
Let be a probability space. We want to generate a Markov chain in the state space whose distribution converges to the target distribution . The idea of the transition kernel is the following. Assume that is the th point of the Markov chain. Let be an arbitrary set. Then the probability that the next point of the Markov chain is in , is given by . For a precise definition of the transition kernel see Definition 2 on page 6 here.
Let be the starting point of our Markov chain. Then we use to sample the next point of our Markov chain. The point is then sampled from the distribution , and so on. Asymptotically we then get that the distribution of converges in some sense to the target distribution . One can also write down the transition kernel for going steps, denoted by , namely it can be defined recursively by
where is the indicator function of the set . Here, just means that we integration with respect to the probability measure .
This method is an important algorithm in simulation problems where it is not possible to generate samples which have distribution . Instead one generates a Markov chain which has target distribution and uses the chain to as the points for the simulation. Such algorithms are called Markov chain Monte Carlo.
There are many methods for generating the Markov chain, see for instance the Metropolis-Hastings algorithm or the slice sampler. However, those algorithms will not play a role in the following. We just assume that there is a function , which we call update function. This function is used to generate the Markov chain by , where is a uniformly distributed random number in .
Uniformly ergodic Markov chains
Uniform ergodicity is a property of the transition kernel which tells us how fast the distribution of converges to the target distribution . The assumption is of the following form
where and are constants. This assumption is actually very natural as we explain for a special case in the following.
If the transition kernel satisfies certain assumptions, we have
where are the eigenvalues and are orthonormal eigenfunctions of the integral operator (we assume that this operator is a compact operator). We have
Then it can be shown using the recursive formula for and the Karhunen-Loeve expansion of that
Thus converges exponentially fast to . (In fact, under certain assumptions, one can choose .) But we have , the stationary distribution.
The discrepancy of the Markov chain
We measure the discrepancy of the empirical distribution of the first points of the Markov chain from the target distribution . The empirical distribution for a set is
where is the indicator function of the set . The local discrepancy is
Note that for arbitrary sets one cannot get convergence in general (if say and is just the point set , then the local discrepancy is ). One example is to consider boxes of the form . For instance, one can consider the set of sets . We call the set the set of test sets. The discrepancy is now defined by
The main result of the paper is the proof that for any starting point there exists a finite driver sequence such that the Markov chain generated by this driver sequence is bounded by
for some constant independent of . See Corollary 5 here.
Several further extensions would be interesting. One is to remove the dependence of the driver sequence on the starting point. Further one would like to have a driver sequence which can be used for a large class of update functions. In practice one wants to have explicit constructions of such driver sequences. Ultimately one would also like to have driver sequences for which one can achieve a rate of convergence beyond the Monte Carlo rate, at least, for some restricted class of problems. Answers to these questions would be very interesting, but they seem to be quite challenging, at least at the moment.