Singular Value Decomposition applied to Cluster Markov Chains

=**Abell 2631** (for fgas paper)=
This example shows the correlation plots for Abell 2631, for an X-ray Markov chain with __4 free parameters (ne0, rs, n and Te0)__. Clearly, several of the parameters are strongly correlated. The existence of correlations between parameters doesn't constitute a problem in and of itself -- they're a "feature" of all commonly used cluster models -- but these strongly correlated parameters do cause the fitting process to be very slow and inefficient. Specifically, our joint X-ray/SZ Markov chains typically have an acceptance probability of order 1%, meaning that about 99% of the trial parameter sets are rejected, and about 99% of the computing time is spent calculating likelihoods for parameters that have no hope of working. Surely there must be a better way?

There is a better way: reparameterization using singular value decomposition (SVD). Jeff Kolodziejczak described the approach in the writeup that was sent around on 9/15/2011; in brief, the idea is to construct a linear transformation of the original variables to an orthogonal basis set, with the added feature that the orthogonal parameters are ranked in order of numerical importance to the problem at hand. Here are the correlation plots for the same Abell 2631 X-ray dataset, the only difference being that we are now using the orthogonal SVD parameters (here SVD0, SVD1, SVD2, and SVD3) when we evaluate the likelihoods for the Markov chain. Strongly correlated parameters are now a thing of the past, and the acceptance rate of the Markov chain is close to the optimum value of ~30%, a large improvement in computational efficiency.
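As a rough illustration of this kind of reparameterization, the sketch below builds an orthogonal, ranked basis from the sample covariance of a pilot chain. This is only one possible construction and is an assumption here; the writeup mentioned above may derive the transformation differently. The names `svd_basis`, `to_svd` and `from_svd` and the synthetic two-parameter "pilot chain" are hypothetical.

```python
import numpy as np

def svd_basis(samples):
    """Return (mean, U, s) for a pilot chain of shape (n_samples, n_params).

    Assumption: the basis is taken from the SVD of the sample covariance.
    """
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    U, s, _ = np.linalg.svd(cov)  # singular values come back sorted, largest first
    return mean, U, s

def to_svd(theta, mean, U):
    """Original parameters -> orthogonal SVD parameters."""
    return (theta - mean) @ U

def from_svd(q, mean, U):
    """Orthogonal SVD parameters -> original parameters."""
    return mean + q @ U.T

# Synthetic stand-in for a pilot chain with two strongly correlated parameters.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.998],
                     [0.998, 1.0]])
pilot = rng.multivariate_normal([5.0, 9.0], true_cov, size=5000)

mean, U, s = svd_basis(pilot)
q = to_svd(pilot, mean, U)

corr_original = np.corrcoef(pilot, rowvar=False)[0, 1]
corr_svd = np.corrcoef(q, rowvar=False)[0, 1]
roundtrip = from_svd(q, mean, U)

print(f"correlation, original parameters: {corr_original:.3f}")
print(f"correlation, SVD parameters:      {corr_svd:.1e}")
```

In this picture the Markov chain would propose steps in `q` and call `from_svd` before each likelihood evaluation. Parameters of very different magnitude (such as ne0 and Rs here) would probably need to be rescaled before the decomposition; that step is left out of the sketch.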

Taking the SVD further: the strong degeneracy between the original model parameters r_scale and n (see triangle plot at the top of the page) suggests that we have one too many free parameters for this particular dataset. The parameters generated by the singular value decomposition are ranked by numerical importance to the problem at hand, and you can see from examining the scales on the SVD correlation plots above that the magnitude of the last parameter (SVD3) is much smaller than that of the first parameter (SVD0). We can drop the lowest-rank SVD parameter from the calculation (thus reducing the dimension of the space that the Markov chain must explore), and see what effect this has on the quality of the fit to the X-ray data:


**A2631: Best fit to the X-ray surface brightness and kT profile data using 4 SVD parameters.**





Acceptance rates: //original chain//:4% //4-parameter SVD// : 28% //3-parameter SVD//: 86%
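Dropping the lowest-ranked SVD parameter, as in the 3-parameter chain, can be sketched as follows. This is a minimal illustration on synthetic data, assuming the basis comes from a pilot-chain covariance and that the dropped coordinate is simply held at zero (that is, at the pilot mean along that direction); neither assumption is confirmed by the page. All names are hypothetical.

```python
import numpy as np

# Synthetic 3-parameter "pilot chain" in which the first two parameters are nearly degenerate.
rng = np.random.default_rng(1)
true_mean = np.array([5.0, 9.0, 8.0])
true_cov = np.array([[1.0, 0.998, 0.3],
                     [0.998, 1.0, 0.3],
                     [0.3, 0.3, 1.0]])
pilot = rng.multivariate_normal(true_mean, true_cov, size=5000)

mean = pilot.mean(axis=0)
U, s, _ = np.linalg.svd(np.cov(pilot, rowvar=False))

k = 2            # number of SVD parameters kept
U_k = U[:, :k]   # leading k basis vectors (largest singular values)

def reduced_to_full(q_reduced):
    """Map k SVD coordinates to the full parameter vector; dropped coordinates are zero."""
    return mean + q_reduced @ U_k.T

q_reduced = (pilot - mean) @ U_k
approx = reduced_to_full(q_reduced)

dropped_fraction = s[k:].sum() / s.sum()            # share of total variance discarded
rms_error = np.sqrt(np.mean((approx - pilot) ** 2))  # reconstruction error in parameter units

print(f"singular values: {s}")
print(f"variance dropped: {dropped_fraction:.2%}, rms reconstruction error: {rms_error:.3f}")
```

The chain would then explore only the `k` coordinates in `q_reduced`, with the likelihood evaluated at `reduced_to_full(q_reduced)`. Whether the quality of the fit survives the truncation still has to be checked against the data, as done on this page.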

**Best-fit parameters**
|| __Parameter__ || __Original chain__ || __Reparametrized (4x4)__ || __Reparametrized (4x3)__ ||
|| ne0 || 7.718e-03 +2.910e-04 -2.750e-04 || 7.704e-03 +2.850e-04 -2.740e-04 || 7.717e-03 +2.530e-04 -2.400e-04 ||
|| Rs(arcsec) || 2.449e+02 +6.770e+01 -4.660e+01 || 2.480e+02 +6.760e+01 -4.700e+01 || 2.376e+02 +5.770e+01 -3.300e+01 ||
|| n || 9.177e+00 +1.913e+00 -1.310e+00 || 9.273e+00 +1.907e+00 -1.339e+00 || 8.935e+00 +1.645e+00 -8.890e-01 ||
|| Tx0 || 8.426e+00 +7.200e-01 -7.450e-01 || 8.330e+00 +7.290e-01 -7.440e-01 || 8.334e+00 +7.610e-01 -7.060e-01 ||

**Correlation matrices**
A2631, original: (ne0,Rs(arcsec)):-0.852 (ne0,n):-0.825 (ne0,Tx0):0.296 (Rs(arcsec),n):0.998 (Rs(arcsec),Tx0):-0.328 (n,Tx0):-0.313

A2631, 4x4 reparametrized, SVD parameters: (SVD0,SVD1):0.013 (SVD0,SVD2):-0.138 (SVD0,SVD3):0.213 (SVD1,SVD2):0.020 (SVD1,SVD3):-0.044 (SVD2,SVD3):-0.147

A2631, 4x4 reparametrized, original parameters: (ne0,Rs(arcsec)):-0.848 (ne0,n):-0.824 (ne0,Tx0):0.311 (Rs(arcsec),n):0.998 (Rs(arcsec),Tx0):-0.368 (n,Tx0):-0.353

A2631, 4x3 reparametrized, SVD parameters: (SVD0,SVD1):-0.065 (SVD0,SVD2):-0.050 (SVD1,SVD2):0.096

A2631, 4x3 reparametrized, original parameters: (ne0,Rs(arcsec)):-0.796 (ne0,n):-0.761 (ne0,Tx0):0.205 (Rs(arcsec),n):0.998 (Rs(arcsec),Tx0):-0.301 (n,Tx0):-0.288

=Abell 2204 (for fgas paper)=

The original chain has 7 free parameters (only the index "p" was held fixed). As in the case of A2631, the parameters are strongly correlated.



The reparametrized chain with 7 SVD parameters fares much better in terms of the correlation among the new parameters.



Here is the reparametrized chain in which the 7th SVD parameter was dropped; it yields the same goodness of fit as the 7-parameter SVD chain.



In all three cases (original chain, 7-parameter SVD and 6-parameter SVD), the best fits are virtually identical. Acceptance rates: //original//: 4.5% //SVD 7-parameters//: 21% //SVD 6-parameters//: 26%

**Best-fit parameters**
These are the best-fit values for the parameters of the 3 chains; in all cases there is agreement in the best-fit values.

|| __Parameter__ || __Original chain__ || __SVD chain with 7 parameters__ || __SVD chain with 6 parameters__ ||
|| ne0 || 4.117e-02 +1.912e-03 -1.799e-03 || 4.118e-02 +2.290e-03 -2.510e-03 || 4.132e-02 +2.210e-03 -1.980e-03 ||
|| Rs(arcsec) || 2.260e+01 +1.222e+00 -1.204e+00 || 2.251e+01 +1.740e+00 -1.540e+00 || 2.242e+01 +1.320e+00 -1.340e+00 ||
|| n || 6.642e+00 +8.866e-01 -6.838e-01 || 6.760e+00 +1.186e+00 -8.160e-01 || 6.713e+00 +9.060e-01 -6.040e-01 ||
|| Sigma || 2.384e+00 +5.030e-02 -4.990e-02 || 2.375e+00 +6.100e-02 -6.000e-02 || 2.379e+00 +4.300e-02 -5.100e-02 ||
|| Tx0 || 1.511e+01 +6.670e-01 -7.190e-01 || 1.498e+01 +8.100e-01 -7.600e-01 || 1.501e+01 +7.900e-01 -7.700e-01 ||
|| r_cool || 1.991e+01 +5.570e-01 -5.020e-01 || 2.001e+01 +6.900e-01 -6.100e-01 || 1.995e+01 +5.200e-01 -5.500e-01 ||
|| a_cool || 1.655e-01 +7.160e-03 -7.070e-03 || 1.662e-01 +8.100e-03 -9.000e-03 || 1.672e-01 +8.100e-03 -8.200e-03 ||

**Correlation matrices**
Here we show how the SVD chains have removed the parameter correlation almost entirely; notice that we should not expect a correlation coefficient of exactly 0.0 for the SVD parameters, since the correlation among the original parameters was not exactly linear.
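For reference, pairwise coefficients in the `(a,b):value` form used on this page can be computed from chain samples along the following lines. This is a sketch only: the helper name `correlation_pairs` and the tiny hand-made sample array are hypothetical, and the page does not say which tool produced its numbers.

```python
import numpy as np
from itertools import combinations

def correlation_pairs(samples, names):
    """Format the upper triangle of the correlation matrix as '(a,b):r' entries.

    samples has shape (n_samples, n_params); names labels the columns.
    """
    corr = np.corrcoef(samples, rowvar=False)
    return " ".join(
        f"({names[i]},{names[j]}):{corr[i, j]:.3f}"
        for i, j in combinations(range(len(names)), 2)
    )

# Tiny illustrative "chain": column b is exactly 2 * column a.
samples = np.array([[1.0, 2.0, 1.0],
                    [2.0, 4.0, 0.0],
                    [3.0, 6.0, 1.0],
                    [4.0, 8.0, 0.0]])
line = correlation_pairs(samples, ["a", "b", "c"])
print(line)
```

Applying the same function to the chain expressed in the original parameters and in the SVD parameters gives the two kinds of lists shown in this section.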

__Original chain:__ (ne0,Rs(arcsec)):-0.955 (ne0,n):0.293 (ne0,Sigma):-0.384 (ne0,Tx0):-0.436 (ne0,r_cool):-0.873 (ne0,a_cool):0.907 (Rs(arcsec),n):-0.420 (Rs(arcsec),Sigma):0.522 (Rs(arcsec),Tx0):0.487 (Rs(arcsec),r_cool):0.758 (Rs(arcsec),a_cool):-0.883 (n,Sigma):-0.982 (n,Tx0):-0.767 (n,r_cool):-0.224 (n,a_cool):0.195 (Sigma,Tx0):0.787 (Sigma,r_cool):0.293 (Sigma,a_cool):-0.285 (Tx0,r_cool):0.365 (Tx0,a_cool):-0.289 (r_cool,a_cool):-0.660

__SVD chain, 7 parameters (original parameters):__ (ne0,Rs(arcsec)):-0.967 (ne0,n):0.435 (ne0,Sigma):-0.503 (ne0,Tx0):-0.466 (ne0,r_cool):-0.902 (ne0,a_cool):0.920 (Rs(arcsec),n):-0.511 (Rs(arcsec),Sigma):0.605 (Rs(arcsec),Tx0):0.515 (Rs(arcsec),r_cool):0.841 (Rs(arcsec),a_cool):-0.900 (n,Sigma):-0.969 (n,Tx0):-0.821 (n,r_cool):-0.437 (n,a_cool):0.258 (Sigma,Tx0):0.837 (Sigma,r_cool):0.504 (Sigma,a_cool):-0.334 (Tx0,r_cool):0.486 (Tx0,a_cool):-0.265 (r_cool,a_cool):-0.721

__SVD chain, 7 parameters (SVD parameters):__ (SVD0,SVD1):0.018 (SVD0,SVD2):-0.082 (SVD0,SVD3):0.271 (SVD0,SVD4):-0.003 (SVD0,SVD5):-0.174 (SVD0,SVD6):0.164 (SVD1,SVD2):0.183 (SVD1,SVD3):-0.019 (SVD1,SVD4):-0.035 (SVD1,SVD5):0.156 (SVD1,SVD6):-0.070 (SVD2,SVD3):-0.046 (SVD2,SVD4):0.013 (SVD2,SVD5):0.073 (SVD2,SVD6):-0.102 (SVD3,SVD4):-0.065 (SVD3,SVD5):-0.212 (SVD3,SVD6):0.049 (SVD4,SVD5):0.141 (SVD4,SVD6):-0.142 (SVD5,SVD6):-0.174

__SVD chain, 6 parameters (original parameters):__ (ne0,Rs(arcsec)):-0.960 (ne0,n):0.473 (ne0,Sigma):-0.524 (ne0,Tx0):-0.489 (ne0,r_cool):-0.847 (ne0,a_cool):0.885 (Rs(arcsec),n):-0.540 (Rs(arcsec),Sigma):0.627 (Rs(arcsec),Tx0):0.498 (Rs(arcsec),r_cool):0.751 (Rs(arcsec),a_cool):-0.872 (n,Sigma):-0.966 (n,Tx0):-0.766 (n,r_cool):-0.415 (n,a_cool):0.254 (Sigma,Tx0):0.780 (Sigma,r_cool):0.478 (Sigma,a_cool):-0.321 (Tx0,r_cool):0.482 (Tx0,a_cool):-0.220 (r_cool,a_cool):-0.579

__SVD chain, 6 parameters (SVD parameters):__ (SVD0,SVD1):0.008 (SVD0,SVD2):-0.054 (SVD0,SVD3):-0.102 (SVD0,SVD4):-0.029 (SVD0,SVD5):-0.320 (SVD1,SVD2):0.288 (SVD1,SVD3):0.154 (SVD1,SVD4):-0.253 (SVD1,SVD5):0.178 (SVD2,SVD3):0.099 (SVD2,SVD4):-0.104 (SVD2,SVD5):0.096 (SVD3,SVD4):-0.026 (SVD3,SVD5):-0.047 (SVD4,SVD5):-0.078

=**Abell 1835** (for pressure paper)=

Two chains have been run so far, to test whether the original and SVD-reparametrized chains give the same results:
* //Original chain//: X-ray + SZ data; 8 parameters + SZ normalization + 2 point source fluxes free (11 free parameters).
* //SVD chain//: same X-ray + SZ data; 8 parameters with SVD + SZ normalization (the 2 point sources were fixed, and the SZ normalization was not SVD'd over).

//Acceptance//: Original chain: 1.4% SVD chain: 6%

//Best-fit statistics//:
* Original chain: minimum chi^2 for the kT profile = 2.3; for the Sx profile = 69.0
* SVD chain: minimum chi^2 for the kT profile = 2.4; for the Sx profile = 68.9
Both chains find essentially the same minimum chi^2.

//Best-fit parameters//: We were especially interested in finding out whether the tight constraints on Sigma would loosen in the SVD chain, but the constraints are the same as in the original chain; there is also agreement on the other parameters.

|| __Parameter__ || __Original chain__ || __SVD chain__ ||
|| ne0 || 0.054+-0.003 || 0.055+-0.002 ||
|| Rs(arcsec) || 16.6+-1.0 || 16.4+-0.7 ||
|| n || 61+-14 || 56+-11 ||
|| Tx0 || 10.4+-0.4 || 10.3+-0.4 ||
|| r_cool || 17.2+-0.3 || 17.2+-0.3 ||
|| a_cool || 0.48+-0.03 || 0.49+-0.02 ||
|| p_cool || 4.5+-0.5 || 4.5+-0.4 ||
|| **Sigma** || **2.04+-0.01** || **2.04+-0.01** ||

=Abell 2204 (for pressure paper)=

We also tested the changes in parameters between the original chain and the SVD chain for A2204, the other cluster (besides A1835) with the highest S/N X-ray data, and therefore the most suitable for measuring all the poly parameters.

Acceptance:

Best-fit parameters

|| __Parameter__ || __Original chain__ || __SVD chain__ ||
|| neo || 4.1+-0.2 || 4.1+-0.2 ||
|| rs || 22.4+-1.2 || 22.9+-1.0 ||
|| n || 7.3+-1.1 || 7.3+-0.8 ||
|| To || 14.7+-0.8 || 14.9+-0.7 ||
|| rcool || 19.9+-0.5 || 19.9+-0.4 ||
|| alpha || 0.16+-0.007 || 0.17+-0.005 ||
|| Peo || 9.9+-0.6 || 9.7+-0.5 ||
|| **Sigma** || **2.34+-0.05** || **2.35+-0.05** ||

=Comparison of Y Calculations=

This table compares the measurements of r500 and Y for a few clusters between the original MCMCs and those run on the same data using the SVD. Note that the SVD chains have the position of the decrement fixed.


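|| __Cluster__ || __Chain__ || __r500__ || __YsphSZ(r500)__ || __YsphX(r500)__ || __ratio__ ||
|| **A1835** || Original chain || 370+-8 || 31.4+-1.5 || 29.7+-1.6 || 1.06+-0.06 ||
|| **A1835** || SVD chain || 365+-5 || 30.8+-1.52 || 28.5+-1.4 || 1.08+-0.04 ||
|| **A1835** || Agreement in values || <=1 sigma || <=1 sigma || <=1 sigma || <=1 sigma ||
|| **MACS0947** || Original chain || 196+-16 || 5.2+-0.7 || 6.0+-1.2 || 0.86+-0.14 ||
|| **MACS0947** || SVD chain || 206+-12 || 6.0+-0.7 || 6.7+-1.2 || 0.90+-0.12 ||
|| **MACS0947** || Agreement in values || <=1 sigma || <=1 sigma || <=1 sigma || <=1 sigma ||
|| **CL1226** || Original chain || 109+-8 || 3.3+-0.3 || 3.1+-0.5 || 1.09+-0.18 ||
|| **CL1226** || SVD chain || 105+-6 || 3.1+-0.3 || 2.9+-0.4 || 1.09+-0.13 ||
|| **CL1226** || Agreement in values || <=1 sigma || <=1 sigma || <=1 sigma || <=1 sigma ||
|| **A2204** || Original chain || 505+-12 || 45+-3 || 44+-3 || 1.02+-0.05 ||
|| **A2204** || SVD chain || 506+-11 || 44+-3 || 44+-3 || 1.00+-0.03 ||
|| **A2204** || Agreement in values || <=1 sigma || <=1 sigma || <=1 sigma || <=1 sigma ||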
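One simple way to quantify the level of agreement is sketched below for the r500 values, under the assumption that the difference between the two chains is compared to their uncertainties added in quadrature. The page does not state how agreement was actually judged, and since both chains use the same data their errors are not independent, so this is only a rough check. The function name `n_sigma` is hypothetical.

```python
import math

def n_sigma(a, err_a, b, err_b):
    """Difference between two measurements in units of their errors added in quadrature."""
    return abs(a - b) / math.hypot(err_a, err_b)

# r500 values from the table: (original, error, SVD, error)
r500 = {
    "A1835": (370, 8, 365, 5),
    "MACS0947": (196, 16, 206, 12),
    "CL1226": (109, 8, 105, 6),
    "A2204": (505, 12, 506, 11),
}

tension = {name: n_sigma(*values) for name, values in r500.items()}
for name, t in tension.items():
    print(f"{name}: {t:.2f} sigma")
```

With this definition all four r500 differences come out below 1 sigma, consistent with the table.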