Sub-Pixel Cross-Correlations
This update concerns finding the sub-pixel peak of a cross-correlation surface between two images by using image interpolation. For an detailed example of using a quadratic fit to estimate the location of this peak, see the previous update. The quadratic fit in 6 parameters does not appear to be sufficient to accurately locate the XC peak. For example, here is a pair of test images, where the right one has been shifted by (10.625, -10.375) pixels (a 1/8th pixel resolution). This is an exact shift, as both images were created using continuous functions (superimposed 2d Gaussians). Thus, each image is a continuous function sampled at a discrete regular grid of 512x512 points:

A 32x32 subimage (domain) extracted from the shifted image was cross-correlated with a 96x96 subimage (range) from the first image using integer shifts and direct normalized parametric correlation. The peak of the resulting surface was found at:
xc(65, 65) max = 0.998819 at: 11, -11
A 6 parameter elliptical paraboloid was fit to an 11x11 region around this peak, yielding the following values:
11x11 quadfit: bias=0.998242, xcenter=10.2893, ycenter=-10.8163, xscale=-0.00150941, yscale=-0.00879815, angle=-1.25059

The innaccuracy of this result is typical for quadractic fits using smaller or larger regions around the integer peak location:
3x 3 quadfit: bias=1.00009, xcenter=10.6447, ycenter=-10.5465, xscale=-0.00966501, yscale=-0.00210171, angle=0.312934 5x 5 quadfit: bias=1.00009, xcenter=10.5489, ycenter=-10.4866, xscale=-0.00172648, yscale=-0.00948908, angle=-1.26098 7x 7 quadfit: bias=1.00006, xcenter=10.474, ycenter=-10.5833, xscale=-0.00166612, yscale=-0.0093413, angle=-1.25798 9x 9 quadfit: bias=0.999622, xcenter=10.3854, ycenter=-10.6971, xscale=-0.0091127, yscale=-0.00159235, angle=0.316312 11x11 quadfit: bias=0.998242, xcenter=10.2893, ycenter=-10.8163, xscale=-0.00879815, yscale=-0.00150941, angle=-2.82138 13x13 quadfit: bias=0.995309, xcenter=10.1913, ycenter=-10.9303, xscale=-0.00142313, yscale=-0.00840713, angle=-1.2462 15x15 quadfit: bias=0.990296, xcenter=10.0962, ycenter=-11.0298, xscale=-0.00796007, yscale=-0.00133873, angle=3.47123 17x17 quadfit: bias=0.982878, xcenter=10.008, ycenter=-11.1069, xscale=-0.00748189, yscale=-0.00126005, angle=0.335491 19x19 quadfit: bias=0.972976, xcenter=9.9303, ycenter=-11.1558, xscale=-0.00118934, yscale=-0.00699593, angle=-1.22856 21x21 quadfit: bias=0.960718, xcenter=9.86598, ycenter=-11.1727, xscale=-0.00652047, yscale=-0.00112746, angle=0.349843
One could of course experiment with other fitting functions (e.g. asymmetric Gaussians), but there may be little to be gained from this approach. Instead, I have tried a more obvious 'brute force' approach to achieving sub-pixel estimates by enlarging the images to be cross-correlated. However, simply duplicating the pixels does not yield any increase in accuracy. Here is an example of cross-correlating two subimages which have been enlarged 8x by pixel duplication:
xc took 0 secs xc(65, 65) max = 0.998819 at: 11, -11 xc took 226 secs xci(513, 513) max = 0.998819 at: 11, -11

Note that although the resulting XC surface appears nice and smooth, it does not contain any additional information over the non-enlarged result. Note also that while the initial XC calculation took less than one second, the enlarged calculation took almost 4 minutes. What is needed is a way to interpolate the subimages before performing the enlarged XC so as to find a new peak at a different location in the resulting surface. There are zillions of ways to interpolate an image, and a Google search will find some of them. Although some claim to be better than others for specific purposes, bear in mind that they all share a defect in that they involve creating fake data where none existed before. With that said, my attitude as an engineer is to start by trying the simplest of these (also the computationally fastest) until one is found that does the job. To this end I have created a very simple function that interpolates an image by powers of 2 (actually, by (size * 2) - 1 at each step) which works as follows:

At each step, 5 new pixels are created between every 4 existing pixels. Four of the new pixels (on the X and Y gridlines) are the average of the two pixels nearest them (on the same X or Y gridline). The new center pixel is the average of the four corner pixels. This interpolation scheme has the following effect on the enlarged XC calculation:
xc took 0 secs xc(65, 65) max = 0.998819 at: 11, -11 xc took 213 secs xci(513, 513) max = 0.999999 at: 10.625, -10.375

Note that not only does this change yield the correct shift result, but the match is now exact (0.999999... = 1). Pretty good for such a simple interpolation method! However, the 8x enlarged XC calculation still takes a very long time. To address this problem, I have created an iterated interpolation and XC calculation that performs a 2x interpolation and XC of both subimages within only the new 5x5 region around the peak of the previous XC surface. Thus, at each iteration, a new bit of information in the overall fractional shift is obtained (e.g. {1, 0.5, 0.25, 0.125, 0.0625...} pixel shifts), without the need to perform the direct XC computation for the entire interpolated subimages. This iterated calculation produces the same result in a fraction of the time:
xc took 0 secs xc(65, 65) max = 0.998819 at: 11, -11 n = 0, sub2i(63, 63), sub1i(67, 67) xci(5, 5) max = 0.999752 at: -1, 1 n = 1, sub2i(125, 125), sub1i(129, 129) xci(5, 5) max = 0.999886 at: 1, 0 n = 2, sub2i(249, 249), sub1i(253, 253) xci(5, 5) max = 0.999999 at: -1, 1 n = 3, sub2i(497, 497), sub1i(501, 501) xci(5, 5) max = 0.999999 at: 0, 0 xc took 1 secs total shift = (10.625, -10.375)

In this case, 4 iterations (1/16th pixel resolution) took about 1 second (which also included the time to plot the images). At each iteration, the subimages are interpolated by a factor of 2x, and a new XC is performed in the 5x5 region around the previous peak. There is no limit to the number of iterations that could be performed, each yielding another factor of 2x in the resulting resolution. However, the time and memory required at each step increases at least as O(n^2) (and possibly O(n^4)), so that in practice the effective resolution is currently limited to about 6-7 steps.
To test the robustness of this method, I created a set of multiple tests that extracted random subimages of size 16x16 up to 64x64 from the shifted image and performed iterated XCs wrt subimages of 3x that size in the unshifted image. Here are two excerpts from a 4 iteration test:
n x0 y0 dx dy shiftx shifty --- --- --- -- -- ---------- ---------- 1 150 25 20 20 10.625000 -10.375000 2 308 110 62 62 10.625000 -10.375000 3 269 156 56 56 10.625000 -10.375000 4 193 110 54 54 10.625000 -10.375000 5 213 88 39 39 10.625000 -10.375000 6 108 288 51 51 10.625000 -10.375000 7 286 367 62 62 10.625000 -10.375000 8 91 413 22 22 10.625000 -10.375000 9 295 237 22 22 10.625000 -10.375000 10 173 168 50 50 10.625000 -10.375000 ... 91 387 453 20 20 10.625000 -10.375000 92 186 216 38 38 10.625000 -10.375000 93 325 71 36 36 10.625000 -10.375000 94 152 218 64 64 10.625000 -10.375000 95 267 453 28 28 10.625000 -10.375000 96 181 329 31 31 10.625000 -10.375000 97 467 78 16 16 10.625000 -10.375000 98 180 347 45 45 10.625000 -10.375000 99 374 294 62 62 10.625000 -10.375000 100 299 273 48 48 10.625000 -10.375000 run took 53 sec, 0.53 secs / xc
The entire results can be seen here. In all cases the correct shift was found. To further test the method, I created an additional set of slightly more complicated (multiple Gaussians of varying size) shifted and unshifted images as shown below. The right image is shifted by (10 + 63/64, -(10 + 1/64)) pixels:

A 6 iteration set of XC tests on these two images produced the following output:
n x0 y0 dx dy shiftx shifty --- --- --- -- -- ---------- ---------- 1 88 162 60 60 10.984375 -10.015625 2 329 254 37 37 10.984375 -10.015625 3 422 159 24 24 10.984375 -10.015625 4 213 400 53 53 10.984375 -10.015625 5 38 206 21 21 10.984375 -10.015625 6 274 30 30 30 10.984375 -10.015625 7 157 332 60 60 10.984375 -10.015625 8 312 127 55 55 10.984375 -10.015625 9 264 196 42 42 10.984375 -10.015625 10 352 215 60 60 10.984375 -10.015625 ... 91 138 340 52 52 10.984375 -10.015625 92 72 398 42 42 10.984375 -10.015625 93 365 48 45 45 10.984375 -10.015625 94 128 92 45 45 10.984375 -10.015625 95 196 326 56 56 10.984375 -10.015625 96 220 355 50 50 10.984375 -10.015625 97 355 170 33 33 10.984375 -10.015625 98 314 80 59 59 10.984375 -10.015625 99 123 38 25 25 10.984375 -10.015625 100 56 74 33 33 10.984375 -10.015625 run took 472 sec, 4.72 secs / xc
The entire results can be seen here. Again, in all cases the correct shift was found. As a final test, I extracted a subset of one of the processed composite Venus images and shifted it by an irrational amount of (3.14159265358979, -2.71828182846) using FITSRegister and the FFT sinc shift:

I then performed multiple tests on the shifted/unshifted subsets using only 32x32 domains and 96x96 ranges, with 6 levels of iteration:
n x0 y0 dx dy shiftx shifty --- --- --- -- -- ---------- ---------- 1 53 42 32 32 3.140625 -2.718750 2 42 98 32 32 3.140625 -2.718750 3 62 48 32 32 3.140625 -2.718750 4 83 93 32 32 3.140625 -2.718750 5 45 62 32 32 3.140625 -2.718750 6 86 89 32 32 3.140625 -2.718750 7 64 61 32 32 3.140625 -2.718750 8 73 52 32 32 3.140625 -2.718750 9 48 95 32 32 3.140625 -2.718750 10 63 46 32 32 3.140625 -2.718750 run took 24 sec, 2.4 secs / xc
These results are well within 1/64th pixel of accuracy. Of course the real test will be to try this method on different Venus images which are separated in time. However, in that case I don't know how the result will be judged, since I doubt it is possible to align image subsets to 1/64th pixel by eye.