## 2D Catmull-Rom in 4 samples

October 1, 2016 Catmull-Rom (left) vs bilinear (right) interpolation

Many higher-order interpolation schemes can directly be composed from simpler schemes. One such example is the cubic B-spline, which may be reformulated as a weighted sum of two linearly-interpolated samples. And being a separable kernel, it consequently only requires four bilinear samples in 2D and eight trilinear samples in 3D. See Bicubic Filtering in Fewer Tabs for more details. This optimization trick can be applied directly to all interpolation kernels that solely have positive filter coefficients. Here, I will introduce a way to also apply it to Catmull-Rom interpolation (which requires both positive and negative coefficients) that becomes possible after a simple data preprocess step. Go here for an example implementation in ShaderToy.

One-dimensional Catmull-Rom interpolation is calculated using the weighted sum $w_1 C_1+w_2 C_2+w_3 C_3+w_4 C_4$, where $C_1$, $C_2$, $C_3$ and $C_4$ are the control points (e.g. the image values), and The four original weight functions

$w_1=\frac{1}6 (-3 f^3+6f^2-3 f)$,
$w_2=\frac{1}6(9 f^3-15 f^2+6)$,
$w_3=\frac{1}6(-9 f^3+12 f^2+3 f)$ and
$w_4=\frac{1}6(3 f^3-3 f^2)$

are the weight coefficients. $f$ is assumed to be the desired interpolation position between $C_2$ (at $f=0$) and $C_3$ (at $f=1$). Looking at the plot of these four weight coefficients, we can easily verify that $w_1$ and $w_4$ are always nonpositive, while $w_2$ and $w_3$ are always nonnegative.

Linearly-interpolated sampling is effectively ‘lerping’ between neighboring values where $\textrm{lerp}(A,B,t) = (1-t) A + t B$ and $0\leq t \leq 1$ (as well as $0 \leq 1-t \leq 1$) any neighboring control point values $A$ and $B$. Consequently, the result of an (arbitrarily scaled) linearly-interpolated sample at some position $t$ can never equal a weighted sum of two control points with differently signed weights. Hence, in the case of Catmull-Rom interpolation, the control point pair $C_1$ and $C_2$ and the pair $C_3$ and $C_4$ cannot be weighted correctly using linear sampling. The best one can do in a direct way is to linearly interpolate the pair $C_2$ and $C_3$ (as they both have positive weights), and weigh $C_1$ and $C_4$ separately without any interpolation.

We can, however, do better than this by transforming the problem into an equivalent with better properties. Note that the interpolation formula $w_1 C_1+w_2 C_2+w_3 C_3+w_4 C_4$ is equivalent to $(s_1 (-C_1)+s_2 C_2) - (s_3 (-C_3)+s_4 C_4)$ where $s_1=-w_1$, $s_2=w_2$, $s_3=w_3$ and $s_4=-w_4$. Because $s_1$, $s_2$, $s_3$ and $s_4$ are all positive, the control points pair $-C_1$ and $C_2$ and the pair $-C_3$ and $C_4$ can, in contrast, be linearly interpolated. To see exactly how, we first note that $a A+b B = (a+b)*\textrm{lerp}(A,B,b/(a+b))$ for $a \neq -b$. And so,

$(s_1 (-C_1)+s_2 C_2) - (s_3 (-C_3)+s_4 C_4)$

may be rewritten to

$(s1+s2)*\textrm{lerp}(-C_1, C_2, s2/(s1+s2)) - ((s3+s4)*\textrm{lerp}(-C_3, C_4, s4/(s3+s4))$,

which is simply

$(w2-w1)*\textrm{lerp}(-C_1, C_2, w2/(w2-w1)) + (w4-w3)*\textrm{lerp}(-C_3, C_4, w4/(w4-w3))$. The new weight and position functions

Note that a naive implementation of $w_2/(w_2-w_1)$ and $w_4/(w_4-w_3)$ will cause a $0/0$ division at $f=1$ and $f=0,$ respectively. We can prevent this by simplify factoring out $1-f$ from the numerator and denominator of $w_2/(w_2-w_1)$, and $f$ from $w_4/(w_4-w_3)$. See the code listings below for more details.

We’ve now established that we can do Catmull-Rom interpolation between $C_2$ and $C_3$ based on two linearly-interpolated samples (i.e. the sum of the two scaled ‘lerp’s), assuming the left sample interpolates between $-C1$ and $C_2$ instead of $C_1$ and $C_2$, and the right sample interpolates between $-C3$ and $C_4$ instead of $C_3$ and $C_4$. This can be implemented by preprocessing the data to contain $-C1, C_2,-C_3,C_4$ instead of $C_1,C_2,C_3,C_4$. However, to interpolate between $C_3$ and $C_4$, we’d need the control points values $-C_2,C_3,-C_4,C_5$ instead. Luckily, these alternatedly-signed data sequences can be stored in one and the same data table (e.g. image) because interpolating over the control points values $-C2,C_3,-C_4$ and $C_5$ is equivalent to interpolating over the control points values $C_2,-C_3,C_4,-C_5$ and flipping the sign of the result. Consequently, we can correctly implement Catmull-Rom interpolation by pre-flipping the signs for all odd control points positions, as well as doing a conditional sign flip on the result based on the interpolation position. This could be implemented as follows:

// The original value of the i-th control point.
T[i] = …

// Precalculate the alternating-sign version of T[i].
P[i] = (-1)^i * T[i]

// Linearly interpolate between A (at t = 0) and B (at t = 1).
lerp(A, B, t) :=
return A * (1 - t) + B * t

// Return the linearly-interpolated value of P at t.
linear_sample(A, t) :=
return lerp(P[floor(t)], P[floor(t) + 1], u - floor(t))

// Return the value of T at position t using Catmull-Rom interpolation.
catmull_rom_sample(t) :=
f = t - floor(t)
s1 = ( 0.5 * f - 0.5) * f             // = w1 / (1-f)
s2 = (-1.5 * f + 1.0) * f + 1.0       // = w2 / (1-f)
s3 = (-1.5 * f + 2.0) * f + 0.5       // = w3 / f
s4 = ( 0.5 * f - 0.5) * f             // = w4 / f

weight12 = (s2 - s1) * (1.0 - f)      // = w2 - w1
weight34 = (s4 - s3) * f              // = w4 - w3
pos12 = s2 / (s2 - s1)                // = w2 / (w2-w1)

pos23 = s4 / (s4 - s3)                // = w4 / (w4-w3)
sample12 = linear_sample(floor(t) - 1.0 + pos12)
sample34 = linear_sample(floor(t) + 1.0 + pos23)

signed_result = weight12 * sample12 + weight34 * sample34
sign_flip = (1.0 - 2.0 * floor(t - 2.0 * floor(0.5 * t)))

return sign_flip * signed_result

Note that it’s assumed that the storage type used for P[i] can handle negative numbers. If this is not the case, we can still use the above scheme, but P would need to be packed and unpacked during writing and reading, respectively. For images in the range [0,1] this means that data can be packed using P’[i] = P[i] * 0.5 + 0.5, and be unpacked using P[i] = P[i]’ * 2 – 1.

Because a higher-dimensional Catmull-Rom kernel is separable into the tensor product of one-dimensional kernels, the weights for the 2D case are simply the tensor product of the 1D weights calculated separately for the two dimensions. As the sign-flipping rules are solely based on the weight calculations, these rules can be extended to 2D just as easily. Hence, 2D control points should be modified beforehand by pre-multiplying them by $(-1)^(i+j)$ instead of $(-1)^i$, where $i$ and $j$ is the column and row index, respectively. And likewise, to conditionally flip the sign of the interpolated 2D output, the weighted sum of the (now four) bilinear lookups should be multiplied by $(1-2*\textrm{floor}(u-2*\textrm{floor}(u/2)))*(1-2*\textrm{floor}(v-2*\textrm{floor}(v/2)))$ instead of $1-2*\textrm{floor}(u-2*\textrm{floor}(u/2))$, where $u$ and $v$ form the interpolation coordinate. Doing so introduces a checkerboard pattern of sign flips applied both to the 2D input and the output. For the 3D case and up, the same principles apply.

A full (and further optimized) implementation of the 2D case is shown below. This has been written in the GLSL (ES) shader language, and is the same implementation as used in the ShaderToy example at https://www.shadertoy.com/view/4tyGDD.

vec4 SampleTextureBilinearlyAndUnpack(sampler2D tex, vec2 uv)
{
vec4 sample = texture2D(tex, uv, 0.0);
#ifdef PACK_SIGNED_TO_UNSIGNED
sample = 2.0 * sample - 1.0;
#endif // PACK_SIGNED_TO_UNSIGNED
return sample;
}

vec4 SampleTextureCatmullRom4Samples(sampler2D tex, vec2 uv, vec2 texSize)
{
// Based on the standard Catmull-Rom spline: w1*C1+w2*C2+w3*C3+w4*C4, where
// w1 = ((-0.5*f + 1.0)*f - 0.5)*f, w2 = (1.5*f - 2.5)*f*f + 1.0,
// w3 = ((-1.5*f + 2.0)*f + 0.5)*f and w4 = (0.5*f - 0.5)*f*f with f as the
// normalized interpolation position between C2 (at f=0) and C3 (at f=1).

// half_f is a sort of sub-pixelquad fraction, -1 <= half_f < 1.
vec2 half_f     = 2.0 * fract(0.5 * uv * texSize - 0.25) - 1.0;

// f is the regular sub-pixel fraction, 0 <= f < 1. This is equivalent to
// fract(uv * texSize - 0.5), but based on half_f to prevent rounding issues.
vec2 f          = fract(half_f);

vec2 s1 	    = ( 0.5 * f - 0.5) * f;            // = w1 / (1 - f)
vec2 s12        = (-2.0 * f + 1.5) * f + 1.0;      // = (w2 - w1) / (1 - f)
vec2 s34        = ( 2.0 * f - 2.5) * f - 0.5;      // = (w4 - w3) / f

// positions is equivalent to: (floor(uv * texSize - 0.5).xyxy + 0.5 +
// vec4(-1.0 + w2 / (w2 - w1), 1.0 + w4 / (w4 - w3))) / texSize.xyxy.
vec4 positions  = vec4((-f * s12 + s1      ) / (texSize * s12) + uv,
(-f * s34 + s1 + s34) / (texSize * s34) + uv);

// Determine if the output needs to be sign-flipped. Equivalent to .x*.y of
// (1.0 - 2.0 * floor(t - 2.0 * floor(0.5 * t))), where t is uv * texSize - 0.5.
float sign_flip = half_f.x * half_f.y > 0.0 ? 1.0 : -1.0;

vec4 w 		    = vec4(-f * s12 + s12, s34 * f); // = (w2 - w1, w4 - w3)
vec4 weights    = vec4(w.xz * (w.y * sign_flip), w.xz * (w.w * sign_flip));

return SampleTextureBilinearlyAndUnpack(tex, positions.xy) * weights.x +
SampleTextureBilinearlyAndUnpack(tex, positions.zy) * weights.y +
SampleTextureBilinearlyAndUnpack(tex, positions.xw) * weights.z +
SampleTextureBilinearlyAndUnpack(tex, positions.zw) * weights.w;
}