2D Catmull-Rom in 4 samples

October 1, 2016 Graphics, Math
Catmull-Rom (left) vs bilinear (right) interpolation

Catmull-Rom (left) vs bilinear (right) interpolation

Many higher-order interpolation schemes can directly be composed from simpler schemes. One such example is the cubic B-spline, which may be reformulated as a weighted sum of two linearly-interpolated samples. And being a separable kernel, it consequently only requires four bilinear samples in 2D and eight trilinear samples in 3D. See Bicubic Filtering in Fewer Tabs for more details. This optimization trick can be applied directly to all interpolation kernels that solely have positive filter coefficients. Here, I will introduce a way to also apply it to Catmull-Rom interpolation (which requires both positive and negative coefficients) that becomes possible after a simple data preprocess step. Go here for an example implementation in ShaderToy.

One-dimensional Catmull-Rom interpolation is calculated using the weighted sum w_1 C_1+w_2 C_2+w_3 C_3+w_4 C_4, where C_1, C_2, C_3 and C_4 are the control points (e.g. the image values), and

The four original weight functions

The four original weight functions

w_1=\frac{1}6 (-3 f^3+6f^2-3 f),
w_2=\frac{1}6(9 f^3-15 f^2+6),
w_3=\frac{1}6(-9 f^3+12 f^2+3 f) and
w_4=\frac{1}6(3 f^3-3 f^2)

are the weight coefficients. f is assumed to be the desired interpolation position between C_2 (at f=0) and C_3 (at f=1). Looking at the plot of these four weight coefficients, we can easily verify that w_1 and w_4 are always nonpositive, while w_2 and w_3 are always nonnegative.

Linearly-interpolated sampling is effectively ‘lerping’ between neighboring values where \textrm{lerp}(A,B,t) = (1-t) A + t B and 0\leq t \leq 1 (as well as 0 \leq 1-t \leq 1) any neighboring control point values A and B. Consequently, the result of an (arbitrarily scaled) linearly-interpolated sample at some position t can never equal a weighted sum of two control points with differently signed weights. Hence, in the case of Catmull-Rom interpolation, the control point pair C_1 and C_2 and the pair C_3 and C_4 cannot be weighted correctly using linear sampling. The best one can do in a direct way is to linearly interpolate the pair C_2 and C_3 (as they both have positive weights), and weigh C_1 and C_4 separately without any interpolation.

We can, however, do better than this by transforming the problem into an equivalent with better properties. Note that the interpolation formula w_1 C_1+w_2 C_2+w_3 C_3+w_4 C_4 is equivalent to (s_1 (-C_1)+s_2 C_2) - (s_3 (-C_3)+s_4 C_4) where s_1=-w_1, s_2=w_2, s_3=w_3 and s_4=-w_4. Because s_1, s_2, s_3 and s_4 are all positive, the control points pair -C_1 and C_2 and the pair -C_3 and C_4 can, in contrast, be linearly interpolated. To see exactly how, we first note that a A+b B = (a+b)*\textrm{lerp}(A,B,b/(a+b)) for a \neq -b. And so,

(s_1 (-C_1)+s_2 C_2) - (s_3 (-C_3)+s_4 C_4)

may be rewritten to

(s1+s2)*\textrm{lerp}(-C_1, C_2, s2/(s1+s2)) - ((s3+s4)*\textrm{lerp}(-C_3, C_4, s4/(s3+s4)),

which is simply

(w2-w1)*\textrm{lerp}(-C_1, C_2, w2/(w2-w1)) + (w4-w3)*\textrm{lerp}(-C_3, C_4, w4/(w4-w3)).

The new weight and position functions

The new weight and position functions

Note that a naive implementation of w_2/(w_2-w_1) and w_4/(w_4-w_3) will cause a 0/0 division at f=1 and f=0, respectively. We can prevent this by simplify factoring out 1-f from the numerator and denominator of w_2/(w_2-w_1), and f from w_4/(w_4-w_3). See the code listings below for more details.

We’ve now established that we can do Catmull-Rom interpolation between C_2 and C_3 based on two linearly-interpolated samples (i.e. the sum of the two scaled ‘lerp’s), assuming the left sample interpolates between -C1 and C_2 instead of C_1 and C_2, and the right sample interpolates between -C3 and C_4 instead of C_3 and C_4. This can be implemented by preprocessing the data to contain -C1, C_2,-C_3,C_4 instead of C_1,C_2,C_3,C_4. However, to interpolate between C_3 and C_4, we’d need the control points values -C_2,C_3,-C_4,C_5 instead. Luckily, these alternatedly-signed data sequences can be stored in one and the same data table (e.g. image) because interpolating over the control points values -C2,C_3,-C_4 and C_5 is equivalent to interpolating over the control points values C_2,-C_3,C_4,-C_5 and flipping the sign of the result. Consequently, we can correctly implement Catmull-Rom interpolation by pre-flipping the signs for all odd control points positions, as well as doing a conditional sign flip on the result based on the interpolation position. This could be implemented as follows:

// The original value of the i-th control point.
T[i] = …

// Precalculate the alternating-sign version of T[i].
P[i] = (-1)^i * T[i]

// Linearly interpolate between A (at t = 0) and B (at t = 1).
lerp(A, B, t) :=
    return A * (1 - t) + B * t

// Return the linearly-interpolated value of P at t.
linear_sample(A, t) :=
    return lerp(P[floor(t)], P[floor(t) + 1], u - floor(t))

// Return the value of T at position t using Catmull-Rom interpolation.
catmull_rom_sample(t) :=
    f = t - floor(t)
    s1 = ( 0.5 * f - 0.5) * f             // = w1 / (1-f)
    s2 = (-1.5 * f + 1.0) * f + 1.0       // = w2 / (1-f)
    s3 = (-1.5 * f + 2.0) * f + 0.5       // = w3 / f
    s4 = ( 0.5 * f - 0.5) * f             // = w4 / f

    weight12 = (s2 - s1) * (1.0 - f)      // = w2 - w1
    weight34 = (s4 - s3) * f              // = w4 - w3
    pos12 = s2 / (s2 - s1)                // = w2 / (w2-w1)

    pos23 = s4 / (s4 - s3)                // = w4 / (w4-w3)
    sample12 = linear_sample(floor(t) - 1.0 + pos12)
    sample34 = linear_sample(floor(t) + 1.0 + pos23)

    signed_result = weight12 * sample12 + weight34 * sample34
    sign_flip = (1.0 - 2.0 * floor(t - 2.0 * floor(0.5 * t)))

    return sign_flip * signed_result

Note that it’s assumed that the storage type used for P[i] can handle negative numbers. If this is not the case, we can still use the above scheme, but P would need to be packed and unpacked during writing and reading, respectively. For images in the range [0,1] this means that data can be packed using P’[i] = P[i] * 0.5 + 0.5, and be unpacked using P[i] = P[i]’ * 2 – 1.

Because a higher-dimensional Catmull-Rom kernel is separable into the tensor product of one-dimensional kernels, the weights for the 2D case are simply the tensor product of the 1D weights calculated separately for the two dimensions. As the sign-flipping rules are solely based on the weight calculations, these rules can be extended to 2D just as easily. Hence, 2D control points should be modified beforehand by pre-multiplying them by (-1)^(i+j) instead of (-1)^i, where i and j is the column and row index, respectively. And likewise, to conditionally flip the sign of the interpolated 2D output, the weighted sum of the (now four) bilinear lookups should be multiplied by (1-2*\textrm{floor}(u-2*\textrm{floor}(u/2)))*(1-2*\textrm{floor}(v-2*\textrm{floor}(v/2))) instead of 1-2*\textrm{floor}(u-2*\textrm{floor}(u/2)), where u and v form the interpolation coordinate. Doing so introduces a checkerboard pattern of sign flips applied both to the 2D input and the output. For the 3D case and up, the same principles apply.

A full (and further optimized) implementation of the 2D case is shown below. This has been written in the GLSL (ES) shader language, and is the same implementation as used in the ShaderToy example at https://www.shadertoy.com/view/4tyGDD.

vec4 SampleTextureBilinearlyAndUnpack(sampler2D tex, vec2 uv)
{
    vec4 sample = texture2D(tex, uv, 0.0);
#ifdef PACK_SIGNED_TO_UNSIGNED
	sample = 2.0 * sample - 1.0;
#endif // PACK_SIGNED_TO_UNSIGNED
    return sample;
}

vec4 SampleTextureCatmullRom4Samples(sampler2D tex, vec2 uv, vec2 texSize)
{
    // Based on the standard Catmull-Rom spline: w1*C1+w2*C2+w3*C3+w4*C4, where
    // w1 = ((-0.5*f + 1.0)*f - 0.5)*f, w2 = (1.5*f - 2.5)*f*f + 1.0,
    // w3 = ((-1.5*f + 2.0)*f + 0.5)*f and w4 = (0.5*f - 0.5)*f*f with f as the
    // normalized interpolation position between C2 (at f=0) and C3 (at f=1).

    // half_f is a sort of sub-pixelquad fraction, -1 <= half_f < 1.
    vec2 half_f     = 2.0 * fract(0.5 * uv * texSize - 0.25) - 1.0;

    // f is the regular sub-pixel fraction, 0 <= f < 1. This is equivalent to
    // fract(uv * texSize - 0.5), but based on half_f to prevent rounding issues.
    vec2 f          = fract(half_f);

    vec2 s1 	    = ( 0.5 * f - 0.5) * f;            // = w1 / (1 - f)
    vec2 s12        = (-2.0 * f + 1.5) * f + 1.0;      // = (w2 - w1) / (1 - f)
    vec2 s34        = ( 2.0 * f - 2.5) * f - 0.5;      // = (w4 - w3) / f

    // positions is equivalent to: (floor(uv * texSize - 0.5).xyxy + 0.5 +
    // vec4(-1.0 + w2 / (w2 - w1), 1.0 + w4 / (w4 - w3))) / texSize.xyxy.
    vec4 positions  = vec4((-f * s12 + s1      ) / (texSize * s12) + uv,
     				       (-f * s34 + s1 + s34) / (texSize * s34) + uv);

    // Determine if the output needs to be sign-flipped. Equivalent to .x*.y of
    // (1.0 - 2.0 * floor(t - 2.0 * floor(0.5 * t))), where t is uv * texSize - 0.5.
    float sign_flip = half_f.x * half_f.y > 0.0 ? 1.0 : -1.0;

    vec4 w 		    = vec4(-f * s12 + s12, s34 * f); // = (w2 - w1, w4 - w3)
    vec4 weights    = vec4(w.xz * (w.y * sign_flip), w.xz * (w.w * sign_flip));

    return SampleTextureBilinearlyAndUnpack(tex, positions.xy) * weights.x +
           SampleTextureBilinearlyAndUnpack(tex, positions.zy) * weights.y +
     	   SampleTextureBilinearlyAndUnpack(tex, positions.xw) * weights.z +
           SampleTextureBilinearlyAndUnpack(tex, positions.zw) * weights.w;
}

No comments yet.

Leave a comment