RGBV Texture Compression in DuckTales: Remastered

_{By Shane Calimlim email twitter}

WayForward's art style is one of its signatures. Ensuring that our texture compression can preserve the amazing work done by our artists is a top priority, especially when revisiting beloved franchises like DuckTales. In the past we were bound to paletted pixel art on sprite hardware, but our HD style on modern hardware has very different characteristics. After trying many approaches we landed on something that works well for our art style and excels on a wide variety of images.

In terms of hardware support, DXT is the modern equivalent to palettes. DXT works on 4x4 blocks of pixels, choosing two (16 bit) primary colors, with the ability to represent some colors (linearly-interpolated) between them. Given its blocky nature it tends to artifact in ways similar to JPEG. Like JPEG, DXT works pretty well on most images. Unfortunately our art assets seem to hit DXT’s weak spots on the head and artifacts are very obvious.

Raw

DXT

(1) Images are at 2x zoom.
(2) DXT compression by libsquish with highest quality settings.

Switching away from DXT entirely would be ideal, but the performance sacrifices necessary to do so are unacceptable (much more memory, slower texture lookups). We weren’t alone in reaching this conclusion. There are many variations that seek to utilize the DXT format by applying additional encoding on top of it to improve image quality. We had varying levels of success with these, but kept running into the same issue: while overall color accuracy was improved, blocky artifacts were still common. Most variants also sacrificed the alpha channel to store extra data, but alpha is key to our HD style. This is not surprising, given that DXT works reasonably well on most photorealistic images, so focusing on improving color accuracy beyond DXT’s 16 bit colors is sensible. Modern 3D games also rarely use alpha, so repurposing it incurs no additional cost.

But we were still without a solution, so we opted to try something new. The requirements were strict:

It must pass the scrutiny of art directors and designers
It must not be larger than DXT5, our VRAM budget was already designed around this
It must not impose a heavy shader cost, we were already pushing the GPU hard (1080p @ 60 w/ AA)
It must build quickly, to maintain fast iteration
It must allow us to have soft edges on our characters (lest we fail #1)

Existing techniques improve color accuracy by separating luma/brightness from color data. Our eyes are very sensitive to luminance variation, and splitting this data apart can increase perceived quality significantly. While we opted to focus less on perfect color accuracy and more on reducing DXT block artifacts, we landed on a very similar approach, but admittedly simpler in concept. In our art a lot of color variation was from shading and outlines – both variations in brightness. Outlines in particular tended to introduce more colors in a block than DXT could deal with. We had decided early on to try to keep the shader cost of reconstruction as cheap as possible, ideally a single multiply.

We have one texture with the colors at full-bright and a multiply map to lower them back down. For the color map, colors are scaled around the highest channel. The multiply map stores the value used to scale the color map. Mathematically, our multiply map is equivalent to the V in HSV. And our color map is equivalent to converting to HSV, setting V to max, and converting back to RGB.

Encoding:

if( r == g && g == b ) { // avoid division by zero, and handle gray
	colorMapR = colorMapG = colorMapB = 1.0f;
	mulMap = r;
} else {
	float maxChannel = max( r, g, b );
	colorMapR = r / maxChannel;
	colorMapG = g / maxChannel;
	colorMapB = b / maxChannel;
	mulMap = maxChannel;
}

Color

Multiply

For black pixels, the multiply value is zero – making the color value irrelevant. To reduce artifacts we do some post-processing and ‘borrow’ colors from neighboring pixels when a color value does not matter.

Padded Color

(1) We borrow colors for fully-transparent pixels as well as black, when applicable.
(2) It is important to consider the bit-depth of the destination DXT data when determining whether a pixel is ‘black’; a pixel with a value of 1/255 will floor to zero when stored in a 5/6-bit DXT channel.

We’re left with two maps that fit DXT very well. The color map rarely has more than two distinct colors in a given 4x4 block. The multiply map is single-channel, so its values will always interpolate decently between two endpoints. Reconstruction in the shader is trivial, the output color is simply multiplying the two maps together.

Decoding:

float3 wfDecodeRGBV( float3 colorMap, float mulMap ) {
	return colorMap*mulMap
}

Raw

DXT

RGBV

Show color/multiply maps before and after DXT.
(1) The background is solid white; there is no transparency.
(2) The RGBV image is stored in a DXT5 texture, with the multiply map in the alpha channel.

Adding alpha while staying within our #2 requirement was difficult. Luckily the art style in DuckTales happens to line up with a trick in our texture encoding, so we can work in the subset of alpha we need for (almost) free. Most of the time we only need partial-alpha on the character silhouettes. Character outlines in DuckTales are almost always black. As mentioned earlier, when a pixel is black the color value is irrelevant. We exploit this to store partial alpha values on black pixels.

There are many ways to encode this, but doing so in a manner that interpolates properly and creates minimal artifacts required some trial-and-error. We store our maps in separate DXT1 textures, with the multiply map in the green channel of the second texture, rather than using DXT5 with the multiply map in the alpha channel. This was a gut-decision based on texture-cache characteristics of some of our target platforms – we have not benchmarked to verify if it is worthwhile – and it does reduce the quality of our multiply map. However, this does give us DXT1’s 1-bit alpha flags to play with (and some unused channels that could possibly be used to extend the format later).

If a pixel is non-opaque the color map’s alpha is set to zero and the multiply map stores the alpha value. This reduces the total number of colors DXT can represent in the block from 4 to 3, but this is insignificant in our color maps (and it would always happen within the black outline).

If a pixel is opaque the multiply value is inverted. Our outlines fade out over several pixels, so we have fully-opaque black pixels (rgb=0/255) adjacent to a nearly-opaque ones (a=254/255). With the inversion these two pixels have nearly the same value in the multiply map, preventing artifacts when bilinear filtering is applied.

Encoding:

if( r == g && g == b ) { // avoid division by zero, and handle gray
	colorMapR = colorMapG = colorMapB = 1.0f;
	mulMap = r;
} else {
	float maxChannel = max( r, g, b );
	colorMapR = r / maxChannel;
	colorMapG = g / maxChannel;
	colorMapB = b / maxChannel;
	mulMap = maxChannel;
}
if( a == 1.0f ) {
	colorMapA = 1.0f;
	mulMap = 1.0f - mulMap;
} else {
	colorMapA = 0.0f;
	mulMap = a;
}

Multiply with Alpha

Reconstruction is more expensive than the original single-mul variant of our approach, but still very cheap. If the color map is not opaque black is returned with alpha equal to the value in the multiply texture. Otherwise, the color map is multiplied by the re-inverted multiply texture and returned with full alpha.

Decoding:

if( tex0.a != 1.0 ) {
    return float4( 0.0f, 0.0f, 0.0f, tex1.g );
} else {
    float3 mask = 1.0f - tex1.ggg;
    return float4 ( tex0.rgb*mask, 1.0f );
}

To remove the branch, we use this slight modification:

float3 mask = (1.0f-tex1.ggg) * floor(tex0.aaa);
return float4( tex0.rgb*mask, tex0.a+tex1.g );

The overall result is an image the same size as DXT5, trading some alpha precision for a large reduction in color artifacts. The color map also lends itself well to lossless compression, reducing disk size significantly compared to unmodified DXT when used with wfLZ.

Raw

DXT

RGBV

Show color/multiply maps before and after DXT.
(1) This trick isn't always sufficient, and our tools offer the option of a full DXT alpha channel for a 50% size increase.
(2) Storing the multiply map in DXT1's green channel rather than DXT5's alpha channel decreases quality noticably. (compare with opaque variant above)

Although this method was designed specifically for our art style, to our surpise it works very well on a wide variety of images.

Raw

RGBV

Show color/multiply maps before and after DXT.

Raw

RGBV

Show color/multiply maps before and after DXT.

Texture filtering is not mathematically perfect, but differences are negligible.

Raw (half texel offset, bilinear filtering)

RGBV (half texel offset, bilinear filtering)

There is still room for improvement, especially in general-purpose and photorealistic use.

If the algorithm were more DXT-block aware it could sacrifice color accuracy on pixels with low luminance when the block contained multiple high luminance colors.
For opaque variants, a lower-resolution color map probably works well.
The algorithm attempts to take into account DXT bit depth, but only at select times. Making the entire algorithm aware of these limitations would improve accuracy.
The primary goal of this algorithm is to reduce DXT block artifacts, tighter integration with the DXT encoder would likely improve results.
The color map is independent of the multiply map. The multiply map could be made after the color map is converted to DXT, and a more accurate multiply map generated. This process could potentially be made iterative.
Gray is encoded as pure white in the color map, which may not always be optimal. Gray is an edge case most of the time, but a smarter encoding algorithm could make vast improvements in its handling. In the simple version of the algorithm the entire burden of representing gray lies with the multiply map; this could be split between both maps, improving precision greatly in scenarios where the color map can accomodate extra data without loss.
Precision should be gained by adding scaling. Simple map-wide scaling factors could be added:
```
map = tex*scale + min; // with scale and min passed in as shader uniforms
```
Id Software’s YCoCg variant uses a more advanced per-block scaling method that could be applied here as well.
Our approach is very similar to RGBM encoding, but RGBM is aimed at HDR framebuffers not DXT textures. We have not investigated extending our format to HDR, but a combination of the two could work well.
The free-alpha-on-black variant could be reworked to use DXT5's alpha channel for the multiply map. If we use that trick for another game, we will definitely switch. Alternatively, the unused DXT1 channels could possibly be used to store alternate outline colors. But writing an encoder that could do this automatically would be difficult.

Below is an example encoder for opaque textures stored as DXT5:

#define WF_RGBA_R( pixel ) ((pixel >> 16) & 0xff)
#define WF_RGBA_G( pixel ) ((pixel >> 8) & 0xff)
#define WF_RGBA_B( pixel ) (pixel & 0xff)
#define WF_RGBA( r, g, b, a ) \
	( (((uint32_t)(a) & 0xff) << 24) | (((uint32_t)(r) & 0xff) << 16) | (((uint32_t)(g) & 0xff) << 8) | ((uint32_t)(b) & 0xff) )

void wfRGBV_OpaqueDXT5( const uint32_t width, const uint32_t height, const uint32_t* in, uint32_t* out ) {
	uint32_t pixelIdx;
	const uint32_t numPixels = width*height;
	for( pixelIdx = 0; pixelIdx != numPixels; ++pixelIdx, ++in, ++out ) {
		const uint32_t srcPixel = *in;
		const uint32_t r = WF_RGBA_R( srcPixel );
		const uint32_t g = WF_RGBA_G( srcPixel );
		const uint32_t b = WF_RGBA_B( srcPixel );
		if( r == g && g == b ) {
			*out = WF_RGBA( 255, 255, 255, r );
		} else {
			const uint32_t maxChannel = (r>=g&&r>=b) ? r : (g>=r&&g>=b) ? g : b ;
			const uint32_t cr = (r*255) / maxChannel;
			const uint32_t cg = (g*255) / maxChannel;
			const uint32_t cb = (b*255) / maxChannel;
			*out = WF_RGBA( cr, cg, cb, maxChannel );
		}
	}
}