Card driver writers should first understand how and why the current
allocation methods work the way they do, before deciding whether to
override them with card-specific routines.
[More details below the diagram]
"Back buffers" are allocated by the generic routines as if they were
a window in part of the visible video memory area. Visible windows do not
occupy a contiguous chunk of memory. The pixels of their first row are
stored together, but then, unless the window is as wide as the entire
screen, there is a gap in memory before the next row of pixels.
The number of bytes between coordinate (x, y) and coordinate (x, y+1) will
always be
(width of screen in pixels) x (bytes per pixel).
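The stride rule above can be sketched in C as follows. This is only an illustration: the struct and function names are hypothetical and are not taken from the X server or from any driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a screen; not taken from any real driver. */
struct screen_layout {
    size_t width_pixels;    /* width of the whole screen, in pixels */
    size_t bytes_per_pixel; /* e.g. 2 for 16 bpp, 4 for 32 bpp */
};

/* Bytes between (x, y) and (x, y+1): the row stride of the screen. */
static size_t row_stride(const struct screen_layout *s)
{
    return s->width_pixels * s->bytes_per_pixel;
}

/* Byte offset of pixel (x, y) from the start of the screen's memory. */
static size_t pixel_offset(const struct screen_layout *s, size_t x, size_t y)
{
    assert(x < s->width_pixels);
    return y * row_stride(s) + x * s->bytes_per_pixel;
}
```

For a screen 1024 pixels wide at 4 bytes per pixel, row_stride() gives 4096 bytes, no matter how narrow the window containing the pixel is.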
Certain hardware accelerated routines depend on this fact when copying rectangles of pixels from one part of the screen to another. In the XAA set of routines, this is called a "Screen To Screen Copy". The interesting thing is that, if we stick to the same memory allocation conventions as regular windows when allocating "back buffer" memory, we can treat all of video memory as part of the same "Screen", even though not all of it is visible to the user.
In this manner, we can use XAA's "ScreenToScreenCopy" routines to get hardware accelerated pixel copies (BitBlits), without driver writers having to write a separate routine for their card. By using the XAA extension, we take advantage of the fast bitblit routines that have already been written as part of the X server's 2D driver.
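To make the idea concrete, here is a software-only model of such a copy. It is not XAA code and does not call any XAA routine; the struct vram type and the function name are hypothetical. It assumes the source and destination rectangles do not overlap, which holds when copying from an off-screen back buffer to a visible window.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: all of video memory viewed as one tall "screen". */
struct vram {
    unsigned char *base;    /* start of video memory */
    size_t width_pixels;    /* screen width, in pixels */
    size_t total_rows;      /* visible rows plus off-screen rows */
    size_t bytes_per_pixel;
};

/* Copy a w x h rectangle from (sx, sy) to (dx, dy), one row at a time.
 * Both rectangles use the same row stride, so one routine serves visible
 * and off-screen areas alike. Assumes the rectangles do not overlap. */
static void screen_to_screen_copy(struct vram *v,
                                  size_t sx, size_t sy,
                                  size_t dx, size_t dy,
                                  size_t w, size_t h)
{
    size_t bpp = v->bytes_per_pixel;
    size_t stride = v->width_pixels * bpp;

    assert(sx + w <= v->width_pixels && dx + w <= v->width_pixels);
    assert(sy + h <= v->total_rows && dy + h <= v->total_rows);

    for (size_t row = 0; row < h; row++) {
        unsigned char *src = v->base + (sy + row) * stride + sx * bpp;
        unsigned char *dst = v->base + (dy + row) * stride + dx * bpp;
        memmove(dst, src, w * bpp);
    }
}
```

In this model a back buffer is simply a rectangle whose rows lie beyond the last visible row, so swapping buffers amounts to one rectangle copy with a different y coordinate.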
The cost of this generic speed is some inefficiency in memory allocation. If a GLX-enabled window needs a back-buffer allocated, it will be allocated with the same "gap" per screen-line as the original window. The difference between normal windows and our back-buffers is that the user may squeeze other windows into the memory areas to the "right" of the visible window, whereas we do not try to squeeze extra use out of the memory in the unused portions of the back-buffer allocation.
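A rough way to estimate this penalty is sketched below. The function names are hypothetical, and the sketch assumes that the whole gap on every row of the back buffer goes unused, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of a back buffer that actually hold pixels. */
static size_t backbuffer_used_bytes(size_t window_w, size_t window_h,
                                    size_t bytes_per_pixel)
{
    return window_w * window_h * bytes_per_pixel;
}

/* Bytes left unused in the per-row gaps of a back buffer that keeps the
 * screen's row stride. Assumes the gap on every row is wasted. */
static size_t backbuffer_gap_bytes(size_t screen_w, size_t window_w,
                                   size_t window_h, size_t bytes_per_pixel)
{
    assert(window_w <= screen_w);
    return (screen_w - window_w) * bytes_per_pixel * window_h;
}
```

For example, a 300 x 200 window on a 1024-pixel-wide screen at 4 bytes per pixel uses 240000 bytes for pixels and leaves 579200 bytes in gaps, for 819200 bytes in total.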
The same memory penalty will apply to PBuffers, and, if needed, backbuffers for PBuffers. In order to allow for XAA-accelerated copies, these objects need to have their rows aligned to the "screen's" concept of rows, as described above.
The memory penalty can be avoided if a card driver implements its own hardware accelerated bitblit, SwapBuffer, AND backbuffer/PBuffer allocation routines. However, given that texture memory does not have a gap penalty, and allocation of back-buffers is a relatively rare operation, there may not be significant memory gains in doing so.
(On the other hand, there may be performance gains in doing so.)
PS: The generic software renderer does not actually store textures in video memory (yet?). However, it provides a pre-initialized framework for drivers to allocate texture memory in video memory. To simplify maintenance (and potentially take advantage of new and impressive texture allocation tweaks down the road), card-specific driver writers are encouraged to use the common framework for texture memory allocation.