An additional plotting option for OS_SpriteOp 34
David Williams (2619) 103 posts |
And possibly also OS_SpriteOp 52 and 56.

I think that the vast majority of custom-written sprite plotters simply skip ‘black’ pixels from the source sprite (by black, I mean the source pixel is represented as a zero byte in the case of 8 bpp, or a zero word in the case of 32 bpp [although sometimes, in the case of an 8:8:8:8 ABGR pixel, you might just want to test if the BGR components are all zero because the ‘alpha byte’ might contain something useful – certain bits for collision detection purposes, for example]). Certainly, nearly all of my sprite plotters (most of them in x86 assembly language, admittedly) skip black pixels when plotting what we usually think of as “masked sprites” (not really using a mask, of course). I have found that using black/zero as a transparency key has hardly ever proved to be a restriction in practice (because if you do need to plot actual black pixels, you can either plot extremely dark grey ones, or – if possible – do some palette modification!).

Plotting ‘proper’ masked sprites via OS_SpriteOp 34 (setting bit 3 of R5 – “Use sprite mask”) is very slow, as we all know. We recognise that this is necessarily so because SpriteOp’s a very general routine that has to handle a variety of possibilities – clipping, source colour depths which may differ from the destination’s colour depth, and it’s got to read data from the destination buffer/screen as well as mask data.

[EDIT 27-Mar-2015]: Plotting masked sprites with SpriteOp 34 isn’t quite as slow as I led myself to believe – actually I’ve got 80 64×64 masked sprites plotted over a 640×480 background sprite in mode 28 at 60 fps on my RPi2. None of the sprites’ base addresses (for the actual pixel data) are currently 64-byte aligned – or not usually anyway, so that’s quite encouraging. (I’ll ensure that all graphics data base addresses are correctly aligned – critical for the RPi2’s CPU, I believe.)
So, I’m thinking: to help those of us who like to make games or demos and whatnot in BASIC, and who find SpriteOp 34 too slow for plotting masked sprites (if a few dozen fairly large ones are required), how about offering the option of faster sprite plotting (transparency key colour black/zero), through setting, say, bit 4 of R5? In this case, it’s just a simple put-pixel and doesn’t have to read any background pixels. If the source pixel’s black, just skip it, otherwise plot it straight. It should be quite a bit faster than plotting masked sprites with R5 = 8. And it’s also more cache-friendly as there’s less data to read (no screen/frame buffer data, no mask data).

    SYS "OS_SpriteOp", 512+34, sprArea%, mySpr%, X%, Y%, 16

Can anyone make the dream come true? I believe the speed gains would be worthwhile.

David. |
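For readers who want to see the difference spelled out, here is a minimal Python sketch of the two approaches described above. It is only an illustration of the idea, not how the kernel does it: the function names, the flat-list pixel buffers, the top-down row order and the absence of clipping are all assumptions made for brevity.

```python
def plot_keyed(dest, dest_width, sprite, spr_width, x, y, key=0):
    """Colour-key plot: copy sprite pixels into dest, skipping any equal to key.

    dest and sprite are flat, row-major lists of pixel values (hypothetical
    layout). No clipping is done, so the sprite must fit inside dest.
    """
    spr_height = len(sprite) // spr_width
    for row in range(spr_height):
        src = row * spr_width
        dst = (y + row) * dest_width + x
        for col in range(spr_width):
            pixel = sprite[src + col]
            if pixel != key:              # only the source sprite is read
                dest[dst + col] = pixel


def plot_masked(dest, dest_width, sprite, mask, spr_width, x, y, solid=0xFF):
    """Read-modify-write masked plot for comparison.

    Every destination pixel under the sprite is read, combined with the
    sprite through the mask, and written back. mask holds 0 (transparent)
    or solid (opaque) per pixel.
    """
    spr_height = len(sprite) // spr_width
    for row in range(spr_height):
        src = row * spr_width
        dst = (y + row) * dest_width + x
        for col in range(spr_width):
            m = mask[src + col]
            old = dest[dst + col]         # destination read on every pixel
            dest[dst + col] = (old & ~m & solid) | (sprite[src + col] & m)
```

Both routines produce the same picture when the mask is solid exactly where the sprite is non-zero; the keyed version simply touches less memory per pixel (no mask read, no destination read).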
David Williams (2619) 103 posts |
No interest? I think I’m about 24 years too late because back then it really would have made more of a difference (on ARM2 machines especially, I suspect), and there would certainly have been some support for the idea (plotting masked sprites with OS_SpriteOp on ARM2/3 machines was so slow). Actually, an old (1994) memory comes to mind: I wrote a pretty naff game in BASIC called !Elevator which plotted masked sprites in MODE 9 over a full-screen background sprite. 50 fps! Well, I cheated a little (much to the annoyance of a certain Steve Harrison): I employed OS_UpdateMEMC to give my ARM2-based A3000 a little boost. Without it, the frame rate dropped to 25 fps. I’m considering writing my own module to emulate some of the functionality of OS_SpriteOp 34, to include the skip-black-pixels functionality that’s the topic of this thread. Or perhaps I’ll move on to something else… David. |
Rick Murray (539) 13806 posts |
I think the problem is that OS_SpriteOp has to be a “jack of all trades” to deal with plotting all sorts of sprites in all sorts of modes. All of the games that I examined (which could probably be counted on the fingers of one hand) used sprite data in a custom format optimised for the mode in use, and used a custom plotter routine optimised to place the optimised data into the screen as necessary. Somewhere I have a copy of the Archimedes game manual which talks about writing this sort of code, and you can build in some extra functions such as collision detection; stuff that SpriteOp wouldn’t know how to do. The long and the short being that SpriteOp only seems fast now ’cos it is running on a ~700MHz+ processor. ;-) That said, the option to skip pixels of a certain colour is interesting; and may be a useful way to speed up masked sprites. Would it be possible to make the colour configurable? I tend to prefer using magenta as my “don’t plot” colour. |
Chris Evans (457) 1614 posts |
As a non programmer my thought was: have you been able to do some test code that confirms your premise? |
Jon Abbott (1421) 2641 posts |
Doesn’t SpriteOp build dynamic code to do the plotting – or am I imagining that? I’m sure Jeffrey did a big update on it all recently to improve efficiency, so I’d expect plotting to be about as optimal as you can get it without hand crafting. Has anyone looked at using a GPU for sprite plotting? An as yet underutilised piece of hardware on the Pi/Pi2 as far as I know. |
David Williams (2619) 103 posts |
Rick: Last month I coded a shedload of sprite routines for my game library (some ARM-specific annoyances apart, it was nice coming back to ARM assembly language!). Most of the routines (with the exception of the alpha blenders) relied on the ‘skip black pixels’ method which is certainly worthwhile for 32bpp sprites. Back in the early 90s I, like everyone else, coded sprite routines using the fairly standard interwoven sprite/mask data method, black pixel skipping, and also ‘compiled’ suitable sprites into sets of ARM instructions (probably not recommended these days, but it was certainly suitable for the ARM2).

I wrote a graphics library called GFXLIB for ‘BBC BASIC for Windows’ and 99% of the sprite plotters use the skip-black-pixels method which is very fast on the x86 (IIRC, GFXLIB_Plot2 allows you to specify an alternative transparency key colour in case you need to plot black pixels). Using a separate (even interwoven) sprite mask on the x86, at least when using non-SIMD instructions, was slower than skipping black pixels when I last tested it. Compiling sprites into x86 instructions is also slower, IIRC.

My current interest (before it fades) is making graphical ditties, demos and games in BBC BASIC – no assembly language (which already makes my game library largely redundant!). I expect I may have to resort to a little ARM code for speed-critical tasks in some cases.

You wrote: “That said, the option to skip pixels of a certain colour is interesting; and may be a useful way to speed up masked sprites. Would it be possible to make the colour configurable? I tend to prefer using magenta as my ‘don’t plot’ colour.”

One possibility might be (from memory – I should really glance at the PRM), in the case of 8bpp unmasked sprites, is setting bit 4 of R5 to invoke pixel skipping in SpriteOp (34), and the value in bits 8-15 could determine the 8-bit transparency key colour. So then you’ll have a choice at practically no cost.
For 32bpp unmasked sprites, bits 8-31 might specify the 24-bit RGB (or BGR) transparency key – &FF00FFxx would be magenta, for instance. :) (Someone’s going to tell me that some of those bits are already taken.)

I reckon, somewhat presumptuously, that if SpriteOp 34 on my RPi2 can draw around 70 64×64 masked 256-colour sprites in Mode 28 at 60 fps (quite impressive – considering), then with ‘pixel skipping’ you’re looking at at least twice that many. Difficult to test, though, admittedly.

David. |
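The bit layout suggested above can be sketched in a few lines of Python. Note that this encoding is purely the proposal from this thread: bit 4 of R5 as a "skip key colour" flag and bits 8 upwards as the key are hypothetical, not part of any documented OS_SpriteOp interface, and the helper names are made up for illustration.

```python
SKIP_KEY_BIT = 1 << 4   # hypothetical: bit 4 of R5 requests key-colour skipping


def keyed_flags_8bpp(key):
    """Pack an 8-bit transparency key into bits 8-15 of the proposed R5 value."""
    if not 0 <= key <= 0xFF:
        raise ValueError("8bpp key must fit in one byte")
    return SKIP_KEY_BIT | (key << 8)


def keyed_flags_32bpp(key24):
    """Pack a 24-bit transparency key into bits 8-31 of the proposed R5 value."""
    if not 0 <= key24 <= 0xFFFFFF:
        raise ValueError("32bpp key must fit in 24 bits")
    return SKIP_KEY_BIT | (key24 << 8)


def unpack_key(flags, bpp):
    """Recover the key colour from a packed flags word (8 or 32 bpp only)."""
    width = 0xFF if bpp == 8 else 0xFFFFFF
    return (flags >> 8) & width
```

With this packing, a magenta key of &FF00FF gives a flags word of &FF00FF10, which matches the &FF00FFxx pattern mentioned above with the low byte carrying the ordinary plot flags.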
Jeffrey Lee (213) 6048 posts |
The SpriteOps which are handled by SpriteExtend (e.g. 50, 52, 55, 56) do use a code generator, yes. The code it generates is probably far from optimal (there’s only so much that can be done without implementing a full-on compiler/optimiser), but SpriteExtend is smart enough to call through to the faster kernel routines (OS_SpriteOp 34, 49) wherever possible.
GPU-accelerated plotting is nominally on the GraphicsV roadmap, but the API/implementation details haven’t been fully fleshed out yet. Certainly one thing we’d need to do to make it worthwhile would be to make sure there’s some way of batching up render calls – either transparent batching within a driver (e.g. push all render ops into a queue that’s processed by the GPU, and only block for the queue to complete when screen memory is next read/written by the CPU – which would require the driver to be able to enable/disable screen access by fiddling the page tables or similar), or a more explicit API within the OS itself (which will make it tricky to provide good support for in the desktop). For the Pi, the GPU implements support for OpenVG, so under the hood that’s probably what we’d want to use for rendering hardware accelerated sprites.

“One possibility might be (from memory – I should really glance at the PRM), in the case of 8bpp unmasked sprites, is setting bit 4 of R5 to invoke pixel skipping in SpriteOp (34), and the value in bits 8-15 could determine the 8-bit transparency key colour. So then you’ll have a choice at practically no cost. For 32bpp unmasked sprites, bits 8-31 might specify the 24-bit RGB (or BGR) transparency key – &FF00FFxx would be magenta, for instance. :)”

OS_SpriteOp 34 is fine, but if you wanted the same feature to be available via OS_SpriteOp 52/56/65 then you’d have more trouble since their flags take up 16 bits (reference).

Having just had a quick look at the kernel source, it looks like OS_SpriteOp 34 (when dealing with masked sprites) does a simple read-modify-write sequence on every screen pixel which is underneath the sprite. This is obviously pretty bad in terms of performance, unless you’re dealing with <256 colour sprites with large numbers of transparent pixels.
Try using OS_SpriteOp 52 instead – SpriteExtend knows that screen reads are slow and will avoid reading pixels unless it actually needs them for blending/GCOL actions (and obviously avoids writing transparent pixels to screen). It might also be worth experimenting with “new” (RISC OS 3.5) format sprite mode words vs. old format (mode numbers) – the new format will use a 1bpp mask, while the old format will use a mask at the same bpp as the sprite. So the new format will require less data to be read, reducing load on the cache/bus. I do have plans to make more improvements to SpriteExtend in the future, but at the moment those are mainly focused around CMYK support rather than extra transparency modes. |
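To put rough numbers on the mask-size point, here is a small Python sketch comparing the mask data for old-format and new-format sprites. It assumes each mask row is padded to a whole number of 32-bit words and ignores any left-hand wastage; the function names are made up for this example.

```python
def row_bytes(width, bits_per_pixel):
    """Bytes in one row, rounded up to a whole number of 32-bit words."""
    bits = width * bits_per_pixel
    words = (bits + 31) // 32
    return words * 4


def mask_bytes(width, height, sprite_bpp, new_format):
    """Approximate mask size in bytes.

    Old-format sprites carry a mask at the sprite's own depth; new
    (RISC OS 3.5) format sprites carry a 1bpp mask, as described above.
    """
    mask_bpp = 1 if new_format else sprite_bpp
    return row_bytes(width, mask_bpp) * height
```

Under those assumptions, a 64×64 8bpp sprite has 4096 bytes of mask in the old format and 512 bytes in the new one, so one eighth as much mask data has to pass through the cache per plot.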
Sprow (202) 1155 posts |
For the generalised OS_SpriteOp, zero doesn’t have to be black, since the palette could set colour 0 to be any other colour, and you can have more than one black on screen at the same time, again through the palette.
The OS watches out for all solid replacements and skips the screen read where possible (hunt the sources for “AvoidScreenReads”). There are a couple of places in SpriteExtend that don’t, which is why the 16 colour icon plotting timings for ROMark often come out low for the Iyonix where the graphics memory is a long long way away. When I last looked, the mask read was done at the end of a loop which was too early to know whether to avoid the screen read or not, and there weren’t enough spare registers to remember that fact so the screen read went ahead. You might find that plotting 32bpp sprites is quicker (on some platforms) for that reason, because 32bpp implies RISC OS 3.50 format sprites, which implies 1 bit per pixel mask, which implies fewer mask reads, and an easy test of whether to skip the word read or not. |
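The reasoning about skipping screen reads can be illustrated with a short Python sketch. This is not the kernel or SpriteExtend code, just a model of the idea: for each destination word, look at the mask bits of the pixels it holds and decide whether the word can be skipped, written without reading, or needs a read-modify-write. The function name and the list-of-bits mask are assumptions.

```python
def classify_words(mask_bits, pixels_per_word):
    """Count destination words by the work they need.

    mask_bits is a list of 0/1 values, one per pixel (1 = plot the pixel).
    Returns (skipped, write_only, read_modify_write):
      - all bits clear: nothing to plot, the word is skipped
      - all bits set:   solid replacement, store without reading the screen
      - mixed:          the screen word must be read, merged and written back
    """
    skipped = write_only = read_modify_write = 0
    for start in range(0, len(mask_bits), pixels_per_word):
        group = mask_bits[start:start + pixels_per_word]
        if not any(group):
            skipped += 1
        elif all(group):
            write_only += 1
        else:
            read_modify_write += 1
    return skipped, write_only, read_modify_write
```

At 32bpp there is one pixel per word, so the "mixed" case never arises and no screen reads are needed for a plain masked plot; at 8bpp (four pixels per word) the edges of a sprite typically fall into the read-modify-write case.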
David Williams (2619) 103 posts |
Jeffrey & Sprow, thanks for the information. I intend to browse the RO source in the near future, not least because I’m quite impressed by the speed of the OS’s flood fill routine! A little graphical ditty I released recently relies heavily on it, and achieves 60 fps on an RPi2 (but probably not on an RPi1). Jeffrey: I did a simple speed comparison between SpriteOp 34 and 52. I timed 1000 plots each of a 256×256×8bpp (Mode 28) masked sprite, and they both took practically the same amount of time (within 1 centisecond). I seem to remember reading something a certain S. Wilson wrote decades ago on Usenet about speeding up sprite plotting with SpriteOp, and I vaguely recall her writing that in the case of no scale factors specified, and no pixel translation table, SpriteOp 52 just calls SpriteOp 34 itself anyway. I may have my wires crossed a little here, and the code may have changed since then. Chris: You wrote: “As a non programmer my thought was: have you been able to do some test code that confirms your premise?” Well I could end up quite surprised! I’ll do some tests in the next few days. David. |
Jeffrey Lee (213) 6048 posts |
Yeah, I thought that SpriteExtend would handle all masked sprite plots itself (at least for >=8bpp), but it looks like I was wrong. But if you change the sprite to have a RISC OS 3.5 mode word then that will force SpriteExtend to handle it itself (because the kernel routines are sloooow for 1bpp masks), which will hopefully result in a speed increase due to SpriteExtend skipping the masked pixels. |
David Williams (2619) 103 posts |
Slightly off-topic, but one SpriteOp that I (and others in the distant past) feel has been missing from the set is one which copies a sprite into its own mask such that non-zero pixels in the sprite are copied as 2^bpp-1 (all bits set in corresponding pixels) in the mask; zero-value pixels in the sprite being written as zero in the mask. Kinda like a cookie cutter?

Alternatively, in a similar vein to what we’ve been discussing in this thread, you could perhaps specify your transparency colour which may not be zero (in case you need zero-value pixels in the source sprite to be plotted); all other pixel values get written as 2^bpp-1. So, in the case of an 8bpp sprite, and an 8bpp mask, non-transparent pixels in the sprite would get written as &FF in the mask. (Or for extra flexibility, specifying which value gets written to the mask might be worthy of consideration.)

I really needed this kind of functionality a few days ago because SpriteOp 49 doesn’t seem to be doing what I expected it to (see my post at comp.sys.acorn.programmer). Most likely down to a misunderstanding on my part. :) Anyway, a little bit of assembly language (which I absolutely wanted to avoid) solved my particular problem. Here’s the result (effectively, flood-filling with BASIC’s FILL command over a starfield):

http://www.proggies.uk/risc_os/rool_logo_2

(Set the filetype to &FFB/BASIC; non-RPi users need to set the RPi% flag to FALSE.)

If anyone gets any error messages, then please let me know. I’m not yet sufficiently familiar with memory allocation, and many other things, on RISC OS.

David. |
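The "cookie cutter" operation described above is simple enough to sketch in Python. This is only a model of the suggested behaviour, working on a flat list of pixel values with a mask at the same depth as the sprite; the function name and the `key` and `solid` parameters are hypothetical, standing in for the configurable transparency colour and mask value floated in the post.

```python
def make_mask(pixels, bpp, key=0, solid=None):
    """Build a same-depth mask from sprite pixels.

    Pixels equal to key become 0 (transparent) in the mask; every other
    pixel becomes solid, which defaults to 2^bpp - 1 (all bits set).
    """
    if solid is None:
        solid = (1 << bpp) - 1
    return [0 if pixel == key else solid for pixel in pixels]
```

For an 8bpp sprite with the default key, non-zero pixels come out as &FF in the mask and zero pixels stay zero; passing a different `key` keeps zero-value pixels plottable.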