Proposed GraphicsV enhancements
Jeffrey Lee (213) 6048 posts |
Roger that! I’ll have a go at putting something together |
Steve Revill (20) 1361 posts |
Cheers. If things aren’t at the right point, feel free to defer my request and keep focus on the important stuff; just be aware that I’ll be asking you for this at the point where we start gearing-up to 5.22. |
Chris Hall (132) 3554 posts |
If you want to try out some stuff on a user, feel free to do so. I am good at commenting on documentation… |
Doug Webb (190) 1180 posts |
Re: Jeffrey’s submission 15th Dec.
I have tried it on my Iyonix with a Gainward Pro660 Fx5200 128MB card fitted and with Geminus installed, and I get a rather funky and garbled screen when trying 32K and 64K colour screens with the 2nd Jan 14 ROM and HardDisc image installed. Removing Geminus and rebooting allows the 32K and 64K screens to work correctly. I’m happy to continue without Geminus unless someone knows how to disable the built-in RGB swapping, as I believe I also lose some screen acceleration options without it. Apart from that, another excellent addition to RISC OS 5 and more fantastic work from Jeffrey. |
Jeffrey Lee (213) 6048 posts |
Yes, it looks like the red/blue swapped modes are causing problems with Geminus. At the moment I’m thinking the best way of fixing it would be to add a system variable or command which can be used to control red/blue swapping of modes on the Iyonix. That way people will be able to disable red/blue swapping of 32K modes if needed, and it could also allow control over red/blue swapping of 8bpp and 16M colour modes (to allow unmodified cards to be used – similar to the PC RGB feature in Geminus). Eventually I’m planning on updating the screen setup plugin in Configure to allow for driver-specific features to be configured (red/blue swapping on Iyonix, overscan settings on the Pi, TV offset on anything with TV-out, etc.) |
Jeffrey Lee (213) 6048 posts |
When I next have a chance (maybe this weekend?) I think I’ll have a go at rewriting the proposal page so that it correctly reflects the current state of things and what’s left on the todo list. There’s been a fair bit of interest recently in a few different areas of the work so it would be good to update the page to either explain how I think certain things should be implemented or to give a basic roadmap on the implementation order that makes the most sense to me. And with any luck it will reveal some relatively self-contained tasks to allow more people to get involved if they wish. |
William Harden (2174) 244 posts |
ScrnSetup needs a lot of work for different reasons (EDID, multi-monitor, plus your suggestions above). Writing down what you need from above (likely UI changes and outputs) would be useful. I personally think Screen saver may have to be a self-contained plugin as there are a lot of things that now need to come into ScrnSetup and ScreenSaver would function well independently of ScrnSetup. For my bits: ScrnSetup needs a grouped radio ‘Use monitor mode information (EDID)’ versus ‘Use Monitor Definition file’. We then move the MDF selector up with that into a new group. If EDID is selected, the MDF selector is greyed. At plugin start, we try to read EDID and if the registers are unchanged on exit we grey the EDID selection because we cannot support it. In the group below we have mode info. For outputs, if EDID is selected, the command is X ReadEDID. For MDFs we function as previously. I have not yet got a SWI allocation for preferred mode. So for now EDID should just offer its selections exactly as MDFs do. If we have a SWI ScreenModes_GetPreferredMode we read a VIDC3 data block of mode data, and present our preferred option when the radio is changed. A neat option would be to alter *WimpMode to have a *WimpMode Auto option which then calls ScreenModes_GetPreferredMode and sets the mode. This would allow an option button in modes of ‘Auto’ which if selected changes the screen setup output to *WimpMode Auto. The resulting configuration says on startup ‘read the EDID and set the mode to whatever the monitor wants’. |
William Harden (2174) 244 posts |
Jeffrey – also meant to say: I have sorted out a Dropbox for ROOL for EDID stuff (whilst it is pending review and submission). ROOL folk can invite to it, so if you want/need access then let me know or them (I don’t have your email address). I will put the current revision in later tonight. |
Jeffrey Lee (213) 6048 posts |
The rewritten proposal page is now up, although I think it might be a bit more scatterbrained than the previous version. However it should fully document everything that I have planned so far, whether it’s effectively a full specification (like the pointer changes) or just a quick “this needs fixing but I haven’t decided how yet”. You’ll also spot that there isn’t an area dedicated to the display manager or the screen setup plugin – because I haven’t really given those much thought. After all, the document is primarily about what needs changing in GraphicsV, not what needs changing in other parts of the system in order to properly take advantage of those GraphicsV changes. But there are a few notes dotted around in each individual section wherever I think display manager or screen setup changes are required. |
Chris Hall (132) 3554 posts |
Wow! I’m impressed. Coding and documentation… |
Jeffrey Lee (213) 6048 posts |
There’s an interesting thing I’ve spotted recently – now that the display manager asks for a full 256 colour palette when you select a 256 colour mode, redrawing the desktop (particularly filer windows) is several times slower than it was before (on an Iyonix, at least). After doing a bit of profiling last night it looks like most of the time is being spent in ColourTrans_SelectTable, where the Wimp is building translation tables to map from the filer sprites to the desktop palette. If the default 256 colour palette is used then ColourTrans will use a specially optimised routine for generating the translation tables, but in full 256 colour modes the check for that optimisation is skipped (even though we’re still using the default palette) – so it falls back to a much slower brute-force routine for generating the translation tables. So the easy fix for this is to allow ColourTrans to detect when the default 256 colour palette is being used in a full 256 colour mode, so that it can use its optimised routine as usual. But I’m also considering some other improvements that should help things further:
|
Jon Abbott (1421) 2651 posts |
I did wonder why desktop 8bpp modes were so slow under RO5; I can now see why. Where is the brute force routine? It sounds like it could probably do with a rewrite if it’s still optimised for ARM3. Everything you’ve suggested sounds like a good route to take. I’ve never looked at ColourTrans or had any reason to use it, but could it not cache translated palettes and just return the pointer to one it’s seen before? I don’t know how it works, so am not sure if that’s actually possible. |
Jeffrey Lee (213) 6048 posts |
ColourTrans_SelectTable calls best_colour_safe 256 times, in order to map each source palette entry to a destination palette entry. And best_colour_safe is merely a wrapper for best_colour_fast, which is just a wrapper for the FindCol macro, which will invoke the CompErr macro 256 times in a loop in order to find the closest output colour for the given input colour. (all those routines are in that same source file) You can’t really avoid calling best_colour_safe 256 times (after all, you have 256 palette entries to find the nearest colour for), so the key bit is either making CompErr faster or avoiding calling it 256 times from within FindCol. (Or cache the results better) There’s also best_colour256_safe, which is what’s used when ColourTrans detects the default 256 colour palette. That one’s significantly faster because it only loops 4 times (one per tint value) instead of 256 times.
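To illustrate the difference between the two paths described above, here is a minimal Python sketch (the real routines are ARM assembler, and this is not a transcription of them). It assumes an unweighted squared RGB error, and it models the default 256 colour palette as 4-bit components whose low two bits are a tint shared by red, green and blue. The function names, the palette ordering in `DEFAULT_ENTRIES`, and the `level` helper are all hypothetical.

```python
def comp_err(a, b):
    """Unweighted squared RGB distance (a stand-in for the CompErr macro)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_col(palette, rgb):
    """Brute force: compare rgb against every palette entry."""
    best_index, best_err = 0, comp_err(palette[0], rgb)
    for index, entry in enumerate(palette):
        err = comp_err(entry, rgb)
        if err < best_err:
            best_index, best_err = index, err
    return best_index


def select_table(src_palette, dst_palette):
    """One find_col call per source entry: 256 x 256 comparisons at worst."""
    return [find_col(dst_palette, rgb) for rgb in src_palette]


def level(high, tint):
    """Expand 2 high bits plus 2 shared tint bits to an 8-bit component."""
    return ((high << 2) | tint) * 17


# Assumed model of the default palette; the index order is illustrative only.
DEFAULT_ENTRIES = [(r, g, b, t) for r in range(4) for g in range(4)
                   for b in range(4) for t in range(4)]
DEFAULT_PALETTE = [(level(r, t), level(g, t), level(b, t))
                   for r, g, b, t in DEFAULT_ENTRIES]


def find_256(rgb):
    """Fast path: loop over the 4 tints, choosing each component on its own."""
    best = None
    for tint in range(4):
        highs = [min(range(4), key=lambda h: (level(h, tint) - c) ** 2)
                 for c in rgb]
        colour = tuple(level(h, tint) for h in highs)
        err = comp_err(colour, rgb)
        if best is None or err < best[0]:
            best = (err, colour)
    return best[1]
```

The fast path works in this model because, once the tint is fixed, the error is a sum of three independent per-component terms, so each component can be minimised separately and only the four tint candidates need comparing.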
Yeah, I’ve basically ended up with three versions of each routine:
So far I’ve only been testing the code on my Iyonix, but tonight I’m planning on trying the BB and RiscPC to get some timings from that. But having only touched ColourTrans so far, the prognosis is good – I’ve fixed the default palette detection so that full 256 colour modes aren’t hideously slow any more, I’ve added the optimised greyscale function (which makes 256 greyscale modes about 20% faster than 256 colour modes – in previous OS versions they would have been about 5 times slower than 256 colour modes due to using the generic best_colour_safe function), and I’ve optimised the various routines so that they’re significantly faster than before. E.g. Find256 (which is the core of best_colour256_safe) should now be over twice as fast as it was before, even on ARM2.
It caches the ‘32K’ style tables that are used to map true colour sprites to palettes (as those require significantly more work to compute – I think taking at least a second or two per table on an ARM610), but it doesn’t bother caching the regular lookup tables for mapping from one palette to another. Perhaps that would be worth implementing, at least for the case where the slow best_colour_safe routine is used (the other routines are fast enough that it’s not really necessary). However it’s also arguable that it should be the caller’s responsibility for caching the tables better, especially the Wimp which will generally be the one piece of software which uses ColourTrans the most for translating palettes. |
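The idea of caching palette-to-palette tables could be sketched roughly as follows. This is only an illustration under stated assumptions: `TableCache`, its `capacity` parameter and the `nearest_table` builder are hypothetical names, palettes are sequences of `(r, g, b)` tuples, and nothing here reflects how ColourTrans or the Wimp actually store their tables.

```python
class TableCache:
    """Small most-recently-used cache of palette-to-palette tables."""

    def __init__(self, build, capacity=8):
        self.build = build          # function(src, dst) -> translation table
        self.capacity = capacity
        self.tables = {}            # (src, dst) -> table, oldest entry first
        self.misses = 0

    def get(self, src_palette, dst_palette):
        key = (tuple(src_palette), tuple(dst_palette))
        if key in self.tables:
            table = self.tables.pop(key)     # re-inserted below as newest
        else:
            self.misses += 1
            table = self.build(src_palette, dst_palette)
            if len(self.tables) >= self.capacity:
                oldest = next(iter(self.tables))
                del self.tables[oldest]
        self.tables[key] = table
        return table


def nearest_table(src_palette, dst_palette):
    """Slow builder used for the demo: brute-force nearest colour."""
    def err(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(dst_palette)), key=lambda i: err(dst_palette[i], rgb))
            for rgb in src_palette]
```

Keying on the full palette contents keeps the sketch simple; a real implementation would more likely key on something cheaper, and would need a way to invalidate entries when a palette changes.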
Sprow (202) 1158 posts |
I recall sitting through Ben (Avison’s) quite lengthy rant about the Iyonix sprites each of which has a customised 256 entry palette, and how that thrashes the colour lookup code in the Wimp. Compare with the Ursula sprites, which all use a default desktop palette. From your list
I’d not bother doing anything special with 256 greys. |
Jeffrey Lee (213) 6048 posts |
Too late – I’ve already written the code! The ColourTrans changes have now been submitted to CVS. Here are some unscientific timings, based around how long it takes to redraw the desktop in a 256 colour mode, with the only window visible being a filer window containing around 80 files.
Iyonix (1920×1200)
Apart from the obvious fixes, there are also some minor gains thanks to changes such as not converting the errors to absolute values before squaring them.
BB-xM (1280×1024)
Some significant gains here for non-default palettes due to creating a CompErr implementation that is scheduled for the A8 pipeline.
StrongARM RiscPC (1280×1024)
Being an IOMD build, this has to support ARMv3, and so the code to deal with slow multiplies is still in place. So best_colour_safe doesn’t see any gains (and actually gets a bit slower somehow).
|
WPB (1391) 352 posts |
Great results! You must be pleased with that. Do you expect similar speed-ups on the R-Pi, or is the instruction scheduling not going to result in the same gains? |
Chris Evans (457) 1614 posts |
Thanks. Great work. |
Jeffrey Lee (213) 6048 posts |
I’d expect the gains to be somewhere in between the gains for the Iyonix and the BB, although I’m not quite sure where. Reduced instruction count (due to some ARMv6+ instructions) will definitely help, as should the better scheduling. But the Cortex-A8 should always gain the most from proper scheduling because it has two pipelines to fill instead of one (and so can do roughly twice as much work while waiting for the results of slow instructions like MUL/MLA).
For comparison, do you have any figures for 16M colour modes, i.e. is 256C still slower than 16M?
Luckily for you I did do a brief bit of testing in true-colour modes. These changes won’t speed them up at all, but the figures were (roughly) 60cs for 32K/64K modes and 90cs for 16M colour modes. So 256 colour modes are back as being the fastest available on the Iyonix. |
Jon Abbott (1421) 2651 posts |
Out of interest, have you tried StrongARM without the fix for slow MUL? Certainly sounds like some great improvements, I’ll be going back to a 256 colour desktop as soon as it’s available :) Regarding the Wimp, I agree it should do the caching; perhaps it should cache all sprite palettes once translated. Sprow mentioned it’s using the RMA; would it not be advisable to move it to its own DA? |
Jeffrey Lee (213) 6048 posts |
No. But I’d expect it to see around the same gains as the Iyonix – i.e. about 10% faster for non-default palettes, and maybe a small gain for default palettes.
It’s available now – the changes are in today’s development ROMs.
At the moment I’m only thinking about caching the most recently used translation table, whether the sprite is from the Wimp sprite pools or elsewhere. But extending it to cache translation tables for all the sprites in the Wimp sprite pools would probably be wise, and shouldn’t be too tricky to implement considering that the Wimp tries to keep those sprite pools shielded from direct manipulation. I.e. we can assume that nothing’s going to be modifying Wimp sprite palettes without the Wimp knowing about it. One thing I’m interested in seeing is how much the filer redraw order will affect things, so I’ll definitely be trying that out at some point. The good thing about that change is that it should help all screen modes, not just palettised ones. |
Jeffrey Lee (213) 6048 posts |
Now in CVS:
Some brief stats, using the same basic setup as before (although not all modes tested): Iyonix
BB-xM
RiscPC
I wonder how long it will take me to get used to the new filer window redraw order – it’s odd seeing them redraw in what looks like a random order. But it is definitely faster than it was before! |
Jon Abbott (1421) 2651 posts |
I’ve just taken a quick look at the code, how accurate does the result have to be? Instead of using the Pythagorean equation in CompErr to work out the 3d vector distance between two colours, could we not use an approximation? FindCol could also exit early for an exact match. A further improvement may be to pre-calculate a distance table for each palette when it’s loaded that contains the distance from (0,0,0) for each entry and then in FindCol calculate the distance from (0,0,0) for the colour you’re searching for and look for the closest value, eliminating CompErr entirely during the redraw phase. What are all the references to load/loading? Is that a weighting to correct the RGB colour space? |
William Harden (2174) 244 posts |
Jeffrey: a less technical addition to the problem – but why do the Raspberry Pi sprites have custom palettes at all? Could we not redo them with default palettes? |
Michael Drake (88) 336 posts |
Because they don’t use colours that exist in the default palette, and have subtle graduated fills. You can convert them to use the default palette using error diffusion, but the result is pretty nasty looking. Anyway, iirc all the toolsprites share the same palette anyway, so it should be quite optimal with caching. Jeffrey: Is there any more scope for optimisation of 16M colour modes? I don’t think I’ve had reason to use the desktop with <16M colours for years. |
Jeffrey Lee (213) 6048 posts |
It basically already is an approximation, when you consider that it’s operating in RGB colourspace with an unspecified gamma curve :)
You can’t just calculate the distance from (0,0,0) in the source and destination palettes and match them up:
$$(r_1^2 + g_1^2 + b_1^2) - (r_2^2 + g_2^2 + b_2^2) \neq (r_1 - r_2)^2 + (g_1 - g_2)^2 + (b_1 - b_2)^2$$
However, you could generate a table mapping arbitrary source RGB values to the closest palette entry (which is what the “32K” style tables are). Then when calculating the translation table for a palette you use the source palette to get the RGB, and then use the table to get the destination palette entry. The only trouble is that generating such a table takes time, potentially more time than would be spent if the other algorithm was used (consider the case of a frequently changing palette). At the moment the “32K” style tables are just a flat [R][G][B] array, with between 4 and 6 bits of precision per component. This is generally OK, but will give sub-optimal results for some palettes (e.g. you’ll get a fair bit of banding in greyscale modes). And since it’s the non-standard palettes which we’re looking to optimise (colour lookup for standard palettes is now back at O(1)), it would probably be best to find a better table structure to use, e.g. something like an octree which will be able to adjust itself to exactly the right level of accuracy that’s needed. Octrees are commonly used (citation needed) for colour quantization (e.g. generating a custom palette for a true colour image), I see no reason why they couldn’t also be used for pre-existing palettes.
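A quick worked case for the distance-from-origin point: pure red (255,0,0) and pure blue (0,0,255) are both at squared distance 65025 from black, so their origin distances match exactly, yet the squared distance between them is 130050. The flat [R][G][B] table idea can be sketched as below. This is a minimal illustration, not the ColourTrans table format: the function names are hypothetical, the `bits` parameter and the choice to sample each cell at its centre are assumptions, and the error is unweighted.

```python
def sq_dist(a, b):
    """Unweighted squared RGB distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_rgb_table(palette, bits=5):
    """Flat [R][G][B] table: quantised RGB -> nearest palette index."""
    shift = 8 - bits
    size = 1 << bits
    centre = (1 << shift) >> 1          # sample the middle of each cell
    table = []
    for r in range(size):
        for g in range(size):
            for b in range(size):
                rgb = ((r << shift) + centre, (g << shift) + centre,
                       (b << shift) + centre)
                table.append(min(range(len(palette)),
                                 key=lambda i: sq_dist(palette[i], rgb)))
    return table


def lookup(table, rgb, bits=5):
    """O(1) lookup: drop the low bits of each component and index the table."""
    shift = 8 - bits
    r, g, b = (c >> shift for c in rgb)
    return table[(((r << bits) | g) << bits) | b]


def translation_table(src_palette, table, bits=5):
    """Map every source palette entry through the prebuilt table."""
    return [lookup(table, rgb, bits) for rgb in src_palette]
```

The banding mentioned above falls out directly: with `bits` of precision per component, a 256-entry grey ramp can only ever map to `2 ** bits` distinct destination entries, however many greys the destination palette holds.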
Yes, those are weighting values (PRM 3-339). Surprisingly the APIs to control the values are all “for internal use only”, which makes me wonder if anything actually modifies them – if the weightings were fixed then it would allow for extra optimisations. It’s also worth pointing out that different algorithms appear to use different weighting methods:
So there’s definitely room for improvement by settling on a standard error weighting metric.
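To show why the choice of weighting matters, here is a small sketch of a weighted squared error. The weight triples are purely illustrative and are not the values documented in the PRM; `weighted_err` and `nearest` are hypothetical names.

```python
def weighted_err(a, b, weights):
    """Squared RGB error with a per-component weighting (loading)."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def nearest(palette, rgb, weights):
    """Index of the palette entry with the smallest weighted error."""
    return min(range(len(palette)),
               key=lambda i: weighted_err(palette[i], rgb, weights))


palette = [(100, 100, 130), (100, 120, 100)]
target = (100, 100, 100)
flat = nearest(palette, target, (1, 1, 1))      # all components equal
loaded = nearest(palette, target, (2, 4, 1))    # hypothetical loadings
```

With equal weights the second entry wins (error 400 against 900); with green weighted four times as heavily as blue the first entry wins (900 against 1600). Two routines using different weightings can therefore disagree on the "closest" colour for the same palette.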
The best way of dealing with true colour modes (and palettised modes, really) would be to make better use of GPU acceleration. At 1920×1200×32bpp it takes my Iyonix 50cs to redraw an empty screen. Remove the backdrop and the time drops to 4cs. The pinboard creates a cached version of the backdrop sprite for the current screen mode, so when it’s drawn all that happens is that it uses the kernel OS_SpriteOp that uses a (somewhat) optimised block copy. So all the time is being spent waiting for the memory bus (in particular the slow VRAM writes over the PCI bus). If there was a mechanism for creating and rendering GPU-optimised sprites then the pinboard, Wimp sprite pool, font rendering, etc. could all be changed to use that and the CPU would rarely have to touch the screen itself. In fact, I should probably add this to the todo list since it’s a pretty major bottleneck, and should be relatively easy to implement once screen memory handling is revamped. Other improvements could come from screen memory caching, or an implementation of ROL’s tiled OS_SpriteOp. Even without GPU accelerated sprites the tiled OS_SpriteOp could be a big help – the implementation could just draw the sprite once and then use the existing rectangle copy GraphicsV operation to copy it to all the other tiles. Or for situations where that isn’t possible (masks, alpha blending, etc.) a custom plotter could be used which tries to ensure the sprite is kept in the CPU cache (i.e. draw the first row to all the tiles, then draw the second row to all the tiles, then the third, etc. so that the current sprite row stays in the cache). |
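The two tiling strategies described above can be sketched on a toy framebuffer (a list of rows). Everything here is an assumption made for illustration: `copy_rect` merely stands in for a rectangle copy operation, the function names are hypothetical, and masks and alpha blending are ignored.

```python
def plot_sprite(screen, sprite, x0, y0):
    """Plot one copy of the sprite pixel by pixel (the slow CPU path)."""
    for y, row in enumerate(sprite):
        for x, pixel in enumerate(row):
            if 0 <= y0 + y < len(screen) and 0 <= x0 + x < len(screen[0]):
                screen[y0 + y][x0 + x] = pixel


def copy_rect(screen, sx, sy, w, h, dx, dy):
    """Stand-in for a rectangle copy, clipped to the screen edges."""
    height, width = len(screen), len(screen[0])
    w = min(w, width - dx, width - sx)
    h = min(h, height - dy, height - sy)
    if w <= 0 or h <= 0:
        return
    for y in range(h):
        screen[dy + y][dx:dx + w] = screen[sy + y][sx:sx + w]


def tile_by_copy(screen, sprite):
    """Draw the sprite once, then fill every other tile by rectangle copy."""
    h, w = len(sprite), len(sprite[0])
    plot_sprite(screen, sprite, 0, 0)
    for ty in range(0, len(screen), h):
        for tx in range(0, len(screen[0]), w):
            if (tx, ty) != (0, 0):
                copy_rect(screen, 0, 0, w, h, tx, ty)


def tile_by_rows(screen, sprite):
    """Cache-friendly order: plot each sprite row to every tile in turn."""
    h, w = len(sprite), len(sprite[0])
    width = len(screen[0])
    for sy, row in enumerate(sprite):
        for y in range(sy, len(screen), h):
            for x in range(width):
                screen[y][x] = row[x % w]
```

Both orders produce the same picture. The first keeps the CPU off the screen after the initial plot, while the second touches one sprite row at a time so that row can stay in the cache while it is written to every tile.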