Proposed GraphicsV enhancements
nemo (145) 2546 posts |
Rick said:
Actually, packages on other systems don’t tend to get it right, even now. For example, Photoshop famously gets it wrong when resizing images. No, the holy grail you refer to is mostly achieved by ICC colour profiling when converting from medium to medium, not by correct gamma handling when processing. By “expensive” I mean that for most operations you spend more time doing gamma conversions than doing the actual processing, and with only 8 bits per component any such repeated processing necessarily introduces quantisation and noise. Sadly, working at 1.0 gamma with only 8 bits per component also introduces banding, but only a fixed amount, not increasing with every additional processing step. Ideally one doesn’t use linear mapping unless you have floating point components… or, in this case, as needs must.
Your gamma is wrong.
I’ve produced hundreds of colour magazines and all kinds of printed material via this workflow since 1994. |
nemo (145) 2546 posts |
Jeffrey announced:
One would only have to define the lowest three bits as “010”; the fourth can be reserved for something within that format (b2 selecting a yet-unthought-of mode-specifier format). OK, worked it out now: it’s sprite types 15 (and 31 under the 3.5 definition). Pity, I was using that for multicomponent sprites. I’ll have to get a sprite type allocation. Don’t think I can do that with !Allocate. Once upon a time one could just email Alan in an airport somewhere and hey presto. |
Sprow (202) 1158 posts |
I’m not sure the “top bit set if control list item is new” idea is wise. There may have been some stick misgrabbage. However, your DPMS example highlights that the informational bit can’t easily be retrospectively applied to pre-existing control list items. Are the changes big enough to merit a VIDC type 4 list instead? Or some flags being shoehorned into (for example) log2bpp such that the list would be otherwise rejected by things that didn’t know about the extensions? Actually, bits 4-31 look free in the (interlace) flags word.
I know ScrModes generates VIDC lists (and uses the DPMS state from the MDF in the control list), but fortunately it doesn’t appear to parse incoming lists. |
Jeffrey Lee (213) 6048 posts |
I did consider that at one point, but the effort involved in doing it in a backwards-compatible way (e.g. making sure code can still parse type 3 lists, and/or making the OS fall back to using type 3 lists in situations where a type 4 list isn’t necessary) is almost certainly greater than the effort involved in making control list parsing sensible. Both Geminus and Aemulor intercept GraphicsV in order to do various things; it would be nice to make sure existing copies continue to function, even if having them running might prevent users from using any new features the new GraphicsV provides. And if we did want a VIDC type 4 list, we’d probably want to drop the pixel format information from it entirely – because in the brave new world of an overlay-aware GraphicsV, the mode timings and the desktop overlay parameters (pixel format, width/height, Z layer, etc.) would be two completely separate entities. They might still need to be specified together in a vet mode call for the sake of checking memory bandwidth constraints, but from an API point of view we’d want to make sure the desktop and mouse pointer use the regular overlay API as much as possible.
Too ugly. Plus, I discovered to my horror that the NVidia driver currently doesn’t implement the vetmode call at all. This will change, but it’s something to be aware of when considering how old drivers may react to an updated list format. I’ve sent Adrian a few questions about what kinds of API changes we can safely make without breaking Geminus/Aemulor, so once I’ve heard back from him we should have a clearer idea about how to handle bigger subjects like type 4 VIDC lists or redoing vetmode so that drivers can say which bits are or aren’t understood. I’m happy to forget my “top bit set if can be ignored” idea until that time. For now I was mainly interested in making sure people wouldn’t grab their pitchforks at the thought of a couple of new control list items, so that we have a solution available if we decide we don’t want to go down the route of an all-new type 4 VIDC list. |
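To make the control list discussion above concrete, here is a rough driver-side sketch of the “top bit set means the item can be ignored” idea being debated. None of this is real GraphicsV code: the CTRL_IGNORABLE bit is exactly the convention under discussion, the helper names are invented, and the assumption that the control list is a sequence of (index, value) word pairs terminated by an index of 0 should be checked against the GraphicsV documentation.

#include <stdint.h>
#include <stdbool.h>

#define CTRL_TERMINATOR  0u
#define CTRL_IGNORABLE   (1u << 31)   /* hypothetical "safe to ignore" bit */

/* Walk the control list, rejecting it if it contains an unknown item that
 * isn't flagged as ignorable. 'known' is whatever test the driver uses for
 * the indices it implements. */
static bool scan_control_list(const uint32_t *item, bool (*known)(uint32_t))
{
    while (item[0] != CTRL_TERMINATOR) {
        uint32_t index = item[0];
        /* item[1] is the value; not needed for this check */
        if (!known(index)) {
            if (!(index & CTRL_IGNORABLE))
                return false;   /* unknown and mandatory: reject the list */
            /* unknown but informational: skip it */
        }
        item += 2;              /* items are (index, value) word pairs */
    }
    return true;
}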
Sprow (202) 1158 posts |
I can imagine that would be the case. The kernel’s monitor type/mode vet/mode set stuff is already pretty hairy.
Eek!
I don’t see that as a big issue myself, as (aside from Geminus perhaps) GraphicsV and the kernel are generally bound together by a ROM image.
The reason you need to be careful is therefore that VIDC (type 3) lists are in the PRM and part of the broader API; obsoleting bits of GraphicsV is much easier, as there are at most 4 clients to update (unless someone’s secretly working on a VPod or ViewFinder one). |
Rick Murray (539) 13840 posts |
Probably. However I have intentionally not corrected my netbook – because it is “bog standard” and should represent more or less (note the vagueness) what every other domestic PC looks like. Lowest common denominator, true. But it’s for the internet, often pictures I scale down for upload and paste a watermark upon, so I’m specifically aiming for “bog standard”. ;-)
Err – I was talking about 1.0 gamma. Such things exist because phosphors don’t react in a linear pattern to beam intensity. Our own eyes see different colours in different ways. Twice as much x works well digitally, but it doesn’t work like that in real life. This is where gamma correction comes in, especially the fancy setups where you can set the curve for each primary separately. |
nemo (145) 2546 posts |
I’ve produced hundreds of colour magazines and all kinds of printed material via a gamma 1.0 workflow since 1994. It’s unthinkable to produce professional quality artwork on RISC OS unless you have a linearly calibrated workflow.
Actually… no. That’s not why computers still use a 2.2 gamma. It’s a compromise between backwards compatibility (both at the software level and the monitor hardware level) and the acuity of the human eye. Specifically, it’s a pretty good solution to the problem of minimising quantisation banding across the luminance range. As we’re more sensitive to small deltas in a low volume signal, we need more dark shades and fewer bright shades. That is the reason that luma != luminance. Once you escape from 8bit quantisation that’s much less of an issue. |
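nemo’s point about dark-shade precision is easy to demonstrate numerically. The following few lines of C (mine, not from the thread) print the step in linear light between adjacent 8-bit codes near black, once for a linear (gamma 1.0) encoding and once for a gamma 2.2 encoding – the gamma-encoded steps are far smaller near black, which is exactly the extra dark-shade resolution being described.

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Linear-light difference between adjacent 8-bit codes near black,
     * for a gamma 1.0 encoding vs a gamma 2.2 encoding. */
    for (int code = 1; code <= 4; code++) {
        double lin_step = (code / 255.0) - ((code - 1) / 255.0);
        double gam_step = pow(code / 255.0, 2.2) - pow((code - 1) / 255.0, 2.2);
        printf("code %d: linear step %.6f, gamma 2.2 step %.6f\n",
               code, lin_step, gam_step);
    }
    return 0;
}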
Rick Murray (539) 13840 posts |
Mmm, that could explain the icky artefacts in XviDs – dark or night-time scenes have a lot of visual blotches that you don’t see in well-lit scenes. Maybe we’re descended from cats? :-) |
Jeffrey Lee (213) 6048 posts |
Progress update: I’ve taught OMAPVideo, ScreenModes, and the display manager about all the new RGB pixel formats. This includes the two new VIDC list control list items, a new GraphicsV call for getting a list of supported pixel formats (basically returns a list of (NColour, ModeFlags, Log2BPP) tuples), and a new version of the mode provider structure that’s used by OS_ScreenMode 2/Service_EnumerateScreenModes (not to be confused with mode specifier blocks, which are used when setting the screen mode). Basically I’ve replaced the single pixel depth word with another (NColour, ModeFlags, Log2BPP) tuple. In theory this change is backwards-compatible (I’ve incremented the version/format number), but it remains to be seen how much broken software is out there which doesn’t check the block headers properly. Worst case it’ll just display a garbage mode name and try selecting an invalid mode if the user tries to change to it. And if Nemo or anyone else has secretly created their own new mode provider format then now’s a good time to mention it :)

For the display manager update, I’m currently trying to go down the KISS route (that’s keeping it simple, not using lots of makeup). The colour depth menu has two new options for 4K & 64K colours; everything else (e.g. deciding whether to use TBGR or TRGB ordering, or 32bpp vs. 24bpp packed) is hidden behind the scenes. When you select a mode it’ll try and pick the most compatible mode available; if you want to specifically use any of the others you’ll have to use the text box to enter a custom mode string. If anyone has any thoughts on ways we could handle the new pixel formats in the display manager, feel free to share them!

I’ve also implemented support for ROL’s OS_ScreenMode 13-15. Compared to ROL’s version there are a couple of bits missing (teletext attributes, and setting the “greyscale palette” flag when switching output to/from sprites), but the implementation does support all the features needed for the new pixel formats (e.g. the “L” attribute for specifying RGB order & alpha/transparency). The display manager now unconditionally uses OS_ScreenMode 14 for generating its mode strings. *WimpMode will use OS_ScreenMode 13 if it’s available, otherwise it will fall back to its own code (which I haven’t bothered to teach about the new formats – the fallback code is only really there so that softloading the Wimp still works on old machines).

A couple of notable bugs I’ve encountered:
The kernel bug probably isn’t that major (an old kernel is unlikely to receive a VIDC list containing the new NColours control list item), but the Wimp bug is worth bearing in mind should any third party apps try generating mode specifiers containing NColours (+ ModeFlags) values. So for compatibility it’s probably best to avoid specifying those values unless they differ from the defaults a 3.5-era kernel would generate automatically.

In terms of actually teaching the OS how to draw things properly using the new pixel formats, I’m part way there. Although I said I’d start off with the 16bpp 565 format, I realised that that’s a bit silly, since it’s usually easier to update a component to work with all the new formats at once than to update everything for one format and then go back through to add the others. So I’ve started work on getting all the 16bpp and 32bpp RGB formats working (with preliminary support for the alpha channel – i.e. it will just set it to the max value), and adding preliminary support for 24bpp packed. 24bpp packed is going to be a bigger job than a lot of the others because it’s the first format to have a log2bpp value which isn’t actually the log2 of the bpp. I think I’ve updated the kernel’s text handling to deal with it, but my BB currently hangs without an error when trying to enter a 24bpp mode, so it’s probably going to remain broken for some time until I can be bothered to debug the hang. |
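For readers following along, the shape of the pixel format tuple Jeffrey describes might look something like the sketch below. The type and field names are my own guesses – the real layout is whatever ends up in the updated GraphicsV/OS_ScreenMode documentation – and the stride helper is only there to illustrate why a packed 24bpp mode, whose pixels really are 3 bytes wide, can’t be derived from Log2BPP alone.

#include <stdint.h>

/* Hypothetical names for the (NColour, ModeFlags, Log2BPP) tuple. */
typedef struct {
    uint32_t ncolour;    /* NColour: number of colours minus 1            */
    uint32_t modeflags;  /* ModeFlags: data format, RGB order, alpha, ... */
    uint32_t log2bpp;    /* Log2BPP: nominal log2 of the bits per pixel   */
} pixel_format;

/* Bytes per row of pixel data. Fine for power-of-2 depths, but packed
 * 24bpp needs special-casing; 'is_packed_24' stands in for whichever
 * ModeFlags bit ends up meaning that. */
static uint32_t bytes_per_row(uint32_t width, const pixel_format *fmt,
                              int is_packed_24)
{
    if (is_packed_24)
        return width * 3;
    return ((width << fmt->log2bpp) + 7) >> 3;
}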
Sprow (202) 1158 posts |
You could follow the same recommendation as for selecting sprites (only use the new sprite type if no old mode number exists), so here, if there were an ambiguity between 32bpp RGB and 32bpp BGR, the older layout would win. Users wanting a specific one would need to enter a mode string, just the same as they do now for weird eigen factors.
I’m not sure I (personally) see much point in packed 24bpp, other than that the sprite type number got allocated some time in the early 90s. |
Rick Murray (539) 13840 posts |
Makes sense. The HAL ought to have inside information on the capabilities of the video hardware (including differences between HDMI and composite outputs, in the case of the OMAP); so it makes sense to default to the combination that will just work, and if the user specifically wants something different, then they can do the mode string thing.
Does anything actually use this layout? |
Jeffrey Lee (213) 6048 posts |
24bpp packed is going to be a bigger job
Fair point; (to answer Rick) the OMAP supports it, and I suspect the Pi does as well, but I don’t think there are any situations where 24bpp is available but 32bpp isn’t – so there’s no real reason anyone would 100% need to use it for the desktop or for any overlays. So I might back out some of the changes I’ve made with regard to non-power-of-2 pixel sizes and leave them on ice until we know if we’ll need them for something else. It’ll certainly save me from a lot of extra hacking around in the VDU driver! |
Jeffrey Lee (213) 6048 posts |
Mini-update: After a couple of evenings staring at ColourTrans, I’ve come up with a smaller and faster but functionally equivalent version of the algorithm that ColourTrans_SelectTable/GenerateTable uses to build 16bpp-to-palette lookup tables. On StrongARM & Iyonix it’s about 25% faster (down from 85cs to 69cs and 23cs to 18cs respectively), while on my BB-xM it’s about 85% faster (13cs to 7cs). The larger gain on the BB-xM compared to the StrongARM & Iyonix is most likely down to better instruction scheduling (more opportunities for dual issue). |
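For anyone unfamiliar with what these tables are, the sketch below is a deliberately naive reference version of the job being optimised: for each of the 32768 possible 5:5:5 colours, find the closest entry in the destination palette. It is not the ColourTrans algorithm (which has its own colour-matching rules and is far more heavily optimised); the names and the plain squared-RGB-distance match are mine, and the bit layout assumed is the traditional RISC OS one with red in bits 0-4.

#include <stdint.h>

/* Naive 32K (15-bit RGB) to palette-index table builder, for illustration. */
static void build_32k_table(uint8_t table[32768],
                            const uint8_t (*palette)[3], int entries)
{
    for (int c = 0; c < 32768; c++) {
        /* Expand the 5-bit components to 8 bits (traditional order: R low). */
        int r = ((c        & 31) * 255) / 31;
        int g = (((c >> 5)  & 31) * 255) / 31;
        int b = (((c >> 10) & 31) * 255) / 31;
        int best = 0;
        long best_dist = -1;
        for (int i = 0; i < entries; i++) {
            long dr = r - palette[i][0];
            long dg = g - palette[i][1];
            long db = b - palette[i][2];
            long dist = dr * dr + dg * dg + db * db;
            if (best_dist < 0 || dist < best_dist) {
                best_dist = dist;
                best = i;
            }
        }
        table[c] = (uint8_t)best;
    }
}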
Jeffrey Lee (213) 6048 posts |
Scratch that: Now it’s 80% faster on StrongARM (and 43% faster on Iyonix) after optimising for smaller caches/slow memory. Unfortunately this makes it a bit slower on my BB-xM, and on anything without a cache. Doh! If there were a sensible way to get the cache size from the OS, I’d be tempted to include both versions of the code in the module, as it’ll only add around 1.5K for all the variants we’ll need. |
Colin (478) 2433 posts |
Do both the first time, time them and pick the fastest in future |
nemo (145) 2546 posts |
Do one the first time, the other the second time, and then thereafter whichever was faster. |
Rick Murray (539) 13840 posts |
Maybe the HAL needs a CPU information block, and an entry point in ReadSysInfo to get at it?
Means saving state, plus different requests might take slightly different durations… Why not do both at module init with a fake call using the exact same data, time them, pick the faster, then just go with that. |
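A minimal sketch of Rick’s “time both at module init” suggestion, assuming the two table builders can be called through a common function pointer type. The names are mine, and clock() is only a portable stand-in – as noted further down the thread, a centisecond timer is really too coarse for differences this small.

#include <time.h>

typedef void (*table_builder)(void *table, const void *palette);

static table_builder chosen_builder;   /* used for all subsequent calls */

static clock_t time_one(table_builder fn, void *table, const void *palette)
{
    clock_t start = clock();
    fn(table, palette);
    return clock() - start;
}

/* Run both implementations once on dummy data and remember the faster. */
static void pick_builder(table_builder small_cache, table_builder large_cache,
                         void *scratch_table, const void *dummy_palette)
{
    clock_t a = time_one(small_cache, scratch_table, dummy_palette);
    clock_t b = time_one(large_cache, scratch_table, dummy_palette);
    chosen_builder = (a <= b) ? small_cache : large_cache;
}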
Jeffrey Lee (213) 6048 posts |
Progress update: I’m now half-way through updating SpriteExtend to cope with the new screen modes. Scaled sprite & JPEG rendering is done, so the desktop is now 99% correct when in the new modes. However transformed sprite plotting is still left to do; I have a feeling I’ll be rewriting the existing assembler plotter generator in C so that it can share the extensive pixel format conversion code that the C-based scaled sprite/JPEG plotter generator uses. Once I’ve finished fixing all the issues with rendering to the new modes I’m planning on starting work on adding support for the new sprite types, which I’m hoping will be fairly straightforward as by that point most of the code for handling the new pixel formats will be in place. At some point (after adding overlay support) I may also add support for YUV sprite rendering – fast YUV sprite rendering support in the OS could make YUV sprites an attractive solution for any video player which can’t get hold of a YUV hardware overlay.

I’ve also run into a bit of an issue for which I can’t decide on an elegant solution. I’ve extended ColourTrans_SelectTable/GenerateTable so that it can now return 4K and 64K tables for mapping 4K/64K colours to palettes. It’ll also use the right RGB order for the table indices. However, I need to implement a way for the user of the table to determine what format the table is. At the moment, when you ask ColourTrans to map a 32K/16M mode to a palette, instead of returning the 32K table directly it actually returns a 12 byte structure. The first and last word contain the magic string “32K.”, while the middle word is the actual table pointer. For 4K and 64K modes the obvious solution would be to replace the magic string with “4K.” (or “4K..”?) and “64K.”. But then how would the RGB order be indicated? Replace the last word with a string of “RGB.” or “BGR.”? Although that would work, it will mean that code won’t be able to do a quick sanity check of the table by comparing the first and last word. So maybe replace the dot with something else? I also don’t like the idea of using lots of strings – a header word containing flags will be quicker for programs to parse. Also I could make it so that the new format tables are only returned if a new flag is passed to SelectTable/GenerateTable, but then existing code which generates a table and passes it straight through to something else (e.g. OS_SpriteOp) wouldn’t be able to automatically take advantage of the new format. Anyone have any thoughts?

Also, for the table generation optimisations, I’ve currently gone with a cop-out approach of fixing the choice of algorithm (large cache vs. small cache) at compile time based around the target CPU architecture. This shouldn’t be much of an issue, since ROOL currently don’t provide ColourTrans in softload form. |
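As a point of reference, the existing 12 byte structure described above can be pictured like this. The struct and helper names are mine; only the layout – guard word, table pointer, guard word – comes from the post.

#include <stdint.h>
#include <string.h>

/* 12 bytes on 32-bit ARM: "32K." guard, pointer to the table, "32K." guard. */
typedef struct {
    char     guard1[4];   /* "32K."                        */
    uint8_t *table;       /* pointer to the real 32K table */
    char     guard2[4];   /* "32K." again                  */
} colourtrans_32k_trampoline;

/* The quick sanity check existing code can do: both guard words match. */
static int looks_like_32k_trampoline(const void *p)
{
    const colourtrans_32k_trampoline *t = p;
    return memcmp(t->guard1, "32K.", 4) == 0 &&
           memcmp(t->guard2, "32K.", 4) == 0;
}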
Colin (478) 2433 posts |
I’d go with replacing the ‘.’ at the end with a code, e.g. “4K.R” or “32K.” – ‘.’ means BGR, so the current format is unchanged, ‘R’ means RGB, etc. Can’t the first and last word be the same?
Re optimisations: why fix it at compile time, when timing it means you don’t have to worry about it for different architectures when porting? |
Rick Murray (539) 13840 posts |
Why are we wasting 3 bytes on a format specifier? This isn’t supposed to be human-readable, so wouldn’t a bitfield be better? Not to mention quicker to parse… |
Sprow (202) 1158 posts |
Ha ha! Now you’ve admitted at least 1 more person round here knows how it works…
I found SprTrans to be one of the clearer parts of SpriteExtend; adding a few more input formatters should be relatively simple – trnslp_readpx, or now I look at it again, some different colour mungers (since the read/mask/store already copes with 1/2/4/8/16/32bpp).
I’d always understood those guard words to be just that: they are just there to denote that the thing passed is actually a pointer to the actual conversion table, but that the format of the conversion table is implied by the sprite being plotted. “32K.” does look like a poor choice of guard word now you want to pass other size tables, but that’s only because you printed it out and saw it said 32K! If you particularly want to change the guard words then you could retrospectively define some of the 4 set bits in the ASCII ‘.’ to mean something, then use the flags in a different way for other table sizes, but I think just treating them as equal guard words is probably good enough.
That sounds sensible. If a disc loading version were produced it would select the lowest common denominator (most generic ARM code) anyway, which from the sounds of your earlier post is faster than it was before, just not as fast as it would be in a tailored ROM. Perfect. |
nemo (145) 2546 posts |
b0&1 of the pointer (to the trampoline) |
Jeffrey Lee (213) 6048 posts |
If we’re sticking with the ASCII approach then that sounds like the best way of doing things to me. The “R” could stand for “reverse”, i.e. reverse RGB order compared to traditional VIDC modes. Initially I was thinking of doing something obscure with the ‘.’ (e.g. using an apostrophe to indicate the reverse order, or a set of flags as suggested by Sprow), but keeping it human-readable would be in keeping with whatever Acorn were thinking when they decided on the content of the guard words.
Timing is a bit fiddly; on a BB-xM there’s only about 1cs difference between the two algorithms, and on a Pandaboard and any future architectures I’d expect the difference to be even less. So we’d need a higher resolution timer than the standard centisecond one. Although we have the HAL timer API, it currently isn’t regulated, so if ColourTrans were to use it, it could unknowingly be messing with a timer which is already in use by something else. Plus it’s not portable if someone wanted to softload on something without the RISC OS 5 HAL.
Dunno – ask Acorn! I think the main problem is that the original ColourTrans API didn’t have a way to allow ColourTrans to indicate the type of the returned table. When you pass a table to OS_SpriteOp 50/52 you don’t tell it the size of the table, so they needed to choose a guard word pattern which was unlikely to be mistaken for valid data, and the “table” size had to be small enough that you weren’t likely to get data aborts from code looking for a guard word which doesn’t exist because the call had been passed a 2/4bpp palette translation table and not a 32K one. So without changing a bunch of APIs, I think we’re restricted to using obscure guard word values which aren’t likely to be seen in valid data. Perhaps we should replace the “32K.” guard words with “NEW!”. The pointer in the middle will still point at the lookup table, so that naughty software which doesn’t check the guard words and just assumes it’s been given a 32K-style table will stand a chance of still working. But just before the table we can place a machine-readable header describing the table format.
Unfortunately it’s not just “a few” that need adding – by my count it’ll be somewhere around 150 by the time you take into account all the different input/output combinations. Apart from a few hard-coded cases, PutScaled handles this using a general-purpose routine which generates code which pulls apart the source pixel one component at a time and reassembles it in the correct form, with an optimisation or two to try and avoid redundant instructions. I’d rather not rewrite that code in assembler, or list out all the possible permutations! (even if I cheated and had PutScaled generate the assembler source for me)
That won’t work – the memory used for the 12 byte trampoline is allocated by the user and passed in to ColourTrans. |
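A sketch of how the “NEW!” idea floated above might fit together, purely to make the shape of the proposal visible. The flag layout and all the names here are provisional and invented; the only fixed points are the ones stated in the post: the trampoline keeps its 12 byte shape with the table pointer in the middle, and a machine-readable header sits immediately before the table data.

#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t flags;        /* hypothetical: table size, RGB order, etc.  */
    /* table data follows immediately after this header word */
} table_header;

typedef struct {
    char     guard1[4];    /* "NEW!"                                     */
    uint8_t *table;        /* still points straight at the table data    */
    char     guard2[4];    /* "NEW!"                                     */
} new_trampoline;

/* Returns the header for a new-style table, or NULL if the guard words
 * say this is an old-style (e.g. "32K.") trampoline with no header. */
static const table_header *header_of(const new_trampoline *t)
{
    if (memcmp(t->guard1, "NEW!", 4) != 0 ||
        memcmp(t->guard2, "NEW!", 4) != 0)
        return NULL;
    return (const table_header *)(const void *)t->table - 1;
}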
Rick Murray (539) 13840 posts |
Makes sense…
. – traditional style
1cs over how many calls? If a low number, 1cs could add up with intensive stuff. If a high number, then I’m inclined to say pffft!
But don’t forget, broken software is broken software. Perhaps the acid test is to make a build without, and see if any of the biggies (PhotoDesk, Paint, DTP etc) fall over? By the way, does this mean I’ll soon be able to use RISC OS in a high colour mode on the Beagle with s-video output? :-) :-) |
Jeffrey Lee (213) 6048 posts |
1cs over one call. Of course, ColourTrans caches the 32K-style tables, so unless you’re dealing with lots of different palettes the differences will be negligible in practice. But at the other end of the spectrum, an ARM2 takes around 20s to generate a 32K table, and there’s several seconds’ difference between the two algorithms (although those timings were taken under emulation, so should be taken with a pinch of salt).
Yes, for an appropriate definition of “soon”. I’m a bit hesitant to check anything in until I’ve settled on final APIs for a few things, but if things start to drag on for too long then I’ll have to check in at some point just to avoid dealing with big merges caused by other people’s changes. |