GSRead has never been threadsafe
Pages: 1 2
nemo (145) 2546 posts |
How have I never been bitten by this? OS_GSRead must not be used from user-mode in a TaskWindow or other pre-emptor. When expanding SysVars during GSRead the Kernel uses a stack of expansion buffers to capture the SysVar value(s) and previous state. However, the Kernel never got the memo about there being multiple threads since 1988. So as it treats these buffers as a simple stack, callers will get the wrong pointer restored if any GSRead expansion is interrupted by any kind of context switch. Trivial demo inadvertently swapping contexts by context switching during SysVar expansion: This can be solved by storing all context within the expansion buffer and employing the buffer number field of the R2 state, and hence having in extremis multiple linked lists of state unambiguously pointed to by each thread’s R2. Care will have to be taken with correlating state with buffer in the case of buffer reuse due to exhaustion. Thank heavens CLI is the heaviest user of GST. |
nemo (145) 2546 posts |
BTW Until RO4, the GST swis used the scratch space, so use inside a TaskWindow throughout RO3 would have been highly precarious. The RO3.5 rework is a bit odd but persisted through the move to SystemHeap, and may have been an attempt to allow some degree of simultaneity. But GSInit is either mysterious or misguided in its manipulation of the stack depth… did it mean to increment? Odd. |
Jon Abbott (1421) 2651 posts |
In the thousands of Absolute’s I’ve debugged over the years, I don’t think I’ve seen the GS SWI used once, so the chance of the issue occurring and being reported was probably low in the first place. To make it truly thread safe, OS_GSInit should require a working buffer, to avoid OS internal buffer exhaustion. The same goes for any other SWI that require transient buffers that aren’t manually removed once finished with. Scratch space should probably be avoided by the OS – or its use should be via OS_Heap so it’s semi-thread-safe, personally I’d not use it in the OS and deprecate or hand it over to the application and task-switch it. There was some discussion around deprecate it a few years back. Links to the OS code you’re referring too might help with regard to the potential stack depth issue you’ve mentioned. |
nemo (145) 2546 posts |
Yes, most GST use is by the Kernel which isn’t in USR at the time, so problems are somewhat averted. But GSInit/GSRead are extremely vulnerable. I view this as worse than the Scratch space cannot be deprecated it’s a defined part of the API… though with the rather laden requirement that you “must know what you’re doing”. The problem isn’t that it exists but that it hasn’t been adequately documented. I often wonder what problem people think they’re solving when they invent new incompatibilities like that. GSInit doesn’t need a buffer as such (though it may allocate a stack level), but GSRead needs buffers for expanding numeric SysVars. These are small. GSInit can’t tell the difference between <7> and <G> so is pessimistic about this. I’m reimplementing the GSTs with all of the above in mind. Wish me luck. |
Rick Murray (539) 13840 posts |
It might have come about early in Arthur, as my v0.30 ROM allocates the memory in 32K chunks (quite unlike RISC OS). So way back when, they needed some OS workspace. Page zero was at &0, the app was at &8000, and somewhere in between was a nice wodge of memory with no defined purpose, the perfect place to dump “temporary internal stuff”. Thirty six years later, you find a bug in a rarely used system call that was probably conceived before multitasking, never mind TaskWindows. ;) To my mind, there’s quite a lot of things that have single process state. Colours and cursor position, current font, etc etc. |
nemo (145) 2546 posts |
As it happens, Templates could easily have been held open over Wimp_Poll – it’d only be another byte per task and there’s loads free in the slots. They just decided ‘Nah, not necessary’.
TaskWindow is bad at VDU stuff though. TaskWindow is bad. |
Rick Murray (539) 13840 posts |
Gerph’s ramble on TaskWindows is enough to put the fear of Cthulhu into a person… |
David J. Ruck (33) 1635 posts |
It doesns’t really matter if the taskwindow context switches when the task is mid VDU sequence, as the VDU stream is sent to and acted upon by managing WIMP application. If it does handle multi byte VDU sequences (most only do plain text, so ignore them) then it should buffer until it receives a whole sequence. I didn’t have any problems with how the taskwindow module delivered VDU to !GraphTask, it was the amount of code meed to simulate VDU operations, to make writing to a sprite look the same was as if the task was running outside the desktop. |
Jon Abbott (1421) 2651 posts |
Good luck! Are you going to feed your fix back into the OS source? Off topic, but whilst reading through your BASIC changes the other week I noticed you’d fixed a bug I reported last year where LOCAL/ERROR corrupt variables. Could that fix be fed back to the OS source? |
Simon Willcocks (1499) 513 posts | |
nemo (145) 2546 posts |
druck declared
I haven’t tried this in GraphTask, but I can guess:
And that’s not mentioning the signed comparison – Jon asked
Astonishingly, no. Let me explain. There are loads of new features in my Basic, and they’re all bracketed by build switches partly to enable debugging (and unintended interactions) but mainly so they could be easily separated and ported. Almost all of my changes could be fed back. However, the only way to fix the So this is one of the few changes that can’t be unilaterally applied to Basic V, as it could cause tight-fitting programs to run out of memory. This is why my version requires |
nemo (145) 2546 posts |
Simon tersed
Dissuading engineers from “fixing it with a new one” is Project Management 101. Suggesting rewriting RISC OS from the documented API is as near to gibbering insanity as makes no odds. The documentation is almost always wrong. And anyone who requires GST* to be rewritten in C in order to maintain it is certainly going to break it in the process. I. Can. Guarantee. That. |
Simon Willcocks (1499) 513 posts |
@nemo This thread shows that it already is broken. My idea is to implement GSTrans as an atomic operation, and have the result stored in temporary (task-specific) memory if it is to be read with GSInit/GSRead. That eliminates the multi-threading problem altogether, and the memory can be freed when the task exits, even if it does GSInit and stops reading before the end of the string. I may be insane, but I’m not always gibbering (although some of the OS APIs do bring it on). GSTrans seems to be capable of reading a code variable’s code just once when reading a code variable in angle brackets, but my code that asks for the length, then the value calls it twice (iirc). |
Rick Murray (539) 13840 posts |
Oh, now that’s evil. Command line: prints “DoYouSee?”. Zap and regular TaskWindow: prints nothing, no obvious side effects. GraphTask (v4.03): outputs a blank line, the ‘D’ character definition is corrupt. |
David J. Ruck (33) 1635 posts |
That’s mixing a VDU sequence which is sent as stream to the TaskWindow handler, with a SWI which is acted upon immediately, which wont work with anything based on TaskWindows. It would work if I ever get around to the GraphTask Mk2, which instead redirecting output to a spite when the GraphTask application is active and replaying the VDU stream from the task, would redirect output when the TaskWindow task is actually running. Then any SWIs such as that one, and along with SWIs which produce graphical output e.g. Sprite and Draw would then also write in to the sprite instead of the screen. It would also need a lot a SWIs intercepted such as mode changes so they affect the sprite and not the desktop. |
nemo (145) 2546 posts |
Sorry Druck, but that goes wrong because TaskWindow fails to emulate the VDU well enough. It ought to sit on the VDU queue until complete, allowing it to be cancelled if required. I won’t mention PLOT212 et al and the impossibility of knowing what a VDU sequence is and how long it might be, because despite appearances I’m not cruel. It also doesn’t simulate cursor position, so POS and VPOS in Basic are useless, and so on. Yes, a sprite with a save area would improve things, and though one ought to be able to bracket such tasks in Pre/PostFilters, be aware that TaskWindow’s messed-up CallBack handling (certainly in some versions) can result in a hosted code’s CallBack not occurring until another task is paged in. Sigh. |
jgharston (7770) 14 posts |
OS_GSTrans is an atomic operatio, it’s doing OS_GSInit/OS_GSRead that’s not atomic. Off the top of my head, all the RISC OS code I’ve written that translates strings uses OS_GSTrans to get the whole string in one go, then works with that locally. Eg: SYS “OS_GSTrans”,cmd$ TO str$:PRINT str$; to do a GSEcho. |
nemo (145) 2546 posts |
Safe. It’s GSRead from USR mode that was always broken. I’m testing a replacement module. |
David J. Ruck (33) 1635 posts |
Graphic Task Windows simulate the cursor position, you can even do cursor copying. It doesn’t support POS and VPOS though, I could have done in the same way as I make the corrected mouse co-ordinates available to the a task, but perhaps I didn’t find enough programs to make use of it to make it worthwhile.
When RISC OS 3 introduced filters, I planned to use that, but never had the time – nothing has changed since. But hopefully the CallBack stuff will be sorted out by the time I retire and get around to looking at this again. |
nemo (145) 2546 posts |
LOL. <thumbs up emoji> |
Simon Willcocks (1499) 513 posts |
Not exactly, because it enables interrupts. I’m trying to get multi-threading and multi-processing working, and that needs to lock the system variables (a shared resource) while one is read or written. |
nemo (145) 2546 posts |
GSRead retains pointers into SysVar memory – and there can be many such pointers even for a single string. I’m going to go out on a limb and say you can’t multicore the Kernel. There’s simply too much module code that knows that it can twiddle MosVars, VDU workspace and god knows what else secure in the knowledge that it’s safely in SVC and/or (depending on context) IRQs disabled. Another core would kill all that. I wish you luck. I think it’s the wrong model for RO. Actually I’ll go further. You can’t modify SysVars while something is iterating over SysVars in the same way as The problem of modifiable (threadsafe) iterators is pretty well understood (and I’ve implemented the model elsewhere), but it’s much too late for RISC OS. |
Simon Willcocks (1499) 513 posts |
Strictly speaking it’s the GSRead implementation that “retains pointers into SysVar memory”, it’s not required by the interface. I’d GSTrans the whole string, then let the legacy application read through the output, freeing the output string when it had all been read (or the application completes). I’m thinking more of new applications being able to use the more advanced features and avoiding those sorts of assumptions. Things seem to work pretty well with TaskWindows despite the decades old bug described here, and locking system variables while a legacy application holds the Wimp might work. It is a bloody nightmare, though, and my enthusiasm comes and goes. |
Charles Ferguson (8243) 427 posts |
Of course not :-) OS_GSInit/OS_GSRead are attrocious interfaces, and you should really only be using OS_GSTrans. However, it is only the implementation which is unsafe, not the design, which is pretty much what others have said. The implementation within RISC OS Pyromaniac is safe (even allowing for code variables which perform other operations), I believe. Instead of OS_GSTrans being implemented in terms of OS_GSInit and OS_GSRead, it is the other way around. OS_GSInit and OS_GSRead are implemented in terms of OS_GSTrans calls. This is hugely inefficient and potentially produces odd results if you read a dynamic value like `Sys$Time`, but it is however safe. It doesn’t need to give you correct results in the face of a changing system (and a poor interface) but does need to be safe. Full implementation of OS_GSInit and OS_GSRead within RISC OS Pyromaniac follows: @handlers.swi.register(swis.OS_GSInit) def swi_OS_GSInit(ro, swin, regs): """ OS_GSInit - Initialises registers for use by OS_GSRead => R0 = pointer to string, terminated by ASCII 10 (LF) or 13 (CR) or 0 (NUL) R2 = flags <= R0 = value to pass back in to OS_GSRead R1 = first non-blank character R2 = value to pass back in to OS_GSRead """ data = ro.memory[regs[0]].string_ctrl flags = regs[2] & 0xE0000000 # Leave the flags with a lower part = 0 (index to the characters) regs[2] = flags # FIXME: This isn't actually the first non-blank character regs[1] = ord(data[0]) @handlers.swi.register(swis.OS_GSRead) def swi_OS_GSRead(ro, swin, regs): """ OS_GSRead - Returns a character from a string which has been initialised by OS_GSInit => R0 from last OS_GSRead/OS_GSInit R2 from last OS_GSRead/OS_GSInit <= R0 updated for next call to OS_GSRead R1 = next translated character R2 updated for next call to OS_GSRead C flag is set if end of string reached """ data = ro.memory[regs[0]].string_ctrl flags = regs[2] & 0xE0000000 offset = regs[2] & 0x1FFFFFFF use_escapes = not (flags & (1<<30)) terminate_on_space = (flags & (1<<29)) quotes_special = not (flags & (1<<31)) # Note: This is a very inefficient implementation - we expand the string every time the # OS_GSRead is called. It's possible that we could cache the result and return a # character from it (which is I think what the original implementation does. (consumed_index, output) = ro.kernel.gstrans.translate(data, use_escapes=use_escapes, terminate_on_space=terminate_on_space, quotes_special=quotes_special) if offset == len(output): regs.cpsr_c = True # Classic RISC OS appears to terminate the string with a CR, even though the character is not part # of the string, and it's not documented. FileSwitch requires this. regs[1] = 13 else: regs[1] = ord(output[offset]) regs[2] = flags | (offset + 1) regs.cpsr_c = False Of course, now that I’m thinking about it, I believe there should probably be some warnings raised if the GSInit/GSRead calls are made, or even a configuration option to make them just return errors so that the use of these operations are located. Maybe I’ll do that at some later point. |
nemo (145) 2546 posts |
Simon suggested
You do know if it is all read, you don’t know when the application completes – it is entitled to call OS_GSRead once and then never again if it wants. This is a bad API. My redesign uses the same concept of 128 concurrent contexts but allocates a new one on demand, frees them when they’re known to be finished and, crucially, reuses them once all have been exhausted (i.e. are either in use or have been abandoned). Reuse is detected so that it is not (reasonably) possible for two threads to be using the same context (the earlier one will truncate if it is reused). These contexts could have allocated buffers associated with them to avoid holding pointers into SysVar memory (they already have small buffers for the integer and Unicode cases), and though I’d like to keep the pointers as they are (it’s only a page of memory in total!) it’s very difficult to justify… <sigh>… I can see where this is going. I’m allocating the “GeneralString” module name and I’ll release it soon1. 1 nemo soon |
Pages: 1 2