RISC OS on the RPi4, RPi 4
Clive Semmens (2335) 3276 posts |
It’s all fixed values once it’s populated, but calculating each value in it takes time, so you don’t want to have to recalculate them every time you want them. Once you run off the end of the precalculated values, everything slows down rather a lot. You can get an awful lot further with a Pi 3B than you could with a RiscPC, and you could get further still with a Pi4, especially if you could address 4GB (minus what has to share the same wimpslot) of RAM…
I think that’s easily avoided as long as you’re conscious of what you’re doing. It certainly is in assembler, I’ve not pushed at it in BASIC. |
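As a rough illustration of the "precalculated table, then a slow path once you run off the end" behaviour described in this post, here is a minimal Python sketch. It is not Clive's program: the choice of a prime table, and the names `build_sieve`, `TABLE_LIMIT` and `is_prime`, are assumptions made purely for the example.

```python
import math

def build_sieve(limit):
    """Precompute a table where table[n] is 1 if n is prime (0 <= n <= limit)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"                      # 0 and 1 are not prime
    for n in range(2, math.isqrt(limit) + 1):
        if sieve[n]:
            # Clear every multiple of n starting at n*n.
            sieve[n * n::n] = bytes(len(range(n * n, limit + 1, n)))
    return sieve

TABLE_LIMIT = 1000                                # hypothetical table size
TABLE = build_sieve(TABLE_LIMIT)                  # costs time once, then fixed

def is_prime(n):
    if n <= TABLE_LIMIT:
        # Fast path: a single table lookup.
        return n >= 0 and bool(TABLE[n])
    # Slow path: we have run off the end of the table, so recompute.
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True
```

The larger the table that fits in memory, the more queries take the fast path, which is why the amount of addressable RAM matters for this kind of workload.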
David J. Ruck (33) 1636 posts |
Wimpslots are definitely paged. Unpaged Dynamic Areas make having large wimpslots, and hence being able to make real use of 8GB, more difficult. A large part of the 32 bit address space has to be permanently reserved for DAs and the total size of DAs can never exceed this. Whereas paged wimpslots also have a fixed allocation of 32 bit address space, the total size of applications can exceed 4GB, as they aren’t all paged in at once. However, I suspect that paging huge wimpslots when you are making use of >4GB, made worse by not being able to share common pages, will prove to be painful. |
Clive Semmens (2335) 3276 posts |
Imagine having several apps running, that you’d like to keep resident but that don’t need to be active while the BIG task is running in BASIC (with possibly some assembler). The big task doesn’t poll the WIMP at all (or possibly infrequently) while it’s doing its heavy lifting. It is of course entirely within its 4GB (or if need be, 1.5GB, which is still a lot better than I can do now) address space. In that situation, would there be any paging activity, or would all the page translating be static? Would there be any way to minimize the paging overhead? |
Clive Semmens (2335) 3276 posts |
What would be nice would be if (say) the bottom 4GB was used by RISCOS just as though it was a 4GB machine, and the top 4GB (say) was all statically allocated to BASIC, so no address translation was needed at all for a BASIC application…but if I’m the only person who’d like such a thing there’s no reason why it should happen; I certainly don’t have the skills to do it myself 8~( |
David Pitt (3386) 1248 posts |
The view from an 8GB RPi4. The news is not entirely good: Iris and gcc10 can fail with a “process not known to ARMEABISupport” error if started after large TaskWindows have been created. This does not happen on the 4GB RPi400. It would perhaps be unreasonable to jump to conclusions at this point; this is with unreleased 8GB, unreleased big slots, unreleased gcc10 and Iris still under development. |
Stuart Swales (8827) 1357 posts |
If the configuration is never tested, the bugs will never be found ;-) |
Julie Stamp (8365) 474 posts |
This is because ARMEABISupport uses physical addresses for wimpslot pages. |
Stuart Swales (8827) 1357 posts |
This is surely crazy? Consider the case on an older machine where you’ve started off in a low-res, low-bpp screen mode. Wimp tasks start. User changes to a high-res, high(er)-bpp screen mode, which requires the OS to reallocate physical pages (swapping page contents) so that it can assemble a contiguous set for video DMA to use. Physical pages owned by each wimp task will be different to those when they started. |
Jeffrey Lee (213) 6048 posts |
The fix for that will be to use the ARMv6+ dual translation table pointers, which will allow us to swap out the lower part of the address space in one quick & easy operation, with minimal cache/TLB maintenance.
Yes. There’s been some discussion about how to fix it. |
David J. Ruck (33) 1636 posts |
@Clive
This can’t happen on a 32 bit RISC OS machine. In the 32 bits / 4GB that the processor can address you need to fit in the wimp slot, module area, screen, dynamic areas, I/O (and other stuff). Just like with other OSes, applications are limited to a part of this 32 bit address space, say 2GB. Having an 8GB machine allows you to use up to 8GB of memory (minus some memory used for other things) for all applications by paging them in one at a time, but no single one can be larger than 2GB. To get 4GB or more for a single application requires a 64 bit OS, which is why the world and his dog have left 32 bits behind. |
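The two separate limits in this post (a per-application ceiling set by the address space, and a total ceiling set by physical RAM) can be written down as a small check. This is only a sketch: the 2GB slot limit is the "say 2GB" figure from the post, the function name `can_run` is hypothetical, and the memory "used for other things" is ignored.

```python
GB = 1 << 30

def can_run(app_sizes, physical_ram=8 * GB, slot_limit=2 * GB):
    """True if every application fits in the slot and all fit in RAM together.

    slot_limit models the share of the 32 bit address space left for the
    application slot; physical_ram models the total that can be paged in
    and out one application at a time. OS overheads are ignored.
    """
    each_fits_slot = all(size <= slot_limit for size in app_sizes)
    all_fit_ram = sum(app_sizes) <= physical_ram
    return each_fits_slot and all_fit_ram
```

Under these assumptions three 2GB applications are fine on an 8GB machine, while a single 3GB application is not, however much RAM is fitted.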
Colin Ferris (399) 1818 posts |
How come you can’t switch from 32bit mode to 64bit mode with an ARM processor? |
Stuart Swales (8827) 1357 posts |
It’s a different instruction set for a start, Colin! |
Clive Semmens (2335) 3276 posts |
I get that none of them can be quite 4GB, because of all the OS stuff that has to be in the same address space as whatever app is running – but given that the whole OS is nowhere near 2GB, I don’t understand why the app can’t be well over 2GB, indeed (since the OS is well under 1GB) why it can’t be a bit over 3GB. I’m not doubting you, but I would like to understand. |
Stuart Swales (8827) 1357 posts |
@Clive There may well be things that break if you stick the RMA above 2GB to try to get a large contiguous wimp slot. Which we ought to try to fix. For your ultra-large memory use case – could you allocate a set of dynamic areas and use an indirection table akin to a virt-to-phys translation table? |
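The indirection-table idea suggested here can be sketched in a few lines. This is an illustration only, in Python rather than BASIC or assembler: each `bytearray` stands in for one separately allocated dynamic area, and the chunk size and the names `ChunkedStore`, `CHUNK_BITS` and so on are assumptions, not anything RISC OS provides.

```python
CHUNK_BITS = 20                       # assumed chunk size: 1 MiB per area
CHUNK_SIZE = 1 << CHUNK_BITS
CHUNK_MASK = CHUNK_SIZE - 1

class ChunkedStore:
    """One flat index space backed by several separately allocated chunks."""

    def __init__(self, total_bytes):
        n_chunks = (total_bytes + CHUNK_SIZE - 1) >> CHUNK_BITS
        # The indirection table: entry i plays the role of chunk i's base.
        self.table = [bytearray(CHUNK_SIZE) for _ in range(n_chunks)]
        self.size = total_bytes

    def _locate(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        # High bits select the chunk, low bits give the offset inside it.
        return self.table[index >> CHUNK_BITS], index & CHUNK_MASK

    def read(self, index):
        chunk, offset = self._locate(index)
        return chunk[offset]

    def write(self, index, value):
        chunk, offset = self._locate(index)
        chunk[offset] = value
```

The chunks need not be contiguous or even adjacent in the memory map; the cost is one extra table lookup and a shift and mask on every access, which is the same trade a page table makes.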
Rick Murray (539) 13851 posts |
And now we have a solid technical reason why support for the older processors should look to coming to an end.
Not only is it a rather different “processor” (instruction set, registers, behaviour…), it is also not permitted for the same sort of reasons as why you can’t switch to SVC mode from USR mode.
It’s only recently that such memory sizes were even a thing, I mean, not so long ago it was like “whoa, the Pi has a whole gigabyte” and now it’s “four isn’t enough!”. ;-) |
Colin Ferris (399) 1818 posts |
How does swi OS_EnterOS do the change to SVC mode? |
Steve Fryatt (216) 2105 posts |
Don’t forget that in addition to the memory actually used, you must have space in the memory map for the memory that might be used. The classic example is Dynamic Areas: if your DA needs to be able to grow to 512MB, then you “must” allocate 512MB of the memory map to it, even if you’ve only allocated (and paged in) 4MB of actual memory for now. This is why “unlimited” DAs have been bad since the days of the RiscPC, and why there are many third-party bodges to work around anti-social applications which didn’t set a maximum size (and which therefore took the RAM size from the map, which was fine on a 4MB RPC but a little less so on a 128MB one). Since the issue affects every bit of memory that can be allocated, the 4GB map soon fills up with “might be needed” space, even if everything is well-behaved and only reserves the minimum that it can get away with. |
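The point about "might be needed" space can be made concrete with a little arithmetic. In this sketch the list of areas and their sizes is invented for illustration (only the 512MB maximum with 4MB in use comes from the post), and the function names are hypothetical.

```python
MB = 1 << 20
GB = 1 << 30

def map_space_reserved(areas):
    """Logical address space consumed: the sum of the maximum sizes."""
    return sum(max_size for _current, max_size in areas)

def memory_in_use(areas):
    """Physical memory actually allocated: the sum of the current sizes."""
    return sum(current for current, _max_size in areas)

# (current size, maximum size) for three made-up dynamic areas.
areas = [
    (4 * MB, 512 * MB),    # the example from the post
    (1 * MB, 256 * MB),
    (16 * MB, 1 * GB),
]

reserved_mb = map_space_reserved(areas) // MB   # address space gone
in_use_mb = memory_in_use(areas) // MB          # RAM actually used
```

With these figures only 21MB of RAM is in use, yet 1792MB of the 4GB map is already spoken for, which is the effect described above.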
Stuart Swales (8827) 1357 posts |
@Colin – think about what the SWI instruction does! @SteveF – Clive (or A.N.Other helpful person) might be able to brew a custom ROM tailored for his application with minimal modules to reduce initial DA use to maximise available virtual memory address space. |
Clive Semmens (2335) 3276 posts |
Needless to say (?) that’s what I’m doing at the moment. How far I can get will always be limited, but the faster the processor and the slicker the programming, the further I’ll be able to get – and how fast it is depends strongly on how much memory I can access, and how slickly I can access it. But it’s all just for fun, it’s not a big deal at all. No-one should worry too much about it! I’m not mining bitcoins or anything…although some bar steward might repurpose what I’m doing to that end, so perhaps I should give up. But if I was really serious about this I’d probably be getting a new Mac and learning v8 assembler. |
Rick Murray (539) 13851 posts |
Can’t help but think that mining bitcoin in BASIC would be the very definition of futility. |
Clive Semmens (2335) 3276 posts |
8~) Indeed, absolutely. It wouldn’t be repurposing the actual programs, but the theoretical work that I’m playing with. Even for the explorations I’m playing with, BASIC certainly wouldn’t be the first choice, but it’s the easy language for me. In the early days of ARM machines, ARM assembler was a perfectly sensible choice for these games, but the world has moved on by several orders of magnitude since then. |
Julie Stamp (8365) 474 posts |
In case it helps, you can use this to see how the 4GB of logical address space are laid out on your machine, though it doesn’t include all the IO regions. |
Clive Semmens (2335) 3276 posts |
I should write up something about the games I’ve been playing. I’ll put it on my website when I’ve written it up, and put a link here. Suffice it to say for the moment that it’s something to do with number theory, prime numbers, and factorizing products of VERY large primes. The Pi certainly wouldn’t be the ideal machine for actually doing that factorizing, but it’s fine for playing with the ideas – although even for that, a more capable machine would be better. But I don’t have one, and it doesn’t matter enough to me to care. |
Rick Murray (539) 13851 posts |
Oh, I know. The reason I wrote “theoretically” is because of lazy paging – where switching between two 28MB applications doesn’t mean endlessly paging in and out entire 28MB slots.
Ah, the infamous “max size is -1”. That was a bad idea right from the start. I’m not quite sure what a better idea would be (maybe, like the RMA, to have used some sort of indirected pointers so the thing could be compacted without breaking everything?), but it just seems like an idea to solve today’s problem without thinking what may happen down the line.
Yup. Here’s Linux-on-ARM: https://www.kernel.org/doc/html/latest/arm/memory.html It was okay on a 64MB MEMC system when the most actual memory that could be installed was 16MB. For this, there was 32MB of logical memory, 16MB of physical memory, plus the ROMs (read) and MEMC (write), and some space for I/O and VIDC. Yes, you had logical and physical visible at the same time, I play with that here: https://heyrick.eu/blog/index.php?diary=20201004 There wasn’t any issue with the Beagle or any of the original Pi machines (256MB-1GB). Allocating memory assignments was easy when you had lots of address space and not so much actual memory. But once your actual memory grows to be more than about half your address space, it gets a bit harder to shuffle. And once your actual memory gets to be the size of your address space (or larger), then there’s a problem. Easy fix! Let’s make the processor 64 bit! That way there’s space for… <twiddles thumbs> …a really big lot of memory. Epic big. Massive big. Biggest ever. Ever ever. ;-)
I’d be interested in seeing what sort of things you’re doing with memory… though, it would possibly help if you wrote the mathsy bit aimed at a five year old, then I’ll be able to understand it. :-)
Oh, I don’t know. Running in assembler (or compiled) with VFP and/or NEON, it might be able to chunder a reasonable amount of data. Do you have ABC? I wonder if compiling the program might give any useful speed increase?
Seems to be the way politicians are behaving these days. :-/ |
Clive Semmens (2335) 3276 posts |
Back in RiscPC days I wrote the heavily-used loops in assembler. I might rewrite them in 32-bit assembler – I’ve not actually rewritten them in BASIC yet. So much less time is spent in the BASIC that calls the heavily-used loops than in those loops that there’s little point compiling them. In v7 my assembler is probably at least as good as ABC would do – especially since it’s all integer (and >32-bit integer, not stuff NEON might help with), no floating point at all. Don’t (yet?) know enough about v8 to know whether the same would be true there, but obviously there isn’t a BASIC compiler for v8 (yet?) anyway. |