C Kernel
Stuart Swales (8827) 1357 posts |
There are fundamental problems with using either the current AnsiLib or Stubs to link transient utilities, and the Desktop Tools manual does state that you shouldn’t be using the C library. Neither is suitable as it stands, even for utilities with no arguments, as they use application memory for the stack and don’t exit correctly – see the PRM for the environment that transient utilities must work in. Both AnsiLib and the SharedCLibrary also use software stack checking – that would need a separate stack block setting up, as the transient workspace is tiny. |
Simon Willcocks (1499) 513 posts |
Again, there’s no need for any C library, shared or otherwise, in a kernel. Each core can allocate a stack and call a C routine in about a dozen assembler instructions. If you avoid recursion, you can check that it won’t all be used. While we’re at it, there’s no need to copy the kernel from where it’s loaded: just enable the MMU with the area including the OS mapped both to wherever it already is and near the top of memory (0xfc000000), add the difference to the PC (and update the stack pointer, and never return from the routine that does it), then remove the temporary copy. (I assume no bootloader loads the “ROM” overlapping the virtual area, but that would simply require an intermediate mapping to, say, 0x80000000.) I’d also take the opportunity to strip the HAL down to the bare minimum; the kernel can run with just a CPU, memory, and an MMU. Most drivers and applications require a timer and interrupt management. The Wimp requires a frame buffer, etc. The boot sequence requires an NV/CMOS RAM equivalent, which can easily be reserved space on a disc or similar, and maybe the HAL could be given a chance to choose the boot sequence type (like shift-boot, etc.). (Is there documentation about the kernel boot sequence, somewhere?) |
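To make the "add the difference to the PC" step concrete, here is a minimal sketch of the arithmetic only, not of the MMU switch itself. The helper name `relocate` and the 0x8000 load address used in the usage note are assumptions for illustration; only the 0xfc000000 target comes from the post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given an address inside the image as loaded at
 * phys_base, return the address of the same byte once the image is also
 * mapped at virt_base. The same offset applies to the PC and to any
 * pointers into the image. */
static inline uint32_t relocate(uint32_t addr, uint32_t phys_base, uint32_t virt_base)
{
    uint32_t offset = virt_base - phys_base;   /* wraps modulo 2^32 */

    /* A page mapping preserves the low 12 bits of an address, so the
     * offset between the two views must be a multiple of 4KB. */
    assert((offset & 0xfffu) == 0);

    /* Unsigned arithmetic wraps, so this works whether the image
     * appears to move up or down in the address space. */
    return addr + offset;
}
```

For example, with an image loaded at 0x8000 (an assumed address) and mapped at 0xfc000000, `relocate(0x8040, 0x8000, 0xfc000000)` gives 0xfc000040.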
Simon Willcocks (1499) 513 posts |
Here’s a portion of a single-core approach with ARM code. You have to make sure kernel_start starts at the beginning of the “ROM”, and now (this is from 2017) I’d allocate memory for each core’s stack in the code at initialise, and add __attribute__(( noreturn )) to initialise_with_stack. This time, the “naked” attribute means that the routine doesn’t try to do anything with the stack on entry, and it has the advantage that you can pass C constants into the assembler, like sizeof( stack ), sizeof( translation_table ), etc.
|
Jeffrey Lee (213) 6048 posts |
That’s what the kernel already does. The Pi HAL copies the ROM to the high end of the address space for other reasons (I think the main reasons are to ensure 1MB alignment for more optimal page table configuration, and to ensure there’s a consistent physical memory map regardless of how the ROM was loaded).
Probably nothing that’s fully up-to-date. Just open up s.HAL, find RISCOS_Start, and read away!
Don’t expect there to be a lot of reason or significance behind the order that different things happen, a lot of it is just because this is a 30+ year old code base that’s grown and twisted over time. A lot of stuff could be tidied up in a rewrite. |
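As a rough illustration of why 1MB alignment matters for the page tables: ARMv7 short-descriptor sections are 1MB each, so an aligned image needs fewer first-level entries than an unaligned one of the same size. This is only a sketch of the arithmetic; the function names and the 5MB image size in the usage note are assumptions, not anything taken from the HAL source.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SIZE 0x100000u   /* 1MB, the size of a short-descriptor section */

/* Round an address down to the start of the 1MB section containing it. */
static inline uint32_t section_align_down(uint32_t addr)
{
    return addr & ~(SECTION_SIZE - 1u);
}

/* Non-zero if the address sits exactly on a 1MB boundary. */
static inline int is_section_aligned(uint32_t addr)
{
    return (addr & (SECTION_SIZE - 1u)) == 0;
}

/* Number of 1MB sections needed to cover [phys_start, phys_start + size). */
static inline uint32_t sections_needed(uint32_t phys_start, uint32_t size)
{
    assert(size > 0);
    uint32_t first = section_align_down(phys_start);
    uint32_t last  = section_align_down(phys_start + size - 1u);
    return (last - first) / SECTION_SIZE + 1u;
}
```

A hypothetical 5MB image loaded at 0x00100000 is covered by 5 sections; the same image loaded at 0x00108000 straddles 6, and cannot be mapped with sections alone unless the neighbouring memory may share its attributes.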
Simon Willcocks (1499) 513 posts |
I was afraid someone might say that! What’s the CAM? |
Jeffrey Lee (213) 6048 posts |
Technically it’s the “CAM soft copy”. It’s a big array which keeps track of the state of each physical RAM page. In the ARM2 days, this was a literal soft copy of the MEMC page tables, which were stored in content-addressable memory within the MEMC chip, which the ARM only had write access to. Modern ARMs use a completely different page table system, but for the kernel the CAM soft copy is still very useful data structure since it stores extra info that can’t be stored in the page tables, and allows for fast indexing by physical address or page number. |
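A minimal sketch of the idea, for illustration only: one record per physical RAM page, held in a flat array so that lookup by page number or by physical address is a simple index. The type names, field names and the single contiguous RAM bank are all assumptions; the real kernel's CAM layout is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical per-page record: where the page is mapped, plus state
 * that has no home in the hardware page tables. */
typedef struct {
    uint32_t logical_address;
    uint32_t flags;
} cam_entry;

/* Hypothetical container, assuming one contiguous bank of RAM. */
typedef struct {
    uint32_t   ram_base;     /* physical address of page 0 */
    size_t     page_count;
    cam_entry *entries;      /* page_count records */
} cam_map;

/* Look up by physical page number; NULL if out of range. */
static inline cam_entry *cam_by_page(cam_map *cam, size_t page)
{
    assert(cam != NULL);
    return page < cam->page_count ? &cam->entries[page] : NULL;
}

/* Look up by physical address; NULL if the address is not in the bank. */
static inline cam_entry *cam_by_physical(cam_map *cam, uint32_t phys)
{
    assert(cam != NULL);
    if (phys < cam->ram_base)
        return NULL;
    return cam_by_page(cam, (phys - cam->ram_base) / PAGE_SIZE);
}
```

Both lookups are constant time, which is the property that makes a soft copy like this convenient alongside page tables that are indexed by logical address.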
Simon Willcocks (1499) 513 posts |
Ah, thanks. |
David J. Ruck (33) 1635 posts |
@StuartS I had hoped that the use of Utilities would be quite rare, but it seems there are a large number in my current set of 32 bit compatible programs, although not as many as in the older 26 bit set. |
Rick Murray (539) 13840 posts |
There’s nothing wrong with utilities, and they’re useful to be used without replacing (or affecting) a currently running program. However, given their environment, it’s mind boggling that writing a utility in C works at all… It kinda really shouldn’t. :-) |
Simon Willcocks (1499) 513 posts |
Wanna see a way to do it with gcc? :) You compile it like this:
arm-linux-gnueabi-gcc-8 example.c utility.c -nostdlib -fPIC -O4 -T utility.script -o example.elf
utility.c and utility.script are the “framework”, example.c is whatever you want in your utility.
utility.c:
utility.script:
(Don’t worry, I wrote it by reading the documents, it’s not come from any GPL source!) example.c:
The resulting file is 104 bytes long (I also did a version of your ListOpen example, which was around 400 bytes):
You can’t use global or static local variables, and the -fPIC flag is essential if you have rodata like strings (otherwise you get absolute pointers embedded in the code). Just an example of what you can do in (mostly) C without libraries. |
Rick Murray (539) 13840 posts |
Be more impressive if it wasn’t using the C compiler as an assembler. ;-) PS: what a horrible compiler that each line of assembler needs \n, makes the code rather ugly. |
Simon Willcocks (1499) 513 posts |
It’s all one string, with lots of lines. The point is that things like loops, register allocation, and optimisation can be taken care of by the compiler. Here’s the main part of the OpenFiles utility, not an assembly instruction in sight:
|
Simon Willcocks (1499) 513 posts |
You’ll have to imagine the correct indentation, I’m done fighting with the forum software for the night! |
Simon Willcocks (1499) 513 posts |
Say, for the sake of argument, someone made a start on a C kernel that gets as far as running C code at the proper virtual address on all four cores with the only assumptions being:
Each core has its own translation tables and mapped workspace, plus an area of shared workspace for locks, etc. What kind of copyright/licence should be put on it, and where could it reasonably be uploaded to? (It’s very much a work in progress, and has only been tested on QEMU so far, since you couldn’t tell if it was working without it hitting peripherals.) |
Steve Pampling (1551) 8170 posts |
I would say Apache, to match the existing source (majority of it) [1]. Where to put it? Well, you could talk to Paolo about using the community github [1] here. If ROOL are happy with it then it could move to here. Or even start there [1].
[1] But: What do I know? |
Jeffrey Lee (213) 6048 posts |
Copyright is your choice. The important thing is the licence, it’ll need to be something fairly permissive like 2 or 3 clause BSD, Apache, CDDL, MIT, etc. For source hosting, any kind of git repo should be fine. Obviously if you want other people to contribute, a platform that makes contribution easy (merge/pull requests, etc.) would be best. You don’t necessarily have to use ROOL’s gitlab, the code can easily be moved there later on if it reaches the point where you want to try getting it into the main source tree. Good luck! |
Simon Willcocks (1499) 513 posts |
OK, I’ve got an account on github, since I started helping with BBCSDL, so I guess I’ll put it there, unless it’s a bad idea? It won’t be until I’ve gone through their training program, though… |
Simon Willcocks (1499) 513 posts |
OK, I’ve put it up here: https://github.com/Simon-Willcocks/RISC-OS-Kernel-in-C It’s set up a vector table, but doesn’t handle events, yet, let alone interrupts. |
Simon Willcocks (1499) 513 posts |
I have a couple of questions:
1. Could I include a binary module from RISC OS on github and in my build? I’ve got all four cores running DrawMod independently, rotating the RISC OS cog (which I’ll have to ask Richard Hallas for permission for). There are still some major problems with cleaning the caches, and one core gets an abort for no readily apparent reason, but it might pique someone’s interest.
2. Does anyone know: can the VDU handling code be extracted into a module that claims WrchV? It’s obviously a lot of legacy code that would have to work just right. |
Julie Stamp (8365) 474 posts |
Haha, snap!
With VDUtext I was experimenting with that sort of thing. I never looked in too much detail, but there’s more to it than WrchV; for example, there are calls like OS_ReadVDUVariables etc. I believe it was done in RISC OS Select though. |
Simon Willcocks (1499) 513 posts |
I’m OK with it setting various VduVars, and I’d be fine with tweaking it a bit, I just don’t really want it in the kernel itself. Look forward, not backwards, and all that. I’m hoping someone can help me with some cache-fu (core-fu?), since my interpretation of the ARM DAI 0527A, Bare-metal Boot Code for ARMv8-A Processors, is duff, but the code makes a good illustration of the problem. |
Jeffrey Lee (213) 6048 posts |
Yes.
Technical hurdle: C modules and maybe a few assembler modules will be statically linked to certain base addresses, so you won’t be able to use a binary directly. Instead you’ll need to use an AOF and link it to the correct base address during the build (e.g. the same way that ResolverBlob works). If AOF isn’t suitable, you might be able to convince the Norcroft linker to output a usable ELF (there is a command line option for it).
Licensing: ROM and !System downloads just contain a copy of the Apache license file. So presumably it’d be sufficient to just include a copy of that alongside the binary in the git repo.
As Julie says, there are a number of places where the code would need decoupling from the kernel. It’s not impossible, it’d just take some time to make sure it’s done correctly and that any new interfaces are sensible. |
Simon Willcocks (1499) 513 posts |
Well, there’s a simple demo here, which limps along rotating flickery cogs. (Turning off the caching for the screen memory gets rid of the flickering.) https://github.com/Simon-Willcocks/RISC-OS-Kernel-in-C/files/6854988/kernel7.zip If someone can point out the obvious mistake, I’d appreciate it! |
Jeffrey Lee (213) 6048 posts |
I haven’t looked at the code, or tried running it, but immediate things that come to mind for cache/TLB related issues are:
n.b. Device/Strongly-Ordered/Normal are all ARMv7 names for memory types; I’m not sure off the top of my head what they’re calling them in ARMv8.
Since there are lots of ways for things to go wrong, and only a couple of ways for things to go right, and your aim is to create a kernel which is RISC OS compatible, I’d say that it makes sense to lift the cache/TLB/page table code from the current kernel and use that as a starting point for your code. E.g. start with the current assembler code (modifying it as necessary to make it run, e.g. adjusting zero page references), make sure it fixes the problems you’re seeing, and then rewrite any bits you want in C. Some relevant places to look:
|
Simon Willcocks (1499) 513 posts |
Thanks! I found the source of the crash on core 1: it happened to be the one that set up the DAs, and it mapped the RMA as unshared. So that’s nice. The “random” flushing of cache lines was driving me mad, so I ended up with your screen memory policy. I don’t make everything shareable, as each core gets its own workspace, stacks, “zero page”, vectors, translation tables, etc. Most things will be, though; the RMA is shared, but modules in different cores will allocate different workspaces. I’m still having difficulty getting my head around what caches are available from where. The only device access so far is the screen and a gpio pin, which is configured as you suggested. The plan is that the first module will set up the hardware devices, and carry on from there.
Sounds like the voice of painful experience! Does the processor grab instructions from anywhere it gets a pointer to, or something?
With EAE = 1, I see. That’s similar to the aarch64 environment. [ More good advice… ] I’m keeping the translation tables local to each core, so I hope they will be reasonably independent. Currently, I’m using short page tables; long page tables would go hand-in-hand with moving the kernel to 64-bit.
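For reference, the shared/unshared distinction that bit the RMA comes down to one bit in a short-descriptor entry. Below is a minimal sketch of building a 1MB section descriptor, assuming TEX remap is disabled (SCTLR.TRE = 0), domain 0, full read/write access and Normal write-back, write-allocate memory. The macro and function names are hypothetical; the bit positions are those of the ARMv7 short-descriptor section format.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7 short-descriptor, first-level section entry (1MB). */
#define SECTION_TYPE     0x2u             /* bits [1:0] = 0b10 */
#define SECTION_B        (1u << 2)        /* bufferable */
#define SECTION_C        (1u << 3)        /* cacheable */
#define SECTION_AP_FULL  (3u << 10)       /* AP[1:0] = 0b11, AP[2] = 0 */
#define SECTION_TEX(n)   ((uint32_t)(n) << 12)
#define SECTION_S        (1u << 16)       /* shareable */
#define SECTION_BASE     0xfff00000u      /* physical base, bits [31:20] */

/* Hypothetical helper: descriptor for Normal, write-back,
 * write-allocate memory (TEX = 0b001, C = 1, B = 1). */
static inline uint32_t section_descriptor(uint32_t phys, int shared)
{
    assert((phys & ~SECTION_BASE) == 0);   /* must be 1MB aligned */

    uint32_t d = phys | SECTION_TYPE | SECTION_AP_FULL
               | SECTION_TEX(1) | SECTION_C | SECTION_B;
    if (shared)
        d |= SECTION_S;   /* without this, cores need not stay coherent */
    return d;
}
```

With those assumptions, memory that more than one core touches (such as the RMA) would use `section_descriptor(phys, 1)`, while per-core workspace could leave the S bit clear.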
Device-nGnRnE or similar. I’m kind of coming at this from two directions. I started writing a strange sort of kernel where 99% of code runs at EL0, and the kernel allows calls between “maps” (not quite the same as processes). Once that was working moderately well, I started to implement a VM to run RISC OS in, but ground to a halt with (surprise surprise) MMU management problems; how much can be left to the processor’s non-secure mode, and how much do I have to emulate? So, I took a break from that to do the aarch64 assembler for BASIC, and then started on this project, using what I’d learned from the earlier one. If I get it right, then it should be easy to move to aarch64 at EL2. Now, I’ve got to spend a week or so in quarantine, moving from a country with 1,300 new cases a day to one with getting on for 50,000 (which won’t accept my vaccinations because they weren’t done by the NHS). Boo. (Is it just me, or are there more configuration registers in today’s ARMs than there were instructions in the ARM2?) |