RISC OS Open: Forum: C Kernel

Jun 8, 2021 11:44am

There are fundamental problems with using either the current AnsiLib or Stubs to link transient utilities, and the Desktop Tools manual does state that you shouldn’t be using the C library. Neither is suitable as it stands, even for utilities with no arguments, as they use application memory for the stack and don’t exit correctly – see the PRM for the environment that transient utilities must work in. Both AnsiLib and the SharedCLibrary also use software stack checking – that’d need a separate stack block setting up, as the transient workspace is tiny.

Jun 13, 2021 5:40pm

Simon Willcocks (1499) 513 posts

Again, there’s no need for any C library, shared or otherwise, in a kernel.

Each core can allocate a stack and call a C routine in about a dozen assembler instructions. If you avoid recursion, you can check that it won’t all be used.

While we’re at it, there’s no need to copy the kernel from where it’s loaded, just enable the MMU with the area including the OS mapped to wherever it already is, and near the top of memory (0xfc000000), add the difference to the PC (and update the stack pointer, and never return from the routine that does it), then remove the temporary copy. (I assume no bootloader loads the “ROM” overlapping the virtual area, but that would simply require an intermediate mapping to, say, 0x80000000.)
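
A minimal sketch of the double mapping described above, assuming ARM short-descriptor translation tables with 1MiB sections. The helper names are hypothetical, access permission and cache attribute bits are left out, and the table here is just an ordinary array so that the arithmetic can be checked on any machine.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SIZE  0x100000u   /* 1MiB, the short-descriptor section size */
#define SECTION_SHIFT 20u
#define SECTION_TYPE  0x2u        /* bits [1:0] = 0b10 mark a section entry */

/* Hypothetical helper: point the 1MiB section containing 'virt' at 'phys'.
   Access permission and cache attribute bits are deliberately left out. */
static void map_section(uint32_t *l1_table, uint32_t virt, uint32_t phys)
{
  assert((phys % SECTION_SIZE) == 0);   /* sections must be 1MiB aligned */
  l1_table[virt >> SECTION_SHIFT] = phys | SECTION_TYPE;
}

/* Map the loaded image twice: once where it already is (so the instruction
   after the MMU enable can still be fetched) and once at its final address. */
static void map_image_twice(uint32_t *l1_table, uint32_t load_addr, uint32_t high_addr)
{
  map_section(l1_table, load_addr, load_addr);  /* temporary identity mapping */
  map_section(l1_table, high_addr, load_addr);  /* permanent high mapping */
}

/* The value to add to the PC and SP once the MMU is on.
   Unsigned wrap-around is intentional. */
static uint32_t relocation_delta(uint32_t load_addr, uint32_t high_addr)
{
  return high_addr - load_addr;
}

/* Remove the temporary mapping once execution continues at the high address.
   A zero entry (bits [1:0] = 0b00) produces a translation fault. */
static void unmap_section(uint32_t *l1_table, uint32_t virt)
{
  l1_table[virt >> SECTION_SHIFT] = 0;
}
```

On real hardware the table would be the 4096-entry first-level table, and removing the temporary entry would also need the usual barrier and TLB maintenance, which this sketch does not attempt.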

I’d also take the opportunity to strip the HAL down to the bare minimum; the kernel can run with just a CPU, memory, and an MMU.

Most drivers and applications require a timer, and interrupt management. The Wimp requires a frame buffer, etc.

The boot sequence requires NV/CMOS RAM equivalent, which can easily be reserved space on a disc or equivalent, and maybe give the HAL a chance to choose the boot sequence type (like shift-boot, etc.).

(Is there documentation about the kernel boot sequence, somewhere?)

Jun 13, 2021 6:04pm

Simon Willcocks (1499) 513 posts

Here’s a portion of a single-core approach with ARM code. You have to make sure kernel_start starts at the beginning of the “ROM”, and now (this is from 2017) I’d allocate memory for each core’s stack in the code at initialise, and add __attribute__(( noreturn )) to initialise_with_stack. This time, the “naked” attribute means that the routine doesn’t try to do anything with the stack on entry, and it has the advantage that you can pass C constants into the assembler like sizeof( stack ), sizeof( translation_table ), etc.


static void initialise_with_stack();

void __attribute__(( naked )) kernel_start()
{
asm volatile goto (
      "B %l[initialise]"
  "\n\tB %l[undef]"               // Enter current map's handler, or throw exception
  "\n\tB %l[swi]"                 // Enter current map's handler, or throw exception
  "\n\tB %l[instruction_abort]"   // Return from inter-map call, or as above
  "\n\tB %l[data_abort]"          // Inter-map call, memory management needed,
  "\n"                        // otherwise handler/exception
  "\n\tB %l[unused_vector]"       // No idea what to do if we get here!
  "\n\tB %l[irq]"                 // Pass interrupt to driver

  "\nfiq: .space 256, 0"      // Space for later FIQ code, initialise stack.
  "\ninitialise_stack:"
  :
  :
  :
  : initialise, undef, swi, instruction_abort, data_abort, unused_vector, irq );

initialise: // A C label
  asm volatile( "adr sp, initialise_stack" );
  initialise_with_stack(  );
  __builtin_unreachable();

Jun 13, 2021 7:06pm

Jeffrey Lee (213) 6048 posts

enable the MMU with the area including the OS mapped to wherever it already is, and near the top of memory (0xfc000000), add the difference to the PC (and update the stack pointer, and never return from the routine that does it), then remove the temporary copy.

That’s what the kernel already does.

The Pi HAL copies the ROM to the high end of the address space for other reasons (I think the main reasons are to ensure 1MB alignment for more optimal page table configuration, and to ensure there’s a consistent physical memory map regardless of how the ROM was loaded).

(Is there documentation about the kernel boot sequence, somewhere?)

Probably nothing that’s fully up-to-date. Just open up s.HAL, find RISCOS_Start, and read away!

Fun and games to process the list of RAM chunks and identify suitable chunks for use during MMU/kernel init
Query the CPU a bit to determine what page table format to use
Bare-minimum memory allocation & page table setup
Enable the MMU
Do some hardware initialisation (HAL_Init, 100Hz timer, IIC buses)
ROM decompression
Allocate & map in some more key areas (including the privileged mode stacks – prior to this, a bit of kernel workspace is temporarily used as the stack)
Set up the CAM, so that the rest of the kernel can start handling memory more normally
Processor vector, IRQ, and SWI dispatcher setup
Create some important dynamic areas
More IRQ setup
Some higher-level stuff (environment handlers, VDU driver, HAL_InitDevices, system variables, etc.)
Initialise modules for keyboard scan
Do keyboard scan
Initialise more dynamic areas, subsystems, and remaining modules
Issue Service_PostInit
More stuff I can’t be bothered listing

Don’t expect there to be a lot of reason or significance behind the order in which different things happen; a lot of it is just because this is a 30+ year old code base that’s grown and twisted over time. A lot of stuff could be tidied up in a rewrite.

Jun 13, 2021 8:01pm

Simon Willcocks (1499) 513 posts

find RISCOS_Start, and read away!

I was afraid someone might say that! What’s the CAM?

Jun 13, 2021 8:18pm

Jeffrey Lee (213) 6048 posts

What’s the CAM?

Technically it’s the “CAM soft copy”. It’s a big array which keeps track of the state of each physical RAM page. In the ARM2 days, this was a literal soft copy of the MEMC page tables, which were stored in content-addressable memory within the MEMC chip, which the ARM only had write access to. Modern ARMs use a completely different page table system, but for the kernel the CAM soft copy is still a very useful data structure, since it stores extra info that can’t be stored in the page tables, and allows for fast indexing by physical address or page number.
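
As a rough illustration of that idea (not the kernel’s actual layout), a soft copy of this kind can be pictured as an array with one entry per physical page. The structure, field names, and the assumptions of 4KiB pages and a single contiguous block of RAM are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u   /* assumes 4KiB pages */

/* One entry per physical RAM page, indexed by page number. */
typedef struct {
  uint32_t logical_addr;   /* where the page is currently mapped */
  uint32_t flags;          /* extra state with no home in the page tables */
} cam_entry;

/* Convert a physical address into an index into the array.
   Assumes one contiguous block of RAM starting at ram_base. */
static uint32_t cam_page_number(uint32_t phys_addr, uint32_t ram_base)
{
  assert(phys_addr >= ram_base);
  return (phys_addr - ram_base) >> PAGE_SHIFT;
}

/* Record that a physical page has been mapped at a logical address. */
static void cam_set(cam_entry *cam, uint32_t page, uint32_t logical_addr, uint32_t flags)
{
  cam[page].logical_addr = logical_addr;
  cam[page].flags = flags;
}

/* Constant-time lookup: where is this physical page mapped? */
static uint32_t cam_logical_addr(const cam_entry *cam, uint32_t page)
{
  return cam[page].logical_addr;
}
```

The point of the sketch is the lookup direction: page tables answer “what is at this logical address?”, while an array like this answers “where is this physical page, and what state is it in?” without walking any tables.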

Jun 13, 2021 9:01pm

Simon Willcocks (1499) 513 posts

Ah, thanks.

Jun 14, 2021 8:05am

David J. Ruck (33) 1635 posts

There are fundamental problems with using either the current AnsiLib or Stubs to link transient utilities

@StuartS I had hoped that the use of Utilities would be quite rare, but it seems there are a large number in my current set of 32-bit compatible programs, although not as many as in the older 26-bit set.

Jun 14, 2021 9:42am

Rick Murray (539) 13840 posts

There’s nothing wrong with utilities, and they’re useful because they can run without replacing (or affecting) a currently running program.

However, given their environment, it’s mind boggling that writing a utility in C works at all… It kinda really shouldn’t. :-)
https://www.riscosopen.org/wiki/documentation/show/File%20formats:%20Utility

Jun 15, 2021 4:28pm

Simon Willcocks (1499) 513 posts

However, given their environment, it’s mind boggling that writing a utility in C works at all… It kinda really shouldn’t. :-)

Wanna see a way to do it with gcc? :)

You compile it like this:

arm-linux-gnueabi-gcc-8 example.c utility.c -nostdlib -fPIC -O4 -T utility.script -o example.elf
arm-linux-gnueabi-objcopy -O binary --only-section=.init --only-section=.text* --only-section=.rodata* --only-section=.endtag example.elf example.util

utility.c and utility.script are the “framework”, example.c is whatever you want in your utility.

utility.c:


extern void utility( const char *cmdline, const char *cmdtail );

asm ( ".section .init"
    "\n.global _start"
    "\n.type _start, %function"
    "\n_start:"
    "\n  b 1f"                 // Branch over the header words
    "\n  .word 0x79766748"
    "\n  .word 0x216C6776"
    "\n  .word utility_end"
    "\n  .word 0"
    "\n  .word 32"
    "\n1:"
    "\n  str r14, [r12]"       // Save the return address at the start of the workspace
    "\n  mov r9, r12"          // Keep the workspace pointer in r9 for the C code
    "\n  b utility"
    "\n  .previous" );

asm ( ".section .endtag"
    "\n  .word 0x4b4f3233"
    "\nutility_end:"
    "\n.previous" );

utility.script:


ENTRY( _start )
SECTIONS
{
  . = 0 ;
  .text : {
    *(.init) ;
    *(.text) ;
    *(.rodata.*) ;
    . = ALIGN( 4 ) ;
    *(.endtag) ;
  }
  /DISCARD/ : {
    *(.*) ;
  }
} ;

(Don’t worry, I wrote it by reading the documents, it’s not come from any GPL source!)

example.c:


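// "register" makes this a GCC global register variable held in r9, rather than an asm label
register struct {
    void (*exit)();
    // Any other workspace stuff can go here, up to 1k, less stack space and size of exit.
} *workspace asm( "r9" );
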
static inline void xos_write0( const char *buf )
{
  register const char *b asm( "r0" ) = buf;
  asm( "svc 0x20002"
   "\n  ldrvs pc, [r9]" : : "r" (b) );
}

static inline void xos_newline( )
{
  asm( "svc 0x20003"
   "\n  ldrvs pc, [r9]" );
}

void utility( const char *cmdline, const char *cmdtail )
{
  xos_write0( cmdline );
  xos_newline();
  xos_write0( "Hello world\n" );
  xos_newline();
}

The resulting file is 104 bytes long (I also did a version of your ListOpen example, which was around 400 bytes):



00000000 <_start>:
   0:   ea000004        b       18 <_start+0x18>
   4:   79766748        ldmdbvc r6!, {r3, r6, r8, r9, sl, sp, lr}^
   8:   216c6776        smccs   50806   ; 0xc676
   c:   00000068        andeq   r0, r0, r8, rrx
  10:   00000000        andeq   r0, r0, r0
  14:   00000020        andeq   r0, r0, r0, lsr #32
  18:   e58ce000        str     lr, [ip]
  1c:   e1a0900c        mov     r9, ip
  20:   eaffffff        b       24 <utility>

00000024 <utility>:
  24:   ef020002        svc     0x00020002
  28:   6599f000        ldrvs   pc, [r9]
  2c:   ef020003        svc     0x00020003
  30:   6599f000        ldrvs   pc, [r9]
  34:   e59f0014        ldr     r0, [pc, #20]   ; 50 <utility+0x2c>
  38:   e08f0000        add     r0, pc, r0
  3c:   ef020002        svc     0x00020002
  40:   6599f000        ldrvs   pc, [r9]
  44:   ef020003        svc     0x00020003
  48:   6599f000        ldrvs   pc, [r9]
  4c:   e12fff1e        bx      lr
  50:   00000014        andeq   r0, r0, r4, lsl r0
  54:   6c6c6548        cfstr64vs       mvdx6, [ip], #-288      ; 0xfffffee0
  58:   6f77206f        svcvs   0x0077206f
  5c:   0a646c72        beq     191b22c <utility_end+0x191b1c4>
  60:   00000000        andeq   r0, r0, r0
  64:   4b4f3233        blmi    13cc938 <utility_end+0x13cc8d0>

You can’t use global or static local variables, and the -fPIC flag is essential if you have rodata like strings (otherwise you get absolute pointers embedded in the code).

Just an example of what you can do in (mostly) C without libraries.
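
For checking a built file on a desktop machine, the header words emitted by utility.c can be tested directly. This is only a sketch based on the words visible in the listing above (the two constants at offsets 4 and 8, utility_end at offset 12, 32 at offset 20, and the tag in the final word); the function names are hypothetical, and it is not a full validator for the utility format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTIL_MAGIC1 0x79766748u   /* word at offset 4 */
#define UTIL_MAGIC2 0x216C6776u   /* word at offset 8 */
#define UTIL_ENDTAG 0x4b4f3233u   /* last word of the file in this example */

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the image carries the header and end tag that utility.c emits. */
static int looks_like_32bit_utility(const uint8_t *image, size_t length)
{
  assert(image != NULL);
  if (length < 28 || (length % 4) != 0) return 0;   /* 6 header words + tag */
  if (read_le32(image + 4) != UTIL_MAGIC1) return 0;
  if (read_le32(image + 8) != UTIL_MAGIC2) return 0;
  if (read_le32(image + 12) != length) return 0;    /* .word utility_end */
  if (read_le32(image + 20) != 32) return 0;        /* .word 32 */
  return read_le32(image + length - 4) == UTIL_ENDTAG;
}
```

Run against the 104-byte file above, the word at offset 12 is 0x68, which matches the file length, so the length comparison would pass.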

Jun 15, 2021 8:33pm

Rick Murray (539) 13840 posts

Wanna see a way to do it with gcc? :)

Be more impressive if it wasn’t using the C compiler as an assembler. ;-)

PS: what a horrible compiler that each line of assembler needs \n, makes the code rather ugly.

Jun 15, 2021 8:56pm

Simon Willcocks (1499) 513 posts

It’s all one string, with lots of lines. The point is that things like loops, register allocation, and optimisation can be taken care of by the compiler.

Here’s the main part of the OpenFiles utility, not an assembly instruction in sight:


void utility( const char *cmdline, const char *cmdtail )
{
  xos_write0( "List of open files\n" );
  for (int i = 255; i >= 0; i--) {
    unsigned fileflags = get_flags( i );
    if (0 == ((1 << 11) & fileflags)) {
      // Stream is allocated
      int length = int_to_string( i );
      for (int j = 3; j > length; j--)
        print_space();
      xos_write0( workspace->string );
      print_space();
      xos_writec_risky( ((1 << 6) & fileflags) != 0 ? 'R' : ' ' );
      xos_writec_risky( ((1 << 7) & fileflags) != 0 ? 'W' : ' ' );
      xos_writec_risky( ((1 << 8) & fileflags) != 0 ? '!' : ' ' );
      xos_writec_risky( ((1 << 9) & fileflags) != 0 ? 'E' : ' ' );
      xos_writec_risky( ((1 << 10) & fileflags) != 0 ? 'U' : ' ' );
      xos_writec_risky( ((1 << 13) & fileflags) != 0 ? 'X' : ' ' );
      print_space();
      print_filename( i );
      xos_newline();
    }
  }
}
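
The flag tests in that loop can also be driven from a table, which makes the column layout easier to check away from RISC OS. This is a portable sketch only: the bit positions and letters are copied from the code above, while the function names and the buffer-based interface are hypothetical and use no SWIs.

```c
#include <assert.h>
#include <stddef.h>

/* Bit positions and letters copied from the tests in the utility above. */
static const struct { unsigned bit; char letter; } flag_letters[] = {
  { 6, 'R' }, { 7, 'W' }, { 8, '!' }, { 9, 'E' }, { 10, 'U' }, { 13, 'X' }
};
#define FLAG_COUNT (sizeof flag_letters / sizeof flag_letters[0])

/* Bit 11 clear means the stream is allocated, as in the loop above. */
static int stream_is_allocated(unsigned fileflags)
{
  return (fileflags & (1u << 11)) == 0;
}

/* Fill 'out' with one column per flag, a space where the flag is clear.
   The buffer needs room for FLAG_COUNT characters plus a terminator. */
static void flags_to_string(unsigned fileflags, char *out, size_t out_size)
{
  assert(out_size >= FLAG_COUNT + 1);
  for (size_t i = 0; i < FLAG_COUNT; i++)
    out[i] = (fileflags & (1u << flag_letters[i].bit)) ? flag_letters[i].letter : ' ';
  out[FLAG_COUNT] = '\0';
}
```

In the utility itself the result would still have to be written out character by character or through the workspace, since static buffers are not available there.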

Jun 15, 2021 8:58pm

Simon Willcocks (1499) 513 posts

You’ll have to imagine the correct indentation, I’m done fighting with the forum software for the night!

Jun 24, 2021 4:48pm

Simon Willcocks (1499) 513 posts

Say, for the sake of argument, someone made a start on a C kernel that gets as far as running C code at the proper virtual address on all four cores with the only assumptions being:

There’s at least 64MiB RAM starting at address 0 (the rest can be reported to the kernel later)
The image has been loaded into that memory and is executed in a 32-bit privileged mode from the first word

Each core has its own translation tables and mapped workspace, plus an area of shared workspace for locks, etc.

What kind of copyright/licence should be put on it, and where could it reasonably be uploaded to?

(It’s very much a work in progress, and has only been tested on QEMU so far, since you couldn’t tell if it was working without it hitting peripherals.)

Jun 24, 2021 5:30pm

Steve Pampling (1551) 8170 posts

What kind of copyright/licence should be put on it, and where could it reasonably be uploaded to?

I would say Apache to match the existing source (majority of it)¹

Where to put it? Well you could talk to Paolo about using the community github¹ here

If ROOL are happy with it then it could move to here

Or even start there¹

¹ But: What do I know?

Jun 24, 2021 5:40pm

Jeffrey Lee (213) 6048 posts

What kind of copyright/licence should be put on it, and where could it reasonably be uploaded to?

Copyright is your choice. The important thing is the licence: it’ll need to be something fairly permissive like 2- or 3-clause BSD, Apache, CDDL, MIT, etc.

For source hosting, any kind of git repo should be fine. Obviously if you want other people to contribute, a platform that makes contribution easy (merge/pull requests, etc.) would be best. You don’t necessarily have to use ROOL’s gitlab, the code can easily be moved there later on if it reaches the point where you want to try getting it into the main source tree.

Good luck!

Jun 24, 2021 10:02pm

Simon Willcocks (1499) 513 posts

OK, I’ve got an account on github, since I started helping with BBCSDL, so I guess I’ll put it there, unless it’s a bad idea? It won’t be until I’ve gone through their training program, though…

Jun 25, 2021 7:20am

Simon Willcocks (1499) 513 posts

OK, I’ve put it up here:

https://github.com/Simon-Willcocks/RISC-OS-Kernel-in-C

It’s set up a vector table, but doesn’t handle events, yet, let alone interrupts.

Jul 19, 2021 8:22pm

Simon Willcocks (1499) 513 posts

I have a couple of questions:

1. Could I include a binary module from RISC OS on github and in my build?

I’ve got all four cores running DrawMod independently, rotating the RISC OS cog (which I’ll have to ask Richard Hallas for permission for).

There are still some major problems with cleaning the caches and one core gets an abort for no readily apparent reason, but it might pique someone’s interest.

2. Does anyone know: can the vdu code handling code be extracted into a module that claims WrchV? It’s obviously a lot of legacy code that would have to work just right.

Jul 19, 2021 8:42pm

Julie Stamp (8365) 474 posts

rotating the RISC OS cog

Haha, snap!

2. Does anyone know: can the vdu code handling code be extracted into a module that claims WrchV? It’s obviously a lot of legacy code that would have to work just right.

With VDUtext I was experimenting with that sort of thing. I never looked in too much detail, but there’s more than WrchV; for example, there are calls like OS_ReadVDUVariables etc. I believe it was done in RISC OS Select though.

Jul 19, 2021 9:20pm

Simon Willcocks (1499) 513 posts

With VDUtext I was experimenting with that sort of thing. I never looked in too much detail, but there’s more than WrchV; for example, there are calls like OS_ReadVDUVariables etc. I believe it was done in RISC OS Select though.

I’m OK with it setting various VduVars, and I’d be fine with tweaking it a bit, I just don’t really want it in the kernel itself. Look forward, not backwards, and all that.

I’m hoping someone can help me with some cache-fu (core-fu?), since my interpretation of the ARM DAI 0527A, Bare-metal Boot Code for ARMv8-A Processors, is duff, but the code makes a good illustration of the problem.

Jul 19, 2021 10:28pm

Jeffrey Lee (213) 6048 posts

Could I include a binary module from RISC OS on github and in my build?

Yes.

Technical hurdle: C modules and maybe a few assembler modules will be statically linked to certain base addresses, so you won’t be able to use a binary directly. Instead you’ll need to use an AOF and link it to the correct base address during the build (e.g. the same way that ResolverBlob works). If AOF isn’t suitable, you might be able to convince the Norcroft linker to output a usable ELF (there is a command line option for it).

Licensing: ROM and !System downloads just contain a copy of the Apache license file. So presumably it’d be sufficient to just include a copy of that alongside the binary in the git repo.

Does anyone know: can the vdu code handling code be extracted into a module that claims WrchV? It’s obviously a lot of legacy code that would have to work just right.

As Julie says, there are a number of places where the code would need decoupling from the kernel. It’s not impossible, it’d just take some time to make sure it’s done correctly and that any new interfaces are sensible.

Jul 21, 2021 10:59am

Simon Willcocks (1499) 513 posts

Well, there’s a simple demo here, which limps along rotating flickery cogs. (Turning off the caching for the screen memory gets rid of the flickering.)

https://github.com/Simon-Willcocks/RISC-OS-Kernel-in-C/files/6854988/kernel7.zip

If someone can point out the obvious mistake, I’d appreciate it!

Jul 22, 2021 9:12am

Jeffrey Lee (213) 6048 posts

I haven’t looked at the code, or tried running it, but immediate things that come to mind for cache/TLB related issues are:

Make sure everything is marked as shareable (so the different cores can peek into each other’s caches, and LDREX/STREX work correctly)
Make sure screen memory uses a write-through cache policy instead of write-back
Make sure all I/O regions (i.e. read or write sensitive hardware registers) are Device/Strongly-Ordered and non-executable. Marking memory as non-executable is the only way to fully prevent instruction fetches.
If your page tables are cacheable, make sure TTBCR is configured correctly
If memory is marked as cacheable, you need to assume that it’s in the cache. Even if you’ve just flushed the cache, the CPU could spontaneously reload it at any point. So to fully evict something from the cache (e.g. when mapping out memory or making it non-cacheable) you need to do the following:
1. Update the page tables to make the memory non-cacheable (e.g. “Normal, non-cacheable” attributes)
2. Do the right synchronisation dance (e.g. DSB+ISB to wait for page table write to complete, then CP15 TLB invalidate op, then DSB+ISB to wait for CP15 op to complete)
3. Flush the pages from the cache
4. (If necessary) update the page tables again to set the desired state (e.g. map out the pages)
For multi-core, make sure you’re using cache/TLB operations which are broadcast to the other cores. IIRC this rules out the use of set/way based operations. MVA-based operations, and full I cache invalidate, should be fine (although even among those, there are different versions that have different broadcasting behaviour)
IIRC, technically any page table modification requires the other cores to execute an ISB in order for them to see the modification. But the number of cases where you need to add extra ISBs to ensure this will be pretty small (since most of the time you’d use a mutex or semaphore or something to signal to the core that “whatever you were waiting for has been done”, and that will execute an ISB)
For long descriptor page tables, LDRD/STRD (and I guess also LDREXD/STREXD?) are the only instructions that are guaranteed to access page tables in an atomic way from the perspective of the page table walk hardware.
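
The ordering in the numbered eviction steps above can be sketched in C. Everything hardware-specific is hidden behind hypothetical hooks (none of these names come from the RISC OS kernel), and the recording stubs exist only so the sequence can be observed on an ordinary machine. A real implementation would put the page table writes, DSB/ISB barriers, and CP15 operations behind those hooks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks standing in for page table writes and CP15 operations. */
typedef struct {
  void (*set_page_attributes)(uint32_t addr, size_t size, int cacheable, int mapped);
  void (*barrier)(void);                                  /* DSB + ISB */
  void (*tlb_invalidate)(uint32_t addr, size_t size);
  void (*cache_clean_invalidate)(uint32_t addr, size_t size);
} mmu_ops;

/* Evict a region from the cache, optionally mapping it out afterwards. */
static void evict_region(const mmu_ops *ops, uint32_t addr, size_t size, int map_out)
{
  assert(ops != NULL);
  ops->set_page_attributes(addr, size, 0, 1);   /* 1. make it non-cacheable */
  ops->barrier();                               /* 2. wait for the table write */
  ops->tlb_invalidate(addr, size);              /*    drop stale TLB entries */
  ops->barrier();                               /*    wait for the TLB op */
  ops->cache_clean_invalidate(addr, size);      /* 3. flush the pages */
  if (map_out)
    ops->set_page_attributes(addr, size, 0, 0); /* 4. set the final state */
  /* Any synchronisation after step 4 is not covered by the list above. */
}

/* Recording stubs: each call appends one letter to 'trace'. */
static char trace[16];
static size_t trace_len;

static void record(char c)
{
  if (trace_len < sizeof trace - 1)
    trace[trace_len++] = c;
  trace[trace_len] = '\0';
}

static void stub_attrs(uint32_t addr, size_t size, int cacheable, int mapped)
{
  (void)addr; (void)size; (void)cacheable;
  record(mapped ? 'P' : 'U');   /* P = page table update, U = unmap */
}
static void stub_barrier(void) { record('B'); }
static void stub_tlb(uint32_t addr, size_t size) { (void)addr; (void)size; record('T'); }
static void stub_cache(uint32_t addr, size_t size) { (void)addr; (void)size; record('C'); }

static const mmu_ops stub_ops = { stub_attrs, stub_barrier, stub_tlb, stub_cache };
```

Calling evict_region( &stub_ops, addr, size, 1 ) with the stubs leaves the letters P, B, T, B, C, U in trace, which is the order of the steps listed. The multi-core broadcast requirements mentioned in the same list are outside the scope of this sketch.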

n.b. Device/Strongly-Ordered/Normal are all ARMv7 names for memory types, I’m not sure off the top of my head what they’re calling them in ARMv8

Since there are lots of ways for things to go wrong, and only a couple of ways for things to go right, and your aim is to create a kernel which is RISC OS compatible, I’d say that it makes sense to lift the cache/TLB/page table code from the current kernel and use that as a starting point for your code. E.g. start with the current assembler code (modifying it as necessary to make it run, e.g. adjusting zero page references), make sure it fixes the problems you’re seeing, and then rewrite any bits you want in C.

Some relevant places to look:

s.ARMops, the WB_CR7_Lx routines (single-core ARMv7/v8) & ARMv7MP routines (multi-core v7/v8)
s.ARMops, the XCBTableVMSAv6 & XCBTableVMSAv6Long tables (mapping of RISC OS memory types to page table attributes)
s.VMSAv6, s.VMSAv6Short, s.VMSAv6Long, s.ShortDesc, s.LongDesc (low-level routines for reading/writing page table entries, and some higher-level routines for things like enabling cacheable pagetables & bulk memory mapping routines used by lazy task swapping)
s.Exceptions (there’s some code in the data abort handler to detect data aborts that can be triggered by cache maintenance operations – e.g. I think this happens when doing cache maintenance on an unmapped region of memory)

Jul 22, 2021 7:36pm

Simon Willcocks (1499) 513 posts

Thanks!

I found the source of the crash on core 1, it happened to be the one that set up the DAs, and it mapped the RMA as unshared. So that’s nice.

The “random” flushing of cache lines was driving me mad, so I ended up with your screen memory policy. I don’t make everything shareable, as each core gets its own workspace, stacks, “zero page”, vectors, translation tables, etc. Most things will be, though; the RMA is shared, but modules in different cores will allocate different workspaces. I’m still having difficulty getting my head around what caches are available from where.

The only device access so far is the screen and a GPIO pin, which are configured as you suggested. The plan is that the first module will set up the hardware devices, and carry on from there.

Marking memory as non-executable is the only way to fully prevent instruction fetches.

Sounds like the voice of painful experience! Does the processor grab instructions from anywhere it gets a pointer to, or something?

If your page tables are cacheable, make sure TTBCR is configured correctly

With EAE = 1, I see. That’s similar to the aarch64 environment.

[ More good advice… ]

I’m keeping the translation tables local to each core, so I hope they will be reasonably independent. Currently, I’m using short page tables; long page tables would go hand-in-hand with moving the kernel to 64-bit.

n.b. Device/Strongly-Ordered/Normal are all ARMv7 names for memory types, I’m not sure off the top of my head what they’re calling them in ARMv8

Device-nGnRnE or similar.

I’m kind of coming at this from two directions. I started writing a strange sort of kernel where 99% of code runs at EL0, and the kernel allows calls between “maps” (not quite the same as processes). Once that was working moderately well, I started to implement a VM to run RISC OS in, but ground to a halt with (surprise surprise) MMU management problems; how much can be left to the processor’s non-secure mode, and how much do I have to emulate?

So, I took a break from that to do the aarch64 assembler for BASIC, and then started on this project, using what I’d learned from the earlier one. If I get it right, then it should be easy to move to aarch64 at EL2.

Now, I’ve got to spend a week or so in quarantine, moving from a country with 1,300 new cases a day to one with getting on for 50,000 (which won’t accept my vaccinations because they weren’t done by the NHS). Boo.

(Is it just me, or are there more configuration registers in today’s ARMs than there were instructions in the ARM2?)
