Where to start kernel hacking?
Simon Willcocks (1499) 520 posts |
Wasn’t someone giving away copies of the DDE to interested coders a while ago? |
Paolo Fabio Zaino (28) 1882 posts |
That was for BBC BASIC (aka ABC Compiler) only I think |
Stuart Swales (8827) 1357 posts |
I think there was an announcement some years ago that full DDE might be available to those who signed up to particular tasks. |
Simon Willcocks (1499) 520 posts |
Found it, from 2020, which feels like yesterday: https://www.riscosopen.org/news/articles/2020/02/21/new-desktop-development-environment-reaches-out
|
NancySadkov (10280) 34 posts |
Fun fact: it can eat raw x86 byte code and convert it to C++. But I doubt it will handle ARM without a tool, since it is not byte code. But as I mentioned several times already, it works best when you have well documented source code to feed it with, otherwise it gets confused about what you want from it. It is not amazing at ARM machine code (despite it being the simplest and cleanest out there), since all the ARM tutorials are confusing, incomplete and targeted at “security experts”. I personally read that Aug 1986 ARM assembler reference manual, which appears to be the easiest source to read (the documentation at arm.com is just impenetrable, even to an LLM).
The problem is: I already need a public domain ARM assembler for my project. So why not create one compatible with ObjAsm? There is not a single C file inside the Kernel. I do need a linker. But then again, I need a PD ARM linker for my Symta Lisp system project, to detach it from any dependency on the host C compiler. In other words, there is no other option but to do that. If I ever need to compile Norcroft C, I think I can adapt my transpiler for it. It has some unusual calling conventions like __value_in_regs, but I think that can be solved with a trampoline, which I again need to study to properly transfer Symta’s FFI runtime to ARM. Currently going through the encoding format (went to RPCEmu for reference).

Operation{cond}{S} Rd, Rn, Rm
data processing instruction format:
BitOffset/NBits: Name - Description
28/ 4: C  - condition. See CND_.* macros
26/ 2: T  - instruction class (always 0x0)
25/ 1: I  - if 1, Rm is immediate
21/ 4: O  - operation code. OPC_.* macros
20/ 1: S  - set flags
16/ 4: Rn - 1st operand register
12/ 4: Rd - destination register
 0/12: Rm - 2nd operand register or immediate

Register (I=0):
  0/4: Register
  4/1: ShiftSource
  5/2: ShiftType
       0 = logical left
       1 = logical right
       2 = arithmetic right (sign bit handled)
       3 = rotate right
  7/5: shift value (0 to 31)
  Immediate (ShiftSource=0)
    7/5 = immediate (i.e. 7 for LSL#7)
  Register (ShiftSource=1)
    7/1 = should be 0
    8/4 = register

Immediate (I=1):
  0/8: value
  8/4: rotate right amount
       note: it is multiplied by 2 before use

////////////////////////
// Operation Codes
//Rd - destination
//Rn - 1st operand
//Rm - 2nd operand

//Rd = Rn & Rm
#define OPC_AND 0x0
//Rd = Rn ^ Rm
#define OPC_EOR 0x1
//Rd = Rn - Rm
#define OPC_SUB 0x2
//Rd = Rm - Rn
#define OPC_RSB 0x3
//Rd = Rn + Rm
#define OPC_ADD 0x4
//Rd = Rn + Rm + C
#define OPC_ADC 0x5
//Rd = Rn - Rm - !C (carry acts as an inverted borrow)
#define OPC_SBC 0x6
//Rd = Rm - Rn - !C (carry acts as an inverted borrow)
#define OPC_RSC 0x7
//NZCV <- Rn & Rm
//S should be always 1
#define OPC_TST 0x8
//NZCV <- Rn ^ Rm
//S should be always 1
#define OPC_TEQ 0x9
//NZCV <- Rn - Rm
//S should be always 1
#define OPC_CMP 0xA
//NZCV <- Rn + Rm (compare negated)
//S should be always 1
#define OPC_CMN 0xB
//Rd = Rn | Rm
#define OPC_ORR 0xC
//Rd = Rm (Rn is ignored)
#define OPC_MOV 0xD
//Rd = Rn & ~Rm (bit clear)
#define OPC_BIC 0xE
//Rd = ~Rm (Rn is ignored)
#define OPC_MVN 0xF |
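To make the field layout above concrete, here is a minimal C sketch that packs those fields into a 32-bit word. It is not taken from the assembler being discussed: the function names (`dp_reg`, `dp_imm`, `dp_imm_field`), the `SH_*` enum and the `CND_AL`/`CND_NE` names are hypothetical, chosen only to follow the `CND_.*`/`OPC_.*` pattern mentioned in the post. Only the constant-shift register form and the rotated 8-bit immediate form are covered.

```c
#include <assert.h>
#include <stdint.h>

#define CND_NE  0x1u   /* hypothetical names following the CND_.* pattern */
#define CND_AL  0xEu
#define OPC_ADD 0x4u
#define OPC_MOV 0xDu

/* Shift types for the register form of the second operand. */
enum { SH_LSL = 0, SH_LSR = 1, SH_ASR = 2, SH_ROR = 3 };

/* Fields shared by every data processing instruction. */
static uint32_t dp_base(uint32_t cond, uint32_t opc, uint32_t s,
                        uint32_t rd, uint32_t rn)
{
    assert(cond < 16 && opc < 16 && s < 2 && rd < 16 && rn < 16);
    return (cond << 28) | (opc << 21) | (s << 20) | (rn << 16) | (rd << 12);
}

/* I = 0, ShiftSource = 0: second operand is Rm shifted by a constant. */
uint32_t dp_reg(uint32_t cond, uint32_t opc, uint32_t s, uint32_t rd,
                uint32_t rn, uint32_t rm, uint32_t shtype, uint32_t shamt)
{
    assert(rm < 16 && shtype < 4 && shamt < 32);
    return dp_base(cond, opc, s, rd, rn) | (shamt << 7) | (shtype << 5) | rm;
}

/* I = 1: second operand is the 12-bit rotate/value field. */
uint32_t dp_imm(uint32_t cond, uint32_t opc, uint32_t s, uint32_t rd,
                uint32_t rn, uint32_t field)
{
    assert(field < 0x1000);
    return dp_base(cond, opc, s, rd, rn) | (1u << 25) | field;
}

/* Search for rot (0..15) and imm8 such that value == ROR(imm8, 2 * rot).
   Returns 1 and stores the 12-bit field on success, 0 if not encodable. */
int dp_imm_field(uint32_t value, uint32_t *field)
{
    for (uint32_t rot = 0; rot < 16; rot++) {
        unsigned n = 2 * rot;
        /* Rotating left by n undoes the CPU's rotate right by n. */
        uint32_t imm8 = n ? ((value << n) | (value >> (32 - n))) : value;
        if (imm8 <= 0xFF) {
            *field = (rot << 8) | imm8;
            return 1;
        }
    }
    return 0;
}
```

For example, `dp_reg(CND_AL, OPC_MOV, 0, 0, 0, 1, SH_LSL, 0)` gives `0xE1A00001`, the usual encoding of `MOV R0, R1`. A value such as `0x101` cannot be expressed as a rotated 8-bit constant, so `dp_imm_field` reports failure and a real assembler would have to reject it or load the constant another way.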
Paolo Fabio Zaino (28) 1882 posts |
Did you have a look at the HAL? |
Simon Willcocks (1499) 520 posts |
There are many compiled C files in the ROM, mostly for modules. |
NancySadkov (10280) 34 posts |
I think there was https://www.4corn.co.uk/articles/acornc5/ It could be useful for reference.
I’m interested in getting the Kernel running first.
Can’t I take modules from the already compiled distribution? |
Stuart Swales (8827) 1357 posts |
Not very helpful, that’s an archaic DDE. Useful for building RISC OS 3.71 maybe, but definitely not RISC OS 5.
No, because the C modules in the ROM are statically linked to the single copy of the SharedCLibrary, which would end up moving address if you changed the kernel that lives at the start of the image. But if you want to instrument the DHCP module, that’s a different matter, that’s not ‘the kernel’ in RISC OS. You could build a RISC OS ROM with the DHCP module moved to the end of the image and just keep replacing that, perhaps. Though that, as a C module itself, would need linking against the SharedCLibrary each time. All the ASM modules are PIC, the C ones are not. |
Chris Mahoney (1684) 2165 posts |
I particularly enjoyed its claim that a lowercase × symbol is ÷. (Edit: Which I now see you mentioned in the comments.) |
Simon Willcocks (1499) 520 posts |
@NancySadkov You might be able to manage something that overwrites the HAL, but not much.
I’ve been stuck in them for the last couple of years! |
NancySadkov (10280) 34 posts |
Outline of the assembler, which compiles just the basic form of data processing opcodes (e.g. `movsne r0, r1`). Macros apparently will be the dirtiest part.
There is still one classic strategy: parasite hook into the working build, replacing functions one by one. HLE emulators do that and they used this trick to reverse engineer Dungeon Keeper and Diablo. The people initially reversing Might & Magic VI went directly with HexRays dumping, and the result was an explosion of bugs and crashes.
Latest GPT3.5 version writes correct code (if you’re lucky). |
Rick Murray (539) 13850 posts |
Good luck with that. Macros are something ObjAsm excels at, and RISC OS uses them frequently.
? Surely C modules start up in a specific way and make certain library calls to begin? If you can identify this, then you’ll have a known address to a known library function. From there you may be able to derive the addresses of the library functions (an easy way to grab these would be to get the library to fill in the application jump table (or just steal one from a running app)).
Interesting. It gave me something different.
Spent today playing with ChatGPT. It seems it is okay at fairly simple things, but messes up as complexity increases. Still, the broken code it gave me was enough to get me going to write something that worked. |
Steve Pampling (1551) 8172 posts |
Typo time :) While I’m here, there’s something about `movsne r0, r1` that’s nudging a braincell, asking for activity. |
Rick Murray (539) 13850 posts |
I never claimed to be good at maths. ;)
An interesting little article on parsing ARM instructions by somebody not used to ARM: I think the mistake people (including the above) tend to make is in thinking that all those instructions are different. MOVNES is really just MOV with the S option and the NE conditional. It’s better to break it apart like that, because if you’re going to parse them as separate instructions, there’s like a hundred instructions (figure pulled from my posterior quarters as life is too damned short to go count them), which have, what is it, 14 possible conditions? Plus multiple options (like S for all the maths ones, B for LDR/STR, the stack modes for LDM/STM), etc etc. You’d be looking at parsing something like 2,000 odd instructions. Yikes! Easier way. Read the first three letters. That’s usually the instruction, though they broke that with UMULL, UMLAL and such but they are recognisable enough. Likewise special handling for B and BL. |
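The "read the first three letters" idea can be sketched in C roughly as follows. This is an illustration only, limited to the sixteen data processing mnemonics; the `Mnemonic` struct and `parse_dp_mnemonic` are hypothetical names, and accepting both the classic `MOVNES` order and the `movsne` order seen earlier in the thread is an assumption about what a parser might want to tolerate. B, BL, UMULL and friends would need the special handling mentioned above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct { int opcode; int cond; int s; } Mnemonic;

/* Table index equals the 4-bit opcode / condition encoding. */
static const char *const OPS[16] = {
    "AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
    "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"
};
static const char *const CONDS[16] = {
    "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
    "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"
};

/* Return the index of the entry matching the first len chars of s, or -1. */
static int lookup(const char *const *table, const char *s, size_t len)
{
    for (int i = 0; i < 16; i++)
        if (strlen(table[i]) == len && strncmp(table[i], s, len) == 0)
            return i;
    return -1;
}

/* Split e.g. "MOVNES" into opcode MOV, condition NE, S flag set.
   Returns 1 on success, 0 if the text is not a recognised mnemonic. */
int parse_dp_mnemonic(const char *text, Mnemonic *out)
{
    char up[8];
    size_t n = strlen(text);
    if (n < 3 || n > 6)
        return 0;
    for (size_t i = 0; i < n; i++)
        up[i] = (char)toupper((unsigned char)text[i]);
    up[n] = '\0';

    out->opcode = lookup(OPS, up, 3);   /* first three letters */
    if (out->opcode < 0)
        return 0;
    out->cond = 14;                     /* default condition: AL */
    out->s = 0;

    const char *rest = up + 3;
    size_t r = n - 3;
    if (r % 2 == 1) {                   /* odd remainder: an S is present */
        if (rest[0] == 'S')
            rest++;                     /* S before the condition */
        else if (rest[r - 1] != 'S')
            return 0;                   /* otherwise S must come last */
        out->s = 1;
        r--;
    }
    if (r == 2) {
        out->cond = lookup(CONDS, rest, 2);
        if (out->cond < 0)
            return 0;
    }
    return 1;
}
```

Since no condition code starts with S, a leading S in the suffix is unambiguous, which is what lets both orders share one code path. The result is three small numbers rather than one of thousands of distinct "instructions".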
Rick Murray (539) 13850 posts |
A couple of small observations on the pastebin stuff:
|
Stuart Swales (8827) 1357 posts |
And oddballs, such as RRX |
NancySadkov (10280) 34 posts |
IIRC BSD can’t be just thrown into an existing code base and you have to credit everyone involved.
The main error is that I tried to introduce an intermediate instruction format (the x86 experience), but with ARM the right thing is to map mnemonic to a prototype and then just or-in arguments, because ARM is so simple. |
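A rough C sketch of that "prototype plus OR" approach, under the assumption that each mnemonic maps to a 32-bit template with the condition preset to AL and all operand fields zero. The table contents and the names `Proto`, `proto_for`, `with_regs` and `with_cond` are made up for illustration and are not from the pastebin code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *name; uint32_t proto; } Proto;

/* Templates: condition AL, operand fields left as zero. */
static const Proto PROTOS[] = {
    { "ADD", 0xE0800000u },
    { "SUB", 0xE0400000u },
    { "MOV", 0xE1A00000u },
    { "CMP", 0xE1500000u },   /* S bit already set */
    { "SWI", 0xEF000000u },   /* low 24 bits take the SWI number */
};

/* Look up a template by name; returns 0 for an unknown mnemonic. */
uint32_t proto_for(const char *name)
{
    for (size_t i = 0; i < sizeof PROTOS / sizeof PROTOS[0]; i++)
        if (strcmp(PROTOS[i].name, name) == 0)
            return PROTOS[i].proto;
    return 0;
}

/* OR register numbers into the Rn, Rd and Rm fields. */
uint32_t with_regs(uint32_t proto, unsigned rd, unsigned rn, unsigned rm)
{
    return proto | ((rn & 15u) << 16) | ((rd & 15u) << 12) | (rm & 15u);
}

/* Replace the preset AL condition with another one. */
uint32_t with_cond(uint32_t proto, unsigned cond)
{
    return (proto & 0x0FFFFFFFu) | ((uint32_t)(cond & 15u) << 28);
}
```

With this, `ADD R0, R1, R2` is just `with_regs(proto_for("ADD"), 0, 1, 2)`, and `SWI &11` is `proto_for("SWI") | 0x11`. No intermediate instruction representation is needed.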
Chris Mahoney (1684) 2165 posts |
There’s also this series, written from a Windows perspective but a lot of it’s probably still relevant.
“I have no idea who would ever use RRX anyway.” – Part 3 of the series linked above :) |
Paolo Fabio Zaino (28) 1882 posts |
That is because the Chat has the P parameter set probably. In the OpenAI GPT models (both in GPT 3 and in 4) there are some parameters a user can set to have it more creative vs more precise, in other words it’s possible to “configure” a model (it’s also possible to fine tune it by providing a fine-tuning dataset). Given that a Chat may require some degree of “creativity”, most likely it’s pre-configured with P = 1.0 or similar (parameter values are float). This makes it more creative and also causes the effect of the model responding differently to the same question. Which is why it’s a terrible idea to use it for ASM or C/C++. There, “creativity” (mind, that is not the same meaning as human creativity) can cause problems. There are models (or even using the OpenAI GPT API instead of ChatGPT, where it’s possible to set all the parameters) that can do a better job at coding problems. Mind that the APIs generally are “context-less”, in other words they do not keep a context like a Chat format does, so prompts have to be extremely clear and descriptive. As a side note, it’s totally feasible to develop models that can deal with ARM assembly or C in a very good way, but they tend to be more dedicated, while ChatGPT is an ambitious project which tries to develop a single model to be used in all scenarios. |
Steve Pampling (1551) 8172 posts |
Machine creativity = add random bits Human creativity = add random bits passed through a filter (filter = will this kill me or do something bad? AND do I care?) |
Clive Semmens (2335) 3276 posts |
I think human filters can be quite a lot more complex than that! I suppose machine ones could be too. Maybe they’ll get there…in the end. Human designers of creative machines not creative enough? (Yet?) |
NancySadkov (10280) 34 posts |
First milestone. It can compile the hello world from https://www.riscosopen.org/wiki/documentation/show/A%20BASIC%20guide%20to%20ObjAsm
TODO:

OS_WriteS   * &01
OS_NewLine  * &03
OS_Exit     * &11

        AREA    helloworld, CODE, READONLY
        ENTRY

        SWI     OS_WriteS
        =       "Hello, world!"
        DCB     0
        ALIGN
        SWI     OS_NewLine

        MOV     R0,#0
        LDR     R1,abex
        MOV     R2,#0
        SWI     OS_Exit

abex    DCD     &58454241 ; "ABEX"

        END |
Rick Murray (539) 13850 posts |
Ah, good, you’ve made it CC0. Some jurisdictions (ahem, a chunk of Europe) don’t have a concept of releasing things as public domain as there isn’t “copyright” so much as “moral rights” and you can’t waive those. That being said, legal rubbish is legal rubbish so you might need more than a single line saying FREEDOM! (picture Mel Gibson here) See here: https://wiki.creativecommons.org/wiki/CC0_FAQ#May_I_apply_CC0_to_computer_software.3F_If_so.2C_is_there_a_recommended_implementation.3F Otherwise, pretty good going. 👍 You’ll have fun with the macro stuff. This’ll make your head spin: https://gitlab.riscosopen.org/RiscOS/Sources/Programmer/HdrSrc/-/blob/master/hdr/CPU/Generic32
Libraries are the job of the librarian (LibFile in the DDE) to collect together all the object files into a “library”. It’s a separate program because any AOF can be used to create a library. It’s quite normal to mix bits in C with bits in assembler.
Simple C is doable. Norcroft isn’t that unusual, so it might be possible to lightly bash an existing compiler for anything Norcroft specific. Where it will matter, however, is in its optimisations. Norcroft is pretty clever in that respect. |
NancySadkov (10280) 34 posts |
CC0 is different from basic PD, since it is a license with explicit dedication:
As of now there is just Clang.
I did a Turing-complete macro processor with lexical scoping before, but it was single pass and had different constraints: Still unsure how to properly approach the macro expansion, since ObjAsm apparently allows forward-resolved labels in its EQU expressions, so one has to support macro declarations that come after their use. The assembler just ignores them during the first pass, assuming a single-word opcode, and then goes back during the second pass to compile them. |
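The two-pass idea described above can be illustrated with a deliberately tiny C sketch. It assumes every line occupies exactly one word, which is the same simplification the first pass is said to make, and it models only labels and words that hold a label's address. `Line`, `Symbol`, `MAX_SYMS` and `assemble` are hypothetical names, and macros are not modelled at all.

```c
#include <stdint.h>
#include <string.h>

/* One source line: an optional label defined here, and an optional
   reference to a label whose address this word should contain. */
typedef struct { const char *label; const char *ref; } Line;
typedef struct { const char *name; uint32_t addr; } Symbol;

#define MAX_SYMS 32

static int find(const Symbol *syms, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(syms[i].name, name) == 0)
            return i;
    return -1;
}

/* Assemble nlines lines starting at address base into out[].
   Returns the number of words written, or -1 on an undefined label
   or symbol table overflow. */
int assemble(const Line *lines, int nlines, uint32_t base, uint32_t *out)
{
    Symbol syms[MAX_SYMS];
    int nsyms = 0;

    /* Pass 1: each line is assumed to be one word, so every label's
       address is known without resolving any operand. */
    for (int i = 0; i < nlines; i++) {
        if (lines[i].label) {
            if (nsyms == MAX_SYMS)
                return -1;
            syms[nsyms++] = (Symbol){ lines[i].label, base + 4u * (uint32_t)i };
        }
    }

    /* Pass 2: all labels exist now, including ones defined later in
       the source, so forward references resolve like any other. */
    for (int i = 0; i < nlines; i++) {
        if (lines[i].ref) {
            int k = find(syms, nsyms, lines[i].ref);
            if (k < 0)
                return -1;
            out[i] = syms[k].addr;
        } else {
            out[i] = 0;
        }
    }
    return nlines;
}
```

The catch for macros is that the one-word assumption only holds if an expansion's size is known in the first pass, so a macro used before its definition would still need some rule for how much space to reserve.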