RISC OS Open: Forum: Zero page protection

Jul 17, 2012 4:18pm

Chris Gransden (337) 1207 posts

ResEd loads on the iconbar. Clicking select on the icon produces an abort.
Clicking on ‘New’ from the iconbar menu also causes an abort.

While testing !PDF I’m getting a ‘Couldn’t load toolbar’ error. It looks like it is failing in the ‘Toolbox_CreateObject’ SWI. The object it is failing on is a ‘Toolbar’ object prototype. Specifically, any gadgets that display a sprite cause the error. If I remove them from the ‘Toolbar’ object prototype, !PDF then loads successfully.
If the ‘Res’ file is loaded into !ResEd and the ‘toolbar’ displayed, the sprites appear at first, then sometimes disappear. Also, clicking on ‘Gadgets’ produces an abort.

Apr 6, 2013 10:14am

Chris Gransden (337) 1207 posts

It’s been a while since the last one. ‘amu’ aborts with the following,


*amu

Internal error: abort on data transfer at &0000CEC0

Postmortem requested
ceb8 in anonymous function
d310 in anonymous function
e130 in anonymous function
ab5c in anonymous function
  Arg2: 0x0000aa98 43672 -> [0xe1a0c00d 0xe92dd830 0xe24cb004 0xe15d000a]
  Arg1: 0x000161d8 90584 -> [0x00756d61 0x000161dc 0xeafffff7 00000000]
fc138c28 in shared library function
12fd4 in anonymous function
*

Jun 9, 2013 11:07am

Chris Gransden (337) 1207 posts

Typing any invalid command at the supervisor prompt produces an abort.

*jkjkjkjkj
Internal error: abort on data transfer at &FC013090
*where
Address &FC013090 is in the Kernel

Aug 17, 2013 3:15pm

Chris Gransden (337) 1207 posts

Running the Theme configuration plugin produces the following abort:

Internal error: abort on data transfer at &FC149058
*where
Address &FC149058 is at offset &00013560 in module SharedCLibrary

Aug 18, 2013 12:02am

Rick Murray (539) 13840 posts

Running the Theme configuration plugin

Excuse my dose of stupidity, but I’ve poked around Configure and !Boot (RPi, 5.21 of early August) and nothing seems to jump out as being an obvious way to configure themes.
Is it third-party?

Aug 18, 2013 5:54am

Chris Gransden (337) 1207 posts

The Theme configuration plugin was made available from 4th August onwards.

Dec 17, 2013 7:56pm

Jeffrey Lee (213) 6048 posts

I’ve now caught up with a few of these:

!Printers should be (mostly) fixed. The BASIC sources are very scary and were full of null pointer dereferences, so it wouldn’t surprise me if there are more problems lurking. Printing itself also looks like it might be a bit broken – I was able to print a text file but not a spritefile or drawfile (using the PostScript drivers, to file, over ShareFS).
The theme setup plugin should be fixed
I couldn’t reproduce the ‘*jkjkjkjk’ crash – are you still getting it? Is this from just hitting F12 while in the desktop, or from booting straight to the supervisor?

Dec 17, 2013 9:38pm

Steve Pampling (1551) 8170 posts

I’ve now caught up with a few of these:

Is it just my setup or is the current version of Maestro broken?
“Unknown or missing variable (internal error 5135)”
Poking around a bit, it looks like either the crunched (I think) BASIC is not being detokenised correctly or it was altered incorrectly. Certainly editing line 5135 and manually splitting the tokens and variables moves the error elsewhere.
Obviously without editing the whole file I’m not likely to fix it.

Dec 17, 2013 9:48pm

Steve Pampling (1551) 8170 posts

“Unknown or missing variable (internal error 5135)”

also seems to occur with the version from the stable disc image 2013-07-09 so can’t be the actual maestro runimage file.

Dec 17, 2013 11:17pm

Steve Pampling (1551) 8170 posts

also seems to occur with the version from the stable disc image 2013-07-09 so can’t be the actual maestro runimage file.

OK, went back through archived ROM images. Maestro works on the November 28th image and fails on the December 2nd image.

According to the CVS logs there were BASIC-related changes in that period.

Dec 18, 2013 6:22am

WPB (1391) 352 posts

Can all modern builds of RO (R-Pi, BB, IOMD even?) run with high processor vectors? What is the compile switch to turn them on, please?

Dec 18, 2013 8:51am

Sprow (202) 1158 posts

Maestro “Unknown or missing variable (internal error 5135)”

also seems to occur with the version from the stable disc image 2013-07-09 so can’t be the actual maestro runimage file.

Weird: BASIC 1.57 softloaded on RISC OS 4.02 and Maestro 1.98 work fine. Same combo on 5.21 gives the above mentioned error. I might have screwed up something in re-26/32 bitting it, I’ll take a look…

Dec 18, 2013 1:36pm

Jeffrey Lee (213) 6048 posts

Can all modern builds of RO (R-Pi, BB, IOMD even?) run with high processor vectors? What is the compile switch to turn them on, please?

Everything can run with them on, except IOMD. Although if we wanted we could probably do a simplified version for IOMD (keep the processor vectors at &0, but move the kernel’s ZeroPage workspace up high and make &0 inaccessible in user mode).

To do a build with high processor vectors enabled you need to make two changes:

In castle.RiscOS.Sources.Kernel.hdr.Options, enable the HiProcVecs option
In BuildSys.Components.ROOL.<foo> add FPEANCHOR=High to the FPEmulator options

One day I might have a go at updating FPEmulator so that it doesn’t rely on storing a workspace pointer in the kernel’s ZeroPage workspace. A day or two ago I had a look at the sources and it didn’t look like there would be too many places where it would be difficult to get the pointer passed in from elsewhere.

Dec 18, 2013 4:49pm

WPB (1391) 352 posts

Thanks for the explanation. I’ll be sure to give that a go soon.

Dec 18, 2013 10:07pm

Sprow (202) 1158 posts

Maestro “Unknown or missing variable (internal error 5135)”

I might have screwed up something in re-26/32 bitting BASIC, I’ll take a look…

In that most rare of occurrences, nope, it wasn’t me: Maestro was reading the sprite suffix from the Wimp and assuming it was always a number, and passing it to a CASE statement with very limited WHENs. Now that the Wimp does alpha sprites, the “A2” suffix meant that Maestro’s sprite pool variable wasn’t being set.
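The failure pattern described here can be sketched roughly as below. This is not Maestro’s code (which is BASIC); the suffix strings and pool names are made up for illustration. The only point is that a lookup with a handful of cases and no fallback leaves the variable unset for any suffix it doesn’t list.

```python
# Illustrative only: the suffixes and pool names are hypothetical, not Maestro's.
KNOWN_POOLS = {
    "22": "pool_22",
    "23": "pool_23",
    "24": "pool_24",
    "11": "pool_11",
}


def pool_without_fallback(suffix):
    """Mimic a CASE with a few WHENs and no OTHERWISE.

    None stands in for "the variable was never assigned".
    """
    return KNOWN_POOLS.get(suffix)


def pool_with_fallback(suffix, default="pool_22"):
    """Same lookup, but any unrecognised suffix falls back to a default pool."""
    return KNOWN_POOLS.get(suffix, default)
```

With the first form an “A2” suffix yields nothing, and the error only surfaces later when the unset variable is used; the second form always yields a usable pool.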

Actually, this would always have occurred for all but 4 combinations: 180dpi (EX0 EY0) for example.

Dec 19, 2013 8:41pm

Steve Pampling (1551) 8170 posts

In that most rare of occurrences, nope, it wasn’t me:

Don’t you just love those? T’wasn’t me guv. :)

Jun 17, 2015 1:22pm

Jeffrey Lee (213) 6048 posts

A recent meeting with ROOL has revealed that we both have a desire to enable this code ASAP and ~~break everyone’s computers~~ increase RISC OS’s stability & security. However due to the ~~large angry mob~~ number of legacy apps that need supporting it was concluded that it would be a good idea to implement some kind of compatibility module before we flip the switch. I ~~valiantly~~ foolishly volunteered for this task (since it’s something I’d been planning on doing since the outset), and now it’s time for me to implement it.

Although an all-singing, all-dancing compatibility module would be nice, such a thing would also take a long time to implement, and the full feature set might not actually be required in practice. So instead I’m thinking of aiming for a simple system which will do the bare minimum:

Simple Configure plugin to enable/disable the module and view/control logging
When enabled, the module will be loaded as part of the boot sequence and will install a data abort handler
Any read operation to zero page will be trapped and will add an entry to the log file (register dump, code around PC, wimp task/module name, etc.)
Reads of certain known kernel locations (monotonic timer, escape status byte, IRQsema, etc.) will return the correct value
Reads of other locations (including the processor vectors) will return zero
Writes will not be trapped and will trigger an abort as normal
Module will have a configuration file which gives details of which zero page locations should be trapped for which programs/modules. This will give good flexibility without needing much UI work in the configure plugin (just go the cheap and nasty route and make it open the file in a text editor). Logging could be controlled on a per-app/module basis too.
Only LDM/LDR/LDR[S]B/LDR[S]H/LDRD instructions will be trapped. More exotic stuff (FPA/VFP/NEON/LDREX/RFE/LDRT/etc.) will be left to abort. Unaligned access will abort.
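As a rough model of the policy in the list above (not the module itself, which would be an ARM data abort handler), the decision can be sketched in Python. The zero page size, the table of known locations and the values they return are assumptions for illustration and have not been checked against the kernel; the instruction decoding follows the classic ARM encodings and ignores newer variants such as LDRHT.

```python
ZERO_PAGE_SIZE = 0x4000  # assumed extent of the legacy zero page region

# Hypothetical table: legacy address -> callable returning the live value.
# Addresses and values are placeholders, not verified kernel locations.
KNOWN_LOCATIONS = {
    0x104: lambda: 0,        # escape status byte (placeholder)
    0x108: lambda: 0,        # IRQsema (placeholder)
    0x10C: lambda: 123456,   # monotonic timer (placeholder)
}


def load_size(instr):
    """Return the access size in bytes for a load we would emulate, else None."""
    if instr >> 28 == 0xF:
        return None                       # unconditional space: RFE, PLD, NEON...
    op = (instr >> 25) & 0b111
    is_load = bool(instr & (1 << 20))
    if op in (0b010, 0b011):              # LDR/STR/LDRB/STRB
        if op == 0b011 and instr & (1 << 4):
            return None                   # media instructions, not a load
        pre_indexed = bool(instr & (1 << 24))
        writeback = bool(instr & (1 << 21))
        if not pre_indexed and writeback:
            return None                   # LDRT/LDRBT are left to abort
        if not is_load:
            return None                   # writes are never emulated
        return 1 if instr & (1 << 22) else 4
    if op == 0b100:                       # LDM/STM
        return 4 if is_load else None
    if op == 0b000 and (instr & 0x90) == 0x90:
        sh = (instr >> 5) & 0b11
        if sh == 0:
            return None                   # MUL, SWP, LDREX and friends
        if is_load:
            return 2 if sh in (0b01, 0b11) else 1   # LDRH/LDRSH, else LDRSB
        return 8 if sh == 0b10 else None            # LDRD is encoded with L=0
    return None                           # coprocessor (FPA/VFP) and the rest


def emulate_read(instr, addr, log):
    """Return the value to hand back, or None to let the abort proceed."""
    size = load_size(instr)
    if size is None:
        return None
    if not 0 <= addr < ZERO_PAGE_SIZE or addr % size:
        return None                       # outside zero page, or unaligned
    log.append((instr, addr))             # a real log entry would carry far more
    reader = KNOWN_LOCATIONS.get(addr)
    return reader() if reader else 0      # unknown locations read as zero
```

For example, `emulate_read(0xE5910000, 0x10C, log)` (an `LDR R0,[R1]` of the placeholder timer address) returns the placeholder timer value, while a store or an LDREX returns None. A real handler would also have to write each destination register for LDM and LDRD; the sketch only models the single returned value.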

In my mind that should be enough to get 90% of legacy software working without any impact. How/when to write to the log file will need a bit of thought (we don’t want to thrash people’s drives if they run something which aborts all the time) but the rest of it should be straightforward.
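One possible answer to the log-thrashing worry, offered purely as a sketch: coalesce identical traps in memory and only touch the disc once a batch has built up. The class name, the `write` callback and the batch size of 64 are all assumptions for illustration.

```python
class AbortLog:
    """Coalesce repeated traps in memory and write them out in batches."""

    def __init__(self, write, flush_after=64):
        self._write = write              # callable taking a list of text lines
        self._flush_after = flush_after  # traps to accumulate before writing
        self._counts = {}                # (task, pc, addr) -> hit count
        self._pending = 0

    def record(self, task, pc, addr):
        """Note one trapped read; flush once enough have accumulated."""
        key = (task, pc, addr)
        self._counts[key] = self._counts.get(key, 0) + 1
        self._pending += 1
        if self._pending >= self._flush_after:
            self.flush()

    def flush(self):
        """Write one line per distinct trap, with its hit count, then reset."""
        if not self._counts:
            return
        lines = [
            f"{task} pc=&{pc:08X} addr=&{addr:08X} x{n}"
            for (task, pc, addr), n in self._counts.items()
        ]
        self._write(lines)
        self._counts.clear()
        self._pending = 0
```

An app that hits the same null pointer dereference thousands of times then costs one line per batch rather than one write per abort. A timer-driven flush would be needed as well, so that a trickle of traps still reaches the file.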

Anyone have any thoughts of their own?

Jun 17, 2015 4:34pm

Rick Murray (539) 13840 posts

I like the redactions. ;-)

Just out of interest, has there been a study of actual breakages if such a thing has been implemented?

I guess what I’m trying to say is…once the page zero access logging is implemented, can you just put out a special Pi test build so the brave souls can start it and see what breaks? Why is software in 2015 reading page zero locations instead of the legal SWI?

Jun 17, 2015 5:36pm

Steve Pampling (1551) 8170 posts

How/when to write to the log file will need a bit of thought (we don’t want to thrash people’s drives if they run something which aborts all the time) but the rest of it should be straightforward.

Anyone have any thoughts of their own?

Unless you have some nice easy to implement system that could magically identify particular tasks (or allow manual selection)¹ and NOT log those tasks thereafter then go with the idea you have.

I guess what I’m trying to say is…once the page zero access logging is implemented, can you just put out a special Pi test build so the brave souls can start it and see what breaks?

Page-protect test on RPCEmu IOMD would seem the first “see what you can break option”

Why is software in 2015 reading page zero locations instead of the legal SWI?

Because it’s legacy? (where legacy can mean “written in the previous century”)

¹ There’s some method of identifying tasks that Adrian used in Aemulor (task name?)

Jun 17, 2015 9:36pm

Jeffrey Lee (213) 6048 posts

Just out of interest, has there been a study of actual breakages if such a thing has been implemented?

Judging by my experience with fixing up the OS, I expect 90% of the breakages to be down to apps performing ‘harmless’ null pointer dereferences. There certainly won’t be any user mode code writing to zero page (since it’s already read-only in RISC OS 5), but there will be a small percentage of code which either reads or writes to known zero page locations (processor vectors, kernel workspace, etc.)

I guess what I’m trying to say is…once the page zero access logging is implemented, can you just put out a special Pi test build so the brave souls can start it and see what breaks?

Page-protect test on RPCEmu IOMD would seem the first “see what you can break option”

That will actually result in more breakage than simply moving zero page – there are a few locations which need to be readable from user mode (CLib workspace pointer, CLib tmpnam() counter, OS_ChangedBox buffer, etc.)

Jun 18, 2015 8:13am

Sprow (202) 1158 posts

Simple Configure plugin to enable/disable the module and view/control logging

I’m not sure I’d polish it to that high a sheen. There’s a distinct danger that this ends up in an AIF-header-check-conundrum like Select suffered, where ultimately users boil it down to “I got an error message on my favourite app, if I turn off the header check the error goes away” and we end up in a perpetual state of having to emulate zero page.

From reading your bullet list I think the aim here is to spend a <few months> watching for problematic apps, and using the logged data to formulate patches (applied with !Patch, that’s what it’s there for) like Acorn did when StrongARM came out.

Offending reads could, for example, incur a 1s penalty so there’s no annoying message but there is an impetus to fix the problem properly.

Reads of other locations (including the processor vectors) will return zero

Address 0x00000000 and 0x00000004 merit special mention, being NULL and the origin of many oflaoflaofla errors.

Only LDM/LDR/LDR[S]B/LDR[S]H/LDRD instructions will be trapped.

I’d count LDRSB/LDRSH/LDRD as exotic and probably not worth bothering with.

Jun 18, 2015 10:15am

Chris Hall (132) 3554 posts

From reading your bullet list I think the aim here is to spend a <few months> watching for problematic apps,

You can’t really start the clock until zero page protection (with optional turn off) has made it into a stable release. Until then the angry mob won’t have started using it!

Jun 18, 2015 10:25am

Jeffrey Lee (213) 6048 posts

Simple Configure plugin to enable/disable the module and view/control logging

I’m not sure I’d polish it to that high a sheen. There’s a distinct danger that this ends up in an AIF-header-check-conundrum like Select suffered, where ultimately users boil it down to “I got an error message on my favourite app, if I turn off the header check the error goes away” and we end up in a perpetual state of having to emulate zero page.

Yeah, that’s true. I think I was forgetting that 5.22 is a recent thing and 5.23 doesn’t have many new features yet. So we can assume that ordinary users will stick to 5.22 (or the Pi equivalent), while only techie users (who are happy with hacking their !Boot to enable the compatibility module manually) will be using 5.23 with zero page relocation.

From reading your bullet list I think the aim here is to spend a <few months> watching for problematic apps, and using the logged data to formulate patches (applied with !Patch, that’s what it’s there for) like Acorn did when StrongARM came out.

Actually I was hoping it was more a case of me writing the compatibility module and then forgetting about it, leaving the app developers to fix their broken code themselves. If the onus is on us to patch everyone else’s code then we’ll never get any other work done!

One thing that I know will cause a problem is apps compiled using old versions of GCC, since there was a bug in the stubs (for both clib & unixlib?) which would cause a null pointer dereference if DDEUtils wasn’t loaded. I’m not sure how well !Patch is able to cope with that situation – I’d assume that the dodgy code sequence will be at different locations for different apps. Although for that case at least the workaround is simple (make sure DDEUtils is always loaded).

Reads of other locations (including the processor vectors) will return zero

Address 0x00000000 and 0x00000004 merit special mention, being NULL and the origin of many oflaoflaofla errors.

For ofla errors I’ve realised that we could quite easily add some code to the kernel (perhaps for odd-numbered development versions only?) which checks for null pointers on the SWI error exit and replaces it with something more useful (“SWI xxx returned null error pointer”?). That would be a big help in tracking down where the bad error pointers are coming from.
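The check being suggested could look something like this, modelled in Python rather than kernel assembler. It assumes the usual RISC OS error block layout (a 32-bit error number followed by a zero-terminated message); the function names and the placeholder error number are hypothetical.

```python
import struct

PLACEHOLDER_ERRNUM = 0  # hypothetical; a real error number would need allocating


def make_error_block(errnum, message):
    """Build an error block: little-endian error number, then a NUL-terminated string."""
    return struct.pack("<I", errnum) + message.encode("ascii") + b"\0"


def check_swi_exit(swi_name, v_set, r0):
    """Return a replacement error block if a SWI signalled an error with R0 = 0.

    Returns None when nothing needs substituting: either no error was
    signalled (V clear) or the error pointer in R0 is non-null.
    """
    if not v_set or r0 != 0:
        return None
    message = f"SWI {swi_name} returned null error pointer"
    return make_error_block(PLACEHOLDER_ERRNUM, message)
```

So `check_swi_exit("OS_File", True, 0)` yields a block naming the offending SWI, whereas a genuine error pointer or a clean exit passes through untouched.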

Jun 18, 2015 3:34pm

Chris Gransden (337) 1207 posts

Building an OMAP4 ROM with zero page protection fails at various places with the following or similar:

Error: Literal pool too distant (use LTORG to dump it within 4KB) at line 1171 in file s.ARMops
 included by GET/INCLUDE directive at line 93 in file "s.GetAll"
 1171 fc011624         LDR     ip, =ZeroPage
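A likely reading of this error, offered as an assumption rather than a diagnosis: `LDR ip, =ZeroPage` can be assembled as a single MOV or MVN while the constant is encodable as an ARM immediate (as 0 is), but once ZeroPage becomes a high address the assembler has to fetch it from a literal pool within 4KB of the instruction. The sketch below checks that encodability. The value 0xFFFF0000 is only an example of a relocated address, and targets that could use MOVW instead are ignored.

```python
MASK32 = 0xFFFFFFFF


def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Undo a rotate-right by `rot` by rotating left by the same amount.
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated < 0x100:
            return True
    return False


def needs_literal_pool(value):
    """True if neither MOV (value) nor MVN (inverted value) can encode it."""
    value &= MASK32
    return not (is_arm_immediate(value) or is_arm_immediate(~value & MASK32))
```

Here `needs_literal_pool(0)` is false, so the old build never needed a pool entry for ZeroPage, while `needs_literal_pool(0xFFFF0000)` is true, which would explain why long stretches of code without an LTORG start failing.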

Jun 18, 2015 7:56pm

Steve Pampling (1551) 8170 posts

Actually I was hoping it was more a case of me writing the compatibility module and then forgetting about it, leaving the app developers to fix their broken code themselves. If the onus is on us to patch everyone else’s code then we’ll never get any other work done!

Make a nice page in the wiki for descriptions of the individual faults to hang off.
Sub-pages should follow a format which expands on the symptoms and, when people figure it out, the fix. Many problems will be the same or similar root cause and the fix equally similar.

Short description: dump the evidence and clues in the laps of the resource and largely leave them to it.
