New OS_PlatformFeatures 0 flags
Jeffrey Lee (213) 6048 posts |
I’m thinking of adding a new flag to OS_PlatformFeatures 0 to indicate whether UDIV/SDIV are supported (I’m planning on doing a few little optimisations to the OS as a bit of a break from the daily grind, and one of the things on my list is to start making use of the UDIV/SDIV instructions). Can anyone think of any other flags that would be good to add? Here are some I can think of, if people think they would be useful:
There’s also the question of whether we should avoid adding new flags to OS_PlatformFeatures 0 for instruction availability and instead add new reason codes for reading the instruction set feature registers (although there is the danger there that ARM might make a backwards-incompatible change to the feature registers which would then trip up software) |
||||||||||||||||||||||||||||||||||||
Sprow (202) 1155 posts |
I mused a similar thing when fixing which ID registers are used to deduce some of the flags. I’d not realised how much juicy information there was in the various ID registers (on ARMv7, at least). In particular I think the spirit of bit 9 (“CPU supports Thumb mode”) should probably be read as the original 16 bit encoding Thumb-1 instructions, since it sits between bit 8 and 10 which correspond to ARM7TDMI era. We now have mixed 16 & 32 bit encoding Thumb-2 instructions but no API to distinguish which is which. On the flip side, they are backwards compatible (ie. any 16 bit Thumb-1 stuff should still work) and I’m not aware of anyone using Thumb code on RISC OS anyway – so I let sleeping dogs lie and left the bit set for either Thumb-1 OR Thumb-2. If we were to add an extra flag, the values in RISC OS 5.18/5.20/5.22 would be wrong, so as an API extension it’d be a bit of a turkey.
Beware the LDRH/STRH, since although the StrongARM supports it, the Risc PC memory bus doesn’t, so there’s a difference between it running (not aborting) and working on a given machine.
I don’t think ARM’s registers should be exposed raw. ARM seem to give 4 bits per each feature, but have only allocated one or two enumerated values. Distilling those into bit flags that are meaningful (and could be remapped in future if ARM changes their mind) to RISC OS would seem reasonable though: for example the logic to deduce whether SWP exists is a bit twisted in ARM’s world, whereas we just want a binary “yes/no” result. |
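To illustrate the idea of distilling a 4-bit ID register field into a plain yes/no flag, here is a minimal Python sketch. The field position (ID_ISAR0 bits 27:24) and its encoding are assumptions that should be checked against the ARM ARM, and the function names are hypothetical.

```python
# Sketch only: field position and encoding are assumptions, not taken from the thread.
DIVIDE_FIELD_SHIFT = 24  # assumed: ID_ISAR0 bits [27:24] describe divide support


def field(reg_value, shift):
    """Extract one 4-bit feature field from a 32-bit ID register value."""
    return (reg_value >> shift) & 0xF


def has_arm_udiv_sdiv(id_isar0):
    """Return True if UDIV/SDIV are available in the ARM instruction set.

    Assumed encoding: 0 = no divide, 1 = Thumb only, 2 = Thumb and ARM.
    Using >= means a larger future value still reads as "supported".
    """
    return field(id_isar0, DIVIDE_FIELD_SHIFT) >= 2
```

If ARM ever changed the field, only this one mapping would need adjusting; callers would keep testing the same boolean.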
||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
For the Thumb-2 flag SpriteExtend only really cares about the ARM instructions that were introduced at the same time (MOVW, MOVT, UBFX, BFC, BFI, etc.), so the flag could easily be specific to those rather than indicating any level of Thumb mode execution support in the CPU/OS. Of course it still won’t change the fact that old stable versions of RISC OS 5 will report the wrong value for some of the new flags. If we wanted to fix that, without requiring developers to implement manual feature checks, we’d probably need a new module which is softloadable on everything (unlike CallASWI) which provides all the relevant information. |
||||||||||||||||||||||||||||||||||||
Rick Murray (539) 13806 posts |
I’d be inclined to wonder if it wouldn’t be better to provide a new API OS_PlatformFeatures n (not 0) which provides an abstract view of what sort of flags are likely to be necessary on RISC OS. For example, knowing that the processor supports TrustZone Security extensions is unlikely to be necessary. Knowing how it will deal with an unaligned LDR, more so. Jazelle, probably not. Saturating maths instructions, possibly yes. Why a new API? Because this can provide up to date generic information more applicable to new processors (as opposed to “CPU is XScale” and “XScale JTAG is connected” (bits 18 and 19)); allowing programs to fall back to PlatformFeatures 0 (or probe the hardware itself) if the newer API is not present; without altering the meaning of any of the older API flags that might be used by a program. It appears as if there are 11 or 15 bits available in the older API (depends on whether 11-14 is reserved unused or reserved undocumented). Is this enough for all we’d be likely to want to say? [2] I agree with Sprow that the data shouldn’t be exposed in raw form. I’ve written a cpuinfo program that reads a lot of interesting information from the Pi’s processor, but I have noticed that interpreting this information is somewhat convoluted (and in some cases seemed flat out contradictory [1]), and how do we deal with older processors that don’t have this level of information, such as the ARM710? Better to have a more abstract API where RISC OS fills in the flags itself – as it will know what platform it is running on and can fake (if necessary) missing information such as UMULL (etc) on a StrongARM but not on an ARM610/710; and as Sprow notes, there’s a difference between an instruction being available on the processor and it working on the platform in question. If the RiscPC hardware can’t deal with LDRH, then LDRH needs to be flagged as not supported on the platform.
[1] Looking in the data, it seems that flags can be 0 to mean “Not supported” (most flags), 0 to mean “Supported” (ie Ex/Sync Primitive on ARM11 and Cortex-A), and 0 to mean both depending on processor (ie supersections on Cortex-A) and even if the flags mean the same thing on different processors (such as ARM11’s Cache Coherence DMA). Much of this stuff is unlikely to be relevant to normal applications programs; however I really think it would be best for the OS to expose this information having parsed it once correctly instead of everybody that wants to know prodding around the CPUID data for themselves…
[2] Don’t forget, we might want to indicate how many cores are available, how many are supported/used by the OS, and whether or not the current incarnation of RISC OS is ‘primary’ or not. Sure, it’s off in the future, but may be something useful for the API to return; as well as other features not directly related to the processor such as whether or not a temperature sensor is available (via a standardised API, that is), and so on…?
||||||||||||||||||||||||||||||||||||
Sprow (202) 1155 posts |
I think we’re talking at cross purposes – I used Thumb-2 as an illustration of something that’s currently lacking & dangerous to change, rather than picking on your suggestion of adding it to help out SpriteExtend. You’re quite right that if phrased as “Supports ARM v6K extensions” then it would be distinct and more useful.
That’s probably tolerable if it’s the safe way round: the bit being clear (suggesting UDIV/SDIV are absent, even where they are actually present) would cause the pre-UDIV/SDIV code to be used, which is safe. Again, my example of redefining bit 9 to be “and Thumb-2 too” is dangerous because you’re uncertain whether such code can be run or not.
See bit 10.
Perhaps crack open a new subreason and style it on OS_ReadSysInfo 8 this time, so there’s a register reporting which flags the OS knows about in addition to the flags themselves? |
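A rough sketch of the "known flags plus flag values" idea: the OS returns one word saying which flags it understands and another holding their values, so a caller can tell "not supported" apart from "this OS predates the flag". The function name and bit numbers below are hypothetical, not part of any existing SWI.

```python
def flag_state(values, known, bit):
    """Return True/False if the OS knows about the flag, or None if it does not.

    values: word holding the flag values.
    known:  word with a bit set for every flag this OS version understands.
    bit:    flag number to test (hypothetical numbering).
    """
    mask = 1 << bit
    if not known & mask:
        return None  # older OS: caller must probe the hardware or assume the worst
    return bool(values & mask)
```

A caller would treat None as "fall back to a manual check", which avoids the problem of old stable releases silently reporting a wrong value.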
||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
Indeed – trying to work out if WFE or WFI do anything useful is particularly tortuous. Which reminds me that we might need flags for those (or one for WFE, at least – programs should ideally use Portable_Idle for WFI, since there’s also an MCR-based WFI on some chips, and the Pi 1 has a CPU bug that needs working around).
Yes, I think your discussion of adjusting bit 9 bamboozled me a bit.
“old stable versions of RISC OS 5 will report the wrong value for some of the new flags”
Yes, that’s what I had in mind.
||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
Looking at a few different versions of the ARM ARM, it looks like we need to be able to differentiate between the following v3-v5 revisions:
ARMv6 and above is where things get a bit messy, due to the instruction set feature registers being introduced, and some instructions being “supported” but not being useful (NOP hints). The instruction set feature flags suggest there are the following possibilities (focusing on ARM instructions, and ignoring instructions covered by <ARMv6):
However it’s possible to distil that down further by just looking at the features which are optional for each architecture variant:
In summary this leaves us with the following flags:
Which is 9-12 new flag bits, depending on how cheeky we want to get with hijacking existing flags. Conveniently there are 12 bits spare in OS_PlatformFeatures. Out of the new flags, ARMv7VE, ARMv8 and CRC are the only truly new ones (no stable OS release for platforms which would have the flags set). |
||||||||||||||||||||||||||||||||||||
Rick Murray (539) 13806 posts |
Aren’t the flags words on newer ARM cores because these features are now more “optional”? With that in mind I wonder if it may be better to have flags for specific operations (signed long multiply, saturating maths, etc) rather than making assumptions based upon architecture version. Obviously there isn’t space for the more esoteric instructions, so they can be left for the application to probe for if it thinks it needs them… |
||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
Yes, but a lot of that comes from the different architecture profiles (A, R, M) and execution modes (ARM, Thumb). Since we don’t really care about Thumb, and can’t run on M profile, things become a lot simpler. E.g. there are three different encodings for SXTB (ARMv6 Thumb without rotation, ARMv6T2 Thumb-2 with optional rotation, ARMv6 ARM with optional rotation). Since we don’t care about Thumb we can just have two states for the instruction (supported/not supported) rather than three (full support/no rotation/not supported). |
||||||||||||||||||||||||||||||||||||
Chris Evans (457) 1614 posts |
Haven’t other OSs or ARM themselves come up with a differentiation system?
||||||||||||||||||||||||||||||||||||
Peter Howkins (211) 236 posts |
On Linux there are two methods. The first is the HWCAP information that the kernel passes to a running app through the ELF auxiliary vector: http://lxr.free-electrons.com/source/arch/arm/include/uapi/asm/hwcap.h
You can also view the contents of /proc/cpuinfo and check the Features line. An example from a Raspberry Pi:
Features : swp half thumb fastmult vfp edsp java
and a Pi 2:
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
and a Pi 3:
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
On Android, which is ARMv5 or later only, a smaller set of flags is presented to native binaries.
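For comparison, the /proc/cpuinfo route is simple to consume. The following is a minimal Python sketch (the function name is hypothetical) that pulls the feature names out of text in the format shown above.

```python
def parse_features(cpuinfo_text):
    """Return the set of names on the first 'Features' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            return set(value.split())
    return set()  # no Features line found


# Sample line copied from the Pi 2 output quoted in the post.
PI2_SAMPLE = ("Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 "
              "idiva idivt vfpd32 lpae evtstrm")
```

On a Linux machine the input would come from reading /proc/cpuinfo; here "idiva" in parse_features(PI2_SAMPLE) is the equivalent of asking whether ARM-mode UDIV/SDIV are present.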
||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
If we’re OK with grouping things by architecture version, I’m starting to think that it would be better to have a new call for reading the architecture version number, rather than allocating one bit per architecture in OS_PlatformFeatures 0. That way we’d only need 6 or 7 new flag bits to indicate which extensions are supported (depending on whether we want to repurpose bit 13 for the ‘K’ extension), leaving a reasonable number of flags spare for future expansion. For the ROM alignment setting, I think we’ll need at least two bits to encode the possible states (unknown, ROM requires rotated load behaviour, ROM requires unaligned load behaviour, ROM doesn’t use unaligned accesses). But maybe we’d want two extra bits to indicate the alignment modes supported by the CPU. |
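The four ROM alignment states fit in a two-bit field. A minimal Python sketch of one possible encoding follows; the numeric values, the field position and all names are assumptions made for illustration, not anything the kernel defines.

```python
from enum import IntEnum


class RomAlignment(IntEnum):
    """Hypothetical 2-bit encoding of the ROM alignment requirement."""
    UNKNOWN = 0            # OS cannot say
    ROTATED_LOADS = 1      # ROM requires rotated load behaviour
    UNALIGNED_LOADS = 2    # ROM requires unaligned load behaviour
    NO_UNALIGNED = 3       # ROM doesn't use unaligned accesses


ALIGN_SHIFT = 0  # assumed position of the field within the flags word
ALIGN_MASK = 0b11 << ALIGN_SHIFT


def pack_alignment(flags, mode):
    """Store the alignment mode in the flags word, leaving other bits untouched."""
    return (flags & ~ALIGN_MASK) | (int(mode) << ALIGN_SHIFT)


def unpack_alignment(flags):
    """Read the alignment mode back out of the flags word."""
    return RomAlignment((flags & ALIGN_MASK) >> ALIGN_SHIFT)
```

Making 0 mean "unknown" keeps the result sensible on OS versions that never set the field.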
||||||||||||||||||||||||||||||||||||
Sprow (202) 1155 posts |
Does that not then fall foul of ROOL’s dislike of monotonically incrementing feature tests?
We’ve no reason to try to cram them all into PlatformFeatures 0 do we? There are plenty of spare subreason codes, I’d rather have more bit testable features I think, no need to recycle flag bits. ARM are bound to change their minds in future and withdraw something like the various page size ideas they’ve had and fast context switch dalliance. Grouping things loosely based on the instruction set feature registers seems the right kind of granularity.
I’d always read that as the MMX extensions, though 13 years down the line it probably does implicitly mean BLX CLZ BKPT too. |
||||||||||||||||||||||||||||||||||||
Rick Murray (539) 13806 posts |
Case in point: RISC OS 4 < RISC OS 5 < RISC OS 6…uh…hang on…
No, PlatformFeatures 0 ought to provide “sensible” results (or at least no worse than right now) as a fallback for if the to-be-defined reason code is not available (as it won’t be on all current versions of RISC OS given it doesn’t yet exist). While I’m not against architecture versions as a quick guide to what the platform is running, I am very wary of the fact that the current processors include all those flags. The only reason I can think of for having so many flags for obvious parts of the core instruction set is that it is perfectly valid for those instructions not to exist, depending on the chip baker. For instance, maybe a Cortex design may exist that doesn’t support double word loads and stores because it makes for a cheaper design?
||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
In which case I go back to my earlier suggestion of just exposing the (instruction set) feature registers as-is. Looking at your example of SWP, the reason two fields are used is because one pertains to SWP/SWPB being fully supported, and one pertains to SWP/SWPB only working correctly in a uniprocessor environment. If we were to OR those two fields together and just say “SWP is supported” then anyone trying to use SWP to access a location which is being updated by another bus master (DSP coprocessor?) could be in for a bad time. The fact that ARM added a second field for this rather than just adding a new value to the first field shows that they are trying to make sure everything is done in a forwards-compatible manner – so a value of 3 for one field will always mean that the system supports all the features that were introduced with values 2 and 1. Of course the counter-argument to that is that on some systems you can disable SWP in the SCTLR (which will not affect the value of the feature registers), but for the small number of configurable behaviours it should be easy enough to have our SWI wrapper update the fields to match the current processor configuration. |
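The SWP case can be sketched as two separate flags rather than one OR'd result. This is a minimal Python illustration; which register fields feed the two arguments is left to the caller, and the rule that full support implies uniprocessor support is an assumption of the sketch.

```python
def swp_flags(swap_field, swp_frac_field):
    """Map the two SWP-related 4-bit fields to two independent flags.

    swap_field:     field describing full SWP/SWPB support (nonzero = present).
    swp_frac_field: field describing uniprocessor-only SWP/SWPB support.
    Comparisons use >= so that larger future values still count as supported.
    """
    full = swap_field >= 1
    # Assumption: fully supported SWP also works in the uniprocessor case.
    uniproc = full or swp_frac_field >= 1
    return {"SWP_SWPB": full, "SWP_SWPB_uniproc": uniproc}
```

Code sharing memory with another bus master would test SWP_SWPB, while ordinary single-core code could accept SWP_SWPB_uniproc.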
||||||||||||||||||||||||||||||||||||
Rick Murray (539) 13806 posts |
In which case we’re back to the issue of different processors having different flags, potentially different interpretations of the same flags, and……. |
||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
Provide an example of where the feature registers are contradictory and I’ll happily withdraw the suggestion. |
||||||||||||||||||||||||||||||||||||
Sprow (202) 1155 posts |
Grouping things loosely based on the instruction set feature registers seems the right kind of granularity. The sweet spot I think it’d be good to aim for is
Just my 2p |
||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
The API will provide fake register values for old architecture versions, and will avoid the need for the user to drop into SVC mode to peek at the registers.
That’s a fair point. Despite being almost 10 years old, the ARMv7 ARM is still kept under lock and key on ARM’s website, so something which doesn’t require programmers to consult it directly would be useful.
Yes. Even if we go with the “just expose the feature registers” route there’ll still need to be some extra flags elsewhere for some things (e.g. LDRSB, LDRH, LDRSH, STRH are all considered to be part of the “base instruction set” that the feature registers assume is always present, which isn’t very useful when we want an API that goes back at least as far as ARMv3). There’s also a bunch of other information we could potentially report, e.g. whether MUL allows Rd == Rn (which is determined by the architecture version, rather than a feature register).
And when they decide to remove SMULL but keep UMULL? ;-) If we’re worried about features vanishing in future then combining flags is the wrong way to go. |
||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
It looks like there are about 50 useful flags we can pull from the instruction set feature registers, using a table of data to describe which feature register bits map to each flag. Extra flags (e.g. LDRH availability) can then be added in by special-case code after the table routine has run. Proposed flags for the feature registers are below. I’ve stripped out most of the obvious Thumb-related things.
CPUFeature_AESE_AESD_AESMC_AESIMC
CPUFeature_BFC_BFI_SBFX_UBFX
CPUFeature_BKPT
CPUFeature_BLX
CPUFeature_BX
CPUFeature_CLREX_LDREXB_LDREXH_STREXB_STREXH
CPUFeature_CLZ
CPUFeature_CRC32B_CRC32H_CRC32W_CRC32CB_CRC32CH_CRC32CW
CPUFeature_DMB_DSB_ISB
CPUFeature_Generic_CDP2_LDC2_MCR2_MRC2_STC2
CPUFeature_Generic_CDP_LDC_MCR_MRC_STC
CPUFeature_Generic_MCRR2_MRRC2
CPUFeature_Generic_MCRR_MRRC
CPUFeature_Interworking_MOV_pc
CPUFeature_LDAB_LDAH_LDA_LDAEXB_LDAEXH_LDAEX_LDAEXD_STLB_STLH_STL_STLEXB_STLEXH_STLEX_STLEXD
CPUFeature_LDM_STM_continuable
CPUFeature_LDM_STM_noninterruptible
CPUFeature_LDM_STM_restartable
CPUFeature_LDRD_STRD
CPUFeature_LDREXD_STREXD
CPUFeature_LDREX_STREX
CPUFeature_LDRHT_LDRSBT_LDRSHT_STRHT
CPUFeature_MLS
CPUFeature_MOVW_MOVT
CPUFeature_MRS_MSR
CPUFeature_NOP_hints
CPUFeature_PKHBT_PKHTB_QADD16_QADD8_QASX_QSUB16_QSUB8_QSAX_SADD16_SADD8_SASX_SEL_SHADD16_SHADD8_SHASX_SHSUB16_SHSUB8_SHSAX_SSAT16_SSUB16_SSUB8_SSAX_SXTAB16_SXTB16_UADD16_UADD8_UASX_UHADD16_UADD8_UHASX_UHSUB16_UHSUB8_UHSAX_UQADD16_UQADD8_UQASX_UQSUB16_UQSUB8_UQSAX_USAD8_USADA8_USAT16_USUB16_USUB8_USAX_UXTAB16_UXTB16
CPUFeature_PLD
CPUFeature_PLDW
CPUFeature_PLI
CPUFeature_PSR_GE_bits
CPUFeature_QADD_QDADD_QDSUB_QSUB
CPUFeature_Q_bit
CPUFeature_RBIT
CPUFeature_REV_REV16_REVSH
CPUFeature_SEVL
CPUFeature_SHA1C_SHA1P_SHA1M_SHA1H_SHA1SU0_SHA1SU1
CPUFeature_SHA256H_SHA256H2_SHA256SU0_SHA256SU1
CPUFeature_SMC
CPUFeature_SMLABB_SMLABT_SMLALBB_SMLALBT_SMLALTB_SMLALTT_SMLATB_SMLATT_SMLAWB_SMLAWT_SMULBB_SMULBT_SMULTB_SMULTT_SMULWB_SMULWT
CPUFeature_SMLAD_SMLADX_SMLALD_SMLALDX_SMLSD_SMLSDX_SMLSLD_SMLSLDX_SMMLA_SMMLAR_SMMLS_SMMLSR_SMMUL_SMMULR_SMUAD_SMUADX_SMUSD_SMUSDX
CPUFeature_SMULL_SMLAL
CPUFeature_SRS_RFE_CPS
CPUFeature_SSAT_USAT
CPUFeature_SWP_SWPB
CPUFeature_SWP_SWPB_uniproc
CPUFeature_SXTAB_SXTAH_UXTAB_UXTAH
CPUFeature_SXTB16_SXTAB16_UXTB16_UXTAB16
CPUFeature_SXTB_SXTH_UXTB_UXTH
CPUFeature_UDIV_SDIV
CPUFeature_UMAAL
CPUFeature_UMULL_UMLAL
It remains to be seen whether objasm will be happy with a 300 character identifier.
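The table-driven approach could look something like the following Python sketch. The three table rows use flag names from the list, but their register numbers, bit positions and threshold values are assumptions to verify against the ARM ARM, and the LDRH flag name is hypothetical.

```python
# Each row: (flag name, feature register index, field shift, minimum field value).
# Rows are illustrative assumptions, not the real kernel table.
FEATURE_TABLE = [
    ("CPUFeature_SWP_SWPB", 0, 0, 1),
    ("CPUFeature_CLZ", 0, 4, 1),
    ("CPUFeature_UDIV_SDIV", 0, 24, 2),
]


def decode_features(isar_regs, table=FEATURE_TABLE):
    """Run the table over a list of feature register values, returning set flags."""
    flags = set()
    for name, reg, shift, minimum in table:
        if ((isar_regs[reg] >> shift) & 0xF) >= minimum:
            flags.add(name)
    return flags


def add_special_cases(flags, platform_has_working_ldrh):
    """Add flags the feature registers cannot express (hypothetical flag name)."""
    if platform_has_working_ldrh:
        flags.add("CPUFeature_LDRH_LDRSH_STRH")
    return flags
```

For older architectures the OS would pass in faked register values, so the same table serves every CPU.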
||||||||||||||||||||||||||||||||||||
Peter Howkins (211) 236 posts |
A tad controversial perhaps, but this shouldn’t be a feature of the core OS … Backwards compatibility, it’s a thing. |
||||||||||||||||||||||||||||||||||||
Rick Murray (539) 13806 posts |
There should come a point where one draws the line, otherwise pretty much everything could come under the “backwards compatibility” banner. I don’t see a real pressing need for this information on a 26 bit machine given that the hardware capabilities are well known and unchanging. I think OS_PlatformFeatures 0 will allow one to determine whether or not the processor is a StrongARM, otherwise it can mostly be determined from the machine type and whether or not there is a command to turn the cache on and off…
||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
The code needed to extract the flags from the feature registers is pretty minimal, and OS-version agnostic. A bit of extra logic is needed to deal with ARMv2/2a, and to read the architecture straight from the MIDR, and we’ll have something that will happily run on all RISC OS machines. The best solution for backwards compatibility is probably going to be to add the code into CallASWI. That will take care of all the 26bit machines. For 32bit machines we currently don’t provide a 32bit version of CallASWI, but considering that the module is slowly gaining more and more features that were introduced post-5.00 it would seem that the most sensible choice would be to start producing a 32bit version of CallASWI should the demand become high enough. |
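Reading the architecture from the MIDR might be sketched as below. This assumes the post-ARM7 register layout, where bits 19:16 hold an architecture code; the code-to-name mapping is as I understand the ARM ARM and should be verified, and ARMv2/2a/v3 parts would need the extra special-case logic mentioned above, which this sketch does not attempt.

```python
# Assumed mapping of MIDR bits [19:16] to architecture names (post-ARM7 layout only).
ARCH_NAMES = {
    0x1: "ARMv4",
    0x2: "ARMv4T",
    0x3: "ARMv5",
    0x4: "ARMv5T",
    0x5: "ARMv5TE",
    0x6: "ARMv5TEJ",
    0x7: "ARMv6",
}


def midr_architecture(midr):
    """Return an architecture name, or a marker when the feature registers must be used."""
    code = (midr >> 16) & 0xF
    if code == 0xF:
        return "see feature registers"  # architecture described by the ID registers
    return ARCH_NAMES.get(code, "unknown")
```

The 0xF case is where the table-driven feature register decoding would take over.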
||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
I’ve now checked in the kernel-side changes for this. There’s a big list of flags exposed by OS_PlatformFeatures 34, and OS_ReadSysInfo 8 is now able to report the memory access alignment mode that the ROM was built for. And here’s a BASIC program people can use to easily see the results of OS_PlatformFeatures 34: http://www.phlamethrower.co.uk/misc2/cpufeats.zip I’ve tested it on Raspberry Pi 1-3, along with ARM2/250/3 and StrongARM (via the as-yet unreleased CallASWI), so if anyone spots any oddities elsewhere (or on the ones I’ve already tested) then let me know! |
||||||||||||||||||||||||||||||||||||
William Harden (2174) 244 posts |
Just a thought…. If you were to write a piece of software which traps undefined instructions and emulates the missing instruction, is it possible for other modules to update the flags word to reflect it? Obviously not going to happen now – but if you imagine the FPE (and I think someone emulated half-word aligns some time ago but can’t remember when). |