Boolean arrays in BBC BASIC
Steve Drain (222) 1620 posts |
That just makes explicit how …
Which is just the same. The comparison yields a boolean result, ‘true’ or ‘false’, …
If that is so – I have not checked – and you are testing it in a shortish …
It is not ‘coercing’ and ‘fudging’. It is called ‘casting’ and BASIC does it all without fuss. Of course, there is a time penalty, but not huge. All floating point systems can hold any 32-bit integer exactly, which is handy when you want to deal with unsigned integers in BASIC. I feel a great weight hanging over my head. ;-) Edit: Not all floating point systems. Not single precision as with NEON.
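A quick sketch of the ‘unsigned integers in BASIC’ trick, for anyone who has not met it (my illustration, not code from the post): BASIC’s float type has enough mantissa to hold every 32-bit value exactly, so a negative integer can be cast and adjusted.
@% = &00000A0A            : REM widen PRINT to 10 significant figures
i% = -1                   : REM bit pattern &FFFFFFFF, negative as a signed integer
PRINT i%                  : REM -1
u = i%                    : REM cast to floating point...
IF u < 0 THEN u += 2^32   : REM ...and adjust to read it as unsigned
PRINT u                   : REM 4294967295, held exactly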
Would you actually find … |
Frank de Bruijn (160) 228 posts |
And you would be right, because it is. |
Steve Drain (222) 1620 posts |
I have edited my last post to remove a glitch. I should make clear that … |
Clive Semmens (2335) 3276 posts |
I think, but haven’t checked, that shifts and … |
Steve Drain (222) 1620 posts |
AFAICT the BASIC … There is a lovely phrase I have picked up, ‘premature optimisation’, that warns against making these changes at the wrong stage of programming. Here are the optimised routines:
DEFPROCsetBit(a%,b%)PROCresBit:?a%=?a%OR1<<b%:ENDPROC
DEFPROCresetBit(a%,b%)PROCresBit:?a%=?a%ANDNOT1<<b%:ENDPROC
DEFFNgetBit(a%,b%)PROCresBit:=?a%AND1<<b%
DEFPROCresBit:a%+=b%>>3:b%=b%AND7:ENDPROC
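If anyone wants to try those out, a minimal harness might look like the following (my sketch; it assumes the four DEF lines above are placed after the END of the program, as usual):
DIM flags% 124                : REM 125 bytes, room for 1000 bits
FOR i% = 0 TO 124 : flags%?i% = 0 : NEXT
PROCsetBit(flags%, 42)
PRINT FNgetBit(flags%, 42)    : REM non-zero: bit 42 is set
PROCresetBit(flags%, 42)
PRINT FNgetBit(flags%, 42)    : REM 0: bit 42 is clear
END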
Steve Drain (222) 1620 posts |
Rambling on … The BASIC integer type is used in three domains: signed numbers, bit fields and boolean values. If the use of a variable is confined exclusively to one of those there is never a problem. However, there are gains to be made employing the functions of one domain in another. It is in this cross-over that premature optimisation is a danger and care must be taken. ;-) |
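To make that concrete (my own example, not Steve’s): because TRUE is -1 and FALSE is 0, a comparison result can be carried straight into the other two domains.
x% = 7
over% = (x% > 5)              : REM boolean domain: TRUE is -1, FALSE is 0
PRINT over%                   : REM -1
count% = 0
count% -= (x% > 5)            : REM number domain: subtracting TRUE adds one
PRINT count%                  : REM 1
mask% = %1010 AND (x% > 5)    : REM bit-field domain: AND with -1 keeps every bit
PRINT ~mask%                  : REM A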
Clive Semmens (2335) 3276 posts |
Your optimized routines look even more like what I’d done already, except that as I wrote, I’d already optimized them further by inlining them. Sadly, still too slow for the biggest arrays I’d like to try. Not to worry. I’ve not run out of memory with byte-sized booleans yet (which is an order of magnitude faster…), and when/if I do, I’m obviously going to have to go to assembler – which might still be too slow. Maybe book myself in for a spell in cryogenic hibernation. Or give up on this ridiculous project. Or bite the AArch64 assembler bullet and get an M1-powered Mac 8~) As for the nice ramble – spot on. |
Rick Murray (539) 13840 posts |
I don’t know if it’s true. I just can’t help but think that the creator of the language used checks against zero instead of FALSE for better reasons than “it’s quicker to type”.
I’m thinking of things from the point of view of what the machine is doing inside, which is probably promoting the integer to a float and then comparing them.
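A minimal sketch of the distinction (mine, for illustration): FALSE is 0 and TRUE is -1, so testing for non-zero and testing for equality with TRUE are not the same thing.
x% = 1                        : REM non-zero, but not equal to TRUE (-1)
IF x% THEN PRINT "IF treats any non-zero value as true"
IF x% <> FALSE THEN PRINT "comparing against FALSE (0) agrees with IF"
IF x% = TRUE THEN PRINT "never printed: 1 is not -1"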
I was going to say, reading it, that shifting down and shifting back up by the same amount looks as though an AND might do the job… Glad I read ahead. Let’s put it differently: I suck at maths, but I’m quite good at binary, so yes, shifts are easier for me. You can throw in OR, AND, NOT, and EOR too; I can work with those. I’ve not had to work with bit arrays larger than a word. And since I’ve given my code in C, there’s always the other option: epic cheat and use a bitfield. ;-) |
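On the shift point, a small sketch of the equivalence Rick alludes to (my code, not his): shifting down and back up by n clears the low n bits, and a single AND with a mask does the same.
v% = &12345678
n% = 8
a% = (v% >>> n%) << n%          : REM shift down then back up: low 8 bits cleared
b% = v% AND NOT ((1 << n%) - 1) : REM one AND with a mask gives the same result
PRINT ~a%, ~b%                  : REM 12345600  12345600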
GavinWraith (26) 1563 posts |
I have a small query, which is easier to ask here than look up in an ARM^n document. It concerns parallel execution. If two adjacent instructions are independent of one another, the order in which they are written clearly does not matter. Do modern ARMs execute them in parallel? I would expect this to be the case. Indeed, the whole question of parallelizability is obviously a very big deal, but I have not seen any documents with explicit details. Can anybody quote chapter and verse? |
Rick Murray (539) 13840 posts |
I think it depends upon the processor. This talks about the A8 – https://community.arm.com/developer/ip-products/processors/f/cortex-a-forum/3919/cortex-a8-instruction-fetch-for-dual-issue/10314#10314 Later cores use out of order execution, to further muddle things up. ;-) |
Rick Murray (539) 13840 posts |
Class lecture slides. More to do with Intel, but the concepts are broadly the same. |
Sprow (202) 1158 posts |
It turned out that even ARM’s own documents (for the Cortex-A8 at least) didn’t contain a full cycle description of how the various instructions get pipelined, especially when one pipeline gets stalled because it’s waiting for a result. In other words you can’t just write … Recent-ish versions of the DDE include an application that does the modelling for you (A8time), accepting arbitrary code in ObjAsm format as input and outputting an annotated diagram of what happens when. Chapter 7 of the Desktop Tools manual explains what it all means. As the name suggests, A8time models a Cortex-A8. I suspect fiddling round with instruction and register orders is a law of diminishing returns, and that some reordering is better than none, but that it is not worth worrying too much beyond that. |
GavinWraith (26) 1563 posts |
Thanks for that. I had not really noticed A8time in the DDE. Very interesting. It certainly brings home the gulf between the early ARM CPUs and present-day ones. |
David J. Ruck (33) 1635 posts |
Given the wide range of ARM cores that RISC OS runs on, I would not worry about superscalar instruction ordering. Use the generic compiler options, and if you are still writing assembler by hand, aim for maintainability. Only if you are targeting one specific hardware platform should you consider using the compiler tuning options for its CPU. Generally you are only going to have to worry about instruction-level optimisation on the oldest, slowest processors, which have little or no superscalar capability (the awful XScale had so many additional delays compared to every other ARM that generic code, or even StrongARM-optimised code, performed poorly). You don’t have to worry about it at all on the fastest cores now, as out-of-order execution allows the processor to re-order instructions in an optimal way for its internal architecture at run time, which gives better results than doing it at compile time, even when targeting a specific core. Unless your problem is unrealistically CPU-bound pure computation, you are always going to get more gains from feeding the data to the processor in a way which avoids cache misses than from worrying about instruction ordering. |
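For anyone who wants to see the cache effect from BASIC, something along these lines will show it, although the interpreter overhead softens the difference compared with compiled code (the 4 MB block size and 64-byte line length are my assumptions; adjust for the machine, and make sure the WimpSlot is big enough):
size% = 1 << 22                  : REM 4 MB of bytes, big enough to defeat the data caches on typical boards
DIM data% size% - 1
sum% = 0
t% = TIME
FOR i% = 0 TO size% - 1          : REM sequential: consecutive bytes share cache lines
  sum% += data%?i%
NEXT
PRINT "sequential pass: "; TIME - t%; " cs"
sum% = 0
t% = TIME
FOR j% = 0 TO 63                 : REM strided: nearly every access fetches a new line
  FOR i% = j% TO size% - 1 STEP 64
    sum% += data%?i%
  NEXT
NEXT
PRINT "strided pass:    "; TIME - t%; " cs"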
Steve Drain (222) 1620 posts |
This is for Clive really, but I have been playing around with some code to implement actual bit arrays, not really BASIC, but very nearly. I am not absolutely sure I have got the offset calculation right, but my head hurts, so I thought I would offer it for view at: http://www.kappa.me.uk/Miscellaneous/swBitArray002.zip
Calling it looks like:
CALLbitDim:a`(),1,2,3
PRINT DIM(a`(),1)
CALLbitSet:a`(),0,1,0
PRINT USRbitGet:(a`(),0,1,0)
CALLbitReset:a`(),0,1,0
PRINT USRbitGet:(a`(),0,1,0)
As always, comments are welcome. ;-) |
Clive Semmens (2335) 3276 posts |
I do believe you’re putting more effort into this than I am, Steve! For the moment I’m staying entirely in BASIC and living with the limited array size – I’ve got this horrid suspicion that it’s all too slow however I do it, once I try array sizes that won’t fit in my Pi at a byte per point.
I don’t have much hope – no, I don’t have any hope – of avoiding cache misses in these operations, more’s the pity. |
David J. Ruck (33) 1635 posts |
But by reducing the data size from integers to bytes, you are reducing the cache misses by a factor of 4. If you use bits, you reduce the cache misses by a factor of 32. When dealing with large amounts of data, the more efficient use of the cache far outweighs the more complex code needed to extract the correct bits. |
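To put rough numbers on that (my arithmetic, taking a million flags as the example):
N% = 1000000                                      : REM one million flags
PRINT "as an integer array: "; 4 * N%; " bytes"   : REM DIM flags%(N%-1)
PRINT "as a byte block:     "; N%; " bytes"       : REM DIM bytes% N%-1
PRINT "as packed bits:      "; (N% + 7) DIV 8; " bytes"   : REM 8 flags per byte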
Clive Semmens (2335) 3276 posts |
Except that it’s not so much reducing the cache misses as increasing the cache hits – from almost exactly zero to 4 or 32 times almost exactly zero. I’m doing random accesses into arrays as big as available memory allows. Cache hits are extremely improbable. |
Steve Drain (222) 1620 posts |
Quite possibly, but I am enjoying it. Your challenge tied in with something I was fiddling with and gave me a point of focus. The code has advanced and has nearly all the checks and errors. It looks like this:
PROCbitAssemble
CALLbit:DIM a`(1,2,3)
PRINT DIM(a`(),1)
CALLbit:a`(0,1,0)=1
PRINT USRbit:a`(0,1,0)
CALLbit:a`(0,1,0)=0
PRINT USRbit:a`(0,1,0)
CALLbit:a`(0,1,0)=2:REM toggle
PRINT USRbit:a`(0,1,0)
It is at: http://www.kappa.me.uk/Miscellaneous/swBitArray003.zip
Byte arrays would only require small changes. ;-) |
Steve Drain (222) 1620 posts |
Having fixed a couple of bugs, I have added byte arrays for anyone still interested. ;-) |