Writing simple stuff in C
nemo (145) 2556 posts |
Behold, part of the RO6 kernel:

    u128D8  MOV   R8,#1
            LDR   R0,[R11,#-&034]
            STR   R8,[R0,#0]
            MVN   R0,#0
            LDR   R1,[R11,#-&034]
            STR   R0,[R1,#4]!
            LDR   R1,[R11,#-&034]
            STR   R0,[R1,#8]!
            LDR   R1,[R11,#-&034]
            STR   R0,[R1,#&00C]!
            LDR   R1,[R11,#-&034]
            STR   R0,[R1,#&010]!
            LDR   R1,[R11,#-&034]
            STR   R0,[R1,#&014]!
            B     u1291C

Code emitted by compilers is generally regarded as ‘good enough’. So now you know what that looks like. It’s not just self-documenting, it appears! to! be! self! critiquing! too. |
Patrick M (2888) 126 posts |
Would you explain what the code is doing, and what’s remarkable about it, for someone who knows BASIC but not ARM assembly language? I’m interested because I’m considering trying to learn ARM assembly language. |
GavinWraith (26) 1563 posts |
All but the first of the LDR instructions are redundant. The code could be rewritten as something like this:
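    u128D8  MOV   R8,#1
            LDR   R1,[R11,#-&034]
            STR   R8,[R1,#0]
            MVN   R0,#0
            STR   R0,[R1,#4]
            STR   R0,[R1,#8]
            STR   R0,[R1,#&00C]
            STR   R0,[R1,#&010]
            STR   R0,[R1,#&014]
            B     u1291C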
This writes 1 to the address in R1 and -1 to the addresses at R1+4, R1+8, R1+12, R1+16, R1+20. The compiler is so besotted with writeback (!) that it thinks it needs to reload R1 each time to undo it. Maybe the compiler was originally written for another processor? Is that the point you were making, nemo? |
Rick Murray (539) 13851 posts |
As Gavin said (beat me to it), the cycle of instructions is basically this pair, repeated with a different offset each time:
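        LDR   R1,[R11,#-&034]    ; reload the base address (again)
        STR   R0,[R1,#offset]!   ; store -1 with writeback; offset = 4, 8, &C, &10, &14 in turn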
Since the cycle is identical each time save for the offset (well, R0 does change after the first write, but that’s not the big issue with that code), it’s perfectly logical to write the code as Gavin did: reading the base address once (from R11-&34) and then simply writing each piece of data, without any writeback to mess with the value in R1. It’s incompetence. Incompetence in the compiler (surely Norcroft didn’t make this ugly mess?) and incompetence in the OS developers for thinking that “writing it in C” automatically makes it somehow “better” if this is the sort of code that is emitted. |
GavinWraith (26) 1563 posts |
Sorry, a small bug crept in there. The last instruction should have had writeback, i.e.
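            STR   R0,[R1,#&014]!

so that R1 is left pointing at base+&14, just as the original code leaves it (in case whatever follows the branch relies on that value).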
Alternatively the code could be
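For instance, with post-indexed stores the pointer walks forward by itself and ends up in the right place without a final writeback:

    u128D8  MOV   R8,#1
            LDR   R1,[R11,#-&034]
            MVN   R0,#0
            STR   R8,[R1],#4       ; 1 at base,      R1 -> base+4
            STR   R0,[R1],#4       ; -1 at base+4,   R1 -> base+8
            STR   R0,[R1],#4       ; -1 at base+8,   R1 -> base+&C
            STR   R0,[R1],#4       ; -1 at base+&C,  R1 -> base+&10
            STR   R0,[R1],#4       ; -1 at base+&10, R1 -> base+&14
            STR   R0,[R1]          ; -1 at base+&14, R1 left at base+&14
            B     u1291C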
I do not know what compiler produced the original code. Writing a compiler is a difficult task because the programmers writing it have to express in formal language mental processes that may well be practically subconscious. Do people use AI, neural nets and suchlike, to write compilers? The compiler writer has to figure out: now why did it produce that rubbish code when clearly it ought to have produced this instead? She needs to be a compiler psychologist. |
Willard Goosey (5119) 257 posts |
Honestly that looks like fairly typical compiler output to me. I agree it needs another pass through the optimizer. I’d guess a loop got “unrolled” and one of the initializers was in the body. :-( |
Jon Abbott (1421) 2651 posts |
It’s probably compiled exactly what was written in C, which wasn’t optimal in the first place. Although compilers do sometimes output head-scratching code, it’s hard to blame them without seeing the source to compare. |
Steve Fryatt (216) 2105 posts |
Are you sure? The former might be true (depending on what was fed into the compiler), but the latter isn’t. “Writing it in C” will almost always be the correct option when the alternative is assembler, as it’s more maintainable in the long term. I seem to remember folk claiming that Impression was “better” than Ovation Pro, because it was written in “hand-crafted ARM assembler” – a few years, and processor architectures, down the line, is that still correct? Just because one compiler has produced some dubious code from an unknown chunk of C, let’s not throw the baby out with the bathwater… |
Rick Murray (539) 13851 posts |
If it’s supposed to be APCS then there’s no writeback needed, otherwise one would need to see if R1 is used further on. There’s evidence in the code to suggest that the compiler is just doing writeback “because”.
An entire project such as a DTP package (cough or an operating system cough) ought to be in C for exactly the reasons you say. But I’m assuming here we’re replacing something written in assembler with something worse written in C. Is that progress?
True – perhaps everything was declared as volatile? Otherwise it’s hard to understand why a compiler would miss such an obvious optimisation and, let’s face it, in this code the writeback is utterly useless. |
Steffen Huber (91) 1953 posts |
It might be. How do you know? There is not enough information to come to any well-founded conclusion. Anyway, if you improve the compiler or just use a better compiler, everything/something/hopefully anything written in the high level language is made better with zero additional effort. This will never happen with Assembler code. RISC OS is riddled with Assembler code where Nemo would surely say “nobody half-competent would ever write something THAT inefficient!” But it happened. |
GavinWraith (26) 1563 posts |
I think that some programmers can be lulled into a sloppy false sense of security by a compiler. Any decent C programmer should be examining the assembler output of her C code to see if it contains manifest stupidities, and then be going back to rewrite the relevant sections – in this particular case, as Willard suggested, lifting a fetch out of a loop. If the compiler output is looked upon as a suggestion – think of a dog wagging its tail with a lead in its mouth – rather than ultimate truth, then master can decide whether to go walkies or not. I am sure that with today’s CPUs the compiler will know far better than the human how to optimize some things. But one hopes that some compilers can be made to be more generous with commenting assembler output, to explain to the humble human just what they are doing. |
Rick Murray (539) 13851 posts |
I think it’s fair that some (many?) people write C code in order to not have to deal with assembler…?
I don’t. However, since nemo said just before the listing that it was a bit of the RO6 kernel, I think it’s safe to assume either that the code was previously in assembler, or that it’s new and didn’t previously exist – and it was then either (re)written in C or written by somebody who really, really needed to work out what writeback is for. ;-) |
Glen Walker (2585) 469 posts |
Guilty. |
Jeffrey Lee (213) 6048 posts |
This. Gavin (+ others) has made the assumption that the value at [R11,#-&034] is constant. If the compiler has to allow for each store changing that value – and it may have no way of ruling that out – then it must reload it before every store. Explicitly loading [R11,#-&034] into a local variable, or using attributes such as __restrict, should be enough for the compiler to produce more optimal output.
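In C terms, something like this (a sketch – the struct and names here are invented for illustration):

    typedef struct { int a, b, c, d, e, f; } block;

    extern block *ptr;      /* global pointer: the compiler may have to
                               assume that each store could change it   */

    void init(void)
    {
        block *p = ptr;     /* load the pointer into a local, once     */
        p->a = 1;
        p->b = -1;
        p->c = -1;
        p->d = -1;
        p->e = -1;
        p->f = -1;
    }
|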
GavinWraith (26) 1563 posts |
Guilty.
This is why it is vital to analyse scope and use local variables in one’s program wherever possible. BASIC unfortunately gets people into the bad habit of using global variables, because its notions of scope are rather limited, which might be the reason for the case in point. Choice of programming language affects not only the current problem, but the programmer’s habits of thought, and so future problems as well.
I think GCC, and many other compilers, use an intermediate code representation before generating the assembler instructions; just as many interpreters these days compile the program source to a bytecode representation as a first step (and BASIC does some tokenization). The intermediate code can, in principle, be a lot easier to understand than assembler; at least for checking that the algorithm is correct. GCC optimizes at both the intermediate level and the assembler level. I believe that it can be made to do clever things like interweaving the text of the program source with the output of intermediate code for checking purposes. This is what the !VMView utility does for RiscLua; though for large programs its output needs to be a bit cleverer – HTML might be an improvement. I think the obstacle for most people is probably not so much the obscurity of the output as ignorance of the abstract ideas employed by the compiler. I cannot resist quoting (from Roberto Ierusalimschy) the two maxims of program optimization:
Rule #1: Don’t do it.
Rule #2: Don’t do it yet. (for experts only) |
nemo (145) 2556 posts |
I sometimes wonder if everyone who reads it gets Roberto’s joke. (His maxim is both a description of optimisation and advice about optimisation.) |
Andy S (2979) 504 posts |
Explicitly loading [R11,#-&034] into a local variable, or using attributes such as __restrict, should be enough for the compiler to produce more optimal output.

I see the opposite of this all the time in modern OOP code. Coders seem to prefer writing calls to accessor functions (or “properties”) over and over again rather than introducing a local variable. I’m not sure if it’s an aesthetic choice, laziness, or something else. If the compiler can tell the result is constant, it should get optimised out, but I see it as a bad habit to get into because there are probably situations where it isn’t.

    for (int i = 0; i < Environment.GetPerson().GetTotalFootsteps(); i++) {
        distance += Environment.GetPerson().GetStrideLength();
    }

instead of:

    Person &person = Environment.GetPerson();
    ...
    int strideLength = person.GetStrideLength();
    int footsteps = person.GetTotalFootsteps();
    for (int i = 0; i < footsteps; i++) {
        distance += strideLength;
    }

I’ll ignore the fact that it was a silly example that could be solved with multiplication! Should I have marked strideLength and footsteps “const”? |
nemo (145) 2556 posts |
I suspect it’s the subliminal influence of JavaScript. You can’t fool me, it’s dots all the way down.
Depends on how bad the compiler is, how time-critical the code is, and how large the maintenance team is, amongst other things. The former pattern describes how the programmer is thinking of the concepts. The latter pattern describes how the compiler author wished programmers thought of concepts. |
Rick Murray (539) 13851 posts |
True. However, if the code is exactly as given – there are no function calls or anything – it would be extremely dangerous to write data through a pointer liable to change behind the program’s back (note that there is no attempt to sanitise the value read from R11-&34): if that were the case, the code would likely execute differently on every single ARM that RISC OS runs upon, and probably differently every single time it is executed (depending upon the source of whatever it is that may change this value, and whether it is exactly nanosecond-synchronised to the processor’s behaviour). With this in mind, the value at R11-&34 being constant is not an unreasonable assumption.

Now for another assumption. Looking at the code, I’m (wildly) guessing that we’re setting up some sort of struct with 1, -1, -1 (etc). Okay then:
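Something like this, with the struct accessed directly rather than through a pointer:

    typedef struct {
        unsigned int one;
        unsigned int two;
        unsigned int three;
        unsigned int four;
        unsigned int five;
        unsigned int six;
    } mehdef;

    mehdef meh;             /* a directly placed struct, not a pointer */

    int main(void)
    {
        meh.one = 1;
        meh.two = -1;
        meh.three = -1;
        meh.four = -1;
        meh.five = -1;
        meh.six = -1;
        return 0;
    }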
Gives us a very tight:
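Something along these lines (register numbering and the literal-pool label here are illustrative, not the exact listing):

    main
        LDR      a2,[pc, #L000030-.-8]   ; address of meh
        MOV      a3,#1
        MVN      a1,#0
        STR      a3,[a2],#4              ; 1 -> meh.one, pointer steps on
        STR      a1,[a2]                 ; -1 -> meh.two
        STR      a1,[a2,#4]              ; -1 -> meh.three
        STR      a1,[a2,#8]              ; -1 -> meh.four
        STR      a1,[a2,#&c]             ; -1 -> meh.five
        STR      a1,[a2,#&10]            ; -1 -> meh.six
        MOV      a1,#0
        MOV      pc,lr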
That’s some clever code there, and note the nice use of writeback in the first STR. Let’s try an inline definition:
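That is, presumably:

    mehdef meh = { 1, -1, -1, -1, -1, -1 };   /* initialised at its definition */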
Gives us… ahem…
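(next to nothing – just an empty main along the lines of:)

    main
        MOV      a1,#0
        MOV      pc,lr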
However, the data segment now reads:
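(give or take the exact formatting, something like:)

    meh     DCD      &00000001
            DCD      &FFFFFFFF
            DCD      &FFFFFFFF
            DCD      &FFFFFFFF
            DCD      &FFFFFFFF
            DCD      &FFFFFFFF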
I just knew that Norcroft was better than to output such awful code. I can’t speak for GCC, but it would surprise me if it wasn’t better also.
|
Jeffrey Lee (213) 6048 posts |
Oh, Rick. You try so hard and yet you try so little.

    #include <stdio.h>

    typedef struct {
        unsigned int one;
        unsigned int two;
        unsigned int three;
        unsigned int four;
        unsigned int five;
        unsigned int six;
    } mehdef;

    mehdef *meh;

    int main(int argc,char **argv)
    {
        meh->one = 1;
        meh->two = -1;
        meh->three = -1;
        meh->four = -1;
        meh->five = -1;
        meh->six = -1;
        return 0;
    }

    ; generated by Norcroft RISC OS ARM C vsn 5.76 [19 Mar 2018]

            AREA |C$$code|, CODE, READONLY

    |x$codeseg| DATA
    main
            LDR      a1,[pc, #L000044-.-8]
            MOV      a2,#1
            LDR      a3,[a1]
            STR      a2,[a3]
            LDR      a3,[a1]
            MVN      a2,#0
            STR      a2,[a3,#4]!
            LDR      a3,[a1]
            STR      a2,[a3,#8]!
            LDR      a4,[a1]
            STR      a2,[a4,#&c]!
            LDR      a3,[a1]
            STR      a2,[a3,#&10]!
            LDR      a3,[a1]
            MOV      a1,#0
            STR      a2,[a3,#&14]!
            MOV      pc,lr
    L000044
            DCD      |x$dataseg|

            AREA |C$$data|,DATA

    |x$dataseg|
    meh     DCD      &00000000

            EXPORT main
            EXPORT meh
            END
|
Rick Murray (539) 13851 posts |
Yup. Indirected. I may have gotten there. I was sidetracked trying to get something useful out of GCC (which I’ve never used before). My result, with indirection, matches yours. Now that is some messed up code. :-) I could (almost) understand the paranoia with loading the base address over and over. But the writeback? That said, my first attempt ended with an apparently useless writeback (the first STR). But not as weird as GCC :-)
|
Jeffrey Lee (213) 6048 posts |
Make sure you have optimisation enabled (it’s disabled by default, for insert-historical-reason-here). As long as you have optimisation enabled, you have to try very hard to get GCC to output code which is as bad as Norcroft’s. Probably because GCC attempts to take full advantage of the “strict aliasing” rules, whereas Norcroft doesn’t. Strict aliasing (introduced in C99, I think?) basically says “two objects of different data types won’t occupy the same memory”, and so allows the compiler to much more easily make decisions about what cached values may need discarding following a memory write.
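A sketch of how that plays out in the listing above (reusing the mehdef/meh declarations):

    mehdef *meh;

    void init(void)
    {
        meh->one = 1;     /* stores an unsigned int: without strict aliasing
                             the compiler cannot rule out that this store
                             just changed meh itself (meh could point at
                             meh), so it reloads meh before the next store */
        meh->two = -1;    /* under strict aliasing, an unsigned int store
                             cannot modify a mehdef * object, so the cached
                             pointer would still be valid here */
    }
|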
Glen Walker (2585) 469 posts |
This is slightly off topic, but… is the Norcroft compiler going to get fixed/upgraded/replaced, or are we going to eventually move over to GCC? I really like the way source/header files are organised on RISC OS, whereas GCC doesn’t seem as integrated – but obviously if it’s producing better assembler code then we should be using it, right? |
Dave Higton (1515) 3534 posts |
People, please… Code only has to be good enough. ISTM that there are people here with the mindset of the 1970s and early 1980s, where every byte and every microsecond made a difference. In most cases they don’t matter today. (Lest anyone should misrepresent me, I’m not praising bad code.) It’s much more important to write maintainable code. As such, C wins over assembly language. New architecture comes along: recompile with the new version of the compiler. |
Rick Murray (539) 13851 posts |
That’s quite true; but in our defence (us people of the 70s/80s who did count cycles and bits used), the great amounts of memory available and fast storage mean that some utterly horrendous things get done… and we have to moan about it to each other, because kids these days… they just wouldn’t understand… |