RISC OS Open: Forum: Writing simple stuff in C

Jul 1, 2018 10:39pm

nemo (145) 2554 posts

Behold, part of the RO6 kernel:

u128D8  MOV  R8,#1
        LDR  R0,[R11,#-&034]
        STR  R8,[R0,#0]

        MVN  R0,#0
        LDR  R1,[R11,#-&034]
        STR  R0,[R1,#4]!

        LDR  R1,[R11,#-&034]
        STR  R0,[R1,#8]!

        LDR  R1,[R11,#-&034]
        STR  R0,[R1,#&00C]!

        LDR  R1,[R11,#-&034]
        STR  R0,[R1,#&010]!

        LDR  R1,[R11,#-&034]
        STR  R0,[R1,#&014]!

        B    u1291C

Code emitted by compilers is generally regarded as ‘good enough’. So now you know what that looks like.

It’s not just self-documenting, it appears! to! be! self! critiquing! too.

Jul 2, 2018 12:58am

Patrick M (2888) 126 posts

Would you explain what the code is doing, and what’s remarkable about it, for someone who knows Basic but not ARM assembly language? I’m interested because I’m considering trying to learn ARM assembly language.

Jul 2, 2018 5:41am

GavinWraith (26) 1563 posts

All but the first LDR instructions are redundant. The code could be rewritten as

u128D8 LDR R1,[R11,#-&034] 
       MOV R0,#1
       STR R0,[R1,#0]
       MVN R0,#0
       STR R0,[R1,#4]
       STR R0,[R1,#8]
       STR R0,[R1,#&00C]
       STR R0,[R1,#&010]
       STR R0,[R1,#&014]
       B   u1291C

This writes 1 to the address in R1 and -1 to the addresses at R1+4, R1+12, R1+8, R1+12, R1+16, R1+20. The compiler is so besotted with writeback (!) that it thinks it needs to reload R1 each time to undo it. Maybe the compiler was originally written for another processor? Is that the point you were making nemo?

Jul 2, 2018 5:51am

Rick Murray (539) 13850 posts

As Gavin said (beat me to it), the cycle of instructions is basically:

Load an address from R11-&34 into R1.
Write the value in R0 to R1+offset.
Write back the updated address into R1.

Since the cycle is identical each time (well, R0 does change after the first write, but that’s not the big issue with that code), save for the offset, then it’s perfectly logical to write the code as Gavin did, reading the base address once (R11+&34) and then simply writing each bit of data without any writeback to mess with the value in R1.

It’s incompetence. Incompetence in the compiler (surely Norcroft didn’t make this ugly mess?) and incompetence in the OS developers for thinking that “writing it in C” automatically makes it somehow “better” if this is the sort of code that is emitted.

Jul 2, 2018 5:59am

GavinWraith (26) 1563 posts

Sorry, a small bug crept in there. The last instruction should have had writeback, i.e.

      STR R0, [R1,#&014]!

Alternatively the code could be

u128D8 LDR,[R11,#-&034]
       MOV R0,#1
       STR R0,[R1],#4
       MVN R0,#0
       STR R0,[R1],#4
       STR R0,[R1],#4
       STR R0,[R1],#4
       STR R0,[R1],#4
       STR R0,[R1],#4
       B   u1291C?

I do not know what compiler produced the original code. Writing a compiler is a difficult task because the programmers writing it have to express in formal language mental processes that may well be practically subconscious. Do people use AI, neural nets and suchlike, to write compilers? The compiler writer has to figure out now why did it produce that rubbish code when clearly it ought to have produced this instead . She needs to be a compiler psychologist.

Jul 2, 2018 6:34am

Willard Goosey (5119) 257 posts

Honestly that looks like fairly typical compiler output to me. I agree it needs another pass through the optimizer. I’d guess a loop got “unrolled” and one of the initializers was in the body. :-(

Jul 2, 2018 6:59am

Jon Abbott (1421) 2651 posts

It’s probably compiled exactly what was written in C, which wasn’t optimal in the first place. Although Compiler do sometimes output head scratching code, it’s hard to blame them without seeing the source to compare.

Jul 2, 2018 7:24am

Steve Fryatt (216) 2105 posts

It’s incompetence. Incompetence in the compiler (surely Norcroft didn’t make this ugly mess?) and incompetence in the OS developers for thinking that “writing it in C” automatically makes it somehow “better” if this is the sort of code that is emitted.

Are you sure?

The former might be true (depending on what was fed into the compiler), but the latter isn’t. “Writing it in C” will almost always be the correct option when the alternative is assembler, as it’s more maintainable in the long term. I seem to remember folk claiming that Impression was “better” than Ovation Pro, because it was written in “hand-crafted ARM assembler” – a few years, and processor architectures, down the line, is that still correct?

Just because one compiler has produced some dubious code from an unknown chunk of C, let’s not throw the baby out with the bathwater…

Jul 2, 2018 7:51am

Rick Murray (539) 13850 posts

The last instruction should have had writeback,

If it’s supposed to be APCS then there’s no writeback needed, otherwise one would need to see if R1 is used further on. There’s evidence in the code to suggest that the compiler is just doing writeback “because”.

Are you sure?

For an entire project such as a DTP package (cough or an operating system cough) ought to be in C for exactly the reasons you say. But I’m assuming here we’re replacing something written in assembler with something worse written in C. Is that progress?

let’s not throw the baby out with the bathwater…

True – perhaps everything was declared as volatile? Otherwise it’s hard to understand why a compiler would miss such an obvious optimisation and, let’s face it, in this code the writeback is utterly useless.

Jul 2, 2018 10:36am

Steffen Huber (91) 1953 posts

I’m assuming here we’re replacing something written in assembler with something worse written in C. Is that progress?

It might be. How do you know? There is not enough information to come to any well-founded conclusion.

Anyway, if you improve the compiler or just use a better compiler, everything/something/hopefully anything written in the high level language is made better with zero additional effort. This will never happen with Assembler code. RISC OS is riddled with Assembler code where Nemo would surely say “nobody half-competent would ever write something THAT inefficient!” But it happened.

Jul 2, 2018 11:00am

GavinWraith (26) 1563 posts

I think that some programmers can be lulled into a sloppy false sense of security by a compiler. Any decent C programmer should be examining the assembler output of her C code to see if it contains manifest stupidities, and then be going back to rewrite the relevant sections – in this particular case, as Willard suggested, lifting a fetch out of a loop. If the compiler output is looked upon as a suggestion – think of a dog wagging its tail with a lead in its mouth – rather than ultimate truth, then master can decide whether to go walkies or not. I am sure that with today’s CPUs the compiler will know far better than the human how to optimize some things. But one hopes that some compilers can be made to be more generous with commenting assembler output, to explain to the humble human just what they are doing.

Jul 2, 2018 11:55am

Rick Murray (539) 13850 posts

Any decent C programmer should be examining the assembler output of her C code

I think it’s fair that some (many?) people write C code in order to not have to deal with assembler…?

How do you know?

I don’t. However since nemo said prior to the posting that it was a bit of the RO6 kernel I think it’s safe to assume either the code was in assembler or that it’s new and didn’t previously exist – and was then either (re)written in C or written by somebody who really really needed to work out what the writeback is for. ;-)

Jul 2, 2018 12:00pm

Glen Walker (2585) 469 posts

I think it’s fair that some (many?) people write C code in order to not have to deal with assembler

Guilty.

Jul 2, 2018 12:24pm

Jeffrey Lee (213) 6048 posts

It’s probably compiled exactly what was written in C, which wasn’t optimal in the first place. Although Compiler do sometimes output head scratching code, it’s hard to blame them without seeing the source to compare.

This.

Gavin (+ others) has made the assumption that the value at [R11,#-&034] is constant. While this assumption is probably true, most C compilers aren’t sophisticated enough to be able to determine this for themselves, so they’ll typically err on the side of caution and invalidate any values which are cached in registers when a memory write or function call occurs.

Explicitly loading [R11,#-&034] into a local variable, or using attributes such as __restrict, should be enough for the compiler to produce more optimal output.

Jul 2, 2018 12:32pm

GavinWraith (26) 1563 posts

Gavin (+ others) has made the assumption that the value at [R11,#-&034] is constant.

Guilty.

Explicitly loading [R11,#-&034] into a local variable, or using attributes such as __restrict, should be enough for the compiler to produce more optimal output.

This is why it is vital to analyse scope and use local variables in one’s program wherever possible. BASIC unfortunately gets people into the bad habit of using global variables, because its notions of scope are rather limited, which might be the reason for the case in point. Choice of programming language affects not only the current problem, but the programmer’s habits of thought, and so future problems as well.

I think it’s fair to say that some (many?) people write C code in order to not have to deal with assembler.

I think GCC, and many other compilers, use an intermediate code representation before generating the assembler instructions; just as many interpreters these days compile the program source to a bytecode representation as a first step (and BASIC does some tokenization). The intermediate code can, in principle, be a lot easier to understand than assembler; at least for checking that the algorithm is correct. GCC optimizes at both the intermediate level and the assembler level. I believe that it can be made to do clever things like interweaving the text of the program source with the output of intermediate code for checking purposes. This is what the !VMView utility does for RiscLua; though for large programs its output needs to be a bit cleverer – HTML might be an improvement.

I think the obstacle for most people is probably not so much the obscurity of the ouput as ignorance of the abstract ideas employed by the compiler. I cannot resist quoting (from Roberto Ierusalimschy) the two maxims of program optimization:

Rule #1: Don’t do it.
Rule #2: Don’t do it yet. (for experts only)

Jul 2, 2018 12:46pm

nemo (145) 2554 posts

I sometimes wonder if everyone who reads it gets Roberto’s joke.

(His maxim is both a description of optimisation, and advice about optimisation)

Jul 2, 2018 1:12pm

Andy S (2979) 504 posts

Explicitly loading [R11,#-&034] into a local variable, or using attributes such as __restrict, should be enough for the compiler to produce more optimal output.

I see the opposite of this all the time in modern OOP code. Coders seem to prefer writing calls to accessor functions (or “properties”) over and over again rather than introducing a local variable. I’m not sure if it’s an aesthetic choice, laziness, or something else. If the compiler can tell the result is constant, it should get optimised out, but I see it as a bad habit to get into because there are probably situations where it isn’t.

for (int i = 0; i < Environment.GetPerson().GetTotalFootsteps(); i++)
{
  distance += Environment.GetPerson().GetStrideLength();
}

instead of:

Person &person = Environment.GetPerson();
...
int strideLength = person.GetStrideLength();
int footsteps = person.GetTotalFootsteps();

for (int i = 0; i < footsteps; i++)
{
  distance += strideLength;
}

I’ll ignore the fact that it was a silly example that could be solved with multiplication! Should I have marked strideLength and footsteps “const”?

I think modern coders don’t like the fact that the second version needs more typing and more lines of code. I’ve seen more extreme examples than this out in the wild. The phobia of local variables seems very strong.

Any thoughts on which is better?

Jul 2, 2018 1:50pm

nemo (145) 2554 posts

I suspect it’s the subliminal influence of the JavaScript return this; chaining construction.

You can’t fool me, it’s dots all the way down.

which is better

Depends on how bad the compiler is, how time-critical the code is, how large the maintenance team, amongst other things.

The former pattern describes how the programmer is thinking of the concepts.

The latter pattern describes how the compiler author wished programmers thought of concepts.

Jul 2, 2018 5:39pm

Rick Murray (539) 13850 posts

Gavin (+ others) has made the assumption that the value at [R11,#-&034] is constant.

True. However if the code is exactly as is given, there are no function calls or anything, so it would be extremely dangerous to write data to a pointer liable to changing behind the back of the program (note that there is no attempt to sanitise the value read from R11-&34) given that if this were the case, it would likely execute entirely differently on every single ARM that RISC OS runs upon, not to mention probably differently every single time it is executed (depending upon the source of whatever it is that may change this value, and if it is exactly nanosecond-synchronised to the processor’s behaviour).

With this in mind, R11-&34 being a constant is not an unreasonable assumption.

Now for another assumption. I’m (wild) guessing looking at the code that we’re setting up some sort of struct with 1,-1,-1(etc).

Okay then:

#include <stdio.h>

typedef struct
{
   unsigned int one;
   unsigned int two;
   unsigned int three;
   unsigned int four;
   unsigned int five;
   unsigned int six;
} mehdef;

volatile mehdef meh;

int  main(int argc, char *argv[])
{
   meh.one = 1;
   meh.two = -1;
   meh.three = -1;
   meh.four = -1;
   meh.five = -1;
   meh.six = -1;

   return 0;
}

Gives us a very tight:

main
        LDR      a2,[pc, #L000030-.-8]
        LDR      a3,[sl,#-0]
        MOV      a1,#1
        STR      a1,[a3,a2]!
        MVN      a4,#0
        STR      a4,[a3,#4]
        STR      a4,[a3,#8]
        STR      a4,[a3,#&c]
        STR      a4,[a3,#&10]
        STR      a4,[a3,#&14]!
        MOV      a1,#0
        MOV      pc,lr
L000030
        DCD     meh    ; points to "meh" in the data segment

That’s some clever code there, and note the nice use of writeback in the first STR.

Let’s try an inline definition:

#include <stdio.h>

typedef struct
{
   unsigned int one;
   unsigned int two;
   unsigned int three;
   unsigned int four;
   unsigned int five;
   unsigned int six;
} mehdef;

volatile mehdef meh = { 1, -1, -1, -1, -1, -1 };


int  main(int argc, char *argv[])
{
   return 0;
}

Gives us… ahem…

main
        MOV      a1,#0
        MOV      pc,lr

However, the data segment now reads:

meh
        DCD     &00000001
        DCD     &ffffffff
        DCD     &ffffffff
        DCD     &ffffffff
        DCD     &ffffffff
        DCD     &ffffffff

I just knew that Norcroft was better than to output such awful code. I can’t speak for GCC, but it would surprise me if it wasn’t better also.

~~As a consequence, I’m starting to get a horrible feeling that this code just might have been written by hand…~~ Nope! It’s Norcroft. Jeffrey explains…

Jul 2, 2018 5:56pm

Jeffrey Lee (213) 6048 posts

Oh, Rick. You try so hard and yet you try so little.

#include <stdio.h>

typedef struct
{
  unsigned int one;
  unsigned int two;
  unsigned int three;
  unsigned int four;
  unsigned int five;
  unsigned int six;
} mehdef;

mehdef *meh;

int main(int argc,char **argv)
{
  meh->one = 1;
  meh->two = -1;
  meh->three = -1;
  meh->four = -1;
  meh->five = -1;
  meh->six = -1;
  
  return 0;
}

; generated by Norcroft RISC OS ARM C vsn 5.76 [19 Mar 2018]

        AREA |C$$code|, CODE, READONLY
|x$codeseg| DATA

main
        LDR      a1,[pc, #L000044-.-8]
        MOV      a2,#1
        LDR      a3,[a1]
        STR      a2,[a3]
        LDR      a3,[a1]
        MVN      a2,#0
        STR      a2,[a3,#4]!
        LDR      a3,[a1]
        STR      a2,[a3,#8]!
        LDR      a4,[a1]
        STR      a2,[a4,#&c]!
        LDR      a3,[a1]
        STR      a2,[a3,#&10]!
        LDR      a3,[a1]
        MOV      a1,#0
        STR      a2,[a3,#&14]!
        MOV      pc,lr
L000044
        DCD     |x$dataseg|

        AREA |C$$data|,DATA

|x$dataseg|

meh
        DCD     &00000000

        EXPORT main
        EXPORT meh

        END

Jul 2, 2018 6:26pm

Rick Murray (539) 13850 posts

You try so hard and yet you try so little.

Yup. Indirected.

I may have gotten there. I was sidetracked trying to get something useful out of GCC (that I’ve never used before).

My result, with indirection, matches yours.

Now that is some messed up code. :-) I could (almost) understand the paranoia with loading the base address over and over. But the writeback? That said, my first attempt ended with an apparently useless writeback (the STR a4,[a3,#&14]!), so I’m wondering if this is just what Norcroft does with the register as a sort of sign-off? I’m done here, writeback? Still weird.

But not as weird as GCC :-)

        .file   "test3.c"
        .local  meh
        .comm   meh,4,4
        .text
        .align  2
        .global main
        .ascii  "main\000"
        .align  2
        .word   4278190088
        .type   main, %function
main:
        @ args = 0, pretend = 0, frame = 8, outgoing = 0
        @ frame_needed = 1, uses_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {r9, fp, ip, lr, pc}
        sub     fp, ip, #4
        cmp     sp, sl
        bllt    __rt_stkovf_split_small
        sub     sp, sp, #8
        mov     r9, sp
        str     r0, [r9, #0]
        str     r1, [r9, #4]
        ldr     r3, .L3
        ldr     r3, [r3, #0]
        mov     r2, #1
        str     r2, [r3, #0]
        ldr     r3, .L3
        ldr     r3, [r3, #0]
        mvn     r2, #0
        str     r2, [r3, #4]
        ldr     r3, .L3
        ldr     r3, [r3, #0]
        mvn     r2, #0
        str     r2, [r3, #8]
        ldr     r3, .L3
        ldr     r3, [r3, #0]
        mvn     r2, #0
        str     r2, [r3, #12]
        ldr     r3, .L3
        ldr     r3, [r3, #0]
        mvn     r2, #0
        str     r2, [r3, #16]
        ldr     r3, .L3
        ldr     r3, [r3, #0]
        mvn     r2, #0
        str     r2, [r3, #20]
        mov     r3, #0
        mov     r0, r3
        ldmea   fp, {r9, fp, sp, pc}
.L4:
        .align  2
.L3:
        .word   meh
        .size   main, .-main
        .ident  "GCC: (GCCSDK GCC 4.7.4 Release 3) 4.7.4"

Jul 2, 2018 6:38pm

Jeffrey Lee (213) 6048 posts

But not as weird as GCC :-)

Make sure you have optimisation enabled (it’s disabled by default, for insert-historical-reason-here).

As long as you have optimisation enabled, you have to try very hard to get GCC to output code which is as bad as Norcroft. Probably because GCC attempts to take full advantage of the “strict aliasing” rules, whereas Norcroft doesn’t.

Strict aliasing (introduced in C99 I think?) basically says “two objects of different data types won’t occupy the same memory”, and so allows the compiler to much more easily make decisions about what cached values may need discarding following a memory write.

Jul 2, 2018 7:56pm

Glen Walker (2585) 469 posts

you have to try very hard to get GCC to output code which is as bad as Norcroft

This is slightly off topic, but…

Is the Norcroft compiler going to get fixed/upgraded/replaced or are we going to eventually move over to GCC? I really like the way source/header files on RISC OS whereas GCC doesn’t seem as integrated – but obviously if its producing better assembler code then we should be using it right?

Jul 2, 2018 8:34pm

Dave Higton (1515) 3534 posts

People, please…

Code only has to be good enough.

ISTM that there are people here with the mindset of the 1970s and early 1980s, where every byte and every microsecond made a difference. In most cases they don’t matter today.

(Lest anyone should misrepresent me, I’m not praising bad code.)

It’s much more important to write maintainable code. As such, C wins over assembly language. New architecture comes along: recompile with the new version of the compiler.

Jul 2, 2018 8:59pm

Rick Murray (539) 13850 posts

In most cases they don’t matter today.

That’s quite true; but in our defence (us people of the 70s/80s who did count cycles and bits used), the great amounts of memory available and fast storage mean that some utterly horrendous things get done… and we have to moan about it to each other, because kids these days… they just wouldn’t understand…

Writing simple stuff in C

Reply

Search forums

Social

ROOL Store

Donate! Why?

RISC OS IPR

Description

Voices

Options

Jul 1, 2018 10:39pm nemo (145) 2554 posts	Behold, part of the RO6 kernel: u128D8 MOV R8,#1 LDR R0,[R11,#-&034] STR R8,[R0,#0] MVN R0,#0 LDR R1,[R11,#-&034] STR R0,[R1,#4]! LDR R1,[R11,#-&034] STR R0,[R1,#8]! LDR R1,[R11,#-&034] STR R0,[R1,#&00C]! LDR R1,[R11,#-&034] STR R0,[R1,#&010]! LDR R1,[R11,#-&034] STR R0,[R1,#&014]! B u1291C Code emitted by compilers is generally regarded as ‘good enough’. So now you know what that looks like. It’s not just self-documenting, it appears! to! be! self! critiquing! too.

Jul 2, 2018 12:58am Patrick M (2888) 126 posts	Would you explain what the code is doing, and what’s remarkable about it, for someone who knows Basic but not ARM assembly language? I’m interested because I’m considering trying to learn ARM assembly language.

Jul 2, 2018 5:41am GavinWraith (26) 1563 posts	All but the first LDR instructions are redundant. The code could be rewritten as `u128D8 LDR R1,[R11,#-&034] MOV R0,#1 STR R0,[R1,#0] MVN R0,#0 STR R0,[R1,#4] STR R0,[R1,#8] STR R0,[R1,#&00C] STR R0,[R1,#&010] STR R0,[R1,#&014] B u1291C` This writes 1 to the address in R1 and -1 to the addresses at R1+4, R1+12, R1+8, R1+12, R1+16, R1+20. The compiler is so besotted with writeback (!) that it thinks it needs to reload R1 each time to undo it. Maybe the compiler was originally written for another processor? Is that the point you were making nemo?

Jul 2, 2018 5:51am Rick Murray (539) 13850 posts	As Gavin said (beat me to it), the cycle of instructions is basically: Load an address from R11-&34 into R1. Write the value in R0 to R1+offset. Write back the updated address into R1. Since the cycle is identical each time (well, R0 does change after the first write, but that’s not the big issue with that code), save for the offset, then it’s perfectly logical to write the code as Gavin did, reading the base address once (R11+&34) and then simply writing each bit of data without any writeback to mess with the value in R1. It’s incompetence. Incompetence in the compiler (surely Norcroft didn’t make this ugly mess?) and incompetence in the OS developers for thinking that “writing it in C” automatically makes it somehow “better” if this is the sort of code that is emitted.

Jul 2, 2018 5:59am GavinWraith (26) 1563 posts	Sorry, a small bug crept in there. The last instruction should have had writeback, i.e. `STR R0, [R1,#&014]!` Alternatively the code could be `u128D8 LDR,[R11,#-&034] MOV R0,#1 STR R0,[R1],#4 MVN R0,#0 STR R0,[R1],#4 STR R0,[R1],#4 STR R0,[R1],#4 STR R0,[R1],#4 STR R0,[R1],#4 B u1291C`? I do not know what compiler produced the original code. Writing a compiler is a difficult task because the programmers writing it have to express in formal language mental processes that may well be practically subconscious. Do people use AI, neural nets and suchlike, to write compilers? The compiler writer has to figure out now why did it produce that rubbish code when clearly it ought to have produced this instead . She needs to be a compiler psychologist.

Jul 2, 2018 6:34am Willard Goosey (5119) 257 posts	Honestly that looks like fairly typical compiler output to me. I agree it needs another pass through the optimizer. I’d guess a loop got “unrolled” and one of the initializers was in the body. :-(

Jul 2, 2018 6:59am Jon Abbott (1421) 2651 posts	It’s probably compiled exactly what was written in C, which wasn’t optimal in the first place. Although Compiler do sometimes output head scratching code, it’s hard to blame them without seeing the source to compare.

Jul 2, 2018 7:24am Steve Fryatt (216) 2105 posts	It’s incompetence. Incompetence in the compiler (surely Norcroft didn’t make this ugly mess?) and incompetence in the OS developers for thinking that “writing it in C” automatically makes it somehow “better” if this is the sort of code that is emitted. Are you sure? The former might be true (depending on what was fed into the compiler), but the latter isn’t. “Writing it in C” will almost always be the correct option when the alternative is assembler, as it’s more maintainable in the long term. I seem to remember folk claiming that Impression was “better” than Ovation Pro, because it was written in “hand-crafted ARM assembler” – a few years, and processor architectures, down the line, is that still correct? Just because one compiler has produced some dubious code from an unknown chunk of C, let’s not throw the baby out with the bathwater…

Jul 2, 2018 7:51am Rick Murray (539) 13850 posts	The last instruction should have had writeback, If it’s supposed to be APCS then there’s no writeback needed, otherwise one would need to see if R1 is used further on. There’s evidence in the code to suggest that the compiler is just doing writeback “because”. Are you sure? For an entire project such as a DTP package (cough or an operating system cough) ought to be in C for exactly the reasons you say. But I’m assuming here we’re replacing something written in assembler with something worse written in C. Is that progress? let’s not throw the baby out with the bathwater… True – perhaps everything was declared as volatile? Otherwise it’s hard to understand why a compiler would miss such an obvious optimisation and, let’s face it, in this code the writeback is utterly useless.

Jul 2, 2018 10:36am Steffen Huber (91) 1953 posts	I’m assuming here we’re replacing something written in assembler with something worse written in C. Is that progress? It might be. How do you know? There is not enough information to come to any well-founded conclusion. Anyway, if you improve the compiler or just use a better compiler, everything/something/hopefully anything written in the high level language is made better with zero additional effort. This will never happen with Assembler code. RISC OS is riddled with Assembler code where Nemo would surely say “nobody half-competent would ever write something THAT inefficient!” But it happened.

Jul 2, 2018 11:00am GavinWraith (26) 1563 posts	I think that some programmers can be lulled into a sloppy false sense of security by a compiler. Any decent C programmer should be examining the assembler output of her C code to see if it contains manifest stupidities, and then be going back to rewrite the relevant sections – in this particular case, as Willard suggested, lifting a fetch out of a loop. If the compiler output is looked upon as a suggestion – think of a dog wagging its tail with a lead in its mouth – rather than ultimate truth, then master can decide whether to go walkies or not. I am sure that with today’s CPUs the compiler will know far better than the human how to optimize some things. But one hopes that some compilers can be made to be more generous with commenting assembler output, to explain to the humble human just what they are doing.

Jul 2, 2018 11:55am Rick Murray (539) 13850 posts	Any decent C programmer should be examining the assembler output of her C code I think it’s fair that some (many?) people write C code in order to not have to deal with assembler…? How do you know? I don’t. However since nemo said prior to the posting that it was a bit of the RO6 kernel I think it’s safe to assume either the code was in assembler or that it’s new and didn’t previously exist – and was then either (re)written in C or written by somebody who really really needed to work out what the writeback is for. ;-)

Jul 2, 2018 12:00pm Glen Walker (2585) 469 posts	I think it’s fair that some (many?) people write C code in order to not have to deal with assembler Guilty.

Jul 2, 2018 12:24pm Jeffrey Lee (213) 6048 posts	It’s probably compiled exactly what was written in C, which wasn’t optimal in the first place. Although Compiler do sometimes output head scratching code, it’s hard to blame them without seeing the source to compare. This. Gavin (+ others) has made the assumption that the value at `[R11,#-&034]` is constant. While this assumption is probably true, most C compilers aren’t sophisticated enough to be able to determine this for themselves, so they’ll typically err on the side of caution and invalidate any values which are cached in registers when a memory write or function call occurs. Explicitly loading `[R11,#-&034]` into a local variable, or using attributes such as `__restrict`, should be enough for the compiler to produce more optimal output.

Jul 2, 2018 12:32pm GavinWraith (26) 1563 posts	Gavin (+ others) has made the assumption that the value at [R11,#-&034] is constant. Guilty. Explicitly loading [R11,#-&034] into a local variable, or using attributes such as __restrict, should be enough for the compiler to produce more optimal output. This is why it is vital to analyse scope and use local variables in one’s program wherever possible. BASIC unfortunately gets people into the bad habit of using global variables, because its notions of scope are rather limited, which might be the reason for the case in point. Choice of programming language affects not only the current problem, but the programmer’s habits of thought, and so future problems as well. I think it’s fair to say that some (many?) people write C code in order to not have to deal with assembler. I think GCC, and many other compilers, use an intermediate code representation before generating the assembler instructions; just as many interpreters these days compile the program source to a bytecode representation as a first step (and BASIC does some tokenization). The intermediate code can, in principle, be a lot easier to understand than assembler; at least for checking that the algorithm is correct. GCC optimizes at both the intermediate level and the assembler level. I believe that it can be made to do clever things like interweaving the text of the program source with the output of intermediate code for checking purposes. This is what the !VMView utility does for RiscLua; though for large programs its output needs to be a bit cleverer – HTML might be an improvement. I think the obstacle for most people is probably not so much the obscurity of the ouput as ignorance of the abstract ideas employed by the compiler. I cannot resist quoting (from Roberto Ierusalimschy) the two maxims of program optimization: Rule #1: Don’t do it. Rule #2: Don’t do it yet. (for experts only)

Jul 2, 2018 12:46pm nemo (145) 2554 posts	I sometimes wonder if everyone who reads it gets Roberto’s joke. (His maxim is both a description of optimisation, and advice about optimisation)

Jul 2, 2018 1:12pm Andy S (2979) 504 posts	Explicitly loading [R11,#-&034] into a local variable, or using attributes such as __restrict, should be enough for the compiler to produce more optimal output. I see the opposite of this all the time in modern OOP code. Coders seem to prefer writing calls to accessor functions (or “properties”) over and over again rather than introducing a local variable. I’m not sure if it’s an aesthetic choice, laziness, or something else. If the compiler can tell the result is constant, it should get optimised out, but I see it as a bad habit to get into because there are probably situations where it isn’t. for (int i = 0; i < Environment.GetPerson().GetTotalFootsteps(); i++) { distance += Environment.GetPerson().GetStrideLength(); } instead of: Person &person = Environment.GetPerson(); ... int strideLength = person.GetStrideLength(); int footsteps = person.GetTotalFootsteps(); for (int i = 0; i < footsteps; i++) { distance += strideLength; } I’ll ignore the fact that it was a silly example that could be solved with multiplication! Should I have marked strideLength and footsteps “const”? I think modern coders don’t like the fact that the second version needs more typing and more lines of code. I’ve seen more extreme examples than this out in the wild. The phobia of local variables seems very strong. Any thoughts on which is better?

Jul 2, 2018 1:50pm nemo (145) 2554 posts	I suspect it’s the subliminal influence of the JavaScript `return this;` chaining construction. You can’t fool me, it’s dots all the way down. which is better Depends on how bad the compiler is, how time-critical the code is, how large the maintenance team, amongst other things. The former pattern describes how the programmer is thinking of the concepts. The latter pattern describes how the compiler author wished programmers thought of concepts.

Jul 2, 2018 5:39pm Rick Murray (539) 13850 posts	Gavin (+ others) has made the assumption that the value at `[R11,#-&034]` is constant. True. However if the code is exactly as is given, there are no function calls or anything, so it would be extremely dangerous to write data to a pointer liable to changing behind the back of the program (note that there is no attempt to sanitise the value read from R11-&34) given that if this were the case, it would likely execute entirely differently on every single ARM that RISC OS runs upon, not to mention probably differently every single time it is executed (depending upon the source of whatever it is that may change this value, and if it is exactly nanosecond-synchronised to the processor’s behaviour). With this in mind, R11-&34 being a constant is not an unreasonable assumption. Now for another assumption. I’m (wild) guessing looking at the code that we’re setting up some sort of struct with 1,-1,-1(etc). Okay then: `#include <stdio.h> typedef struct { unsigned int one; unsigned int two; unsigned int three; unsigned int four; unsigned int five; unsigned int six; } mehdef; volatile mehdef meh; int main(int argc, char argv[]) { meh.one = 1; meh.two = -1; meh.three = -1; meh.four = -1; meh.five = -1; meh.six = -1; return 0; }` Gives us a very tight: `main LDR a2,[pc, #L000030-.-8] LDR a3,[sl,#-0] MOV a1,#1 STR a1,[a3,a2]! MVN a4,#0 STR a4,[a3,#4] STR a4,[a3,#8] STR a4,[a3,#&c] STR a4,[a3,#&10] STR a4,[a3,#&14]! MOV a1,#0 MOV pc,lr L000030 DCD meh ; points to "meh" in the data segment` That’s some clever code there, and note the nice use of writeback in the first STR. Let’s try an inline definition: `#include <stdio.h> typedef struct { unsigned int one; unsigned int two; unsigned int three; unsigned int four; unsigned int five; unsigned int six; } mehdef; volatile mehdef meh = { 1, -1, -1, -1, -1, -1 }; int main(int argc, char argv[]) { return 0; }` Gives us… ahem… `main MOV a1,#0 MOV pc,lr` However, the data segment now reads: `meh DCD &00000001 DCD &ffffffff DCD &ffffffff DCD &ffffffff DCD &ffffffff DCD &ffffffff` I just knew that Norcroft was better than to output such awful code. I can’t speak for GCC, but it would surprise me if it wasn’t better also. ~~As a consequence, I’m starting to get a horrible feeling that this code just might have been written by hand…~~ Nope! It’s Norcroft. Jeffrey explains…

Jul 2, 2018 5:56pm Jeffrey Lee (213) 6048 posts	Oh, Rick. You try so hard and yet you try so little. #include <stdio.h> typedef struct { unsigned int one; unsigned int two; unsigned int three; unsigned int four; unsigned int five; unsigned int six; } mehdef; mehdef meh; int main(int argc,char *argv) { meh->one = 1; meh->two = -1; meh->three = -1; meh->four = -1; meh->five = -1; meh->six = -1; return 0; } ; generated by Norcroft RISC OS ARM C vsn 5.76 [19 Mar 2018] AREA \|C$$code\|, CODE, READONLY \|x$codeseg\| DATA main LDR a1,[pc, #L000044-.-8] MOV a2,#1 LDR a3,[a1] STR a2,[a3] LDR a3,[a1] MVN a2,#0 STR a2,[a3,#4]! LDR a3,[a1] STR a2,[a3,#8]! LDR a4,[a1] STR a2,[a4,#&c]! LDR a3,[a1] STR a2,[a3,#&10]! LDR a3,[a1] MOV a1,#0 STR a2,[a3,#&14]! MOV pc,lr L000044 DCD \|x$dataseg\| AREA \|C$$data\|,DATA \|x$dataseg\| meh DCD &00000000 EXPORT main EXPORT meh END

Jul 2, 2018 6:26pm Rick Murray (539) 13850 posts	You try so hard and yet you try so little. Yup. Indirected. I may have gotten there. I was sidetracked trying to get something useful out of GCC (that I’ve never used before). My result, with indirection, matches yours. Now that is some messed up code. :-) I could (almost) understand the paranoia with loading the base address over and over. But the writeback? That said, my first attempt ended with an apparently useless writeback (the `STR a4,[a3,#&14]!`), so I’m wondering if this is just what Norcroft does with the register as a sort of sign-off? I’m done here, writeback? Still weird. But not as weird as GCC :-) .file "test3.c" .local meh .comm meh,4,4 .text .align 2 .global main .ascii "main\000" .align 2 .word 4278190088 .type main, %function main: @ args = 0, pretend = 0, frame = 8, outgoing = 0 @ frame_needed = 1, uses_anonymous_args = 0 mov ip, sp stmfd sp!, {r9, fp, ip, lr, pc} sub fp, ip, #4 cmp sp, sl bllt __rt_stkovf_split_small sub sp, sp, #8 mov r9, sp str r0, [r9, #0] str r1, [r9, #4] ldr r3, .L3 ldr r3, [r3, #0] mov r2, #1 str r2, [r3, #0] ldr r3, .L3 ldr r3, [r3, #0] mvn r2, #0 str r2, [r3, #4] ldr r3, .L3 ldr r3, [r3, #0] mvn r2, #0 str r2, [r3, #8] ldr r3, .L3 ldr r3, [r3, #0] mvn r2, #0 str r2, [r3, #12] ldr r3, .L3 ldr r3, [r3, #0] mvn r2, #0 str r2, [r3, #16] ldr r3, .L3 ldr r3, [r3, #0] mvn r2, #0 str r2, [r3, #20] mov r3, #0 mov r0, r3 ldmea fp, {r9, fp, sp, pc} .L4: .align 2 .L3: .word meh .size main, .-main .ident "GCC: (GCCSDK GCC 4.7.4 Release 3) 4.7.4"

Jul 2, 2018 6:38pm Jeffrey Lee (213) 6048 posts	But not as weird as GCC :-) Make sure you have optimisation enabled (it’s disabled by default, for insert-historical-reason-here). As long as you have optimisation enabled, you have to try very hard to get GCC to output code which is as bad as Norcroft. Probably because GCC attempts to take full advantage of the “strict aliasing” rules, whereas Norcroft doesn’t. Strict aliasing (introduced in C99 I think?) basically says “two objects of different data types won’t occupy the same memory”, and so allows the compiler to much more easily make decisions about what cached values may need discarding following a memory write.

Jul 2, 2018 7:56pm Glen Walker (2585) 469 posts	you have to try very hard to get GCC to output code which is as bad as Norcroft This is slightly off topic, but… Is the Norcroft compiler going to get fixed/upgraded/replaced or are we going to eventually move over to GCC? I really like the way source/header files on RISC OS whereas GCC doesn’t seem as integrated – but obviously if its producing better assembler code then we should be using it right?

Jul 2, 2018 8:34pm Dave Higton (1515) 3534 posts	People, please… Code only has to be good enough. ISTM that there are people here with the mindset of the 1970s and early 1980s, where every byte and every microsecond made a difference. In most cases they don’t matter today. (Lest anyone should misrepresent me, I’m not praising bad code.) It’s much more important to write maintainable code. As such, C wins over assembly language. New architecture comes along: recompile with the new version of the compiler.

Jul 2, 2018 8:59pm Rick Murray (539) 13850 posts	In most cases they don’t matter today. That’s quite true; but in our defence (us people of the 70s/80s who did count cycles and bits used), the great amounts of memory available and fast storage mean that some utterly horrendous things get done… and we have to moan about it to each other, because kids these days… they just wouldn’t understand…