C is not a low level language (any more)

39 posts, 10 voices

Pages: 1 2

May 22, 2018 1:51pm GavinWraith (26) 1563 posts	Fascinating article on why C can no longer be thought of as a low-level language. Developments in CPUs and GPUs should force us to rethink many hoary assumptions about programming.

May 22, 2018 4:40pm Rick Murray (539) 13851 posts	“A programming language is low level when its programs require attention to the irrelevant.” Using that definition, one might consider BASIC to be lower level than C. Anybody who disagrees, please write a Wimp program. ;-) I’m not sure that one can lay blame on C. For instance the misconception of the flat memory model. This is typically assumed, and those of us of a certain age will remember the horribleness that was the segmented memory models (plural) of the x86. But, when you think about it, flat memory has never really been true. Any version of Windows worth mentioning used virtual memory so the actual allocations are an illusion, and even in the world of RISC OS, our memory mapping is a lot simpler but one thing is for certain, every application is not based at &8000. It’s only there when it needs to be. While it is true that there are caches and virtual memory and MMUs, it isn’t valid to say that the current problems are C’s fault. A programmer wants to malloc() a chunk of memory and I’m quite sure 99% don’t care where or how so long as it works (did you know it comes from application space in an app and the RMA in a module? you probably wouldn’t care if it was a heap in a Dynamic Area, so long as it worked). yet a significant proportion of programmers either believe the wrong thing or are not sure This would probably hold true for any number of contemporary languages. One needs only read a bit of StackOverflow to see that. There may be recognised qualifications required in order to call oneself a software engineer, but there’s nothing required in order to call oneself a programmer (hell, I learned from the Beeb user guide, AcornUser, and hacking stuff to see what broke – nobody ever taught me C, or assembler, I read stuff, examined stuff, took stuff apart…all of which probably makes me the very worst sort of programmer – but then I started with BASIC so I’m already damned…). Point is, a lot of people tend to learn only that which is necessary to do a specific task, and don’t really know much of the weird and complicated stuff. In C, think about pointers of the dash-greater-than variety, or using the bracketed question colon thing, whether ++ comes before or after the variable name, or indeed the comma (which I’m sure many think is just a part of the syntax of if statements). Yes, C compilers jump though hoops to make a fairly uncomplicated language generate fast code, but then again don’t you think modern frameworks with tens of megabytes of runtime support libraries aren’t jumping through their own hoops? The problem is nothing to do with the use of any particular language, even if C’s typically linear behaviour (akin to BASIC) is not exactly best suited for multicore processors. The thing is, language doesn’t matter. Why? Because a modern processor isn’t going to be running one C (or otherwise) program, it’ll be running hundreds. And switching around them hundreds or thousands of times per second. This is an operating system issue, not a language issue. But, wait, this switching of contexts, of jumping in and out of code millions of times every minute, billions of billions of times every day, this adds up. So an obvious job for the processor was to try to find ways of making context switches less onerous. Less time dealing with baggage is more time for running the user’s software. This is not a language issue, nor an OS issue, it’s a processor issue. And it’s an issue with unforseen consequences. Consequences that are coming to light, as processor designers concentrated more on speed than on security. Truth is, we can’t have both. The market wanted speed. Here’s the result. PS – why’s this an Announcement? PPS – I never thought of C as “low level” or “readable assembler”. It was just closer to the machine architecture than languages that try to hide function calls, that don’t give you a choice of variable types (signed/unsigned), or worse the “whatever it needs to be” type of variable.

May 22, 2018 4:59pm nemo (145) 2556 posts	Fascinating article I can’t disagree with any of that. -O anything means what you wrote is not what gets run, and the continued preservation of frankly ridiculous undefined behaviour (signed overflow, uninitialised values, cross-type casting) means what you think you wrote means is not necessarily what some compilers will give you.

May 22, 2018 5:01pm nemo (145) 2556 posts	The problem is nothing to do with the use of any particular language No, it is. The C language programmer’s model fits the silicon less and less well each generation. Some other languages fit much better.

May 22, 2018 6:11pm John Sandgrounder (1650) 574 posts	Anybody who disagrees, please write a Wimp program. ;-) Not as difficult as it used to be. In most cases I would argue that it is no longer necessary (with today’s processor speeds) to write windows one rectangle at a time.

May 22, 2018 7:22pm Rick Murray (539) 13851 posts	continued preservation of frankly ridiculous undefined behaviour That is a problem, certainly. Given the number of buffer overruns we’ve seen do nasty things in the past, the continued inability to optionally invoke bounds checking is also ridiculous. It’s a bit like ZeroPain. Use it until you know your program is good, then you can think about turning it off… The C language programmer’s model fits the silicon less and less well each generation. I would like to believe that a language translates something readable into a usefully compact form that may be executed by the system; with less and less dependence upon what the actual silicon is doing. Because these days we cannot dare to assume such things, what with multicore processors, fake multicore processors (i Atom), dual issue, out of order execution, and of course, the fact that in most cases you’re one of many, not exclusive. Some other languages fit much better. Today, perhaps. They’ll get old and outdated too. How many “modern” languages can you name? How many result in actual executable code (as opposed to interpreted script or bytecode)? How many do you think will still be around in five years? That’s not to say that C is the best, but it has huge support and longevity (even C++ failed to kill it off). I can’t, offhand, think of any other language that has the scope and support that C offers. It’s available for everything from microcontrollers to today’s processing monsters – and that alone is why it being “close to silicon” is a bit nonsense, there’s practically nothing in common between a Intel Core i9-7980XE (go eyeball the specs) and…a PIC. Yet, with some obvious allowances for hardware differences, it is possible to compile C code for each, and pretty much everything in between. Other languages may work better with today’s processors (which?) but will it have the level of support and longevity? Will it expand to support hundreds of different devices (and thus risk making itself equally “inefficient” in the process)? I still maintain that C is not especially at fault here. The processors had one job to do, to execute code. Whether single thread C, or assembler written by a monkey, it was just expected to follow instructions. In the craze for squeezing of more and more speed, some corners were cut. Now that’s turning out to have some unwanted side effects. Not as difficult as it used to be. Depends whether you do it yourself or use somebody’s pre-written library. The latter option is, obviously, the logical choice; but even so there’s some nasty mucking around with memory blocks to make it all work. Not at opcode level, at BASIC level. You just don’t see it when using a library… In most cases I would argue that it is no longer necessary (with today’s processor speeds) to write windows one rectangle at a time. Indeed. My Pi2 runs an easy hundred times faster than my original A3000 (faster yet due to cache and architectural enhancements), the Ti is something like twice that. So why should we be drawing windows a rectangle at a time? We have enough processing power that we ought to be able to manage circles and parallelograms too. :-) More seriously, processing speed aside, we have enough memory that I do wonder if it would be useful to draw into a sprite (instead of the window) and have the Wimp handle most of the redraws for itself? Perhaps blitting a part of a sprite would be quicker than expecting the application to work out what needs to be redrawn from an arbitrary rectangle created as a result of a menu or some other window appearing in front of the window that we have drawn into.

May 22, 2018 9:32pm John Sandgrounder (1650) 574 posts	Depends whether you do it yourself or use somebody’s pre-written library. Or a compromise. use somebody’s existing code as a starting point.

May 22, 2018 9:41pm Tristan M. (2946) 1039 posts	A compositing window manager would make a nice option. Implementation would probably make some nice additions to z ordering support. What languages are more suitable? I’m curious. Maybe Go, or the relative safety of Rust?

May 22, 2018 10:24pm Jeffrey Lee (213) 6048 posts	A compositing window manager would make a nice option. Implementation would probably make some nice additions to z ordering support. This has been discussed before, and may have even existed at one point. https://www.riscosopen.org/forum/forums/2/topics/2541 What languages are more suitable? I’m curious. Maybe Go, or the relative safety of Rust? Depends on what you’re after, as they both have their pros and cons when compared to other languages, including C. E.g. they both offer very little control over memory allocation strategy (or failure modes when allocation fails). Fine for application code where you can mostly assume you have gigabytes of uniform RAM and gigabytes of pagefile, not so great for OS kernels or hardware drivers. Also despite Rust aiming to be as safe as possible, they still haven’t solved this stupid and dangerous bug.

May 23, 2018 10:10am Rick Murray (539) 13851 posts	This has been discussed before, and may have even existed at one point. Steve said, at the time: And because the changes to the Wimp would be so widespread, the sensible answer would be to throw it away and write a new one. This is not necessarily a bad plan, because it seems to me that the fundamental behaviour of the Wimp itself is going to have to change. For a start, consider the idea of the Recorded Message. You send the message, and you receive back either a reply or your message back (essentially an ACK or a NAK). Well, the moment the Wimp gains the ability to disperse tasks among cores (as it surely should to make best use of the hardware) is the moment that one ceases to be able to assume that each task will be run in rotation. It is quite possible that the Wimp will maintain a list of pending tasks, and the next one to be ready will be assigned to the next free core. However, and here’s the important part, we cannot assume in one polling cycle that each interested task has in fact been executed. Imagine if one of the tasks was ChangeFSI (known for stalling the machine for a long time while it works its magic). If tht was sat hogging one core, the rest of the Wimp would cycle thousands of times in the meantime. Questions like that (and all of the other wishlist items) raise the question of whether it is better to patch the existing code, or to design something new with the features available from the outset? Of course, as always, we’re right back to the issue of developer time and resources… Depends on what you’re after, as they both have their pros and cons when compared to other languages, including C. There’s a certain irony in the LLVM backend being written in C++. ;-) they still haven’t solved this stupid and dangerous bug. Wow. I like the discussion regarding “accuracy” and whether to saturate. A big float is not going to fit into a little integer, so it doesn’t really matter what is stored. Wrong is wrong. I can understand saturating, but it seems like their implementation added a huge overhead (even in values within range). Perhaps one should just AND mask it? With this in mind, I tried a program to see what RISC OS’ C does: Here’s some code: `#include <stdio.h> int main(void) { double fpval; struct { unsigned short first; unsigned short second; unsigned short third; } sixteenbit; sixteenbit.first = 43690; // %101010(etc) sixteenbit.third = 21845; // %010101(etc) fpval = (double)sixteenbit.first * (double)sixteenbit.third; fpval = fpval * fpval; sixteenbit.second = (unsigned short)fpval; printf("%f, %d, %d, %d\n", fpval, sixteenbit.first, sixteenbit.second, sixteenbit.third); return 0; }` Running it: 954408050.000000, 43690, 7282, 21845 The value 7282 is 954408050 ANDed with &FFFF. The code? A bit jumbled up, but essentially: MOV R1,#&AA ; set up %10101010 STRB R1,[R13,#-8]! ; write %10... to stack [sixteenbit.first+0] ; and update stack pointer to make space for the struct MOV R3,#&55 ; set up %01010101 MOV R2,#&AA ; set up %10101010 STRB R3,[R13,#4] ; write %01... to stack+4 [sixteenbit.third +0] MOV R0,#&55 ; set up %01010101 STRB R0,[R13,#5] ; write %01... to stack+5 [sixteenbit.third +1] STRB R2,[R13,#1] ; write %10... to stack+1 [sixteenbit.first +1] LDR R1,[R13,#4] ; load WORD from sixteenbit.third+0 LDR R3,[R13,#0] ; load WORD from sixteenbit.first+0 ADR R0,&00008130 ; (point at the printf string) MOV R1,R1,LSL #16 ; as we loaded a WORD, shift up and then MOV R3,R3,LSL #16 ; down by 16 bits; effectively clipping to MOV R1,R1,LSR #16 ; a 16 bit value (AND &FFFF would have the MOV R3,R3,LSR #16 ; same effect...) FLTD F2,R1 ; sixteenbit.third -> FP register F2 FLTD F0,R3 ; sixteenbit.first -> FP register F0 MUFD F1,F2,F0 ; FP register F1 is F0 multiplied by F2 FIXZ R2,F1 ; FP result -> ARM register R2 (round towards zero) STRB R2,[R13,#2] ; save the first byte as sixteenbit.second+0 MOV R3,R2,LSR #8 ; shift the result value eight bits to the right STRB R3,[R13,#3] ; save this shifted byte as sixteenbit.second+1 Does it flag an overflow? No, it does not. But then, I as the programmer, have asked the compiler to take a potentially large value (20 bits of fraction, 11 bits of exponent) and stuff it into an integer value capable of storing only values from 0 to 65535. Technically the behaviour is “implementation defined”, but it’s been a while since I’ve seen a compiler that didn’t just throw away the data too large to fit…

May 23, 2018 10:24am Rick Murray (539) 13851 posts	Of course, one can arrive at code better suited to the Pi’s ARMv7 processor by using the `-arch 7` command line option… Better? BETTER? Note the glaring absence of any multiply command. LDR R1,&000080E0 ; load &AAAA ADR R0,&000080E4 ; (point to printf string) LDFD F0,&000080F4 ; LOAD PRECALCULATED FP VALUE! STRH R1,[R13,#-8]! ; save 16 bit value [sixteenbit.first] MOV R2,R1,ASR #1 ; shift it one place to the right STRH R2,[R13,#4] ; save 16 bit value [sixteenbit.third] LDR R2,&000080FC ; load PRECALCULATED sixteenbit.second value! LDRH R3,[R13,#4] ; read sixteenbit.third STRH R2,[R13,#2] ; store sixteenbit.second STMDB R13!,{R2,R3} ; mess with stack (for coming printf) LDRH R3,[R13,#8] ; read... sixteenbit.first? STFD F0,[R13,#-8]! ; oh, and while we're there write out the ; FP value and move stack pointer again... ;-) This is, clearly, when the compiler decides to troll the programmer.

May 23, 2018 10:26am Andrew Conroy (370) 740 posts	A compositing window manager would make a nice option. Am I the only one who keeps reading that as “A composting window manager…”?

May 23, 2018 10:32am nemo (145) 2556 posts	Well, the moment the Wimp gains the ability to disperse tasks among cores (as it surely should to make best use of the hardware) is the moment that one ceases to be able to assume that each task will be run in rotation. Let’s not have that discussion again. Just like Windows, the entire Wimp desktop protocol must be single threaded. Otherwise if you have two running programs that can load text files then when you double-click one you have no idea which program will load it… and if you double-click again it might open in the other too. I’ll say it again and hopefully for the last time: Task messaging/polling must be single threaded.

May 23, 2018 11:15am GavinWraith (26) 1563 posts	An important requirement of any programming language is that it should enable the programmer to have an adequate mental picture of its operational semantics. What adequate means obviously depends on what the programmer is trying to achieve, and to what extent any given language satisfies this is up for debate. But if our hardware is now so complex that only a handful of people understand it, then we are in trouble. The classic SICP book makes great play of the notion that programming languages are not just for instructing machines; they are also for describing algorithms to humans. Functional programming has been around since LISP, but after Backus’s Can programming be liberated from the von Neumann style? in 1978, their advantages for multiprogramming have been recognised. There seems to have been a wide gap between academia and commerce about this (with some honorable exceptions). Maybe the prevalence of mult-core processors will force a greater rapprochement?

May 23, 2018 2:09pm nemo (145) 2556 posts	any programming language…should enable the programmer to have an adequate mental picture of its operational semantics… But if our hardware is now so complex that only a handful of people understand it, then we are in trouble. These two statements are orthogonal – the problem with C is that its semantics are weirdly compromised by… well I was going to say early hardware, but it was actually executive decision by Dennis Ritchie. As long as the people who wrote your compiler understand the hardware, the only need for the author to understand it too is a degree of optimisation or ‘mechanical sympathy’. But when the compiler (for an appropriate language) is free to decide whether to use arrays of structures or groups of arrays (or a combination as appropriate) then the degree of mechanical sympathy required by the author is massively reduced. By mechanical sympathy I mean awareness of things like cache lines, cache sizes and behaviour, structure packing etc This is a good read on undefined behaviour.

May 23, 2018 2:55pm Steve Pampling (1551) 8172 posts	Otherwise if you have two running programs that can load text files then when you double-click one you have no idea which program will load it… and if you double-click again it might open in the other too. I’m being dim aren’t I? I thought that was down to the definition of the appropriate variable: `Alias$@EditType_FFF : Run HostFS::HostFS.$.Utilities.!StrongED.!Run %*0`

May 23, 2018 3:05pm Jeffrey Lee (213) 6048 posts	I believe there’s a Wimp message which gets sent around, prior to the system falling back to the environment variables. So if you have multiple running apps which are capable of responding to that message (e.g. Edit and StrongED), and a multi-threaded Wimp decided to deliver the messages in a non-stable order, you could easily get different behaviour each time you tried opening a file.

May 23, 2018 3:14pm Steve Pampling (1551) 8172 posts	So if you have multiple running apps which are capable of responding to that message (e.g. Edit and StrongED), Observed behaviour is that where a text editing app has startup config to set the variable the last one to load, and therefore point things their way, loads the file.

May 23, 2018 3:38pm Rick Murray (539) 13851 posts	Otherwise if you have two running programs that can load text files then when you double-click one you have no idea which program will load it How does RISC OS do it? Offer the file to every interested app (from the beginning) until one (usually the first loaded) accepts it? Why does this require messaging to be single threaded? Just like Windows, the entire Wimp desktop protocol must be single threaded. “The system maintains a single system message queue and one thread-specific message queue for each GUI thread.” [https://msdn.microsoft.com/en-us/library/windows/desktop/ms644927(v=vs.85).aspx] That said, Windows works differently to that. Like with RISC OS, files are “associated” with an application. I’m running Notepad, I double click a text file, Notepad++ loads it, as that’s what’s been told to deal with text files. On RISC OS, it’s more interesting. Load Zap. Load StrongEd. Load a text file (it’ll open in Zap). Quit Zap and StrongEd. Double click on a text file again. What loads it? [hint, it isn’t Zap] Indeed, on my system in Apps I have Edit, StrongEd, and Zap. Zap is the default by virtue of being the last one “seen” by the filer and hence the last one to mess with the RunType values. But if StrongEd was loaded then it would get the file as offering it around tasks happens before the system looks to see what could handle the file. I’m sure you’re all used to this – what application directories you open can (and will) affect what handles JPEGs, MP3s, and maybe even HTML… By mechanical sympathy I mean awareness of things like cache lines, cache sizes and behaviour, structure packing etc I’m guessing that this is due to the age of C and the fact that such things didn’t exist back then? That said, there’s the perennial problem of endianness, how big an “int” actually is, and structure packing alignments. I learned that hard when making some command line stuff in C work on DOS as well as RISC OS – an int was 16 bit, structs aligned to two byte boundaries, everything “back to front”; other then that it worked. :-) and a multi-threaded Wimp decided to deliver the messages in a non-stable order, That’s easy enough to mostly fix. There will, by necessity, be a global list of tasks, even if they’re running at different speeds on different cores. Simply offer the file to each in turn as their poll turn arrives, skipping over blocked/busy tasks.

May 23, 2018 7:42pm nemo (145) 2556 posts	Steve partially remembered: I’m being dim aren’t I? I thought that was down to the definition of the appropriate variable Double-click on file Filer broadcasts DataOpen with filename and filetype If that bounces (nothing running has acknowledged the message) Filer uses `Alias$@RunType_###` Some programs that ask others to edit a “file” use a very similar protocol by Jason Williams called the External Edit Protocol, which I think `Alias$@EditType_###` is the fallback for. I was never very convinced by the need for it, I’ve never implemented it. Rick continued to Not Get It Simply offer the file to each in turn as their poll turn arrives, skipping over blocked/busy tasks. Run Zap and StrongEd Double-click text file Zap loads it Double-click text file Zap happens to be busy StrongEd loads it The protocols upon which the familiar RISC OS Desktop experience are built require a single-threaded message polling system. If one had filetype registration, it would be possible to reduce this to subsets for particular messages, but the principle stands. There are three classes of Wimp messages: Broadcast negotiations Broadcast notifications Direct messages The negotiations (of which DataLoad/DataOpen is but one example of many) must be delivered in a static order as Jeffrey has pointed out. Notifications could be randomised, but the Wimp would have to keep a record of every busy task still to see a particular message before the sender could continue. Direct messages require strict serialisation between the parties… which may not be limited to two. Adding knowledge of the type of each kind of message to the WindowManager, and the complex handling necessary for the various types, all to achieve the perceived advantage of allowing Wimp programs to spend time ‘busy’ in their main thread, achieves nothing. To quote the old joke, “If I was going to go there, I wouldn’t start from here ”.

May 24, 2018 6:07am Rick Murray (539) 13851 posts	Rick continued to Not Get It Rick actually did get it, given the scenario you posted pretty much matches what I summed up with my one line… The protocols upon which the familiar RISC OS Desktop experience are built require a single-threaded message polling system. So how then might this expand to a future scenario where the Wimp can spread tasks around cores in order to make best use of the available hardware? A scenario where different tasks will be running on different cores concurrently, and quite likely at different speeds. Or does the current system pretty much preclude this? That was the question that I was hinting at.

May 24, 2018 8:27am John Sandgrounder (1650) 574 posts	Southport Sailing Club is using RISCOS to score the Junior 12 Hour Race which is held on the last Saturday in June each year. Each boat carries a small GPS Tracker which reports its position to a RISCOS Server every 30 seconds (and more frequently at the end of each lap). The GPS server then reports each completed lap to the race scoring program (also RISCOS). Results are calculated live and updates sent to the Race Scoring website (Raspbian Apache) every minute. The GPS server and race scoring are written in BBC Basic. The results of the 2017 race and the corresponding GPS Tracks for each boat on every lap can be seen by following the 12 Hour Race – Race Results Site link on the Southport Sailing Club website. The scoring for all but the first 3 of the linked previous 30 races was also done with same RISCOS program. GPS Tracking was introduced in 2017 (after a few years of experimentation). All of the software and website is on Raspberry Pi.

May 24, 2018 8:30am nemo (145) 2556 posts	The short answer is: You can’t spread tasks around different cores, but you can spread parts of tasks around different cores. There are many analogies – Web Workers in JavaScript, Pool & Map in Python, Channels in Rust – but we already have a suitable RISC OS model, at least at the low level – The Tube (or Hydra I suppose). Writing multithreaded code (using pthreads for example) is hard (in general), because it is very easy to do the wrong thing – it’s just a length of rope, after all. The multithreaded infrastructure I have created was based on message handling and entity lifetime management in order to insulate programmers from the low-level stuff. Exactly the same idea as the language models I just listed. They all, in effect, allow ‘work’ to be declared in an abstracted form, which the OS/Scheduler can then choose how best to implement and distribute – whether serially on a single core, or concurrently on a proportion of the available cores, or through a thread-pool system on hyperthreaded cores. Those ‘workers’ (to use the JavaScript terminology) have much greater restrictions on their use of ‘the’ OS… and this is why I cite The Tube as the obvious parochial model. The task is the ‘host’, the worker(s) are the ‘Tube(s)’. Communication is down a ‘channel’ (to use the Rust terminology) – this is precisely what “The Tube” refers to – and is buffered so that asynchronous worker messages are queued for the ‘host’ (the foreground task). Anyway, the details aren’t important, but the scale of the parallelism is – it should be ‘bits of work’ that are distributed, not ‘Tasks’.

May 24, 2018 12:35pm Steve Pampling (1551) 8172 posts	The short answer is: You can’t spread tasks around different cores, but you can spread parts of tasks around different cores. So if the WIMP is a task of the basic OS then you can spread bits around?

May 24, 2018 12:50pm Jeffrey Lee (213) 6048 posts	The short answer is: You can’t spread tasks around different cores, but you can spread parts of tasks around different cores. … Anyway, the details aren’t important, but the scale of the parallelism is – it should be ‘bits of work’ that are distributed, not ‘Tasks’. “You can’t spread tasks around different cores” sounds like you’re saying that only one task should be running at once, which sounds a bit barmy to me. I’m assuming this was just a poor choice of words and the second statement is the more accurate one?