Idea for discussion: Rewriting RISC OS using a high-level language
Terje Slettebø (285) 275 posts |
Hi all. I love RISC OS, and would like it to succeed, and I’d also like to contribute to its development. However, with much of the code being written in ARM assembly, I find much of it almost impossible to understand, and also very hard to work with. I’m no stranger to working with ARM assembly, but let’s be honest with ourselves: in general, all development takes much longer when all you can use is a low-level language like assembly… I’ve been struggling with this issue, and one thing I’ve been thinking is that, as RISC OS consists to a large degree of modules, it should be possible, in theory, to rewrite one module at a time in a higher-level language, like C/C++. I know that this would be a large project, but again, it should be approachable on a module-by-module basis. Before starting on any rewriting effort, I’d like to know what you, the community, think about this approach. What is largely preventing me from working on RISC OS is that much of the source code is in assembly, and I suspect that’s something that prevents quite a few other potential developers from contributing, too. After all, one major advantage of Unix (and later Linux) is that it was written in C, making both development and porting easier. My intention with this isn’t to enable RISC OS to run on another platform, but – if much of the OS ends up as C/C++ – that could be a possibility in the long run as well. Comments? Regards, Terje |
Jeffrey Lee (213) 6048 posts |
I agree that using a high level language would make it a lot easier to write and maintain code. There’s nothing worse than trying to decipher someone else’s poorly documented, tightly-optimised assembler code to work out where a bug lies or how to expand it to add new features. It would probably be relatively straightforward to translate most of the assembler modules into C, since most of them are small and quite simple. Unfortunately, it’s not those simple modules which are holding the OS back – it’s the core modules, things like FileCore, the Wimp, the kernel, etc. Converting those modules will be a lot harder. Rewriting them from scratch would run the risk of introducing many bugs and incompatibilities. But on the other hand if you just did a literal translation of the assembler into C, one function at a time, then the end result could be just as bad to work with as the code that you started with. So I don’t think anyone’s going to have any complaints if you go round converting the simpler modules, but once you get to the core components you might start to make ROOL a bit worried :-) And if you wanted to use C++ then we’d obviously have to make sure the OS can be compiled on something a bit better than Norcroft :-) |
Trevor Johnson (329) 1645 posts |
Hi Terje
RISC OS is listed on the OSDev wiki, if that helps attract coders.
Wouldn’t speed-critical bits need to be kept in assembler? Do you have in mind a suitable (relatively small and uncomplicated) module that this could be tested on, in the first instance? |
Terje Slettebø (285) 275 posts |
Hi Jeffrey and thanks for the response. :) Yeah, I was thinking of some of the larger stuff, where we may want to improve the system… Also, I fully agree with you about the risk of rewriting, and the disadvantages of a “literal translation”… I’m not a stranger to rewriting/reimplementing systems, though: at work, I rewrote one of our systems, although it’s much simpler than RISC OS, in about a year (it’s written in PHP). Hardly any of the original code ended up in the rewritten version, so we had to be very careful about not losing functionality or introducing bugs, as you point out… Even though this was a major undertaking, it has proved to be a very successful one, as most of our income comes from this system, and the rewriting has enabled us to work efficiently with the code, drastically lowering the complexity of the resulting system. I agree that “writing ARM assembly in C code” would hardly be an improvement, so the idea of a reimplementation would be to create a new structure, making use of abstractions in higher-level languages, and reimplementing it so that it works the same way and keeps the same interfaces, but the implementation may be completely different. Regarding the compiler: yes, I would really like to use GCC for this… Still, might it not be possible to have some modules compiled using GCC, and others compiled using Norcroft, when building RISC OS? It seems to me that not all of it would need to be “ported” to GCC in order to build some modules with GCC? Fortunately, RISC OS is pretty well documented through the PRMs, so that most of the functionality of each SWI should be documented there (ideally all the functionality, but things like error checking may not be documented in full). |
Terje Slettebø (285) 275 posts |
Hi Trevor. About the licence: yeah, well, like I said, the primary motivation was not originally platform-independence, but more being able to work on the OS in a more efficient manner, and lowering the “barrier to entry” for other developers to contribute. Probably less and less would need to be kept in assembly code for performance, for the following reasons:
There could be some places that could use/need hand-optimisation, though (or “intrinsics”), like code that is used a lot and where the compiler hasn’t done a good enough job (SWI dispatch is pretty crucial, for example). I’d expect that whatever ARM assembly is still needed would be there to perform low-level tasks like processor initialisation, etc. No, I don’t have a small module in mind; I was thinking of some of the larger stuff, but I agree that it would be a good idea to test this on a very small module first. Fortunately, GCC (if we can reasonably use it here) can compile to modules. |
Trevor Johnson (329) 1645 posts |
OK yes, I don’t think anyone’s going to be building and blowing a ROM image for use on anything older than a RiscPC! |
Trevor Johnson (329) 1645 posts |
On re-reading that section of the licence, perhaps this would be OK as long as the intention is use with ARM processors: “PROVIDED THAT such work is only intended to be used in conjunction with […] the ARM processor architecture.” But if others were subsequently to rebuild for other architectures, maybe there’d be nothing preventing them! |
Jeffrey Lee (213) 6048 posts |
I think the licence on GCC’s SharedCLibrary stubs is permissive enough to allow them to be linked into the RISC OS ROM image, but I’m not 100% sure. But regardless of that, there is one feature that Norcroft has which GCC currently doesn’t (to my knowledge) – the ability to statically link to the version of the C library that’s in ROM. Without this feature I don’t think it would be possible to use GCC for ROM modules, since being statically linked allows the ROM components to still function even after the ROM C library has been killed (e.g. if you softload a newer version during the boot sequence). So we’d either need the static link ability, or some other method of loading a new C library without introducing a period of downtime where no library is active at all. |
Terje Slettebø (285) 275 posts |
From this page it appears that GCC can statically link to the Shared C Library in ROM:
However, I’m not at all certain about how these “stubs” and linking work, so this would have to be investigated. |
Jeffrey Lee (213) 6048 posts |
Nope, that’ll be dynamic linking via the stubs.
The stubs are basically just a table full of branch instructions. This table gets initialised by the SCL when the program registers itself with it. As far as the compiler is concerned the stubs are real functions, but if you were able to call them before the SCL initialises them you’d just end up with a nasty crash. It’s fairly similar to how DLLs and similar things work in other OSes. Volume 4 of the PRMs goes into full detail on how it works, I think. I think the static linking may work just by linking against a special object file containing no code but the appropriate symbol definitions. If that’s the case then it should work with any compiler, although you’d probably run into problems because GCC now uses binutils/ELF instead of AOF… |
Terje Slettebø (285) 275 posts |
Thanks, Jeffrey. I’ll read up on this in the PRM, and have a go at it, starting with a simple module, as a way of examining these issues. If we are to get any major progress on things like these, I really feel we need to move to higher-level languages. |
Andrew Rawnsley (492) 1445 posts |
I certainly think making it more accessible is a good thing, but one must definitely remember that many chunks were written in assembler for performance reasons. Compared to running full Linux distros on ARM, RISC OS is ridiculously fast, both from a boot-up and a usability perspective. Having worked on the lubuntu distribution for the ARMini to try to get a less glacial desktop ARM Ubuntu distro, the difference compared to using RISC OS is still huge. RISC OS really is just that nimble (from a user’s perspective, at least). I think it would be a shame to backstep on that front. But… maybe hardware is now speedy enough that the benefits of hand-optimised code aren’t as significant? |
Terje Slettebø (285) 275 posts |
Hi Andrew. I agree with you that we should keep the “nimbleness” of RISC OS… That’s one of the things that appalls me about Windows: despite multi-core, multi-GHz processors, it may still take seconds from when you press the Delete button until you get the file deletion dialogue box… I mean, come on, what’s going on…? :) On RISC OS, when you click a drive icon, the directory viewer pops up instantaneously (and did so even on an 8 MHz ARM), while on Windows it may still take seconds… So we definitely should keep the smoothness of the system, even if we move to a higher-level language. Not everything in RISC OS is that smooth, though; filing operations come to mind, where, probably due to blocking on I/O, the system almost grinds to a halt when copying a lot of files, whereas the same load hardly taxes a Windows system… Fixing something like this is non-trivial, especially if you have to work with raw assembly code, whereas C/C++ is much more amenable to changes. If written carefully, I think a RISC OS written in C/C++ should be comparable in performance to today’s version. For larger pieces of code, a compiler may well do better than a human assembly programmer, as it can do more global optimisation. |
Uwe Kall (215) 120 posts |
Terje: What development environment do you intend to use? |
Terje Slettebø (285) 275 posts |
Hi Uwe. I was thinking of using GCC, since the Norcroft compiler only really handles C, and I’d like to be able to use C++ for this development. Jeffrey pointed out that using GCC could be problematic if the Shared C Library module goes missing at some point, but then again, with C++ you need more than the shared C library anyway. I also realise that people in the open-source community feel differently about C++: some like it, some don’t, so I’ve deliberately avoided mentioning a specific language in this discussion. I’d prefer to use C++ for the higher-level abstractions and larger standard library it offers. For example, if you were to reimplement the Wimp module, you’d need some way of queuing messages for the active tasks. Furthermore, they need to be delivered in priority order. Well, what do you know: the standard C++ library already has a priority queue which should do nicely for the job… :) RISC OS is a large system, so we should avoid having to “reinvent the wheel” if we can use components like this, where it makes sense… |
Jeffrey Lee (213) 6048 posts |
Although I don’t have many complaints about using C++, I’d be very wary about using STL. In my (somewhat limited) experience it’s very hard to control its memory usage. Specifically, you can’t easily predict/control when it allocates or frees memory, which can result in a lot of fragmentation, and that’s something that could seriously impact the long-term stability of the OS. Plus some STL implementations don’t scale very well to multi-threaded/multi-core systems (they might just have one or two global mutexes which they use to control access to everything), so if/when we start using it on multi-core systems we might not get the performance we want. |
Terje Slettebø (285) 275 posts |
Hi Jeffrey and thanks for your feedback. Naturally, if the usage area is not a good fit for STL, we might need to use other components, or make our own. However, before we decide to “roll our own” (or use components not in the standard library), I think you’ll agree that we should first see if there really is a problem, because I have the impression that many have avoided STL due to perceived performance problems that may have turned out to be either outdated experience or rumours. Since a lot of work has gone into the design and implementation of STL, as well as a lot of usage experience with it, it’s rather unlikely that individuals, on their own, can make substantial improvements to the components in their area of applicability (i.e. general usage), and you may well end up with something suboptimal or buggy. However, let me address your specific concerns:
Growth behaviour is actually constrained by the standard (amortised constant-time insertion, which implementations typically achieve by growing geometrically, e.g. doubling the capacity on each resize), and there are a variety of containers to choose from with different characteristics. Furthermore, you may call reserve() to reserve specific amounts in advance, to reduce or eliminate the need for any new allocations. If you still need better control than this, it’s possible to use “custom allocators”, and although they have been criticised as hard to work with, it’s a possibility.
Given that C++03 (the last C++ standard before the recent C++11 ratification) has no support for concurrency, it’s not surprising that the standard library (of which STL is a part) has no such support either… :) There may however be standard library implementations with support for concurrency, as you mention, and how they work is then entirely up to the implementer. Since any locking policy is up to the implementer, it could be anything, but also, since the STL is standardised, there might exist other implementations with a locking policy more appropriate for us. In any case, as long as RISC OS currently has no support for concurrency, I don’t really see that this would be an issue at this point, in particular because there are no (before C++11) standard concurrency primitives, like locks, in the standard library, and RISC OS has none at all. I’m not sure what you’re thinking of as an alternative, but I see three options: 1. Use the standard library 2. Use another existing library 3. Write our own Given the situation we have today, is there any advantage of option 2 or 3 over 1? |
Jeffrey Lee (213) 6048 posts |
I’m not sure what you’re thinking about as an alternative, but I see three options: Personally I’d lean more towards option 3, so that we only have ourselves to blame if the library isn’t any good :-) But for the initial development phase, I guess there’s no harm in using STL. Then once everything is working we can start looking at how good things are in terms of performance/memory usage. |
Terje Slettebø (285) 275 posts |
The GNU License FAQ indicates that this is the case. Would there be any reason to believe that the stubs have a different license to the rest of the library, if they are part of GCC? |
Uwe Kall (215) 120 posts |
I think C++ is the most appropriate tool/language too, even with STL, though my question concerning the development environment was also about debugging, profiling, editing, searching… |
Terje Slettebø (285) 275 posts |
Ah, well, so far I’ve just compiled at the command line, and used a regular text editor, like StrongED. Also, as I mentioned in another thread, compilation goes drastically faster if you copy the !GCC Debugging and profiling? Who knows. :) I tend to get by with print statements for debugging. Do you have any recommendations for tools? |
Andrew Rawnsley (492) 1445 posts |
At the risk of appearing a Luddite, I can’t help feeling that moving away from Norcroft, when it has served us so well in the past, is a little too progressive. Whilst Norcroft’s C++ support is limited, it isn’t non-existent (several well-known R-Comp programs were/are C++ Norcroft apps), and encouraging development in C rather than C++ is, I think, still worth championing (in the RISC OS world, at least). |
Terje Slettebø (285) 275 posts |
Hi Andrew. I certainly respect your point of view. :) While Norcroft may be a fine C compiler, it’s a lousy C++ compiler, and I feel that’s the crux of the matter, at least for me… Its C++ support dates back to CFront (Stroustrup’s early C++ implementation, decades ago) and is barely above “C with Classes”: it implements an early form of templates, but no exceptions, for example. Therefore, to put it bluntly, Norcroft isn’t usable for C++ development, at least if you want to use modern C++, including advantages like resource management using RAII, etc. Even if you stick to “early C++”, the generated code (at least the last time I looked at it, about 15 years ago) was pretty suboptimal. For example, if you used classes/structs, it always passed them around as pointers to the object, even if the complete object could fit in a single register. For me, I have no problem with people developing applications using C with Norcroft (or GCC for that matter), and it’s probably a fine compiler for that. However, reimplementing RISC OS using C just doesn’t hold any attraction for me… I’m sorry to say so, and I have nothing against C as such: it’s a beautiful little language (at least C89; they have since added things that seem to go against its spirit, like support for complex numbers directly in the language), but it’s basically a “portable assembler”, and as such it does its job in an admirable way. For me, however, its facilities for abstraction and design are just too poor for larger applications (or modules). All you have is basically functions and structs, and, yes, pointers. One of the major problems with C for me is that if you want to do proper error handling, you should check every return code, and doing that, the code tends to become a mass of nested ifs, completely obscuring the application logic.
Because of this, most people tend to omit a lot of error checking (“This function is unlikely to return an error…”), and as a result we end up with a lot of buggy, brittle applications (witness how Windows tends to “silently crash” when it runs out of some resource, for example). This is one of the things exceptions solve: you can’t forget to check a return code. In any case, I suggest we let the community decide. If someone wants to reimplement a module in C, that’s perfectly fine by me. It certainly beats assembly code, at least (unless the code is horrible… :) ). Likewise, if someone would like to take a stab at doing something in C++, I think we should let them, too… If nothing else, we gain experience with several approaches this way. Yes, C has served us well in the past, as has assembly code (I’ve been developing extensively with both, especially assembly code, but e.g. extAOF, the AOF front end to extASM, is written in C), but the world has moved forward since these languages were created, and I, for one, am not tempted to let go of the advantages I experience with a language like C++. Lots of ink (electrons?) has been spent on C vs C++, and I respect the preference of those who prefer C, so I hope this won’t turn into yet another “language war”. Live and let live is my attitude. :) |
W P Blatchley (147) 247 posts |
So, so, so true! I’ve often mused on this very problem. C gurus among us, is there not a neat way around this? |
Terje Slettebø (285) 275 posts |
I find it quite amusing that an innocent-looking program like this: #include <stdio.h> int main() { printf("Hello, World!"); } is actually “wrong”… Did you know that printf() actually returns a negative value if it fails…? And how often do you test the return code of printf…? I thought so… ;) It’s kind of reminiscent of Exception Handling: A False Sense of Security, which was something of a wake-up call in the C++ community… It took the community many years to meet the challenge posed by the author, but we now know how to write exception-safe code (code that doesn’t leak resources in the presence of exceptions), and although it requires skill to do that where you handle “raw” resources, you can generally forget about it in the rest of the code by using the RAII idiom, ensuring that everything has an “owner”.