Copilot for RISC OS coding
Julie Stamp (8365) 474 posts |
I’ve tried asking Copilot to make simple Wimp programs in BBC BASIC. It gets most of the way there, but it does get things wrong, like inventing a SWI Wimp_MoveWindow or missing TASK from the Wimp_Initialise arguments. Many people using Copilot for programming on other platforms have said that it gives them a tremendous boost, so I was wondering what we could do to make Copilot’s code more correct when programming on RISC OS. I think the ways we have to teach it about RISC OS coding are
So, does anybody have insight into how to approach this, or has anyone even had success in generating good code? I’d be particularly interested in hearing from anybody who has Visual Studio + Copilot or similar, and whether it does something better than the web app. Or does anyone know of another situation where people have managed to improve chatbot output for their work? |
Jon Abbott (1421) 2651 posts |
You’d probably have better success using an open-source LLM and exposing it exclusively to RISC OS source code. As for Copilot, I tried really hard to use it this year when developing for RISC OS and found it lacking. It fails at even the simplest questions about PC standards that were defined long ago. For example, questions about GPT and FAT come back with incorrect results, and quite often it just recites one or two sites, as there’s not enough information out there for it to come back with an accurate response. I do use Visual Studio for RISC OS development, so if that’s being fed back to Copilot, perhaps it might improve over time. |
Paolo Fabio Zaino (28) 1882 posts |
@Julie One of the main ideas behind creating an organization on GitHub was to help train Copilot. GitHub is the primary resource Microsoft uses for this training.¹ Creating an organization contextualizes the source of knowledge, making things a bit faster. However, it’s not a must-have. Personal accounts are also fine, as long as they contain repositories with enough context and not just raw source code. Contextualizing an organization means naming it something like “RISC OS Community” or “RISC OS Code Repository” rather than “Underground Coding” or “John’s Awesome Organization.”
I have developed several proprietary models for code analysis that are not open-source. However, it’s entirely feasible to have LLMs that fully understand RISC OS and all its programming languages, and even assist users with daily tasks. Here are my suggestions on how to tackle this:
- Upload more documented example code on GitHub. You can use my organization (which is well-contextualized), your own account, or a new well-contextualized organization.
- Write your code as a set of simple examples and document them thoroughly. Be verbose, and imagine you are writing for a complete beginner who understands ANSI C but has no idea how it works on RISC OS. Here is an example of program context: https://github.com/RISC-OS-Community/GNUASM-Tutorials/tree/main/src/PlayingWithCapsLockLED Here is an example of verbose comments: https://github.com/RISC-OS-Community/GNUASM-Tutorials/blob/main/src/PlayingWithCapsLockLED/s/FlashCapsLock
- Write more articles explaining RISC OS. Again, make sure they are verbose and provide context. Here is an example of an article that AI “likes” a lot: https://paolozaino.wordpress.com/2020/08/07/risc-os-introduction-to-the-arm-aif-object-file-format/ Note the context in the URL, the title, and the article’s organization, including the table and the C struct at the end. Proper organization into chapters is also crucial.
To effectively train an AI, many elements must be in place. I hope this makes sense. Gerph has done this really well with his BBC BASIC code examples here: https://github.com/gerph/riscos-examples
Regarding BBC BASIC, there is a minor but significant issue: Copilot seems to have been trained on many examples of BBC BASIC for Windows, which isn’t ideal for RISC OS. This creates noise, just as with C, where you must always be verbose in the prompt. As Jon suggested, building a full infrastructure from the ground up might be a faster route to success, though it could incur costs once opened to general users. HTH
¹ My personal vision for RISC OS is to make it more useful and accessible to everyone interested. AI fits this goal well, so I am happy to help if needed. |
Paolo Fabio Zaino (28) 1882 posts |
Here is an example of how to use the material I shared above, using Copilot:
Prompt: "Using the information presented in the article in the following URL, write a simple ANSI C program that does a minimal AIF file header analysis. https://paolozaino.wordpress.com/2020/08/07/risc-os-introduction-to-the-arm-aif-object-file-format/"
Response:

```c
#include <stdio.h>
#include <stdint.h>

/* 32-word (128-byte) AIF header, read straight from the file. AIF is
   little-endian, so this direct read assumes a little-endian host. */
typedef struct {
    uint32_t BL_decompress_code;  /* 0x00 BL to decompression code, or NOP */
    uint32_t BL_selfreloc_code;   /* 0x04 BL to self-relocation code, or NOP */
    uint32_t BL_zeroinit_code;    /* 0x08 BL to zero-init code, or NOP */
    uint32_t BL_imageentrypoint;  /* 0x0C BL to the image entry point */
    uint32_t swi_OSExit;          /* 0x10 SWI OS_Exit */
    uint32_t size_ro;             /* 0x14 read-only area size */
    uint32_t size_rw;             /* 0x18 read-write area size */
    uint32_t size_debug;          /* 0x1C debug area size */
    uint32_t size_zeroinit;       /* 0x20 zero-init area size */
    uint32_t debug_type;          /* 0x24 debug type */
    uint32_t image_base;          /* 0x28 address the image is linked at */
    uint32_t workspace;           /* 0x2C workspace */
    uint32_t reserved[4];         /* 0x30 in later versions, the first word holds the address mode (26/32) */
    uint32_t zeroinitcode[16];    /* 0x40 zero-init code area */
} AIF32HeaderBlock;

void print_aif_header(AIF32HeaderBlock *header) {
    printf("AIF Header Analysis:\n");
    printf("Decompress Code: 0x%08X\n", header->BL_decompress_code);
    printf("Self Reloc Code: 0x%08X\n", header->BL_selfreloc_code);
    printf("Zero Init Code: 0x%08X\n", header->BL_zeroinit_code);
    printf("Image Entry Point: 0x%08X\n", header->BL_imageentrypoint);
    printf("OS Exit SWI: 0x%08X\n", header->swi_OSExit);
    printf("Read-Only Size: %u bytes\n", header->size_ro);
    printf("Read-Write Size: %u bytes\n", header->size_rw);
    printf("Debug Size: %u bytes\n", header->size_debug);
    printf("Zero Init Area Size: %u bytes\n", header->size_zeroinit);
    printf("Debug Type: %u\n", header->debug_type);
    printf("Image Base: 0x%08X\n", header->image_base);
    printf("Workspace: 0x%08X\n", header->workspace);
    // Additional fields can be printed as needed
}

int main(void) {
    AIF32HeaderBlock header;
    FILE *file = fopen("example.aif", "rb");
    if (!file) {
        perror("Failed to open file");
        return 1;
    }
    /* A file shorter than the 128-byte header cannot be AIF. */
    if (fread(&header, sizeof(AIF32HeaderBlock), 1, file) != 1) {
        fprintf(stderr, "File too short for an AIF header\n");
        fclose(file);
        return 1;
    }
    fclose(file);
    print_aif_header(&header);
    return 0;
}
```

Obviously, to do more advanced examples, it will need more training knowledge than that article!
So, to reach the point of using AI to write Wimp applications or to convert AArch32 to AArch64 (I have seen someone trying this), it will need far more knowledge than I fed it in this example. That knowledge also has to be written following the same kind of structure, so it is quite an endeavour. |
Steve Pampling (1551) 8170 posts |
Ah, Copilot, the item that gave our EUC guys such fun. The new active integration with Edge seems to intermittently kill or hide the taskbar, but not if you move the taskbar to vertical mode. |
Paolo Fabio Zaino (28) 1882 posts |
Steve, let’s please stay on topic. Julie has made a very good and interesting point that deserves more digging IMHO. Hopefully people at ROOL are watching, because, for example, reorganising ROOL’s Wiki to be AI-friendly could help quite a bit!
To your point, you are confusing (apologies if this term sounds too aggressive) two different things:
- Copilot the AI model and its cloud-side infrastructure, which is what we are talking about.
- A simple client developed to integrate with a browser.

For instance, I have my own RISC OS client for my own AI models, and the two are completely separate pieces of software. A client usually uses a REST API client to talk to a REST API server that controls the AI model’s data processing. So, technically, there are literally three different pieces of software in the stack you’ve mentioned. HTH |
Rick Murray (539) 13840 posts |
While assisting with RISC OS coding might be useful, I’d wait until these things are even slightly capable of recognising and understanding these little things known as software licences. While RISC OS is permissive, other things out there are not. The current situation of ignoring the problem, sticking fingers in ears, and saying “lalalala” is not a workable solution.
I don’t have Copilot, but I’ve tried getting ChatGPT to write some BASIC. It seems, sadly, to be a bit of a mishmash of “almost BBC BASIC” and the odd bit of random other BASICs. I’ve had it say “PAPER” to set a colour. Wasn’t that Oric BASIC? Oh, and don’t get me started on the absolute gibberish it spits out when it tries to do a VDU command. I think, with a legacy system such as RISC OS, there will always be a certain amount of noise unless an LLM is specifically trained on only RISC OS sources. |
Paolo Fabio Zaino (28) 1882 posts |
Indeed. One thing to mention, however, is that training a Large Language Model solely on source code written for RISC OS will take a long time. To be “independent” and write code on demand, the model needs a substantial number of examples.
As a slightly off-topic comment, this is why I don’t like the term AI: it’s not “intelligent”, it’s statistical modeling and correlation. For example, the repeated use of “RISC OS” (in the material I shared above) helps create a significant statistical correlation. Therefore, to become useful, it requires many contextualized examples that solve the same problem in various ways. Apologies for the slightly off-topic comment; hopefully, it helps explain why Copilot can’t handle RISC OS well yet.
On a side note, I have written over 32 “small” programs in ObjASM for the next videos on the channel, because the community voted ASM the most requested language to cover in the videos. They are all verbosely commented and cover all aspects of coding in ASM on RISC OS. Once the videos are recorded, I’ll upload all the sources to the RISC OS Community Org on GitHub. This will certainly help Copilot a bit. I suggest that volunteers rewrite them in BBC BASIC, Python, etc., when they are available. I’ll also need to do the same in C++, as the community voted it the second most requested language for the videos. |
Rick Murray (539) 13840 posts |
🤯…it’s 2024 people! The OS sources are really the only place you ought to be seeing assembler these days. 😉 |
David J. Ruck (33) 1635 posts |
Don’t use automated word guessers to write code. If you can program, do it yourself; if you can’t, do something else. The last thing RISC OS needs is the low-quality, buggy crap I see lazy ‘coders’ thinking they can get away with churning out using AI on other platforms. |
Rick Murray (539) 13840 posts |
I’m still waiting to see a translation of assembler to C that isn’t a painfully literal mess that is, arguably, less clear than the original assembler (because any half-competent C coder just wouldn’t write code like that). Maybe one day we’ll have an AI bot capable of understanding blocks of code well enough to create equivalents. That day is not today. |
Paolo Fabio Zaino (28) 1882 posts |
Why? Does this mean you want people to manually write the famous unit (and integration) tests that they often neglect anyway?
Also, weren’t you the one who criticized my very reasonable and logical post in the thread where people were proposing to hack a portion of RISC OS to make it 64-bit, even though we still do not have an official toolset that supports AArch64? If I remember correctly, you said something like “everyone is entitled to their opinion,” right? Maybe this is the same case?
We have been automating repetitive and boring tasks in software engineering for ages, and if Copilot can help write these mundane portions of code, why not use it? Using such tools doesn’t mean people don’t know how to code. By that logic, people using C wouldn’t know how to code because C abstracts away repetitive portions of assembly. Similarly, those coding in Java wouldn’t know how to code because Java abstracts away repetitive and tedious parts of C. This reasoning seems flawed to me. |
Colin Ferris (399) 1813 posts |
AI created Dr Who ??? |
Paolo Fabio Zaino (28) 1882 posts |
Tell me about it! XD Well, the code is done now, so hopefully, once the video is released, that will (finally) put an end to this! (I know I am dreaming here lol)
I hear you, my friend, but for the (hopefully) last time, let me repeat that the issue you’re mentioning isn’t caused by AI itself. It is caused by humans: the same ones who still copy our open-source code and claim they wrote it, which is what you imply AI does. How so? If you read my previous comments¹ carefully, word by word, you’ll notice something crucial: AI doesn’t learn anything by itself. It’s not like OpenAI (or others) just downloaded some code and fed it to an AI. They actually hired people to categorize and classify content, including code, so that the AI could use statistical modeling and correlation.
When you hear Musk, Gates, and others talking about apocalyptic scenarios where AI is going to wipe us out or is as dangerous as the atomic bomb, you’re hearing from people who have invested billions of dollars in AI research. They are creating hype, far beyond what AI is currently capable of, to secure a significant return on that investment.
I am writing an article for my blog that will explain, with mathematical analysis, how the so-called AI evolutionary model actually works. AI doesn’t do anything extraordinary. It’s just a piece of software and a statistical model that follows patterns, provided the model is built with enough well-classified examples. This is why self-driving cars are still not a common reality, and why many other scientific advancements are still not mainstream. Yet we are constantly bombarded by famous people telling us that X, Y, and Z are the future. These are essentially research-funding ads. The hope is that someone working in the field will come up with a breakthrough to overcome the current plateau, much as GPT models, combined first with GPUs and now with NPUs, helped AI research progress to the next stage. I hope this makes sense.
I have tried to use an English corrector on my post, but in case it’s still unclear, here is a quote from one of the many sources about how GPT-3 works: “AI models like GPT-3 do learn from vast amounts of data, but the data curation process often involves human oversight to ensure quality and relevance. It’s not just about downloading data but about preparing it in a way that the model can understand and learn from it effectively.” So it’s humans who have decided to break software licenses, not AI.
¹ Yes, I know my English is boring and I tend to put way too many details in what I write; I apologise for that. |
Paolo Fabio Zaino (28) 1882 posts |
No, Colin. As we probably all know well, Dr. Who is a member of the Time Lord society, and the Time Lords were created by Sherwin (or so we were told in an interview a long time ago) ;) |