Automated testing and CI
Charles Ferguson (8243) 427 posts |
Just over a year ago I released the JFPatch build service. Then in October I renamed it to the RISC OS build service, and in November announced its capabilities. But in that time, I’ve not heard from anyone using it. I don’t have any logging or recording of the interface, so… maybe it’s been useful to people, but I don’t know. I’d like to understand how the build tool has fared with developers, and what degree of interest there is in automated testing, given the seeming lack of any interest in the build service. If you’ve read any of my comments on various forums in the past, you’ll probably know that I care a lot that systems are designed well, are built for robustness and compatibility with other systems, are tested well, and that developers are able to get useful information out when things fail. This is, after all, the goal of my primary project – RISC OS Pyromaniac. So… if you’ve got any comments on the build service, or on the way in which you approach automated testing on RISC OS, I’d be interested to hear from you. I’ve created a small survey to gather a little bit of information, but if you’d like to reply here, or to me directly at gerph@gerph.org, I would be interested to hear your comments. You can find the survey here: https://survey.gerph.org/index.php/613787 |
Paolo Fabio Zaino (28) 1882 posts |
@ Charles I have tried it, but the following kinda stopped me from exploring any further: there’s no WindowManager, and Wimp tasks don’t exist. While I totally love the idea and really want to use it on a more consistent basis, I have a few questions to ask:
Thanks in advance for your answers. |
Charles Ferguson (8243) 427 posts |
I think you’re mistaken if you think that those things make the build service useless for you. I think, though, that they are easy to resolve…

1) Regarding the things that the build service does not have: The build service is intended for building and testing the system, and for testing it’s useful to have the full capabilities. But the problems with applications largely don’t lie with the desktop integration (although that does happen) but with the code that underlies them. That means that testing in the Wimp isn’t necessary for the vast bulk of the testing you will write. The same applies to the graphics system, the sound system and the printer/serial. None of these are necessary for the vast bulk of the testing. Most of these exist in some form, sufficient to allow you to test that tools work properly, and that’s what the system does – provides the necessary interfaces so that you can do what you need, without having a full-blown system that makes it complex.

2) Regarding not having interaction: The build service is just that – it’s a service, not an interactive system, so you don’t want or need to have interactive components in it. If you’re thinking of it as a user-interactive system, you’ve missed the point of automated testing. Automated testing is intended to run unattended, without interactions. The purpose of automated testing is two-fold: providing tests that are repeatable, and tests that can be run without human intervention.

3) Regarding the fact that the lack of these things stops you going further: Testing – and this is the biggy that underpins all of this – is not just about running things in the desktop, and that’s where people need to change their mindset. I have over the years heard so many people say that it’s not possible to test their desktop application, or their module, because a) it’s complicated, b) it needs interaction or c) it is prone to crashing (or a combination). 
The issue of complication is a developer thing, and you need to split your tasks down into things that are simpler, and are testable. Almost always this is because when the code was written it was not done in a modular manner that allows for testing. So you identify the places where you have parts of the code that can be exercised independently without affecting the whole system. These parts can be reasoned about without reference to the whole, and you can therefore make better assumptions about the system. In code that was not designed modularly this will almost certainly mean that code that was mixed between dealing with interface operations and dealing with functional operations will need to be separated – that’s something that makes for more manageable, maintainable and testable code.

The issue of interaction is actually a system integration issue, and should not be a blocker to testing the parts that don’t need interaction. If you’re saying that your problem is that sometimes your tool fails for other people and you don’t understand why, then you need to go back to fundamentals – what do you know works? If you’ve manually tested your application, that gives you a little confidence, but only that what you think should happen actually happens. And if it’s used by someone else with different expectations, that quickly generates uncountable numbers of possibilities. So work the problem from things you /can/ test – test the underlying parts of the system. When you are testing the full system the hardest thing is that you’ll be testing only a small portion of the system that you know about and expect to work. You’ll rarely be testing the error paths, and you’ll rarely hit those hard-to-reach places that will be exercised when your assumptions are wrong. So stop thinking about testing the whole system as your first line of action, and start thinking about how you can make sure that the foundations upon which it is built are solid. 
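The interface/functional split described above can be sketched in a few lines. This is only an illustrative example – the names and the word-counting task are hypothetical, not from any real application: the functional core is a plain function with no UI or OS dependencies, and the interface layer is a thin wrapper around it.

```python
# Hypothetical sketch of separating functional operations from
# interface operations, so the core is testable without a desktop.

def word_count(text):
    """Functional operation: pure input -> output, trivially testable."""
    return len(text.split())

def on_button_click(get_editor_text, show_result):
    """Interface operation: only translates between the UI and the core.
    The UI callbacks are injected, so a test can fake them."""
    show_result("Words: %d" % word_count(get_editor_text()))

# The core can be exercised with no Wimp, no windows, no interaction:
assert word_count("hello there world") == 3
assert word_count("") == 0

# Even the wrapper is testable by faking the UI callbacks:
results = []
on_button_click(lambda: "one two", results.append)
assert results == ["Words: 2"]
```

The same shape works whether the wrapper is a Wimp event handler, a command line switch or a CI harness: only the wrapper changes; the tested core does not.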
Basically if you’re blocking yourself from testing because you cannot test user-interactable components, you’re thinking about the problem wrongly – you’re not wrong that that is hard, but that’s why you deal with the more tractable testing problems first. If you know that your foundations are solid, then the likelihood of the system as a whole falling over because of something odd that you missed in a low-level library is lower. And you can test those foundations by exercising the smaller and more constrained parts of the system – which, you’ll notice, is something that becomes possible as you reduce the complication of the system, as I mentioned previously.

The issue of the component being prone to crashing is usually the reason that you need testing, but that isn’t why ‘being prone to crashing’ is a problem. Invariably the reason for it crashing is obscure, and the manner in which it crashes is usually sufficiently fatal that you cannot get useful information out of it. This is heavily correlated with higher-level testing. The higher level that you test at – the more integrated the system you’re testing – the harder it is to work out what’s going on. More moving pieces mean that working out which part is failing is hard. More code paths to traverse means that it’s more likely that an early mistake is compounded by others the deeper in the code it occurs, and by the time the failure shows itself to the user or developer, so many things have gone wrong that you find the system to be unusable. And as we know, RISC OS isn’t very forgiving in that regard. In modules, this is worse – a failure invariably causes problems in SVC mode, possibly on interrupts or in events. How do you attack a problem like that, where the environment itself is hostile to your testing? You take the hostile environment out of the equation. You defang it to the point at which it is manageable. 
For modules that means making your code modular in a way that lets you test it like any other application, without involving SVC mode or any other part of the OS that could cause you a problem. That’s going to limit your ability to interact with the hardware, but you should be able to abstract that, and feed your code the expected inputs as if the hardware were there. Every system has inputs and outputs, and it may have state. All those systems do is transform the input into an output. If your code is running as an application, the only difficulty is triggering the appropriate inputs and seeing that the outputs happen – and this is exactly the same problem that you would have in a desktop application. Instead of having user interaction, you’ve got hardware events or input from the operating system itself. You can fake these inputs, to give you an appropriate level of confidence that things are working.

You may notice some repetition in my explanation of how you address problems with testing. That’s because the root of how you attack these problems is pretty much the same in all cases. Reduce the thing you are testing to more manageable chunks. Exercise those chunks in isolation. Make their interfaces understandable to you. Put those building blocks together to make larger chunks. Exercise those chunks in aggregate. Make their interfaces understandable. As you build little bits of test code that exercise your program, make them reproducible and keep them. If you can make them automatically detect success or failure, you now have automated tests. If you can’t, then either change the code so that it /can/ detect success or failure, or note down the expected behaviour so that you can manually check it in the future. A manual test isn’t ideal, but it’s better than no test at all. And if you find a bug, or get a reproducible problem, write a test for it. 
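As a minimal sketch of that ‘fake the inputs’ idea (the keyboard-buffer example here is entirely hypothetical, not code from any real module): model the component as a transform over injected events, and let the test play the role of the hardware.

```python
# Hypothetical sketch: a module-like component modelled as a transform
# over injected input events, so tests can stand in for the hardware.

class KeyBuffer:
    """Stands in for a keyboard buffer that a module might maintain."""
    def __init__(self):
        self._keys = []

    def handle_event(self, event, value):
        # In the real module this would be driven from an interrupt or
        # event handler; here the test feeds events in directly.
        if event == 'keydown':
            self._keys.append(value)

    def read(self):
        """Output side of the transform: the accumulated state."""
        return ''.join(self._keys)

# The test fakes the inputs the hardware would have generated:
kb = KeyBuffer()
for event, value in [('keydown', 'h'), ('keydown', 'i'), ('keyup', 'h')]:
    kb.handle_event(event, value)
assert kb.read() == 'hi'
```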
Don’t just fix the problem in the code, but have something that proves that you have fixed it – and that doesn’t necessarily mean clicking the same set of buttons that the user did, because if you can find the root cause, you can exercise the functions that make that root cause a problem.

If you’re writing things from scratch, rather than dealing with something that already exists, start out with the mindset that you want to build small things to make the whole. Those small things can (and should) be tested as you go along. It’s a lot easier to introduce testing early on. This may mean switching your design mindset from ‘I want the click on this button to do this thing’ to ‘I want to be able to do this thing’. That may not seem like much of a difference, but it’s significant. If you design your code to do a job, rather than to react to the user, then you can change that user interface more easily – switching from buttons to command line switches, if you want to think of it that way. The effect of thinking of what you’re doing in terms of problems to be solved, rather than the results of user actions, is that you naturally break down each subproblem, rather than trying to fit things to the user interface. And that produces more modular code, which is itself more testable.

I’ll probably write in more detail about testing at another time, but I think that covers it for now. I think that answers the misapprehensions you have (or at least how I interpreted your comment and questions) about the usefulness of the system. If I’ve said obvious things, then I’m sorry. I hope that others will find the comments more useful. To specifically answer the remaining questions…
The RISC OS Build service is integratable with GitHub, GitLab, and any other system you care to integrate it with (Jenkins, etc.) through the tools that have been provided. You will find the integration documentation, for use with GitHub and GitLab, on the build.riscos.online site under ‘CI Configuration’ (https://build.riscos.online/ci-build.html). There are a selection of examples linked from the Pyromaniac support documentation under ‘CI examples’ (https://pyromaniac.riscos.online/ci-examples.html). In addition to the examples listed on the Pyromaniac site, the GitHub topic ‘riscos-ci’ (https://github.com/topics/riscos-ci) is where I put all the CI-able code that I have created. All these examples have comments alongside the CI integration to explain what they’re doing, so that someone can take them and modify them for their own needs. More recently there is an hourglass creator (https://github.com/gerph/riscos-hourglass-maker) which uses Unix tools to generate the necessary code, before farming out to the build service to actually build the modules, and LanMan98, which I put up on GitHub as a CI component on March 14th, 2 days after its release through the announcement on the ROOL forum (https://github.com/gerph/riscosdev-lanman98/network).

As for RISC OS Community… given that anyone can create an account and host repositories on GitHub, I’m not at all sure how the organisation is much better than anyone just creating an account and using GitHub (or doing the same with GitLab, for that matter). To take the name ‘RISC OS Community’, as if you somehow speak for that community, implies that anyone who doesn’t join is not part of the community, and that seems… non-inclusive, and something of a land-grab to make the work seem more than it is. However, that’s just my opinion, cynical as it may be, and it has absolutely no relationship to automated testing.
The Wimp isn’t a particular goal for me. The main reason for implementing parts of it is that it’s a huge chunk of code that exercises the system, and forces the implementation of the OS to be closer to Classic, or at least to be acceptable. The Wimp is nice, but the build system is intended for building and testing (which, as I’ve said, doesn’t need to involve user interface interaction), and Pyromaniac is useful for trying to test things, but honestly running desktop applications is not the intention. Essentially, the more you see of the Wimp working, the more that just shows how complete the implementation is. It is not a part of the design goals.
I haven’t spent any time looking at the Desktop Modernisation project. If you’re building and testing code, then obviously the build service can help with that. But no, I have commitments to my own projects. I assume, though, that you have looked at how the Wimp was extended in Select, how the Configuration tools were made modular, how the Filer was extended, how FilerAction was made extensible, how Toolbox was improved, the additional gadgets that were produced, and the future direction that I have discussed for the system (none of which are very advanced, and only covered maybe a couple of years of work). And of course, I assume that you will make anything you do testable from the start. One of the questions on the survey asks about factors that influence the use of automated testing, and a response that is pertinent to your comments about the community is ‘More collaboration to encourage better practices’. Specifically, what I meant by that response was that if people are looking over your shoulder (or you think they will) you write better code. And if you think that they’re going to say “and how have you tested this?” when they review your work, then you’ll be more likely to ensure that you’ve tested it well. More so if the people working on it are adamant that there should be automated tests which check that the system is working properly. So I offer that fortune cookie-esque advice for your project – instil a mindset of testing and quality in your work. If you, or anyone else, has specific queries or problems with the service, feel free to contact me. |
Andreas Skyman (8677) 170 posts |
Regarding most of the above I can only agree: changing how we write testable code is the way to go: unit tests when possible (seems to work today with a proper make target?), integration tests when necessary (might be covered by the build service, depending on the circumstances), and system tests as an exception (and entirely out of scope of your service). I really like the build service, and I intend to try writing a GitHub action which integrates testing some day soon¹. One thing that would possibly make it easier to integrate with your service would be if an argument could be given to the JSON API which tells the service not to return the build artefacts (that is to say, not …). I’ve seen your examples and it looks very promising.

¹ Famous last words… My flat is apparently in dire need of repair work after a water leak in the building two months ago, which keeps me occupied, but once this settles down a bit… |
Charles Ferguson (8243) 427 posts |
`amu` certainly has always (pretty sure, anyhow) returned a non-0 return code for commands that either returned a RISC OS error, or returned a non-0 return code themselves. You need to be careful if you’ve invoked `amu` from within an Obey file, because Obey doesn’t terminate if there is a non-0 return code (though you can cause it to by checking yourself). I think in one of the CI builds (Nettle? I can’t remember) I had to do some of that checking myself.
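The same ‘check the return code yourself’ discipline applies on the host side of a CI job, too. As a hedged illustration (the commands below are placeholders, not real build steps), a wrapper can stop at the first failing step rather than ploughing on the way an unchecked Obey file would:

```python
import subprocess
import sys

def run_steps(steps):
    """Run each command in turn; stop at the first non-zero return code -
    the check that an unattended build must make explicitly."""
    for cmd in steps:
        rc = subprocess.call(cmd)
        if rc != 0:
            print("step %r failed with rc=%d" % (cmd, rc))
            return rc
    return 0

# Placeholder commands standing in for build steps:
rc = run_steps([
    [sys.executable, "-c", "print('building')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],  # simulated failure
    [sys.executable, "-c", "print('never reached')"],
])
assert rc == 3
```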
I consider unit tests to be just working within your own code with no external dependencies (ie no files, network, or other library or component interactions). But that definition sometimes gets loosened to ‘testing one library’ – if that library happens to be a thing that reads from a file, then it’s better to just call it a unit test than to be strict and say it’s an integration test. I’d probably say that using things like ColourTrans or Squash or other utility modules would fall within this loose definition of unit test. Integration tests, to me, mean working within the interactions of the code that you’ve written, using some of the external system access. In this case, you’re trying to test how your code libraries work together, and starting to involve more of the system’s utility modules.
I’d say that system tests are when you’re trying to use it with interactions with the system. Anything above those limited things and you’re into the realms of proper interactive testing, and Pyromaniac isn’t there yet. Maybe next year, but there are a lot of other things that are interesting or have greater benefit. This actually reminds me that I have Rick Murray’s ErrorCancel module as one of the CI examples. It uses the Wimp error services to set up a little callback after a short period which presses Escape on the error box. Because Pyromaniac provides a text-based error box, this just works, and is actually testable using the current system. Of course, the failure case if it doesn’t work is that the service hangs! But that’s valuable information too. I might throw in a change to make that get tested…
I’m actually not sure what happens in the JSON API if you don’t return anything, so I’ve looked it up… Hmm. Shows how little I’ve cared about the JSON API. So… If you POST to the `/build/binary` endpoint and you get a failure, you get back a 400 response describing the output and the throwback, if any. Otherwise you get back a 200 response with the data in application/riscos format. If you POST to the `/build/json` endpoint you always get back a 200 (even if the build fails) containing the throwback, messages and structured data as a JSON encoded block, with the data itself encoded in Base64. In this case, the value of `rc` is 0 on success, and the return code of any failure (or 1 if there was a RISC OS error). Certainly I can make it not return the build artifact through a query parameter in the interface if necessary, but bandwidth-wise it’s certainly not hurting me, and even LanMan98 doesn’t return a very large binary. Do you have any specific suggestion for how to change those APIs? The reason I’d given the JSON API less love was that it was largely a ‘RISC OS users won’t have a way to use the WebSockets interface, but they can easily send stuff through URL fetcher, so let’s make this easy for them’ solution. That was before I created the `robuild-client` tool, which generally makes more sense, as it gives you feedback as it’s executing, which is a little nicer in a build tool (especially if there’s a danger it might hang!). But I’m happy to update the API if there’s something to make it easier to use.
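For anyone scripting against a `/build/json`-style endpoint, handling a response might look like the sketch below. The field names (`rc`, `data`) are assumptions drawn only from the description above, not from a documented schema, and the sample response is fabricated for illustration:

```python
import base64
import json

def parse_build_response(body):
    """Pull the success flag and decoded artifact out of a /build/json-style
    reply. Field names ('rc', 'data') are assumed, not a documented schema."""
    reply = json.loads(body)
    ok = (reply.get('rc') == 0)          # rc is described as 0 on success
    data = reply.get('data')
    artifact = base64.b64decode(data) if data else None
    return ok, artifact

# Fabricated sample response, purely for illustration:
sample = json.dumps({
    'rc': 0,
    'messages': ['Build tool selected: ROBuild YAML'],
    'data': base64.b64encode(b'riscos binary bytes').decode('ascii'),
})
ok, artifact = parse_build_response(sample)
assert ok and artifact == b'riscos binary bytes'
```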
That’s true, but in the case of making a PR on GitHub, it’s actually handy if the results of the build are downloadable – it means that the person reviewing can just download what you’ve built and try it directly. They don’t need to build it themselves, and can actually try the results of what the CI produced. I actually don’t know the limits on how much you can store, but I do know that the artifacts generated by the actions are only stored for something like 30 days. GitLab has a similar default limitation (although I think in GitLab.com you can increase this). My home GitLab server is configured to keep for 6 months, although I may revisit that soon, because the disc that it uses is only 20GB and is getting full. Anyhow, whilst it’s a choice of the developer as to what they want to return, it’s handy to be able to get the successful results to see what’s happened.
The LanMan98 build is the most recent one that I’ve created, which has a Release action present, so that you can automatically generate the Release: https://github.com/gerph/riscosdev-lanman98/blob/master/.github/workflows/ci.yml This only creates the releases if the reason for building was because a tag starting with `v` was pushed, it downloads the artifacts that were created in the build phase, and then attaches them to the release it has created. The release is created as a draft, so that a human may review it before publication. It takes the release number from the version number in the `VersionNum` file when the code was built, and includes this in the archive name. Obviously other people may have different ideas about how to trigger releases, manage version numbers, and other things, but that sort of example is ideal. |
Charles Ferguson (8243) 427 posts |
Examples are always useful to give, so I created a simple BASIC program to exercise the cancelling of an error box. It’s not special, but it at least checks that what the module is meant to do is what it does. In doing so, I found the build service deficient in that it doesn’t allow you to handle input if you want to – as soon as the error box tried to read a character, the `input.eof_effect=exit` configuration option kicked in and exited the system. I’ve added some configuration on the server to allow this option to be changed, and so now I can issue `pyromaniacconfig input.eof_effect=none`, which causes the read of a character to just wait for a key press. This then allows the 5 second ticker to go off, the module sets the escape state, and the error box exits. The changes to the CI configuration, and the extra test program, are in https://github.com/gerph/errorcancel/pull/1. It’s often much easier to see what’s happening in a PR diff than to have to pick it out of the code. The output from the CI is listed in the ‘Did it work?’ section¹ of the action here: https://github.com/gerph/errorcancel/pull/1/checks?check_run_id=2522957449 And it looks like this:

    System messages:
      Build tool selected: ROBuild YAML
      Return code: 0
    Build output:
      ARM AOF Macro Assembler 3.32 (JRF) [07 Mar 2006]
      Loading module...
      Running test...
      Program renumbered
      Testing error box cancelling - if this hangs, we failed
      --------------------
      [ Error from Myapp ]
      My Error Message
      <OK:O/Return>, <Cancel:X/Escape> ? Escape
      --------------------
      Got response: 2
      Took 5.02 seconds
      Success
      All done

¹ Because that repo uses the JSON API to do the build. |
Paolo Fabio Zaino (28) 1882 posts |
Dear Charles,
Glad to hear they are easy to resolve. I am mostly seeking a way to automate UI testing. For the code itself we now have a unit testing framework that works on RISC OS, and for a more BDD approach I am looking at porting something from Python.
I would not agree with the first part of your statement. For example, I have had quite a few issues using FrontEnd on some apps, mostly due to the lack of documented examples and such (so yes, the usual problem), and when writing tools to make Wimp programming easier, I obviously need to test those in the Wimp as well (is it opening the menu correctly? etc.). I do agree with the second part of your statement, and for that (for example) I use CUnit, which seems to work fine with GCC 4.7.4 (and maybe one day I’ll have enough time to smooth it out to work with the DDE too), Fortify, DrSmith MemCheck (for memory leaks, although this has to be used through RO 3/RO 4), cppcheck for source analysis, and a few other utilities that are commercial but that can be customised to analyse C/C++ source for RISC OS.
Sure I agree on that and so I don’t think I am missing the point.
No, I am thinking of automated UI tests, not user-interactive.
Thank you for your comments; from them I gather that you clearly do not know me, so let me take the liberty of saying that I am a professional software engineer with many years of international experience, so I’ll skip the basic explanation of what testing is, hope you can understand.
Never said it’s impossible to test a desktop application on RISC OS, !Keystroke is your friend and works (obviously more can be done). Modules can be easily tested using a BDD approach for example and writing test automations from a “client” perspective. Again I am not asking for user interaction, just UI Test automation :)
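To illustrate what a BDD-flavoured, client-perspective test of a module might read like (everything here is hypothetical: the fake registry interface and the behaviour under test are invented for the sketch, not taken from any real module):

```python
# Hypothetical given/when/then sketch: test a module from the client's
# perspective, with the test reading as a behaviour specification.

class FakeNameRegistry:
    """Stand-in for a module's registration interface."""
    def __init__(self):
        self._names = {}

    def register(self, name, handle):
        self._names[name] = handle

    def lookup(self, name):
        if name not in self._names:
            raise KeyError("not registered: %s" % name)
        return self._names[name]

def test_lookup_returns_registered_handle():
    # Given a module with one name registered
    registry = FakeNameRegistry()
    registry.register('printer', 4)
    # When a client looks that name up
    handle = registry.lookup('printer')
    # Then the registered handle is returned
    assert handle == 4

test_lookup_returns_registered_handle()
```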
Yes, RISC OS is prone to crashes (we all know why, so not a secret), and that’s why I have dedicated Raspberry Pis (a Pi 2 is a relatively cheap investment for most people these days). With a very simple modification using the P6 header (probably known to more people as “RUN”), controlled from another Pi, it is possible to “remotely” hardware-reset the crashed Pi and restart testing automatically after reboot (we all know how to use !Boot → Boot → Run on RISC OS, right?). Generally P6 is kept high, which means normal operation for the ARM core, but when shorted, pin D15 is set to logical LOW, clearing the IC2 and effectively hard-resetting the crashed Pi… Now people have it: being prone to crashing is no longer an issue :)
Thanks and thanks for the examples.
Cynical? No, maybe a bit superficial, yes (apologies for the directness here). The importance of a community with well-defined rules is that it helps the process of collaborating. If everyone has their own repository and personal rules, it becomes hard to collaborate. Personal repositories are great for when people just want to share their code without giving too much importance to a collaborative approach, while a community is the exact opposite, and focuses on creating a process that everyone can review and use to work together.

The community also helps in creating a single point of reference for coding styles, as well as offering new developers who want to start coding on RISC OS a large source of code where they can see different points of view on a piece of code being put together, along with the facility of organising repositories into projects, discussions, a method to report problems, and standardised intercommunication.

An organisation can also help when someone is tired of maintaining a project. In that case it’s just a question of new maintainers taking over. With personal repositories it usually ends up with people forking and continuing the development on another personal account, and that tends to create many repositories of the same application, and complexity in identifying which one is the latest or is the main trunk. Commit dates may help in this regard, but what if a specific developer is not following good testing practices and so is producing code that has more issues than the older one? And so on and so forth.
Again Charles, my intention is not “as if I somehow speak for that community”; quite the opposite, actually. As proof of that, I am asking questions in relation to choices for the GitHub community on here as well. Such questions include every aspect of the choices, including the review of each document and the process, so the entire RISC OS community can provide feedback and opinions and therefore be a constant part of it, even without having an account there. Transparency is key here.

The choice of the name is actually far more thought out than it may seem. For example, ROD and ROOL are both Ltds, and that may cause some issues for a professional engineer like me collaborating directly with them (even if non-profit, still a Ltd). In the professional world of software engineering we can have strict contracts that forbid us from collaborating with “company-like” structures, and therefore having a non-company community that clearly specifies its pure open source, non-profit, non-market nature can make it easier for professional engineers to join.

The concept of a community is also designed so that non-developers can join and help. When tasking is well thought out and organised, non-developers can contribute with what they can. You may argue that this is still possible on a personal repository, but maybe (just maybe) people would like to collaborate on common ground, and if the process is standardised and has been reviewed by everyone interested, maybe it’s easier for it to be accepted and/or shared, as well as for people to feel part of it.
Not at all; again, every single RISC OS user is involved in the decision process, reviews and requests (the ROOL forum is part of it), even if this causes me a bit more work! This is definitely a clear message that everyone is included, in their own time, availability and capacity to collaborate. On top of that, EVERYONE is welcome and EVERYONE can join or collaborate when they feel like it. We also started from a clear code of conduct that expressly states the inclusive nature and desire of the community. And finally, while ROOL has become the hub for a certain segment of the RISC OS community (and with merit! I always thank them for all the impressive achievements they have made over the years, especially the release of the sources and the change of licensing), the RISC OS GitHub community also looks at the past (that past you are such an important part of as well), at what is now considered retro. So we want it to become a repository for everyone who wants to share code (not just RO5 code) and see it maintained over the years, either by the original author or by others. I truly have the continuity of RISC OS (all of it) at heart, so I hope you’ll find some time to join us, as you did to join us all on here.
Ok, understood, so I will find other ways to automate Wimp tests.
Ok, thanks, it was worth trying :)
Correct, and not just looking: I have created a full rack system running older RISC OS machines (including RO 3.x) and the various RISCOS Ltd releases (from 4.02 up to the last 6 ever released), so as to test directly, make sure everything will work there as well, and integrate as much as possible (this is where, if you have the time, your code reviews could be extremely precious, actually! Just sayin’…).
Absolutely, which is why we are actively looking into integrating the RISC OS Community on GitHub with your RO Build Service, and not just for testing purposes, but also for build automation. The idea is to integrate with your services as part of our regular process, so people would just use ROBS as part of regular activity in the community. If you prefer to take this discussion to email I am happy to, although I am totally open, so here is totally fine for me :)
That’s one way of seeing it; I’d say that the ROS Community on GitHub should also become a place in which to form new developers for RISC OS (where new means new to coding on RISC OS, not necessarily new developers in general). So we want to share as much code as possible, good practices, testing as a fundamental part of the development process, and also introduce enterprise software development where possible on ROS. We have also started to organise repos into “Tutorial Projects”, organised by programming language, for example.
Absolutely, and we started from the very foundations of that community: the commitment to quality is literally part of the code of conduct :) If you or someone else has ideas to improve the community rules and processes, please give us your feedback; again, link here on this exact forum, so no need to do anything special. Of course, if you prefer GitHub then contact us there, and again, EVERYONE is welcome to join :) |
Colin Ferris (399) 1818 posts |
Pardon me – what does BDD in the quote below mean? - Modules can be easily tested using a BDD approach - |
Chris Mahoney (1684) 2165 posts | |
Chris Hall (132) 3558 posts |
CI means cast iron to me – not sure what you mean. |
Jeffrey Lee (213) 6048 posts |
Continuous integration. Wikipedia’s intro describes it as making sure developers are constantly merging their changes into the main repository, but most of the time when people talk about CI they’re talking about a system which uses that central repository to perform automated building, testing, and (maybe) release/deployment of the code. |
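To make the “automated building, testing, and release” part concrete, here is a minimal sketch of what a CI system does on every push: run a fixed sequence of steps and stop at the first failure. The step names and commands are invented placeholders, not any particular CI product’s configuration.

```python
# Minimal sketch of a CI pipeline runner: check out, build, test,
# report. Step names and shell commands are hypothetical examples.
import subprocess

def run_step(name, cmd):
    """Run one pipeline step as a shell command; True if it succeeded."""
    print(f"== {name} ==")
    result = subprocess.run(cmd, shell=True)
    return result.returncode == 0

def pipeline(steps):
    """Run steps in order; stop at the first failure."""
    for name, cmd in steps:
        if not run_step(name, cmd):
            print(f"FAILED at step: {name}")
            return False
    print("Pipeline passed")
    return True

if __name__ == "__main__":
    pipeline([
        ("build", "echo building..."),
        ("test",  "echo running tests..."),
    ])
```

Real CI services (GitHub Actions, GitLab CI, Jenkins, and so on) wrap exactly this loop in a server that triggers on each commit and records the results.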
Paolo Fabio Zaino (28) 1882 posts |
Absolutely true, which is also another important reason to have clearly agreed processes, given that a lot of modern practices do present “different perceptions”. One of the “historical” aspects of CI gets deep into the Code Design mindset of a developer. In practical terms it means:
However, all forms of modern software engineering practice seem to start from the formalisation of requirements (and there are obviously many opinions there as well). So, as a general simplification, one could consider that some form of formalised requirement constitutes the starting point of every SE method. A way to formalise such requirements is also fundamental to the applicability of CI: it helps to define a feature as a smaller unit, and therefore to implement it in a CI fashion, if you allow me the wording. I also forgot to mention that CI culture impacts solution architecture practices, because extending a solution through CI using the above process requires that the architecture is designed to be extendable and modular (in other words, if one tries to do CI on a spaghetti-like architecture, life can become a serious nightmare). |
Theo Markettos (89) 919 posts |
I filled in the survey but I might as well de-lurk my comments here. Most of the stuff I’ve done in the past decade on RISC OS has been to do with building software. Building stuff is Surprisingly Hard. It is generally fine if you have a folder of C files, but projects from elsewhere tend to use tools like autoconf, cmake, ninja, etc, with complicated configuration stages before getting anywhere near a C compiler. Getting things to build for RISC OS is often substantial work. So the testing service is great and I’d love to use it, but I’d like to separate it from the build system. My input is not a git repo of C files, but the output of a complex automated build pipeline that spits out ELF binaries. Because Pyromaniac is a cloud service I can’t adjust the build side of things, which means I don’t have a suitable way to inject artifacts to be tested (and the environment, etc). In essence, what I want to test is both the binaries and the build environment that created them. I’m not really asking for anything, given my lack of time to work on things of late, but just flagging up my reasons for being enthusiastic yet not actually able to use the service. |
Theo Markettos (89) 919 posts |
Another useful thing – and please correct me if I’ve missed anything in this direction – would be for the community to coalesce on a framework for running and reporting tests: for example, to be able to indicate what tests are available, how to run them, and to report the results back in a machine-readable format (e.g. JUnit or TAP, as used elsewhere). With the framework in place, adding another test would just be a case of declaring it somewhere and writing the code. ‘make all-tests’ or whatever would run it and all the other tests and generate a report of what passed or failed, which could then hook into other tools. Currently it seems like you have to write your own testing framework before you write any tests, which is a disincentive to getting started. |
Paolo Fabio Zaino (28) 1882 posts |
+1, also because for building we might use different compilers (and different languages) as well as different build processes. On the testing side, I would like to understand how one can configure testing procedures for the different types of software testing: 1) Clearly the tool is not necessarily a unit testing tool; that could be done within the build process (or at least that’s where I usually run some form of unit tests, because, for example, Red Hat build processes have well-defined places to validate the build by running unit tests during or after compilation). 2) The testing service Charles has created so far may be suitable for BDD/TDD-style functional testing, possibly regression testing and eventually acceptance testing. I’m not sure about integration testing, given that it would probably require a more ‘production-compliant’ environment in which to test the built component together with everything else, mostly simulating user interaction through tools like !Keystroke or something more evolved. 3) One thing that might also be useful, but is probably out of scope for this thread, is allowing a user in difficulty to run the test procedure on his/her own system (to collect the results and information). For that kind of situation we would need a tool installable on the user’s machine (it is possible to build one, but again this is definitely out of scope here). |
Paolo Fabio Zaino (28) 1882 posts |
Every day I think more about building such a tool and/or porting one from Linux. My feeling right now would be to focus on a BDD approach, because it might be the most re-usable, and possibly what most people would be interested in, given that it focuses mostly on how the user uses the component/application. But I am open to suggestions. For the test format, a typical BDD test would use the GIVEN/WHEN/THEN form, so the main tool needs a simple parser capable of capturing the component’s behaviour. At the very beginning this could be achieved by redirecting the output to a pipe and, obviously, supporting the usual other behavioural checks – checking for file creation, intercepting Wimp messages, etc. – and analysing the content, perhaps via regex. Again, this is just an initial thought, nothing more; I’ll open a detailed implementation topic on the ROS Community on GitHub if anyone is interested. Just as a general example, it could be something like:
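A hypothetical sketch of the GIVEN/WHEN/THEN idea: a scenario written in that form, and a tiny parser that splits it into its three clauses. The scenario text, module name and step wording are all invented for illustration – there is no defined RISC OS test syntax here.

```python
# Sketch of the GIVEN/WHEN/THEN form: a trivial parser that maps
# each keyword to its step description. The scenario content is a
# made-up example, not a real module or command.
SCENARIO = """\
GIVEN the module TestMod is loaded
WHEN the command *TestModStatus is run
THEN the output contains 'TestMod is active'
"""

def parse_scenario(text):
    """Map each keyword (GIVEN/WHEN/THEN) to its step description."""
    steps = {}
    for line in text.splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword in ("GIVEN", "WHEN", "THEN"):
            steps[keyword] = rest.strip()
    return steps

steps = parse_scenario(SCENARIO)
print(steps["GIVEN"])   # the module TestMod is loaded
```

A real tool would then bind each parsed step to an action (load the module, run the command, capture and match the output), which is where the pipe redirection and regex matching mentioned above would come in.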
|
Charles Ferguson (8243) 427 posts |
Hiya, Quickly, I’m not ignoring Paolo’s response or other people’s… I’ve just committed myself to doing a talk about testing on May 17th at ROUGOL, so I’m trying to get my words into the right shape for that. Please do come along if you want to talk. I will try to respond more fully shortly, but I’ve only got 7 days to prepare (and the previous talk was 2 1/2 months to prepare… so don’t expect anything polished). |
Charles Ferguson (8243) 427 posts |
Sorry, just noticed this mention and wanted to make a clarification:
Test Driven Development was not what I was trying to explain, specifically. That’s just one way that you can attack the testing problem. Test driven development differentiates itself from more traditional forms by the tests being written up front – you write the tests first, you show that they fail, then you write the code so that they pass. Whereas in traditional development you write your code first, and then you write the tests to exercise it. Both are equally valid ways of working, and there are advantages to both. I rarely follow the TDD approach myself, and for adding tests to an existing codebase it’s not at all appropriate – your code already exists, so you cannot drive the development with tests. But you can add new features by TDD. Or you can add new features and then you can test. Just wanted to be clear that it wasn’t my intention to say that anyone should or should not do TDD. If it works for you, then great, but it’s just another development methodology. |
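The write-first cycle described above can be sketched in miniature. This is an illustrative toy (the function and its behaviour are invented): the test is written before the implementation, would fail on its own, and passes once just enough code exists.

```python
# TDD in miniature: the test exists before the implementation.

# Step 1: write the test first. Run at this point, it would fail,
# because word_count does not exist yet.
def test_word_count():
    assert word_count("hello world") == 2
    assert word_count("") == 0

# Step 2: write just enough code to make the test pass.
def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())

# Step 3: run the test again; it now passes silently.
test_word_count()
```

In the traditional order the same two pieces simply swap places: the implementation is written first and the test is added afterwards to exercise it.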
Andreas Skyman (8677) 170 posts |
Looking forward to it. Best of luck! |
Rob Kendrick (6084) 7 posts |
Paolo, you might enjoy looking at https://subplot.liw.fi/ which resembles your proposed syntax very closely. |
Paolo Fabio Zaino (28) 1882 posts |
@ Rob,
Yup, that sounds very much like it, thanks! |
Paolo Fabio Zaino (28) 1882 posts |
@ Charles
No worries at all man, and looking forward to attend your presentation :) |
Paolo Fabio Zaino (28) 1882 posts |
Understood, but…
For example, these are all concepts very much studied in TDD… Now, here is where TDD would help your definition, which seems to lack what are known as counter-patterns. How would you test your test? One very simple answer is: write your test first and run it to make sure it fails… Again, while you’ve started to describe what could be seen as the first few paragraphs of a TDD book, what is also important to mention are the counter-patterns, which are the other side of testing practices (TP), and where most people get stuck and stop testing, because they conclude that TP is not so useful or, worse, not natural. Then there are also the psychological issues with TP, which are various but can be grouped mostly into two big categories. These (and more) are the reasons why I have mentioned already-formalised methodologies, which also contain plenty of studies on the counter-patterns and the psychological effects, instead of trying to re-invent the wheel on here – and through what? A few very long posts? Why not instead make a set of videos describing each part in practice, applied? Those exist too, but most likely NONE on RISC OS… just sayin’… :)
Absolutely, and my mention wasn’t either; it was a reference to the subset of practices you were describing in your own wording – stuff that comes from many years of study and experimentation in TDD. Yes, bits and pieces of it have also been added to other formalised testing methodologies, but the generally agreed source is TDD. Again, I’m not saying people should use A or B, nor implying you were telling people what to use. Cheers |
Bryan Hogan (339) 593 posts |
ROUGOL meeting details now up – http://rougol.jellybaby.net/meetings/index.html |
Charles Ferguson (8243) 427 posts |
Apologies for the delay in responding… laziness / recovering after the presentation, I guess. Quoting Paolo:
You cited it explicitly as a reason not to use the service, so I could only assume that it mattered to you and responded appropriately.
It doesn’t really matter whether you’re a professional software engineer; I know you’re competent, but I am answering in a public forum where others will be reading, so I answered in detail for two reasons: 1) there are a load of other people who will read it now, and many in the future, and 2) I might be wrong, so it sets out the reason for my position, which can be corrected if I’m wrong. I don’t need to say ‘don’t take it as being condescending’, because you acknowledged that it’s not necessarily for you but for anyone else reading… no offence taken either way :-)
Cute. I just have my MacBook and VMs and Pyromaniac… it does well enough for me.
As always, let me know if there’s any problems.
Yeah, that was an unhelpful comment by me. Sorry. I guess it’s partly my upbringing and my caution – I don’t claim to speak for anyone, and unless I’m completely sure of myself I won’t claim authority over things. That’s kinda why the taking of the ‘RISC OS Community’ name sort of rubbed me the wrong way – “very big of you to claim that you’re the community and to dictate how they must act”, sorta thing. Anyhow, that’s entirely irrelevant to this discussion, so… um… sorry! From a separate response:
That’s the biggest problem with most RISC OS software, and that’s why I focussed a lot of the presentation on handling legacy code – I don’t believe that many RISC OS projects are designed with testing in mind. And, as you say, that makes this process difficult. But not, I think, intractable. Honestly, if people take a little on board and try refactoring code and doing some testing, or writing something new from scratch that is able to be tested, even if it’s not the way I’ve suggested… that’s got to be a win. I guess what I’m saying is that, however you get there, the importance of testing needs to be in the developer’s mind. Which I think is pretty much what you’re saying? Like security, testing shouldn’t be bolted on afterwards… but that’s where we are on many projects. |