Weird BASIC is weird
Pages: 1 2
nemo (145) 2546 posts |
Basic has a lot of strangeness in it. I raised a ticket for a bona fide bug in the interpreter yesterday, but there are other “bugs” which can’t really be fixed. This one will make you sad.

    IF FALSE THEN
    REM This isn't going to do what you expect…
    ENDIF
    PRINT "Success?"

Yeah, that’s an ellipsis at the end of that comment. :-( |
Clive Semmens (2335) 3276 posts |
And seemingly only if the ellipsis is the last thing on the line. Any idea what’s going on there? |
nemo (145) 2546 posts |
The very few bits of the BASIC interpreter that need to jump ahead in the code have a hard time because the end of a Basic loop is contextual. For example:

    PRINT "Starting loop"
    FOR X%=1 TO 10
    IF X%AND1 PRINT "Odd!":NEXT
    IF (X%AND1)=0 PRINT "Even":NEXT
    PRINT "Loop finished"

Horrible, but works. The reason

    FOR X%=1 TO 2
    FOR Y%=3 TO 4
    FOR Z%=5 TO 6
    IF Z%AND1 NEXT Y%
    NEXT:REM This is only NEXT X% because of the previous line

However, WHILE has to skip over any nested WHILE/ENDWHILE loops. IF has to skip over nested IF/ELSE/ENDIF constructs. WHEN skips over nested CASE/ENDCASE blocks.

In the case of IF and CASE, the block is actually recognised by the last token on the line. Unfortunately, Basic doesn’t check that what looks like a THEN or an OF is actually on the end of an IF or CASE: it can just as easily be the same byte on the end of a REM. This kind of error is quite common in BASIC. For instance, you can’t have the character with the same byte value as the ELSE token in a string in an IF statement, because BASIC can try to interpret the rest of the string. (Well, I fixed that in my version, but you see what I mean).

OF is the same as Ê, which is quite hard to get on the end of a REM, but THEN is the same as ellipsis, and putting that on the end of things is what it’s for… So, an unfortunate coincidence of deliberately sloppy code, choice of token and Latin1 character set. The token was chosen long before the Latin1 character set of course, and the code is sloppy for speed.

Incidentally, the fact that a multi-line IF/ELSE/ENDIF requires THEN to be at the end of a line, and ELSE and ENDIF to be at the start, whereas WHILE/ENDWHILE can be anywhere in a line, is why IF/ELSE/ENDIF is much MUCH faster than WHILE/ENDWHILE. So rather than relying on the WHILE, you should check for FALSE on entry:

    IF Z% THEN
    WHILE Z%
    ... lots of code ...
    ENDWHILE
    ENDIF

Looks tautological, but goes literally ten times faster. |
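The skip-ahead described in the post above can be modelled roughly as follows. This is only an illustrative Python sketch, not the interpreter's actual code (which is not shown in the thread). The function names are hypothetical, the program lines are simplified, and the token values (&8C for THEN, &CD for ENDIF, &F4 for REM, &E7 for IF) are assumptions, apart from the point made in the thread that THEN shares its byte with the Latin1 ellipsis.

```python
# Assumed BASIC V token values; only "THEN == Latin1 ellipsis" comes from the thread.
IF_TOKEN, THEN, ENDIF, REM = 0xE7, 0x8C, 0xCD, 0xF4


def skip_false_if(lines, start):
    """Model of skipping a multi-line IF whose condition was FALSE.

    lines[start] is the IF ... THEN line. Returns the index of the line
    after the matching ENDIF, or None if no matching ENDIF is found.
    """
    depth = 0
    for i in range(start + 1, len(lines)):
        body = lines[i].strip(b" ")
        if not body:
            continue
        if body[0] == ENDIF:
            # ENDIF is only recognised at the start of a line.
            if depth == 0:
                return i + 1
            depth -= 1
        elif body[-1] == THEN:
            # Sloppy but fast: any line ending in the THEN byte counts as
            # opening a nested block, even if the line is a REM.
            depth += 1
    return None


def program(comment_tail):
    """Build the four-line example, with a chosen ending for the REM."""
    return [
        bytes([IF_TOKEN]) + b" FALSE " + bytes([THEN]),
        bytes([REM]) + b" This isn't going to do what you expect" + comment_tail,
        bytes([ENDIF]),
        b'PRINT "Success?"',
    ]


if __name__ == "__main__":
    print(skip_false_if(program(b"..."), 0))          # three full stops
    print(skip_false_if(program(bytes([THEN])), 0))   # a single ellipsis byte
```

With three ordinary full stops the model resumes at the PRINT line. With the ellipsis byte, the REM line is counted as opening a nested block, the only ENDIF is consumed by it, and no resume point is found.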
Clive Semmens (2335) 3276 posts |
Thank you. Understood. Interesting. I’d realized ellipsis must be a token (no idea which one), but why BASIC was looking for tokens after REM and before line end I didn’t understand. In all my years of programming in BASIC, I’ve never collided with any of these things! Nor, incidentally, have I ever used a variable after NEXT – had never noticed that that syntax existed, and don’t like it now I know about it. Just reread the BASIC guide on the subject, and it not only allows it, it suggests that one ought to do it! The idea of making NEXT conditional (other than in the same IF structure as its corresponding FOR) is, as you say, HORRIBLE. As is terminating that Z% loop implicitly like that. When I write REMs, I might well put three stops on the end of a line, but happily my text editor doesn’t automagically turn them into an ellipsis… |
Clive Semmens (2335) 3276 posts |
(And when I first read your topic title, I thought, “Weird BASIC? I’ve not come across that language. Sounds interesting, but not sure it sounds useful – from nemo??”) |
Rick Murray (539) 13840 posts |
Is there a list somewhere of these…quirks? |
Steve Pampling (1551) 8170 posts |
I’m sure you can see the problem in that:- anything written with the assumption that feature exists is useless where it isn’t. To do a bit of thread cross-fertilization, the feature set in the Toolbox from “The Other Lot”, as you put it, was largely useless since it wasn’t available to everyone as the fallback didn’t exist. Clive said:
However my habit of producing self documenting code/scripts has caused me to hit things like this without understanding why it happened. Ironically if I was a better coder I probably wouldn’t have to comment things as much… |
GavinWraith (26) 1563 posts |
I once asked on this forum whether BASIC had ever had a grammar, or a specification. Oh how folk laughed. Anyway, now you understand better why I am a devotee of Lua. The weirdnesses of BASIC arose because it was never thought through. That shortcoming is down to its date of birth, and a reluctance to start with a clean sheet of paper, not any lack of talent among its progenitors. Other languages also had rocky beginnings simply because the salient aspects of language design were not understood at the time: Lisp, for example, was a right mess with all its dialects behaving differently, until the late seventies when Scheme was introduced. Over the years the number of languages and their interpreters that have been attempted must run to many hundreds; some ad hoc flashes of experimentation, some serious efforts with money behind them, some amateur, some professional. It has been suggested that many languages are designed in response to perceived failings in previous languages: for example, Pascal to lack of typing, Modula to lack of code re-usability. |
Steve Drain (222) 1620 posts |
A few are documented in my StrongHelp BASIC manual. Not this one, but the related ELSE one is there. Others are referenced in the Basalt manual, because either they have been worked around or exploited. My policy is to always write ‘well-structured’ code and side-step the quirks. ;-) Note that this quirk does not happen if you have Basic$Crunch set, which I always do. |
Steve Drain (222) 1620 posts |
… for the initial FALSE case. Otherwise it makes little difference, does it? How about keeping the loop short: |
Rick Murray (539) 13840 posts |
That’s because the specification is the implementation of the language itself, which is part of the reason compilers fail with BASIC. As bad as BASIC may seem to be in some respects, it was extremely fully featured compared to some that were utterly deplorable. |
Steve Pampling (1551) 8170 posts |
Now there’s a fast fix to issue – turn on Basic$Crunch by default. Then unless people are daft enough to turn it off… |
jan de boer (472) 78 posts |
FWIW, there is a syntax definition listed in the BASIC ROM User Guide by Mark Plumbley on pp. 319-326. It’s available from www.dragdrop.co.uk on the 55 books CD. Whether it’s enough to use as guidance for an existing parser/compiler is the question. E.g. when using CALL with parameters, a compiled program has to store variables just like BASIC does, otherwise included assembler cannot find them again. There is more weirdness: inclusion of DEFFN and DATA in other subroutines, REPEATs are allowed without UNTIL, THEN after WHILE, excess ENDIFs in a subroutine, with multiple DEF FNs/PROCs only the first is used, functions that can return different variable types e.g. DEFFNa(a$,a,a%,par%):IFpar%=0:=a$ELSEIFpar%=1:=a ELSE=a%. I think a lot of unwanted weirdness could be warned against by your program editor, but a program checker therein needs a good parser, which is not available atm, afaik. And to remove weirdness from BASIC could potentially break existing programs. |
Rick Murray (539) 13840 posts |
Yay for laziness! I use ellipsis a fair amount in written speech, but since I’m lazy I just press ‘.’ three times. ;-) |
Clive Semmens (2335) 3276 posts |
Is that laziness, or sensible spelling? :-) |
nemo (145) 2546 posts |
Rick asked
<points at head> Steve said
Bravo, and I fully agree. But it’s FINE to fix BASIC if it is available as a soft-load for old OSes. For the record, my version is started with Gavin suggested
<bristles on Sophie’s behalf> ACTUALLY, this class of problem was triggered by the internationalisation work on the second BBC Master MOS. Prior to that you’d have had no reason to have a top-bit-set character in a REM (but the ELSE-token-in-string bug was present back then). These things are this way for speed, not because they weren’t anticipated. The only way of avoiding this kind of problem is by looking at every byte of every line, and that will slow BASIC down – compare WHILEFALSE/ENDWHILE with IFFALSE/ENDIF and you’ll see the latter is ten times faster than the former, and this is why. Steve said
Yes, which was on my mind because I’ve been working on introducing iterators to FOR. Very exciting, but does require the no-loops-at-all case, so I looked at the precedent… and what happens whenever I look closely at something? Another of the Steves suggested
Which is fine if you’ve fixed the various crunch issues, like jan said
…but you should ALWAYS use CALL with parameters… or you may end up calling OSBPUT, OSRDCH or OSBYTE by mistake! (I believe that’s been removed from RO5? Boo, hiss)
The multiple-entry points one is fun, and upsets compilers:

    DEFFNmsg(A$):LOCALB$,C$,D$
    DEFFNmsg1(A$,B$):LOCALC$,D$
    DEFFNmsg2(A$,B$,C$):LOCALD$
    DEFFNmsg3(A$,B$,C$,D$)
    ...
The rules about which endloops can pop which other loops from the stack are not described anywhere, as far as I know. For example: UNTIL can pop FOR, but NEXT can’t pop REPEAT. Arcane.
Can’t picture that one. Many keywords don’t do what you think at all. DEF is functionally a REM. If you’re not inside an IF/ENDIF then ELSE is a REM (and one that uniquely doesn’t need a colon before it). Outside of a CASE you can have a multiline REM like this:

    WHEN You can type anything you like here
    BASIC just doesn’t care, and you can
    have lots of discussion, before ending with...
    ENDCASE

There are many other such constructions. Many.
I think it’s likely to be a lack of keyboard handler that has convenient keypresses for …, ×, •, etc! |
nemo (145) 2546 posts |
Steve said
Yeah, and I’m sure we’ve discussed efficiency of BASIC’s strategies in the massive-structure Wimp program case. The problem is you’re trading one inefficiency for another – the overloaded and (until recently) unsorted PROC/FN chains. What a pity they weren’t part of the variable chains. If only there was a way to keep a reference to a PROC in a variable. <winks> |
Steve Fryatt (216) 2105 posts |
You mean like
Oh, sorry, wrong dialect of BBC BASIC… |
Rick Murray (539) 13840 posts |
Unless something has changed recently, it is provided in the plain version of BASIC, but not in the FP versions. |
GavinWraith (26) 1563 posts |
No disrespect intended. I think a lot of BASIC’s weirdness comes from sticking with line numbers. I grant you that BASIC’s format does give you faster scanning, but I do not think that compensates nowadays for its disadvantages. I may be sticking my neck out here, but I think most modern languages like to treat the source code as plain ASCII text in which blank space and newlines carry no semantic content (excepting Haskell and Python, about which there remain a few contentious issues), in case ill-behaved mail-servers should mess with the meaning of programs being sent as email. Interpreters tend to be a pipeline of processes, of which the first stage, lexing, transforms the stream of bytes into a stream of tokens and literal values. It is at this stage that comments are recognized and discarded, so that later stages (parsing, code generation, … ) simply do not see them – though the lexer usually records newlines so that error messages can report where the error was raised. So lexing needs only the most superficial aspects of a language. The point of having a pipeline with the separate stages being as independent as possible is that it becomes easier to modify or extend the language later; you get all the usual virtues of modularization. But as far as I can see, the BASIC interpreter does not do things this way. |
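The lexing stage described in the post above can be sketched as follows. This is a toy lexer for a made-up BASIC-like syntax, written in Python purely for illustration. The token classes, the `lex` name and the regular expression are assumptions of this sketch, and it is not how the BBC BASIC interpreter works. It shows comments being discarded before any later stage sees them, while line numbers are still recorded for error messages.

```python
import re

# Hypothetical token classes for a toy BASIC-like language.
TOKEN_RE = re.compile(r'''
    (?P<comment>REM[^\n]*)
  | (?P<string>"[^"\n]*")
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_][A-Za-z_0-9%$]*)
  | (?P<op>[=<>+\-*/():,])
  | (?P<newline>\n)
  | (?P<space>[ \t]+)
''', re.VERBOSE)


def lex(source):
    """Turn source text into (kind, text, line) tuples, dropping comments."""
    tokens = []
    line = 1
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"line {line}: unexpected character {source[pos]!r}")
        kind = m.lastgroup
        if kind == "newline":
            line += 1  # remembered only so errors can report a line
        elif kind not in ("comment", "space"):
            tokens.append((kind, m.group(), line))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    src = 'IF FALSE THEN\nREM not what you expect…\nENDIF\nPRINT "Success?"\n'
    for token in lex(src):
        print(token)
```

Because the comment, ellipsis included, is consumed whole by the lexer, nothing downstream can mistake its last character for a keyword. The cost is the one raised earlier in the thread: every byte of every line has to be examined.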
Steve Drain (222) 1620 posts |
Careful; BetaBasic is an actual language for the Sinclair Spectrum. ;-) I had a lot to do with the author back then. It is really rather good for structured programming and has been quite an influence on me. |
Steve Drain (222) 1620 posts |
I’ll bite. ;-) Basalt allows this with the syntax
Space as a formatter is a bit odd. A comma is always an alternative, isn’t it? |
Steve Drain (222) 1620 posts |
The efficiency depends a bit on the cache hits, and that is not in the programmer’s control. If
Or:
;-) |
nemo (145) 2546 posts |
Oh jolly good, though it should always have been a command-line option, or switched by program size or something. I have a contrived example that “happens” to assemble code at &FFF4 and then Surprise! can’t call it (without a parameter). Gavin thought
TBH they’re the least weird bit. The bug I ticketed a few days ago is on another level:

    I%=1
    WHILE I%
    x=ACS(I%)
    I%=0
    ENDWHILE
    PRINT"Success?"

This works fine… unless I% is zero.
BASIC interprets exactly what you wrote. It is possible to get it to CRUNCH your program before running it, and that will discard the REMs.
Presumably you’re having to check every byte for nested FORs so you can find the matching NEXT, which may be a comma, and which may be sooner than you’d think due to named NEXT vars?
Well it’s a separator, and absolutely required in many places, not least However, usually the space is needed for tokenisation to work, but not for interpretation… but the foo% = PTR(PROC_bar) I knew you’d have done that! |
Rick Murray (539) 13840 posts |
I don’t. Line numbers are pretty much a non issue unless you’re stuck in the mindset of using GOSUB and GOTO.
Arguably so does BASIC. It just happens to convert that plain text to and from an intermediate tokenised representation.
With a few exceptions (which is the one where tab offsets have meaning?), that is generally true. However, note that because of this, the language has to have alternative means of delineating sections of code. In C-like languages, this is usually curly braces and semicolons.
The problem is, interpreting is slow. Very few languages actually do it. PHP IS a widely used interpreted language, but what happens “under the bonnet” is that the text code is translated to bytecode, and that is cached for the next time… BASIC carries a heritage of its era. The tokenisation is to save space in memory more than anything else, and the interpreting is because that’s what worked on eight bit hardware. If I was writing a BASIC compiler, I’d prefer to work with tokenised code so at least some basic analysis has been performed. It would also be useful to attempt to create a formal specification of BASIC – including the weird edge case issues (even if not intending to support them).
He he, we’re discussing dated stuff in BASIC and you casually mention the fact that in 2019 we might still run into a mail server that chokes on eight bit content! |