Ticket #430 (Invalid) Thu Nov 10 22:00:13 UTC 2016
Serial input buffer on Pi shows non-empty but is empty
Reported by: | Chris Hall (132) | Severity: | Normal |
Part: | | Release: | |
Milestone: | | Status: | Invalid |
Details by Chris Hall (132):
There is a very subtle bug in the serial module that was shown up by SatNav. What I think is happening is that the OS_SerialOp call returns with carry clear even though there is no character waiting, and SatNav therefore gets stuck in an infinite loop (due to an otherwise harmless typo). The code is:
<pre><code>REM Try to obtain (only) up to twenty characters in each poll
if%=0
REPEAT
  SYS "OS_SerialOp",4,-1 TO ,r1% ;r2%
  if%+=1
  IF (r2% AND 2)=0 THEN
    REM Character in buffer (C clear)
    IF r1%<>-1 AND r1%<>0 THEN
      REM add character CHR$(r1%) to string
    ENDIF
  ENDIF
UNTIL (r2% AND 2) OR i%>20:REM Buffer empty or 20 attempts</code></pre>
Basically SatNav (prior to 1.00) runs happily for a few hours and then appears to freeze; ALT-Break will cause it to quit. After investigation, the loop that it was stuck in is the one shown above. After a few hours, within a Wimp_Poll null reason code, the call reports that there is always a character in the buffer and returns (so far as I can tell) either -1 (preserved) or 0. Because the UNTIL statement has the wrong variable (i% instead of if%), the loop isn’t terminated after up to 20 attempts to extract a character (the limit is actually a sort of workaround anyway, as no character limit should be needed). Twenty attempts every Wimp_Poll is plenty for a serial buffer of 110 or so bytes that is being filled at 9600 baud (i.e. at a maximum rate of about 10 characters per centisecond), with at most 400 characters arriving each second (mostly only 150 characters every second, depending on how many messages are being sent). Twenty characters per centisecond is 2000 characters per second, after all. I have now corrected the UNTIL statement so that if the bug appears it is worked around. Note that the -1 parameter in R1 is simply there to flag (separately from the carry flag) that no character is available, as R1 is preserved if no character is fetched.
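The arithmetic behind those figures can be checked with a few lines of Python. This is only a back-of-envelope sketch: it assumes 8N1 framing (10 bits on the wire per character) and at most one null Wimp_Poll per centisecond, neither of which is stated in the ticket.

```python
# Back-of-envelope check of the serial throughput figures quoted in the ticket.
BAUD = 9600
BITS_PER_CHAR = 10        # assumption: 8N1 framing (1 start + 8 data + 1 stop)
POLLS_PER_SECOND = 100    # assumption: at most one null Wimp_Poll per centisecond
ATTEMPTS_PER_POLL = 20    # the loop limit from the ticket
BUFFER_BYTES = 110        # "110 or so bytes" from the ticket

max_chars_per_second = BAUD // BITS_PER_CHAR                        # 960
max_chars_per_centisecond = max_chars_per_second / 100              # 9.6, i.e. "about 10"
drain_capacity_per_second = ATTEMPTS_PER_POLL * POLLS_PER_SECOND    # 2000
buffer_fill_time_cs = BUFFER_BYTES / max_chars_per_centisecond      # roughly 11.5 cs

print(max_chars_per_second, max_chars_per_centisecond, drain_capacity_per_second)
print(round(buffer_fill_time_cs, 1))
```

Under these assumptions the loop can remove characters about twice as fast as the line can deliver them, which matches the claim that twenty attempts per poll is plenty.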
This occurs with a 26 October 2016 low vector ROM from Update5L. Once the character limit terminates the loop (despite there being an apparently non-empty serial input buffer), things have recovered by the time of the next Wimp_Poll (or, if not the next, then within a few seconds anyway).
NB I have to check for unwanted null bytes as the checksum is fairly primitive and would not pick them up but they would cause problems when decoding the string.
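For anyone following along without a RISC OS machine, the intended shape of the corrected loop can be modelled in Python. This is a sketch only: `read_byte`, `drain` and `NO_CHAR` are made-up names, and a `deque` stands in for the serial input buffer; it does not call OS_SerialOp. It stops after exactly `max_attempts` fetches, which is an assumption about the intent rather than a copy of the BASIC test.

```python
from collections import deque

NO_CHAR = -1  # sentinel passed in as R1; it comes back unchanged if nothing was fetched


def read_byte(buffer, sentinel=NO_CHAR):
    """Hypothetical stand-in for OS_SerialOp 4: returns (value, carry_set)."""
    if buffer:
        return buffer.popleft(), False  # character fetched, carry clear
    return sentinel, True               # nothing waiting, carry set


def drain(buffer, max_attempts=20):
    """Fetch at most max_attempts bytes, skipping nulls and the sentinel."""
    chars = []
    attempts = 0
    while True:
        value, carry_set = read_byte(buffer)
        attempts += 1  # count every attempt, successful or not
        if not carry_set and value not in (NO_CHAR, 0):
            chars.append(chr(value))
        # Test the same counter that was incremented (the typo tested a different one)
        if carry_set or attempts >= max_attempts:
            break
    return "".join(chars), attempts


rx = deque(b"$GPGGA\x00,1234")  # made-up data with an unwanted null byte
print(drain(rx))
```

Because the attempt counter is checked on every pass, the loop always returns to the caller even if the buffer never reports empty.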
Changelog:
Modified by Chris Hall (132) Sat, January 21 2017 - 14:28:17 GMT
It is possible that the ‘freeze’ is caused by a different problem with the serial port, as pinning down the source of the freeze was impractical. Hence a fix submitted on 21 Jan may have cured this. Will do some more testing…
Modified by Jeffrey Lee (213) Sat, January 21 2017 - 17:58:13 GMT
If the WFI deadlock fix in the Portable module solves the problem for you then that would certainly be nice – as you can probably guess, I was trying to recreate this issue when I encountered the deadlock. But since you mentioned that ALT-Break allowed you to recover, that suggests that it’s a different problem that you were hitting (the WFI deadlock was a complete CPU deadlock – even a JTAG debugger couldn’t break in).
If you’re still seeing the problem, then it would be good to know what model(s) of Pi are affected. So far the only issue I’ve seen was the deadlock, but my setup isn’t exactly the same as yours since I’ve got an Iyonix at the other end of the serial link rather than a GPS hat.
Modified by Chris Hall (132) Wed, January 25 2017 - 10:24:21 GMT
- Status changed from Open to Invalid
I have tried for some time with the following test programme:
10 WHILE NOT
20 SYS "Wimp_Poll",mask%,block% TO reason%
30 CASE reason% OF
40 WHEN 0:
50 PROCrefresh
60 ENDCASE
70 ENDWHILE
80 SYS "Wimp_CloseDown"
90 END
100 :
110 DEFPROCrefresh
120 IF serial%=TRUE THEN
130 a$=""
140 if%=0
150 REPEAT
160 SYS "OS_SerialOp",4,-1 TO ,r1% ;rf%
170 if%+=1
180 IF (rf% AND 2)=0 THEN
190 REM Character in buffer (C clear)
200 IF r1%<>0 AND r1%<>-1 AND LEN(a$)<200 THEN a$+=CHR$(r1%)
210 IF r1%=0 THEN ERROR 27,"Zero character received."
220 IF r1%=-1 THEN ERROR 28,"Char present but -1 returned."
230 ENDIF
240 UNTIL (rf% AND 2):REM OR if%>100
250 REM Buffer empty [or 100 attempts]
260 ENDIF
270 ENDPROC
as the problem seemed to be that a character was present (carry clear) and that, due to a typing error at line 240, repeated attempts were made (without returning to RISC OS) until the carry was set. I have left this running for hours now and the problem has not recurred. However, I think an alternative explanation is possible. Looking carefully at the checksum calculation, I now realise that it assumes that the line is terminated by *nn[EOL], and the programme would also freeze if there was no ‘*’ on the line. As there is no handshaking on the serial connection to the GPS module, this can occur. In fact I have already spotted that the longitude (which I would expect to be of the form ‘,00231.4567,’ meaning 2 deg 31.4567 min) is sometimes received as ‘,231.4567,’ having passed the checksum test. As the checksum test is an 8-bit exclusive OR of each character in turn, a missing pair of identical characters would be overlooked. If ‘00’ could be omitted then so could ‘*’. Hence the evidence for this bug existing is poor. Sorry!
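The checksum weakness is easy to demonstrate. The Python sketch below is illustrative only and is not the SatNav code: the sentence body is made up (only the ‘,00231.4567,’ field comes from this ticket), and `xor_checksum`, `split_sentence` and `is_valid` are hypothetical names. It assumes the line layout is body, then ‘*’, then two hex digits.

```python
def xor_checksum(payload: str) -> int:
    """8-bit exclusive OR of each character in turn."""
    value = 0
    for ch in payload:
        value ^= ord(ch) & 0xFF
    return value


def split_sentence(line: str):
    """Return (body, checksum_text), or None when the '*' separator is missing."""
    body, star, tail = line.strip().partition("*")
    if not star or len(tail) != 2:
        return None  # refuse the line instead of searching for a '*' that never comes
    return body.lstrip("$"), tail


def is_valid(line: str) -> bool:
    parts = split_sentence(line)
    if parts is None:
        return False
    body, tail = parts
    try:
        return xor_checksum(body) == int(tail, 16)
    except ValueError:
        return False  # the two trailing characters were not hex digits


good = "GPGLL,5107.1234,N,00231.4567,W"                   # made-up sentence body
bad = good.replace(",00231.4567,", ",231.4567,")          # the '00' pair dropped in transit

# Two identical characters cancel under XOR, so both bodies give the same checksum.
print(xor_checksum(good) == xor_checksum(bad))
# A line that has lost its '*' is rejected outright rather than hanging the parser.
print(is_valid("$" + good + format(xor_checksum(good), "02X")))
```

The point of `split_sentence` is that a missing ‘*’ becomes an ordinary rejected line, which is the case the decoder described above did not allow for.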