Curious hourglass behaviour
Pages: 1 2
Dave Higton (1515) 3534 posts |
I’m writing a multitasking file comparison app, and I’m giving it a good thrashing on some ~2GB files. The app uses very short time slices; one test I use is to play Patience. If Patience operates smoothly with almost imperceptible delays, I think the multitasking is good enough. My app is using almost all the CPU time, according to TaskUsage. It doesn’t use the hourglass at all. The curious thing is that, for periods of several minutes, I’ve seen an hourglass appear, either continuously or flashing. My app seems to provoke it, although it doesn’t directly cause it (no calls to the hourglass module). It’s not the only time I’ve seen this; other very busy apps, also not calling the hourglass module, seem to have done the same. Is there any way to find which app is putting the hourglass up? Has anyone else seen the same behaviour? BBxM, RO 5.27 (20-Nov-18). |
Andrew Conroy (370) 740 posts |
Do you have any ShareFS mounts active? ShareFS can hourglass if it can’t contact the mounted drive (if that’s the correct term). |
Dave Higton (1515) 3534 posts |
There are two other machines on the LAN that have drives shared via ShareFS. I’ll have to check whether they are mounted on my machine when it happens. Although there would be lots of LAN traffic, I can’t see why there should be enough to prevent ShareFS traffic getting through. Maybe the fact that the machine is so busy reduces the number of callbacks that can happen? |
Dave Higton (1515) 3534 posts |
Some more tests show that the hourglass happens even though no ShareFS shares are mounted on the BBxM. There is just a “Drives” icon on the icon bar. |
Chris Evans (457) 1614 posts |
I don’t think it’s relevant to your problem, but the colour of the hourglass can easily be changed. The only thing I know of that does change the hourglass colour is ShareFS, which turns it red if it hits a problem: it stays the normal blue when it’s simply busy, but changes to red if it loses the connection. |
Jon Abbott (1421) 2651 posts |
That possibly explains why my Hourglass is always red, I’ve always assumed it was an issue with GraphicsV not passing on the palette when you switch graphics drivers.
That’s curious as the disc imager in ADFFS does a fairly intensive compression, compressing 512 bytes on each Wimp_Poll, but I don’t think I’ve ever seen an hourglass appear. How long after your compression starts does the Hourglass appear? I’d also be interested to know how you got the multitasking to be smooth whilst compressing, as despite my code calling Wimp_Poll thousands of times during the compression cycle, the machine becomes pretty unresponsive. |
Colin (478) 2433 posts |
There is still ShareFS activity when no shares are mounted – to enumerate the shares on the network. This can hit problems just like any other ShareFS transfer – which is why there have been problems with shares not appearing. I’d ensure that every computer on the network, including the one with problems, has ShareFSWindow configured to 1. |
David Feugey (2125) 2709 posts |
Hum. Before having SMP, it would be good to offload this kind of component to cores not (yet) used by RISC OS. Most AMP OSes use only core 0 for the OS and (possibly) the others only for apps. RISC OS could reserve some cores for critical parts of the OS – for example SSL code, or the whole networking stack. Sorry to be off topic (is it?). Edit: a few minutes later, I think it’s really a good idea. It would be much simpler to offload some parts of the OS to other cores than to try to adapt the OS to make apps run on other cores. IMHO, a very valuable short-term solution. A Tube-like system? Only Jeffrey can do it :) |
Dave Higton (1515) 3534 posts |
I’m doing comparison rather than compression. I’ve had the same sort of results with encryption. The good thing about both comparison and encryption is that the two data streams (both in for comparison, in and out for encryption) are of identical size, and the processing is almost nothing. The good speed and responsiveness come from file transfers that are block aligned, and in fact are integral powers of 2, which gives the fastest possible disc performance. I’m using 32kiB or 64kiB. I also did a pair of backup and restore apps that are on my website at http://davehigton.me.uk and use zlib for compression and decompression. For compression I use 32kiB chunks of input; clearly I have no choice over the output chunk size. This all seems to fairly fly along too. Only some file types are compressed, but compressing any 32kiB block seems to be almost instantaneous. I’m only using Deflate, which is the quickest (for backup, I think that’s the best choice – your app may have different criteria). |
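A minimal sketch of that chunked-Deflate scheme, in Python rather than the BASIC of the actual apps (the 32kiB input chunk size comes from the post above; the use of zlib’s fastest compression level is my assumption about what “quickest” means here):

```python
import zlib

CHUNK = 32 * 1024  # 32kiB input blocks, as described above

def compress_chunks(data: bytes) -> bytes:
    """Deflate a byte stream in fixed-size input chunks.

    One zlib stream is fed CHUNK bytes at a time; the output chunks
    are whatever sizes zlib produces, matching the observation that
    you get no choice over the output chunk size.
    """
    comp = zlib.compressobj(level=zlib.Z_BEST_SPEED)  # assumed setting
    out = []
    for i in range(0, len(data), CHUNK):
        out.append(comp.compress(data[i:i + CHUNK]))
    out.append(comp.flush())
    return b"".join(out)

original = b"some fairly repetitive test data " * 4096
packed = compress_chunks(original)
assert zlib.decompress(packed) == original   # round-trips
assert len(packed) < len(original)           # and actually shrinks
```

Feeding a single stream chunk by chunk like this keeps memory per poll small, which is what lets the work be spread across many Wimp_Poll slices.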
Dave Higton (1515) 3534 posts |
Maybe a minute or so before it first appears. During a long period of application activity, it comes and goes. |
Dave Higton (1515) 3534 posts |
ShareFSWindow has always been 1 on all my machines. It’s one of the first things I set up when commissioning a new machine. I’d like to understand why a busy CPU and/or network causes ShareFS to protest. If there were significant periods of single-tasking, I could understand it – but there aren’t. The apps I’m working on are considerate citizens :-) They don’t take any long time slices once the files are open. |
Dave Higton (1515) 3534 posts |
On the topic of speed: the file comparison app used to compare each block in BASIC. I’ve changed it to do a word length comparison in assembly language, which is extremely quick of course; if there is any difference, it reverts to the BASIC to count the diffs at byte resolution. The time for ~2GB files came down from about 1642 seconds to 905 seconds. |
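The structure of that fast-path/fallback compare can be sketched in Python (a stand-in for the word-length assembler compare plus BASIC byte counting; the block size is taken from the earlier posts, everything else is illustrative):

```python
def count_diff_bytes(a: bytes, b: bytes, block: int = 32 * 1024) -> int:
    """Count differing bytes between two equal-length buffers.

    Fast path: compare each block wholesale (standing in for the
    word-at-a-time assembler compare).  Slow path: only blocks that
    differ are re-examined byte by byte, as the BASIC fallback does.
    """
    assert len(a) == len(b)
    diffs = 0
    for i in range(0, len(a), block):
        ca, cb = a[i:i + block], b[i:i + block]
        if ca == cb:
            continue  # identical block: no per-byte work at all
        diffs += sum(x != y for x, y in zip(ca, cb))  # slow path
    return diffs
```

Since most blocks of two nearly-identical files match exactly, almost all the time is spent on the cheap wholesale test, which is why the speed-up (1642 s down to 905 s) is so large.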
Colin (478) 2433 posts |
From what I’ve pieced together so far, the problem is the lack of preemption, which means that multitasking has to happen at the application end rather than the device end; a SWI call can’t have the thread wait in the background for a resource to become available, as there is only one thread. Networking requires callbacks to move the data from the interrupt context to the main thread, and if the callback doesn’t happen, incoming data just uses all the mbufs and you get dropped packets. When packets get dropped, ShareFS sits waiting for a reply that has been lost, and you get a timed-out hourglass. You can also get an hourglass on your machine if the remote machine is having problems replying. If you are doing something in a loop in user space, you need a SWI call every so often to trigger callbacks. In supervisor mode, an OS_LeaveOS/OS_EnterOS combination and a bit of glue triggers callbacks, though this is not without problems, as witnessed by LanManFS and LanMan98FS – the main problem being that you want network callbacks triggered but don’t want other callbacks triggered. A recursive copy command, for example, will stop callbacks because it doesn’t call a SWI in user mode after the initial SWI call. It only works with, for example, LanManFS because LanManFS explicitly triggers callbacks using the OS_LeaveOS/OS_EnterOS method. Don’t know if this helps you any, but it helped me a bit writing it down :-) |
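A toy model of the mbuf side of this, with entirely made-up numbers, just to illustrate why delayed callbacks turn into dropped packets: packets arrive into a fixed pool at interrupt time, but are only drained when a “callback” runs.

```python
from collections import deque

def simulate(pool_size: int, arrivals_per_tick: int,
             callback_every: int, ticks: int):
    """Fixed packet pool filled at 'interrupt time', drained only
    when a callback runs.  Returns (delivered, dropped).
    All parameters are invented for illustration.
    """
    pool = deque()
    delivered = dropped = 0
    for t in range(ticks):
        for _ in range(arrivals_per_tick):
            if len(pool) < pool_size:
                pool.append(t)
            else:
                dropped += 1          # no free mbuf: packet lost
        if t % callback_every == 0:   # the callback finally runs
            delivered += len(pool)
            pool.clear()
    return delivered, dropped

# Frequent callbacks: nothing is lost.
_, lost_fast = simulate(pool_size=16, arrivals_per_tick=4,
                        callback_every=1, ticks=100)
# Callbacks delayed (app busy in user mode): the pool overflows.
_, lost_slow = simulate(pool_size=16, arrivals_per_tick=4,
                        callback_every=10, ticks=100)
assert lost_fast == 0 and lost_slow > 0
```

The point is only that loss depends on the *interval between drains*, not on the average data rate, which is why a busy-but-polite application can still starve the network stack.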
Jeffrey Lee (213) 6048 posts |
Yes, offloading processing done by modules to other cores will be easier than offloading processing done by applications (and is already possible using the prototype SMP module).
Incorrect.
Depends on the flavour of callback. For non-transient callbacks, yes, it’ll only happen on return from a SWI to user mode. But transient callbacks (which are the more commonly used variety) will trigger when a SWI returns to user mode or when an IRQ returns to user mode (and also, I believe, when an RTSupport routine returns to user mode). Apart from the occasional bug/issue related to returned errors blocking callbacks, I wouldn’t expect an app which sits in user mode and calls zero SWIs for long periods of time to cause any problems. |
Dave Higton (1515) 3534 posts |
My thanks to Colin and Jeffrey for the above information. However, my app is doing the most simple and straightforward thing possible: it’s calling Wimp_Poll many times a second, every second (unless there’s any delay in fetching data from the NAS drive or the local SSD). Doesn’t that give callbacks the best chance of being called? One of the files is from NAS and is therefore putting significant traffic over the LAN. But the numbers are a file of ~2GB in about 900 seconds, which by my arithmetic ends up as about 22Mb/s or so plus TCP and IP overheads. The switch is an HP 10/100/1G, so I can’t see it contributing to the problem. The BBxM is running RO 5.27 (20-Nov-18) with SharedCLibrary 5.97 (11 Jun 2018). MBufManager doesn’t seem to offer any commands, so is there any way of checking for mbuf exhaustion? |
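That back-of-envelope rate can be sanity-checked mechanically; the ~10 bits per payload byte used here is a rough rule of thumb for Ethernet framing plus IP and TCP headers, not a measured figure:

```python
def effective_mbit_per_s(payload_bytes: float, seconds: float,
                         bits_per_byte: float = 10) -> float:
    """Rough on-the-wire rate for a transfer, assuming ~10 bits per
    payload byte once framing and IP/TCP headers are averaged in."""
    return payload_bytes * bits_per_byte / seconds / 1e6

rate = effective_mbit_per_s(2e9, 900)  # the ~2GB-in-900s transfer
assert 22 < rate < 23                  # roughly 22 Mbit/s
```

About 22 Mbit/s is well under a 100Mb/s link, let alone gigabit, so raw bandwidth alone doesn’t explain any stall.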
Chris Evans (457) 1614 posts |
If you are not using ShareFS whilst running your program why not try turning it off to see if it is the culprit? |
Colin (478) 2433 posts |
Data can arrive at the computer faster than that – callback intervals can make the overall transfer rate slower. Just to confirm it’s a networking problem: if you compare two files on the same SSD, where you previously saw the problem, does the problem go away?
You would have thought so. |
Rick Murray (539) 13851 posts |
I think I may have seen this, and I’m not using ShareFS! The machine, a Pi2 (ARMv7), may, for unknown reasons, simply freeze for anywhere between 10 and 40 seconds. I don’t recall if the Hourglass is on (I think it is, but can’t swear to it), but the SD activity indication is solidly on (no flicker). NumLock toggles the LED; Alt-Break does nothing. The volume is good according to DiscKnight.
Correct me if I’m wrong, but aren’t these supposed to be negotiated protocols, rather than “throw something at the wall and see what sticks”? |
Steve Pampling (1551) 8172 posts |
On average the header contributions will end up giving you roughly 10 bits per byte transferred, so for easy maths just shift the decimal point one place, divide by the time taken (900 sec), and then divide by 10^6 for the number of million bits per second (about 22). Since 22 is around 22% of the theoretical maximum on the BBxM’s (nominal) 100Mb/s interface, then unless there’s a “disc” transfer bottleneck I’d be looking for a duplex mismatch on the links on the route, because that tends to be the answer in most network transfer problems where the transfer speed is around 20-25% of the theoretical available. Unless you have a managed switch you stand little chance of seeing any evidence other than lots of re-sends in a Wireshark capture (which you may find difficult to do unless the endpoints can run Wireshark, or you’re using a managed switch that can do port mirroring – SPAN if you’re a Cisco droid). |
Colin (478) 2433 posts |
Yes, but the requested data goes into mbufs, and the mbufs are consumed in a callback. The data can arrive quickly and be put into mbufs, while the callback happens some time later. If too much data is requested, it can exhaust the mbufs before a callback is called, causing dropped packets – I think this is more of a problem on GB networking. I’ve seen 250ms callback intervals. mbufs are shared between all sockets – in and out. |
Steve Pampling (1551) 8172 posts |
You connect a device to a network and the negotiation on capability is between the device and the network ingress port. Stick a 1Gb/s port at one end of a network and a 100Mb/s port at the other, and you find the 100Mb/s port getting data arriving faster than it expects, with the local switch buffer (if any) filling and then packet drops with resends. |
Dave Higton (1515) 3534 posts |
I think I should clarify a couple of things. First: although the hourglass comes on, the machine still multi-tasks as smooth as silk. So nothing appears to be being blocked. Second: I’m not streaming data to the machine willy-nilly; I’m requesting or sending chunks, normally 32kiB or 64kiB, via OS_GBPB 4 or 2. So, unless the OS is doing some read-ahead, I don’t see why the buffers should be overwhelmed. But I will use showstat when I run the comparison again – thank you, Colin, that gives a lot of useful info. |
Dave Higton (1515) 3534 posts |
I RMKilled ShareFS and ran a long compare again. The hourglass still appeared (though, as always, multi-tasking was still fully operative). As before, one file was read via LanMan98, the other from local SSD. showstat -a shows hardly any small mbufs in use, no large mbufs in use, and no mbuf exhaustions. |
Dave Higton (1515) 3534 posts |
Two files from local SSD, ShareFS still NOT running: no hourglass during the transfer. |
Dave Higton (1515) 3534 posts |
I rebooted so that everything is as normal, and re-ran the tests with two files on local SSD. No hourglass. So it looks like the issue is related to the network, but not specifically to ShareFS. That is on the assumption that it was enough to rmkill ShareFS – would I need to kill Freeway (or anything else) in addition? I really don’t know exactly what Freeway does. |