Document Status

  Distribution:    General Release
  Title:           Acorn URL fetcher API specification
  Drawing Number:  1215,220/FS
  Issue:           0.25
  Author(s):       Paul Wain
                   Carl Elkins
                   Stewart Brodie
                   Andrew Hodgkinson
  Issue Date:      12/11/1998
  Change Number:   ECO 4131
  Last Issue:      0.24 (04/08/1998)

Contents
========

  Issue history
  Overview
  Outstanding issues
  Client to URL module interface
  Protocol module to URL module interface
  URL module to protocol module interface
  URL module service calls
  URL module *-commands
  URL errors
  Performance targets
  References
  Glossary

Issue history
=============

  0.16   19/10/1997  First formal version of specification based on uncontrolled textual programmer's notes. (RCE)
  0.16a  20/10/1997  Incorporated notes from ADH & SB. (RCE)
  0.19   17/11/1997  Incorporated details of service calls. (SNB)
  0.20   20/11/1997  Incorporated details of URL parsing SWI. (SNB)
  0.21   11/06/1998  All other updates incorporated. (SNB)
  0.22   22/06/1998  Comments after first review incorporated. Added details of proxy enumeration SWI. (SNB)
  0.23   25/06/1998  Comments from interested parties incorporated. (SNB)
  0.24   04/08/1998  No longer live. ECO 4082. (SNB)
  0.25   12/11/1998  Four digit years on all dates. Tidied up white space. Removed smart quotes and n-dashes. Added author details to history. Corrected references on R0 exit words from URL_ParseURL to URL_Status. Added details of bit 1 of flags word in R0 to URL_ParseURL. Clarified a few sentences here and there. ECO 4131. (ADH)

Overview
========

The URL (Uniform Resource Locator) module is a general purpose module for fetching data from various Internet services. This specification reflects the behaviour of version 0.42 or later of the URL_Fetcher module.

The purpose of the module is to provide a uniform entry point into a set of "fetcher" protocols (e.g. FTP, HTTP, Gopher, NNTP, etc.), without the need for a client application to understand how that protocol works. This is done using a number of generalised URL SWIs.
The fetcher protocol modules (hereafter just "protocol modules") with which the URL module communicates are called only by the URL module itself. The entry points into the protocol modules have names similar to the entry points into the URL module, but they are NOT the same. The system structure is shown in figure 1 below.

                 /----------------\
                 |  Applications  |
                 \----------------/
                          |
                          |
                          v
            /---------------------------\
            |        URL module         |
            \---------------------------/
               ^   |             ^   |
               |   |             |   |
               |   v             |   v
            /----------\     /----------\
            |   HTTP   |     |   FTP    |  . . . . .
            \----------/     \----------/

            Figure 1: URL Fetching system structure

Each client fetch occurs within the context of a 'session'. Each session is identified by a different session identifier. Client session identifiers are issued by the URL module upon request and remain valid until the client informs the URL module to discard the session. Subsequently, session identifiers may be re-issued by the URL module for new sessions. Only a single object fetch can be performed in any one given session. Sessions cannot be re-used by clients, even if a prior object fetch in that session has completed.

The typical client usage of the system is:

  * Obtain a session identifier (SWI URL_Register)
  * Start fetching an object (SWI URL_GetURL)
  * Repeatedly, whilst multi-tasking if in the desktop environment:
      - Read blocks of data (SWI URL_ReadData)
      - Process that data
  * Discard session (SWI URL_Deregister)

If an application decides it requires a premature termination (e.g. the user asked the application to quit whilst an object was being downloaded), then the application calls SWI URL_Stop immediately and then discards the session with SWI URL_Deregister.

Typical clients, such as web browsers, will most likely have several sessions active concurrently.
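The session rules above (one fetch per session, no re-use after a fetch, URL_Stop only meaningful while a request exists) can be sketched as a small client-side state machine. This is an illustrative model only: the type and function names below are hypothetical and are not part of the URL module's interface, and the real module may accept or reject calls in cases this sketch does not cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client-side model of one session; not part of the URL module API. */
typedef enum {
    SESSION_UNREGISTERED,  /* before URL_Register, or after URL_Deregister */
    SESSION_REGISTERED,    /* identifier issued, no fetch started yet */
    SESSION_FETCHING,      /* URL_GetURL has been called */
    SESSION_STOPPED        /* URL_Stop has been called */
} session_state;

typedef enum {
    OP_REGISTER, OP_GET_URL, OP_READ_DATA, OP_STOP, OP_DEREGISTER
} session_op;

/* Returns true and updates *state if 'op' is allowed in the current state. */
bool session_apply(session_state *state, session_op op)
{
    assert(state != NULL);
    switch (op) {
    case OP_REGISTER:
        if (*state != SESSION_UNREGISTERED) return false;
        *state = SESSION_REGISTERED;
        return true;
    case OP_GET_URL:
        /* Only a single object fetch may be performed per session. */
        if (*state != SESSION_REGISTERED) return false;
        *state = SESSION_FETCHING;
        return true;
    case OP_READ_DATA:
        return *state == SESSION_FETCHING;
    case OP_STOP:
        /* Stopping a session with no request is treated as an error. */
        if (*state != SESSION_FETCHING) return false;
        *state = SESSION_STOPPED;
        return true;
    case OP_DEREGISTER:
        if (*state == SESSION_UNREGISTERED) return false;
        *state = SESSION_UNREGISTERED;
        return true;
    }
    return false;
}
```

A client wanting to fetch a second object would, under this model, deregister and register again (or hold a second session) rather than call URL_GetURL twice on one identifier.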
The URL module maintains its own private sessions into the protocol modules. The session identifiers passed in many of the SWI interfaces to the protocol modules are the URL module's own, and are not those known to the client application.

Service calls are also provided to ease interaction between the URL module and the fetchers, mainly to inform other modules of the arrival or departure of a particular module.

Each protocol module accepts data and returns results as per the HTTP protocol. Thus any extra client data associated with a request (passed in R4 to SWI URL_GetURL) will take the format of a (possibly empty) set of HTTP headers, an empty line and then the data; and each response will start with an HTTP/1.0 or HTTP/1.1 Status-Line of the format: "HTTP/1.0 200 OK" followed by various headers identifying the content-type of the retrieved data, followed by an empty line, followed by the data itself.

Outstanding issues
==================

None.

Client to URL module interface
==============================

A typical client would be an application, such as a Web Browser. The following SWI calls provide the interface for an application to control and transfer data via the URL module.

SWI URL_Register (&83E00)

On entry:
  R0:  Flags:
         Bits 31-0 reserved (0).

On exit:
  R0:  Reserved - currently zero.
  R1:  Session identifier.
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status undefined.

This SWI initialises a client session with the URL module and provides the client with a session identifier that can be used to monitor the status of the URL module within that client's context. The session identifier is unique for each client session that is registered with URL and is also used as an identifier in subsequent interactions with the URL module.

Multiple registration by the same client application is permitted. This will provide the client with multiple identifiers to the URL module.
Calling this SWI does not result in the calling of any protocol module SWIs.

The URL module imposes no limit on the number of concurrently registered sessions, other than having the required memory available in which to store details of the session.

SWI URL_GetURL (&83E01)

On entry:
  R0:  Flags:
         Bit 0 => R6 is valid.
         Bit 1 => R5 holds length of data in R4 specified buffer, otherwise a single NUL terminated string in buffer.
         Bits 31-2 reserved (0).
  R1:  Session identifier.
  R2:  Bits 7-0 => Method (8-bit value, held in bits 7-0). This is protocol dependent. See table below for values.
       Bits 15-8 => Method dependent.
       Bits 31-16 reserved (0).
  R3:  URL - The document we are after including the protocol, e.g. "http://www.acorn.co.uk/".
  R4:  Data block: Data to send in addition to the URL. Validity is protocol and method dependent.
  R5:  If R0:1 is set, length of data in R4 data block. If R0:1 is clear, must be 0.
  R6:  'User Agent': Pointer to string to use as 'User Agent' identifier in request header if R0:0 is set. A NULL pointer or NULL string implies use default identifier - see below.

On exit:
  R0:  Protocol status (as defined for SWI URL_Status, below).
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status undefined.

This SWI is used to instigate a transfer of data to or from (mainly from) a resource server.

When this SWI has been called, the URL module checks the per-session and global proxy settings, looking for a match (see SWI URL_SetProxy for details on setting proxies and proxy conflict resolution). If no proxy is to be used, then URL looks for a protocol module which is capable of handling the URL specified by R3. If a proxy setting was found, then a pointer to the proxy URL is placed in R7, R0:31 is forced to value 1, and URL looks for a protocol module which is capable of handling the specified proxy URL. In both cases, if a suitable module cannot be located, the URL module generates an error.
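As an illustration of the entry register layout for SWI URL_GetURL, the following sketch packs the R2 method word and names the two defined R0 flag bits. The macro and function names are hypothetical, chosen for this example only; the bit positions are those given in the register description above.

```c
#include <assert.h>
#include <stdint.h>

/* R0 flag bits for URL_GetURL (names are illustrative, not official). */
#define URL_GETURL_FLAG_USER_AGENT   (1u << 0)  /* R6 is valid */
#define URL_GETURL_FLAG_DATA_LENGTH  (1u << 1)  /* R5 holds length of R4 data */

/* Build R2: method in bits 7-0, method dependent value in bits 15-8,
 * bits 31-16 left as zero because they are reserved. */
uint32_t url_pack_method(uint8_t method, uint8_t method_dependent)
{
    /* Method numbers 0 and 255 are reserved and must not be used. */
    assert(method != 0 && method != 255);
    return (uint32_t)method | ((uint32_t)method_dependent << 8);
}

/* Extract the method number from a packed R2 value. */
uint8_t url_method_of(uint32_t r2)
{
    return (uint8_t)(r2 & 0xFFu);
}
```

For example, `url_pack_method(4, 0)` gives the R2 value for method 4 with no method dependent bits set.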
If a protocol module capable of handling the URL was found, then all client registers are passed on to the protocol module via the Protocol_GetData SWI call, with the exceptions stated above for proxy handling. On exit, R0 will hold the status code returned by the protocol module.

The extra data pointed to by R4 on entry is method and protocol specific. For example, in HTTP, the data comprises HTTP headers and, if appropriate, an entity body. Protocol modules should use this style wherever possible. Note that these headers do not include lines such as an HTTP Request-Line (i.e. the "GET / HTTP/1.0" part). For example, when posting data to an HTTP URL as the result of a form submission on a web page, the web browser would supply a Content-Type header, Content-Length header, potentially some kind of encoding header, a blank line and then the entity body.

The User Agent string pointed to by R6, if R0:0 is set, is an indication to the underlying protocol module of how the module should identify itself to remote systems. This controls the User-Agent header for the HTTP protocol module, for example. The protocol module is free to define its default identifier as it pleases; however, following the format of the HTTP User-Agent is recommended where possible and appropriate to the protocol. Modules may choose to ignore or amend any User-Agent string. For example, the AcornHTTP module will suffix the client's User-Agent with its own version number, resulting in complete identifiers such as:

  User-Agent: Acorn Browse/2.06 AcornHTTP/0.82

where the client only specified "Acorn Browse/2.06".
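The form submission example above can be sketched as a helper that lays out the R4 data block: headers, an empty line, then the entity body, with no Request-Line. The function name is hypothetical, and the use of CR LF line endings is an assumption taken from common HTTP practice rather than from this specification.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "headers, empty line, body" into buf.
 * Returns the total length in bytes (suitable for passing in R5 with
 * R0:1 set), or 0 if the block does not fit in 'size' bytes.
 * Assumption: lines are terminated with CR LF. */
size_t build_post_block(char *buf, size_t size,
                        const char *content_type,
                        const char *body, size_t body_len)
{
    int n = snprintf(buf, size,
                     "Content-Type: %s\r\nContent-Length: %lu\r\n\r\n",
                     content_type, (unsigned long)body_len);

    /* n is the header length excluding the terminating NUL. */
    if (n < 0 || (size_t)n >= size || body_len > size - (size_t)n) {
        return 0;
    }

    /* The body overwrites the NUL written by snprintf; the block is
     * length-counted, not NUL terminated. */
    memcpy(buf + n, body, body_len);
    return (size_t)n + body_len;
}
```

Because the returned block may contain arbitrary body bytes, a client using this layout would set R0:1 and pass the returned length in R5 rather than rely on NUL termination.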
Table of method numbers

  Method  FTP        HTTP and others  Comment
  1       RETR/LIST  GET              ("Get this object" operation)
  2       n/a        HEAD             ("Get entity headers" operation)
  3       n/a        OPTIONS          ("Get server options" operation)
  4       n/a        POST             ("HTTP POST" operation)
  5       n/a        TRACE            ("HTTP TRACE" operation)
  6       n/a        n/a              (Reserved to Acorn - do not use)
  7       n/a        n/a              (Reserved to Acorn - do not use)
  8       STOR       PUT              ("Store this object" operation)
  9       MKD        n/a              ("Create directory" operation)
  10      RMD        n/a              ("Remove directory" operation)
  11      RNFR/RNTO  n/a              ("Rename object" operation)
  12      DELE       DELETE           ("Delete object" operation)
  13      STOU       n/a              ("Store object unique" operation)

Applications for new method codes should be made to Developer Support. The range 128-254 is reserved for private non-distributed modules. Method numbers 0 and 255 are reserved and must not be used.

The FTP methods listed above are fully implemented in version 0.28 of the FTP Fetcher module. The HTTP methods listed above are fully implemented in version 0.82 of the AcornHTTP module.

SWI URL_Status (&83E02)

On entry:
  R0:  Flags:
         Bits 31-0 reserved (0).
  R1:  Session identifier.

On exit:
  R0:  Status word:
         Bit 0 => Connected to server.
         Bit 1 => Sent request.
         Bit 2 => Sent data.
         Bit 3 => Initial response received.
         Bit 4 => Transfer in progress.
         Bit 5 => All data received.
         Bit 6 => Transfer aborted.
         Bits 31-7 reserved (0).
  R1:  Preserved.
  R2:  Server response, as an "HTTP" response code (200, 401 etc.).
  R3:  Bytes read so far (total body data count).
  R4:  Total bytes to be transferred in whole transaction if known (approximate value only), or -1 if unknown.
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status undefined.

This SWI is used to monitor the transfer of data from a remote service. It is protocol independent - the exit status bits are common to all services. Clients must test this field bit-wise, since the value is cumulative.
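A minimal sketch of bit-wise testing of the status word follows. The macro and function names are hypothetical; only the bit assignments come from the register description above. The point is that a client tests individual bits with a mask rather than comparing R0 against a whole value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status word bits returned in R0 (names are illustrative). */
#define URL_STATUS_CONNECTED    (1u << 0)  /* Connected to server */
#define URL_STATUS_SENT_REQUEST (1u << 1)  /* Sent request */
#define URL_STATUS_SENT_DATA    (1u << 2)  /* Sent data */
#define URL_STATUS_RESPONDED    (1u << 3)  /* Initial response received */
#define URL_STATUS_TRANSFERRING (1u << 4)  /* Transfer in progress */
#define URL_STATUS_DONE         (1u << 5)  /* All data received */
#define URL_STATUS_ABORTED      (1u << 6)  /* Transfer aborted */

/* True once the fetch can make no further progress. */
bool url_fetch_finished(uint32_t status)
{
    return (status & (URL_STATUS_DONE | URL_STATUS_ABORTED)) != 0;
}

/* True if the transfer was aborted, whatever other bits are set. */
bool url_fetch_aborted(uint32_t status)
{
    return (status & URL_STATUS_ABORTED) != 0;
}
```

Testing with a mask keeps the client correct if a protocol module sets the bits in an unexpected combination, or if reserved bits are defined in a later version.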
Clients may not assume that the states returned in R0 will progress in any particular combination or order. However, the likely progression during a fetch for a resource being retrieved over a network (when the bits are combined into a single decimal value) is: 0, 1, 3, 7, 15, 31, and then R0:5 set upon completion, with R0:6 set at any stage when an error has occurred.

Since each protocol module is returning its results according to the HTTP protocol, R2 can be treated as an HTTP response code whatever the URL being fetched. For example, the FileFetcher module will indicate file not found errors by setting the response code to 404 (HTTP's Not Found error code). Note that in the case of, for example, an HTTP 403 (Forbidden) return, some explanatory data may be received, too.

If the amount of data to be received is unknown, R4 will contain -1; however, R3 will contain the number of bytes received so far. The R4 value should be treated as approximate, since the exact interpretation varies between protocols.

When this SWI is called, the URL module invokes the Protocol_Status SWI for the protocol module concerned with the request.

SWI URL_ReadData (&83E03)

On entry:
  R0:  Flags:
         Bits 31-0 reserved (0).
  R1:  Session identifier.
  R2:  Client buffer for received data.
  R3:  Size of buffer pointed to by R2.

On exit:
  R0:  Status word (see SWI URL_Status).
  R2:  Preserved. Contents of buffer modified.
  R4:  Number of bytes transferred to R2 buffer.
  R5:  Number of bytes still to be read to complete object (if known) or -1 if unknown.
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status undefined.

This SWI is used to read the data pending from a request, find out how much data has been read on this call and how much more there is remaining to be read for the request.
R2 is a pointer to a buffer on entry (and R3 is the size of the buffer). On exit, the buffer contains the new data, R4 contains the amount of data written to the buffer and R5 contains the amount of data left to be read. If the amount of data left is unknown, R5 will contain -1. R0 always returns the protocol status code.

In the event of all the data being read (R5 = 0 on exit), a call to URL_Stop is not required as this is performed automatically when URL_Deregister is called for the client session. Once all data has been read, a call to URL_Status returns no further meaningful information; it simply indicates that the transfer has completed.

The data returned will take the form of a complete HTTP compatible response. Responses should use HTTP/1.0 if possible and avoid HTTP/1.1. For example, AcornHTTP will downgrade any higher version responses to HTTP/1.0, having taken care to remove any features applicable only to the higher version, such as chunked transfer encodings.

When this SWI is called, the URL module invokes the Protocol_ReadData SWI for the protocol module concerned with the request.

SWI URL_SetProxy (&83E04)

On entry:
  R0:  Flags:
         Bits 31-0 reserved (0).
  R1:  Session identifier.
  R2:  Address of buffer containing a URL base.
  R3:  URL 'method' to proxy (address of URL fetch identifier to be proxied).
  R4:  0 => Proxy request.
       1 => Don't proxy request.
       All other values reserved.

On exit:
  R0:  Status word (see SWI URL_Status for details).
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status undefined.

This call is used to set up a proxy server to use for a session with the URL module. If R1 is zero then the proxy is considered global and is used for all sessions. If R1 is a valid session identifier then the proxy server for that session only is set.

R2 is a pointer to a string containing the base URL to pass the request on to when a proxy request is made. This is of the form "http://www-cache.demon.co.uk:8080/" (note the trailing '/').
A common error is to omit the port number. If the port number is not specified, then the default port number is used. See discussion under URL_ProtocolRegister regarding how the default port number is derived.

R3 is a pointer to a buffer containing the initial part of the URL to proxy - the URL scheme (e.g. "http:", "ftp:"). This system has the advantage that requests to certain hosts can be proxied and not others (e.g. by giving "http://www.acorn.co.uk/" as the scheme).

However, if R4 is 1, this indicates that no matter how the proxy settings have been defined, requests to the base URL should not be proxied in this case (R3 is undefined).

When a URL_GetURL request is received, the proxy settings are evaluated in the following order:

  1  Client no-proxy
  2  Client proxy
  3  Global no-proxy
  4  Global proxy

This is to ensure all client settings override global settings and thus remain safe for the given client - i.e. a client which sets up a proxy server and then defaults all other URLs to no-proxy can, no matter how the global settings are changed, be sure of where requests will end up.

If R2=0 on entry, then all proxy settings for the specified session are cleared.

Calling this SWI does not result in any calls being made to protocol modules.

SWI URL_Stop (&83E05)

On entry:
  R0:  Flags:
         Bits 31-0 reserved (0).
  R1:  Session identifier.

On exit:
  R0:  Status word (see URL_Status for details).
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status undefined.

This call aborts a current request if there is one associated with the session identifier. In the event of no request being associated with the identifier, an error is generated.

The purpose of this SWI call is to provide the client with a way of enforcing the termination of a request. It need not be called by the client just because all the data associated with the request has been transferred, although the client may do so if it chooses.
The URL_Stop call will be made automatically by the URL module when the session is deregistered by the client using SWI URL_Deregister.

When this SWI is called, the URL module invokes the Protocol_Stop SWI for the protocol module concerned with the request.

SWI URL_Deregister (&83E06)

On entry:
  R0:  Flags:
         Bits 31-0 reserved (0).
  R1:  Session identifier.

On exit:
  R0:  Status word (see SWI URL_Status for details).
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status undefined.

This call deregisters the client session from the URL module, freeing up any information the URL module may have kept about the client session (e.g. proxy information). The session identifier ceases to be valid and becomes available for re-issue on a subsequent call to SWI URL_Register.

When this SWI is called, the URL module invokes the Protocol_Stop SWI for the protocol module concerned, if it has not already done so (e.g. during the processing of URL_Stop).

SWI URL_ParseURL (&83E07)

On entry:
  R0:  Flags:
         Bit 0 => If set, R5 contains number of words in data block, else a default of 10 words is assumed.
         Bit 1 => If set, character codes 0 to 31 and 127 in the URL will be escaped (hex encoded, e.g. space becomes '%20') - only available in URL 0.42 or later. URL 0.38 through to 0.41 inclusive always escape these characters. Versions prior to 0.38 never do this.
         Bits 31-2 reserved (0).
  R1:  Reason code:
         0 => Return component buffer requirements.
         1 => Return component data in specified buffers.
         2 => Construct full URL from component buffers.
         3 => 'Quick parse'.
  R2:  Pointer to base URL.
  R3:  Pointer to URL relative to base URL (or NULL if none).
  R4:  Pointer to data block of R5 words (unless R1=3, see below, or R0:0 is unset, in which case R4 points to a buffer of at least 10 words in length).
  R5:  If R0:0 set, size of R4 block in words.

  If R3 is non-NULL, it is assumed to point to a partial URL which needs to be resolved with respect to the base URL pointed to by R2.
  If R3 is NULL, then R2 is assumed to point to a full URL.

On exit:
  R0:  Flags:
         Bits 31-0 Reserved (0).
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status undefined.
  Data block at R4 is updated in line with entry reason code.

This SWI is used to parse URLs into their constituent parts, enabling clients to extract the various fields from the URL in a reliable manner. The call is also capable of resolving a relative URL to produce a fully-qualified URL, and of reconstructing a full URL from a set of components.

The data block referred to above is either a block of integers which will be updated to contain the size of the required buffer for each element, or a block containing pointers to buffers for the actual data. All strings are zero-terminated and all lengths include space for the zero terminator.

The number of entries in the block is specified in R5 if R0:0 is set on entry. If R0:0 is clear, then the default value of 10 is assumed. The format of the data block is:

  Offset  Usage
  +  0    Fully canonicalised URL.
  +  4    URL protocol (e.g. "http", "ftp") forced to lower-case.
  +  8    Hostname (e.g. "www.acorn.com") forced to lower-case.
  + 12    Port (e.g. "80").
  + 16    Username - used for FTP authentication and mailto.
  + 20    Password - for FTP.
  + 24    Account - for FTP.
  + 28    Path (e.g. "pub/riscos/releases") [See note].
  + 32    Query - for HTTP, things after a query character.
  + 36    Fragment - for HTTP, things after a hash character.

It is anticipated that this SWI will be called twice: the first time to find the lengths of the buffers, and the second to retrieve a copy of the data into the buffers.

The URLs pointed to by R2 and R3 (if used) need not be fully-qualified, e.g. R2 may point to "www.acorn.com/browser/". The fully canonicalised version of the URL at block+0 refers to a fully-qualified, canonicalised version of it, which in this example would be "http://www.acorn.com/browser/". During canonicalisation, the port number will be elided if possible.
See the discussion under SWI URL_ProtocolRegister for details of how URL discovers whether this is possible or not.

[Note] The path will not start with a '/' unless the URL being parsed explicitly specified one - this is in keeping with the URL specification, so for example, given the URL "http://www.acorn.com/browser/", then the path component is "browser/", and not "/browser/"; the slash between the hostname and path is a separator only, not a part of either component.

The entry reason codes are described below.

URL_ParseURL_ReturnLengths (R1 = 0)

When R1 is 0 on entry to the SWI, the data block is treated as a block of unsigned 32-bit integers. The contents of the block are ignored on entry, but on exit are filled in with the lengths of the individual components of the URL. A value of zero is stored for a field which does not exist; non-zero values include space for a zero-byte terminator.

URL_ParseURL_ReturnData (R1 = 1)

When R1 is 1 on entry to the SWI, the data block is treated as a block of pointers to buffers to receive the components of the URL. Each of the pointers in the data block must be either zero, indicating that the caller is not interested in that field, or point to a buffer which is sufficiently long to receive the field. The client can ensure this by having previously used reason code 0 to determine the length required.

URL_ParseURL_ComposeFromComponents (R1 = 2)

When R1 is 2 on entry to the SWI, the data block is treated as containing the broken down fields of a URL. Each of the pointers in the data block must be either zero or point to a buffer containing the value of the component, with the exception of the full URL field, which is a pointer to a buffer to receive the fully canonicalised URL. This buffer is filled in on exit.

URL_ParseURL_QuickResolve (R1 = 3)

When R1 is 3 on entry to the SWI, R4 points to a buffer for receiving the fully resolved URL. R5 is the length of the buffer.
On exit, the buffer is filled in with the fully resolved URL obtained, and R5 is decreased by the length of the URL (including terminating zero byte). Hence R5 will be negative on exit if the buffer wasn't large enough.

There is no fixed rule for calculating the minimum buffer length required for the answer. To guarantee that the buffer is large enough, it should be calculated as:

  length(base URL) + length(relative URL) + 4

If R0:1 is set on entry, there is the potential for up to the entire URL to be hex encoded. In this case, you would need to multiply the above by three. URL 0.37 and earlier never hex encode URLs. Note that URL 0.38, 0.39, 0.40 and 0.41 will *always* do this; the control through R0:1 was introduced in v0.42. Clients not knowing about this bit (therefore leaving R0:1 unset) will find that 0.42 or later do not automatically escape URLs, this being more sensible default behaviour on the whole. Characters which are already hex encoded in URLs are left alone in all versions of the URL module.

Clients are strongly recommended to use this reason code if they wish to resolve a relative URL or canonicalise a URL and are only interested in the fully resolved and canonicalised form of the URL, since it is significantly faster than using reason code 0 and then reason code 1.

To help reduce the chances of wildly over-allocating buffer space, setting of R0:1

SWI URL_EnumerateSchemes (&83E08)

On entry:
  R0:  Flags:
         Bits 31-0 reserved (0).
  R1:  Context (0 for first call).

On exit:
  R0:  Status flags (currently unused).
  R1:  Context for next call (-1 if finished).
  R2:  Pointer to read-only URL fetch scheme (if R1 is not -1).
  R3:  Pointer to read-only help string (if R1 is not -1).
  R4:  Protocol module SWI base (if R1 is not -1).
  R5:  Protocol module version (*100, if R1 is not -1).
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status is undefined.

This call is used to discover which schemes are currently available to the URL module.
It may be used, for example, to determine whether or not a client of the URL module may deal with a given URL (in combination with SWI URL_ParseURL to extract the scheme) and if not, pass it to the Acorn URI handler to see if anything else in the system can deal with it (see the Acorn URI Handler Functional Specification, 1215,215/FS).

URL will not cope gracefully if the protocol module list is updated between calls to this SWI (you may get duplicate modules or miss some out).

SWI URL_EnumerateProxies (&83E09)

On entry:
  R0:  Flags:
         Bit 0 => If set, enumerate the no-proxy list.
         Bits 31-1 reserved (0).
  R1:  Session identifier (or zero for global proxies/no-proxies).
  R2:  Context (0 for first call).

On exit:
  R0:  Status flags (currently unused).
  R1:  Preserved.
  R2:  Context for next call (-1 if finished).
  R3:  If R0:0 clear: Pointer to read-only URL to proxy (if R2 is not -1).
       If R0:0 set: Pointer to read-only URL to not proxy (if R2 is not -1).
  R4:  If R0:0 clear: Pointer to read-only proxy URL information (if R2 is not -1).
       If R0:0 set: Corrupted, contains no useful information.
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status is undefined.

This call is used to discover which URLs proxies are set for on a per session or global basis, or which URLs are not to be proxied. The information pointed to by R3 and R4 where applicable is a copy of that which was passed to SWI URL_SetProxy when the setting was made.

If R0:0 is set on entry, then R4 will be corrupted on exit and may not contain a meaningful value.

URL will not cope gracefully if the proxy list is updated between calls to this SWI (you may get duplicate entries or miss some out).

Protocol module to URL module interface
=======================================

This section defines the calls provided by the URL module to enable a fetcher protocol module to interact with it.

SWI URL_ProtocolRegister (&83E20)

On entry:
  R0:  Flags:
         Bit 0 => If set, R5 contains protocol flags word.
         Bit 1 => If set, R6 contains the default port number.
         Bits 31-2 reserved (0).
  R1:  Protocol module's SWI base.
  R2:  URL fetch scheme supported e.g. "http:" etc.
  R3:  Version number * 100 e.g. 116 => version 1.16
  R4:  Informational string. Up to 50 characters of descriptive text, e.g. "Acorn HTTP fetcher".
  R5:  Protocol flags word, if R0:0 set. See below.
  R6:  Default port number, if R0:1 set. See below.

On exit:
  R0:  Flags:
         Bits 31-0 reserved (0).
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status is undefined.

This call is used by a protocol fetcher module to register its SWI base and the type of URL that it accepts with the URL module. The SWIs that are accessible from this SWI base are defined in the following section. If the module cannot be registered (e.g. another module is already claiming that URL base), then an error will be returned. R3 is an integer version number and R4 is a pointer to a string containing more information which will be displayed by the *URLProtoShow command (or 0 if no descriptive text is provided).

Typically, it will be called during a protocol module's initialisation code or on a callback set from the module's initialisation code.

If the protocol module is registered successfully, then URL will issue a service call Service_URLProtocolModule_ProtocolModule to inform any interested modules.

If R0:0 is set, then R5 contains a protocol flags word. This is used to describe to URL how the resolver should treat URLs from this scheme. The current bits defined are:

  Bit  Meaning when set
  0    Path is *not* UNIX-like
  1    No parsing should be performed on this scheme
  2    Scheme allows "user@" to precede the hostname component
  3    Hash (ASCII 35) allowed in hostname (e.g. for file: URLs)
  4    No hostname component (e.g. mailto: URLs)
  5    Remove *leading* ".." components in pathname.

Note that the meanings of set bits are such that zero is a reasonable value to pass for unknown schemes.
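As a sketch of how a protocol module might assemble the optional registers for SWI URL_ProtocolRegister, the following names the protocol flag bits from the table above and sets R0:0 and R0:1 only when R5 and R6 are actually supplied. All identifiers here are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Protocol flags word bits for R5 (names are illustrative). */
#define PROTO_FLAG_PATH_NOT_UNIX     (1u << 0)  /* Path is not UNIX-like */
#define PROTO_FLAG_NO_PARSING        (1u << 1)  /* No parsing for this scheme */
#define PROTO_FLAG_ALLOW_USER        (1u << 2)  /* "user@" may precede hostname */
#define PROTO_FLAG_HASH_IN_HOSTNAME  (1u << 3)  /* '#' allowed in hostname */
#define PROTO_FLAG_NO_HOSTNAME       (1u << 4)  /* No hostname component */
#define PROTO_FLAG_STRIP_LEADING_UP  (1u << 5)  /* Remove leading ".." */

/* The three registers whose contents depend on the optional arguments. */
typedef struct {
    uint32_t r0;  /* flags: bit 0 => R5 valid, bit 1 => R6 valid */
    uint32_t r5;  /* protocol flags word, or 0 if not supplied */
    uint32_t r6;  /* default port number, or 0 if not supplied */
} register_args;

register_args make_register_args(bool have_flags, uint32_t flags,
                                 bool have_port, uint32_t port)
{
    register_args a = { 0u, 0u, 0u };
    if (have_flags) {
        a.r0 |= 1u << 0;
        a.r5 = flags;
    }
    if (have_port) {
        a.r0 |= 1u << 1;
        a.r6 = port;
    }
    return a;
}
```

A module for a scheme that needs none of the special treatments could pass `have_flags` as false, which leaves R0:0 clear and matches the behaviour of a zero flags word.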
Note that if URL is requested to resolve URLs using schemes unknown to it, it will assume a protocol flags word value of zero. This may lead to inconsistent behaviour depending on whether the protocol module is loaded or not.

If R0:1 is set, then R6 contains the default port number for this scheme. This is used by the URL resolving code to determine if explicitly specified port numbers can be elided from the URL. For example, when constructing the canonicalised form of "http://www.acorn.com:80/", the port bit is dropped as it serves no useful purpose, leaving "http://www.acorn.com/".

The URL module is primed with knowledge of the following protocols:

  mailto:, telnet:, finger:, file:, filer_opendir:, filer_run:, local:, gopher:, ftp:, http:, https:, whois:

It is not necessary for modules implementing those protocols to set either flag bit and hence no need for them to set R5 or R6.

SWI URL_ProtocolDeregister (&83E21)

On entry:
  R0:  Flags:
         Bits 31-0 reserved (0).
  R1:  SWI base.

On exit:
  R0:  Flags:
         Bits 31-0 reserved (0).
  R1:  Number of client sessions that were using this module.
  All other registers preserved.
  SWI is not re-entrant.
  Interrupt status is undefined.

This call should be used by the protocol module to tell the URL module that it is no longer available. The URL module will raise the appropriate disconnect messages with its clients, and tell the protocol module the number of clients that were affected.

Typically, it will be called during a protocol module's finalisation code.

If the protocol module is deregistered successfully, then URL will issue a service call Service_URLProtocolModule_ProtocolModule to inform any interested modules.

URL module to protocol module interface
=======================================

The protocol module SWI interface is only called by the URL module. URL module clients should never call the ReadData/Status/GetData/Stop SWIs directly.

The protocol modules are required to supply a SWI interface.
There are currently 4 SWIs that need to be supported, which run from SWI_base to SWI_base+3. New SWIs common to all protocol modules will only be added at the low-end of the SWI range. Protocol modules must generate the standard SWI not known error (error number &1E6) if they receive a call which they do not understand, so that the URL module can determine that they do not support the SWI.

Note that there is no general requirement to use SWIs from offset 0 into a SWI chunk, although it makes sense to do this. Protocol modules which support multiple protocols should ensure that they do not place their internal "SWI bases" less than 16 SWIs apart, to allow space for future expansion, e.g. AcornHTTP registers http: as &83F80 and https: as &83F90. Protocol specific SWIs should be added at the top-end of the SWI chunk (i.e. start at SWI_base+63 and work down) - the AcornHTTP module uses that range to provide clients with access to its HTTP cookie management code, for example.

NOTE: the session identifiers used by the URL module to talk to the protocol modules are NOT the same identifiers used by clients to talk to the URL module. They are NOT interchangeable.

SWI Protocol_GetData (SWI_base+0)

On entry:
  R0:  Flags:
         Bits 30-0 => as specified by client in URL_GetURL.
         Bit 31 => R7 is valid.
  R1:  Session identifier.
  R2:  Method (See table earlier in document).
  R3:  URL (including fetch scheme).
  R4:  Pointer to block of data in addition to URL.
  R5:  Protocol dependent.
  R6:  Protocol dependent.
  R7:  If R0:31 is set, proxy URL information. See below.

On exit:
  R0:  Protocol status word (see SWI URL_Status for details).
  All other registers are protocol dependent.
  SWI re-entrancy is protocol module dependent.
  Interrupt status is protocol module dependent.

This call is used to start retrieving data. The protocol module should raise any events for the client via the session identifier provided in R1.

The URL module calls this SWI in response to one of its clients calling SWI URL_GetURL.
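The SWI numbering and the R0 flag handling for Protocol_GetData can be sketched as below. This is an assumed reading of the text, not the URL module's actual code: the function names are hypothetical, and clearing bit 31 of the client's flags before adding the proxy indicator is an assumption based on that bit being reserved for the URL module's own use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets of the four common protocol module SWIs from the SWI base. */
enum {
    PROTOCOL_GETDATA  = 0,
    PROTOCOL_STATUS   = 1,
    PROTOCOL_READDATA = 2,
    PROTOCOL_STOP     = 3
};

#define PROTOCOL_FLAG_PROXY (1u << 31)  /* R7 holds proxy URL information */

/* SWI number for a common protocol SWI, given the registered base. */
uint32_t protocol_swi(uint32_t swi_base, unsigned offset)
{
    assert(offset <= PROTOCOL_STOP);
    return swi_base + offset;
}

/* R0 for Protocol_GetData: client bits 30-0, bit 31 set only if a proxy
 * URL is being passed in R7. */
uint32_t protocol_getdata_flags(uint32_t client_flags, const char *proxy_url)
{
    uint32_t flags = client_flags & ~PROTOCOL_FLAG_PROXY;
    if (proxy_url != NULL) {
        flags |= PROTOCOL_FLAG_PROXY;
    }
    return flags;
}
```

With the AcornHTTP bases quoted above, `protocol_swi(0x83F80, PROTOCOL_STATUS)` is &83F81 for http: and `protocol_swi(0x83F90, PROTOCOL_STATUS)` is &83F91 for https:.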
The proxy URL information specified in R7 (if R0:31 is set) gives the
location of the proxy to be used in the format of a URL, for example
"http://www-cache.demon.co.uk:8080/". This information is supplied by the
URL module and not the client. The protocol module must note that on a
proxied request, the target URL indicated by R3 may not have the same fetch
scheme as the proxy. For example, it might be an ftp: URL being proxied
through an HTTP proxy service.


SWI Protocol_Status (SWI_base+1)

  On entry:
    R0: Flags: Bits 31-0 reserved (0).
    R1: Session identifier.

  On exit:
    R0: Protocol status word (see SWI URL_Status for details).
    R2: As URL_Status.
    R3: As URL_Status.
    R4: As URL_Status.
    All other registers preserved.

SWI re-entrancy is protocol module dependent.
Interrupt status is protocol module dependent.

This SWI is used to monitor the transfer of data from the remote service.
It is protocol independent, with the exit status bits of R0 being common to
all fetcher services. R2 should contain the remote server's most recent
response code where possible; note that even in the case of, for example,
an HTTP 403 (Forbidden) response, some explanatory data may be received,
and thus R3 may be non-zero.

If the client is unknown to the protocol module then an error should be
returned. If the client's last request has finished, but the client session
has not yet been deregistered, then the protocol module should return the
status code as of the time that the request finished (i.e. bit 6 or bit 5
will be set, along with any other bits that are relevant).

The URL module calls this SWI in response to one of its clients calling SWI
URL_Status.


SWI Protocol_ReadData (SWI_base+2)

  On entry:
    R0: Flags: Bits 31-0 reserved (0).
    R1: Session identifier.
    R2: Address of client's data buffer.
    R3: Size of client's data buffer.

  On exit:
    R0: Protocol status word (see SWI URL_Status for details).
    R2: As URL_ReadData.
    R3: As URL_ReadData.
    R4: As URL_ReadData.
    R5: As URL_ReadData.
    All other registers preserved.

SWI re-entrancy is protocol module dependent.
Interrupt status is protocol module dependent.

This SWI is used to read the data pending from a request, and to find out
how much data has been read on this call and how much remains to be read
for the request. The register usage and description are the same as for SWI
URL_ReadData.

The URL module calls this SWI in response to one of its clients calling SWI
URL_ReadData.


SWI Protocol_Stop (SWI_base+3)

  On entry:
    R0: Flags: Bits 31-0 reserved (0).
    R1: Session identifier.

  On exit:
    R0: Protocol status word (see SWI URL_Status for details).
    All other registers preserved.

SWI re-entrancy is protocol module dependent.
Interrupt status is protocol module dependent.

This call aborts a current request if there is one associated with the
session identifier.

The URL module calls this SWI in response to one of its clients calling SWI
URL_Deregister or SWI URL_Stop.


URL module service calls
========================

The URL fetcher system has been allocated a block of 256 service calls
(&83E00-&83EFF). Two are currently defined. The other 254 are reserved by
Acorn for future use.


Service_URLProtocolModule (&83E00)

This service call is issued by the URL module to communicate important
events to the protocol modules.

  On entry:
    R0: Reason code - reason for the service call (see below).
    R1: &83E00 (Service_URLProtocolModule).
    All other registers are reason code dependent.

  On exit:
    All registers must be preserved, unless claiming the service call.
    In all the currently defined cases, the service call must not be
    claimed.

Protocol modules must ignore reason codes which they do not understand.

Defined reason codes:

URLModuleStarted

    R0: 0          URL module has initialised.
    R1: &83E00     Service_URLProtocolModule.
    R2: version    Version number of URL module * 100.

Upon receiving this service call, protocol modules should re-register with
the new URL module by issuing SWI URL_ProtocolRegister as usual.
They must assume that any previous registration is no longer valid. This
service call must not be claimed.

URLModuleDying

    R0: 1          URL module is dying.
    R1: &83E00     Service_URLProtocolModule.
    R2: version    Version number of URL module * 100.

Upon receiving this service call, protocol modules should note that the URL
module has gone away and must not attempt to talk to it again until a
future Service_URLProtocolModule/URLModuleStarted service call arrives.
This service call must not be claimed.

All other reason codes are reserved to Acorn and must not be used.


Service_URLProtocolModule_ProtocolModule (&83E01)

  On entry:
    R0: Reason code.
    R1: &83E01 (Service_URLProtocolModule_ProtocolModule).
    R2: URL fetch scheme (e.g. "http:", "ftp:").
    R3: SWI base chunk of protocol module.
    R4: Description of module as shown by *URLProtoShow.

  On exit:
    All registers must be preserved, unless claiming the service call.
    In all the currently defined cases, the service call must not be
    claimed.

Protocol modules must ignore reason codes which they do not understand.

Defined reason codes:

    URLProtocolModuleStarted   R0: 0   Protocol module has just registered
    URLProtocolModuleDying     R0: 1   Protocol module has just deregistered

All other reason codes are reserved.


URL module *-commands
=====================

The URL module provides a single *-command.

Syntax:       *URLProtoShow
Parameters:   None
Use:          Display information on currently registered protocol modules.
Help text:    *URLProtoShow shows all the current protocols known and their
              SWI bases.
Example:

*URLProtoShow
Base URL  SwiBase  Version  Comment
=============================================================================
---       0x83e00  038      URL © Acorn 1997-8 (Built: 07 May 1998)
gopher:   0x508c0  010      Gopher Fetcher © Acorn 1997-8 (Built: 17 Feb 1998)
ftp:      0x4bd00  028      FTP Fetcher © Acorn 1997-8 (Built: 19 Mar 1998)
file:     0x83f40  038      File Fetcher © Acorn 1997-8 (Built: 04 Jun 1998)
http:     0x83f80  082      Acorn HTTP © Acorn 1997-8 (Built: 07 May 1998)

Related SWIs: SWI URL_EnumerateSchemes


URL errors
==========

The URL module is allocated two ranges of error numbers, each range being
256 errors long. The first 32 errors are reserved to the URL module and the
rest are reserved to Acorn protocol modules.

    Module    Error range

    URL       &80DE00 - &80DE1F
    HTTP      &80DE20 - &80DE3F
    MAILTO    &80DE40 - &80DE5F
    File      &80DE60 - &80DE7F
    FTP       &80DE80 - &80DE9F
    Gopher    &80DEA0 - &80DEBF
    WhoIs     &80DEC0 - &80DEDF
    Finger    &80DEE0 - &80DEFF
    WAIS      &81EF00 - &81EF1F
    HTTPS     &81EF20 - &81EF3F
    News      &81EF40 - &81EF5F

Error numbers &81EF60-&81EFFF are reserved for Acorn use only.

URL Module Errors

    Error Number   Meaning

    &80DE00        Session ID not found. A client passed an unknown
                   session ID in R1 to one of the URL module's SWIs.
    &80DE01        URL ran out of memory
    &80DE02        No matching fetcher for the URL could be found
    &80DE03        SWI not found (URL Module). URL attempted to call a
                   fetcher's SWI and received a SWI not known error.
    &80DE04        Session already has had an object fetch performed in
                   it. You cannot re-use this session.
    &80DE05        No fetch in progress for this session ID. You have
                   called URL_ReadData or URL_Status having already
                   terminated the fetch.
    &80DE06        SWI Method already exists. URL already knows of a
                   module which provides this method for fetching -
                   another cannot register.
    &80DE07        No fetch in progress for this session ID. You have not
                   called URL_GetURL before URL_Stop, URL_ReadData or
                   URL_Status.
    &80DE08        Message not found in Messages file.
    &80DE09        (No longer used)
    &80DE0A        Unable to parse URL.

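Because each module listed above owns a block of 32 consecutive error
numbers, the owner of an error can be found with simple arithmetic. The
following C sketch illustrates this, assuming the caller passes a plain
error number as shown in the tables. The function name url_error_owner and
the two lookup arrays are hypothetical and exist only for this example; the
ranges themselves are taken from the allocation table.

```c
#include <stddef.h>
#include <stdint.h>

/* Owners of successive 32-error blocks in &80DE00-&80DEFF. */
static const char *const first_range_owners[8] = {
    "URL", "HTTP", "MAILTO", "File", "FTP", "Gopher", "WhoIs", "Finger"
};

/* Owners of the allocated 32-error blocks in &81EF00-&81EF5F. */
static const char *const second_range_owners[3] = {
    "WAIS", "HTTPS", "News"
};

/* Returns the name of the module that owns errnum, "reserved" for the
 * unallocated part of the second range (&81EF60-&81EFFF), or NULL if the
 * number lies outside both ranges. Hypothetical helper for illustration. */
const char *url_error_owner(uint32_t errnum)
{
    uint32_t range_base = errnum & ~0xFFu;        /* start of 256-error range */
    uint32_t block      = (errnum & 0xFFu) / 32u; /* 32-error block, 0 to 7   */

    if (range_base == 0x80DE00u)
        return first_range_owners[block];

    if (range_base == 0x81EF00u)
        return (block < 3u) ? second_range_owners[block] : "reserved";

    return NULL;
}
```

For instance, &80DE0A ("Unable to parse URL") falls in block 0 of the first
range and so belongs to the URL module itself.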
Error numbers for protocol modules are not within the scope of this
specification.


Performance targets
===================

Final code size of the version described by this document should be about
25K.

When fetches are active, more memory will be claimed from the RMA to record
details of the session. The amount claimed depends on the URL being
fetched, plus a small overhead for the session information.

During URL resolution, temporary workspace equivalent to three times the
total combined length of the base and relative URLs involved is claimed
from the RMA as required.

Workspace is claimed from the RMA to store details of registered proxies.

All session-specific memory, including proxy information, is freed when the
session is terminated.


References
==========

The following RFC documents are of direct relevance to the URL module:

    RFC 1738 - Uniform Resource Locators
    RFC 1808 - Relative Uniform Resource Locators
    RFC 2068 - HyperText Transfer Protocol specification version 1.1


Glossary
========

FTP     File Transfer Protocol - an application level protocol for the
        transfer of files between a remote host computer and a local
        client, as defined by RFC 959.

HTTP    HyperText Transfer Protocol - a protocol designed to transfer
        resources ("documents") from a remote server machine to a local
        client, as defined by RFC 1945 (version 1.0) and RFC 2068
        (version 1.1).

HTTPS   Secure HyperText Transfer Protocol - HTTP protocol over a
        communication channel encrypted using SSL.

URL     Uniform Resource Locator, as defined by RFC 1738 - a subclass of
        URIs (Uniform Resource Identifiers, defined in RFC 1630) which map
        onto network access protocols. More commonly, the addresses of
        objects on the World Wide Web.

NNTP    Network News Transfer Protocol, as defined by RFC 977.

Gopher  The Internet Gopher Protocol - a distributed document search and
        retrieval protocol.

SSL     Secure Sockets Layer. A specification for encryption of
        communications on networks.

WAIS    Wide Area Information Servers, as defined by RFC 1625.