I Want My FTP: Bits on Demand

ARPA's not-so-hidden agenda behind sponsoring this newfangled nationwide packet-network boondoggle was a faint hope that it could stave off compounding pleas for more and different computers for each and every project at each site. After all, once students could sit at cheap terminals and exploit the cycles and storage of any host in the country, ARPA could sponsor just a handful of top-end machines. Take one ARPANET, add terminal emulation and file transfer, and watch your funding headaches melt away.

Well, ARPANET didn't manage to make a dent in the demand for workgroup and, later, personal computing, but it was right to focus on the critical interoperability tools for exploiting heterogeneous computing environments (rather than, say, entirely new problems like the early 1990s' High-Performance Computing and Communications Grand Challenges). Last issue we dissected TELNET; this time the case study in directed protocol evolution is File Transfer, an equally primeval development -- in fact, coevolved with TELNET itself.

As defined in RFC 959, edited by Jon Postel and Joyce Reynolds in October 1985, modern FTP separates into control-channel operations and the transfers themselves over data channels. Commands issued over the control channel to log in, navigate directory structures, and set transmission modes are all sent over a TELNET connection; the data transfers have distant roots in the Data Transfer Protocol (#264).
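
For a present-day illustration of that split, here is a minimal sketch using Python's standard ftplib; the host, directory, and file names are placeholders of mine, not anything drawn from the RFCs:

    # The control connection carries TELNET-style commands; each transfer
    # gets its own data connection. Names below are invented for illustration.
    from ftplib import FTP

    ftp = FTP("ftp.example.com")       # open the control connection
    ftp.login()                        # USER/PASS (anonymous) over the control channel
    ftp.cwd("/pub")                    # directory navigation is a control-channel command too
    with open("README", "wb") as out:
        # RETR is sent on the control channel; the bits flow over a separate data connection
        ftp.retrbinary("RETR README", out.write)
    ftp.quit()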

 

Cerf establishes the problem as early as May 1971: why a control channel is needed, the slippery slope down to filesystem semantics, and the problems of multiplexing data transfer (bitstring until closed; chunks; control chars?).

There are two problems:
    1. Movement of data from one site to another.
    2. Interpretation of the data at receiving site.
-- Vint Cerf, May 19, 1971 in RFC 163

He correctly goes on to identify that the former requires a "standard connection procedure" and marshalling the bits of files "in the right order and know when the end of the file has been reached" -- and that "file naming, and format interpretation are left to the individual process to solve." Furthermore, "This information could be supplied either embedded in the file transmission data stream, or supplied over a separate control connection." He was writing as a follow-up to what 872 described as "the session at the 1970 [Spring Joint Computer Conference] in which the ARPANET entered the open literature ... entitled 'Resource Sharing Computer Networks'."

 

 

[Note that while 172 is "A FTP", only a few months later, 265 is "The FTP".] As of RFC 172, dated June 23, 1971, there is a note that "Alex McKenzie, BBN, is conducting a survey of network file systems to determine the practicality of standard pathname conventions" -- no record of it exists, though #775 was published on ??? by three other BBNers. It was revised by November 17, 1971 as RFC 265 (a/k/a "FTP-1"), which added create and append_with_create (in the end, they got folded back into store and append, as their default behaviors when the filename does not yet exist@@). At this point, we still only had hints for parsing: "A provision for indicating data structure (type and byte size) exists in FTP to aid in parsing, interpretation, and storage of data." (265) or "A provision for indicating data structure (type and byte size) exists in FTP to aid in parsing, interpretation, reconfiguration, and storage of data." (172) -- these later became protocol modes unto themselves as MODE and STRU became FTP-2 commands. Extensibility was also a seed from the very beginning: "FTP is readily extensible, in that additional commands and datatypes may be defined by those agreeing to implement them. Implementation of a subset of commands is specifically permitted, and an initial subset for implementation is recommended." (172).

FTP-1 (265) had still not separated control and data in form, though they had in function. "Both data and control transactions are communicated over the same connection" -- but when TELNET became available for the control connection, it became a natural split.

Early seeds of CGI-BIN? "The protocol may also be extended to enable remote execution of programs, but no standard procedure is suggested." (172)

Transparent Block (B9)

'Descriptor and Counts' (BA)

Bitstream (BS/B0?) -- inherently limited to one file per connection / one connection per file.
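
To make the contrast concrete, here is a hedged sketch of descriptor-and-count framing. It is modeled on the RFC 959 descendant of these ideas (block mode, where descriptor bit 64 marks end-of-file), not on the FTP-1 wire encodings themselves; a pure bitstream sender has no header at all and can only signal end-of-file by closing the connection.

    # Each block: 1-byte descriptor + 16-bit count + data. Because EOF travels
    # in-band, one data connection can carry file after file.
    import struct

    EOF = 64                                    # RFC 959 MODE B descriptor bit for end-of-file

    def block_mode(payload, chunk=1024):
        """Yield descriptor-and-count framed blocks for one file."""
        for i in range(0, len(payload), chunk):
            data = payload[i:i + chunk]
            descriptor = EOF if i + chunk >= len(payload) else 0
            yield struct.pack("!BH", descriptor, len(data)) + data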

Oddities: Transfer type 2 specified 8-bit bytes for transferring netascii -- a 7-bit payload. Allocating space in advance -- a MULTICS-specific step -- was already in FTP-1 without comment as a natural thing to do.

In general: the art of protocol specification has advanced considerably. The framers of FTP-1 relied on readers having access to source code that has been lost to us a generation later. It is not possible to write an interoperable FTP-1 implementation from this spec: it is very incomplete about how strings like pathnames are serialized onto the wire, for example. Casually documented requirements like "A rename_from must always be followed by a rename_to request." were later refined into explicit states in a state machine -- an abstraction entirely missing from this early spec -- bridged by a specific response code guiding the transition.

File enumeration already existed as a command, but the format of the data was left unspecified, as was whether the listing belonged on the control or data channel.

 

 

"ARPANET Network Working Group (NWG), which was the collective source of the ARM, hasn't had an official general meeting since October, 1971" in http://sunsite.auc.dk/RFC/rfc/rfc871.html (a comparison of the ARPANET model with the ISO ref model) Funny: "That might not sound so impressive as a pronunciamento from an international standards organization, but the reader should be somewhat consoled by the consideration that not only are the views expressed here purported to be those of the primary workers in the field, but also at least one Englishman helped out in the review process." This September 1982 review also observed a segue from last month's column: "And when it came time to develop a File Transfer Protocol, the same virtualizing or CIR trick was clearly just as useful as for a terminal oriented protocol, so virtualizing became part of the axiom set too."

944 quoth (in that one edition of "Official Internet Protocols" only):

There has been some complaints from the Toy systems crowd recently that FTP is too complicated. Well, it may be too complicated for Toy systems, but in fact it is too simple for many Real file systems. For example, it has no way to encode a "library" (i.e., a named collection of subfiles). It is (barely) adequate for shipping around files of text, but not much more.
With the notable exception of Multics and UNIX, many operating systems support complex file structures of which the user must be aware. One is not doing the user a favor by hiding details that may reach out and bite him.

 

913, SFTP, worked over a single connection with a restricted BNF. It looks a LOT like what Gopher eventually became -- again, almost ten years later, from Anklesaria, et al. September 1984, Mark Lottor at MIT. Very similar command set, but without a TELNET control channel -- instead it uses NULs to delimit commands, and, to send a file, states the byte length and just ships <count> bytes. Binary mode throws data away from 36-bit systems, oddly; Continuous mode shoves it all down. Rather than error codes, it uses response classes: + (success), - (error), and ! (proceed, i.e. logged in, changed directory, etc.) [except for NAME/TOBE, the rename command pair, which uses + as its positive intermediate state].
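
A hedged sketch of that framing follows -- my reading of the conventions described above, not a working RFC 913 client (command spellings and the full transfer handshake are omitted):

    # Commands and replies are NUL-terminated ASCII strings on one connection;
    # a transfer announces a decimal byte count, then ships exactly that many raw bytes.
    def send_command(sock, line):
        sock.sendall(line.encode("ascii") + b"\x00")

    def read_reply(sock):
        buf = b""
        while not buf.endswith(b"\x00"):
            byte = sock.recv(1)
            if not byte:
                raise ConnectionError("connection closed mid-reply")
            buf += byte
        text = buf[:-1].decode("ascii")
        return text[0], text[1:]                # '+' success, '-' error, '!' proceed

    def read_exactly(sock, count):
        chunks = []
        while count > 0:
            piece = sock.recv(min(count, 4096))
            if not piece:
                raise ConnectionError("connection closed mid-transfer")
            chunks.append(piece)
            count -= len(piece)
        return b"".join(chunks)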

 

MAIL and MLFL survived until the "great flag day" of January 1, 1983, when ARPANET moved from NCP to TCP, as orchestrated by Cerf (#781 ?). Note that there was no mail routing in those days: FTP servers were colocated with mailbox files, whether the message arrived over the control channel (MAIL) or a data connection (MLFL).

 

871, "Therefore, we needed an agreed-upon representation of the commands--not only spelling the names, but also defining the character set, indicating the ends of lines, and so on. In less time than it takes to write about, we realized we already had such a [Common Intermediate Representation]: "Telnet".

So we "used Telnet", or at any rate the [Network Virtual Terminal] aspects of that protocol, as the "Presentation" protocol for the control aspects of FTP--but we didn't conclude from that that Telnet was a lower layer than FTP. Rather, we applied the principles of modularity to make use of a mechanism for more than one purpose"

 

640 ("error codes"), which fits in the RFC sequence according to its running page header, June 5, 1974; but is dated in the contents as June 19, 1975, must have been updated in mid-stream. Very odd to see out of sync dates. My guess is that the 1975 revision added Postel's sole-authored cover page reminding implementors that the preliminary go-ahead reply for data-connection-creating methods should be sent before using the data connection, for the benefit of hosts that find it difficult to select() amongst parallel connections.

640 was the first RFC to formalize a taxonomy of error states, reflected in code numbers for automata simultaneously with text for humans. RFCs 354 and 454 had tried rationalizing FTP-1, as had 542 for FTP-2, but this was the first principled reengineering of the codespace. Appendix E of SMTP (821, August 1982) contained the same section on the "Theory of Reply Codes" as the FTP standard (October 1985). Multiline responses were also accommodated as 640 foresaw: any number of lines beginning "xyz-" followed by precisely one "xyz "-prefixed reply line. That is, for a block of lines corresponding to one reply message, all but the last are "escorted" (RFC @@) by a hyphen in the fourth column. HTTP, a far-removed heritor of this philosophy, does not allow multiline responses, since it has a more principled separation of automata- and human-actionable feedback: the (single) response code line for machines; multimedia content bodies for humans.
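
A hedged sketch of that multiline rule as a parser -- my reading of the convention described above, not code from any RFC:

    def read_full_reply(lines):
        """Collect one (possibly multiline) reply from decoded control-channel lines."""
        collected = []
        for line in lines:
            collected.append(line)
            if line[3:4] == " ":               # "xyz " in columns 1-4 terminates the reply
                return line[:3], collected
            # otherwise "xyz-": an escorted continuation line, so keep reading
        raise ValueError("control connection ended mid-reply")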

@@691 deserves some credit for mocking the idea of reserving experimental reply-code blocks like 9xx; that would specifically defeat automata, since "the user program has no way of knowing whether the reply is positive, negative, or irrelevant." Instead, we have extensibility within the broad blocks. Later protocols, HTTP for example, explicitly allow any unknown xyz code to be treated as though it were an x00. 691 also mocks the specificity of the second-digit categories: "Why is 150 ("file status okay; about to open data connection") considered to be more about the file system than about the data connection?"
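
A hedged sketch of that extensibility rule: an automaton keys off the first digit only, so an unknown code is treated as if it were the corresponding x00 (class names here follow Table 1 below).

    REPLY_CLASSES = {
        "1": "positive preliminary",
        "2": "positive completion",
        "3": "positive intermediate",
        "4": "transient negative",
        "5": "permanent negative",
    }

    def classify(code):
        """Map any three-digit reply code to its broad class; e.g. an unknown 259 acts like 200."""
        return REPLY_CLASSES.get(code[:1], "undefined -- exactly why a 9xx block would defeat the automaton")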

Digit   FTP/SMTP Reply Type      HTTP Reply Type
1xx     Positive Preliminary     Informational
2xx     Positive Completion      Successful
3xx     Positive Intermediate    Redirection
4xx     Transient Negative       Client Error
5xx     Permanent Negative       Server Error

Table 1 compares the definition of error categories in their earliest and more modern incarnations.

Digit   Reply Subtype
x0z     Syntax
x1z     Information
x2z     Connection
x3z     Authentication/Accounting
x4z     Unallocated
x5z     Application (file system, mail, etc.)

Table 2 compares the definition of error subcategories.

 

RFC 775, "Directory Oriented FTP Commands", is the first stab at directory management operations. The key to bridging disparate notions of "pathname" were 1) requests use relative paths by default, 2) replies use absolute paths, and 3) and a climb "up" operation which works without knowing the separator. With these three rules, making, removing, and navigating directory trees became possible. Originally, these were X commands, later promoted to full four-letter glory in 959.

 

RFC 1415, "FTP-FTAM Gateway Specification", January 1993: making peace with the OSI File Transfer, Access, and Management protocol. The bidirectional mapping is not complete -- the two have nonoverlapping sets of capabilities -- but the similarities are strong enough to allow "transparent" application-level gateways. The main hack is overloading the FTP SITE command to specify the X.400 address of the OSI end of the connection. Its 57-page bulk is more of an indictment of FTAM's bulk than anything else.

 

RFC 949, "FTP Unique-Named Store Command", July 1985, is an attempt to publish another OS-specific filesystem semantic: atomic file creation with a unique file name, intended for use in spool directories. Its author is correct not to overload this behavior onto STOR-with-no-filename. The refusal to use an X command name seems silly, especially in retrospect. See the memo for a silly defense of the STOU command abbreviation choice.

 

@@notsure@@ Data connections were traditionally active opens by the server to a well-known or client-specified port on the client. Only the client could initiate use of a non-standard port, by issuing a PORT command. @@Sometime in the transition to FTP-2, PASV was added to let the server select a port and do a TCP passive open, with the client completing the process upon reply with its own active open (which a client-side packet filter will permit, since it sees an outgoing call rather than an unsolicited inbound connection from the server). PASV was a good idea because it also allowed third-party transfers, where one client directs two FTP servers to exchange files by creating a data connection between them (one end a passive open, the other an active open). RFC 1579, "Firewall-Friendly FTP", February 1994, argues that since "packet filter-based firewalls... cannot permit incoming calls to random port numbers", FTP clients should always use PASV mode (and PASV support is required as an Internet Standard; see STD 3). @@I'll ignore the other, speculative part of Bellovin's proposal, an APSV command that makes "all passive" the default for a session.
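
A hedged sketch of what "firewall-friendly" means on the wire, assuming an already-open control socket; error handling and the rest of the dialogue are omitted:

    import re
    import socket

    def open_passive_data_connection(control):
        """Send PASV, parse the 227 reply, and make the outgoing data call ourselves."""
        control.sendall(b"PASV\r\n")
        reply = control.recv(4096).decode("ascii")    # e.g. "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)"
        fields = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply).groups()
        h1, h2, h3, h4, p1, p2 = (int(f) for f in fields)
        # The client-side packet filter sees only this outgoing call; no inbound
        # connection from the server ever needs to be admitted.
        return socket.create_connection(("%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2))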

 

Internet Anonymous FTP Archives were a hallmark innovation of Internet culture; their practices were described in a report of the IETF working group of the same name, IAFA. RFC 1635, "How to Use Anonymous FTP", from May 1994, explains the myriad compression, bundling, and transformation (encoding) options that made up for relevant lapses in metadata handling in FTP per se. It warns of such pitfalls as clients that choke on excessive multi-line status responses (typically worked around by using a hyphen as the first character of your "password") -- note that this is even though multi-line status responses had been legal, well-defined, and trivially parseable for twenty years at that point. It also describes common commands like mget ("multiple get") which are not protocol affordances at all, but the result of client-side processing issuing many FTP commands in succession. @@The section on binhex explicates how difficult life can be when the FTP virtual filesystem does fail -- it can't represent Macintosh files, with their internal structure of data and resource forks, directly.
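
A hedged sketch of what an mget boils down to (host and pattern invented): one NLST for the enumeration, then one RETR per matching name, all driven from the client side.

    import fnmatch
    from ftplib import FTP

    ftp = FTP("ftp.example.com")
    ftp.login()                                   # anonymous
    for name in ftp.nlst():                       # the server's file enumeration
        if fnmatch.fnmatch(name, "*.txt"):        # the "multiple" in mget lives in the client
            with open(name, "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)
    ftp.quit()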

 

RFC 1639, "FTP Operation Over Big Address Records (FOOBAR)", June 1994, sketches out one path of extensibility. In preparation for IPv6's longer addresses, this proposal allows for arbitrarily large address and port specifications for up to 256 address families. Overkill, yes, but sufficient for Experimental status.

 

RFC 250 mentions the SEX time-sharing system; the only other mention of SEX and UCLA of note is 174.

 

RFC 691, "One More Try on the FTP" by Brian Harvey, May 28, 1975 has some doozies. He's arguing that many of the benefits of "FTP-2" can be delivered on FTP-1 ("Leaving Well Enough Alone" is his subtitle). "Reading several RFCs that I (sigh) never heard of before writing 686 has convinced me that although I was right all along it was for the wrong reasons." or "The great-FTP-in-the-sky isn't showing any signs of universal acceptability, and it shouldn't stand in the way of solving immediate problems." or " FTP-2 was established by a duly constituted ARPAnet committee and we are duty-bound to implement it. I don't suppose anyone would actually put it that baldly, but I've heard things which amounted to that. It's silly." or " I'm afraid that I can't work up much excitement about helping the CIA keep track of what anti-war demonstrations I attended in 1968 and which Vietnamese hamlets to bomb for the greatest strategic effect even if they do pay my salary indirectly." or " Moral #1: Security freaks are pretty weird. Moral #2: If you have a secret don't keep it on the ARPAnet."

 

@@A minor achievement of FTP-2: straightening out the print-file mess. It turns out there are two dimensions: ASCII vs. EBCDIC (vs. many other charsets, of course, but this is the pre-i18n Internet) and plain vs. ASA (Fortran) formatting (vs. Telnet formatting?). The major achievement seems to be the proliferation of new MODEs and STRUctures -- public standards for optimizations between consenting filesystems.

 

RFC 385 was the first mention of the MAIL and MLFL commands. ("If the user field is empty... then the mail is designated for a printer..." -- shades of TELEX...). MAIL was hacked in to "allow TIP [terminal-only] users to send mail conveniently without using third hosts." After the MAIL command, the user types in mail headers and a message terminated by <CR><LF>.<CR><LF>, a convention still visible in user interfaces twenty years later. In this era, there also wasn't mail routing: FTP service was colocated with mailboxes, and the sender was obliged to connect directly to the final destination.

Note 17 in RFC 385 appears to be the first use of X for experimental -- though its reservation of 9xx was a bad idea.

"The FTP is an open-ended protocol designed for easy expandability. Experimental commands may be defined by sites wishing to implement such commands. These experimental commands should begin with the alphabetic character 'X'. Standard reply codes may be used with these commands. If new reply codes need to assigned, these should be chosen between 900 and 999. If the experimental command is useful and of general interest, it shall be included in the FTP command repertoire."

 

@@need to make a chart of the major FTP "releases" and drafts over the years.

114, @@

264, 1971?@@

354 offline

385 Comments on the File Transfer Protocol (#354) August 18, 1972

454??

542 (Aug. 12, 1973)

 

RFC 309, "Data and File Transfer Workshop Announcement", March 17, 1972 (for a meeting held April 14-15, 1972). Aimed at revising FTP-1 (#265). Accommodations in Random House were then $5/night; no Kendall Square, no Howard Johnson's; the only other recommendation was a motel in Fenway. At the time, DTP apparently used binary opcodes, and they were considering ASCII in DTP, or moving wholesale to TELNET. They already wanted to transfer files to third parties then. Investigation into "Uniform Pathnames and the ARPANET virtual filesystem".

 

RFC 751, "Survey of FTP MAIL and MLFL", 10 December 1978, documents the migration towards using mail-in-files. Notable for incidentally citing its survey results via the earliest ARPANET "url": "Complete FTP scripts may be found, if you are interested, on MIT-DM, file NETDOC;MLFL SURVEY."

 

RFC 141 is the earliest online reference to FTP. It's a RAND reply to Abhay's very first cut. Already confused between file transfer and file access. It's not supposed to be interactive filesystem emulation; they wanted to manage (insert, delete, replace) records within a file, as well as list files, pursuant to access controls. "Upon abnormal termination... an identifying code to facilitate precise error recognition."

 

RFC 607 is Mark's "Comments on the File Transfer Protocol" of January 1974. First call for regularizing all command verb lengths at 4 characters (rename BYE to QUIT). A little too rigid for the modern dogma of "be liberal in what you accept"; they also called for setting maximum line lengths, etc. Called for the initial reply-to-proceed to be sent before data transmission, as ratified in Postel/640 and incorporated into 959. Proposed that systems which do not implement certain standard commands (e.g. ALLO or ACCT) return 2xx rather than errors. @@Before 640's rationalization, the rule apparently was "The concept that success replies should have an even first digit and failure replies an odd first digit does not apply". They already had evidence to complain of "multiple reply codes having the same meaning to a user process".

 

RFC 614 was the response from Multics and Neigus (BBN), also January 1974. They gave in on the 4-letter words and codified maxima (including 168-character pathnames :-) They defended multi-line replies against having to be "escorted" by status codes prefixing each line -- @@ did they win in the end? They gave in on errors: "We feel the whole reply code strategy should be redesigned." Trivia: the MULTICS line-kill editing character was... @. Can you guess what that did to email delivery?

 

Related proposals

At the same time that reliable, streaming file transfer was bubbling along, the seeds were already being sown for something yet simpler for moving bags of bits -- data transfer without the virtual-filesystem stuff. This is evident from the very separation at birth of DTP and FTP.

RFC 269 is the earliest proposal that resembles modern TFTP. The UCSB approach breaks a file into chunks and allows random access to them, with only one outstanding request or chunk delivery at a time. There was no way in SMFS to ask "what files do I have there?"

TFTP itself is currently RFC 1350, July 1992, "The TFTP Protocol (Revision 2)." While FTP is based on TCP (or any other stream protocol), TFTP is based on UDP (or any other datagram protocol). It cannot list files, it has no user authentication, and it has no modes other than netascii and binary, framed in 8-bit bytes. Files are sent in fixed 512-byte chunks, with one outstanding data packet at a time. Error handling is usually: terminate. The initial setup request is sent to the well-known port; actual transfers are disambiguated by immediately migrating to separate ports. At a mere ten pages, it honestly comes by its name.
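
A hedged sketch of TFTP's entire packet vocabulary per RFC 1350 (opcodes: 1 RRQ, 2 WRQ, 3 DATA, 4 ACK, 5 ERROR); a DATA payload shorter than 512 bytes marks the end of the file:

    import struct

    def rrq(filename, mode="octet"):
        # opcode, then NUL-terminated filename and transfer mode
        return (struct.pack("!H", 1) +
                filename.encode("ascii") + b"\x00" +
                mode.encode("ascii") + b"\x00")

    def data(block_number, payload):
        # lock-step: the sender waits for the matching ACK before the next block
        return struct.pack("!HH", 3, block_number) + payload

    def ack(block_number):
        return struct.pack("!HH", 4, block_number)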

Of course, no protocol is complete without its option negotiation. RFC 2347, "TFTP Option Extension" was just issued in May 1998 on the standards track, along with the blocksize (2348), timeout interval, and transfer size options (2349). The client tacks on a vector of desired options to a Read or Write request; if the server recognizes the extra options and wishes to grant some, it uses an OACK reply rather than the traditional ACK. The primary goal was to lift the 512-octet restriction, which has grown particularly onerous in LANs with MTUs of 1500+ octets. (of course, you still want to avoid IP fragmentation and reassembly!). The two proposals in 2349 affect the timeout -- how frequently the sender tries to retransmit unacknowledged data packets -- and a way of sending along the ultimate file size, so recipients have some way of judging progress.
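
A hedged sketch of the option extension's framing: option name/value pairs are appended to the request as additional NUL-terminated strings ("blksize" and "tsize" are the RFC 2348/2349 names), and a server granting any of them answers with an OACK, opcode 6, instead of a plain ACK.

    import struct

    def rrq_with_options(filename, mode="octet", **options):
        packet = (struct.pack("!H", 1) +
                  filename.encode("ascii") + b"\x00" +
                  mode.encode("ascii") + b"\x00")
        for name, value in options.items():       # e.g. blksize=1428, tsize=0
            packet += name.encode("ascii") + b"\x00" + str(value).encode("ascii") + b"\x00"
        return packet

    def oack(granted):
        packet = struct.pack("!H", 6)              # OACK
        for name, value in granted.items():
            packet += name.encode("ascii") + b"\x00" + str(value).encode("ascii") + b"\x00"
        return packet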

RFC 1440, SIFT/UFT, July 1993, has the notion of delivering unsolicited files. It comes from a BITNET heritage. In modern terms, one would either upload a file to an anonymous-FTP dropbox or accept MIME-encapsulated files in email. Instead, it defines yet another virtual command and data channel, then merges them into one connection, using a very simple syntax separated by NULs. Data delivery can proceed in chunks (shades of HTTP/1.1) or until the connection is closed (shades of HTTP/0.9).

There's a LISA paper of marginal relevance here: unauthenticated file delivery using LPR/LPD. It allows for queueing and transfer ("move" semantics). http://www.net/~jsellens/Papers/filetsf.html

RFC 1986, Enhanced Trivial FTP, August 1996, from the US Army, implements NETBLT-like buffer delivery (effectively optimized for half-duplex links) within TFTP. It does not need the higher-level file management of FTP's control channel.

RFC 998, NETBLT. It reduces the number of acknowledgment packets, since it reconstructs whole banks of buffers, not just a packet at a time; it acks only at buffer boundaries. It is designed for very fat (gigabit) or very long (satellite, i.e. high-delay) connections, where it makes sense to send whole chunks of data -- and, when packets are lost, to reassemble contiguous buffer sequences rather than stall.

RFC 916, RATP, is a rathole. It's an "asynchronous transfer protocol" only by comparison to the *MODEM family and Kermit.

For future reference:

Padlipsky, M.A., "The Elements of Networking Style and other Essays and Animadversions on the Art of Intercomputer Networking", Prentice-Hall, New Jersey, 1985.

Leiner, Barry, Robert Cole, Jon Postel and Dave Mills, "The DARPA Protocol Suite", IEEE INFOCOM 85, Washington, D.C., March 1985. Also in IEEE Communications Magazine, March 1985.

938: Internet Reliable Transaction Protocol (vs, say TIP?)

network voice protocol from ISI, already circa 1985: more evidence of the T-10 rule about starting companies...

2009: GPS-based addressing and routing

RFC122: dig out the report on "Simple Minded File System" from Postel's archives.

US-ASCII turns thirty this year.

Here's another axiom of extensibility: reserving unknown spaces. See RFC 871 for: "Indeed, at each level of the ARM hierarchy the addressing entities are divided into assigned and unassigned sets, and the distinction has proven to be quite useful to networking researchers in that it confers upon them the ability to experiment with new functions without interfering with running mechanisms."

Another design principle is well-stated in 871, and we should cite it, too: "the design principle of Parsimony, or Least Mechanism"

http://sunsite.auc.dk/RFC