FTP Must Die


The File Transfer Protocol (FTP) is specified in [http://www.faqs.org/rfcs/rfc959.html RFC 959], published in October 1985.

The RFC's introduction says the protocol is meant to serve users of "maxi-hosts, mini-hosts, personal workstations, and TACs". Does anyone here know what a TAC is? I don't. I had to look it up, since the acronym wasn't even expanded in the RFC. It took three tries in Google, and I finally found it in some obscure Cisco IOS manual: it apparently stands for Terminal Access Controller. Whatever that means.

If the fact that the RFC is nearly 20 years old didn't tell you how obsolete this protocol is, that acronym should certainly set off alarm bells.

But just to reinforce the point, the next section of the RFC is devoted to the protocol's history, which goes back to file transfer proposals from 1971.

But this would be a sad and pitiful rant indeed if I focused solely on the age of the protocol; after all, I'm older than it is (albeit just barely, if we take 1971 as the origin).

No, my reasons for disparaging FTP are more substantive.

1. Yes, Let's Mangle The Data By Default!

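The first and foremost reason is only partly the protocol's fault. RFC 959 does make ASCII the default representation type, but most of the blame must be laid squarely at the feet of the common implementations, which do nothing to override that default.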

There is no reasonable justification for transferring all files in ASCII mode without regard to their contents! An intelligent implementation would default to an automatic detection mode, and would use ASCII mode to send files that appear to be plain text, and IMAGE mode for those that do not (the FTP client must read the file anyway, so it might as well look at it). The user could still specify ASCII or IMAGE mode before the transfer to override the automatic guessing.
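Here is a minimal Python sketch of what that automatic detection could look like. The function name, the 4 KB sample size, and the 5% threshold are all hypothetical choices, not taken from any real client. The return values are the single-letter codes a client would send with the TYPE command: A for ASCII, I for IMAGE.

```python
# Hypothetical auto-detection heuristic: none of these names or thresholds
# come from a real FTP client.

# Printable ASCII plus tab, LF, form feed, and CR.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0C, 0x0D}

def guess_transfer_type(data: bytes, sample_size: int = 4096) -> str:
    """Return "A" (TYPE A, ASCII) for text-like data, else "I" (TYPE I, IMAGE)."""
    sample = data[:sample_size]
    if not sample:
        return "I"  # nothing to convert, and IMAGE mode never damages data
    if b"\x00" in sample:
        return "I"  # NUL bytes almost never occur in plain text
    odd = sum(1 for b in sample if b not in TEXT_BYTES)
    # Tolerate a few stray bytes, but treat anything more as binary.
    return "A" if odd / len(sample) < 0.05 else "I"

print(guess_transfer_type(b"line one\nline two\n"))  # A
print(guess_transfer_type(b"\x89PNG\r\n\x1a\n"))     # I
```

Erring toward "I" is the safe direction. Sending a text file in IMAGE mode merely leaves its line endings alone, but sending a binary file in ASCII mode can destroy it.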

But instead of doing this, whole generations of Unix and Microsoft implementations of the command-line FTP client defaulted to ASCII mode, even for files that would be damaged by the line ending conversion.
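The following deliberately simplified Python model shows why that conversion is so destructive. It assumes a Unix sender that turns each LF into the CRLF that TYPE A uses on the wire, and a Windows receiver that keeps CRLF as its local line ending. Real implementations may do more, such as handling bare CRs, and the function names are hypothetical. The test data is the 8-byte PNG file signature, which was designed with CR and LF bytes in it precisely so that this kind of damage can be detected.

```python
# Simplified model of TYPE A (ASCII) end-of-line translation between a
# Unix sender and a Windows receiver. Function names are hypothetical.

def unix_send_ascii(local: bytes) -> bytes:
    """Unix sender: local LF line endings become CRLF on the wire."""
    return local.replace(b"\n", b"\r\n")

def windows_receive_ascii(wire: bytes) -> bytes:
    """Windows receiver: CRLF is already the local convention, so keep it."""
    return wire

png_signature = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file
received = windows_receive_ascii(unix_send_ascii(png_signature))
print(received)                            # b'\x89PNG\r\r\n\x1a\r\n'
print(len(png_signature), len(received))   # 8 10
```

Every LF in the "text" gains a CR in front of it, so the image grows by one byte per 0x0A byte and no longer decodes.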

To this day, I am still in the habit of typing bin every second or third command while I'm connected to an FTP server, especially right before a get or a put. Even though recent Linux and other Unix implementations have started defaulting to IMAGE mode, a dozen years of hard-earned experience with destroyed data are not so easily forgotten. (I've lost track of how many hours of my time, and how much Internet bandwidth, have been wasted on file transfers that had to be repeated because I had forgotten to type bin.) And the habit still pays off whenever I use an ancient client that defaults to ASCII mode!

2. The Client Shall Listen For Connections From The Server!

This is one of the most astonishing misfeatures I have ever encountered. Normally, in a client-server model, one expects the server to sit there passively awaiting requests from the client. But in FTP, there is no clear distinction between client and server. Even the RFC doesn't use the word "client". I mean that literally. If you search for the word "client" in the RFC, it isn't in there! (It talks about the "user" and the "server" instead.)

The RFC gives no clear language which describes this process. It never comes right out and says "the client shall pick a random port and listen on it, and send the following bytes to the server, and then the server shall connect to the client's port". (Obviously, such clarity would make the document unfit for publication.) Nevertheless, that is precisely what "active mode" FTP does.

It has to be seen to be believed. Want to see?

And the log:

griffon is my Debian box, at 192.168.2.5. pegasus is the OpenBSD box, at 192.168.2.1. The log shows that the FTP client program, initiated on griffon, first sends a TYPE I command (IMAGE mode -- no longer that ancient default of ASCII mode!) to the server, by writing 8 bytes to file descriptor 5. It gets a 20-byte response from the server.
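Those byte counts follow from how the control connection works. Each command is a line of ASCII text ending in CRLF, and each reply begins with a three-digit code. The Python sketch below illustrates this. The helper names are hypothetical, the reply text is made up (it just happens to be 20 bytes long), and multi-line replies are not handled.

```python
from typing import Optional, Tuple

def ftp_command(verb: str, arg: Optional[str] = None) -> bytes:
    """Encode one control-connection command as ASCII text plus CRLF."""
    line = verb if arg is None else f"{verb} {arg}"
    return (line + "\r\n").encode("ascii")

def parse_reply_line(line: bytes) -> Tuple[int, str]:
    """Split a single-line reply into its 3-digit code and its text."""
    text = line.decode("ascii").rstrip("\r\n")
    return int(text[:3]), text[4:]

cmd = ftp_command("TYPE", "I")
print(cmd, len(cmd))                                 # b'TYPE I\r\n' 8
print(parse_reply_line(b"200 Type set to I.\r\n"))  # (200, 'Type set to I.')
```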

Then the client opens the data file and gets its metadata (via fstat). Next it creates a socket and assigns a local address to it with bind. The local port (sin_port) is set to 0, which tells the kernel to choose an unused port itself. When the client calls getsockname, we learn that the kernel has chosen port 34876. The FTP client splits this into two bytes in network (big-endian) order, 136 and 60, since 136*256 + 60 = 34876. Then it starts to listen on the socket.
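The same sequence of system calls (socket, bind to port 0, getsockname, listen) can be reproduced with Python's socket module. This is a sketch rather than the traced client's code, and the function names are hypothetical. It binds to 127.0.0.1 only so that it runs anywhere; a real client would bind the address it uses to reach the server.

```python
import socket
from typing import Tuple

def split_port(port: int) -> Tuple[int, int]:
    """Split a 16-bit port into (high byte, low byte), big-endian order."""
    return port >> 8, port & 0xFF

def open_active_listener(local_ip: str = "127.0.0.1") -> Tuple[socket.socket, int, int]:
    """Hypothetical helper mirroring the trace: socket, bind, getsockname, listen."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((local_ip, 0))       # port 0: let the kernel pick an unused port
    _, port = sock.getsockname()   # find out which port it picked
    sock.listen(1)                 # now wait for the server to connect
    p1, p2 = split_port(port)
    return sock, p1, p2

print(split_port(34876))  # (136, 60)
```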

The client sends the command PORT 192,168,2,5,136,60 to the server, telling the server where to connect. The client is giving its own IP address (192.168.2.5) and the randomly chosen port (136,60). (Why does the protocol use commas between the octets of the IP address? Who knows.)
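Building and decoding that comma-separated argument is simple byte arithmetic. The minimal Python sketch below handles IPv4 only, and the function names are hypothetical.

```python
from typing import Tuple

def format_port_arg(ip: str, port: int) -> str:
    """Build the h1,h2,h3,h4,p1,p2 argument of a PORT command."""
    octets = ip.split(".")
    return ",".join(octets + [str(port >> 8), str(port & 0xFF)])

def parse_port_arg(arg: str) -> Tuple[str, int]:
    """Recover the IPv4 address and port from an h1,h2,h3,h4,p1,p2 argument."""
    fields = [int(f) for f in arg.split(",")]
    if len(fields) != 6 or not all(0 <= f <= 255 for f in fields):
        raise ValueError(f"malformed PORT argument: {arg!r}")
    ip = ".".join(str(f) for f in fields[:4])
    return ip, fields[4] * 256 + fields[5]

print("PORT " + format_port_arg("192.168.2.5", 34876))  # PORT 192,168,2,5,136,60
print(parse_port_arg("192,168,2,5,136,60"))              # ('192.168.2.5', 34876)
```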

The client gets its response to the PORT command from the server, and then sends a STOR .profile command. The server's response is received, and then the client accepts a connection from the server (192.168.2.1, port 60024).
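The role reversal is easy to reproduce on one machine. In the Python sketch below, a thread stands in for the server and connects to the client's listening socket, and the client then uploads data over the accepted connection, as it would for STOR. The control connection (PORT, STOR, and the replies) is left out entirely, and the function names and payload are hypothetical. Closing the data connection marks end of file, as in FTP's stream mode.

```python
import socket
import threading
from typing import List, Tuple

def server_dtp(host: str, port: int, received: List[bytes]) -> None:
    """Stand-in for the server side: it connects TO the client's port."""
    with socket.create_connection((host, port)) as conn:
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # client closed the connection: end of file
                break
            chunks.append(chunk)
    received.append(b"".join(chunks))

def active_mode_stor(data: bytes) -> Tuple[bytes, Tuple[str, int]]:
    """Hypothetical client side of an active-mode upload, all on localhost."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    host, port = listener.getsockname()
    # A real client would now send PORT and STOR over the control connection;
    # here a thread plays the server and connects straight to our port.
    received: List[bytes] = []
    server = threading.Thread(target=server_dtp, args=(host, port, received))
    server.start()
    conn, peer = listener.accept()  # the server's connection arrives here
    with conn:
        conn.sendall(data)          # upload the file contents
    server.join()                   # closing conn above lets the server see EOF
    listener.close()
    return received[0], peer

stored, peer = active_mode_stor(b"hello from griffon\n")
print(stored)  # b'hello from griffon\n'
print(peer)    # the "server" address and its own ephemeral source port
```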

Now, this is just the default ("active") mode of operation. The protocol also allows the client to request "passive" (PASV) mode, in which the server is responsible for picking the random port and sending the information to the client, and then the client initiates the second connection for the file transfer itself. More on this below.
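In passive mode the client has to dig the address out of the server's reply instead. RFC 959 lists that reply as "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)", using the same comma encoding as PORT. The minimal Python sketch below shows the parsing. The function name and the example reply values are hypothetical, and it searches for the six numbers rather than relying on the exact wording or parentheses.

```python
import re
from typing import Tuple

# Six comma-separated numbers: h1,h2,h3,h4,p1,p2
_PASV_FIELDS = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")

def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Hypothetical helper: extract (ip, port) from a 227 reply to PASV."""
    if not reply.startswith("227"):
        raise ValueError(f"not a PASV reply: {reply!r}")
    match = _PASV_FIELDS.search(reply)
    if match is None:
        raise ValueError(f"no address in PASV reply: {reply!r}")
    values = [int(g) for g in match.groups()]
    if any(v > 255 for v in values):
        raise ValueError(f"field out of range in PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = values
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2

print(parse_pasv_reply("227 Entering Passive Mode (192,168,2,1,195,80)."))
# ('192.168.2.1', 50000)
```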