FTP Must Die
A GreyCat rant.
Contents
- Yes, Let's Mangle The Data By Default!
- The Client Shall Listen For Connections From The Server!
- Firewall? What's A Firewall?
- You're Firewalled Too? Oh, Crap!
- What's Your Password? xyzzy? Great!
- I Love Sitting Around Waiting For Ten Round Trips To Get One File!
- Other People Agree
- And another thing... easy corruption if files are large/connection is poor.
- To Summarize
The File Transfer Protocol (FTP) is specified in RFC 959, published in October 1985.
The attempt in this specification is to satisfy the diverse needs of users of maxi-hosts, mini-hosts, personal workstations, and TACs, with a simple, and easily implemented protocol design.
That's from the introduction. Does anyone here know what a TAC is? I don't. I had to look it up, since the acronym wasn't even expanded in the RFC. It took three tries in Google, and I finally found it in some obscure Cisco IOS manual -- it apparently stands for Terminal Access Control protocol. Whatever that means.
Almost. http://tools.ietf.org/html/rfc931 says:
The only other names permitted are "TAC" to specify a BBN Terminal Access Controller ...
- Now, what is BBN? Bolt Beranek and Newman? (Note from Lance: For all you spring chickens, BBN usually refers to Bulletin Board Network)
That would be Terminal Access Controller, a piece of hardware manufactured by Bolt Beranek and Newman (Now BBN) which connected dumb terminals to the ARPANET.
If the fact that the RFC is over 30 years old didn't tell you how obsolete this protocol is, that acronym should certainly start ringing the alarms.
But just to reinforce it, the next section of the RFC discusses its history.
FTP has had a long evolution over the years. Appendix III is a chronological compilation of Request for Comments documents relating to FTP. These include the first proposed file transfer mechanisms in 1971 that were developed for implementation on hosts at M.I.T. (RFC 114), plus comments and discussion in RFC 141.
But this would be a sad and pitiful rant indeed if I focused solely on the age of the protocol -- after all, I'm older than it is (albeit just barely, if we take 1971 as the origin).
No, my reasons for disparaging FTP are more substantive.
1. Yes, Let's Mangle The Data By Default!
The first and foremost reason is not really the protocol's fault per se, but rather, must be laid squarely at the feet of the common implementations.
3.1.1.1. ASCII TYPE This is the default type and must be accepted by all FTP implementations. It is intended primarily for the transfer of text files, except when both hosts would find the EBCDIC type more convenient.
There is no reasonable justification for transferring all files in ASCII mode without regard to their contents! An intelligent implementation would default to an automatic detection mode, and would use ASCII mode to send files that appear to be plain text, and IMAGE mode for those that do not (the FTP client must read the file anyway, so it might as well look at it). The user could still specify ASCII or IMAGE mode before the transfer to override the automatic guessing.
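A rough sketch of what such automatic detection might look like follows. The function names, the 4096-byte sample size, and the "printable ASCII plus tab, LF and CR" rule are all assumptions made for illustration; no real FTP client is claimed to work this way.

```python
# Hypothetical heuristic: treat a file as text only if every sampled byte
# is printable ASCII or common whitespace. Anything else goes as IMAGE.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def guess_transfer_type(sample: bytes) -> str:
    """Return 'A' (ASCII) if the sample looks like plain text, else 'I' (IMAGE)."""
    if sample and all(byte in TEXT_BYTES for byte in sample):
        return "A"
    # Empty or suspicious data: IMAGE mode never damages a file, so it is
    # the safe fallback.
    return "I"


def guess_file_type(path: str, sample_size: int = 4096) -> str:
    """Guess the transfer type from the first sample_size bytes of a file."""
    with open(path, "rb") as handle:
        return guess_transfer_type(handle.read(sample_size))
```

Falling back to IMAGE whenever the guess is uncertain is deliberate: a text file sent in IMAGE mode merely keeps its original line endings, while a binary file sent in ASCII mode can be destroyed.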
But instead of doing this, whole generations of Unix and Microsoft implementations of the command-line FTP client defaulted to ASCII mode, even for files that would be damaged by the line ending conversion.
To this day, I am still in the habit of typing bin every second or third command while I'm connected to an FTP server, especially right before a get or a put. Even though recent Linux and other Unix implementations have started defaulting to IMAGE mode, a dozen years of hard-earned experience with destroyed data are not so easily forgotten. (I've lost track of how many hours of my time, and how much Internet bandwidth, have been wasted by file transfers that had to be repeated again later because I had forgotten to type bin.) And my habits pay off when I use an ancient version which still defaults to ASCII mode!
2. The Client Shall Listen For Connections From The Server!
This is one of the most astonishing misfeatures I have ever encountered. Normally, in a client-server model, one expects the server to sit there passively awaiting requests from the client. But in FTP, there is no clear distinction between client and server. Even the RFC doesn't use the word "client". I mean that literally. If you search for the word "client" in the RFC, it isn't in there!
The RFC gives no clear language which describes this process. It never comes right out and says "the client shall pick a random port and listen on it, and send the following bytes to the server, and then the server shall connect to the client's port". (Obviously, such clarity would make the document unfit for publication.) Nevertheless, that is precisely what "active mode" FTP does.
It has to be seen to be believed. Want to see?
    griffon:~$ strace -o log ftp pegasus
    Connected to pegasus.wooledge.org.
    220 pegasus.wooledge.org FTP server (Version 6.6/OpenBSD) ready.
    ...
    ftp> cd /var/tmp
    250 CWD command successful.
    ftp> put .profile
    local: .profile remote: .profile
    200 PORT command successful.
    150 Opening BINARY mode data connection for '.profile'.
    226 Transfer complete.
    1231 bytes sent in 0.01 secs (179.1 kB/s)
    ftp> quit
    221 Goodbye.
And the log:
    write(1, "local: .profile remote: .profile"..., 33) = 33
    write(5, "TYPE I\r\n", 8) = 8
    read(3, "200 Type set to I.\r\n", 4096) = 20
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    open(".profile", O_RDONLY|O_LARGEFILE) = 6
    fstat64(6, {st_mode=S_IFREG|0644, st_size=1231, ...}) = 0
    socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 7
    bind(7, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.2.5")}, 16) = 0
    getsockname(7, {sa_family=AF_INET, sin_port=htons(34876), sin_addr=inet_addr("192.168.2.5")}, [16]) = 0
    listen(7, 1) = 0
    write(5, "PORT 192,168,2,5,136,60\r\n", 25) = 25
    read(3, "200 PORT command successful.\r\n", 4096) = 30
    write(1, "200 PORT command successful.\n", 29) = 29
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    write(5, "STOR .profile\r\n", 15) = 15
    read(3, "150 Opening BINARY mode data con"..., 4096) = 57
    write(1, "150 Opening BINARY mode data con"..., 56) = 56
    accept(7, {sa_family=AF_INET, sin_port=htons(60024), sin_addr=inet_addr("192.168.2.1")}, [16]) = 8
griffon is my Debian box, at 192.168.2.5. pegasus is my OpenBSD box, at 192.168.2.1. The log shows that the FTP client program, initiated on griffon, first sends a TYPE I command (IMAGE mode -- no longer that ancient default of ASCII mode!) to the server, by writing 8 bytes to file descriptor 5. It gets a 20-byte response from the server.
Then the client opens the data file, and gets its metadata (via fstat). Then it creates a socket, and then assigns a local address to it with bind. The local port (sin_port) is set to 0, meaning the kernel gets to choose a random unused one. When the client calls getsockname, we learn that the kernel has chosen port 34876. The FTP client breaks this into two bytes, in network order (big endian), 136*256 + 60 = 34876. Then it starts to listen on the socket.
The client sends the command PORT 192,168,2,5,136,60 to the server, telling the server where to connect. The client is giving its own IP address (192.168.2.5) and the randomly chosen port (136,60). (Why does the protocol use commas between the octets of the IP address? Who knows.)
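The socket calls and the PORT encoding traced above can be reproduced in a few lines. This is only a sketch of the same sequence (bind to port 0, ask the kernel which port it chose, listen, build the PORT argument); the function names are made up for this example and are not taken from any FTP client's source.

```python
import socket


def format_port_command(ip: str, port: int) -> str:
    """Build the PORT command text for an IPv4 address and TCP port."""
    # The 16-bit port is split into two bytes, high byte first.
    high, low = divmod(port, 256)
    return "PORT {},{},{}".format(ip.replace(".", ","), high, low)


def open_active_mode_listener(local_ip: str):
    """Listen on a kernel-chosen port and return (socket, PORT command)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((local_ip, 0))          # port 0: let the kernel pick one
    port = sock.getsockname()[1]      # find out which port was chosen
    sock.listen(1)                    # the server will connect to us here
    return sock, format_port_command(local_ip, port)
```

With the values from the log, format_port_command("192.168.2.5", 34876) gives "PORT 192,168,2,5,136,60", since 34876 = 136*256 + 60.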
The client gets its response to the PORT command from the server, and then sends a STOR .profile command. The server's response is received, and then the client accepts a connection from the server (192.168.2.1, port 60024).
Now, this is just the default ("active") mode of operation. The protocol also allows the client to specify "passive" (PASV) mode, in which the server is responsible for generating the random port and sending the information to the client, and then the client initiates the second connection for the file transfer itself. More on this below.
There is actually one scenario that takes advantage of FTP's split of a file transfer into two TCP connections. Besides the port, the client also specifies the IP address the server should connect to, and that address does not have to be the client's own. Recall that the client can also choose whether a server listens or actively connects. Together, these let a client transfer a file directly between two servers without the file data ever passing through the client. When the client's connection is slower than the connection between the servers and the file is large enough, this may be a valuable option, and the file data is not transferred twice. This scenario is still subject to all the security issues discussed below, just like ordinary transfers.
3. Firewall? What's A Firewall?
The File Transfer Protocol predates the common use (and possibly even the invention) of concepts such as Network Address Translation (NAT) and firewalls. It dates back to a simpler time, when all computers on the Internet were true peers, and there was little reason to expect malicious intent.
But in the 21st century, the majority of end user machines have non-routable IPv4 addresses. This is due not only to the common use of firewalls, but also to the simple shortage of IPv4 addresses in most parts of the world outside the USA.
What does this mean for FTP? In the previous section of this rant, we saw that the client advertises its own IP address and randomly-chosen port number to the server, and expects the server to initiate an IP socket connection to the client program on that address.
But what happens when the client's IP address is non-routable? In the case of my own Debian box (griffon), my IP address is 192.168.2.5. That's not a real Internet address -- it's from one of the reserved private IP ranges, and my connection to the Internet is through a NAT gateway which also acts as a firewall.
If I make an active-mode FTP connection from griffon through a simple NAT to an Internet host, and then try to transfer a file, I'm going to tell the FTP server to connect to me on IP address 192,168,2,5 port X,Y. That won't work!
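To see the problem from the server's side, here is a small sketch that decodes a PORT command and asks Python's standard ipaddress module whether the advertised address is in a private range. The helper name parse_port_command is hypothetical.

```python
import ipaddress


def parse_port_command(command: str):
    """Decode 'PORT h1,h2,h3,h4,p1,p2' into (host, port)."""
    argument = command.split(None, 1)[1]
    fields = [int(part) for part in argument.split(",")]
    if len(fields) != 6 or not all(0 <= n <= 255 for n in fields):
        raise ValueError("malformed PORT argument: " + argument)
    host = ".".join(str(n) for n in fields[:4])
    port = fields[4] * 256 + fields[5]
    return host, port


host, port = parse_port_command("PORT 192,168,2,5,136,60")
# A server on the public Internet has no route to a private address.
unreachable = ipaddress.ip_address(host).is_private
```

For the command griffon would send, host is 192.168.2.5 and is_private is true, which is exactly why the server's connection attempt goes nowhere.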
So it's a good thing we have passive mode, right? A client behind a NAT can tell the server to make up a random port, listen on it, and advertise it to the client. The client can then make a second connection, which the NAT will handle properly.
That is, unless the server is also firewalled.
4. You're Firewalled Too? Oh, Crap!
It is extraordinarily uncommon for a server to sit directly on the Internet without some sort of firewall protecting it these days. In many cases, that firewall may also take the form of a NAT gateway. The FTP server may be sitting inside a NAT, with the default FTP control and data ports (TCP ports 21 and 20) forwarded (redirected) from the firewall to the FTP server.
When an active-mode FTP client makes a connection to a server that's inside a NAT, everything works fine. The client makes up a random port, listens on it, and tells the server "connect to me on IP 200,201,202,203 PORT 204,205". The server can do that; the NAT handles it smoothly.
But when a passive-mode FTP client makes a connection to a NATted server, things don't go so well! The server chooses a random port and says to the client, "connect to me on IP 10,11,12,13 PORT 14,15". But alas, the server's IP address is non-routable! The client can't connect to that IP address.
The simple fact is that a NATted client and a NATted server cannot establish an FTP connection with each other, no matter which mode they choose (active or passive). Neither one works!
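The argument of the last two sections boils down to a tiny truth table. The sketch below models it under the same assumptions as the text: a "simple" NAT with no FTP-specific helpers, and a NATted server whose control and data ports are forwarded. The function names are invented for this illustration.

```python
def data_connection_possible(mode: str, client_natted: bool, server_natted: bool) -> bool:
    """Can the data connection be opened, given who sits behind a simple NAT?"""
    if mode == "active":
        # The server connects to the address the client advertised.
        return not client_natted
    if mode == "passive":
        # The client connects to the address the server advertised.
        return not server_natted
    raise ValueError("unknown mode: " + mode)


def working_modes(client_natted: bool, server_natted: bool):
    """List the modes that can work for a given pair of hosts."""
    return [mode for mode in ("active", "passive")
            if data_connection_possible(mode, client_natted, server_natted)]
```

working_modes(True, False) leaves only passive mode, working_modes(False, True) leaves only active mode, and working_modes(True, True) comes back empty.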
(Some firewalls go beyond simple NAT, and offer special hacks to try to let FTP work. So, in practice, it is sometimes possible to set this up, but you'll have to use special firewall-specific trickery to do it.)
(WillDye adds: Some consider PASV (passive) mode to be a security risk. The Firefox web browser, for example, ignores alternate server addresses, according to http://www.mozilla.org/security/announce/2007/mfsa2007-11.html .)
5. What's Your Password? xyzzy? Great!
As stated earlier, FTP predates the age when Internet activity was expected to be malicious. As such, it has no provisions for security against password sniffing, man in the middle attacks, and so on.
Your username and password are transmitted in the clear from the FTP client to the FTP server. Anyone with control over any of the routers along the path from client to server can read the entire session, including your password.
So, one can wrap the FTP session in an SSL tunnel, right? One can just use stunnel to encrypt the session, right? Well, that's half right. Remember, FTP doesn't just use a single connection from client to server and multiplex the data over that open channel. It opens a new channel for every single data transfer, even directory listings. So, while tunneling the FTP control connection may work and may protect the username and password from spying eyes, the data connection will still be unprotected.
Setting up an SSL tunnel would require special steps on both the client and the server. It would be a half-functional jury-rigged hack, and completely inferior to the methods of data transfer that have been developed in the two decades since the File Transfer Protocol specification was published.
If security of your authentication credentials matters one iota to you, you'll use scp to transfer files, not FTP.
6. I Love Sitting Around Waiting For Ten Round Trips To Get One File!
Retrieving a single file from an FTP server involves an unbelievable number of back-and-forth handshaking steps.
- The client makes a TCP socket connection to the FTP server's control port and waits for the TCP handshake to be completed.
- The client waits for the server to send its "banner".
- The client sends the username to the server and waits for a response.
- The client sends the password to the server and waits for a response.
- The client sends a SYST command to the server and waits for a response.
- The client sends a TYPE I command to the server and waits for a response. (This may happen later, but it must happen at some point.)
- If the user needs to change directories on the server, the client sends another command to the server and waits for a response.
- (In active mode) the client sends a PORT command to the server and waits for a response. (In passive mode, this would go in the opposite direction, but it's still another round trip.)
- A data connection (an entire new TCP socket connection, with the full three-way handshake) is established.
- The bytes of data are sent over the data connection.
- The client waits for the server to send a 2xx message on the control connection to indicate successful transfer.
- The client sends a QUIT command to the server and waits for the server to say good-bye.
That's ten full round trips to get one file, and that's if you count TCP socket connections as just one round trip! (In reality, they're more involved than that.)
Let's see how many round trips it takes to get a file over HTTP:
- The HTTP client makes a TCP socket connection to the HTTP server.
- The HTTP client sends a GET command to the HTTP server, including the URL, the HTTP protocol version, the virtual host name, and various optional headers, all at once, and waits for a response.
- There is no step three. The response we just got includes the entire data stream. We're done!
That's two round trips (counting the TCP socket as just one). By that admittedly simple metric, HTTP is five times as efficient as FTP for fetching a single file from a server.
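To put a number on that, here is a back-of-the-envelope calculation using the round-trip counts from above. The 100 ms round-trip time is an assumed figure chosen purely for illustration, and the calculation ignores the time spent actually moving the file's bytes.

```python
FTP_ROUND_TRIPS = 10   # count taken from the FTP list above
HTTP_ROUND_TRIPS = 2   # count taken from the HTTP list above


def handshake_overhead_ms(round_trips: int, rtt_ms: int) -> int:
    """Time spent purely waiting on round trips, in milliseconds."""
    return round_trips * rtt_ms


ASSUMED_RTT_MS = 100  # hypothetical latency, e.g. a distant server

ftp_overhead = handshake_overhead_ms(FTP_ROUND_TRIPS, ASSUMED_RTT_MS)
http_overhead = handshake_overhead_ms(HTTP_ROUND_TRIPS, ASSUMED_RTT_MS)
```

Under that assumption FTP spends 1000 ms waiting before and after the transfer, against 200 ms for HTTP.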
Both FTP and HTTP become more efficient if you transfer more than one file at a time, of course. With FTP, you don't have to send the username and password again for each new file you transfer within the same session. But HTTP has persistent sockets, so you can ask for another object (file) within the same TCP connection there as well.
With FTP, if you want to set the timestamp on a retrieved file (wget -N or curl -R), you have to get a full directory listing.
7. Other People Agree
I received an email from someone who was apparently too shy to edit the wiki himself, but I found his insights quite interesting. I'm including them here with wiki formatting, and one typo correction; the rest is entirely his.
I have noticed your wiki page about FTP protocol being shit lacks some crucial information. As someone who had made the mistake of thinking it would be simple to implement a FTP server (http://repo.or.cz/w/dftpd.git/) I have some unique insight, you might be interested in including on your web page.
- The general consensus is that the FTP RFC specification and the reality are two separate things, but let me go into the details:
- The RFC is vague about server state changes. You have a global list of replies with one-sentence general descriptions and list of allowed reply codes for every FTP command. But how do these two relate to each other you have to guess yourself. For example, what happens if client wants to send a file, while there's already a file transfer? Should the reply be "425 Can't open data connection."? Or maybe "426 Connection closed; transfer aborted."? You just don't know.
- The RFC states:
5.1. MINIMUM IMPLEMENTATION
In order to make FTP workable without needless error messages, the following minimum implementation is required for all servers:
    TYPE - ASCII Non-print
    MODE - Stream
    STRUCTURE - File, Record
    COMMANDS - USER, QUIT, PORT,
               TYPE, MODE, STRU,
                 for the default values
               RETR, STOR,
               NOOP.
- The LIST command is Maximum Bullshit. And I'm not even touching the subject of opening a data connection to send the list itself. Did you know that the specification doesn't tell anything about WHAT should be sent as the list? The RFC states that "This command causes a list to be sent from the server to the passive DTP." and makes note that "the information on a file may vary widely from system to system, this information may be hard to use automatically in a program, but may be quite useful to a human user.". So you might get an ascii renderition of directory contents and you are expected to handle it. Thankfully what the servers do and what clients expect is the result of the unix ls command. But oh wow, do you know what konqueror sends when it wants to list directory contents? For reference, the allowed syntax of LIST command is "LIST [<SP> <pathname>] <CRLF>". Konqueror sends "LIST -la". RFC abiding server sends 450 error, as there is no file named "-la", to which konqueror responds with "LIST", but has problems with server's reply. I resorted to simply ignoring the "-la" crap.
- You have skimmed the subject of FTP's ancient history by mentioning TAC. My server sends this little welcome message: "You won't be able to transmit EBCDIC encoded 39-bit byte record-structured files with telnet format controls in block mode to your DEC TOPS-20s mainframe.". Yes, all that crap is required to be supported by RFC.
- Section 4.1.2.6 of RFC 1123 (October 1989): "The format of the 227 reply to a PASV command is not well standardized. In particular, an FTP client cannot assume that the parentheses shown on page 40 of RFC-959 will be present (and in fact, Figure 3 on page 43 omits them). Therefore, a User-FTP program that interprets the PASV reply must scan the reply for the first digit of the host and port numbers.". Can you guess what happens in real world? The "standard compliant" firefox fails horribly when the parentheses are not included in reply.
- At that point I've reached the point at which Total Commander seemed to be working somewhat reliably and web browsers were able to communicate with the server, so I stopped the development. There were many other unresolved issues in a wide variety of clients, but I didn't feel that checking what every possible FTP client on the planet expects would be something worthwhile.
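The RFC 1123 advice quoted in the email above (don't rely on the parentheses in a 227 reply; scan for the host and port numbers instead) can be sketched as follows. The function name is made up for this example. Note that a literal "first digit" scan would trip over the 227 code itself, so this version looks for the first run of six comma-separated numbers.

```python
import re

# Six comma-separated decimal numbers: h1,h2,h3,h4,p1,p2
_SIX_NUMBERS = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply: str):
    """Extract (host, port) from a 227 reply, with or without parentheses."""
    if not reply.startswith("227"):
        raise ValueError("not a 227 reply: " + reply)
    match = _SIX_NUMBERS.search(reply)
    if match is None:
        raise ValueError("no host/port numbers in reply: " + reply)
    numbers = [int(group) for group in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("number out of range in reply: " + reply)
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return host, port
```

Both "227 Entering Passive Mode (10,11,12,13,14,15)" and the same reply without the parentheses parse to host 10.11.12.13, port 3599 (14*256 + 15).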
8. And another thing... easy corruption if files are large/connection is poor.
From someone brave enough to edit the wiki directly.
There is something else that makes FTP even more unbelievably shit. It isn't widely understood but it generally involves the use of the PUT command to upload files (but could occur on a GET as well if the circumstances conspire and the client sucks).
It generally affects transfers which take time, involving large files and/or slow connections. Basically, if the socket fails during a PUT due to a connectivity problem or glitch between the client and server, the server may assume that the transfer has successfully completed. The file could be short by GB, but the receiving end has no way of knowing because this useless protocol doesn't take the simple step of telling the remote end how big the file is going to be before the transfer starts!
A common workaround for this is to add post-transfer file size checking -- e.g. PUT the file under a temporary name, upon transfer completion check that the size of the remote file matches the local file, and then rename it to its final name. But in the 21st century no-one should need to work around critical flaws in a shonky old protocol.
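As a minimal sketch of that workaround, the function below uses the storbinary, size, rename and delete methods of Python's standard ftplib.FTP class. The function name and the ".part" suffix are assumptions for illustration. The size check relies on the server supporting the SIZE command, which is not part of RFC 959, so treat this as a sketch rather than something guaranteed to work against every server.

```python
import os


def upload_verified(ftp, local_path: str, remote_name: str) -> None:
    """Upload under a temporary name, verify the size, then rename.

    ftp is expected to behave like a connected, logged-in ftplib.FTP.
    """
    temp_name = remote_name + ".part"   # hypothetical temporary suffix
    local_size = os.path.getsize(local_path)

    with open(local_path, "rb") as handle:
        ftp.storbinary("STOR " + temp_name, handle)

    # The protocol never told the server how big the file should be,
    # so ask afterwards and compare for ourselves.
    remote_size = ftp.size(temp_name)
    if remote_size != local_size:
        ftp.delete(temp_name)
        raise OSError("size mismatch for {}: sent {} bytes, server has {}".format(
            remote_name, local_size, remote_size))

    # Only a complete file ever appears under the final name.
    ftp.rename(temp_name, remote_name)
```

Anything watching for the final name on the server never sees a truncated file, at the price of at least two more round trips per upload.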
9. To Summarize
FTP is an outdated, insecure, slow and unfriendly pig of a protocol. It has no business being on the Internet in the 21st century.
FTP MUST DIE!