read_multipart bug (in Ruby's CGI and Rail's AbstractRequest)

I have written code that receives multipart forms. I'm currently testing the software by using curl to push files through a Rails controller. One of my test cases fails, and it strongly looks like the failure is due to incorrect code in read_multipart:

...       bufsize = 10 * 1024 ...         buf = buf.sub(/\A((?:.|\n)*?)(?:[\r\n]{1,2})?#{quoted_boundary}([\r\n]{1,2}|--)/n) do           body.print $1           if "--" == $2             content_length = -1           end          boundary_end = $2.dup           ""         end ...         break if buf.size == 0         break if content_length === -1       end       raise EOFError, "bad boundary end of body part" unless boundary_end=~/--/

The problem appears to be that if a bufsize chunk is read that matches the "buf.sub" regular expression above exactly, buf can become "", which has size 0, even though there is more information to be read. This can leave $2 set to "\r\n" with buf.size 0, so the EOFError exception is raised, even though there are more bytes to read (including the "--").

If I change bufsize to be 100 * 1024, or 4 * 1024, the problem goes away, because I no longer get a buffer that matches the pattern exactly. If I comment out the "break if buf.size == 0" the problem goes away, because the code can keep going and find the "--" it's looking for. Specifically, if I comment out the "break if buf.size == 0" and instrument the code to show me some of the values of the variables I get the output that I have after my signature. In it, it's clear that $2 is "\r\n" and buf.size is 0, even though there's more to come, but since we don't break on buf.size 0, the loop goes around once more and picks up the extra bytes.

Looking at all of read_multipart, I'm not sure why the "break if buf.size == 0" code is there, since earlier in the code an EOFError will be raised if body.read returns nil or "". On the other hand, the exit may be necessary to prevent some sort of infinite loop not associated body.read running into the end of the file.

I'm posting this to the Ruby mailing list, because I suspect there are people on the list intimately familiar with read_multipart who might be able to see the problem and fix it properly. The particular data set that I'm exercising contains sensitive data, but if there's interest, I can capture curl's output and scrub the data, although I'm not sufficiently familiar with CGI to know how to write a unit test for read_multipart.

I'm cc'ing this to the Rails mailing list because Rails has its own copy of the CGI code in actionpack/lib/action_controller/request.rb, and that's where I first encountered the bug.