out of memory (java heap space) on zip creation (jruby)

I am using rubyzip and am trying to put a huge CSV file with 1.4 million rows into a zip file. Under JRuby I get a Java heap space error.

I believe the error happens in the block below:

    Zip::ZipOutputStream.open(zip_path) do |zos|
      zos.put_next_entry(File.basename(csv_path))
      zos.print IO.read(csv_path)
    end

You’re reading the entire file contents into memory and then saving.

Look for a way to stream chunks (16 kilobytes, for example) into the zip stream, along the lines of the sketch below.
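Something like this, untested, reusing the zip_path and csv_path names from the post above (Zip::ZipOutputStream supports << for appending raw data):

    require 'zip/zip'  # rubyzip

    # Stream the CSV into the zip entry in 16 KB chunks, so only one
    # chunk is in memory at a time instead of the whole file.
    Zip::ZipOutputStream.open(zip_path) do |zos|
      zos.put_next_entry(File.basename(csv_path))
      File.open(csv_path, 'rb') do |f|
        while (chunk = f.read(16 * 1024))
          zos << chunk
        end
      end
    end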

The error happens on the line:

    zos.print IO.read(csv_path)

I see that p zos.class shows: Zip::ZipOutputStream

and that the print method is inherited from IOExtras::AbstractOutputStream (http://rubyzip.sourceforge.net/classes/IOExtras/AbstractOutputStream.html), where the docs show print defined as:

    # File lib/zip/ioextras.rb, line 130
    def print(*params)
      self << params.to_s << $\.to_s
    end

I am not sure offhand how to stream the data, but I gather the problem comes from reading the whole file into memory.

The default heap size for the JVM is pretty small. I believe you can pass args to the JVM when you start JRuby.

If you do something like -Xmx1024m (not sure that syntax is exactly correct, but it's close) you might get enough. Of course that depends on the size of the file.
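For JRuby you'd pass the JVM flag through with a -J prefix, something like this (the flag is -Xmx with a capital X; yourscript.rb is just a placeholder):

    jruby -J-Xmx1024m -S yourscript.rb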

Well, the csv file has something like 1.4 million rows and maybe 20 columns or something like that. When I get a chance, maybe I'll look into that if that seems like the thing to try...

Jedrin wrote in post #1060204:

> When I get a chance, maybe I'll look into that if that seems like the thing to try...

"When I get a chance, maybe..."???

Greg gave you the answer. A default JVM instance's heap space is limited to 64 megabytes. If the file you're loading, plus the memory consumed by your application, goes over that limit, the JVM will report "out of memory" and begin exhibiting unpredictable behavior.

It makes no difference how much physical RAM your machine might contain. The JVM will NOT use more heap space than the maximum defined by the -Xmx argument (-Xmx64m being the default when not specified).
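If you want to see what maximum heap your particular JVM actually defaults to, something like this should show it (flag output varies by JVM vendor and version):

    java -XX:+PrintFlagsFinal -version | grep -i maxheapsize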


So I launched my Sinatra app like this; from my Google searches, the -J arg looks like what I want.

    jruby -J-Xmx1024m -S recordset.rb

When I tried to download the csv file (which the server puts into the zip file and then crashes), I got the same heap space error, but it seemed like it ran longer before it crashed. If I try to increase that number much higher than 1024m, I get:

    Error occurred during initialization of VM
    Could not reserve enough space for object heap
    JVM creation failed


The heap contains all the objects created for the application. In this case, it looks like your file is still too big.

> Error occurred during initialization of VM
> Could not reserve enough space for object heap
> JVM creation failed

This means that you tried to allocate more memory than is available on the machine.

Are you doing this for a single load, or will it be an application that will commonly receive large files?

If it’s the latter, I’d probably try to redesign the code you’re using to load the files. Sounds like this is part of a third party gem? If that’s the case, maybe they have some mechanism for handling larger files?


-- Greg Akins, http://twitter.com/akinsgre

What I do is create a csv file from the database. I had some memory problems there, but using Active Record's find_in_batches() seemed to solve that.
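For reference, a minimal sketch of that find_in_batches approach (assuming a hypothetical Record model and the csv_path from earlier):

    require 'csv'

    # find_in_batches pulls 1000 rows at a time instead of the whole table,
    # so memory use stays roughly flat while the CSV is written out.
    CSV.open(csv_path, 'w') do |csv|
      Record.find_in_batches(batch_size: 1000) do |batch|
        batch.each { |record| csv << record.attributes.values }
      end
    end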

The CSV file has 1.4 million rows. It gets created successfully. I then use the rubyzip gem to create a zip file that just contains that CSV file. I just used examples I found from Google searches on how to create the zip file, which are shown earlier in the thread. I looked at the class docs on the web for rubyzip and didn't see an obvious way to stream data into the zip file. Tomorrow I can look at perhaps some other way to create a zip file using a different gem or some such...

As I mentioned in my previous reply, and similar to the problem you had when creating the file: you're trying to load the whole thing.

There are two options for this:

A) You stream the contents of your CSV file, reading by chunks into a ZipStream

or

B) You zip the file from outside Ruby (shelling out to gzip for example)
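For option B, a minimal sketch using the zip CLI rather than gzip, since the goal here is a .zip archive (assumes the zip command is installed, and reuses the zip_path/csv_path names from earlier):

    # -j ("junk paths") stores the file without its directory components.
    success = system('zip', '-j', zip_path, csv_path)
    raise 'zip command failed' unless success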

> A) You stream the contents of your CSV file, reading by chunks into a ZipStream

That's exactly what I would like to do; I wasn't sure offhand whether the zip method would read it that way or how to pass the data in. I was hoping for an idea on how to do that.

The code where it all happens is here and the second line is where it crashes:

    zos.put_next_entry(File.basename(fpath))
    zos.print IO.read(fpath)

zos is an instance of Zip::ZipOutputStream. The print method is inherited from IOExtras::AbstractOutputStream.

According to the docs, print() looks like this:

    def print(*params)
      self << params.to_s << $\.to_s
    end

Since it does params.to_s, I'm guessing that is going to put it all into memory. The other methods may have similar problems.

However, the putc method looked interesting.

There is a putc() defined like this according to the docs:

    def putc(anObject)
      self << case anObject
              when Fixnum then anObject.chr
              when String then anObject
              else raise TypeError, "putc: Only Fixnum and String supported"
              end
      anObject
    end

So I tried that; here is my code, and the output follows. The file I was trying to zip in this test was another zip file. It came out a bit bigger than it should have, and when I tried to open it, I got an error saying it was corrupted.

This isn't quite the same CSV problem, but I am putting zip files into a zip file here.

    def zput(zos, fpath)
      p fpath
      zos.put_next_entry(File.basename(fpath))
      f = File.new(fpath)
      chunk_sz = 10000000
      while !f.eof?
        data = f.read(chunk_sz)
        zos.putc data
        puts 'read ' + data.size.to_s + ' bytes'
      end
    end

"web.war" read 10000000 bytes read 10000000 bytes read 8573823 bytes "data.war" read 10000000 bytes read 8655347 bytes "big.zip" read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 10000000 bytes read 3431079 bytes

I changed the putc above to a write in the above post, followed by a zos.print "" at the very end; it appears print() adds $\ to the file. The zip file inside the zip was short by two bytes, and I still get corrupted zip file errors on it.

It's late Friday and I am done for the day, but I just tried something else. It may be that I needed to open the file in binary mode and didn't. Initial tests seem to indicate that may be the case. Thanks for everyone's help.
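For completeness, a sketch of zput with those suspicions folded in: open the file in binary mode and append with << rather than putc/print, so no record separator ($\) or anything else extra gets written. (Untested here; based on the findings above.)

    def zput(zos, fpath)
      zos.put_next_entry(File.basename(fpath))
      File.open(fpath, 'rb') do |f|        # 'rb' = binary mode, no text translation
        while (data = f.read(16 * 1024))   # read returns nil at EOF
          zos << data                      # append raw bytes; nothing extra appended
        end
      end
    end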