Attach pictures gets very slow

Hello folks, I want to add about 15,000 pictures, but after about 800 pictures it becomes very, very slow. Does anyone know what could be causing this and how to solve it? The source and destination are on the same SSD.

path = Dir["/home/username/pictures/M/**/*.jpg"]

counter = 1
path.each do |row|
  User.first.pictures.attach(io: File.open(row), filename: "%05d" % counter + ".jpg")
  counter += 1
end

What I suspect may be happening here is that the first thousand or so go as fast as they possibly can, but as the number of open files and uncollected objects grows, your computer struggles to keep track of all those pieces at once. Someone else here may be able to give you the tools to actually inspect this and confirm or refute my theory – I am entirely self-taught, but I have a lot of “machine empathy” and intuition guiding it.

You’re probably building up a huge amount of wasted memory or uncollected garbage because you are calling User.first inside your each loop. You only need that query once (and ideally you would run it outside the iterator). Further, you could use each_with_index to avoid incrementing the counter by hand, although I don’t imagine that is causing you any actual memory or garbage-collection issues.

Finally, you may be creating 15K tempfiles or open file references, and not releasing them until the outermost (implicit) block closes. Try passing a block to File.open inside your iterator, as that will definitely close each file before moving on to the next one.

What you’re aiming for with this many turns of the wheel is for each one to maybe be a bit slower than your fastest possible loop, but for each to take exactly the same amount of time, so the process doesn’t get slower as it goes. Perhaps this will work:

path = Dir["/home/username/pictures/M/**/*.jpg"]
@user = User.first
path.each_with_index do |row, idx|
  File.open(row) do |file|
    @user.pictures.attach(io: file, filename: "%05d" % (idx + 1) + ".jpg")
  end
end

Hope this helps,

Walter

Hello Walter, many thanks for your help.

Unfortunately there was no significant improvement. The process is still getting slower every minute.

I’m thinking about not using Active Storage, at least not for the first import. This would also work if I copied the pictures into the project and created a separate table for them.

2000 pictures ~ 50 minutes

Are you operating inside a transaction by chance?

I am experiencing the same problem as Tron0070, with exactly the same sort of each loop attaching a file per iteration. It gets slower and slower over time, even though the logs do not show an increase in any queries being made to the database. I am not running the loop inside a transaction. Tron0070, did you manage to get it solved? Does anyone have other ideas what might be causing it?

I solved the problem by not using Ruby anymore. There were too many problems and too few solutions.

Any particular reason to do it one-by-one? The attach signature seems to accept an array of blobs.

And if you look at the implementation, it does seem to concatenate the current blobs with whatever is being passed as attachables.
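A one-shot version along those lines could be sketched as follows. This is only a sketch: the glob path and zero-padded filenames follow the earlier posts, and the attach call itself is shown commented out because it needs a running Rails app with Active Storage.

```ruby
# Build all attachables up front, then hand them to attach in one call
# instead of 15,000 separate calls (sketch; assumes a Rails app).
paths = Dir["/home/username/pictures/M/**/*.jpg"]

attachables = paths.map.with_index(1) do |path, index|
  { io: File.open(path), filename: "%05d.jpg" % index }
end

# user.pictures.attach(*attachables)  # single call for the whole batch
```

If 15K open file handles is a concern, the same idea works with paths.each_slice(500) to attach in batches of a few hundred at a time.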

If I were to guess, the slowness must be coming from either accessing all the existing blobs or from reassigning blobs + attachables on every iteration.

It could be record.save as well, in case there is a callback that iterates over each blob after save. Or maybe the read at the end – record.public_send("#{name}").

Shouldn’t be hard to narrow it down by logging some timestamps between each iteration, and between every line of the attach method. That should show how long each piece of the work takes, and which piece requires more and more time to complete.
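A minimal per-iteration timer for the first part of that could look like this. timed_each is a hypothetical helper of my own, not part of Rails; it just prints how long each turn of the loop takes so you can see whether iterations really are getting slower.

```ruby
require "benchmark"

# Hypothetical helper: runs the given block once per item and logs the
# wall-clock time of each iteration to stderr.
def timed_each(items)
  items.each_with_index do |item, idx|
    elapsed = Benchmark.realtime { yield(item) }
    warn format("iteration %05d took %.3fs", idx + 1, elapsed)
  end
end

# In the import loop it would be used roughly like:
# timed_each(paths) do |path|
#   File.open(path) { |f| user.pictures.attach(io: f, filename: "...") }
# end
```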

Best way to attach that many images is to split each attachment into a job.

class AttachImageJob < ApplicationJob
  def perform(user, path, index)
    File.open(path) do |file|
      user.pictures.attach(io: file, filename: "%05d" % index + ".jpg")
    end
  end
end

user = User.first
Dir["/home/username/pictures/M/**/*.jpg"].each_with_index do |row, index|
  AttachImageJob.perform_later(user, row, index + 1)
end

I wonder if disabling the query cache might help here? Sometimes that can unexpectedly cause this kind of behavior if you don’t realize you’re using it and it keeps growing.

https://api.rubyonrails.org/classes/ActiveRecord/QueryCache/ClassMethods.html#method-i-uncached
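A sketch of how that could be wired up. with_query_cache_disabled is a hypothetical wrapper of my own: inside a Rails app it runs the block via ActiveRecord::Base.uncached, and elsewhere it simply yields, so the import loop itself stays unchanged.

```ruby
# Hypothetical wrapper: disables the ActiveRecord query cache when Rails
# is loaded, and simply yields otherwise.
def with_query_cache_disabled(&block)
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.uncached(&block)
  else
    block.call
  end
end

# The import from earlier in the thread would then run as:
# with_query_cache_disabled do
#   user = User.first
#   Dir["/home/username/pictures/M/**/*.jpg"].each_with_index do |path, idx|
#     File.open(path) { |f| user.pictures.attach(io: f, filename: "%05d.jpg" % (idx + 1)) }
#   end
# end
```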