Problem processing text file after uploading

I've got a web-app currently partially working. The user uploads a .txt, .docx or .doc file to the server.

Currently the model handles those files, saves some metadata (the extension and original filename), then saves the file to the hard drive. Next it converts the doc and docx files to plain text and saves the output to a txt file.

My problem is that I want to copy the plain-text contents of those txt files to the :body field in my database, but by the time those files are written no more changes can be sent to the database (because all the file handling is done in after_save).

Where or how do I sanely get the contents of those TXT files into the database?

See model attached:

Attachments: http://www.ruby-forum.com/attachment/7574/doc_file.rb

I built this feature in my first commercial Rails app. I used Paperclip for my file storage, which offers its own callback called 'after_post_process' that worked out perfectly for me.

First, I created a Paperclip processor to extract the text version of the uploaded file (mine were all PDF).

# /lib/paperclip_processors/text.rb

module Paperclip
  # Handles extracting plain text from PDF file attachments
  class Text < Processor

    attr_accessor :whiny

    # Creates a Text extract from PDF
    def make
      src = @file
      dst = Tempfile.new([@basename, 'txt'].compact.join("."))
      command = <<-end_command
        "#{ File.expand_path(src.path) }"
        "#{ File.expand_path(dst.path) }"
      end_command

      begin
        success = Paperclip.run("/usr/bin/pdftotext -nopgbrk", command.gsub(/\s+/, " "))
        Rails.logger.info "Processing #{src.path} to #{dst.path} in the text processor."
      rescue PaperclipCommandLineError
        raise PaperclipError, "There was an error processing the text for #{@basename}" if @whiny
      end
      dst
    end
  end
end

Then in my document.rb (model for the file attachment), I added the following bits:

  has_attached_file :pdf,:styles => { :text => { :fake => 'variable' } }, :processors => [:text]

  after_post_process :extract_text

  private

  def extract_text
    file = File.open("#{pdf.queued_for_write[:text].path}","r")
    plain_text = ""
    while (line = file.gets)
      plain_text << Iconv.conv('ASCII//IGNORE', 'UTF8', line)
    end
    self.plain_text = plain_text
  end

And that was that.

Walter

But...paperclip is OLD and unmaintained, and this is also a learning project.

So is there some (best practices) way to do the following things without having to make another pass over my doc_file or using paperclip:

1. upload .doc and store metadata
2. convert to plain text and write .txt to hard drive
3. grab contents of .txt file and store in database

Wouldn't the obvious answer be to do the file handling in before_save?

And is there a reason to write the text to a file in the first place if you're just going to save it in the DB?
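
A minimal sketch of the before_save idea, assuming the model has a `body` attribute as described in the original post. The names `BodyFromUpload`, `fill_body_from`, `convert_to_text` and `DocStandIn` are hypothetical, and a plain Struct stands in for the ActiveRecord model so the snippet runs without Rails. The .doc/.docx conversion is left as a placeholder because the thread does not show which converter is used.

```ruby
require "tempfile"

# Hypothetical mixin: copies the uploaded text into `body` before the record
# is saved, so no second write is needed after the save.
module BodyFromUpload
  # `path` points at the uploaded file; only .txt is read directly here.
  def fill_body_from(path)
    self.body =
      if File.extname(path).downcase == ".txt"
        File.read(path)
      else
        convert_to_text(path)
      end
  end

  # Placeholder for the .doc/.docx conversion the model already performs.
  def convert_to_text(path)
    raise NotImplementedError, "plug the existing converter in here for #{path}"
  end
end

# In the Rails model this might be wired up as (assumed names):
#   include BodyFromUpload
#   before_save :copy_text_to_body   # a method that calls fill_body_from(...)

# Stand-in for the model, used only to exercise the mixin outside Rails.
DocStandIn = Struct.new(:body) { include BodyFromUpload }

upload = Tempfile.new(["notes", ".txt"])
upload.write("hello from the upload")
upload.close

doc = DocStandIn.new
doc.fill_body_from(upload.path)
puts doc.body
upload.unlink
```

Because the attribute is assigned before the save, the text goes into the same INSERT/UPDATE as the metadata, and the intermediate .txt file becomes optional.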

Hassan Schroeder wrote in post #1067807:

Well, since it's a "learning project" maybe that would be a good place to start :)

Alternatively, you might consider pushing the doc-to-text conversion into a background job, which adds the text to the db record once it's finished. Or use an Observer to add the text after after_save.
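
The background-job idea can be sketched roughly as follows. This is only an illustration: a plain Ruby thread and queue stand in for a real job library, and `TextImportQueue` and `FakeDocFile` are hypothetical names, not part of Rails or of the poster's model.

```ruby
require "thread"
require "tempfile"

# Stand-in for a real job queue; a production app would use a job library.
class TextImportQueue
  def initialize
    @jobs = Queue.new
    @worker = Thread.new do
      # Process jobs until a nil sentinel is popped.
      while (job = @jobs.pop)
        record, txt_path = job
        record.body = File.read(txt_path) # copy the converted text
        record.save                       # second save, outside the callback chain
      end
    end
  end

  def enqueue(record, txt_path)
    @jobs << [record, txt_path]
  end

  # Pushes the sentinel and waits for the worker to drain the queue.
  def shutdown
    @jobs << nil
    @worker.join
  end
end

# Hypothetical stand-in for the DocFile model.
FakeDocFile = Struct.new(:body, :saved) do
  def save
    self.saved = true
  end
end

converted = Tempfile.new(["converted", ".txt"])
converted.write("plain text from the converter")
converted.close

doc = FakeDocFile.new
queue = TextImportQueue.new
queue.enqueue(doc, converted.path)
queue.shutdown
puts doc.body
converted.unlink
```

The point of the design is that the record is saved once with its metadata, and the text arrives in a later, separate save once the conversion has finished.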

Multiple possibilities...

With files it is often better just to store them in files and not in the database. Certainly they should not be stored in both file and database.

Colin

Hassan Schroeder wrote in post #1067812:

Have a look at the Rails Guide on debugging for techniques that can be used to debug your code. If you still can't work out what is going on then come back with the details of the section of code that is failing to do what you expect.

Colin

Start by defining exactly what "doesn't seem to function" means :)

Hassan Schroeder wrote in post #1067817:

You need to do some debugging to see what is going on. Is the save failing or is it not getting to the save statement for some reason? Having worked out which of those is happening then do more debugging to find out why.

Colin

OK, why not?

As Colin suggested, study the debugging guide (or just put logging statements in the code to see what's happening at each step).

I know you guys seem to be sticking to the RTFM hardline, but it seems as though debugging in the model has very few options without importing a bunch of gems.

Even on the page recommended there are 35 mentions of controller, and only 4 mentions of model.

I installed the debugger gem ('gem install debugger'), but it doesn't integrate at all with WEBrick ('rails s') and there apparently is no ruby-debug for 1.9.3 (ughh..)

I've put a bunch of logger.info in my model, but I now know no more than I did before.

When store_docfile is called before after_save, it never even gets to the first line containing the logger.info "we are now in store_docfile" message.

I have a feeling this might be something deeper than a tiny typo *shrug*

If one of you could PLEASE just look at my model and help me figure out what's up, it would be appreciated.

I don't see any obvious problems in your original file.

If not with after_save, how are you calling store_docfile now? You might want to post your new code for the model (and controller).

Hassan Schroeder wrote in post #1067836:

In your new example file, it's no surprise you're not seeing anything -- you're never calling `store_docfile` at all. (No, that random standalone `:store_docfile` doesn't do what you're hoping it does.)

Either invoke it from a before_save, or make it a non-private method (at least temporarily) and invoke it explicitly from your controller and see what happens.
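
To illustrate the point about the standalone `:store_docfile`, here is a small plain-Ruby sketch. The classes `NeverCalled` and `CalledBeforeSave` are hypothetical stand-ins for the model, and the explicit call inside `save` only imitates what registering `before_save :store_docfile` would do in Rails.

```ruby
# Hypothetical stand-ins for the model; only the calling pattern matters here.
class NeverCalled
  attr_reader :log

  def initialize
    @log = []
  end

  :store_docfile # a bare Symbol literal: evaluated, discarded, nothing is invoked

  def save
    true
  end

  private

  def store_docfile
    @log << "we are now in store_docfile"
  end
end

class CalledBeforeSave < NeverCalled
  def save
    store_docfile # stands in for `before_save :store_docfile`
    super
  end
end

a = NeverCalled.new
a.save
p a.log

b = CalledBeforeSave.new
b.save
p b.log
```

In the first class the log stays empty, which matches the symptom of never reaching the first logger line: the method is defined but nothing ever calls it.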

That is the clue then, but you are misinterpreting what you are seeing. If it is not getting to the first line then it is not in fact calling the method at all. Check out how you are calling it.

Colin

Perhaps you could start by “learning” how to decide whether a gem is unmaintained. For instance:

https://github.com/thoughtbot/paperclip/commits/master/

doesn’t exactly look like “no activity” to me…

–Matt Jones