way to divide long article and store in database

I wonder if a Ruby on Rails developer has encountered this before: suppose there is a long article (say 100,000 words), and I need to write a Ruby file to display page 1, page 2, or page 38 of the article, via

display.html.erb?page=38

but the number of words per page can change over time (for example, it is 500 words per page right now, but next month we might easily change it to 300 words per page). What is a good way to divide the long article and store it in the database?

P.S. The design may be complicated if we want to display 500 words but include only whole paragraphs. That is, if we are already at word 480 and the current paragraph has 100 more words remaining, show those 100 words anyway even though that exceeds the 500-word limit.
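The paragraph-rule in the P.S. could be sketched in plain Ruby like this. The method name and the 500-word default are illustrative, not from any existing API: each page keeps absorbing whole paragraphs until it reaches or passes the word budget.

```ruby
# Paragraph-aware pagination sketch: never split a paragraph across
# pages; a page may run over the word budget to finish its last paragraph.
def paginate_paragraphs(text, words_per_page = 500)
  pages = []
  current = []       # paragraphs collected for the page being built
  count = 0          # words on the page being built

  text.split(/\n{2,}/).each do |para|
    current << para
    count += para.split.size
    if count >= words_per_page     # budget met: close out this page
      pages << current.join("\n\n")
      current = []
      count = 0
    end
  end
  pages << current.join("\n\n") unless current.empty?
  pages
end
```

With four 300-word paragraphs and a 500-word budget, this yields two pages of two paragraphs (600 words) each, matching the "finish the paragraph anyway" rule.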

Make each page a text file, put them all in a directory (document/1.txt, document/2.txt, etc), and then you won't even have to use the database.
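A minimal sketch of this file-per-page idea: the action just reads document/<page>.txt from disk. The helper name, the relative path, and the Integer() guard against non-numeric params are assumptions, not from the post.

```ruby
# Read one pre-split page from disk; returns nil for a missing page.
def page_text(page)
  n = Integer(page)                         # reject non-numeric page params
  path = File.join("document", "#{n}.txt")
  File.exist?(path) ? File.read(path) : nil
end
```

Changing from 500 to 300 words per page then means regenerating the files once; the reading code stays the same.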

Jian Lin wrote:

I wonder if a Ruby on Rails developer has encountered this before: suppose there is a long article (say 100,000 words), and I need to write a Ruby file to display page 1, page 2, or page 38 of the article, via

display.html.erb?page=38

but the number of words per page can change over time (for example, it is 500 words per page right now, but next month we might easily change it to 300 words per page

Why divide it in the database? Store it in one field in the database, and when you fetch it from the database just perform the logic to find page 38 and then display that.
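This store-it-whole approach might look like the sketch below in plain Ruby (the constant and method name are illustrative). Changing from 500 to 300 words per page is then a one-line change, since nothing in the database encodes the page size.

```ruby
WORDS_PER_PAGE = 500  # change to 300 later without touching stored data

# Slice the requested page out of a text stored whole in one field.
def page_of(text, page)
  words = text.split
  start = (page - 1) * WORDS_PER_PAGE
  return "" if start >= words.size       # past the end of the article
  words[start, WORDS_PER_PAGE].join(" ")
end
```

For a request like display.html.erb?page=38, the action would fetch the article row and call page_of(article_text, 38) before rendering.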

If actual testing indicates that's too slow with the quantity of data you expect, then you'd have to perform a word-boundary calculation when inserting the value into the db, and store the results as an 'index' into the text somehow.
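One hedged sketch of such an 'index': at insert time, record the character offset at which each page starts, so a later read can fetch just that slice (for example with a server-side substring function) instead of the whole column. The method name is made up; note the final offset falls at the end of the text if the word count is an exact multiple of the page size.

```ruby
# Precompute character offsets of page starts for a fixed page size.
# These offsets would be stored alongside the article at insert time.
def page_offsets(text, words_per_page = 500)
  offsets = [0]
  count = 0
  text.scan(/\S+/) do
    count += 1
    # Each page boundary starts right after the last word of a full page.
    offsets << Regexp.last_match.end(0) if count % words_per_page == 0
  end
  offsets
end
```

Recomputing the offsets once is also all it takes when the page size changes from 500 to 300.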

Either way, I don't see any reason to actually split up the text in the db. Unless you want to let the user _search_ for, say, word X on page N of the text. But then you're getting into complicated enough text-searching land that I'd investigate using something like lucene/solr to index your text instead of an rdbms, and see what support something like lucene/solr has for page-boundary-based searching.

Jonathan Rochkind wrote:

Jian Lin wrote:

I wonder if a Ruby on Rails developer has encountered this before: suppose there is a long article (say 100,000 words), and I need to write a Ruby file to display page 1, page 2, or page 38 of the article, via

display.html.erb?page=38

but the number of words per page can change over time (for example, it is 500 words per page right now, but next month we might easily change it to 300 words per page

Why divide it in the database? Store it in one field in the database, and when you fetch it from the database just perform the logic to find page 38 and then display that.

Is it true that if all 100,000 words are in one record (one row), then the whole field needs to be retrieved every time? If we assume one word is about 6 characters long (including the space), then that is 600 kbytes per read. I hope to make it "read as needed": 500 words and about a 3 kbyte read per page each time.

If you *must* split it up in the database, changing your mind from 500 to 300 is going to suck; otherwise you might use a "pages" association or something of the like, which would be very simple...

for instance:

class Article < ActiveRecord::Base
  has_many :pages

  validates_presence_of :text

  after_create do
    i = 0
    b = text.scan(/\b\S+\b/)
    b.each_slice(500) do |x|
      self.pages.create(:page => i += 1, :text => x.join(" "))
    end
  end
end

class Page < ActiveRecord::Base
  belongs_to :article
end

Someone probably has a MUCH prettier way of doing this; it was just written on the fly...

Cheers!