At the moment your inner loop body looks like this:
map[tok] ||= 0
map[tok] += 1
which reads from the hash twice and writes to it once or twice depending on whether the value is already set.
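If you want to check that count yourself, here is a small sketch. `CountingHash` is a hypothetical Hash subclass I made up for the illustration. It counts every call to `[]` (read) and `[]=` (write) while running your original two lines over a tiny token list.

```ruby
# Hypothetical instrumented Hash: counts calls to [] (reads) and []= (writes)
# so we can see what `||=` followed by `+=` actually does per token.
class CountingHash < Hash
  attr_reader :reads, :writes

  def initialize(*args)
    super
    @reads = 0
    @writes = 0
  end

  def [](key)
    @reads += 1
    super
  end

  def []=(key, value)
    @writes += 1
    super
  end
end

map = CountingHash.new
%w[a b a].each do |tok|
  map[tok] ||= 0 # one read; a write only when the value is still nil
  map[tok] += 1  # one read, then one write
end

puts "reads=#{map.reads} writes=#{map.writes}" # prints "reads=6 writes=5"
```

The first "a" and the "b" each cost two reads and two writes. The second "a" skips the `||=` write because the value is already set.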
You could do
existing = map[tok] || 0
map[tok] = existing + 1
which always does one hash read and one hash write. You could also use a hash with a default value:
map = Hash.new(0)
Then, in your loop, just do map[tok] += 1
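Here are both alternatives written out as complete counting loops, as a rough sketch. The method names `count_with_fallback` and `count_with_default` are hypothetical and only used for the illustration. One detail of `Hash.new(0)` is worth knowing: reading a missing key returns the default, but it does not store that key in the hash.

```ruby
# Explicit fallback: one read and one write per token.
def count_with_fallback(tokens)
  map = {}
  tokens.each do |tok|
    existing = map[tok] || 0
    map[tok] = existing + 1
  end
  map
end

# Default value: missing keys read as 0, so `+=` works directly.
def count_with_default(tokens)
  map = Hash.new(0)
  tokens.each { |tok| map[tok] += 1 }
  map
end

tokens = %w[the cat saw the dog]
fallback = count_with_fallback(tokens)
default = count_with_default(tokens)
puts fallback == default # prints "true"

# Reading an unknown key gives the default but does not add the key.
puts default["fish"]        # prints "0"
puts default.key?("fish")   # prints "false"
```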
You could also argue that building up an array of words is wasteful: you’re allocating a great big array of words, but you don’t actually need the array, just one word at a time. You could instead do
@data_str.scan(/\w+/) do |token|
map[token] += 1
end
In a quick benchmark on my machine, this is about 15% faster than the original.
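If you want to run that comparison yourself, here is a rough timing harness. It is a sketch that assumes the original collected every token into an array with `scan(/\w+/)` and no block, then counted with `||=`. The method names, sample text, and size are all hypothetical. It uses `Process.clock_gettime` so it needs no extra libraries. Your numbers will depend on your Ruby version and machine.

```ruby
# Assumed shape of the original: build the whole array of words first.
def count_via_array(data_str)
  map = {}
  data_str.scan(/\w+/).each do |tok|
    map[tok] ||= 0
    map[tok] += 1
  end
  map
end

# Block form of scan: each match is yielded as it is found, no word array.
def count_via_block(data_str)
  map = Hash.new(0)
  data_str.scan(/\w+/) { |token| map[token] += 1 }
  map
end

# Wall-clock seconds taken by the given block (monotonic clock).
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

text = "the quick brown fox jumps over the lazy dog\n" * 50_000
raise "results differ" unless count_via_array(text) == count_via_block(text)

%i[count_via_array count_via_block].each do |name|
  seconds = time_it { send(name, text) }
  puts format("%-16s %.3fs", name, seconds)
end
```

For a fair comparison, run each version a few times and look at the spread rather than a single run.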
Personally, I wouldn’t consider parallelising this something that makes it more efficient, just something that makes it faster. Forking should work fine. As you say, you will have to combine the results, but that’s not a problem. You might want to look at the map-reduce approach for more examples of this.
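Here is a minimal sketch of the fork-and-combine idea, under some assumptions. It needs a Unix-like OS, because `fork` is not available on Windows. It splits the input by lines so no word is cut in half. Each child sends its counts back through a pipe with `Marshal`. `parallel_word_counts`, `count_words`, and the `workers:` parameter are hypothetical names for the illustration. The merge loop is the "reduce" step, summing counts from each partial hash.

```ruby
# Map step: count words in one chunk of text.
def count_words(text)
  counts = Hash.new(0)
  text.scan(/\w+/) { |token| counts[token] += 1 }
  counts
end

# Hypothetical helper: fork one child per slice of lines, then merge results.
def parallel_word_counts(text, workers: 4)
  lines = text.lines
  slice_size = [(lines.size.to_f / workers).ceil, 1].max

  readers = lines.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    fork do
      reader.close
      # Child: count its slice and send the hash back to the parent.
      writer.write(Marshal.dump(count_words(slice.join)))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # parent keeps only the read end
    reader
  end

  # Reduce step: read each child's partial counts and sum them.
  totals = Hash.new(0)
  readers.each do |reader|
    partial = Marshal.load(reader.read)
    reader.close
    partial.each { |word, n| totals[word] += n }
  end
  Process.waitall # reap the finished children
  totals
end

sample = "the cat sat\non the mat\nthe end\n"
puts parallel_word_counts(sample, workers: 2) == count_words(sample) # prints "true"
```

Forking has a per-process cost, so on small inputs this will probably be slower than the single-process loop. It only pays off once each chunk has enough work.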
Fred