How to improve code efficiency of my example

I was just at a job interview and was asked to write a function that computes the frequency of each word in a string.

I wrote something out on the board similar to my freq() method below. When I got home I coded this example up as an exercise.

At the moment your inner loop body looks like this:

map[tok] ||= 0
map[tok] += 1

which reads from the hash twice and writes to it once or twice depending on whether the value is already set.

You could do

existing = map[tok] || 0
map[tok] = existing + 1

which always does one hash read and one hash write. You could also use a hash with a default value:

map = Hash.new(0)

and then, in your loop, just write map[tok] += 1.
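As a minimal sketch of the default-value approach: the input string and the use of split are assumptions here, since the original freq() method is not shown.

```ruby
# Made-up input standing in for the original @data_str, which is not shown.
data_str = "the quick brown fox jumps over the lazy dog the end"

# Hash.new(0) returns 0 when a missing key is read, so no ||= guard is needed.
counts = Hash.new(0)

data_str.split.each do |tok|
  counts[tok] += 1
end

puts counts.inspect
```

Reading a missing key returns the default without storing it, so a lookup such as counts["unicorn"] does not add an entry to the hash.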

You could also argue that building up an array of words is wasteful. You're allocating a great big array of words, but you don't actually need the array, just one word at a time. You could instead do

@data_str.scan(/\w+/) do |token|
  map[token] += 1
end

In a quick benchmark, this was about 15% faster than the original on my machine.
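A comparison along these lines can be tried with the standard Benchmark module. This is only a sketch, not the code behind the figure above: freq_array is a guess at the original approach (which is not shown), and the input text and iteration count are made up. Timings will vary by machine and Ruby version.

```ruby
require "benchmark"

# Guess at the original approach: collect all words into an array, then count.
def freq_array(str)
  map = Hash.new(0)
  str.scan(/\w+/).each { |tok| map[tok] += 1 }
  map
end

# Block form: each token is yielded as it is matched, with no intermediate array.
def freq_scan(str)
  map = Hash.new(0)
  str.scan(/\w+/) { |token| map[token] += 1 }
  map
end

# Made-up input, repeated so the run takes a measurable amount of time.
text = "the quick brown fox jumps over the lazy dog " * 20_000

array_time = Benchmark.realtime { 5.times { freq_array(text) } }
scan_time  = Benchmark.realtime { 5.times { freq_scan(text) } }

puts format("array: %.3fs  scan with block: %.3fs", array_time, scan_time)
```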

Personally, I wouldn't consider parallelising this to be something that makes it more efficient, just something that makes it faster. Forking should work fine. As you say, you will have to combine the results, but that's not a problem. You might want to look at the map-reduce approach for more examples of this.
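A rough sketch of the fork-and-combine idea, assuming a platform where fork is available (so not Windows) and that the input has already been split into chunks at word boundaries. The names count_words and parallel_count are hypothetical helpers, not part of the original code.

```ruby
# Hypothetical helper: count the words in one chunk of text (the "map" step).
def count_words(text)
  counts = Hash.new(0)
  text.scan(/\w+/) { |token| counts[token] += 1 }
  counts
end

# Hypothetical helper: count each chunk in a child process, then merge.
def parallel_count(chunks)
  children = chunks.map do |chunk|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      Marshal.dump(count_words(chunk), writer)  # send the partial result to the parent
      writer.close
      exit!(0)                                  # skip at_exit handlers inherited from the parent
    end
    writer.close                                # the parent only reads from this pipe
    [pid, reader]
  end

  # The "reduce" step: add every partial count into one hash.
  children.each_with_object(Hash.new(0)) do |(pid, reader), total|
    partial = Marshal.load(reader)
    reader.close
    Process.wait(pid)
    partial.each { |word, n| total[word] += n }
  end
end

chunks = ["the cat sat", "on the mat", "the end"]  # made-up, pre-split input
totals = parallel_count(chunks)
puts totals.inspect
```

For small inputs, the cost of forking and serialising the partial hashes will likely outweigh any gain, so this only makes sense for large strings.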

Fred