Insights on SQL, caching and a gem's performance issue

Hi there,
I'd like to hear ideas from different people about a performance issue I'm dealing with.
I contribute to a gem named impressionist. It basically saves an ‘impression’ of an end user.
Everything works fine; I’m refactoring the code right now, so if you check out the gem, don’t be scared.
A user has posted a performance issue on GitHub: they have 220,000 rows in their database (PostgreSQL), and impressionist is running very slowly.
The reason is that in an app's controller one can specify how impressions are saved, for example choosing to log only unique impressions.
Fair enough. However, to save a unique impression, impressionist has to look through all the records in the database using the following query:

(13684.8ms) SELECT DISTINCT COUNT(DISTINCT "impressions"."request_hash") FROM "impressions" WHERE "impressions"."impressionable_id" = $1 AND "impressions"."impressionable_type" = $2 [["impressionable_id", 60], ["impressionable_type", "Artist"]]

As you can see, it goes through the database counting the distinct request_hash values saved for that particular record.
That’s where it all started: this query alone takes about 13.7 seconds (13684.8 ms) before the gem moves on to anything else, which is obviously not acceptable in production.
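To make the cost concrete, here is a rough plain-Ruby picture of what the COUNT(DISTINCT ...) query above computes for one record. The rows are made-up hashes standing in for the impressions table, and `unique_impression_count` is a hypothetical helper, not part of impressionist. The filter visits every row; without an index on impressionable_type and impressionable_id, the database may likewise have to read the whole table, which would fit with the time growing alongside the 220,000 rows.

```ruby
# Plain-Ruby picture of what the COUNT(DISTINCT request_hash) query computes.
# Each Hash stands in for a row of the impressions table (made-up data).
# unique_impression_count is a hypothetical helper, not impressionist's API.
def unique_impression_count(rows, type, id)
  rows
    .select { |r| r[:impressionable_type] == type && r[:impressionable_id] == id } # visits every row
    .map { |r| r[:request_hash] }
    .uniq
    .size
end

rows = [
  { impressionable_type: "Artist", impressionable_id: 60, request_hash: "a1" },
  { impressionable_type: "Artist", impressionable_id: 60, request_hash: "a1" },
  { impressionable_type: "Artist", impressionable_id: 60, request_hash: "b2" },
  { impressionable_type: "Artist", impressionable_id: 61, request_hash: "c3" },
]

puts unique_impression_count(rows, "Artist", 60) # prints 2
```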

The solution I’ve got in mind is:
  • Use Ruby threads (the ThreadsWait standard library, written by Keiju ISHITSUKA) and a Mutex to prevent race conditions

  • Also use Memcache
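A minimal sketch of the threads idea, assuming the goal is to take the database write off the request path. `BackgroundImpressionWriter` and its methods are hypothetical names, not impressionist's API, and the persist block stands in for the real insert. Ruby's core `Queue` is already thread-safe, so a Mutex is only needed around other shared state, such as the `saved` array in the usage lines.

```ruby
# Hypothetical sketch: hand impressions to a background thread so the
# request does not wait on the database. Not impressionist's actual API.
class BackgroundImpressionWriter
  def initialize(&persist)
    @persist = persist   # block that would perform the real DB insert
    @queue   = Queue.new # thread-safe FIFO queue from Ruby core
    @worker  = Thread.new do
      # Process items in order until the :stop sentinel arrives.
      while (impression = @queue.pop) != :stop
        @persist.call(impression)
      end
    end
  end

  def enqueue(impression)
    @queue << impression
  end

  # Items enqueued before this call are processed first (FIFO), then the worker exits.
  def shutdown
    @queue << :stop
    @worker.join
  end
end

saved = []
lock  = Mutex.new # guards `saved`, which the worker writes and the main thread reads
writer = BackgroundImpressionWriter.new { |imp| lock.synchronize { saved << imp } }
writer.enqueue({ impressionable_type: "Artist", impressionable_id: 60, request_hash: "a1" })
writer.shutdown
puts saved.size # prints 1
```

This only hides the latency from the user: the slow query would still run, just on another thread, and would still load the database.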

Problems are:
  • With a cache, I’d have to tell impressionist to do an in-memory search of the cached results, and based on that it may or may not save an impression.

  • I don’t know if this is viable, because impressionist would have to go through all the cached records to check whether a matching record had already been saved.
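On the second concern: a cache does not necessarily have to be scanned if the uniqueness check is done by key. The sketch below uses an in-process `Set` as a stand-in for Memcache, keyed by [impressionable_type, impressionable_id, request_hash], so the check is a single hash lookup. `UniqueImpressionCache` and `first_visit?` are hypothetical names. With a real memcached server, the comparable operation would be its `add` command, which only stores a key that is not already present.

```ruby
require "set"

# Hypothetical in-process stand-in for a cache such as Memcache: a Set of
# [type, id, request_hash] keys. Membership is a hash lookup, not a scan.
class UniqueImpressionCache
  def initialize
    @seen = Set.new
    @lock = Mutex.new
  end

  # Returns true the first time a key is seen (the impression should be saved)
  # and false afterwards. The Mutex makes check-and-insert a single atomic step.
  def first_visit?(type, id, request_hash)
    key = [type, id, request_hash]
    @lock.synchronize { @seen.add?(key) ? true : false }
  end
end

cache = UniqueImpressionCache.new
p cache.first_visit?("Artist", 60, "a1") # prints true
p cache.first_visit?("Artist", 60, "a1") # prints false
p cache.first_visit?("Artist", 60, "b2") # prints true
```

Cached entries can expire or be evicted, so under this approach the database would presumably remain the source of truth, with a cache miss falling back to a query.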

Thanks ever so much

Have fun, ABC (Always Be Coding)