We are thinking of different ways to handle hot data in our system. We ingest a lot of data that is highly relevant only for a limited period of time. So, for example, we have some data that we query on the basis of:
> counts over last 30 minutes
> data collected in the last 30 minutes
> items collected today
> items collected within the past week
...and so on.
We are looking at different strategies to keep this data updated and to manage scaling the database. So, we are looking at horizontal partitioning [splitting data into multiple child tables that inherit from a master table and carry CHECK constraints, so the planner can use constraint exclusion] and are also considering having tables that hold data as:
> All the data
> Data for the last 24 hours
> Data for the last 2 hours
That way, when we want something recent, we would just query the smallest, most recent table, but on the few occasions that we need more history, we go to the larger tables that are partitioned.
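To make the tiered-table idea concrete, here is a minimal, framework-free Ruby sketch of the routing logic we have in mind; the table names and window sizes are hypothetical, made up purely for illustration:

```ruby
# Hypothetical tiered tables, smallest first. Names and retention windows
# are illustrative only; real values would come from the actual schema.
HOT_TABLES = [
  { name: "data_store_last_2h",  covers_seconds: 2 * 3600 },
  { name: "data_store_last_24h", covers_seconds: 24 * 3600 },
  { name: "data_store_v2",       covers_seconds: Float::INFINITY }, # full history
].freeze

# Pick the smallest table whose retention window covers the requested range.
def table_for(window_seconds)
  HOT_TABLES.find { |t| t[:covers_seconds] >= window_seconds }[:name]
end
```

With this, a 30-minute query would hit data_store_last_2h, while a week-long query falls through to the full table.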
Just for reference, we are doing something like this:
select count(*), count(distinct group_id)
from (
  -- newest ~900k rows by id, as a cheap proxy for "roughly the last day"
  select group_id, recorded_on
  from data_store_v2
  order by id desc
  limit 900000
) td
where recorded_on >= now() - '24 hr'::interval;
We are getting around 800,000 data items a day right now, and the above query takes around 14 seconds on a single unpartitioned table of around 268 million records. The table becomes slightly slower every week, since we add close to 6 million records weekly (and the LIMIT 900000 heuristic will stop covering a full day once daily volume passes 900,000 rows).
I've read this: https://www.amberbit.com/blog/2014/2/4/postgresql-awesomeness-for-rails-developers/ and am looking for ways to manage all of this from within Rails, if possible.
So, the questions (and thanks for reading this far) are:
> What is a good way to do this while still working within ActiveRecord?
> This is a capability that we'd like to attach to any model that might need it. What would be a good way to approach a gem for it?
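As a sketch of what the gem's surface might look like, here is a framework-free mixin; in a real Rails app this would be an ActiveSupport::Concern defining scopes via `where`, and all the names here (HotData, recent_condition, recorded_on) are assumptions, not an existing API:

```ruby
# Hypothetical mixin a gem could provide. Framework-free for illustration;
# in Rails this would be an ActiveSupport::Concern adding real scopes.
module HotData
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Build the SQL condition a `recent` scope would use, e.g. a
    # recent(7200) scope selecting rows from the last two hours.
    def recent_condition(seconds)
      "recorded_on >= now() - interval '#{seconds.to_i} seconds'"
    end
  end
end

class DataStore
  include HotData
end
```

Any model could then `include HotData` to pick up the behavior, which is the attach-to-any-model shape we're after.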
I'm sure there will be a few more questions as we progress on this.