calculating running averages

I'm working on a community app that allows the residents of a village to log and monitor the traffic going past their houses.

There are multiple locations which have many log entries with a traffic count and a time block.

I want to let users view a location and see the average vehicles per day, as well as the average for a given month.

I'm thinking it won't be very efficient to run though every log entry and calculate a running average every time a user views the statistics page. I've never dealt with data in this manner before so I'm not sure the best practice. I suppose I could maintain a table with a record for each location. On creation or modification of a log entry I could calculate the average and store it there for viewing. Again, not sure if I'm heading the right direction.

The second part is calculating average for a give month, or indeed a year. I could present the appropriate year/month select box but how can I pull records that only relate to the given options?

I could probably pull every single record and compare the entry date to the given parameters, then act on it if it fits the time period. Once more I'm thinking this isn't the most efficient way to proceed.

Grateful for insight on this topic.

Thanks

Matt

I'm working on a community app that allows the residents of a village to log and monitor the traffic going past their houses.

There are multiple locations which have many log entries with a traffic count and a time block.

I want to let users view a location and see the average vehicles per day, as well as the average for a given month.

I'm thinking it won't be very efficient to run though every log entry and calculate a running average every time a user views the statistics page. I've never dealt with data in this manner before so I'm not sure the best practice. I suppose I could maintain a table with a record for each location. On creation or modification of a log entry I could calculate the average and store it there for viewing. Again, not sure if I'm heading the right direction.

Rails has several schemes for handling this sort of problem using caching. Have a good look at the Rails Guide on caching. It is quite complex so make sure you understand how the various methods work in order to work out which one is right for your particular problem. Come back and ask again if you are still not sure after understanding the guide.

The second part is calculating average for a give month, or indeed a year. I could present the appropriate year/month select box but how can I pull records that only relate to the given options?

I could probably pull every single record and compare the entry date to the given parameters, then act on it if it fits the time period. Once more I'm thinking this isn't the most efficient way to proceed.

Asssuming your records have a column called measured_at, for example, then you can query with comparison operators for these. So you can fetch records in a time range using something like the following in the query :conditions => ['measured_at >= ? and measured_at < ?', start_time, end_time] where start_time and end_time are derived from the values from the form.

Colin

If you want to have a 'quick' solution to the average problem why not do it like this:

a table like: id | number_of_days | total_cars

At the end of every day you add +1 to the number of days column and add the number of cars for that day. So as an example after two weeks it would look like this:

id | number_of_days | total_cars 1 | 14 | 43832

So if you wanted an average of the last 7 days, pull record 1 and then: 43832 / 14 * 7 = 21916

Its statistically not quite true because on some days you will inevitably have more traffic than on other days but its certainly the quickest solution without having to have a row for ever day. (if you have 3000 streets after 10 days you have 30000 rows...)

Of course you can also keep a record for every day and make a table like this so you dont always have to fetch all the records just for a quick and dirty solution.

Another idea is keeping a record for every day for every street like this:

id | street_id | datetime | cars | total_weekly_cars | 1 | 12 | 5-21-2011 (rapture) | 244 | 5444 |

where the total_weekly_cars are all the cars of that week accumulated. And then on Monday it resets to 0. This way you could keep a weekly average as well as a daily number of cars and only fetch 1 record for 7 days. or 4 records for 1 month. 48 records for 1 year. And it means you have accurate data for every day if you desire.

Just to give you an idea how I could approach this. (I'm not a professional programmer nor am I a mathematician. I'm 'just' and engineer :wink: )

Regards Stefano

Thanks to both for your input, I think I'll be able to implement a mostly efficient system using your ideas. I'll come back if something specific doesn't work :slight_smile:

Thanks

Matt