Best way to query a large database?

Hi, looking for the best solution here. This is the situation: I have a large database (at least 100,000 records). I have to find all the records for a group of people from a certain date, and after that I have to query and make calculations on a set of those records for every member of that group. Looping through every member of the group and calculating would take very long, I'm afraid. I was wondering if it is possible to do the first query once, and then loop (query) through the resulting Array without going to the database again. That is, to run a series of queries on a (much smaller) Array. Is that possible? If so, how? Is that the best way? If it is not possible, what would be the fastest way to go? Or is there another way to look at this problem? Many thanks for your thoughts and suggestions.

One possibility would be to use CREATE TEMPORARY TABLE and create your smaller recordset that way.
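For example, a rough, untested sketch (the table and column names, ratings, ratee_id, points and created_on, are made up here; substitute your own schema):

    # copy the relevant slice into a temporary table, then run the
    # follow-up queries against that much smaller table
    conn = ActiveRecord::Base.connection
    conn.execute(<<-SQL)
      CREATE TEMPORARY TABLE recent_ratings AS
        SELECT * FROM ratings WHERE created_on >= '2006-01-01'
    SQL
    rows = conn.select_all(
      "SELECT ratee_id, AVG(points) AS avg_points
       FROM recent_ratings GROUP BY ratee_id")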

Another way would be to use SQL to do the calculations on the fly, if possible, using only the first query (a find_by_sql query).
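Something along these lines, assuming a Rating model over the same made-up ratings table as above (untested):

    # one round trip: the database does the aggregation itself
    averages = Rating.find_by_sql(<<-SQL)
      SELECT ratee_id, AVG(points) AS avg_points
      FROM ratings
      WHERE created_on >= '2006-01-01'
      GROUP BY ratee_id
    SQL
    averages.each { |r| puts "#{r.ratee_id}: #{r.avg_points}" }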

Yet another way could be to pull all the data out with joins/relations (assuming you have multiple tables here) and then loop through it in code, performing the calculations without going back to the database, since you can hopefully get all your data out with one query.
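A sketch of that approach, assuming a User model with a has_many :ratings_received association over a ratings table (all names invented, untested):

    cutoff = Date.new(2006, 1, 1)
    # eager loading pulls the users and their ratings out together,
    # so the loop below never goes back to the database
    users = User.find(:all,
                      :include    => :ratings_received,
                      :conditions => ["ratings.created_on >= ?", cutoff])

    users.each do |user|
      points = user.ratings_received.map { |r| r.points }
      # ... perform the calculations on points in plain Ruby ...
    end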

The third one would probably be my choice, but I don't know how the data is stored.

Hopefully that could get you started.

Fredrik

Are the latter calculations just an attempt to further filter the results, or are you calculating new data? If the former, you can probably just use more complicated conditions to find the data. If the latter, you can do the calculation on the database using a more complex find_by_sql statement, or in Rails by executing Ruby code on the result set. The latter will probably take longer if the system is small, but it will be much faster when you have multiple clients hitting the server to do that kind of operation.

-faisal

> I was wondering if it is possible to do the first query once, and then loop (query) through the resulting Array without going to the database again. That is, to run a series of queries on a (much smaller) Array. Is that possible? If so, how?

You may be able to pull in all the required objects through eagerly loaded associations.

> Is that the best way then?

For more specific suggestions you need to put more specifics into your question. It may even be sensible to offload everything you're trying to do to a single, probably complicated, SQL statement.

Michael

Let me try to describe the situation a bit more clearly: I have 20 users who have all given points to each other. Each 'answer/point' is stored in a record (= 20 x 20 = 400 records). What I need is to calculate the average of the points for every user, and list them. (Because I have maybe 100 users in total, who are giving points on a lot of occasions, the database will be growing rapidly.) What I'm looking for is the most stable (speed-wise) way to get the appropriate records and do the calculations. I'd prefer to avoid a situation where results would slowly start to take more time to show up as the database gets bigger. Thanks for any suggestions.

you can presumably do this with a join, nested select, and some sort of average (e.g. sum(points)/count(points), assuming that the points are stored in floats).
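e.g. something along these lines (untested; the table and column names are invented):

    # per-user averages with an explicit sum/count, as one join query
    rows = User.find_by_sql(<<-SQL)
      SELECT users.id, users.name,
             SUM(ratings.points) / COUNT(ratings.points) AS avg_points
      FROM users
      INNER JOIN ratings ON ratings.ratee_id = users.id
      GROUP BY users.id, users.name
    SQL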

that said, it's probably cleaner, easier, more portable, and faster* to select all the records you want and then use ruby to calculate the averages. sql will probably do a faster job of calculating the data for small data sets, but as the data grows and the number of clients grows it's going to be progressively slower. by putting it in ruby you move the progressive slowness onto the part of the system that can be easily replicated through the addition of cheap hardware.
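in ruby it could look roughly like this (untested; assumes ratings is the array from the first query and each record responds to ratee_id and points):

    # tally sum and count per recipient, then average in ruby
    totals = Hash.new { |h, k| h[k] = [0, 0] }   # ratee_id => [sum, count]
    ratings.each do |r|
      totals[r.ratee_id][0] += r.points
      totals[r.ratee_id][1] += 1
    end
    averages = {}
    totals.each { |id, (sum, count)| averages[id] = sum.to_f / count }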

grain of salt: i haven't tested this. i haven't even tried doing it. as with all estimations of performance, you should treat this as made up until you've profiled it.

-faisal

i take back my earlier claim that it would be easier to do this in ruby. also, if you do it this way in the db and your points are integers, postgresql (at least) will produce answers in floats, so there appears to be no drawback to doing it this way, at least at first.

-faisal

I would let the RDBMS do its job and do the calculation in an SQL statement. It should be faster (if you do have indices) and it reduces the traffic between your SQL machine and the webserver.
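For example, in a migration (again assuming a ratings table; adjust the names to your schema):

    class AddRatingIndexes < ActiveRecord::Migration
      def self.up
        # index the columns your WHERE and GROUP BY clauses hit
        add_index :ratings, :ratee_id
        add_index :ratings, :created_on
      end

      def self.down
        remove_index :ratings, :ratee_id
        remove_index :ratings, :created_on
      end
    end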