> Matt Jones wrote:
>> Maybe not directly relevant, but have you considered that you may have
>> normalized your database into nonfunctionality? From the queries
>> you've shown, most of the tables have one data field(?!), a date, and
>> a foreign key field for the team. This would be an appropriate
>> structure for describing a system where each type of statistic might
>> be generated at different times, but it seems completely inappropriate
>> for your application where all the stats are (obviously) generated
>> simultaneously.
>> Why did you decide to go this way, vs a single 'stats' table for
>> everything?
>> --Matt Jones
> Hi Matt,
> There are approx. 37 different categories of statistics for NCAA
> football. Each category houses up to 14 columns. I would never attempt
> to build one table that housed 518 fields.
Why not? Even MySQL's fairly restrictive limits are more than 5 times
that (see http://dev.mysql.com/doc/refman/5.0/en/column-count-limit.html).
The general argument against that is that tables with that many
columns aren't correctly separating concerns. (see this post:
http://forums.digitalpoint.com/showpost.php?s=c6f4b98064317f2ad75a532f0e1268ee&p=8255322&postcount=11
for a prime example of doing it wrong)
However, in this case, a single event (a game) generates all 518
values.
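For concreteness, a single-table layout could look something like this -
a rough sketch, all names hypothetical, with the 518 stat columns elided:

  CREATE TABLE stats (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT,
    team_id       INT UNSIGNED NOT NULL,
    week          TINYINT UNSIGNED NOT NULL,
    compiled_on   DATE NOT NULL,
    rushing_yards SMALLINT,  -- hypothetical stat column
    passing_yards SMALLINT,  -- hypothetical stat column
    -- ...the remaining ~516 stat columns...
    PRIMARY KEY (id),
    UNIQUE KEY by_team_week (team_id, week),
    FOREIGN KEY (team_id) REFERENCES teams (id)
  ) ENGINE=InnoDB;

One row per team per week, and every category lands in the same place.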
> All of the stats are not generated simultaneously. They are compiled
> in separate weeks. Each statistics table houses 120 rows of data,
> paired to the given week it was compiled. So, if there are, say, 17
> weeks (counting bowls) in college football, then there will be 17
> weeks' worth of data (17 x 120 rows).
Nothing controversial there.
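Week-by-week compilation drops straight into that layout, too - e.g.
(made-up team id and numbers):

  -- load one team's compiled stats for week 9, all categories at once
  INSERT INTO stats (team_id, week, compiled_on,
                     rushing_yards, passing_yards /* , ... */)
  VALUES (42, 9, '2009-10-31', 187, 243 /* , ... */);

The UNIQUE KEY on (team_id, week) also keeps a botched re-run from
double-loading a week.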
> Yes, I could simply place all of that into one giant table (one stats
> table with 518 columns that contains 2040 rows per year). That's not
> going to happen. It's inefficient and difficult to manage. If the
> table gets corrupted, which is very likely in that scenario, it kills
> my entire site at once. You also have to account for 3 x /tmp space
> for the largest table you have in your database. Given the nature of
> this, and the fact that these stats will not be deleted for years,
> that can hit a server pretty hard.
Some quick math:
518 columns x 8 bytes per column (could be less, but assume there's
some overhead) -> 4144 bytes per row.
MySQL's limit is just shy of 65k, so no issue there.
2040 rows x 4144 bytes -> roughly 8.1 MB.
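If you'd rather not trust my mental math, MySQL will check it for you:

  SELECT 518 * 8                  AS bytes_per_row,
         2040 * 518 * 8           AS bytes_per_year,
         2040 * 518 * 8 / 1048576 AS mb_per_year;
  -- 4144, 8453760, 8.0625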
Given that there are live applications that manage hundreds if not
thousands of GBs of data in MySQL, I think this is an exceptionally
premature optimization, unless you plan on running the server
unaltered for the next 500 or so years. 3x /tmp space is similarly
beneath consideration, unless you're serving from an embedded
microcontroller or something.
Worrying about "table corruption" makes even less sense - if ANY data
gets corrupted, it's time to restore from backup.
Finally, I'll point out that while a combined row is way under the
limit, the mondo-JOIN query you're headed for is getting close to
another limit - the maximum 61 tables allowed in a join clause.
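For reference, here's roughly what that query has to look like against
the current schema (table and column names are guesses based on your
description):

  SELECT r.yards AS rushing_yards,
         p.yards AS passing_yards
    FROM rushing_stats r
    JOIN passing_stats p
      ON p.team_id = r.team_id AND p.stat_date = r.stat_date
    -- ...35 more JOINs, one per remaining category...
   WHERE r.stat_date = '2009-10-31';

With a single stats table, the same data is one row from one table.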
--Matt Jones