Help with AR design on a large (~100'sM) partition table

RobR · March 24, 2009, 2:32pm

First, I'm a physicist, not a computer scientist. I need to store 65M rows now, with a growth rate of ~21M/year whose entries arrive daily.

The purpose of this DB is to store metadata for environmental observations for the ionosphere. Basically it's a glorified file catalogue. I need the usual: fast enough inserts, reasonable recovery, REST like access, etc. I have access to Ruby AR/AS 2.1.1, Ruby 1.8.5 and eventually MySQL 5.1.x.

I'm trying to trade some storage space and lack of normalization for simplicity of design and ease of maintenance by me. Basically, I want to partition one very large table. My PK is essentially across 3 columns: filename, md5 checksum, destination. I'd like to partition by year which is discernible by the filename and by obs_time if I can but obs_time is not currently part of the PK. If that happens, then I would have up to 40M rows in a single table though most tables would have a few million; perhaps that is too many; perhaps a table for each half year would work.

Currently, I have no foreign keys but I may want them one day.

Any help and suggestions would be really appreciated!

My schema follows. I have created a simple model off this table and I'm using these decorators: validation, and before_save and after_save to build an inventory summary (in another table not shown). I've tried this on a non-partitioned table so far awaiting the installation of MySQL 5.1.x by our IT department.

How do I go about integrating AR with a partitioned table? Can you advise a simple pattern I might follow for such a large but simple relationship?

create table `ionofiles` (

-- TODO optimize variable sizes `id` int(15) NOT NULL auto_increment PRIMARY KEY, `file` varchar(100) NOT NULL, -- Any valid filename without the ".~N~" backup extensions. `md5` varchar(100) NOT NULL, -- MD5 Sum `destination` varchar(20) NOT NULL, -- E.g. MIDS, ADIC, AFWA, SWPC, ...

`version` varchar(20) default NULL, -- Optional version.

`ursi` char(5) default NULL, -- E.g. BC840, WI937. default NULL because we allow any filename.

`time_scope` varchar(20) default NULL, -- E.g. (individual, daily, monthly, yearly, etc). default NULL

-- because we allow non time-scoped files.

`content_type` varchar(20) default NULL, -- E.g. (root, scaled, image, ionogram)

`obs_time1` datetime default NULL, -- Start of measurement. Default NULL because we allow

-- non data files. `obs_time2` datetime default NULL, -- End of measurement, in the case file represents

-- multiple measurements. Default NULL because we allow

-- non data files.

`extension` varchar(5) default NULL, -- file extension (any case). E.g. SAO, MMM, 16C, TXT, png.

`size` int(11) NOT NULL, -- file size in bytes

`backup` int(2) default NULL, -- N of the "~N~" tilda backup file.

`is_ready` boolean NOT NULL default 0, -- Is this file's links ready? 0 = No, 1 = Yes. `is_public` boolean NOT NULL default 1, -- Is this file publicly available? 0 = Public, 1 = Private. `is_removed` boolean NOT NULL default 0, -- Has this file been marked deleted?, -- s.t. we don't ever want to see it again? -- This is useful in the event that we want to reject future submissions of file.

`file_on_disk` varchar(100) NOT NULL, -- Actual file name on disk: E.g. BC840_2009001000000.SAO.~10~ `path` varchar(200) NOT NULL, -- A directory or a full path to a ZIP or TAR or other aggregate file. -- E.g. data/BC840/individual/2009/001/scaled/ -- E.g. data/BC840/individual/2009/BC840_2009001_2009-01-01.tar.gz

`last_verified` datetime default NULL, -- last time we checked that this file actually exists at stated destination

-- TODO: ActiveRecord fields. What should be the data type? Timestamp or Datetime? -- The auto set and update features are currently handled by AR. -- Should we put some ON UPDATE, etc actions here for non AR interaction? `created_on` datetime NOT NULL, -- ActiveRecord auto set. `updated_on` datetime NOT NULL, -- ActiveRecord auto updated.

`note` TEXT default NULL, -- User note

UNIQUE KEY `key_file_md5_destination_public` ( `file`, `md5`, `destination`, `is_public` ),

KEY `key_file` ( `file` ), KEY `key_md5` ( `md5` ), KEY `key_destination` ( `destination` ), KEY `key_version` ( `version` ), KEY `key_ursi` ( `ursi` ), KEY `key_time_scope` ( `time_scope` ), KEY `key_content_type` ( `content_type` ), KEY `key_obs_time1` ( `obs_time1` ), KEY `key_obs_time2` ( `obs_time2` ), KEY `key_extension` ( `extension` ), KEY `key_size` ( `size` ), KEY `key_backup` ( `backup` ), KEY `key_is_ready` ( `is_ready` ), KEY `key_is_public` ( `is_public` ), KEY `key_is_removed` ( `is_removed` ), KEY `key_file_on_disk` ( `file_on_disk` ), KEY `key_last_verified` ( `last_verified` ), KEY `key_created_on` ( `created_on` ), KEY `key_updated_on` ( `updated_on` )

) ENGINE=InnoDB;

R

RobR · March 24, 2009, 7:49pm

Thank you. That's a handy suggestion.

bitsweat · March 24, 2009, 9:43pm

MySQL can deal with this out-of-the-box without breaking a sweat. I wouldn't worry about partitioning until you actually hit a wall. I think Moore's law is on your team for this project.

(Note: you probably don't want indexes on every single column individually. Add compound indexes tuned to your actual queries.)

Best, jeremy

RobR · March 27, 2009, 4:57pm

Jeremy,

I did remove many indices as they aren't needed for the most likely use cases. Thank you for the sage suggestion. I saw an incredible improvement in insert speed (1.5x) and reduction in disk space usage (0.6x).

I have to say I think I'm already hitting the wall with this simplistic approach for these reasons: 1) I see performance degradation on indexed searches at 90 million records in a much more slimmed down version of the table, i.e. only 3 columns, with 1 PK and 3 individual indexes. 2) It takes ~24 hours to rebuild a MyIsam version of the 90 million record table + indices via a mysqldump load; especially it's PK. The entire table is inaccessible during this PK rebuild since it ensures consistency. This happens ~1/year. 3) With a 5 year forecast in mind, I expect > 200million relationships or rows needed and hence a significant design goal. With the originally posted schema and most indices removed, I'm still looking at ~ 250 GB of physical disk space needed. Our DBs run using an attached storage environment with many small heavily redundant disks. I don't know much more about that except that 70 GB is the elemental partition size for DB's. So my design will need to be partition flexible in the near term and hence one reason why I looked toward Partition Tables.

Onward, I did realize finally that 95% of my queries inherently know a pretty good key a priori, i.e. the timestamp of the measurement is embedded in the file name. Most users will be able to specify the time range of interest, etc.

So, I've taken a stab at partitioning on month. MySQL has a limitation of 1024 partitions per table and I needed to represent data from 1940. I've designed this set of partitions on month boundaries from 1940 to 2020. I assume adding more partitions later is easy but since I didn't know how I went ahead and partitioned up to the year 2020. A better solution would probably use a histogram of my data and add more partitions during time periods of higher data density. The problem is that I do not yet know this histogram... though it would be easy to ascertain.

My present schema follows. Any comments are highly welcome! I'm currently bulk testing and see these numbers: 1) ~120 inserts + 120 selects per second which is pretty good. I have 2 primary problems to resolve. 1) MySQL doesn't have "global" indices so queries on Key that is not a PK, require a scan across the indices of all ~800 tables which takes ~1 second. 2) I need to wittle the forecasted ~280 GB down to ~200 GB.

create table `ionofiles` (

-- TODO optimize variable sizes `id` int(15) NOT NULL auto_increment, -- Any valid filename without the ".~N~" backup extensions. `file` varchar(100) NOT NULL, -- MD5 Sum `md5` varchar(100) NOT NULL, -- E.g. MIDS, ADIC, AFWA, SWPC, ... `destination` varchar(20) NOT NULL,

-- Optional version. `version` varchar(20) default NULL,

-- E.g. BC840, WI937. default NULL because we allow any filename. `ursi` char(5) default NULL,

-- E.g. (individual, daily, monthly, yearly, etc). default NULL because we allow non time-scoped files. `time_scope` varchar(20) default NULL,

-- E.g. (root, scaled, image, ionogram) `content_type` varchar(20) default NULL,

-- Start of measurement. Not NULL because we partition on this. `obs_time` datetime NOT NULL,

-- file extension (any case). E.g. SAO, MMM, 16C, TXT, png. `extension` varchar(5) default NULL,

-- file size in bytes `size` int(11) NOT NULL,

-- N of the "~N~" tilda backup file. `backup` int(2) default 0,

-- Is this file's links ready? 0 = No, 1 = Yes. `is_ready` boolean NOT NULL default 0,

-- Is this file publicly available? 0 = Public, 1 = Private. `is_public` boolean NOT NULL default 1,

-- Has this file been marked deleted?, `is_removed` boolean NOT NULL default 0,

-- Actual file name on disk: E.g. BC840_2009001000000.SAO.~10~ `file_on_disk` varchar(100) NOT NULL, -- A directory or a full path to a ZIP or TAR or other aggregate file. `path` varchar(200) default NULL, -- E.g. data/BC840/individual/2009/001/scaled/ -- E.g. data/BC840/individual/2009/BC840_2009001_2009-01-01.tar.gz

-- last time we checked that this file actually exists at stated destination `last_verified` datetime default NULL,

-- TODO: ActiveRecord fields. What should be the data type? Timestamp or Datetime? -- The auto set and update features are currently handled by AR. Should we put some ON UPDATE, etc actions here for non AR interaction? `created_on` datetime NOT NULL, -- ActiveRecord auto set. `updated_on` datetime NOT NULL, -- ActiveRecord auto updated.

`note` TEXT default NULL, -- User note

-- TODO: reorder the key constituents to reflect best sub selection capability -- Rows are unique by this PRIMARY KEY `key_destination_is_public_obs_time_file_md5` ( `destination`, `is_public`, `obs_time`, `file`, `md5` ),

-- TODO: Is there any value in reordering these keys? Does order matter? KEY `key_obs_time` ( `obs_time` ), KEY `key_id` ( `id` ), KEY `key_ursi` ( `ursi` ), KEY `key_extension` ( `extension` ), KEY `key_size` ( `size` ), KEY `key_updated_on` ( `updated_on` )

) ENGINE=InnoDB PARTITION BY RANGE ( TO_DAYS( obs_time ) ) ( PARTITION p195001 VALUES LESS THAN (TO_DAYS ( "1950-01-01" )),PARTITION p195002 VALUES LESS THAN (TO_DAYS ( "1950-02-01" )),PARTITION p195003 VALUES LESS THAN (TO_DAYS ( "1950-03-01" )),PARTITION p195004 VALUES LESS THAN (TO_DAYS ( "1950-04-01" )),PARTITION p195005 VALUES LESS THAN (TO_DAYS ( "1950-05-01" )),PARTITION p195006 VALUES LESS THAN (TO_DAYS ( "1950-06-01" )),PARTITION p195007 VALUES LESS THAN (TO_DAYS ( "1950-07-01" )),PARTITION p195008 VALUES LESS THAN (TO_DAYS ( "1950-08-01" )),PARTITION p195009 VALUES LESS THAN (TO_DAYS ( "1950-09-01" )),PARTITION p195010 VALUES LESS THAN (TO_DAYS ( "1950-10-01" )),PARTITION p195011 VALUES LESS THAN (TO_DAYS ( "1950-11-01" )),PARTITION p195012 VALUES LESS THAN (TO_DAYS ( "1950-12-01" )), ... -- up to year 2020 PARTITION p999999 VALUES LESS THAN MAXVALUE );

Topic		Replies	Views
Compatible Packages rubyonrails-talk	2	113	March 24, 2009
Strategy for dealing with large db tables rubyonrails-talk	1	279	November 27, 2008
Best way to handle multiple tables to replace one big table? rubyonrails-talk	15	1315	April 30, 2010
Speeding up ActiveRecord Querying on very large joins? rubyonrails-talk	8	348	August 10, 2009
How I've Monkeypatched AR rubyonrails-core	23	281	March 17, 2007

Help with AR design on a large (~100'sM) partition table

Related topics

More Resources