We recently built a tool to extract constraints from rails applications, and then do SQL query rewrite/optimization. We extract constraints by analyzing application source code. For example, validate_uniqueness_of :unique_column implies a unique constraint on the unique_column. Then we can use the uniqueness of the column to rewrite some queries. I list two new types of query rewrites using constraints here
- Remove DISTINCT
A simple example is SELECT DISTINCT(name) FROM R. If there is a unique constraint on the name column. The DISTINCT keyword can be removed safely. Query plans without the DISTINCT keyword might be much cheaper since DISTINCT is expensive.
- Add LIMIT 1
An example of this optimization will be SELECT name from R WHERE name = ‘foo’. If there is a unique constraint on the name column, the selection result has at most one record. Therefore, we can add LIMIT 1 safely. If the original query plan performs a sequential scan on the R, adding LIMIT 1 might speed up the query because of the early return.
We designed an algorithm to decide if 1), 2) can be performed safely. Rewriting queries manually and experimenting on a table with 10K records shows 2X ~ 3X improvement for both rewrites. We have some other rewrite rules, but the two are the most obvious ones.
To fully implement this feature, we need to merge our tool with rails, and change rails SQL query issue logic. After binding all variables, we will perform the rewrite, and send the rewritten query to the database. Basically, we are adding a new feature to rails, which can rewrite SQL automatically based on application constraints.
Will this feature be useful? Any feedback or comments are highly appreciated.