I was hoping to get your suggestions and comments on my GSoC proposal.
Let me know what you think!
Ruby on Rails currently lacks the ability to scale its communication
with the database. Multiple instances of a Rails application can be run
and queries can be optimized, but beyond a certain point these measures
alone are not enough. Eventually the limitation of a single database
becomes too large a bottleneck to ignore. Some people realize this and
avoid Rails because they would rather choose a tool that handles the
problem for them. Ruby on Rails needs a built-in scalability solution
because serious web applications grow up, grow larger, and start to
bring in more traffic. If Rails does not provide one, these people will
go elsewhere; if it does, Rails can draw in people who previously would
never have considered it.
Tell us a little about yourself.
My name is Allen, and I will be graduating from Rochester Institute of
Technology this May with a Bachelor of Science in Software Engineering,
minors in Computer Science and Psychology, and a concentration in
Business. In the fall I will begin a graduate degree in Computer Science,
studying database design and computer learning. After graduate school I
want to start my own business developing web applications, most likely
with Ruby on Rails.
I first started programming in 10th grade with Visual Basic, HTML,
love with programming and I started to develop some of my own side
projects in class. I competed to join the computer programming team at
my school, and every year we competed against other schools we placed in
the top three. In 11th grade I learned C++ in AP Computer Science. In
12th grade I learned Java in AP Computer Science, which had switched to
Java that year. That same year I also took IT Programming, where I
learned ASP and PHP simultaneously and had to build systems that did the
same thing in both languages.
Throughout my college career I have worked with many languages,
including the .NET languages, Python, Perl, PHP, Java, C++,
are taught some of the most important concepts of software design,
such as design patterns, verification and validation for testing,
architecture design, designing distributed systems, and designing
I have worked for three companies thus far in the technology industry.
The first was Measurement Specialties Inc., where I wrote a Visual Basic
application to interface with a new type of gas pump system that
measured gas flow. The system controlled the flow of gas, simulated
variances in pressure, and took measurements that were stored in a
database so the data could be evaluated. The second was Riverside
Regional Medical Center, where I worked in the financial department
writing programs that translated data from a multidimensional database
to a relational database. I also wrote programs to check the data in the
database and validate that statements were balanced. The third, where I
currently work part time, is Rochester Software Associates. There I work
primarily on a Java-based web server that enables print flow management
for large companies. Scalability is a major concern there because our
application is used by schools and companies ranging in size from about
100 users to 10,000.
What will your availability be to work on this project?
I will be treating this project like a full-time job and will spend at
least 40 hours a week on it. I will be taking a class over the summer,
which will account for 4 hours a week plus homework. However, this will
be in addition to the 40 hours spent on the project.
Why do you use Rails? How would you like to see it improve?
I use Rails because of its simplicity. I also like that it is built on
good design patterns that not only allow but encourage good design during
development. Lastly, I use it because it has a strong community behind
it.
There are a couple of places where I think Rails could improve. One area is
would also like to see some support for action-specific, controller-specific,
and application-specific inclusion of resources like
Rails is its interaction with databases, which is what I am proposing
for my Google Summer of Code project.
I would like Rails to support scalable interaction with data out of the
box. There are a few ways of doing this. One of the simplest is to move
tables that have little or no relationship to each other into separate
databases. Another is a master-slave setup, where all write actions are
directed to a single master and all read actions are directed to one of
the slaves. A third option is to give each instance of a Rails
application its own replicated database, which is already possible in
Rails. The fourth, probably the most difficult but arguably the most
scalable, is horizontally partitioning data across databases in a
shared-nothing approach (sharding). Each of these solutions has benefits
and limitations, and each is applicable to different situations.
Though I believe Rails should support these scalability solutions, they
should not be on by default. Scalability features should be enabled only
as needed; otherwise they would add a lot of unnecessary overhead to
setting up an application. A developer whose site never experiences
scalability issues should never need to know about the various
scalability options.
Why is this important to the Rails community at large? Why is this
important to you?
Ruby on Rails has a strong community. However, because Rails lacks
support for scaling its interaction with data, many people are hesitant
to use it. People who want to build large-scale, high-traffic websites
are reluctant to invest development effort in a framework that does not
fully support their goals. There are plugins that enable various
scalability features, but enabling them tends to break something else,
such as tests that use fixtures. Developers are also discouraged from
building more in-depth solutions because doing so requires monkey
patching, which can break things and leaves them with one more feature
to maintain for each new version of Rails. If Rails had this kind of
support built in, the Rails ecosystem could grow larger and win over
people who currently avoid Rails for these reasons.
Some of the people who avoid Rails are large companies. If Rails had the
support of large companies, the community would grow. In addition, if
large companies became members of the Rails community, there would be
money behind Rails development, which could lead to great new features
and plugins.
This is important to me because I use Rails. If the Rails community
grows, developing in Rails will become easier for me. In addition, I
want to open a business developing web applications, which have a very
good chance of running into scalability issues and needing features like
the ones I would develop. Finally, on a purely selfish note, I enjoy
helping others, and it is very satisfying to know that you have affected
the lives of many people.
List a clear set of goals/milestones you'll hit during the summer. Be
I am planning three milestones for this summer. Each milestone will be a
completed solution that could be merged into Rails edge, and each
milestone after the first will build on the previous ones. In addition to
the features listed below, each milestone will include requirements
elicitation from the Rails community and documentation on how to enable
and set up each feature. I will also test various configurations of
multiple databases to ensure the features work.
The first milestone will focus on handling multiple database
connections and handling tables in multiple databases. This will
include syntax for declaring multiple databases for development,
testing, and production in database.yml. This syntax will include a
way to name the connections and specify which connection is the
default connection. This will allow connections to be specified in
models by name and any model without a specified connection will use
the default connection. Ideally, both fixtures and migrations will use
the connection specified in the model; however, this may not be
possible, in which case the connection may need to be specified in them
directly.
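To make the first milestone more concrete, here is a minimal sketch in plain Ruby, without ActiveRecord. The YAML layout, the `default: true` flag, the `ConnectionRegistry` class, and the `uses_connection` macro are all hypothetical illustrations of the idea, not existing Rails configuration or APIs; the real syntax would come out of the requirements gathering with the community.

```ruby
require "yaml"

# Hypothetical multi-connection database.yml (not real Rails syntax):
# each environment names its connections and marks exactly one default.
CONFIG_YAML = <<~YAML
  production:
    users_db:
      adapter: mysql
      database: app_users
      default: true
    logs_db:
      adapter: mysql
      database: app_logs
YAML

# Looks up named connection specs for a single environment.
class ConnectionRegistry
  attr_reader :default_name

  def initialize(yaml_text, env)
    @specs = YAML.safe_load(yaml_text).fetch(env)
    defaults = @specs.select { |_name, spec| spec["default"] }.keys
    unless defaults.size == 1
      raise ArgumentError, "expected exactly one default connection"
    end
    @default_name = defaults.first
  end

  # Spec for +name+, or for the default connection when +name+ is nil.
  def spec_for(name = nil)
    @specs.fetch(name || default_name)
  end
end

# Stand-in for a model base class with a hypothetical macro that binds
# a model to a named connection.
class Model
  class << self
    attr_reader :connection_name

    def uses_connection(name)
      @connection_name = name
    end
  end
end

class User < Model; end # no declaration: uses the default connection

class AuditLog < Model
  uses_connection "logs_db"
end

registry = ConnectionRegistry.new(CONFIG_YAML, "production")
puts registry.spec_for(User.connection_name)["database"]     # app_users
puts registry.spec_for(AuditLog.connection_name)["database"] # app_logs
```

Because a model without a declaration falls back to the default, an application with a single database never has to mention named connections at all, which fits the goal of keeping these features opt-in.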
The second milestone will focus on enabling a master-slave
configuration for databases. Previous work with multiple connections
will be used to enable this feature. A new syntax will be added to
database.yml to declare a master and its slaves. It will also be
possible to combine a master-slave setup with binding models to named
connections; in that case each named connection will refer to a
master-slave group rather than a single database. All write actions
will be routed to the master and all read actions will be routed to a
slave. I am not sure how load balancing between slaves should work, so
I will get feedback from other developers on it. I expect that people
may want to implement load balancing themselves for their specific
setups, so a default strategy will be provided but will be easy to
override with a different implementation. Fixtures and migrations will
be updated to work in this new setup, but little additional work should
be needed since writing is always handled by the master.
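The routing rule and the overridable balancer could look roughly like the sketch below, in plain Ruby. `MasterSlaveRouter`, `RoundRobinBalancer`, `RandomBalancer`, and the convention that a balancer is any object with a `pick(slaves)` method are hypothetical names chosen for illustration; connections are represented by plain strings, round-robin is only one possible default, and falling back to the master when no slaves are configured is an assumption.

```ruby
# Hypothetical default balancer: cycles through the slaves in order.
class RoundRobinBalancer
  def initialize
    @next = 0
  end

  def pick(slaves)
    slave = slaves[@next % slaves.size]
    @next += 1
    slave
  end
end

# Hypothetical override: any object responding to #pick(slaves) works.
class RandomBalancer
  def pick(slaves)
    slaves.sample
  end
end

# Sends writes to the master and reads to a slave chosen by the balancer.
class MasterSlaveRouter
  def initialize(master:, slaves:, balancer: RoundRobinBalancer.new)
    @master = master
    @slaves = slaves
    @balancer = balancer
  end

  # +operation+ is :read or :write.
  def connection_for(operation)
    case operation
    when :write then @master
    when :read  then @slaves.empty? ? @master : @balancer.pick(@slaves)
    else raise ArgumentError, "unknown operation: #{operation.inspect}"
    end
  end
end

router = MasterSlaveRouter.new(master: "master", slaves: %w[slave1 slave2])
p router.connection_for(:write) # "master"
p [router.connection_for(:read), router.connection_for(:read), router.connection_for(:read)]
# ["slave1", "slave2", "slave1"]
```

Routing on an explicit read or write operation, rather than inspecting SQL, keeps the rule simple. A fuller design would also have to consider replication lag and reads made inside a transaction, which would likely need to go to the master.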
The third milestone is the most difficult. It will focus on database
sharding (shared-nothing). There are many choices for how to implement
this, and I will rely heavily on the community when deciding how to
implement it. As with the other two milestones, there will be some way
to declare connections to be used as shards. There will be a way to mark
models as sharded, which will likely include a way to declare models as
global for common static lookup tables, such as types, that should be
replicated across all shards. There will also be a way to specify how a
model is sharded. This feature will also support some kind of
rebalancing for when new shards are added. Lastly, fixtures will be
updated to support this new feature.
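Since the rebalancing approach is still open, here is one possible technique sketched in plain Ruby: consistent hashing, offered as an example rather than a decided design. `ShardRing`, the shard names, and the choice of 100 ring points per shard are assumptions for illustration. With consistent hashing, adding a shard only moves keys onto the new shard, instead of reshuffling most keys as a simple `id % shard_count` scheme would.

```ruby
require "digest/md5"

# Hypothetical shard resolver using consistent hashing: each shard is
# placed at several points on a hash ring, and a key belongs to the
# first shard point at or after the key's own hash (wrapping around).
class ShardRing
  def initialize(shards, points_per_shard: 100)
    raise ArgumentError, "at least one shard required" if shards.empty?
    @points_per_shard = points_per_shard
    @ring = {}
    shards.each { |shard| add_shard(shard) }
  end

  # Adds a shard; only keys that now fall on its points change owner.
  def add_shard(shard)
    @points_per_shard.times { |i| @ring[position("#{shard}:#{i}")] = shard }
    @sorted_positions = @ring.keys.sort
  end

  # Shard that owns +key+ (for example, a user id used as the shard key).
  def shard_for(key)
    h = position(key.to_s)
    pos = @sorted_positions.bsearch { |x| x >= h } || @sorted_positions.first
    @ring[pos]
  end

  # Every distinct shard; writes to a global model (a static lookup
  # table replicated everywhere) would go to each of these.
  def shards
    @ring.values.uniq
  end

  private

  # Maps a string to a 32-bit ring position using the first 8 hex
  # digits of its MD5 digest.
  def position(str)
    Digest::MD5.hexdigest(str)[0, 8].to_i(16)
  end
end

ring = ShardRing.new(%w[shard_a shard_b shard_c])
before = (1..1000).map { |id| ring.shard_for(id) }
ring.add_shard("shard_d")
after = (1..1000).map { |id| ring.shard_for(id) }

# Every key that changed owner moved to the new shard.
moved = before.zip(after).reject { |old_shard, new_shard| old_shard == new_shard }
puts "#{moved.size} of 1000 keys moved"
puts moved.all? { |_old_shard, new_shard| new_shard == "shard_d" } # true
```

Roughly a quarter of the keys would be expected to move when going from three shards to four; which keys move, and how their existing rows get copied to the new shard, is exactly the kind of detail that would need community input.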
Give a rough timeline for hitting these milestones.
4/10/09 – 5/22/09 – Community bonding
5/23/09 – 6/19/09 – Milestone 1 – Multi-database connections, model
binding to connections
6/20/09 – 7/10/09 – Milestone 2 – Master-slave
7/11/09 – 8/10/09 – Milestone 3 – Sharding
8/11/09 – 8/17/09 – Code cleanup, finalize documentation
How will you measure progress? How will you handle falling behind?
At the beginning of each milestone I will draw up a prioritized list of
features that are required and features that would be nice to have.
From this list I will build a projected schedule of when each item needs
to be completed, which I will use to gauge my progress. If I start to
fall behind, I will cut the features that are not required. If I fall
drastically behind, I will still complete all the required features for
that milestone and push back the start of the following milestone. It is
quite possible that I will not complete the third milestone even if I
finish the other two on time. In that case I hope to at least leave a
strong basis for myself or someone else to finish after Google Summer of
Code is over.
What are the "unknowns" in this project for you? What kind of pitfalls
could you run into?
I have not worked with Rails internals before. However, I know how to
program in Ruby and am very familiar with the design patterns and
practices that Rails is built upon. I have not worked with load
balancing and do not yet know how I would implement it for the slaves in
the master-slave configuration. I think my sponsor would be able to help
me with this, though, and I can likely find adequate information about
it online and in books. Possible pitfalls include trying to do too much,
falling behind schedule, and missing important requirements. The first
two risks can be mitigated by prioritizing features and planning a
schedule, as discussed in the previous section. The last can be
mitigated by rigorous testing, to ensure each feature works completely
and as expected, and through regular communication with my sponsor and
the Rails community.