Also I know that projects like FB or Twitter use both a distributed DB and
a relational DB (MySQL), and I wonder why they use both models. What
benefits do you get from this approach?
The relational model allows you to store more complex data, arranging
it into multiple tables that can be easily joined using SQL.
If you need more complex data, then you need relational.
Databases that support the relational model also support other kinds
of data structures. For example, PostgreSQL allows you to store XML,
JSON and also key/value data using hstore.
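
As a small illustration of data arranged into multiple tables and joined
with SQL, here is a sketch. It uses Python's built-in sqlite3 module rather
than PostgreSQL, only so that it runs without a server; the users/orders
tables and the total_per_user function are made up for the example.

```python
import sqlite3

# In-memory SQLite database, used only to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users(id),
                         amount INTEGER NOT NULL);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 30), (2, 1, 12), (3, 2, 7);
""")


def total_per_user(conn):
    """Join the two tables and sum the order amounts for each user."""
    rows = conn.execute("""
        SELECT u.name, SUM(o.amount)
        FROM users u JOIN orders o ON o.user_id = u.id
        GROUP BY u.name
        ORDER BY u.name
    """)
    return rows.fetchall()


totals = total_per_user(conn)
print(totals)
```

The same query shape works in PostgreSQL; only the connection code would
differ.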
Companies use multiple technologies because they have multiple
different needs. In startups, having a single database is common. As
companies evolve they need more applications and databases, often of
I read about these problems in the architecture of high-load projects. And I know
that the relational model can't scale to millions of simultaneous requests. All
projects use a distributed DB for this problem, like Cassandra or Riak.
And I know that for an in-memory DB, you should build a cluster. Memcached,
for example, has this functionality. But I am trying to find similar for
It isn't true that the "relational model can't scale", though I have
seen that comment before.
PostgreSQL supports multiple standby nodes, called hot standby, which
allow you to scale out the number of read-only copies of the database
and so scale reads. Achieving >100,000 requests per second per
node is possible, so doing millions can work also.
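
A rough sketch of that idea, in Python: writes go to the primary and reads
are spread over the standbys. The ReadWriteRouter class, the node names and
the read_capacity helper are all hypothetical, and the capacity figure
assumes ideal linear scaling, which real systems only approximate.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads over the standbys."""

    def __init__(self, primary, standbys):
        self.primary = primary
        # Round-robin over standbys; fall back to the primary if none exist.
        self._standbys = itertools.cycle(standbys) if standbys else None

    def node_for(self, is_write):
        if is_write or self._standbys is None:
            return self.primary
        return next(self._standbys)


def read_capacity(nodes, per_node_rps=100_000):
    """Idealised read throughput, assuming every node serves per_node_rps."""
    return nodes * per_node_rps


router = ReadWriteRouter("primary", ["standby1", "standby2"])
picked = [router.node_for(is_write=False) for _ in range(4)]
print(picked)
print(read_capacity(10))
```

Under that linear assumption, ten nodes at 100,000 requests per second each
give 1,000,000 read requests per second. Writes still all land on the one
primary.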
The key point is usually database writes. Scaling writes requires
multiple write nodes. The technique that everybody uses is sharding,
that is, placing a subset of the data on each of several nodes. As soon
as you use sharding you need to limit yourself to simple key read/write
operations. Complex operations that require access to multiple nodes
don't scale well, whether or not they use the relational model or
relational databases. It's the type of request that is the problem, not
the underlying technology.
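
To make the difference between the two kinds of request concrete, here is a
minimal sketch of hash-based sharding in Python. ShardedStore and its
methods are invented for the example, each shard is just a dict standing in
for a separate node, and nodes_touched counts how many nodes a request
would have to contact.

```python
import hashlib


class ShardedStore:
    """Toy key/value store that places each key on one of several shards."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]
        self.nodes_touched = 0

    def _shard_for(self, key):
        # Stable hash of the key, so a key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.nodes_touched += 1  # simple key write: exactly one node
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        self.nodes_touched += 1  # simple key read: exactly one node
        return self.shards[self._shard_for(key)].get(key)

    def count_where(self, predicate):
        # A query over values cannot use the key, so every node is asked.
        self.nodes_touched += len(self.shards)
        return sum(1 for shard in self.shards
                   for value in shard.values() if predicate(value))


store = ShardedStore(shard_count=4)
for i in range(10):
    store.put(f"user:{i}", i)
```

A get or put touches one node however many shards there are, while
count_where touches all of them, so its cost grows with the size of the
cluster.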