When I was at Sun, I worked on the design for a replicated in-memory data store. One of the principles was, scale and throughput demands are increasing, and memory is getting cheaper, so why not put it all in memory, and use disk/database only as a backing store. We provided durability not through writing to disk but through replication, and then writing back to disk (e.g. to a traditional relational database) in the background. Even if there were a node failure, we would recover from the replica, not from the database.
We couldn't get this project funded for various reasons. It's been frustrating because I knew we were on to something but couldn't make it happen - and now, as many of us predicted, the industry is moving in that direction - a growing belief in denormalization, caching, and eventual consistency. But still, many applications write to the database as part of the cycle of a transaction.
But
this article at HighScalability hit the nail on the head - you evolve your application from database-centric to cache-centric to memory-centric. Money quote:
As scaling and performance requirements for complicated operations
increase, leaving the entire system in memory starts to make a great
deal of sense. Why use cache at all? Why shouldn't your system be all
in memory from the start?
Well, that sounds awfully familiar. And it's true. If you are replicating anyway, your risk of data loss is pretty minimal, and as Pat Helland, Amazon ex-architect says,
computer suck, and you should just plan to apologize sometimes. If you batch your changes every N minutes, then you have an N minute
window where some changes may be lost. But for many applications,
that's OK. And the wins you get in terms of throughput are significant
http://www.possibility.com/epowiki/Wiki.jsp?page=ItsTheLatencyStupid<br />">Latency
was slashed because logic was separated out of the HTTP
request/response loop into a separate process and database persistence
is done offline.
Note that moving to flash-based disk, with something like
Sun's OpenStorage solution, may give you the best of both worlds - durability
and speed...
It looks like the way they implemented an all-memory solution was using
Gigaspaces, which, by the way, has a solution that deploys and scales automatically on the Amazon EC2 fabric. And the result: near linear scalability and 1 billion events a day. Yeah, that's the ticket.