The Ruby on Rails application framework provides several options for storing session data. Each option has its advantages and disadvantages which may not be readily apparent. The tests outlined in this document attempt to lay some groundwork which should help when determining which storage container is most suitable for your application.
I started running these tests when I was performing some benchmarks on Elite Journal, my Ruby on Rails application. I had noticed about a 50% drop in requests per second from Apache after about a day of testing. I made note of it on IRC and David suggested I clear out my /tmp. Sure enough, there were about 200,000 session files in /tmp that were causing a massive decrease in throughput for Rails (or, really, for anything wanting to create files in /tmp). I then decided to investigate this a bit more.
The Rails framework provides several storage containers for session data. Some of these are parts of the standard CGI library, included with Ruby, and some are included with Rails itself.
All of these tests were run on a single machine with the following specifications:
All tests were run using the `ab` utility, making 200 connections using 5 concurrent clients (ab -c 5 -n 200). The tests were run up to the equivalent of 50,000 sessions, and the requests/second statistic from ab was used. The 50,000 sessions are the equivalent of either active users or stale sessions (more about this later).
Results are plotted with gnuplot. Keep in mind that actual requests/second numbers can vary with hardware set up, so it is more important to take note of the trends of the lines and not the Y axis values.
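The measurement loop described above could be scripted roughly as follows. This is only a sketch, not the script used for these tests: the method names, the URL, and the assumption that each cookie-less request from ab creates one new session are all illustrative. The only detail taken from ab itself is the "Requests per second:" line of its report.

```ruby
# Extract the mean requests/second figure from ab's text report.
# Returns a Float, or nil if the report contains no such line.
def requests_per_second(ab_output)
  match = ab_output.match(/^Requests per second:\s+([\d.]+)/)
  match && match[1].to_f
end

# Run ab in batches and collect [sessions_so_far, requests_per_second] pairs.
# Assumes (hypothetically) that every request creates one new session.
def run_batches(url, total: 50_000, batch: 200, concurrency: 5)
  results = []
  (batch..total).step(batch) do |sessions|
    # Passing an argument array avoids shell interpolation of the URL.
    output = IO.popen(["ab", "-c", concurrency.to_s, "-n", batch.to_s, url], &:read)
    results << [sessions, requests_per_second(output)]
  end
  results
end

# Possible usage (the URL is a placeholder); prints two columns for gnuplot:
#   run_batches("http://127.0.0.1/").each { |n, rps| puts "#{n}\t#{rps}" }
```

Keeping the parsing separate from the batch loop makes it possible to check the parser against a saved ab report without starting a server.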
For a baseline test, I did a run with Rails set to not keep session data. By setting the session_options parameter of Dispatcher#dispatch to nil, Rails will not keep any session information. The plot in Fig 1 shows a simple, flat line averaging about 130 req/s through 50,000 connections. The number of connections here does not really matter, since no state is being kept, but the run shows that hitting a Rails application 50,000 times in a row does not cause any hit to performance (due to any leaks in Ruby, fcgi, or whatever). It also provides a ceiling for the expected performance in the following tests.
With memory store, the session data is kept in an in-memory hash in the process. Memory store is part of the CGI library. It is expected that memory store should be fairly fast, and Fig 2 shows that it is. However, there is a very interesting and drastic drop in performance that occurs around 30,000 sessions. The spike back up to "normal" performance is indicative of some kind of garbage collection happening. The thing to keep in mind here is that these 50,000 requests occurred in a period of less than 30 minutes. Unless your application gets 50,000 hits every 30 minutes, you probably don't need to worry about this. The tests were run again, with a 5 second pause every 200 requests and then a 10 second pause every 200 requests, causing the tests to take about 30 minutes and 1 hour, respectively. The results in this plot show that the problem is mitigated with a slower hit rate.
DRb Store is a container included with ActionPack which, in its most basic form, provides a hash through distributed Ruby (DRb) in which to keep the session data. This differs from memory store in that it adds in the overhead of the DRb communication, but also gives the user the power to write a more complex container that might have some built in housekeeping or even be running on a different host. ActionPack comes with an example server which merely provides access to a Hash via DRb, as such:
require 'drb'
DRb.start_service('druby://127.0.0.1:9192', Hash.new)
DRb.thread.join
Fig 3 shows that the overhead of DRb is rather negligible in comparison to memory store, the difference being that the overall line remains quite flat. This could be due to differences in where garbage collection is taking place (e.g. in the Rails process for memory store and in the DRb server for DRb store). Overall, DRb performs very nicely and gives the user some choice over the container's precise implementation.
The PStore container is part of the CGI library and uses the PStore library to store session data in files on the file system. The data is stored in a marshaled format, so any object that can be marshaled can be put into the PStore container. This differs from FileStore (not discussed here) which does not use marshaling and can only store string data. For every session, PStore will create two files. In its smallest form (as used in these tests), one file will be 12 bytes, containing a marshaled, empty hash and the other file will be a 0 byte backup file ending with a tilde (~).
With PStore we can expect slight overhead for marshaling and file system access. Fig 4 shows, though, that the number of existing files can drastically affect the performance. PStore starts up with performance on par with that of DRb and memory store, but very quickly degrades as the number of sessions increases. The primary factor affecting this test is likely to be the file system. This test was done on NetBSD's FFS, which seems to slow down greatly when creating files in a directory containing hundreds of thousands of files. A plot of a longer run can be seen here, which goes up to 126,141 sessions (252,282 files). At this point, the throughput has decreased by almost an order of magnitude. The plot stops here because the file system on the test machine ran out of free inodes. To further gauge file system effects on PStore, more tests should be run on other file systems. ReiserFS is a good candidate.
PStore is the default session container, and can yield very good performance. However, housekeeping is an absolute necessity and choice of file system plays an important role in a decision to use PStore. Even if a site is not particularly busy, the session files are not deleted by Rails and will build up over time, which can slowly degrade the throughput Rails is able to provide, if your file system cannot handle directories with many files very well.
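As a rough illustration of the housekeeping this calls for, the sketch below deletes session files that have not been modified for a given number of seconds. The method name and the glob pattern are assumptions; the actual location and naming of the session files depend on how the container is configured, so the pattern should be checked against what is really in the session directory before anything is deleted.

```ruby
# Delete regular files matching +pattern+ whose modification time is more
# than +max_age+ seconds before +now+. Returns the paths that were removed.
def sweep_stale_files(pattern, max_age, now = Time.now)
  cutoff = now - max_age
  Dir.glob(pattern).select do |path|
    begin
      next false unless File.file?(path) && File.mtime(path) < cutoff
      File.delete(path)
      true
    rescue Errno::ENOENT
      false # another process removed the file first
    end
  end
end

# Possible usage from cron (the pattern is hypothetical):
#   sweep_stale_files("/tmp/ruby_sess.*", 24 * 60 * 60)
```

A trailing wildcard in the pattern would also catch the tilde backup files that PStore creates alongside each session file.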
ActiveRecord Store comes with Rails and allows an application's session data to be kept in the database (or a database) by using ActiveRecord. This can make housekeeping very convenient as well as adding a bit of security that may not be available to the other storage containers. This does come at a cost, though, as the overhead of ActiveRecord and the RDBMS does result in a drop in performance. Whether this drop is significant is dependent upon the expected traffic to the site, but it is significant enough to warrant serious consideration in many cases.
As indicated in Fig 5, the RDBMS used will impact performance. More important, though, is whether or not the session table's fields are indexed (see below). The primary systems that ActiveRecord supports are MySQL and PostgreSQL. SQLite is also supported by ActiveRecord, though I was unable to get my sessions to dump into the SQLite database in a reasonable amount of time, so it is not discussed here.
The MySQL RDBMS is well known for being a speedy little system and is widely used in web applications. PostgreSQL is another contender but is often thought to be a bit slower with write operations. In Fig 5 MySQL (red line) does outperform PostgreSQL (green line) in the tests. However, this should not be used in consideration of which RDBMS is best suited to your application. It is unlikely that session storage will be the bottleneck of an application once all other factors are included.
In the figure for ActiveRecord Store, both lines remain very flat throughout the test run. After all, 50,000 rows in a table really is nothing any RDBMS can't handle, so this should be the expected outcome. With that said, performance can degrade if the table is not set up properly. The tables used in Fig 5 have an index on the sessid field. If you do not index the sessid field of the database, your performance will look more like this. It is obvious that one would certainly want to index the sessid field in any case.
Tests were run with MySQL on both the MyISAM and InnoDB table types. The results, plotted here, show that neither type offers any advantage over the other for session storage. Thus, session storage performance should not be a factor when choosing a MySQL table type.
This plot shows the performance of all containers in one graph.
There are many choices to make when developing web applications, and many factors weighing on those choices. Performance of the session container becomes more important not only as the traffic to an application increases, but also as its running time increases. Rails and the Ruby CGI library do not clean up stale session data on their own (whether or not they should is a discussion for another day), so it is up to the application writer to perform housekeeping tasks to keep the application healthy.
Performing housekeeping tasks on a file based container can be as easy as removing files that have reached a certain age. With the ActiveRecord containers the programmer can add a timestamp field to the sessions table. An easy way to do this is to open up the CGI::Session::ActiveRecordStore::Session class and add a before_save method to set your timestamp field to Time.now. As an example, if your timestamp field is called updated_on then the following could be added to dispatch.fcgi (or the dispatcher of your choice):
class CGI::Session::ActiveRecordStore::Session
  def before_save
    self.updated_on = Time.now
  end
end
Then, just like with the file based storage, rows can be removed once they reach a certain age. If a model class is made (e.g. with the new_model script), the application's environment can then be used to easily script the housekeeping tasks.
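A minimal sketch of such a housekeeping script might start from a helper like the one below, which builds an ActiveRecord-style conditions array for rows older than a given age. The helper name is hypothetical, and updated_on is the example timestamp field used above. The commented-out call assumes a model class named Session mapped to the sessions table and a delete_all that accepts conditions; the exact call may differ between Rails versions.

```ruby
# Build a conditions array matching rows whose updated_on timestamp is more
# than +max_age+ seconds before +now+.
def stale_session_conditions(max_age, now = Time.now)
  ["updated_on < ?", now - max_age]
end

# Inside the application's environment, with a hypothetical Session model,
# this might be used as:
#   Session.delete_all(stale_session_conditions(24 * 60 * 60))
```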
The memory based containers will have garbage collection occurring at times, though the programmer may have little control over when that happens and what sessions are cleaned up by the GC. Using DRb ought to give the programmer more control over that, with the writing of a custom session storage process.
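To make the idea of a custom session storage process a little more concrete, here is one possible shape for it: a Hash-like object that records when each session was last written and can discard stale entries on request. The class name and the set of methods are assumptions; which methods the DRb store actually calls on the remote object should be checked against the ActionPack source before relying on this.

```ruby
# A Hash-like session container with explicit expiry (hypothetical class).
class ExpiringSessionHash
  def initialize
    @data = {}
    @touched = {}      # session id => time of last write
    @lock = Mutex.new  # DRb serves requests from multiple threads
  end

  def [](key)
    @lock.synchronize { @data[key] }
  end

  def []=(key, value)
    @lock.synchronize do
      @touched[key] = Time.now
      @data[key] = value
    end
  end

  def has_key?(key)
    @lock.synchronize { @data.has_key?(key) }
  end

  def delete(key)
    @lock.synchronize do
      @touched.delete(key)
      @data.delete(key)
    end
  end

  def size
    @lock.synchronize { @data.size }
  end

  # Remove every entry not written within the last +max_age+ seconds.
  # Returns the number of entries removed.
  def sweep(max_age, now = Time.now)
    @lock.synchronize do
      cutoff = now - max_age
      stale = @touched.select { |_, at| at < cutoff }.keys
      stale.each do |key|
        @data.delete(key)
        @touched.delete(key)
      end
      stale.size
    end
  end
end

# It could be served in place of the plain Hash in the example server:
#   require 'drb'
#   DRb.start_service('druby://127.0.0.1:9192', ExpiringSessionHash.new)
#   DRb.thread.join
```

A background thread in the server process could then call sweep periodically, which puts the decision about when sessions are discarded in the programmer's hands instead of the garbage collector's.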
The intent of this document is not to recommend one storage container over another; each one has its advantages and disadvantages in a particular situation. The intent is to provide the reader with some benchmarks so that he or she can make an informed decision when choosing a session storage container in the Ruby on Rails framework.
Any questions or comments about this document can be sent to rails @ elitists.net. All suggestions or corrections are welcome.
Thanks in particular go to David Heinemeier Hansson for not only creating Ruby on Rails, but also making the suggestion about clearing out my /tmp and sparking off this journey. Thanks also to Marcel Molina Jr. for reviewing this document and pointing out my many errors and misphrasings. Thanks as well to the entire #rubyonrails cast and crew (bitsweat, xal, js-, et al.) for their interest and encouragement.