rails fragment caching for diary list #2591
Comments
Author: tom[at]compton.nu Is it not possible to do it in a cleaner way than that? I'd have to get the books out to check but I thought caching was supposed to be more or less automatic in rails but that patch all looks horribly invasive and manual at a first glance. I'll also need to know more about the possible issues that might arise in a production environment - where is the cache held for example? How is it's size controlled - do we need to run some sort of cleanup job from cron? What (if anything) are the coherency issues when multiple processes and/or servers are involved in serving rails requests? |
Author: tom[at]compton.nu I think you've missed out the sweeper code from your patch? |
Author: tom[at]compton.nu Having looked at it some more, I suspect action caching is more appropriate for what your doing here. So for the diary view method a simple declaration like this:
|
Author: tom[at]compton.nu Having looked at it some more, I suspect action caching is more appropriate for what your doing here. So for the diary view method a simple declaration like this:
should do the job. Obviously a sweeper is still needed. It might be worth adding fragment caching for some unchanging bits of the layout though, or for individual entries in the diary list so the whole list doesn't have to be rerendered when a new entry is added. |
Author: amm Thanks for having a look at the patch. I chose fragment caching rather than action caching, as the page is not identical when a user is logged in or not and I thought action caching always caches the entire rendered page, so it sounded like it would be unsuitable for those dynamic parts. I didn't know about the layout option though, which sounds like it might do what is needed to allow for action caching, but I haven't yet understood what exactly his option does and haven't yet tried it out. How does it know though which parts are dynamic and can't be cached and which parts can, other than by explicitly telling it in code in the view and controller? The patch indeed needs sweepers to expire the caches whenever a new entry or comment is added, but they should already be included. If you look at the patch in the "original format" the new file "model/diary_sweeer.rb" is included, but for some strange reason, the trac diff viewer doesn't show that part. The fragments get cached in the fragment cache, for which there are 4 different backends available in rails.
The memory backend is per process and therefore not so good when having multiple rails daemons. The filesystem backend is imho the easiest and best option. It needs no extra setting up and works correctly accross multiple daemons on a single machine and can be made to work across multiple machines too. The memcached backend has the problem of not being able to expire multiple fragments via regexp expiration, which is necessary for the paginated caching. The distributed rails backend might also be a solution, especially if we should scale web serving across multiple machines, but I haven't yet tried that out to see how it works. Fragment caching is probably possible and sensible for some other parts too, but the diary list page (not the diary view page, which I only included as it was in the same controller) seemed by far the most worth while target for the moment to start experimenting with caching. I'll need to have a closer look at the action caching with layout => false |
Author: amm I am very much still learning about the basics of rails, and hadn't really come across the concept of layouts at all yet. It sounds like action caching without layouts would take care of a good portion of the dynamic parts of users being logged in. But the dynamic part of only having the option to add a comment in the "diary_entry/view.html.erb" and the option to add a new diary entry in the "diary_entry/list.html.erb" still depend on <% if user %> in the actual view. But that boolean variable might be handled by a more complex cache_path including the user variable too. So it might still be possible to use action caching and not need any extra code at all other than sweepers and the caches_action statement. |
Author: amm Looking at the benchmark numbers a bit more, and particularly the difference between the response time of way over a second from osm.org vs. 300ms from my local testbed, makes me wonder how much of that second is rendering time and how much is database query time. My suspicion is that a large portion might be DB query time, as I don't have anywhere close as many comments in my testbed as in the main DB, whereas the complexity of the rendered page with only 20 diary entires is probably comparable. Caching would obviously reduce both rendering time and db query time and thus would be preferable, but if the db query time can be reduced, that would still help every time the page isn't yet cached. I presume, the main DB query performed in the diary list controller is the "SELECT ... WHERE ("diary_entries"."visible" = 't' AND "users"."visible" = 't') ORDER BY created_at DESC LIMIT 20 OFFSET 0". Currently there isn't an index on created_at, which presumably means the query needs to do a full seq_scan on diary_entry plus a sort over all visible diary entries. Adding an index on created_at would allow to do a filtered idx_scan without needing an extra sort. Both of these seem to be confirmed by explain analyze on my limited number of entries in my test db, but as the (db) setup is so different to osm.org I don't know how much of those "benchmarks" can be transfered. |
Author: tom[at]compton.nu Replying to [comment:5 amm]:
The diff viewer won't show it because it's a new file. I had made the same mistake in that I applied the patch and then used svn diff forgetting that the new file wouldn't have been added to svn so wouldn't show up. |
Author: tom[at]compton.nu Replying to [comment:6 amm]:
Yes I realised that view sometimes vary if you are logged in without actually being dependent on who you are logged in as, so I added a logged in flag to the cache entries to cope with that. I'll attach my patch in a minute. |
Author: tom[at]compton.nu Replying to [comment:7 amm]:
I think we had probably never bothered with that index because the number of diary entries was so small it didn't really matter but you're right that it would be helpful now so I've added it (well three versions of it in fact) and typical DB times for the first page of diary entries have dropped from 2-300ms to about 5ms or so. |
Author: amm Your patch definitely looks more elegant than mine, so thanks for correcting it. However, I have noticed two issues with your version.
Otherwise it seems to work fine and correctly caches and expires the user diary pages |
Author: tom[at]compton.nu Replying to [comment:11 amm]:
This is arguably a bug in rails really, in that the translation of ':' to '.' is hidden inside the file based cache backend which means that non-regexp expiry respects it but regexp expiry doesn't. I've worked round it in my tree now anyway.
I think the best thing will be to replace the logged_in flag with a tristate parameter - logged_out/logged_in_as_somebody_else/logged_in_as_this_user. |
Author: amm
So I think with those two fixes the patch should work. If you want I can test the new patch again, or I would also be fine with it if you are happy with it. |
Author: amm With the issues now solved we discusses yesterday on IRC, I don't see any more problems and from my side, I think the patch is ok. I have also tested the patch in a single server multiple daemons environment on both a mongrel and thin "cluster" in production mode and didn't see any further problems so hopefully there won't be any problems when moving over to lighttpd and fastcgi either. |
Author: amm I'll close this ticket, as it seems to be working fine now on the main site. Thanks |
Reporter: amm
[Submitted to the original trac issue database at 3.02pm, Thursday, 31st December 2009]
Some webpages on OSM don't load as fast as would be ideal. One of these is the diary page. With typical load times of between 1 - 3 seconds, the time is by no means terrible, but still noticeable as a lag and so would be nice to reduce. Cacheing is one option and should help here as the the diary page is well suited for rails fragment caching. The diary list page changes relatively seldomly, and there aren't too many dynamical alternatives.
I have therefore attached a patch implementing a simple fragment caching for diary_entry/list.html.erb as a proof of concept to see if fragment caching can reduce load times and how much overhead it adds. So I would mainly be interested in comments to hear your opinion on this matter and if you think it is the right approach to improve the felt responsiveness of the website. It currently only does caching for the diary page, as apart from the GPX trace page, the diary page is the only one that has load times of over a second and it is the easiest page to try out caching.
To try and see how much potential this patch has, I have tried to do some benchmarks. All numbers were generated with apache-bench with 10 repetitions and a concurrency of 1.
First off, looking at the current numbers of how long it takes to load the diary page:
ab -n 10 -c1 http://www.openstreetmap.org/diary
Connection Times (ms)
min mean[+/-sd] median max
Connect: 38 57 20.1 59 100
Processing: 1219 1733 364.3 1790 2522
Waiting: 1078 1532 355.8 1605 2300
Total: 1264 1789 363.6 1870 2560
Those numbers are obviously somewhat variable depending on how busy the server is, but the minimum time is mostly stable. To see a bit better how how much of that time is due to general network, queueing and rails delay the equivalent call to a very simple page, the login page
ab -n 10 -c1 http://www.openstreetmap.org/login
Connection Times (ms)
min mean[+/-sd] median max
Connect: 34 42 6.8 41 55
Processing: 51 66 9.5 68 79
Waiting: 51 65 9.1 67 79
Total: 86 108 13.3 108 131
Now looking at the effect of caching the diary page on my local testbed (passenger).
The first run is with caching turned off
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 293 298 7.1 297 317
Waiting: 293 298 7.1 297 317
Total: 293 299 7.1 297 317
And now with caching turned on, it is
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 22 55 70.4 23 235
Waiting: 22 55 70.4 23 235
Total: 22 56 70.4 23 235
So caching seems to decrease the processing time by close to ten fold for the diary list page. Given that the original rendering times on the OSM server are about 3 times larger (presumably due to the more complex real diary entries than my few test entries) I would guess the effect is even larger on the OSM servers.
But as said at the top, this is mostly a RFC at the moment and as a basis to think about how best to reduce load times of the diary page.
The text was updated successfully, but these errors were encountered: