For an enterprise CMS, server clustering is clearly one of the major issues. The main criticism of the current implementation in eZ Publish is directed at the concept of storing binary data, and even cache files, in the database. This raises additional scaling issues at the database layer and leads to ridiculous table sizes, as mentioned by someone here and also recently discussed here and here.
From our point of view, there is another reason why misusing the database for binary storage is a no-go. From time to time, caches will need to be purged completely. Rebuilding all of them can take several minutes on big sites. Given two webservers with four processor cores each, the first eight concurrent users are enough to completely exhaust system resources with the cache rendering task. In other words, the period between cache deletion and cache regeneration is no different from a DoS attack. It is a legitimate question why a cluster feature, of all things, is built in a way that makes cache deletion impossible while there is traffic.
Alternatives
In 2007, YMC started developing a "Distributed Cache" (DC). It was implemented as part of our integrated Social Media Suite "YMC Volano". VolanoDC has been in production use in the YMC cluster since August 2007. It also runs on several client installations that require uninterrupted availability. Buffering through a reverse proxy or a CDN like Akamai or Limelight was considered as a way to bridge downtimes, but then rejected, as it would only work around a problem that could be addressed directly.
In short, VolanoDC writes caches to the local file system of each webserver. This does not require especially fast hard disks of the kind used in high-end MySQL servers; in most cases the disks of the webservers remain mostly unused anyway. Each webserver generates and uses its own caches, so caching is redundant between them.
To keep these caches in sync, VolanoDC introduces a new file handler that logs cache operations to a separate database table. Once a minute, each webserver checks this table and clears the logged caches accordingly. It must be emphasized that this concept creates no single point of failure, as is the case with central cache storage, be it in a database or elsewhere. Even race conditions (two servers trying to write to the same cache file) are avoided by design.
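The log-and-poll cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions, not VolanoDC's actual code: the table name `cache_log`, the `WebServer` class and the use of SQLite in place of the shared MySQL database are all hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: one row per logged cache operation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cache_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    cache_path TEXT NOT NULL
)
"""


def log_invalidation(db, cache_path):
    """Record that a cache file is stale so every node can drop its copy."""
    db.execute("INSERT INTO cache_log (cache_path) VALUES (?)", (cache_path,))
    db.commit()


class WebServer:
    """One cluster node with its own local cache directory."""

    def __init__(self, db, cache_dir):
        self.db = db
        self.cache_dir = cache_dir
        self.last_seen_id = 0  # highest log entry this node has applied

    def write_cache(self, cache_path, content):
        with open(os.path.join(self.cache_dir, cache_path), "w") as f:
            f.write(content)

    def has_cache(self, cache_path):
        return os.path.exists(os.path.join(self.cache_dir, cache_path))

    def poll(self):
        """Run once per cycle: delete local files logged since the last poll."""
        rows = self.db.execute(
            "SELECT id, cache_path FROM cache_log WHERE id > ? ORDER BY id",
            (self.last_seen_id,),
        ).fetchall()
        for entry_id, cache_path in rows:
            local = os.path.join(self.cache_dir, cache_path)
            if os.path.exists(local):
                os.remove(local)
            self.last_seen_id = entry_id
        return len(rows)


def demo():
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
        a, b = WebServer(db, dir_a), WebServer(db, dir_b)
        a.write_cache("article_42.cache", "old")
        b.write_cache("article_42.cache", "old")
        log_invalidation(db, "article_42.cache")
        applied = (a.poll(), b.poll())
        return applied, a.has_cache("article_42.cache"), b.has_cache("article_42.cache")
```

Because each node only ever deletes and rewrites files in its own directory, no two servers touch the same cache file, which is one way to read the "avoided by design" remark.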
Upgrading the system - anytime
The biggest advantage of the solution is that caches can be cleared completely regardless of the current server load. To achieve this, the first webserver tells the upstream load balancer to stop sending it new requests. After completing all remaining requests, the webserver starts a complete cache regeneration, which can take some time. The other servers continue to serve from their existing caches. Upon completion, the first server sends two announcements: one to the load balancer to signal its renewed availability, and one to the next server to trigger cache regeneration there.
For development purposes, it was helpful to set up a 1:1 simulation on virtual machines. In our experience, up to 20 virtual cluster nodes can run on one blade. We successfully tested VolanoDC with up to 100 web servers in such a simulation.
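As a rough sketch of this hand-over, assuming a load balancer that exposes simple drain and enable operations (the `LoadBalancer` class and the `regenerate` callback below are hypothetical stand-ins, not part of VolanoDC or any real load balancer API):

```python
class LoadBalancer:
    """Hypothetical stand-in for a real load balancer's control interface."""

    def __init__(self, servers):
        self.active = set(servers)

    def drain(self, server):
        # Stop routing new requests to this server.
        self.active.discard(server)

    def enable(self, server):
        # Put the server back into rotation.
        self.active.add(server)


def rolling_regeneration(servers, lb, regenerate):
    """Regenerate caches one server at a time.

    Returns the smallest number of servers that were in rotation at any
    point, to show that the cluster never goes fully offline.
    """
    min_active = len(lb.active)
    for server in servers:
        lb.drain(server)
        min_active = min(min_active, len(lb.active))
        # A real node would first wait for its in-flight requests to finish.
        regenerate(server)  # clear and rebuild the local caches (slow)
        lb.enable(server)   # announce renewed availability
        # Moving to the next loop iteration models the "trigger the next
        # server" announcement.
    return min_active
```

With three servers, `rolling_regeneration` keeps two in rotation throughout, and every server is back in the load balancer's active set when the loop ends.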
Distributed Caching as a strategy for server administration
Since the launch, some helpful extensions have been implemented. For example, it is now possible to safely take specific web servers out of production for maintenance tasks. Monitoring of central system parameters (RAID status, bonding / trunking, MySQL replication, HDD capacity etc.) was added to the administration interface, which is now part of the extended eZ Publish backend (see screenshot). Constantly checking the cluster's availability from cheaply rented remote virtual servers is also part of the concept. Last but not least, an Asterisk integration provides automatic alerts via telephone and SMS if predefined thresholds for these parameters are exceeded.
Conclusion
If we had to decide again, would we throw all the solid advantages of eZ Publish away and switch to Drupal, just because it needs some unexpected work to get the cluster running? Would you kill the horse just because someone fell from it? Of course not.
Comments (7)
2. The database issue you describe has not been a practical problem so far, but naturally, it will be solved with MemCache as well. Additionally, we expect significant benefits by introducing a central upstream cache based on MemCache.
3. Load balancer: Persistent connections are implemented. They are needed anyway as soon as the least bit of interactivity (such as a comment function) is integrated. Otherwise the user would see his freshly created comment only after a delay, on the next reload, once the new server has synced its cache, which would definitely be considered a malfunction.
4. Needed time frame: We considered parallelization of reasonable groups of servers (let's say 5 at a time in your example, resulting in 4x2 minutes), but postponed this in favor of MemCache.
We have also implemented a very granular selection of which caches are to be removed (much more detailed than the standard eZ functionality). Only a few cases make a complete removal of all caches necessary. For example, if no translation was changed, there is no need to flush the translation cache. This reduces per-server cache regeneration time to acceptable values, far below 2 minutes in most cases.
Supposing you have 20 servers:
- the queries sent to the db are 20 times more than needed, even though they are not executed at the same time
- you need a load balancer with session stickiness, as a user who accesses the site while server 2 is off, server 1 is updated and server 3 is not yet updated might receive content from servers 1 and 3 on different requests and fall victim to ghost effects
- the overall time for the update transaction is 20 x complete cache generation time (say 2 minutes). A long window for something to go wrong in between...
Would it not have been more efficient to generate the cache files on one server and distribute them via rsync/ftp to the others?
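The timing argument in this comment, and the batching mentioned in the reply (groups of 5 servers), comes down to simple arithmetic. A small sketch, assuming that all servers in a batch regenerate fully in parallel and do not slow each other down (the function name is made up for illustration):

```python
import math


def update_window_minutes(num_servers, minutes_per_server, batch_size=1):
    """Total duration of a rolling cache regeneration.

    Assumes every server in a batch regenerates in parallel and that a
    batch takes as long as a single server would on its own.
    """
    batches = math.ceil(num_servers / batch_size)
    return batches * minutes_per_server
```

For 20 servers at 2 minutes each, `update_window_minutes(20, 2)` gives 40 minutes for a strictly sequential run, while `update_window_minutes(20, 2, batch_size=5)` gives 8 minutes, matching the "4x2 minutes" figure from the reply.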
It would be great to see increased efforts in addressing some of these common issues that are present throughout most deployments.
A couple of notes about the eZP cluster:
In the 4.1 release, the full-cache-cleanup-of-death has been addressed in a very elegant way (it might be incomplete or buggy, time will tell): the stale cache file continues to be served to all clients until the new version is ready. This works very well on single-box setups too, as even there you would previously have had many concurrent server processes trying to rebuild the same cache file at the same time (or at least waiting for it to be ready). Integration with signaling the lb to start/stop accepting requests is not needed, but it could be added.
Second, the eZP cluster already stores cache files on the local disks of every participating node, and has done so since version 4.0. In fact, you might call it a 2-level cache system.
I would disagree about the ability of databases to manage huge tables full of blobs, as some RDBMSs are much better than others in this respect. Either way, the net result is that while the db table holding the data is indeed quite big, it gets hit relatively little.
For the next iterations of the cluster system, we are discussing using the database to store only cache-file metadata, together with a flexible, 2-tiered cache-file storage system (nfs, db or other).
We handle caches and binary files separately. Binary files (e.g. uploaded images) are held on shared storage (redundant, powered by NFS), so there is no need to sync them.
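The "serve the stale file until the new version is ready" behaviour described above for the 4.1 release can be illustrated with a small in-memory sketch. This is not eZ Publish's implementation: the class name, the generation counter and the use of a dictionary instead of cache files are all assumptions made for the example.

```python
import threading


class StaleServingCache:
    """Serve the old value while exactly one caller rebuilds an expired key."""

    def __init__(self, build):
        self._build = build        # function: key -> fresh value
        self._lock = threading.Lock()
        self._values = {}          # key -> (generation built for, value)
        self._generation = {}      # key -> number of expirations so far
        self._rebuilding = set()   # keys with a rebuild in progress

    def expire(self, key):
        """Mark a key as outdated without deleting its current value."""
        with self._lock:
            self._generation[key] = self._generation.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            wanted = self._generation.get(key, 0)
            cached = self._values.get(key)
            if cached is not None:
                built_for, value = cached
                # Fresh value, or stale value while someone else rebuilds.
                if built_for == wanted or key in self._rebuilding:
                    return value
            self._rebuilding.add(key)  # this caller does the rebuild
        try:
            fresh = self._build(key)   # slow work happens outside the lock
            with self._lock:
                self._values[key] = (wanted, fresh)
            return fresh
        finally:
            with self._lock:
                self._rebuilding.discard(key)


def demo():
    seen_during_rebuild = []
    calls = {"n": 0}

    def build(key):
        calls["n"] += 1
        if calls["n"] == 2:
            # Simulate a second request arriving in the middle of the rebuild.
            seen_during_rebuild.append(cache.get(key))
        return f"{key} v{calls['n']}"

    cache = StaleServingCache(build)
    first = cache.get("page")   # cold miss: builds v1
    cache.expire("page")
    second = cache.get("page")  # triggers the rebuild of v2
    return first, seen_during_rebuild, second
```

In `demo`, the request that arrives during the rebuild still receives the old version, while the caller that triggered the rebuild gets the new one. On a cold miss there is no stale copy to fall back on, so concurrent callers in this sketch would each build the value themselves.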
Pascal
Regards