Database Management for Smalltalk

Garbage Tips


‘Garbage’, in an object-oriented database system, means objects which have become unreachable because the application has removed every reference to them from other objects, and which can therefore be deleted to reclaim storage space. The garbage collector (‘GC’) in VOSS is a Baker design, similar in concept to the garbage collector in Smalltalk-80. It can be run in foreground and/or background modes, of which more later.

Unlike the mark-sweep design, which truly is a ‘garbage collector’, the Baker design would be better characterised as a ‘good object preserver’: it copies all reachable objects within a virtual space back and forth from one ‘semi-space’ to the other, deleting everything left behind on each flip. This has consequences for database design, for administration, and for the choice of GC settings when tuning for maximum transaction throughput and minimum downtime.
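The ‘good object preserver’ idea can be illustrated with a toy model. This is only a sketch in Python, used as neutral notation, and none of it is VOSS code: the function flip and the dictionary layout are invented for the illustration, and a real Baker collector does its copying incrementally rather than in one pass as here.

```python
def flip(from_space, roots):
    """Copy every object reachable from `roots` into a fresh semi-space.

    `from_space` maps an object id to the list of ids it references.
    Whatever is not copied is simply left behind in the old semi-space,
    which is then discarded as a whole.
    """
    to_space = {}
    pending = list(roots)               # ids still waiting to be scanned
    while pending:
        oid = pending.pop()
        if oid in to_space or oid not in from_space:
            continue                    # already preserved, or unknown id
        refs = from_space[oid]
        to_space[oid] = list(refs)      # preserve the object by copying it
        pending.extend(refs)            # and scan what it refers to
    return to_space


# 'orphan' refers to 'c', but nothing reachable from the root refers to 'orphan'.
space = {"root": ["a", "b"], "a": ["c"], "b": [], "c": [], "orphan": ["c"]}
survivors = flip(space, ["root"])
print(sorted(survivors))  # ['a', 'b', 'c', 'root']
```

Note that the collector never looks at ‘orphan’ at all: the garbage is never identified as such, it is just not copied.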

Know which space your objects are in.

In a database design using multiple virtual spaces, the most important thing to know is that when the VOSS GC flips a virtual space it preserves only those objects in that space which are reachable, directly or indirectly, from the rootDictionary of that virtual space or, in the case of uncommitted new objects, from the image in which the GC is then running. In its systematic copying, the GC does not follow references to objects outside the virtual space which it is scanning. In other words, an object in virtual space ‘A’ which is referenced only by an object in virtual space ‘B’ will behave normally whilst present, but will not be preserved when the GC flips virtual space ‘A’: references from objects in another virtual space are not, on their own, sufficient to preserve it.

This situation must be avoided. After such a flip, references to those objects would resolve to instances of VOUndefinedObject, and would later silently resolve to some arbitrary new object once the id number of the apparently garbage object in space ‘A’ had been re-allocated, perhaps after days, weeks or months on space ‘A’s free id list.
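The cross-space hazard can be reproduced in a small model. Again this is a hedged sketch in Python rather than VOSS code: flip_space, the (space, id) reference pairs and the roots table are all assumptions made for the illustration, with roots standing in for each space's rootDictionary.

```python
def flip_space(name, spaces, roots):
    """Flip one virtual space, following only references that stay inside it."""
    old = spaces[name]
    new = {}
    pending = list(roots[name])
    while pending:
        oid = pending.pop()
        if oid in new or oid not in old:
            continue
        new[oid] = list(old[oid])
        # References are (space, id) pairs; those leaving `name` are not followed,
        # and references arriving from other spaces are never consulted at all.
        pending.extend(i for (s, i) in old[oid] if s == name)
    spaces[name] = new


spaces = {
    "A": {1: [], 2: []},         # object 2 has no referrer inside space A
    "B": {1: [("A", 2)]},        # its only referrer lives in space B
}
roots = {"A": [1], "B": [1]}

flip_space("A", spaces, roots)

# Find references held in B whose target no longer exists.
dangling = [(s, i) for (s, i) in spaces["B"][1] if i not in spaces[s]]
print(dangling)  # [('A', 2)]
```

After the flip, space ‘B’ still holds a reference to id 2 in space ‘A’, but nothing exists there any more; if that id were later handed to a new object, the old reference would silently point at it.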

The simplest way to keep every object in its intended virtual space is always to virtualize it, preferably on creation, with one of the message variants which specify the location explicitly, for example:

  myObject := MyClass newVirtualIn: aVOManager.

The non-specific variants, for example:

  myObject := MyClass newVirtual.

virtualize the new instance in the current virtual space of the current process, i.e. the virtual space which hosts the last object to have received a message in the current process.
 

Why use multiple virtual spaces?

One reason for distributing a database across multiple virtual spaces may be that some parts of the database are static, whilst others are volatile - subject to frequent update and new object creation - and in this case unnecessary GC processing can be avoided by suitable partitioning of the database, so that the more static virtual space is GC’d less frequently, if at all.
 

Foreground or Background Garbage Collection?

Foreground GC scans and saves a specified number of reachable objects as an addendum to each transaction commit, before that transaction’s changes are physically written to disk, to minimise disk activity. The effect, to the user, is that each transaction commit takes longer than it otherwise would, by an amount that depends on the number of objects the GC is set to scan.

Background GC runs as one or more separate background processes, effectively dummy users committing null transactions, which each do some GC as above. Within each image, background GC processes commit their invisible dummy transactions once every user-specified time delay, by default 3000 milliseconds, and scan a user-specified number of objects each time (which may be different from the foreground scan-rate). Foreground and background GC may run concurrently.

The advantage of background GC is that it utilises CPU cycles in between application transaction commits; the disadvantage is that it adds to the total amount of disk flushing activity, whereas foreground GC-writes are flushed within the same transaction commit - which was going to happen anyway.

Foreground GC is preferable if the application consists mainly of frequent transactions which take only a short time to prepare before commit (i.e. whilst the application has control); background GC is preferable if transactions are less frequent and/or take a long time to prepare, especially open-ended interactive transactions, when background GC can carry on whilst the user is thinking.

If the operational objective is continuous operation with no downtime for GC, then the GC should, on average, scan objects for preservation at a rate at least equal to the average rate of object creation or modification per transaction. There is then no backlog of new and/or changed objects still to be scanned when a GC flip is requested to delete the garbage and start the reverse trek. This is most simply achieved by setting a high GC scan rate, so that the GC is at or nearly at the end of its task, ready to flip, after each transaction. The visual GC Progress Indicator shows how close to 100% the GC is at any time.
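The arithmetic behind this can be sketched with a deliberately simple model, in which every commit adds a fixed number of objects to the scan backlog and the GC then scans a fixed quota. The function name and the figures are invented for the illustration; real workloads vary from one transaction to the next.

```python
import math


def commits_until_ready(backlog, created_per_commit, scan_quota):
    """Commits needed before nothing is left to scan, or None if never.

    Model: each commit adds `created_per_commit` objects to the backlog,
    after which the GC scans up to `scan_quota` of them.
    """
    if backlog == 0:
        return 0
    net = scan_quota - created_per_commit   # backlog cleared per commit
    if net <= 0:
        return None                         # the GC never catches up
    return math.ceil(backlog / net)


print(commits_until_ready(10_000, 50, 250))  # 50
print(commits_until_ready(10_000, 50, 50))   # None
```

In this model a scan quota merely equal to the creation rate holds the backlog steady but never clears an existing one, which is why a generously high scan rate is the simpler setting.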

Foreground GC does not attempt to flip automatically, and so will not perform unnecessarily eager copying of the reachable objects: if there is nothing more to scan before it has scanned its quota, it stops and lets the transaction commit. Background GC, however, may be requested to flip the virtual space every time it reaches 100%. If so set, it is possible for the GC to churn the entire contents of the virtual space back and forth at a high rate, even if there are no garbage objects at all, wasting CPU cycles and disk flushing time. The Database Administrator should set the background GC run frequency (i.e. the delay in milliseconds between each run) and the number of objects to be scanned in each run to meet operational requirements, considering the applications’ average transaction size and frequency.
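A rough estimate of that churn, assuming automatic flipping is enabled and ignoring the time each run itself takes, can be worked out as follows. The function and the object counts are made up for the illustration; only the 3000 millisecond delay is the default quoted above.

```python
import math


def full_copies_per_hour(live_objects, objects_per_run, delay_ms):
    """How often per hour the whole space is copied if the GC flips at 100%."""
    runs_per_pass = math.ceil(live_objects / objects_per_run)
    pass_ms = runs_per_pass * delay_ms      # time for one complete pass
    return 3_600_000 / pass_ms


# 20,000 live objects, default 3000 ms delay between background runs.
print(full_copies_per_hour(20_000, 5_000, 3_000))  # 300.0
print(full_copies_per_hour(20_000, 500, 3_000))    # 30.0
```

With the larger quota the whole space is rewritten every twelve seconds whether or not it contains any garbage, so the quota and delay are best chosen against the rate at which garbage is actually produced.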

Note that a GC flip request (explicit or automatic) will succeed only if the image which is running the GC is the only image logged-on to that virtual space; this is because the GC has no knowledge of, and therefore cannot preserve, uncommitted non-garbage objects in another image.
 

24×7 uptime?

The actual flip of the GC is done with exclusive access to the virtual space, blocking all other activity on it, but it takes less than a second, so the eager GC strategy described above will allow continuous service uptime, save only for the need to log-off all but one image from time to time to allow the flip to take place.

Downtime for backup and possible hot-backup enhancement of VOSS will be the subject of a future post.

In practice, an application may have a greater transaction rate at some times of the day and week than at others, and the load during busy times may be such that the foreground and/or background GC rate and frequency need to be changed to match. This may be done at any time, either via the Control Panel or by sending messages programmatically to the VOManager concerned, from a time-scheduled process if desired.




 
