Database Management for Smalltalk


Mon 14th Apr 2008   03:04 PM
posted by John Clapperton

‘Garbage’, in an object-oriented database system, means objects which have become unreachable because the application has removed all references to them from other objects, and which are therefore to be removed to reclaim storage space. The garbage collector (‘GC’) in VOSS is a Baker design, similar in concept to the garbage collector in Smalltalk-80, and can be run in foreground and/or background modes, of which more later.

Unlike the mark-sweep design, which truly is a ‘garbage collector’, the Baker design would be better characterised as a ‘good object preserver’: it copies all reachable objects within a virtual space back and forth from one ‘semi-space’ to another, deleting everything left behind on each flip. This has consequences for database design, administration, and the choice of GC settings, to tune for maximum transaction throughput and minimum downtime.
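
The copy-and-flip idea can be illustrated with a toy model in plain Smalltalk. This is only a sketch of the general semi-space technique, not the VOSS implementation: the Dictionary standing in for a semi-space, the Symbols used as object ids, and all variable names are invented for illustration.

```smalltalk
"Toy model of a semi-space flip; not VOSS code.
 Each object is an id (a Symbol) mapped to the ids it references."
| fromSpace toSpace toScan id |
fromSpace := Dictionary new.
fromSpace at: #root put: #(#a #b).
fromSpace at: #a put: #(#c).
fromSpace at: #b put: #().
fromSpace at: #c put: #().
fromSpace at: #orphan put: #(#a).   "references #a, but nothing references it"

toSpace := Dictionary new.
toScan := OrderedCollection with: #root.
[toScan isEmpty] whileFalse: [
    id := toScan removeFirst.
    (toSpace includesKey: id) ifFalse: [
        toSpace at: id put: (fromSpace at: id).   "preserve the reachable object"
        toScan addAll: (fromSpace at: id)]].      "and queue whatever it references"

"The flip: everything left behind in the old semi-space is discarded."
fromSpace := toSpace.
fromSpace keys   "holds #root, #a, #b and #c; #orphan was never copied"
```

Note that nothing ever decides that #orphan is garbage; it is simply not among the objects copied, which is why ‘good object preserver’ is the more accurate description.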

Know which space your objects are in.

In a database design using multiple virtual spaces, the most important thing to know is that when the VOSS GC flips, it preserves only those objects in a virtual space which are reachable, directly or indirectly, from the rootDictionary of that virtual space or from the image in which the GC is then running (in the case of uncommitted new objects). In other words, any object in virtual space ‘A’ which is referenced only by an object in virtual space ‘B’, though it will behave normally whilst present, will not be preserved when the GC flips virtual space ‘A’: references from objects in another virtual space are not, on their own, sufficient to preserve an object from the GC of the virtual space in which it exists. In its systematic copying, the GC does not follow references to objects outside the virtual space which it is scanning.

This situation must be avoided, as after such a flip, those objects would become instances of VOUndefinedObject, and later magically become some arbitrary new object when the id number of that apparently garbage object in space ‘A’ was re-allocated to a new object, having spent some days, weeks or months on space ‘A’s free id list.
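
The hazard can be seen in a toy model of two virtual spaces, again in plain Smalltalk. It is a sketch of the rule described above, not VOSS code; the Dictionaries, ids and variable names are all invented.

```smalltalk
"Toy model, not VOSS code: two virtual spaces, each a Dictionary of id -> referenced ids.
 #invoice lives in space A but is referenced only from #customer in space B."
| spaceA spaceB survivorsA toScan id |
spaceA := Dictionary new.
spaceA at: #rootA put: #(#ledger).
spaceA at: #ledger put: #().
spaceA at: #invoice put: #().

spaceB := Dictionary new.
spaceB at: #rootB put: #(#customer).
spaceB at: #customer put: #(#invoice).   "cross-space reference into A"

"The scan of space A starts from A's root and only visits ids held in A;
 the references held in spaceB are never consulted."
survivorsA := Set new.
toScan := OrderedCollection with: #rootA.
[toScan isEmpty] whileFalse: [
    id := toScan removeFirst.
    ((spaceA includesKey: id) and: [(survivorsA includes: id) not]) ifTrue: [
        survivorsA add: id.
        toScan addAll: (spaceA at: id)]].

survivorsA includes: #invoice   "false: #invoice would not survive the flip of space A"
```

Had #ledger, or anything else reachable from #rootA, also referenced #invoice, it would have been preserved.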

The simplest way to keep this right is always to use one of the explicit variants of the message to virtualize an object, preferably on creation, which specify the location explicitly, for example:

  myObject := MyClass newVirtualIn: aVOManager.

The non-specific variants, for example:

  myObject := MyClass newVirtual.

virtualize the new instance in the current virtual space of the current process, i.e. the virtual space which hosts the last object to have received a message in the current process.
 

Why use multiple virtual spaces?

One reason for distributing a database across multiple virtual spaces may be that some parts of the database are static, whilst others are volatile - subject to frequent update and new object creation - and in this case unnecessary GC processing can be avoided by suitable partitioning of the database, so that the more static virtual space is GC’d less frequently, if at all.
 

Foreground or Background Garbage Collection?

Foreground GC scans and saves a specified number of reachable objects as an addendum to each transaction commit, before that transaction’s changes are physically written to disk, to minimise disk activity. The effect, to the user, is that each transaction commit takes longer than it otherwise would, depending on the number of objects it is set to scan.

Background GC runs as one or more separate background processes, effectively dummy users committing null transactions, which each do some GC as above. Within each image, background GC processes commit their invisible dummy transactions once every user-specified time delay, by default 3000 milliseconds, and scan a user-specified number of objects each time (which may be different from the foreground scan-rate). Foreground and background GC may run concurrently.

The advantage of background GC is that it utilises CPU cycles in between application transaction commits; the disadvantage is that it adds to the total amount of disk flushing activity, whereas foreground GC-writes are flushed within the same transaction commit - which was going to happen anyway.

Foreground GC is preferable if the application consists mainly of frequent transactions which are a short time in the preparation before commit (i.e. whilst the application has control); background GC is preferable if transactions are less frequent and/or a long time in the preparation, especially open-ended interactive transactions, when background GC can go on whilst the user is thinking.

If the operational objective is continuous operation with no downtime for GC, then, on average, the GC should scan objects for preservation at a rate equal to the average rate of object creation or modification per transaction, so that there is no backlog of new and/or changed objects to be scanned when a GC flip is requested, to delete the garbage and start the reverse trek. This is most simply achieved by setting a high GC scan rate, so that the GC is at or nearly at the end of its task, ready to flip, after each transaction. The visual GC Progress Indicator shows how close to 100% the GC is at any time.
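
As a rough illustration of matching the scan rate to the workload, the arithmetic below estimates the smallest scan settings that keep pace on average. The workload figures are invented for the example; only the 3000 millisecond delay comes from the text above (the default background GC delay).

```smalltalk
"Illustrative arithmetic only; the workload figures are invented."
| objectsTouchedPerTxn txnsPerSecond delayMs foregroundScanCount backgroundScanCount |
objectsTouchedPerTxn := 40.   "average objects created or modified per transaction"
txnsPerSecond := 5.
delayMs := 3000.              "default delay between background GC runs"

"Foreground GC runs once per commit, so it must cover one transaction's worth."
foregroundScanCount := objectsTouchedPerTxn.

"Background GC runs once per delay, so it must cover everything touched in that interval."
backgroundScanCount := objectsTouchedPerTxn * txnsPerSecond * delayMs / 1000.

backgroundScanCount   "600"
```

These are break-even figures; setting the scan count comfortably above them is what keeps the GC at or near 100% between flips.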

Foreground GC does not attempt to flip automatically, and so will not perform unnecessarily eager copying of the reachable objects - if there is nothing more to scan, before it has scanned its quota, it stops and lets the transaction commit. Background GC, however, may be requested to flip the virtual space every time it reaches 100%, and if so set, then it is possible for the GC to churn the entire contents of the virtual space back and forth at a high rate, even if there are no garbage objects at all, wasting CPU cycles and disk flushing time. The Database Administrator should set the background GC run frequency (i.e. the delay in milliseconds between each run) and the number of objects to be scanned in each run, to meet operational requirements, considering the applications’ average transaction size and frequency.

Note that a GC flip request (explicit or automatic) will succeed only if the image which is running the GC is the only image logged-on to that virtual space; this is because the GC has no knowledge of, and therefore cannot preserve, uncommitted non-garbage objects in another image.
 

24×7 uptime?

The actual flip of the GC is done with exclusive access to the virtual space, blocking all other activity on it, but it takes less than a second, so the eager GC strategy described above will allow continuous service uptime, save only for the need to log-off all but one image from time to time to allow the flip to take place.

Downtime for backup and possible hot-backup enhancement of VOSS will be the subject of a future post.

In practice, an application may have a greater transaction rate at some times of the day and week than at others, and the load during busy times may be such that the foreground and/or background GC rate and frequency need to be changed to match. This may be done at any time, either via the Control Panel or by programmed message-sending to the VOManager concerned, from a time-scheduled process if desired.
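
A time-scheduled process of this kind might look roughly like the following Smalltalk-80 style sketch. The busy hours, scan counts and ten-minute interval are invented, and no VOSS selector is shown: the applyScanCount block merely writes to the Transcript and stands in for whatever rate-setting message your VOSS version provides for the VOManager.

```smalltalk
"Sketch only. applyScanCount is a placeholder for the real rate-setting message
 to the VOManager; all figures are invented."
| applyScanCount scanCountFor adjuster |
applyScanCount := [:count |
    Transcript show: 'GC scan count -> ', count printString; cr].

"Choose a scan count from the hour of the day (0 to 23)."
scanCountFor := [:hour |
    (hour between: 9 and: 17) ifTrue: [2000] ifFalse: [200]].

"Re-apply the setting every ten minutes in a low-priority process."
adjuster := [
    [true] whileTrue: [
        applyScanCount value: (scanCountFor value: Time now asSeconds // 3600).
        (Delay forSeconds: 600) wait]] forkAt: Processor userBackgroundPriority.

"adjuster terminate.   stops the scheduled process when it is no longer wanted"
```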


Wed 9th Jan 2008   06:01 PM
posted by John Clapperton

Where do you keep your behavior? Normalised in the application domain objects? In non-domain transaction-performing classes? Some of each? I seem to remember this question being touched on once long ago in Digitalk’s Compuserve forum, but never since.

One of the benefits of a persistent object database is that not only static integrity constraints but arbitrarily complex procedures can be expressed just once in the appropriate application domain object, rather than being scattered and/or replicated in a number of external function-oriented procedures - normalisation of procedure, in other words. However, the downside of this is that if an intensively used method is located in a domain class of which there are relatively few instances, then those objects may become locking hotspots which destroy concurrency, even though they themselves may not be changed, merely carrying transactional responsibility.

At the other extreme, locating transaction procedures in non-database objects which call only get and set methods in the domain objects, maybe with simple integrity constraint methods, imposes no additional constraint on concurrency, but at the cost of additional work during application design and modification, to ensure that update transactions are all consistent with each other’s implied integrity constraints.

Design by CRC cards (class, responsibility, collaboration) would seem to identify the theoretical location of transaction responsibility, but how does this work out in practice? Are there common patterns from which guidelines might be found for these design decisions?


 