VirtualDictionarySet (a virtualized subclass of DictionarySet) is the workhorse collection class to use for aggregation of objects in a VOSS database. Typically, all the objects in a DictionarySet will be of the same class (e.g. Employee, Department, Project, Order, Invoice etc), but not necessarily.
DictionarySet allows its elements to be indexed by any number of single-valued and/or multi-valued key-selector unary messages (e.g. #employeeNumber (single-valued), #lastName (multi-valued), #deptNumber (a foreign key in Employees), #deptNumber (primary key in Departments) etc), as required by the application.
Complex queries are built by sending messages such as <#for: #lastName equals: 'Smith'> to the DictionarySet, which answers the set of its elements meeting that criterion. There is an example of a complex query here.
The tip here, in the interests of efficiency, is not to “change the class” of the returned partial query sets (e.g. #asOrderedCollection) until after all the intersections and/or unions of partial query sets have been done to build the complex query - and even then there is rarely any need to do this.
VirtualDictionarySet>>for:equals: and the related query methods return a VOLogicalOrderedIdentitySet of virtual objects. This set compares the logical identity of the virtual objects it contains (and of any object being added) by comparing each object's #voManager identity and its integer object #id. Both are held in the object's proxy VORef, which is what is actually stored in the set, so no #= message needs to be forwarded, expensively, to the instantiated object in the voManager's cache.
Moreover, VOLogicalOrderedIdentitySet is hashed (on the #id in the VORef) and needs to compare identity only when probing hash collisions. It therefore executes #includes: much faster than OrderedCollection or other non-hashed collections, which must compare equality against every element when looking for an absent object and, on average, against half the elements when looking for a present object, forwarding #= to the instantiated object each time.
It is therefore very much faster when constructing complex queries to do all union and intersection of the partial query sets using the VOLogicalOrderedIdentitySets as returned by the partial query components.
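The point can be illustrated outside VOSS with a small Python model. This is only a sketch under stated assumptions: the Ref class is a hypothetical stand-in for a VORef proxy, and Python's built-in set stands in for VOLogicalOrderedIdentitySet. It is not VOSS code and none of these names exist in VOSS.

```python
class Ref:
    """Hypothetical stand-in for a VORef proxy (not a VOSS class)."""

    def __init__(self, manager, id_):
        self.manager = manager
        self.id = id_

    def __eq__(self, other):
        # Logical identity: same manager object and same integer id.
        # The referenced object itself is never consulted.
        return (isinstance(other, Ref)
                and self.manager is other.manager
                and self.id == other.id)

    def __hash__(self):
        # Hash on the integer id only, as the text describes.
        return hash(self.id)


hr = object()  # assumed stand-in for a voManager


def refs(ids):
    return {Ref(hr, i) for i in ids}


# Three partial answer sets, as if returned by three partial queries.
smiths = refs(range(0, 1000))
sales = refs(range(500, 1500))
managers = refs(range(900, 2000))

# Do all intersections and unions while the results are still hashed sets.
answer = (smiths & sales) | (smiths & managers)

# A fresh proxy for the same logical object is found by hash, then identity.
found = Ref(hr, 750) in answer

# Change the class once, at the very end, and only if order is needed.
ordered = sorted(answer, key=lambda r: r.id)
```

Converting each partial result to a list before combining would turn every membership test into a linear scan, which is the cost the tip above avoids.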
Join the forum discussion
The previous release (3.145.01) introduced buffered transaction logging, in which a log archive daemon process archives the contents of the log buffer at specified intervals, increasing the maximum commit rate to 50 transactions per second on desktop hardware. This new release 3.145.02 allows for log archiving to be disabled, further increasing transaction throughput, for example up to 150 logged random create/inserts per second into a VirtualDictionary of 10 million objects (depending on cache settings, and with no concurrent garbage collection).
Log archiving, present from the earliest release of VOSS, saves the transaction log as a virtual object in a separate virtual space, making it available to an application for audit trails etc. However, if this is not required, the new archive disable feature allows significantly higher performance. With archiving disabled, the log buffer file is no longer a buffer: it is the log, growing indefinitely instead of being emptied every 1500 milliseconds. The backup procedure is nevertheless unchanged, starting a new empty log which records the backup timestamps.
The rollforward recovery procedure is also unchanged: it transparently applies any archived log entries to the virtual space backup copies being rolled forward, and then applies the entries in the log buffer.
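The ordering described above can be pictured with a toy Python model. It is a sketch only: the log entry format (a key and a new value) and the function name roll_forward are assumptions made for illustration and say nothing about the actual VOSS log format.

```python
def roll_forward(backup, archived_entries, buffer_entries):
    """Toy model: replay archived entries first, then log buffer entries.

    Each entry is an assumed (key, new_value) pair, not the VOSS format.
    """
    state = dict(backup)  # work on a copy of the backup
    for key, value in list(archived_entries) + list(buffer_entries):
        state[key] = value
    return state


backup = {"a": 1}
archived = [("a", 2), ("b", 10)]   # entries already moved to the archive
buffered = [("b", 11), ("c", 5)]   # entries still in the log buffer file

# Archiving enabled: entries are split between archive and buffer.
recovered = roll_forward(backup, archived, buffered)

# Archiving disabled: nothing is archived, every entry is in the log file.
recovered_no_archive = roll_forward(backup, [], archived + buffered)
```

In this model both recoveries reach the same state, which is why the procedure can stay the same whether or not archiving is enabled.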
VOSS 3.145.02 is available for download here
John
When an object is added to a DictionarySet (or VirtualDictionarySet), it is added to each of the DictionarySet’s component AutoDictionaries, each of which sends its defined unary message selector to the object to obtain the key at which it is to be inserted.
If the object returns a VOKeySet or VOKeyCollection (which should also normally be a set of unique keys) then the object is added into that AutoDictionary at each of those keys; this may be useful, for example, if the DictionarySet elements are published books or papers, each of which may have been written by several contributing authors. The consequence of this, however, is that the DictionarySet would no longer be a set, as a book having three authors will be present three times, once at each author key, and this could cause unexpected behaviour in DictionarySet>>do: and other enumeration methods.
For this reason, when a DictionarySet is created it is automatically initialized with a baseDictionary on the key selector #yourself, which is thus a set, and DictionarySet>>do: etc. all operate on this baseDictionary.
This presents an opportunity for optimisation: if it is known that none of the DictionarySet elements will ever return a VOKeySet or VOKeyCollection of keys, the baseDictionary is unnecessary and may be removed at any time, in an ordinary transaction, by the method DictionarySet>>removeBaseDictionary.
Subsequent transactions which add/remove objects to/from the DictionarySet will thus be faster.
When there is no baseDictionary, DictionarySet>>do: etc. operate on one of the component AutoDictionaries chosen at random.
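The role of the baseDictionary can be shown with a small Python model. This is a sketch under stated assumptions: DictionarySetModel and Book are hypothetical names, a Python frozenset stands in for a VOKeySet, and enumeration without a base simply uses the first index rather than one chosen at random.

```python
class Book:
    def __init__(self, isbn, authors):
        self.isbn = isbn
        self.authors = frozenset(authors)  # stand-in for a VOKeySet


class DictionarySetModel:
    """Toy model of a multi-indexed set (not the VOSS class)."""

    def __init__(self, selectors):
        self.selectors = dict(selectors)          # name -> key function
        self.indexes = {name: {} for name in selectors}
        self.base = set()                         # models the baseDictionary

    def add(self, obj):
        if self.base is not None:
            self.base.add(obj)
        for name, selector in self.selectors.items():
            keys = selector(obj)
            if not isinstance(keys, (set, frozenset)):
                keys = [keys]                     # single-valued key
            for key in keys:
                self.indexes[name].setdefault(key, set()).add(obj)

    def remove_base(self):
        self.base = None

    def __iter__(self):
        if self.base is not None:
            return iter(self.base)
        # No base: enumerate one component index (here simply the first).
        first = next(iter(self.indexes.values()))
        return (obj for bucket in first.values() for obj in bucket)


books = DictionarySetModel({
    "authors": lambda b: b.authors,   # multi-valued key
    "isbn": lambda b: b.isbn,         # single-valued key
})
books.add(Book("111", ["Ada", "Bob", "Cy"]))

visits_with_base = len(list(books))
books.remove_base()
visits_without_base = len(list(books))  # enumerates the authors index
```

With the base in place the three-author book is visited once; after removing it, enumeration over the multi-valued authors index visits the same book three times. That is why removal is only safe when no element ever returns a collection of keys.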
jc
Queries in a VOSS ODBMS are written in Smalltalk and may therefore be of arbitrary complexity, addressing an arbitrary semantic network of persistent objects. However, to simplify the most common kinds of queries, it is recommended that VirtualDictionarySet be used as the general-purpose Collection for the major aggregations of application entities in the database.
A DictionarySet may index its elements on any number of single-valued and/or multi-valued unary key selector messages, which may be added or removed at any time by ordinary (though potentially large) transactions, and DictionarySet uses these to provide efficient query-building methods which return subsets of its contents, allowing more complex queries to be built by union and intersection of these sets.
A previous article noted the optimisation possible by using #for:equalsNoCopy: which returns the actual virtual set within the VirtualDictionarySet, instead of #for:equals: which returns a copy of it, when there is no intention to add or remove elements to or from the returned answer set. This article concerns optimisation of queries using #for:between:and:.
The set returned by this method may be used to build queries in the usual way, by union and intersection with other sets, but it is as well to know that it is typically slower than #for:equals:. Consider, for example, the query: journeys for: #startDate between: aDate and: bDate. The answer set is constructed by finding (efficiently) the integer indexes of the (nearest etc.) elements at the keys aDate and bDate, then enumerating the keys between those two, using integer index access each time, and adding their values into the constructed answer set. #for:equals:, by contrast, simply answers an existing set of multiple values at a key (or, for a single-valued key, the element at that key in a constructed set by itself).
If, therefore, the partial answer set from #for:between:and: is large, it may be more efficient to intersect the sets returned by other parts of the query first, and enumerate that smaller result with #select: to return the subset of those elements which meet the range criterion.
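The two query orders can be compared in a small Python model. This is a sketch only: the data layout, the function for_between and the inclusive range bounds are assumptions made for illustration, and a filtering set comprehension stands in for #select:.

```python
from bisect import bisect_left, bisect_right


def for_between(sorted_keys, values_at, low, high):
    """Toy range query: locate both ends, then walk the keys by index."""
    lo = bisect_left(sorted_keys, low)
    hi = bisect_right(sorted_keys, high)   # inclusive upper bound (assumed)
    answer = set()
    for i in range(lo, hi):
        answer |= values_at[sorted_keys[i]]  # add every value at each key
    return answer


# Hypothetical journeys: id -> (start_day, driver)
journeys = {i: (i % 365, i % 50) for i in range(10000)}

by_day, by_driver = {}, {}
for jid, (day, driver) in journeys.items():
    by_day.setdefault(day, set()).add(jid)
    by_driver.setdefault(driver, set()).add(jid)
sorted_days = sorted(by_day)

# Order 1: build the large range set first, then intersect it.
range_first = for_between(sorted_days, by_day, 100, 299) & by_driver[7]

# Order 2: take the small equality set first, then filter it by the range.
select_last = {j for j in by_driver[7] if 100 <= journeys[j][0] <= 299}
```

Both orders give the same answer, but the first builds a range set covering 200 of the 365 day keys before discarding most of it, while the second only tests the 200 journeys of one driver.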
jc
