Login | Register
My pages Projects Community openCollabNet

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Catacomb] reason for dasl_resource and version_resource as separate tables?



Kai Pan wrote:

Hi Chris,
When we implemented Catacomb-v, we also consider the design choices of
keeping dasl_resource and version_resouce in one table and keeping them in
separate tables. Finally, we decided to put them in separate tables for 3
reasons.

First, we wanted to minimize the impact on the Catacomb source code
caused by the change of the table structure of 'dasl_resource'. If we make
'dasl_resource' and 'version_resource' the same table, say
'dasl_resource', the primary key of 'dasl_resource' will be changed,
because one more column 'version' will join the primary key and the
'serialno' column can not be 'auto_increment' type any more. This will
cause a lot of changes to the Catacomb source code and every SQL statement
related to 'dasl_resource' table should be changed.
Ah, excellent explanation, thanks much! Makes much more sense.
You could use a sequences (see the LAST_INSERT_ID() definition in http://www.mysql.com/documentation/mysql/bychapter/manual_Reference.html#Other_Functions for details) instead of an auto-increment column...There may be other reasons to do this, but it's always a design decision that I currently have no opinion on one way or another.)

Secondly, now that you mention the primary key being both the serialno and the version, version '0' could be used to signify a non-versioned document (and all versions start with 1.) Or, alternately, a really really large number (maxint?) if that seems more comfy.

Agreed, it would change much of the other SQL code.

Second, 'dasl_resource' and 'version_resource' tables have different
meaning. 'dasl_resource' stores the current information and content of a
resource, and 'version_resource' contains the change history of a
resource. The 'textcontent' field in the 'version_resource' table could
only contain the delta (difference) between two versions of a resource, if
we apply 'diff' algorithm in Catacomb in future, which differs from the
'textcontent' field in 'dasl_resource' table.

So I can continue to interact with dasl_resource normally unless I specifically need an older version of a resource.

Third,it is the performance reason. We assume that most resources are not
under version control in a Catacomb server. So, we keep the
'dasl_resource' table small and make the primary key more efficient by
separating 'dals_resource' and 'version_resource' tables. In this way, the
searching operation (dasl) and other operation (COPY, MOVE, PROPFIND,
etc.) will be faster.

Yes and no, when you are searching across all versions you'll have to search against twice as many tables.

If (a future version of) Subversion were installed on top of Catacomb, your assumption would have to be reversed. ;^)

I'll continue with my external file storage code now that I understand the reasoning behind the schema changes and hopefully I'll have code to check in next week or the week after.