Re: [Catacomb] external storage of document data
On Fri, 28 Feb 2003, Chris Knight wrote:
> All, we've implemented external storage of file contents for Catacomb
> and I'd like to discuss merging it back into the main Catacomb release.
> It's fairly straightforward, however there is one (philosophical?)
> decision to make. I'd like to hear your opinions and see if this
> activity is already underway by someone else.
> Should the names/organization in the filesystem mirror the structure in
> Catacomb? Another option is files can be stored in a hashed directory
> structure based on the document id.
> Advantages of mirroring the structure in the fs:
> * easy to understand
> * can be interacted with outside of DAV
> Disadvantages:
> * performance of containers with many (thousands) resources and for
> very deep container hierarchies
> * hard to divide between filesystems (a useful operation as the DAV
> repository grows)
> * requires a filesystem operation for MOVE operations
> * how to store multiple versions of a document?
> * changes outside of the DAV interface should be mirrored back into DAV
> * limited to the character set restrictions of the fs (or need to store
> the filesystem name of the file)
> FYI, I used to work on a web-based document management system and we
> moved from mirroring the structure to using a hash structure to resolve
> problems that occurred when people did many move operations. Conversely,
> our Catacomb server we are working with here at NASA mirrors the
> structure into the filesystem (and I may work on a system to mirror back
> into Catacomb when changes are performed on the fs.)
I'm developing a web-based document management system at OSDL, and we're
going with a directory structure based on the document id, like you
describe. The lack of that capability in Catacomb has been a stumbling
block for us in being able to incorporate DAV support into it, so I'd
*love* to see it supported by Catacomb.
Specifically, I'm using a repository structure like this:
[file_num] is there for documents that are aggregates of multiple files
(such as a webpage and its img's, or a PDF plus attached spreadsheet).
The [base] filename is a filesystem-clean version of the filename, just
used for convenience.
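
To make the id-based idea concrete, here is a rough Python sketch of one
possible layout. It is not the exact structure used in our repository:
the fan-out on the trailing digits of the id, the zero-padding widths,
and the names clean_base and repo_path are all assumptions made up for
illustration. Only the doc id / [file_num] / [base] ingredients come
from the description above.

```python
import re
from pathlib import PurePosixPath


def clean_base(filename: str) -> str:
    """Reduce a filename to a conservative, filesystem-clean form."""
    # Collapse every run of characters outside a safe set into one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", filename).strip("._")
    return cleaned or "file"


def repo_path(root: str, doc_id: int, file_num: int, filename: str) -> PurePosixPath:
    """Build a storage path from the document id (hypothetical layout)."""
    padded = f"{doc_id:08d}"
    # Fan out on the trailing digits so sequential ids spread across
    # directories instead of piling up in a single one.
    return PurePosixPath(
        root,
        padded[-2:],
        padded[-4:-2],
        padded,
        f"{file_num:03d}-{clean_base(filename)}",
    )


print(repo_path("/var/dms", 1234, 1, "Q1 Report (final).pdf"))
```

Since the path depends only on the id, renaming or moving a document in
the DAV namespace never has to touch the stored file.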
I'm presently in the process of writing a "folder mapper" to enable
autogenerating navigable "renders" of the folder trees. The idea is
to allow teams to sort and navigate their documents in a fashion
they're used to. Sort of like a traditional hierarchical file system,
except that a given document could show up in multiple locations in the
tree (sort of like symlinks).
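
As a rough illustration of the rendering step (a sketch only, not the
folder mapper itself), the following Python snippet builds a folder
tree out of symlinks that point back into the repository. The function
name render_tree and the shape of the two mappings are assumptions for
this example, and it assumes a filesystem that supports symlinks.

```python
import os
from pathlib import Path


def render_tree(render_root, folders, repo_files):
    """Render folders as directories of symlinks into the repository.

    folders:    {folder path: {link name: doc id}}
    repo_files: {doc id: path of the stored file in the repository}
    """
    root = Path(render_root)
    for folder, entries in folders.items():
        target_dir = root / folder
        target_dir.mkdir(parents=True, exist_ok=True)
        for link_name, doc_id in entries.items():
            link = target_dir / link_name
            # Replace a link left over from an earlier render. A regular
            # file in the way is left alone and makes os.symlink fail.
            if link.is_symlink():
                link.unlink()
            os.symlink(repo_files[doc_id], link)
```

The same doc id can be listed under any number of folders, which is
what lets one document show up in several places in the tree.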
The folder mapper, when it renders the directory structure, names the
files according to a user-specified document title, rather than using
the ID numbers used in the repository. In theory this means that file
naming could be done in a manner consistent with the
capabilities of the particular file system being rendered to (e.g.,
using spaces on Windows but underscores instead on Linux), although in
practice I'm really only concerned with Linux filesystems.
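
A minimal sketch of that naming idea in Python follows. The function
render_name and the target labels "windows" and "linux" are
hypothetical names for this example; the character rules are the usual
ones (Windows forbids < > : " / \ | ? * and control characters, and
trailing dots or spaces; Linux forbids only / and NUL).

```python
import re

# Characters that may not appear in a filename on each target.
_FORBIDDEN = {
    "windows": r'[<>:"/\\|?*\x00-\x1f]',
    "linux": r"[/\x00]",
}


def render_name(title: str, target: str) -> str:
    """Turn a user-specified document title into a filename for target."""
    name = re.sub(_FORBIDDEN[target], "", title).strip()
    if target == "linux":
        # Underscores instead of spaces, as described above.
        name = re.sub(r"\s+", "_", name)
    elif target == "windows":
        # Spaces are kept, but a name may not end in a dot or a space.
        name = name.rstrip(". ")
    return name or "untitled"


print(render_name("Q1 Report: Draft?", "windows"))  # Q1 Report Draft
print(render_name("Q1 Report: Draft?", "linux"))    # Q1_Report:_Draft?
```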
I want to include branching capabilities in this but haven't put much
thought into that part of the design. (As usual, the issue is mostly
with regard to merging.)
I had initially wanted to make this folder mapper accessible through
WebDAV (via Catacomb) but couldn't figure out how to implement it (and
ran into some other DAV issues, such as the inability to impose
document- or author-specific access control), so we've postponed
adding DAV support to the document system until support is available.
I've implemented it as a semi-standalone Perl module, Document::Archive,
which I'm hoping could be used independently for general-purpose
document access; I've attempted to structure it in a fashion compatible
with being a DMS-oriented wrapper around PerlDAV or similar.
So, anyway, _strong_ encouragement from me on development along these
lines. I am approaching a release of our DMS with this folder mapper
feature hopefully/probably within a few weeks, and I'll be sure to
announce it here. I can give more information ahead of that, if there's