Re: [Catacomb] external storage of document data



On Mon, 3 Mar 2003, Chris Knight wrote:
> Bryce Harrington wrote:
> >I'm developing a web-based document management system at OSDL, and we're
> >going with a directory structure based on the document id, like you
> >describe.  The lack of that capability in Catacomb has been a stumbling
> >block for us in being able to incorporate DAV support into it, so I'd
> >*love* to see it supported by Catacomb.
> >
> >Specifically, I'm using a repository structure like this:
> >
> >   /[doc_id]/[doc_rev]/[language]/[file_num],[base].[ext]
> >
> >
> Actually, what about having a function in dbms.c that looks like:
>
> int generate_filepath(char **path, dav_repos_resource *r)
> // note: should run apr_file_mkdirs
>
> With that, you could then come up with your own conventions. I'd go even
> further with:
>
> /[doc_id % 100]/[doc_id / 100]/[doc_id]/[doc_rev]...
>
> This "stripes" the resources across a set of 100 top-level and 10,000
> second-level directories (100 per top-level.) Want to move ~45% of the
> total space to another drive? Move the top level directories 00 through
> 44 (and install symbolic links or alter the generate_filepath?)

I've thought a bit about doing something like this.  I decided against
it for now as a matter of simplicity.  Also, Linux can handle several
thousand files/subdirs per dir (some file systems allow even more).  As
far as disk space goes, even if you assume a 10 MB average per doc (our
average is actually around 100 KB), 10,000 documents is only 100 GB of
space.

But if it did become necessary to divide documents across multiple
file systems - and you couldn't do it in hardware via SAN or something - then
something more flexible than having all the subdirs in a single dir
would be necessary, and an approach like you suggest would be valuable.
Maybe I can figure out a way to make it configurable, so an admin can
(re-)define the pathing strategy to suit their needs.
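
To make the configurable idea a bit more concrete, here is a rough sketch
of what a switchable pathing function might look like. It is only an
illustration, not code from Catacomb or from our system: the doc_ref
struct, the path_strategy enum and the name generate_dirpath are all
made up for this example (the real dav_repos_resource fields aren't
shown in this thread), and it only builds the directory part of the
path. Creating the directories is left out.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for whatever fields of dav_repos_resource a
 * path generator would need; invented for this sketch. */
typedef struct {
    unsigned long doc_id;
    unsigned int  doc_rev;
} doc_ref;

/* Hypothetical admin-selectable layouts. */
typedef enum {
    PATH_FLAT,    /* /[doc_id]/[doc_rev] */
    PATH_STRIPED  /* /[doc_id % 100]/[doc_id / 100]/[doc_id]/[doc_rev] */
} path_strategy;

/* Writes the directory path for a document into buf.
 * Returns 0 on success, -1 if buf is too small or the strategy is unknown. */
int generate_dirpath(char *buf, size_t len, path_strategy s, const doc_ref *d)
{
    int n;

    switch (s) {
    case PATH_FLAT:
        n = snprintf(buf, len, "/%lu/%u", d->doc_id, d->doc_rev);
        break;
    case PATH_STRIPED:
        /* Top level is zero-padded to two digits (00 through 99). */
        n = snprintf(buf, len, "/%02lu/%lu/%lu/%u",
                     d->doc_id % 100, d->doc_id / 100,
                     d->doc_id, d->doc_rev);
        break;
    default:
        return -1;
    }

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this sketch, document 12345 at revision 3 would land in /12345/3
under the flat layout and in /45/123/12345/3 under the striped one, so
switching strategies is a one-argument change for the caller.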

Bryce