Following the previous incursions in Oracle and their own documentation, we decided it was useful to attach schemas to the inserted data, just because only XML with schemas attached can be indexed. When you have some million entries, indexing seems like a good idea (once you decide what to index).

In Oracle, you have DBMS_XMLSCHEMA.registerSchema() to do just that. It has a bunch of parameters which are thoroughly described in their documentation. Suffices to say that you can choose to generate tables, objects or java beans off the loaded schemas. An example of call can be:

dbms_xmlschema.registerSchema(
    schemaURL => 'mySchema.xsd',
    schemaDoc => '<schema document as a string>',
    local => true,
    genTypes => false,
    genTables => true,
    enableHierarchy => dbms_xmlschema.ENABLE_HIERARCHY_NONE
);

In this case, we generate the associated tables but we don't generate custom types.

A Concrete Example

As mentioned elsewhere, XSDs can have dependencies and Oracle supports this (at some extent). My list of schemas in dependency order (i.e. the basic ones are at the beginning) is defined by:

SCHEMA_ORDER = [
  'xocs-ani512-xml.xsd',
  'xocs-ani512-ani-ce.xsd',
  'xocs-ani512-ce.xsd',
  'xocs-ani512-cto.xsd',
  'xocs-ani512-ait.xsd',
  'xocs-ani512.xsd',
]

When actually writing the loader code, I've decided that using PL/SQL throughout the DB-related code would be preferred. This is also quite feasible because I'm not performing selects to require sequential access to data. To this effect, my insertion string looks like:

INSERT_STR_MULTI = """
declare
    vSCHEMA clob;
begin
    %(segments)s

  dbms_xmlschema.registerSchema(
    schemaURL => '%(name)s',
    schemaDoc => vSCHEMA,
    local => true,
    genTypes => false,
    genTables => true,
    enableHierarchy => dbms_xmlschema.ENABLE_HIERARCHY_NONE
  );

end;
    """

This string is using the approach described in this previous article.

The whole process of loading them is:

for xsd_file in SCHEMA_ORDER:
    xsd_path = os.path.join('/my/xsd/dir', xsd_file)

    # Add the schema
    #
    try:
        print "Insert [%s] schema" % xsd_file
        xsd_content = open(xsd_path).read()

        if len(xsd_content) > 3000:
            print "... Large XML file. Splitting"
            result = ""
            while len(xsd_content) > 3000:

                result += "vSCHEMA := CONCAT(vSCHEMA, '%s');\n" % xsd_content[:3000]
                xsd_content = xsd_content[3000:]

            result += "vSCHEMA := CONCAT(vSCHEMA, '%s');\n" % xsd_content

            insert_str = INSERT_STR_MULTI % \
                {
                    'name': xsd_file,
                    'segments': result,
                }
        else:
            insert_str = INSERT_STR % \
                {
                    'name': xsd_file,
                    'content': xsd_content,
                }

        cursor.execute(insert_str)

        connection.commit()
    except Exception as e:
        print "Fail to add [%s]" % xsd_path, e

Several explanations:

  • The loop iterates through the list of schemas and loads them sequentially
  • The try ... catch code is designed to show me if any XSD fails (e.g. missing dependency) without taking the whole program down.
  • The inner code contains the logic for splitting large schemas

References

HTH,