One of the things I had to do is to build a script to upload data (e.g. the content of a list of files) into a database with a single table. My table has 3 columns:
- ID - a 20-character string identifier with PK properties (unique)
- ARTICLE- an XMLType element with the article details
- CITEDBY - an XMLType element with citations data
The problem is that some of the data is already in the database and it may only need updating. Following this question, the selected strategy is to perform the update and, if the record does not exist, then insert it. The SQL code equivalent is (from here):
UPDATE tablename
SET val1 = in_val1,
val2 = in_val2
WHERE val3 = in_val3;
IF ( sql%rowcount = 0 )
THEN
INSERT INTO tablename
VALUES (in_val1, in_val2, in_val3);
END IF;
If this were only SQL, everything would be OK. However, the requirement is to perform the operation from python, so I can only do one statement at a time. Moreover, I can't do an IF statement. To be a bit DRY (just in case I need to reuse this somewhere else), I've created a simple function:
def insert_or_update(connection, identifier, article, citations=None):
cursor = connection.cursor()
Update
First attempt is to update a record, if it exists:
cursor.setinputsizes(cx_Oracle.CLOB, cx_Oracle.CLOB, 20)
cursor.prepare(UPDATE_STATEMENT)
cursor.execute(None, (article, citations, identifier))
connection.commit()
My update statement is:
update ARTICLE_XML_DATA set
article = XMLType(
xmlData => :1,
schema => 'article-description.xsd',
validated => 0,
wellformed => 0
),
citedby = XMLType(
xmlData => :2
)
WHERE id = :3
To make usable in the python script, you need to wrap it into UPDATE_STATEMENT=""" ... """.
You can see the statement will have an effect only if the table contains a record with the required identifier. Otherwise, nothing happens.
Insert
Since I can't perform a composite statement (with an IF) without moving to PL/SQL, I need to emulate this. The simplest way to do it is to perform an INSERT anyway and catch the error if the record is already there. The resulting code is described below:
try:
cursor.setinputsizes(cx_Oracle.CLOB, cx_Oracle.CLOB, 20)
cursor.prepare(INSERT_STATEMENT)
cursor.execute(None, (article, citations, identifier))
connection.commit()
print >> sys.stderr, "++ New Entry:", identifier
except cx_Oracle.IntegrityError, e:
if ("%s" % e.message).startswith('ORA-00001:'):
print >> sys.stderr, "Entry already there:", identifier
else:
raise e
As you can see, I'm catching the IntegrityError and check if it's for a duplicate entry (my identifier field has a primary key -- unique -- constraint). If yes, then I'm just noting it. Otherwise, I'm propagating the error further.
The insert string is very similar to the update string above:
INSERT into ARTICLE_XML_DATA(article, citedby, id)
values (
XMLType(
xmlData => :1,
schema => 'article-description.xsd',
validated => 0,
wellformed => 0
),
xmltype(:2),
:3
)
Then, at the end I perform a cursor closing (to avoid the max cursors error):
finally:
cursor.close()
Function
The full function contents is:
def insert_or_update(connection, identifier, article, citations=None):
cursor = connection.cursor()
cursor.setinputsizes(cx_Oracle.CLOB, cx_Oracle.CLOB, 20)
cursor.prepare(UPDATE_STATEMENT)
cursor.execute(None, (article, citations, identifier))
connection.commit()
try:
cursor.setinputsizes(cx_Oracle.CLOB, cx_Oracle.CLOB, 20)
cursor.prepare(INSERT_STATEMENT)
cursor.execute(None, (article, citations, identifier))
connection.commit()
print >> sys.stderr, "++ New Entry:", identifier
except cx_Oracle.IntegrityError, e:
if ("%s" % e.message).startswith('ORA-00001:'):
print >> sys.stderr, "Entry already there:", identifier
else:
raise e
finally:
cursor.close()
Some notes
- The function can be further enhanced to allow for flexible arguments, different strings
- You could perform the PL/SQL statements in a more complex statement
Member discussion: