My task is to create a script to load a bunch of files into a DB model. The files are CSV, with quoted strings and headers.
Postgres has a COPY command, which I find fabulous for loading data. Unfortunately, the psycopg implementations (copy_from(), copy_to() and copy_expert()) have very few examples, and I found their usage a bit of a challenge.
Solution
Initially, I thought of opening the file, parsing it myself and performing a copy_from(), as in my other script dealing with XML (i.e. lots of work).
I had a glimmer of hope when I found an answer suggesting the use of copy_expert(sql, file) in psycopg. I was then able to tweak the example to suit my purpose.
copy_expert() requires a SQL statement and a file-like object. Therefore, the first step is to prepare the SQL statement for my file layout:
SQL_STATEMENT = """
COPY %s FROM STDIN WITH
CSV
HEADER
DELIMITER AS ','
"""
This statement is a "classic" COPY FROM statement, specifying that the input is a CSV file, has a header row, and uses a comma as the delimiter. The "%s" placeholder lets me pass the destination table as a parameter.
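As a quick sketch of how the statement expands (the table name below is hypothetical): note that "%s" here is plain Python string formatting, not a query parameter, so it should only be filled with trusted table names (psycopg2's sql.Identifier is an option for safe composition).

```python
# The COPY statement template, with "%s" as a placeholder for the table.
SQL_STATEMENT = """
COPY %s FROM STDIN WITH
CSV
HEADER
DELIMITER AS ','
"""

# Expand the template for a hypothetical table name.
statement = SQL_STATEMENT % 'my_table'
print(statement)
```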
The second step is to open the file:
my_file = open("csv_file")
I created a function to perform the load:
def process_file(conn, table_name, file_object):
    cursor = conn.cursor()
    cursor.copy_expert(sql=SQL_STATEMENT % table_name, file=file_object)
    conn.commit()
    cursor.close()
I can use it like so:
connection = psycopg2.connect(...)
try:
    process_file(connection, 'my_table', my_file)
finally:
    connection.close()
Simple.
Note: Please verify that SQL_STATEMENT matches your format (e.g. the right delimiter, the right quote character...). Also, make sure the columns in the file appear in the same order as in the table: COPY with the HEADER option skips the header row rather than matching it against column names.
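As a quick sanity check before loading, the csv module can peek at the header row and compare it against the column order you expect. This is a minimal, self-contained sketch; the file name and expected column list are hypothetical:

```python
import csv

# Hypothetical expected column order for the destination table.
EXPECTED_COLUMNS = ['id', 'name', 'value']

# Create a small sample CSV so the sketch is self-contained.
with open('sample.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(EXPECTED_COLUMNS)
    writer.writerow([1, 'widget', 9.99])

# Read back just the header row and compare it with the expected layout.
with open('sample.csv', newline='') as f:
    header = next(csv.reader(f))

if header != EXPECTED_COLUMNS:
    raise ValueError("Unexpected header: %s" % header)
```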