all

Simulate UNION with BLOBs

Laur Ivan

Nov 21, 2016 • 2 min read

TL;DR: If you have BLOBs, you need to use UNION ALL. If you use UNION ALL, you need to partition off a criterion and select the first row.

The other day I had to figure out how UNION works in Oracle, because we're retrofitting an application with extra features. Assuming I have 2 tables(table_1 and table_2) with identical structures:

"PROP_ID" NVARCHAR2(25),
"SEQ_NUM" NUMBER(*,0),
"FILE_NAME" NVARCHAR2(256),
"DESCRIPTION" NVARCHAR2(256),
"FILE_CONTENT" BLOB,
PRIMARY KEY ("PROP_ID", "SEQ_NUM")

I need to perform an UNION to access both tables'fields:

SELECT prop_id, seq_num, description, file_name, file_content, 'W' src
    FROM table_1
UNION
SELECT prop_id, seq_num, description, file_name, file_content, 'S' src
    FROM table_2

Unfortunately, I've found out that you can't just do UNION when you have BLOB data because a simple UNION would eliminate duplicates. So... I needed a way to fake that. The best I could come up with is:

/*
  Eliminate duplicates SELECT (3)
  */
SELECT prop_id, seq_num, description, file_name, file_content,
  length(file_content) AS content_length, src, RW FROM (
  /*
    Partition SELECT (2)
    */
  SELECT prop_id, seq_num, description, file_name, file_content, src,
    row_number() OVER (PARTITION BY file_name ORDER BY src ) RW FROM (
    /*
      Union SELECT (1)
      */
    SELECT prop_id, seq_num, description, file_name, file_content, 'W' src
      FROM we_proposal_attachments
    UNION ALL
    SELECT prop_id, seq_num, description, file_name, file_content, 'S' src
      FROM we_sap_proposal_attachments
  )
) WHERE RW = 1

Here's what happens:

Using UNION ALL will get us duplicated rows (i.e. duplicated file names)
In Union SELECT (1) I'm adding a column src to identify the source (W from table_1 and S from table_2 respectively)
In Partition SELECT (2) I'm partitioning by file_name (the key we're interested in) and create yet another column RW
In Eliminate duplicates SELECT (3) I'm picking up the first row of the duplicates.

The main effects are:

we can perform an UNION-like statement with tables containing BLOB fields, where our uniqueness criteria don't imply the BLOB itself
We can prioritise results from either table. E.g. I know table_2 has newer versiond of the file and I'd prefer having the newer version available.

Bonus

You can apply the prioritisation also with simple UNION statements, particularly if you have odd fields like length(file_content) which would render the duplicate removal useless:

SELECT prop_id, seq_num, description, file_name, content_length, src, RW
FROM (
  SELECT prop_id, seq_num, description, file_name, content_length, src,
    row_number() OVER (PARTITION BY file_name ORDER BY src) RW
  FROM (
    SELECT prop_id, seq_num, description, file_name,
      length(file_content) AS content_length, 'W' src
      FROM table_1
    UNION
    SELECT prop_id, seq_num, description, file_name,
      length(file_content) AS content_length, 'S' src
      FROM table_2
  )
) WHERE RW = 1

HTH,