Wednesday, May 06, 2009

Locating a Corrupt PDF (or many types of corrupt files) in a BLOB

I recently had a client pose the following question to me.
We have a system that generates pdf's based upon data from our Oracle database. The pdf's are then stored in a BLOB column in a table. We recently realized that at least one of those pdf's was corrupt. Is there a way within Oracle to determine if others are corrupt?

I initially thought about trying to write a script that would pull each blob, run it through some kind of pdf validator and write a log entry if it were bad. Hmmm, use plsql and call out to a validator on the O/S? Write a java routine to do the same thing? What validator would I use?

Then I had an idea--Oracle Text (formerly Intermedia and Context). I could create a context index on the blob column and see if there are any errors.

CREATE INDEX myIndex ON my_table(my_blob) INDEXTYPE IS ctxsys.context;

If the Oracle Text engine is unable to index the blob it writes a log entry to the view ctx_user_index_errors including the rowid. Depending on the database version you might have issues with very recent versions of the pdf, but that was not an issue in our case as every pdf was created by the same pdf generator.

I think Oracle Text is underused. This is just one more example of how to make it work for you.

Saturday, May 02, 2009

Oracle SQL Delimiter

It's been a while since I made a real post, one that has some technical merit. This one just barely qualifies.

There is a nice classic report type "Function Returning a SQL Query" which allows you to write a block of code that returns a select statement. That select statement is the basis for the report. I often use it to simplify building a where clause. It might look something like this:

declare
l_q varchar2(32767); -- This is the return variable. It will hold a select statement

begin
l_q := 'select d.name department, e.name employee
from dept d, emp e
where d.deptno = e.deptno ';

if :P1_DEPTNO is not null then
l_q := l_q || ' and d.deptno = :P1_DEPTNO ';
end if;

-- insert more if statements like above

return l_q;
end;


With Interactive Reports this has become less likely, but it still happens.

Sometimes, though, the query is a bit more complicated, something like this:

begin
l_q := 'select d.name department, e.name employee,
to_date(to_char(e.hiredate,''yyyymm'' ||''01''),''yyyymmdd'') first_day_of_month
from dept d, emp e
where d.deptno = e.deptno
and e.status in (''NEW'', ''PENDING'', ''FOO'')
and d.status = ''NEW'' ';

if :P1_DEPTNO is not null then
l_q := l_q || ' and d.deptno = :P1_DEPTNO ';
end if;

-- insert more if statements like above

return l_q;
end;


You get the idea, a lot of strings and a lot of double ticks (''). There is an easier way, though: a custom SQL Delimiter. Usually a single tick (') indicates the start of a string. If you use a tick within it, you need two ticks ('') to escape it (see code above). But you can avoid that with a custom delimiter. You turn it on with q', that's all, just the letter q followed by a tick. The trick is that the very next character is the delimiter, and a few are special. You end the statement with followed by a tick. Examples are the easiest, so here are a few:

1. l_q := q'! I don't need to escape the ' in the word don't !';
In this case the ! is the special character. q'! starts the string and !' ends it.

2. l_q = q'{ I don't need to escape the ' in the word don't }';
This is a little different. If your special character is { then you use }' to turn it off. It is pretty obvious which are special: [], {}, <>.

So, the code above becomes:

begin
l_q := q'{ select d.name department, e.name employee,
to_date(to_char(e.hiredate,'yyyymm' ||'01'),'yyyymmdd') first_day_of_month
from dept d, emp e
where d.deptno = e.deptno
and e.status in ('NEW', 'PENDING', 'FOO')
and d.status = 'NEW' }';

if :P1_DEPTNO is not null then
l_q := l_q || ' and d.deptno = :P1_DEPTNO ';
end if;

-- insert more if statements like above

return l_q;
end;


You can use this just about anywhere in sql or pl/sql where you want a string.