Wednesday, May 06, 2009

Locating a Corrupt PDF (or many types of corrupt files) in a BLOB

I recently had a client pose the following question to me.
We have a system that generates pdf's based upon data from our Oracle database. The pdf's are then stored in a BLOB column in a table. We recently realized that at least one of those pdf's was corrupt. Is there a way within Oracle to determine if others are corrupt?

I initially thought about trying to write a script that would pull each blob, run it through some kind of pdf validator and write a log entry if it were bad. Hmmm, use plsql and call out to a validator on the O/S? Write a java routine to do the same thing? What validator would I use?

Then I had an idea--Oracle Text (formerly Intermedia and Context). I could create a context index on the blob column and see if there are any errors.

CREATE INDEX myIndex ON my_table(my_blob) INDEXTYPE IS ctxsys.context;

If the Oracle Text engine is unable to index the blob it writes a log entry to the view ctx_user_index_errors including the rowid. Depending on the database version you might have issues with very recent versions of the pdf, but that was not an issue in our case as every pdf was created by the same pdf generator.

I think Oracle Text is underused. This is just one more example of how to make it work for you.

4 comments:

Patrick Wolf said...

Anton,

that's an interesting usage of Oracle Text :-) Nice!

Patrick

Anton Nielsen said...

Thanks Patrick. I always appreciate an atta-boy, but especially from you!

I keep wanting to put together a post on common context usages, specifically the contains clause. Maybe I'll start with a blog post with some examples and then solicit input. Then I can put together a consolidated post. So, start thinking about your best contains clauses!

Thanks again,

Anton

Patrick Wolf said...

Carsten Czarski from Oracle Germany runs a very good blog about Oracle Text at http://oracle-text-de.blogspot.com/ Too bad that it's in German, but a translation services should do a good job.

Patrick

german website translation said...

Thanks for this great sharing.


german website translation