Tuesday, April 18, 2017

XML Parsing failed with LPX-00261: invalid URL


As promised in my last post, I'll address the parsing error you get when you attempt to convert the text returned by the NCBI efetch utilities into an XMLType. The Oracle XML parser within the database treats the XML comment "<!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN" "https://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">" as an indicator to validate the XML against the DTD located at "https://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd". In many, maybe even most, cases this is fine. If you are generating an XML payload you may want to validate it before sending it out to avoid the embarrassment of sending an invalid message. If you are retrieving it from the source, though, odds are good it will be valid and validating it against the DTD may just be extra overhead. It's also possible that the XML parser doesn't understand the DTD. At least one version of the database has a bug related to exactly how that comment is formatted. In my case there are at least two issues related to validating against the DTD, and I have no need for it. Below is the query and the associated error.

select xmltype(
'<?xml version="1.0" ?>
<!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN" "https://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
<TaxaSet><Taxon>
    <TaxId>33208</TaxId>
    <ScientificName>Metazoa</ScientificName>
    <OtherNames>
        <GenbankCommonName>metazoans</GenbankCommonName>
        <BlastName>animals; animals</BlastName>
        <Synonym>Animalia</Synonym>
        <CommonName>multicellular animals</CommonName>
    </OtherNames>
    <ParentTaxId>33154</ParentTaxId>
    <Rank>kingdom</Rank>
    <Division>Invertebrates</Division>
    <GeneticCode>
        <GCId>1</GCId>
        <GCName>Standard</GCName>
    </GeneticCode>
    <MitoGeneticCode>
        <MGCId>1</MGCId>
        <MGCName>Standard</MGCName>
    </MitoGeneticCode>
    <Lineage>cellular organisms; Eukaryota; Opisthokonta</Lineage>
    <LineageEx>
        <Taxon>
            <TaxId>131567</TaxId>
            <ScientificName>cellular organisms</ScientificName>
            <Rank>no rank</Rank>
        </Taxon>
        <Taxon>
            <TaxId>2759</TaxId>
            <ScientificName>Eukaryota</ScientificName>
            <Rank>superkingdom</Rank>
        </Taxon>
        <Taxon>
            <TaxId>33154</TaxId>
            <ScientificName>Opisthokonta</ScientificName>
            <Rank>no rank</Rank>
        </Taxon>
    </LineageEx>
    <CreateDate>1995/02/27 09:24:00</CreateDate>
    <UpdateDate>2017/02/16 16:52:33</UpdateDate>
    <PubDate>1992/05/26 01:00:00</PubDate>
</Taxon>
</TaxaSet>') the_xml
  from dual;


ORA-31011: XML parsing failed 
ORA-19213: error occurred in XML processing at lines 2 
LPX-00261: invalid URL https://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd

Fortunately, you can instruct the database to NOT validate.

alter session set events='31156 trace name context forever, level 2'

will instruct the database to skip the validation. You can then use xmlTable, xmlQuery, etc. without receiving an error.

Note: you may need to "grant alter session" to the user in order for this to work, particularly if using an execute immediate within a package to do the alter session command.




Saturday, April 15, 2017

The Man in the Middle

I recently ran into an Oracle database bug (version 12c). It's definitely a bug, and there may even be a patch for it, maybe. We've all been in this position, though. There's definitely a bug. We've identified something similar in the Oracle support site bug database. There's a patch for it. How long will it take to get the patch installed? Can I even install the patch on an Oracle Database Appliance (or Exadata or whatever other special circumstance)? What happens if the patch doesn't fix the specific issue?

Here is my scenario. I need to interact with the National Institutes of Health taxonomy database. It has REST based services--check them out:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Metazoa&usehistory=n

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=33208

So, super easy, I can use apex_web_service.make_rest_request to make the request, parse the response and do some science!

Let's see what happens:

select 
  apex_web_service.make_rest_request(
        p_url => 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=33208',
        p_http_method => 'GET'
        ) taxonomy_info
        from dual
        ;

ORA-29273: HTTP request failed
ORA-28860: Fatal SSL error
ORA-06512: at "SYS.UTL_HTTP", line 368
ORA-06512: at "SYS.UTL_HTTP", line 1118
ORA-06512: at "APEX_050100.WWV_FLOW_WEB_SERVICES", line 636
ORA-06512: at "APEX_050100.WWV_FLOW_WEB_SERVICES", line 760
ORA-06512: at "APEX_050100.WWV_FLOW_WEBSERVICES_API", line 236
ORA-06512: at line 1
29273. 00000 -  "HTTP request failed"
*Cause:    The UTL_HTTP package failed to execute the HTTP request.
*Action:   Use get_detailed_sqlerrm to check the detailed error message.
           Fix the error and retry the HTTP request.

Now, I know that I've used this same kind of call from this same database before, so I do a little more digging and try this:

select 
  apex_web_service.make_rest_request(
        p_url => 'https://oracle.com',
        p_http_method => 'GET'
        )
        from dual
        ;

ORA-29273: HTTP request failed
ORA-29024: Certificate validation failure
ORA-06512: at "SYS.UTL_HTTP", line 1258
ORA-06512: at "APEX_050100.WWV_FLOW_WEB_SERVICES", line 717
ORA-06512: at "APEX_050100.WWV_FLOW_WEB_SERVICES", line 760
ORA-06512: at "APEX_050100.WWV_FLOW_WEBSERVICES_API", line 236
ORA-06512: at line 1
29273. 00000 -  "HTTP request failed"
*Cause:    The UTL_HTTP package failed to execute the HTTP request.
*Action:   Use get_detailed_sqlerrm to check the detailed error message.
           Fix the error and retry the HTTP request.

It's a different error, but still and error! Now I'm getting concerned. I assume it has to do with using https because both errors reference SSL/TLS. I try a variety of http and https sites. Most work fine, but occasionally I get one of the two errors above. I think they are actually two different errors. The second can probably be fixed by adding some certs to the db wallet, but the first, the one I really care about, appears to need a database patch: Patch 24666032. I'm not sure that would fix it, and for a variety of reasons, I can't get the patch installed and tested in a timely fashion.

But...the scientists have work to do and I don't want to stand in the way of science. So, I developed a hack. Please don't judge me. Of course, the title of this post has already given it away. It's seems so obvious when you already know the answer, but it took some pondering to come up with it: the man in the middle. While I am the man in the middle, between the scientists and the data, I need another man in the middle to solve the problem. That MITM is an Apache Reverse Proxy.

As I mentioned, I know this database can use apex_web_service.make_rest_request, definitely with my own internal development web server. I set up a reverse proxy on our internal apache server to route the request to the NIH REST server.


  1. Added an entry to our internal DNS server
    [internal IP address]   ncbi.concept2completion.com
  2. Created a virtual host entry on our internal Apache web server and restarted
    <VirtualHost *:443>
       ServerName ncbi.concept2completion.com
     
       ProxyRequests on
       SSLProxyEngine on
       ProxyPass / https://eutils.ncbi.nlm.nih.gov/
       ProxyPassReverse / https://eutils.ncbi.nlm.nih.gov/
    </VirtualHost>
  3. Pointed the apex_web_service.make_rest_request to our internal server.

select 
  apex_web_service.make_rest_request(
        p_url => 'https://ncbi.concept2completion.com/entrez/eutils/efetch.fcgi?db=taxonomy&id=33208',
        p_http_method => 'GET'
        ) taxonomy_info
        from dual
        ;


<?xml version="1.0" ?>
<!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN" "https://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
<TaxaSet><Taxon>
    <TaxId>33208</TaxId>
    <ScientificName>Metazoa</ScientificName>
    <OtherNames>
        <GenbankCommonName>metazoans</GenbankCommonName>
        <BlastName>animals; animals</BlastName>
        <Synonym>Animalia</Synonym>
        <CommonName>multicellular animals</CommonName>
    </OtherNames>
    <ParentTaxId>33154</ParentTaxId>
    <Rank>kingdom</Rank>
    <Division>Invertebrates</Division>
    <GeneticCode>
        <GCId>1</GCId>
        <GCName>Standard</GCName>
    </GeneticCode>
    <MitoGeneticCode>
        <MGCId>1</MGCId>
        <MGCName>Standard</MGCName>
    </MitoGeneticCode>
    <Lineage>cellular organisms; Eukaryota; Opisthokonta</Lineage>
    <LineageEx>
        <Taxon>
            <TaxId>131567</TaxId>
            <ScientificName>cellular organisms</ScientificName>
            <Rank>no rank</Rank>
        </Taxon>
        <Taxon>
            <TaxId>2759</TaxId>
            <ScientificName>Eukaryota</ScientificName>
            <Rank>superkingdom</Rank>
        </Taxon>
        <Taxon>
            <TaxId>33154</TaxId>
            <ScientificName>Opisthokonta</ScientificName>
            <Rank>no rank</Rank>
        </Taxon>
    </LineageEx>
    <CreateDate>1995/02/27 09:24:00</CreateDate>
    <UpdateDate>2017/02/16 16:52:33</UpdateDate>
    <PubDate>1992/05/26 01:00:00</PubDate>
</Taxon>

</TaxaSet>


A hack? Yes, but done in the name of science.

Stay tuned for my next post on how to solve the parsing issue with the XML shown above. Also, we now have a pl/sql API for interacting with the NIH efetch service--it's extensive and provides access to the taxonomy lineage, synonyms, etc. Contact me if you are interested in the API.

Wednesday, January 04, 2017

ORDS Quirks - slashes and feeds

Returning to the stated purpose of this blog, "specific solutions to isolated problems," I've run into a few quirks with Oracle REST Data Services recently. The first is simply figuring out exactly what version of ORDS I am running. There are (at least) two components, the mid-tier java (ords.war) and the database metadata and packages (in ORDS_METADATA). Does anyone know the best way to get these? (Edit: See comments for answer to this.)

Next, I've run into quirks around exactly where to put slashes. In an earlier post (http://c2anton.blogspot.com/2016/06/super-quick-oracle-rest-service-with.html) I demonstrated how to use the APIs to create ORDS modules, templates and handlers. I tried to use the exact commands in a newer ORDS version and ran into issues when using bind variables. I kept getting THE "ORACLE REST DATA SERVICES" banner along with "404 not found" whenever I created a template and handler with a bind variable.



This used to work:

  1. begin
  2.   ORDS.DEFINE_MODULE(
       p_module_name    => 'antonModule',
       p_base_path      => '/antonmodule',
       p_items_per_page => 25,
       p_status         => 'PUBLISHED',
       p_comments       => NULL );

      commit;
    end;

    begin  
      ORDS.DEFINE_TEMPLATE(
       p_module_name => 'antonModule',
       p_pattern     => '/sqltest/{abc}/{def}',
       p_priority    => 0,
       p_etag_type   => 'HASH',
       p_etag_query  => NULL,
       p_comments    => NULL );
    
      commit;
    end;

But in the new version I have to put a trailing slash in define_module ( p_base_path => '/antonmodule/' ) and remove it from the beginning of p_pattern in define_template (  p_pattern => 'sqltest/{abc}/{def}'   ).

It took my more time that I wish to admit to figure that out.

*****************************

The next issue is with the feed handler. The feed should create a $ref of this format

https://mymachine/ords/myuser/mymodule/myfeed/{uid}

but it is creating this
https://mymachine/ords/myuser/mymodule/{uid}

Maybe I just need to upgrade (edit: I upgraded to latest ORDS, 3.0.9. I still have the same issues.), but yikes, these two issues have made for a long day today. Hopefully this helps someone spend a little less time on this than I did :)