Often the cause is a malformed XML file. A common problem is a blank line accidentally inserted before the XML declaration, which must be the very first thing in the file:
[ A blank line here will cause the error ]
<?xml version="1.0"?>

While installing the search engine "Nutch," I got this error:
ellensmac:~ ellen$ /Users/ellen/Sites/apache-nutch-1.1/bin/nutch inject crawl/crawldb urls
[Fatal Error] nutch-site.xml:7:6: The processing instruction target matching "[xX][mM][lL]" is not allowed.
Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed.
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1168)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1040)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:405)
at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:585)
at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:290)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:375)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:138)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
at org.apache.nutch.crawl.Injector.main(Injector.java:231)
Caused by: org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed.
The first line of my file looked fine, and I didn't immediately see a problem with the XML, so I turned to Firefox. If your XML file looks fine at first glance, a good way to spot errors is to open the file in Firefox. Firefox parses the file, checks that it is well-formed, and points out any errors.
Firefox showed that I had accidentally inserted extra text (highlighted in pink) at lines 6 and 7. Line 7 is the start of an extra XML declaration, which should only be at the start of the document. When the extra lines were removed, the command ran without error.
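If a browser isn't handy, the same well-formedness check can be scripted. The following is only a rough sketch in Python, using the standard library's xml.etree.ElementTree parser rather than the Java parser Nutch uses, so the error wording differs from the message above. The helper name find_xml_error and the sample property are made up for illustration; they are not from my actual nutch-site.xml.

```python
import xml.etree.ElementTree as ET


def find_xml_error(xml_text):
    """Return None if xml_text is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return line, column, str(err)
    return None


# Made-up sample content; the property name is illustrative only.
GOOD = (
    '<?xml version="1.0"?>\n'
    "<configuration>\n"
    "<property>\n"
    "<name>example.name</name>\n"
    "<value>example</value>\n"
    "</property>\n"
    "</configuration>\n"
)

# Case 1: a blank line before the XML declaration.
BLANK_FIRST = "\n" + GOOD

# Case 2: a second XML declaration pasted into the body (here on line 7).
EXTRA_DECLARATION = GOOD.replace(
    "</configuration>", '<?xml version="1.0"?>\n</configuration>'
)

for label, text in [
    ("good", GOOD),
    ("blank first line", BLANK_FIRST),
    ("extra declaration", EXTRA_DECLARATION),
]:
    print(label, "->", find_xml_error(text))
```

To check a file on disk instead of a string, ET.parse(path) raises the same ParseError, so the helper could be adapted to take a filename.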
Posted by ellen at August 27, 2010 04:49 PM