[TR-220] SimpleXMLCollection raises a NullPointerException if the document contains a DOCTYPE with the same name as xml.doctag Created: 14/Nov/12  Updated: 14/Nov/12  Resolved: 14/Nov/12

Status: Resolved
Project: Terrier Core
Component/s: .indexing
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: Bug Priority: Major
Reporter: Nicolas Faessel Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Attachments: File doctype.patch     XML File TestSimpleXMLCollection-patched.xml     XML File TestSimpleXMLCollection-terrier3.5.xml    

 Description   
See: http://terrier.org/forum//read.php?3,1669

An NPE occurs when a document has a DOCTYPE placed before the root element, with the same name as the root element,
and the xml.doctag property is set to this name.

In SimpleXMLDocument, the method findDocumentElement(Node n) only checks the name of the node n:
if (DocumentElements.contains(n.getNodeName().toLowerCase())) {...}
and, if the name matches, tries to get all the attributes of n.
But if n is a DOCTYPE node, it has no attributes (the DOM getAttributes() call returns null for any node that is not an element), hence the NPE.

My workaround is to check that n is not a DOCUMENT_TYPE_NODE (it can be a DOCUMENT_NODE or an ELEMENT_NODE).
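
For illustration, here is a minimal, self-contained sketch of the kind of node-type check described above. It is not the attached doctype.patch and not the actual SimpleXMLDocument code: the class name DoctypeGuardSketch, the DOCUMENT_ELEMENTS set and the parse helper are hypothetical stand-ins, and only the standard org.w3c.dom and javax.xml.parsers APIs are assumed.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DoctypeGuardSketch {

    // Hypothetical stand-in for the tag names configured through xml.doctag.
    static final Set<String> DOCUMENT_ELEMENTS = Collections.singleton("body");

    // Parses an XML string into a DOM tree; failures are rethrown unchecked.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Depth-first search for the first node whose name is a configured
    // document tag, skipping DOCTYPE nodes even when their name matches.
    static Node findDocumentElement(Node n) {
        boolean nameMatches =
                DOCUMENT_ELEMENTS.contains(n.getNodeName().toLowerCase());
        if (nameMatches && n.getNodeType() != Node.DOCUMENT_TYPE_NODE) {
            return n;
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            Node found = findDocumentElement(c);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<?xml version=\"1.0\"?><!DOCTYPE body><body>test</body>");

        // The DOCTYPE node carries the same name as the root element...
        Node doctype = doc.getDoctype();
        System.out.println(doctype.getNodeName());   // prints "body"
        // ...but has no attribute map, which is what a name-only check trips on.
        System.out.println(doctype.getAttributes()); // prints "null"

        // With the node-type guard, the search lands on the real element.
        Node match = findDocumentElement(doc);
        System.out.println(match.getNodeType() == Node.ELEMENT_NODE); // prints "true"
    }
}
```

The guard compares getNodeType() rather than the node name, since the name alone cannot distinguish <!DOCTYPE body> from <body>.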

Regards,
Nicolas




 Comments   
Comment by Craig Macdonald [ 14/Nov/12 ]

Hi Nicolas,

Thanks for your report. I have tried, unsuccessfully, to reproduce this problem in the JUnit test for SimpleXMLCollection (TestSimpleXMLCollection).

	@Test public void testSingleTermSingleDocumentWithDocType() throws Exception
	{
		ApplicationSetup.setProperty("xml.doctag", "body");
		ApplicationSetup.setProperty("xml.terms", "body");
		SimpleXMLCollection c = getCollection("<?xml version=\"1.0\"?><!DOCTYPE html><body>test</body>");
		assertTrue(c.nextDocument());
		Document d = c.getDocument();
		assertNotNull(d);
		assertFalse(d.endOfDocument());
		String t = d.getNextTerm();
		assertEquals("test", t);
		assertTrue(d.endOfDocument());
		assertFalse(c.nextDocument());
		assertTrue(c.endOfCollection());
	}
	

Can you revise your patch with a test case that does identify the problem?

Craig

Comment by Nicolas Faessel [ 14/Nov/12 ]

To reproduce this problem, the !DOCTYPE name must be the same as xml.doctag (in the previous test, you must use <!DOCTYPE body> instead of <!DOCTYPE html>).
The following test reproduces the problem:

	@Test public void testSingleTermSingleDocumentWithDocType() throws Exception
	{
		ApplicationSetup.setProperty("xml.doctag", "test-doctype");
		ApplicationSetup.setProperty("xml.terms", "test-doctype");
		SimpleXMLCollection c = getCollection("<?xml version=\"1.0\"?><!DOCTYPE test-doctype><test-doctype>test</test-doctype>");
		assertTrue(c.nextDocument());
		Document d = c.getDocument();
		assertNotNull(d);
		assertFalse(d.endOfDocument());
		String t = d.getNextTerm();
		assertEquals("test", t);
		assertTrue(d.endOfDocument());
		assertFalse(c.nextDocument());
		assertTrue(c.endOfCollection());
	}

Comment by Nicolas Faessel [ 14/Nov/12 ]

JUnit test results.

Comment by Craig Macdonald [ 14/Nov/12 ]

Perfect, I will take a look.

Thanks

Craig

Comment by Craig Macdonald [ 14/Nov/12 ]

Committed, r3677.

Thanks Nicolas!

Generated at Wed Dec 13 10:59:23 GMT 2017 using JIRA 7.1.1#71004-sha1:d6b2c0d9b7051e9fb5e4eb8ce177ca56d91d7bd8.