public class SimpleMedlineXMLCollection extends SimpleXMLCollection
Properties:
Modifier and Type | Field and Description |
---|---|
protected int | currentFileDocCounter: The number of documents processed in the current XML file. |
String | docEndTag: The end tag of documents in the XML files. |
String | docTag: The tag of documents in the XML files. |
String | EOL: The end of line string. |
String | fileEndTag: The tag indicating the end of an XML file. |
String | fileTag: The tag indicating the start of an XML file. |
protected int | NUMBER_OF_DOCS_IN_BUFFER: The number of documents to process per iteration. |
Fields inherited from class SimpleXMLCollection:
bReformXML, dbFactory, dBuilder, DocIDBlacklist, DocIdIsAttribute, DocIdLocation, DocumentElements, Documents, DocumentTags, ELEMENT_ATTR_SEPARATOR, EOC, FilesToProcess, logger, PropertiesInAttibutes, PropertyElements, TermElements, TermsInAttributes, thisDoc, xmlDoc
Constructor and Description |
---|
SimpleMedlineXMLCollection(): The default constructor. |
SimpleMedlineXMLCollection(List<String> files, String ignored1, String BlacklistSpecFilename, String ignored2): Constructor required by TRECIndexing. |
SimpleMedlineXMLCollection(String CollectionSpecFilename, String BlacklistSpecFilename): An alternative constructor. |
SimpleMedlineXMLCollection(String CollectionSpecFilename, String ignored1, String BlacklistSpecFilename, String ignored2): Constructor required by TRECIndexing. |
Modifier and Type | Method and Description |
---|---|
protected boolean | openNextFile(): Parse through up to a limited number of documents in the XML file. |
Methods inherited from class SimpleXMLCollection:
close, endOfCollection, findDocumentElement, getDocument, hasNext, initialiseParser, initialiseTags, loadBlacklist, main, next, nextDocument, remove, reset
protected int currentFileDocCounter
public final String docTag
public final String docEndTag
public final String fileTag
public final String fileEndTag
public final String EOL
protected final int NUMBER_OF_DOCS_IN_BUFFER
public SimpleMedlineXMLCollection()
public SimpleMedlineXMLCollection(String CollectionSpecFilename, String BlacklistSpecFilename)
Parameters:
CollectionSpecFilename - The name of the file containing the location of XML files in the collection.
BlacklistSpecFilename - The name of the file containing the location of the blacklisted XML files in the collection.
public SimpleMedlineXMLCollection(String CollectionSpecFilename, String ignored1, String BlacklistSpecFilename, String ignored2)
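The role of the two specification files named by CollectionSpecFilename and BlacklistSpecFilename can be illustrated with a small standalone sketch. This page does not describe the file format, so the sketch assumes one location per line, with blank lines and lines starting with `#` skipped. The class `SpecFileSketch` and its methods are hypothetical and are not part of Terrier; the real class may resolve its file list differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of Terrier. */
class SpecFileSketch {

    /** Keeps the non-empty, non-comment lines of a specification file (assumed format). */
    static List<String> parseSpecLines(List<String> lines) {
        List<String> locations = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // assumption: blank lines and '#' comments are ignored
            }
            locations.add(line);
        }
        return locations;
    }

    /** Returns the collection files that do not appear in the blacklist, in original order. */
    static List<String> filesToProcess(List<String> collectionSpec, List<String> blacklistSpec) {
        Set<String> blacklisted = new HashSet<>(parseSpecLines(blacklistSpec));
        List<String> result = new ArrayList<>();
        for (String file : parseSpecLines(collectionSpec)) {
            if (!blacklisted.contains(file)) {
                result.add(file);
            }
        }
        return result;
    }
}
```

In this sketch the lines of each specification file are passed in as lists, so it runs without touching the file system; reading them from disk with `java.nio.file.Files.readAllLines` would be a natural extension.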
protected boolean openNextFile()
Overrides:
openNextFile in class SimpleXMLCollection
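The idea of processing "up to a limited number of documents" per call, as described for openNextFile() and NUMBER_OF_DOCS_IN_BUFFER, can be sketched independently of Terrier. The following is a minimal illustration, not the library's implementation: the class `BatchedDocReader`, its constructor arguments, and the assumption that the start and end tags can be detected line by line are all hypothetical. The names docTag, docEndTag and currentFileDocCounter are borrowed from the fields above only to show how they might relate.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper, not part of Terrier: collects tagged documents in bounded batches. */
class BatchedDocReader {
    private final Iterator<String> lines;   // lines of one XML file
    private final String docTag;            // assumed marker for the start of a document
    private final String docEndTag;         // assumed marker for the end of a document
    private final int docsPerBatch;         // plays the role of NUMBER_OF_DOCS_IN_BUFFER
    private int currentFileDocCounter = 0;  // documents seen so far in this file

    BatchedDocReader(Iterator<String> lines, String docTag, String docEndTag, int docsPerBatch) {
        this.lines = lines;
        this.docTag = docTag;
        this.docEndTag = docEndTag;
        this.docsPerBatch = docsPerBatch;
    }

    /** Returns at most docsPerBatch documents; an empty list means the input is exhausted. */
    List<String> nextBatch() {
        List<String> batch = new ArrayList<>();
        StringBuilder doc = null; // non-null while inside a document
        while (batch.size() < docsPerBatch && lines.hasNext()) {
            String line = lines.next();
            if (doc == null && line.contains(docTag)) {
                doc = new StringBuilder(); // a new document starts on this line
            }
            if (doc != null) {
                doc.append(line).append('\n');
                if (line.contains(docEndTag)) {
                    batch.add(doc.toString()); // document complete
                    doc = null;
                    currentFileDocCounter++;
                }
            }
        }
        return batch;
    }

    int getCurrentFileDocCounter() {
        return currentFileDocCounter;
    }
}
```

Bounding the batch size keeps memory use roughly constant for very large files, which is a plausible motivation for a buffer limit in a MEDLINE-sized collection; the page itself does not state the reason.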
Terrier Information Retrieval Platform 5.1. Copyright © 2004-2019, University of Glasgow