java.lang.Object
  org.terrier.indexing.SimpleXMLCollection
    org.terrier.indexing.SimpleMedlineXMLCollection
public class SimpleMedlineXMLCollection
extends SimpleXMLCollection
Initial implementation of a class that generates a Collection of Documents from a series of XML files in the Medline format. It processes only a limited number of documents from an XML file at a time, which avoids OutOfMemory errors when an XML file is very large.
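The batching idea described above can be sketched in plain Java. This is a minimal illustration, not Terrier's implementation: the class name `MedlineBatchSketch`, the method `nextBatch`, and the tag values are all assumptions (this page does not show the values of `docTag`, `docEndTag`, `fileTag` or `fileEndTag`), and the sketch assumes each start and end tag sits on its own line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Illustrative sketch only; not the Terrier implementation. */
class MedlineBatchSketch {
    // Hypothetical tag values, chosen for illustration.
    static final String DOC_NAME = "MedlineCitation";
    static final String DOC_END_TAG = "</MedlineCitation>";
    static final String FILE_TAG = "<MedlineCitationSet>";
    static final String FILE_END_TAG = "</MedlineCitationSet>";
    static final String EOL = "\n";

    /** True if the line opens a document element, with or without attributes. */
    static boolean isDocStart(String trimmed) {
        return trimmed.startsWith("<" + DOC_NAME + ">")
            || trimmed.startsWith("<" + DOC_NAME + " ");
    }

    /**
     * Copies up to maxDocs complete documents from the reader into a buffer,
     * wrapped in the file tags so the fragment is well-formed on its own.
     * Returns null once no complete document is left.
     */
    static String nextBatch(BufferedReader in, int maxDocs) {
        StringBuilder buffer = new StringBuilder(FILE_TAG).append(EOL);
        int docs = 0;
        boolean inDoc = false;
        try {
            String line;
            // The count is checked first, so no line is consumed once the batch is full.
            while (docs < maxDocs && (line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (!inDoc && isDocStart(trimmed)) {
                    inDoc = true;
                }
                if (inDoc) {
                    buffer.append(line).append(EOL);
                    if (trimmed.endsWith(DOC_END_TAG)) {
                        inDoc = false;
                        docs++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (docs == 0) {
            return null;
        }
        return buffer.append(FILE_END_TAG).append(EOL).toString();
    }

    public static void main(String[] args) {
        String xml = String.join(EOL,
            FILE_TAG,
            "<MedlineCitation>", "<PMID>1</PMID>", DOC_END_TAG,
            "<MedlineCitation>", "<PMID>2</PMID>", DOC_END_TAG,
            FILE_END_TAG);
        BufferedReader in = new BufferedReader(new StringReader(xml));
        String batch;
        while ((batch = nextBatch(in, 1)) != null) {
            System.out.println(batch);
        }
    }
}
```

Because the reader keeps its position between calls, each call resumes where the previous batch stopped, so only one batch of documents is held in memory at a time.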
Field Summary | |
---|---|
protected int | currentFileDocCounter: The number of documents processed in the current XML file. |
java.lang.String | docEndTag: The end tag of documents in the XML files. |
java.lang.String | docTag: The tag of documents in the XML files. |
java.lang.String | EOL: The end of line string. |
java.lang.String | fileEndTag: The tag indicating the end of an XML file. |
java.lang.String | fileTag: The tag indicating the start of an XML file. |
protected int | NUMBER_OF_DOCS_IN_BUFFER: The number of documents to process per iteration. |
Fields inherited from class org.terrier.indexing.SimpleXMLCollection |
---|
bReformXML, dbFactory, dBuilder, DocIDBlacklist, DocIdIsAttribute, DocIdLocation, DocumentElements, Documents, DocumentTags, ELEMENT_ATTR_SEPARATOR, EOC, FilesToProcess, logger, TermElements, TermsInAttributes, thisDoc, xmlDoc |
Constructor Summary | |
---|---|
SimpleMedlineXMLCollection() | The default constructor. |
SimpleMedlineXMLCollection(java.lang.String CollectionSpecFilename, java.lang.String BlacklistSpecFilename) | An alternative constructor. |
Method Summary | |
---|---|
protected boolean | openNextFile(): Parses up to a limited number of documents from the XML file. |
Methods inherited from class org.terrier.indexing.SimpleXMLCollection |
---|
close, endOfCollection, findDocumentElement, getDocument, hasNext, initialiseParser, initialiseTags, main, next, nextDocument, remove, reset |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected int currentFileDocCounter
public final java.lang.String docTag
public final java.lang.String docEndTag
public final java.lang.String fileTag
public final java.lang.String fileEndTag
public final java.lang.String EOL
protected final int NUMBER_OF_DOCS_IN_BUFFER
Constructor Detail |
---|
public SimpleMedlineXMLCollection()
public SimpleMedlineXMLCollection(java.lang.String CollectionSpecFilename, java.lang.String BlacklistSpecFilename)
Parameters:
CollectionSpecFilename - The name of the file containing the location of XML files in the collection.
BlacklistSpecFilename - The name of the file containing the location of the blacklisted XML files in the collection.
Method Detail |
---|
protected boolean openNextFile()
Overrides: openNextFile in class SimpleXMLCollection
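The inherited `dbFactory`, `dBuilder` and `xmlDoc` fields suggest that each buffered batch is handed to a DOM parser, though this page does not confirm it. Assuming so, the sketch below shows why a batch would need to be wrapped in the file start and end tags: a DOM parser accepts only a single root element. The class `BatchDomSketch`, the method `documentIds`, and the element names used with it are hypothetical and are not part of the Terrier API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Illustrative sketch only; not the Terrier implementation. */
class BatchDomSketch {
    /**
     * Parses one buffered batch and returns the text of the first idElement
     * found inside each docElement. Element names are supplied by the caller
     * because the real tag values are not shown on this page.
     */
    static List<String> documentIds(String batchXml, String docElement, String idElement) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(batchXml)));
            NodeList docs = dom.getElementsByTagName(docElement);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < docs.getLength(); i++) {
                Element doc = (Element) docs.item(i);
                NodeList idNodes = doc.getElementsByTagName(idElement);
                if (idNodes.getLength() > 0) {
                    ids.add(idNodes.item(0).getTextContent().trim());
                }
            }
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // A batch with no single root element ends up here.
            throw new IllegalStateException("Could not parse batch", e);
        }
    }

    public static void main(String[] args) {
        String batch = "<MedlineCitationSet>"
            + "<MedlineCitation><PMID>1</PMID></MedlineCitation>"
            + "<MedlineCitation><PMID>2</PMID></MedlineCitation>"
            + "</MedlineCitationSet>";
        System.out.println(documentIds(batch, "MedlineCitation", "PMID"));
    }
}
```

Passing the two document elements without the enclosing root element makes the parser reject the input, which is the situation the file tags would guard against.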