java.lang.Object
  org.terrier.indexing.SimpleXMLCollection
    org.terrier.indexing.SimpleMedlineXMLCollection
public class SimpleMedlineXMLCollection extends SimpleXMLCollection
Initial implementation of a class that generates a Collection of Documents from a series of XML files in the Medline format. It processes a limited number of documents from an XML file at a time, to avoid OutOfMemory errors when the XML file is too large.
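The bounded-buffer idea in the description can be sketched as follows. This is a minimal illustration, not Terrier's implementation: the class name `BoundedBatchSketch`, the method `readBatch`, and the tag values `<doc>` and `</doc>` are hypothetical, and the sketch assumes that no single line holds parts of two documents.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of bounded batch reading; not part of Terrier. */
final class BoundedBatchSketch {
    // Assumed tag values, for illustration only. The real values of
    // docTag and docEndTag are not given on this page.
    static final String DOC_TAG = "<doc>";
    static final String DOC_END_TAG = "</doc>";
    static final String EOL = "\n";

    /**
     * Reads at most maxDocs documents from the reader and returns each one
     * as a string. An empty list means the input is exhausted.
     */
    static List<String> readBatch(BufferedReader in, int maxDocs) throws IOException {
        List<String> docs = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a document
        String line;
        // The size check comes first, so no line is consumed once the batch is full.
        while (docs.size() < maxDocs && (line = in.readLine()) != null) {
            if (current == null && line.contains(DOC_TAG)) {
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line).append(EOL);
                if (line.contains(DOC_END_TAG)) {
                    docs.add(current.toString());
                    current = null;
                }
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder xml = new StringBuilder("<collection>").append(EOL);
        for (int i = 1; i <= 5; i++) {
            xml.append("<doc>").append(EOL)
               .append("<id>").append(i).append("</id>").append(EOL)
               .append("</doc>").append(EOL);
        }
        xml.append("</collection>").append(EOL);

        BufferedReader in = new BufferedReader(new StringReader(xml.toString()));
        List<String> batch;
        // Only one batch of at most two documents is held in memory at a time.
        while (!(batch = readBatch(in, 2)).isEmpty()) {
            System.out.println(batch.size()); // prints 2, then 2, then 1
        }
    }
}
```

Because the reader stays open between calls, each call resumes where the previous batch ended, so memory use depends on the batch size rather than on the size of the file.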
Properties:
| Field Summary | |
|---|---|
| protected int | currentFileDocCounter: The number of documents processed in the current XML file. |
| java.lang.String | docEndTag: The end tag of documents in the XML files. |
| java.lang.String | docTag: The tag of documents in the XML files. |
| java.lang.String | EOL: The end of line string. |
| java.lang.String | fileEndTag: The tag indicating the end of an XML file. |
| java.lang.String | fileTag: The tag indicating the start of an XML file. |
| protected int | NUMBER_OF_DOCS_IN_BUFFER: The number of documents to process per iteration. |
| Fields inherited from class org.terrier.indexing.SimpleXMLCollection |
|---|
| bReformXML, dbFactory, dBuilder, DocIDBlacklist, DocIdIsAttribute, DocIdLocation, DocumentElements, Documents, DocumentTags, ELEMENT_ATTR_SEPARATOR, EOC, FilesToProcess, logger, TermElements, TermsInAttributes, thisDoc, xmlDoc |
| Constructor Summary | |
|---|---|
| SimpleMedlineXMLCollection() | The default constructor. |
| SimpleMedlineXMLCollection(java.lang.String CollectionSpecFilename, java.lang.String BlacklistSpecFilename) | An alternative constructor. |
| Method Summary | |
|---|---|
| protected boolean | openNextFile(): Parses up to a limited number of documents in the XML file. |
| Methods inherited from class org.terrier.indexing.SimpleXMLCollection |
|---|
| close, endOfCollection, findDocumentElement, getDocument, hasNext, initialiseParser, initialiseTags, main, next, nextDocument, remove, reset |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected int currentFileDocCounter
public final java.lang.String docTag
public final java.lang.String docEndTag
public final java.lang.String fileTag
public final java.lang.String fileEndTag
public final java.lang.String EOL
protected final int NUMBER_OF_DOCS_IN_BUFFER
| Constructor Detail |
|---|
public SimpleMedlineXMLCollection()
public SimpleMedlineXMLCollection(java.lang.String CollectionSpecFilename,
java.lang.String BlacklistSpecFilename)
Parameters:
CollectionSpecFilename - The name of the file containing the location of the XML files in the collection.
BlacklistSpecFilename - The name of the file containing the location of the blacklisted XML files in the collection.
| Method Detail |
|---|
protected boolean openNextFile()
Overrides: openNextFile in class SimpleXMLCollection
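The inherited fields dbFactory, dBuilder and xmlDoc suggest that each bounded batch is parsed into a DOM tree. One way this might work is to wrap the buffered document fragments in file-level tags so that they form a well-formed XML document before parsing. The sketch below is an assumption about that step, not the actual body of openNextFile: the class `WrapAndParseSketch`, the method `parseBatch`, and the tag values `<collection>` and `<doc>` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical sketch of wrapping a batch and parsing it; not part of Terrier. */
final class WrapAndParseSketch {
    // Assumed tag values, for illustration only. The real values of
    // fileTag and fileEndTag are not given on this page.
    static final String FILE_TAG = "<collection>";
    static final String FILE_END_TAG = "</collection>";
    static final String EOL = "\n";

    /** Wraps the fragments in a single root element and parses the result into a DOM tree. */
    static Document parseBatch(List<String> docFragments) throws Exception {
        StringBuilder xml = new StringBuilder(FILE_TAG).append(EOL);
        for (String fragment : docFragments) {
            xml.append(fragment).append(EOL);
        }
        xml.append(FILE_END_TAG).append(EOL);

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] bytes = xml.toString().getBytes(StandardCharsets.UTF_8);
        return builder.parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        Document dom = parseBatch(Arrays.asList(
                "<doc><id>1</id></doc>",
                "<doc><id>2</id></doc>"));
        // Number of documents available in this batch.
        System.out.println(dom.getElementsByTagName("doc").getLength()); // prints 2
    }
}
```

Only the current batch is held as a DOM tree, which keeps memory use bounded by the batch size (NUMBER_OF_DOCS_IN_BUFFER in this class) rather than by the size of the source file.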