public class WARC10Collection extends WARC018Collection
Fields inherited from class WARC018Collection: currentDocumentBlobLength, warc_crawldate_header, warc_docno_header, warc_url_header
Fields inherited from the superclass of WARC018Collection: currentFilename, desiredEncoding, DocProperties, documentClass, documentsInThisFile, eoc, eof, FileNumber, FilesToProcess, forceUTF8, is, logger, SkipFile, tokeniser

| Constructor and Description |
|---|
| WARC10Collection() | 
| WARC10Collection(InputStream input) | 
| WARC10Collection(String CollectionSpecFilename) | 
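
The `WARC10Collection(String CollectionSpecFilename)` constructor takes the name of a collection specification file. Judging from the inherited `FilesToProcess` field and `readCollectionSpec` method, that file lists the input files to process. The sketch below assumes a simple format (one path per line, with blank lines and lines starting with `#` ignored) and uses a hypothetical `SpecReader` class; it illustrates the idea and is not Terrier's own parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses a collection-spec listing, one file path per line. */
public class SpecReader {

    /**
     * Returns the file paths listed in the spec. Blank lines and lines starting
     * with '#' are skipped (an assumed comment convention, not confirmed here).
     */
    public static List<String> readSpec(BufferedReader in) throws IOException {
        List<String> files = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // ignore blank and comment lines
            }
            files.add(trimmed);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        String spec = "# crawl segment 0\n/data/seg0/00.warc.gz\n\n/data/seg0/01.warc.gz\n";
        List<String> files = readSpec(new BufferedReader(new StringReader(spec)));
        System.out.println(files); // [/data/seg0/00.warc.gz, /data/seg0/01.warc.gz]
    }
}
```

The paths in the usage example are made up for illustration.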

| Modifier and Type | Method and Description |
|---|---|
| boolean | nextDocument(): Move the collection to the start of the next document. |
| protected void | processRedirect(String source, String target) |
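
`nextDocument()` moves the collection to the start of the next document. As a rough illustration of what that involves for a WARC 1.0 file, the hypothetical `ToyWarcReader` below scans forward to the next `WARC/1.0` version line, parses the `Name: value` headers up to the blank line that ends the header block, and then reads `Content-Length` bytes of payload. It is a simplified stand-in, not Terrier's implementation: it assumes an uncompressed stream, treats every record type alike, and does nothing corresponding to `processRedirect`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified WARC/1.0 reader illustrating a nextDocument() loop. */
public class ToyWarcReader {
    private final InputStream in;
    private Map<String, String> headers = new LinkedHashMap<>();
    private byte[] body = new byte[0];

    public ToyWarcReader(InputStream in) { this.in = in; }

    /** Reads one LF-terminated line, dropping a trailing CR; returns null at end of stream. */
    private String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            sb.append((char) b); // header bytes decoded as ISO-8859-1
        }
        if (b == -1 && sb.length() == 0) return null;
        int n = sb.length();
        if (n > 0 && sb.charAt(n - 1) == '\r') sb.setLength(n - 1);
        return sb.toString();
    }

    /** Moves to the start of the next record; returns false once the stream is exhausted. */
    public boolean nextDocument() throws IOException {
        String line;
        // Skip anything (such as the blank lines between records) until a version line.
        do {
            line = readLine();
            if (line == null) return false;
        } while (!line.equals("WARC/1.0"));
        // Parse "Name: value" headers up to the blank line ending the header block.
        headers = new LinkedHashMap<>();
        while ((line = readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        // Read exactly Content-Length bytes of payload (readNBytes needs Java 11+).
        int length = Integer.parseInt(headers.getOrDefault("Content-Length", "0"));
        body = in.readNBytes(length);
        return true;
    }

    public Map<String, String> getHeaders() { return headers; }

    public String getBody() { return new String(body, StandardCharsets.ISO_8859_1); }

    public static void main(String[] args) throws IOException {
        String warc =
            "WARC/1.0\r\nWARC-Type: response\r\nWARC-Target-URI: http://example.org/\r\n"
            + "Content-Length: 5\r\n\r\nhello\r\n\r\n"
            + "WARC/1.0\r\nWARC-Type: response\r\nWARC-Target-URI: http://example.org/b\r\n"
            + "Content-Length: 3\r\n\r\nbye\r\n\r\n";
        ToyWarcReader r = new ToyWarcReader(
            new ByteArrayInputStream(warc.getBytes(StandardCharsets.ISO_8859_1)));
        while (r.nextDocument()) {
            // Prints "http://example.org/ -> hello", then "http://example.org/b -> bye"
            System.out.println(r.getHeaders().get("WARC-Target-URI") + " -> " + r.getBody());
        }
    }
}
```

Reading the payload by `Content-Length` rather than scanning for the next version line means a payload that happens to contain the text `WARC/1.0` cannot be mistaken for a record boundary.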
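
Methods inherited from class WARC018Collection: getDocid, getDocument, parseHeaders, readLine
Methods inherited from the superclass of WARC018Collection: close, endOfCollection, extractCharset, hasNext, loadDocumentClass, next, openNewFile, openNextFile, readCollectionSpec, reset
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Collection: endOfCollection, reset

public WARC10Collection()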
public WARC10Collection(InputStream input)
public WARC10Collection(String CollectionSpecFilename)
public boolean nextDocument()
Specified by: nextDocument in interface Collection
Overrides: nextDocument in class WARC018Collection

Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow