public class WARC10Collection extends WARC018Collection
Inherited fields: currentDocumentBlobLength, desiredEncoding, DocProperties, documentClass, documentsInThisFile, eoc, eof, FileNumber, FilesToProcess, forceUTF8, is, logger, tokeniser, warc_crawldate_header, warc_docno_header, warc_url_header
Constructor and Description |
---|
WARC10Collection() |
WARC10Collection(InputStream input) |
WARC10Collection(String CollectionSpecFilename) |
Modifier and Type | Method and Description |
---|---|
boolean | nextDocument() Move the collection to the start of the next document. |
protected void | processRedirect(String source, String target) |
Inherited methods: close, endOfCollection, getDocid, getDocument, hasNext, loadDocumentClass, next, openNextFile, parseHeaders, readCollectionSpec, readLine, reset
public WARC10Collection()
public WARC10Collection(InputStream input)
public WARC10Collection(String CollectionSpecFilename)
public boolean nextDocument()
Specified by: nextDocument in interface Collection
Overrides: nextDocument in class WARC018Collection
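The advance-then-read contract of nextDocument() can be illustrated with a minimal sketch. It does not use Terrier: DocumentCursor and InMemoryCursor are hypothetical stand-ins named after the methods listed on this page, and their signatures are simplified assumptions, so the example runs without a WARC file on hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class NextDocumentLoopSketch {

    /** Hypothetical stand-in for the cursor-style methods; not the Terrier interface itself. */
    interface DocumentCursor {
        boolean nextDocument();     // advance; returns false when no document is left
        String getDocid();          // identifier of the current document
        boolean endOfCollection();  // true once the cursor has run past the last document
    }

    /** Toy in-memory cursor (an assumption for illustration only). */
    static final class InMemoryCursor implements DocumentCursor {
        private final Iterator<String> ids;
        private String current;
        private boolean exhausted;

        InMemoryCursor(List<String> docids) {
            this.ids = docids.iterator();
        }

        @Override
        public boolean nextDocument() {
            if (!ids.hasNext()) {
                exhausted = true;
                current = null;
                return false;
            }
            current = ids.next();
            return true;
        }

        @Override
        public String getDocid() {
            return current;
        }

        @Override
        public boolean endOfCollection() {
            return exhausted;
        }
    }

    /** Drains a cursor: always advance first, then read the current document. */
    static List<String> drain(DocumentCursor cursor) {
        List<String> seen = new ArrayList<>();
        while (cursor.nextDocument()) {
            seen.add(cursor.getDocid());
        }
        return seen;
    }

    public static void main(String[] args) {
        DocumentCursor cursor = new InMemoryCursor(Arrays.asList("doc-1", "doc-2", "doc-3"));
        System.out.println(drain(cursor));
    }
}
```

With the real class, the same loop shape would presumably start from one of the constructors listed above, for example new WARC10Collection(input), and read each document after nextDocument() returns true.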
Terrier 4.0. Copyright © 2004-2014 University of Glasgow