public class CrawlStrategy
extends edu.uci.ics.crawler4j.crawler.WebCrawler
Constructor and Description |
---|
CrawlStrategy() |
Modifier and Type | Method and Description |
---|---|
void | init() |
boolean | shouldVisit(edu.uci.ics.crawler4j.crawler.Page page, edu.uci.ics.crawler4j.url.WebURL url): Check to see if the page is on the specified host |
void | visit(edu.uci.ics.crawler4j.crawler.Page page): Get the page and make a Terrier document from it |
Methods inherited from class edu.uci.ics.crawler4j.crawler.WebCrawler:
getMyController, getMyId, getMyLocalData, getThread, handlePageStatusCode, handleUrlBeforeProcess, init, isNotWaitingForNewURLs, onBeforeExit, onContentFetchError, onPageBiggerThanMaxSize, onParseError, onStart, onUnexpectedStatusCode, run, setThread
public void init()
public boolean shouldVisit(edu.uci.ics.crawler4j.crawler.Page page, edu.uci.ics.crawler4j.url.WebURL url)
Overrides: shouldVisit in class edu.uci.ics.crawler4j.crawler.WebCrawler
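The summary for shouldVisit says only that it checks whether the page is on the specified host; this page does not show how the host is configured or compared. As a rough illustration of that kind of check, the following sketch compares a URL's host against an allowed host using only java.net.URI. The class HostFilter, the method isOnHost, and the case-insensitive exact-match rule are assumptions made for this example and are not part of Terrier or crawler4j.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Terrier or crawler4j. */
final class HostFilter {

    private HostFilter() {
        // Static utility; no instances.
    }

    /**
     * Returns true when the host component of the given URL equals
     * allowedHost, ignoring case. Unparseable or host-less URLs are rejected.
     * Exact matching is an assumption; the real crawler may compare differently.
     */
    static boolean isOnHost(String url, String allowedHost) {
        if (url == null || allowedHost == null) {
            return false;
        }
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return false;
            }
            return host.toLowerCase(Locale.ROOT)
                       .equals(allowedHost.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Malformed URLs are treated as off-host rather than crawled.
            return false;
        }
    }
}
```

Inside an override of shouldVisit, a check like this would presumably be applied to the string form of the candidate URL, with the allowed host taken from whatever configuration the crawler uses.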
public void visit(edu.uci.ics.crawler4j.crawler.Page page)
Overrides: visit in class edu.uci.ics.crawler4j.crawler.WebCrawler
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow