public class CrawlStrategy
extends edu.uci.ics.crawler4j.crawler.WebCrawler
| Constructor and Description |
|---|
| CrawlStrategy() |
| Modifier and Type | Method and Description |
|---|---|
| void | init() |
| boolean | shouldVisit(edu.uci.ics.crawler4j.crawler.Page page, edu.uci.ics.crawler4j.url.WebURL url): Check to see if the page is on the specified host |
| void | visit(edu.uci.ics.crawler4j.crawler.Page page): Get the page and make a Terrier document from it |
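The host check described for shouldVisit can be illustrated with a minimal, self-contained sketch. This is not the CrawlStrategy source, which this page does not show; the class name HostFilter, the method isOnHost, and the exact-match rule are assumptions made for illustration. It uses only java.net.URI from the JDK. In a crawler4j subclass, the URL string would typically come from the WebURL argument, for example via url.getURL().

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper: not part of Terrier or crawler4j.
public class HostFilter {
    private final String allowedHost;

    public HostFilter(String allowedHost) {
        // Host names are case-insensitive, so normalise once up front.
        this.allowedHost = allowedHost.toLowerCase(Locale.ROOT);
    }

    // Returns true only when the URL parses and its host equals the allowed host.
    public boolean isOnHost(String url) {
        try {
            String host = new URI(url).getHost();
            return host != null && host.toLowerCase(Locale.ROOT).equals(allowedHost);
        } catch (URISyntaxException e) {
            // Treat unparseable URLs as off-host rather than failing the crawl.
            return false;
        }
    }

    public static void main(String[] args) {
        HostFilter filter = new HostFilter("terrier.org");
        System.out.println(filter.isOnHost("http://terrier.org/docs/")); // true
        System.out.println(filter.isOnHost("http://example.com/"));      // false
    }
}
```

Exact host matching is the strictest choice: under this rule www.terrier.org would not match terrier.org. Whether CrawlStrategy also accepts subdomains is not stated on this page.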
Methods inherited from class edu.uci.ics.crawler4j.crawler.WebCrawler:
getMyController, getMyId, getMyLocalData, getThread, handlePageStatusCode, handleUrlBeforeProcess, init, isNotWaitingForNewURLs, onBeforeExit, onContentFetchError, onPageBiggerThanMaxSize, onParseError, onStart, onUnexpectedStatusCode, run, setThread

public void init()
public boolean shouldVisit(edu.uci.ics.crawler4j.crawler.Page page,
edu.uci.ics.crawler4j.url.WebURL url)
Overrides: shouldVisit in class edu.uci.ics.crawler4j.crawler.WebCrawler

public void visit(edu.uci.ics.crawler4j.crawler.Page page)
Overrides: visit in class edu.uci.ics.crawler4j.crawler.WebCrawler

Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow