public class TagSet extends Object
Modifier and Type | Field and Description |
---|---|
protected HashSet<String> | blackList: The set of tags to skip. |
protected String | blackListTags: A comma-separated list of tags to skip. |
protected boolean | caseSensitive: Whether this TagSet is case-sensitive. |
protected String | docTag: The tag that marks the beginning of a document. |
static String | EMPTY_TAGS: A prefix for an empty set of tags, that is, a set of tags that are not defined in the properties file. |
static String | FIELD_TAGS: The prefix for the tags to consider as fields during indexing. |
protected String | idTag: The tag that is used as a unique identifier. |
static String | TREC_DOC_TAGS: The prefix for the TREC document tags. |
static String | TREC_EXACT_DOC_TAGS: The prefix for the TREC document exact tags. |
static String | TREC_PROPERTY_TAGS: The prefix for the TREC property tags. |
static String | TREC_QUERY_TAGS: The prefix for the TREC topic tags. |
protected HashSet<String> | whiteList: The set of tags to process. |
protected int | whiteListSize: The size of the whiteList set. |
protected String | whiteListTags: A comma-separated list of tags to process. |
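The field pairs above (whiteListTags and whiteList, blackListTags and blackList) suggest that each comma-separated property value is parsed into a HashSet, with whiteListSize caching the size of the whitelist. The following is a minimal sketch of that parsing step, assuming entries are trimmed, empty entries are dropped, and a case-insensitive set stores upper-cased tags. The class TagListParser and its method parseTags are hypothetical and are not part of Terrier.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; not Terrier's implementation. */
final class TagListParser {
    private TagListParser() {}

    /**
     * Splits a comma-separated tag list into a set.
     * Assumptions: entries are trimmed, empty entries are ignored, and when
     * caseSensitive is false every tag is stored upper-cased.
     */
    static Set<String> parseTags(String commaSeparated, boolean caseSensitive) {
        Set<String> tags = new HashSet<>();
        if (commaSeparated == null) {
            return tags; // property not defined: behave like an empty list
        }
        for (String raw : commaSeparated.split(",")) {
            String tag = raw.trim();
            if (tag.isEmpty()) {
                continue; // skip blanks such as the gap in "A,,B"
            }
            tags.add(caseSensitive ? tag : tag.toUpperCase(Locale.ROOT));
        }
        return tags;
    }
}
```

Under these assumptions, whiteListSize would be the size() of the parsed whitelist, which is the value hasWhitelist() compares against zero.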
Constructor and Description |
---|
TagSet(String prefix): Constructs the tag set for the given prefix by reading the corresponding properties from the properties file. |
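The constructor reads its settings from properties that share a common prefix, presumably one of the prefix constants such as TREC_DOC_TAGS. A minimal sketch of prefix-based lookup with java.util.Properties follows. The key suffixes (.process, .skip, .idtag, .doctag, .casesensitive), the example prefix TrecDocTags, and the empty-string defaults are assumptions modelled on typical Terrier configuration files rather than taken from this page, and PrefixedTagConfig is a hypothetical class.

```java
import java.util.Properties;

/** Hypothetical holder for the settings a TagSet-like class reads by prefix. */
final class PrefixedTagConfig {
    final String whiteListTags;
    final String blackListTags;
    final String idTag;
    final String docTag;
    final boolean caseSensitive;

    PrefixedTagConfig(Properties props, String prefix) {
        // Assumed key layout: "<prefix>.<setting>", defaulting to empty values.
        whiteListTags = props.getProperty(prefix + ".process", "");
        blackListTags = props.getProperty(prefix + ".skip", "");
        idTag = props.getProperty(prefix + ".idtag", "");
        docTag = props.getProperty(prefix + ".doctag", "");
        // Assumed default: case-insensitive unless explicitly set to "true".
        caseSensitive = Boolean.parseBoolean(
                props.getProperty(prefix + ".casesensitive", "false"));
    }
}
```

Because every setting hangs off the prefix, the same code can describe document tags, topic tags, or field tags simply by passing a different prefix.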
Modifier and Type | Method and Description |
---|---|
String | getDocTag(): Returns the document delimiter tag. |
String | getIdTag(): Returns the id tag. |
String | getTagsToProcess(): Returns a comma-separated list of tags to process. |
String | getTagsToSkip(): Returns a comma-separated list of tags to skip. |
boolean | hasWhitelist(): Returns true if whiteListSize > 0. |
boolean | isCaseSensitive(): Returns true if this tag set has been specified as case-sensitive. |
boolean | isDocTag(String tag): Checks whether the given tag marks the limits of a document. |
boolean | isIdTag(String tag): Checks whether the given tag is a unique identifier tag, that is, the document number of a document or the identifier of a topic. |
boolean | isTagToProcess(String tag): Checks whether the tag should be processed. |
boolean | isTagToSkip(String tag): Checks whether a tag should be skipped. |
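The summary states that hasWhitelist() returns true when whiteListSize > 0, while isTagToProcess and isTagToSkip test whether a tag should be processed or skipped. The sketch below mirrors those three checks as set-membership tests, assuming a case-insensitive tag set compares upper-cased tags. TagFilter is hypothetical, and so is its shouldIndex method, which combines the checks so that an empty whitelist admits every tag that is not blacklisted; Terrier's collection classes may combine them differently.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of whitelist/blacklist checks; not Terrier's code. */
final class TagFilter {
    private final Set<String> whiteList;
    private final Set<String> blackList;
    private final boolean caseSensitive;

    /** Assumption: when caseSensitive is false, both sets hold upper-cased tags. */
    TagFilter(Set<String> whiteList, Set<String> blackList, boolean caseSensitive) {
        this.whiteList = whiteList;
        this.blackList = blackList;
        this.caseSensitive = caseSensitive;
    }

    private String normalise(String tag) {
        return caseSensitive ? tag : tag.toUpperCase(Locale.ROOT);
    }

    /** True when at least one tag was whitelisted. */
    boolean hasWhitelist() {
        return whiteList.size() > 0;
    }

    boolean isTagToProcess(String tag) {
        return whiteList.contains(normalise(tag));
    }

    boolean isTagToSkip(String tag) {
        return blackList.contains(normalise(tag));
    }

    /** Hypothetical policy: blacklist wins; an empty whitelist admits everything else. */
    boolean shouldIndex(String tag) {
        if (isTagToSkip(tag)) {
            return false;
        }
        return !hasWhitelist() || isTagToProcess(tag);
    }
}
```

Keeping the blacklist check first means a tag listed in both sets is skipped, which is one reasonable reading of "tags to skip"; the opposite precedence would be equally easy to express.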
public static final String EMPTY_TAGS
public static final String TREC_DOC_TAGS
public static final String TREC_EXACT_DOC_TAGS
public static final String TREC_QUERY_TAGS
public static final String TREC_PROPERTY_TAGS
public static final String FIELD_TAGS
protected final int whiteListSize
protected String whiteListTags
protected String blackListTags
protected String idTag
protected String docTag
protected boolean caseSensitive
public TagSet(String prefix)
prefix
- the common prefix of the properties to read.
public boolean hasWhitelist()
public boolean isTagToProcess(String tag)
tag
- the tag to check.
public boolean isTagToSkip(String tag)
tag
- the tag to check.
public boolean isIdTag(String tag)
tag
- the tag to check.
public boolean isDocTag(String tag)
tag
- the tag to check.
public boolean isCaseSensitive()
public String getTagsToProcess()
public String getTagsToSkip()
public String getIdTag()
public String getDocTag()
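isDocTag and isIdTag let a document reader find document boundaries and identifiers without hard-coding tag names. As an illustration, the hypothetical DocIdScanner below collects the identifier of each document in a TREC-style string, treating the doc tag (here DOC) as the document delimiter and the id tag (here DOCNO) as the identifier. It assumes well-formed, non-nested, attribute-free markup and is a simplified stand-in, not Terrier's own collection parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; assumes simple, well-formed, attribute-free tags. */
final class DocIdScanner {
    // Matches <NAME> or </NAME>; group 1 is "/" for closing tags, group 2 the name.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)>");

    private final String docTag;
    private final String idTag;
    private final boolean caseSensitive;

    DocIdScanner(String docTag, String idTag, boolean caseSensitive) {
        this.docTag = docTag;
        this.idTag = idTag;
        this.caseSensitive = caseSensitive;
    }

    private boolean same(String a, String b) {
        return caseSensitive ? a.equals(b) : a.equalsIgnoreCase(b);
    }

    boolean isDocTag(String tag) { return same(tag, docTag); }

    boolean isIdTag(String tag) { return same(tag, idTag); }

    /** Returns the trimmed text inside each id tag that appears within a document. */
    List<String> scanIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        boolean inDoc = false;
        int idStart = -1;
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2);
            if (isDocTag(name)) {
                inDoc = !closing; // opening doc tag starts a document, closing ends it
            } else if (inDoc && isIdTag(name)) {
                if (!closing) {
                    idStart = m.end(); // identifier text begins after the opening id tag
                } else if (idStart >= 0) {
                    ids.add(text.substring(idStart, m.start()).trim());
                    idStart = -1;
                }
            }
        }
        return ids;
    }
}
```

With caseSensitive set to false, lower-case doc and docno tags are recognised alongside the upper-case forms; with it set to true, only exact matches delimit documents.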
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow