public class TagSet extends Object
Modifier and Type | Field and Description |
---|---|
protected HashSet<String> | blackList: The set of tags to skip. |
protected int | blackListSize: Size of the blackList hash set. |
protected String | blackListTags: A comma-separated list of tags to skip. |
protected boolean | caseSensitive: Whether this TagSet is case sensitive. |
protected String | docTag: The tag that denotes the beginning of a document. |
static String | EMPTY_TAGS: A prefix for an empty set of tags, that is, a set of tags that are not defined in the properties file. |
static String | FIELD_TAGS: The prefix for the tags to consider as fields during indexing. |
protected String | idTag: The tag that is used as a unique identifier. |
static String | TREC_DOC_TAGS: The prefix for the TREC document tags. |
static String | TREC_EXACT_DOC_TAGS: The prefix for the TREC document exact tags. |
static String | TREC_PROPERTY_TAGS: The prefix for the TREC property tags. |
static String | TREC_QUERY_TAGS: The prefix for the TREC topic tags. |
protected HashSet<String> | whiteList: The set of tags to process. |
protected int | whiteListSize: Size of the whiteList hash set. |
protected String | whiteListTags: A comma-separated list of tags to process. |
Constructor and Description |
---|
TagSet(String prefix): Constructs the tag set for the given prefix by reading the corresponding properties from the properties file. |
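The constructor description above can be illustrated with a minimal sketch of prefix-driven configuration. This is not Terrier's implementation: the class `PrefixTagConfig` is hypothetical, and the key suffixes `.process`, `.skip`, `.idtag`, `.doctag` and `.casesensitive`, the upper-casing of tags when the set is not case sensitive, and the use of a plain `java.util.Properties` object are all assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

/** Hypothetical stand-in for a prefix-driven tag set; not Terrier's own code. */
class PrefixTagConfig {
    final Set<String> whiteList;
    final Set<String> blackList;
    final String idTag;
    final String docTag;
    final boolean caseSensitive;

    PrefixTagConfig(Properties props, String prefix) {
        // Assumed key layout: <prefix>.casesensitive, .process, .skip, .idtag, .doctag
        caseSensitive = Boolean.parseBoolean(
                props.getProperty(prefix + ".casesensitive", "false"));
        whiteList = split(props.getProperty(prefix + ".process", ""));
        blackList = split(props.getProperty(prefix + ".skip", ""));
        idTag = normalise(props.getProperty(prefix + ".idtag", ""));
        docTag = normalise(props.getProperty(prefix + ".doctag", ""));
    }

    /** Trims the tag and, for case-insensitive sets, upper-cases it (an assumption). */
    String normalise(String tag) {
        String trimmed = tag.trim();
        return caseSensitive ? trimmed : trimmed.toUpperCase(Locale.ROOT);
    }

    /** Splits a comma-separated list into normalised, non-empty tags. */
    Set<String> split(String commaSeparated) {
        Set<String> tags = new LinkedHashSet<>();
        for (String raw : commaSeparated.split(",")) {
            String tag = normalise(raw);
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("TrecDocTags.doctag", "DOC");
        props.setProperty("TrecDocTags.idtag", "DOCNO");
        props.setProperty("TrecDocTags.skip", "DOCHDR");
        PrefixTagConfig config = new PrefixTagConfig(props, "TrecDocTags");
        System.out.println("doc tag: " + config.docTag);
        System.out.println("tags to skip: " + config.blackList);
    }
}
```

An unset key yields an empty set here, which mirrors the idea behind EMPTY_TAGS: a prefix with no matching properties produces a tag set with nothing to process or skip.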
Modifier and Type | Method and Description |
---|---|
String | getDocTag(): Returns the document delimiter tag. |
String | getIdTag(): Returns the id tag. |
String | getTagsToProcess(): Returns a comma-separated list of tags to process. |
String | getTagsToSkip(): Returns a comma-separated list of tags to skip. |
boolean | hasWhitelist(): Returns true if whiteListSize > 0. |
boolean | isCaseSensitive(): Returns true if this tag set has been specified as case sensitive. |
boolean | isDocTag(String tag): Checks whether the given tag indicates the limits of a document. |
boolean | isIdTag(String tag): Checks whether the given tag is a unique identifier tag, that is, the document number of a document or the identifier of a topic. |
boolean | isTagToProcess(String tag): Checks whether the tag should be processed. |
boolean | isTagToSkip(String tag): Checks whether a tag should be skipped. |
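The membership checks summarised above can be sketched as plain hash-set lookups. The class `TagFilterSketch` below is hypothetical and only mirrors the documented method names. The upper-casing of tags for case-insensitive sets and the helper `shouldIndex`, which shows one plausible way a caller might combine the checks, are assumptions rather than documented Terrier behaviour.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the membership checks; not Terrier's own code. */
class TagFilterSketch {
    private final Set<String> whiteList = new HashSet<>();
    private final Set<String> blackList = new HashSet<>();
    private final boolean caseSensitive;

    TagFilterSketch(String tagsToProcess, String tagsToSkip, boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
        addAll(whiteList, tagsToProcess);
        addAll(blackList, tagsToSkip);
    }

    /** Adds each non-empty tag of a comma-separated list to the target set. */
    private void addAll(Set<String> target, String commaSeparated) {
        for (String raw : commaSeparated.split(",")) {
            String tag = normalise(raw);
            if (!tag.isEmpty()) {
                target.add(tag);
            }
        }
    }

    /** Assumption: case-insensitive sets compare tags in upper case. */
    private String normalise(String tag) {
        String trimmed = tag.trim();
        return caseSensitive ? trimmed : trimmed.toUpperCase(Locale.ROOT);
    }

    boolean hasWhitelist() {
        return !whiteList.isEmpty();
    }

    boolean isTagToProcess(String tag) {
        return whiteList.contains(normalise(tag));
    }

    boolean isTagToSkip(String tag) {
        return blackList.contains(normalise(tag));
    }

    /** Hypothetical caller-side policy: skip wins, then the whitelist applies if present. */
    boolean shouldIndex(String tag) {
        if (isTagToSkip(tag)) {
            return false;
        }
        return !hasWhitelist() || isTagToProcess(tag);
    }

    public static void main(String[] args) {
        TagFilterSketch filter = new TagFilterSketch("TEXT,TITLE", "DOCHDR", false);
        for (String tag : new String[] {"text", "DOCHDR", "date"}) {
            System.out.println(tag + " indexed: " + filter.shouldIndex(tag));
        }
    }
}
```

Keeping the two sets separate, instead of deriving one from the other, lets a configuration list only tags to skip and leave the whitelist empty, which is the case `hasWhitelist()` exists to detect.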
public static final String EMPTY_TAGS
public static final String TREC_DOC_TAGS
public static final String TREC_EXACT_DOC_TAGS
public static final String TREC_QUERY_TAGS
public static final String TREC_PROPERTY_TAGS
public static final String FIELD_TAGS
protected final int whiteListSize
protected final String whiteListTags
protected final int blackListSize
protected final String blackListTags
protected final String idTag
protected final String docTag
protected final boolean caseSensitive
public TagSet(String prefix)
prefix - the common prefix of the properties to read.
public boolean hasWhitelist()
public boolean isTagToProcess(String tag)
tag - the tag to check.
public boolean isTagToSkip(String tag)
tag - the tag to check.
public boolean isIdTag(String tag)
tag - the tag to check.
public boolean isDocTag(String tag)
tag - the tag to check.
public boolean isCaseSensitive()
public String getTagsToProcess()
public String getTagsToSkip()
public String getIdTag()
public String getDocTag()
Terrier Information Retrieval Platform 5.1. Copyright © 2004-2019, University of Glasgow