Class FlatJSONDocument

  • All Implemented Interfaces:
    Document

    public class FlatJSONDocument
    extends java.lang.Object
    implements Document
    This is a Terrier Document implementation for a document stored in JSON format. It assumes that each JSON document has at least one attribute called 'text' that contains the text of the document.

    Fields: This implementation supports a single field named 'TEXT' by default. FieldTags.process is a comma-delimited list of properties to use as fields.

    Meta-Data: During parsing, the properties of each FlatJSONDocument are decorated with document meta-data. This decoration is performed by 'flattening' the layered structure of the JSON object and its sub-attributes into individual properties. For property naming, attributes in different layers are joined with a dot '.', e.g. user.name
    Since:
    5.1
    Author:
    Richard McCreadie and Saul Vargas
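
    The 'flattening' described above can be illustrated with a minimal sketch. It is not the Terrier implementation: to stay self-contained it uses nested java.util.Map objects as a stand-in for a parsed JSON object, and the class and method names (FlattenSketch, flatten, flattenInto) are hypothetical. How arrays are handled is not stated in this documentation, so the sketch only covers nested objects and scalar values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: nested Maps stand in for a parsed JSON object.
class FlattenSketch {

    /** Flattens nested maps into dot-separated property names, e.g. user.name. */
    static Map<String, String> flatten(Map<String, Object> json) {
        Map<String, String> out = new LinkedHashMap<>();
        flattenInto("", json, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, Object> node,
                                    Map<String, String> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            // Join the parent path and the attribute name with a dot.
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flattenInto(key, child, out);           // descend one layer
            } else {
                out.put(key, String.valueOf(value));    // scalar: store as a string
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "alice");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("text", "hello world");
        doc.put("user", user);
        // prints {text=hello world, user.name=alice}
        System.out.println(flatten(doc));
    }
}
```

    With properties stored this way, a nested attribute would be looked up by its dotted name, for example "user.name".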
    • Constructor Summary

      Constructors 
      Constructor Description
      FlatJSONDocument​(com.google.gson.JsonObject json)  
      FlatJSONDocument​(java.lang.String rawJson)  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      boolean endOfDocument()
      Returns true when the end of the document has been reached, and there are no other terms to be retrieved from it.
      java.util.Map<java.lang.String,​java.lang.String> getAllProperties()
      Returns the underlying map of all the properties defined by this Document.
      java.util.Set<java.lang.String> getFields()
      Returns the set of fields the current term appears in.
      java.lang.String getNextTerm()
      Gets the next term of the document.
      java.lang.String getProperty​(java.lang.String name)
      Allows access to a named property of the Document.
      java.io.Reader getReader()
      Returns a Reader object so client code can tokenise the document or deal with the document itself.
      protected void initalize​(java.lang.String rawJson)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • properties

        protected java.util.Map<java.lang.String,​java.lang.String> properties
      • tokens

        public java.lang.String[][] tokens
      • fieldQueue

        protected java.util.List<java.lang.String> fieldQueue
      • fieldsToProcess

        protected java.lang.String[] fieldsToProcess
      • fieldIndex

        protected int fieldIndex
      • tokenIndex

        protected int tokenIndex
      • remainingTokens

        protected int remainingTokens
    • Constructor Detail

      • FlatJSONDocument

        public FlatJSONDocument​(com.google.gson.JsonObject json)
      • FlatJSONDocument

        public FlatJSONDocument​(java.lang.String rawJson)
                         throws com.fasterxml.jackson.core.JsonParseException,
                                com.fasterxml.jackson.databind.JsonMappingException,
                                java.io.IOException
        Throws:
        com.fasterxml.jackson.core.JsonParseException
        com.fasterxml.jackson.databind.JsonMappingException
        java.io.IOException
    • Method Detail

      • initalize

        protected void initalize​(java.lang.String rawJson)
      • endOfDocument

        public boolean endOfDocument()
        Description copied from interface: Document
        Returns true when the end of the document has been reached, and there are no other terms to be retrieved from it.
        Specified by:
        endOfDocument in interface Document
        Returns:
        boolean true if there are no more terms in the document, otherwise it returns false.
      • getAllProperties

        public java.util.Map<java.lang.String,​java.lang.String> getAllProperties()
        Description copied from interface: Document
        Returns the underlying map of all the properties defined by this Document.
        Specified by:
        getAllProperties in interface Document
      • getFields

        public java.util.Set<java.lang.String> getFields()
        Description copied from interface: Document
        Returns the set of fields the current term appears in.
        Specified by:
        getFields in interface Document
        Returns:
        HashSet a set of the fields that the current term appears in.
      • getNextTerm

        public java.lang.String getNextTerm()
        Description copied from interface: Document
        Gets the next term of the document. NB: Null strings returned from getNextTerm() should be ignored; they do not signify the lack of any more terms. Use endOfDocument() to check for that.
        Specified by:
        getNextTerm in interface Document
        Returns:
        String the next term of the document. Null returns should be ignored.
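
        The contract between getNextTerm() and endOfDocument() suggests a loop of the following shape. This is a sketch under stated assumptions: TermSource is a hypothetical stand-in that exposes only the two Document methods involved, and the array-backed source exists purely so the example runs without Terrier on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

class TermLoopSketch {

    /** Hypothetical stand-in exposing only the two Document methods used here. */
    interface TermSource {
        String getNextTerm();
        boolean endOfDocument();
    }

    /** Toy source backed by an array; null entries imitate terms to be skipped. */
    static TermSource of(String... terms) {
        return new TermSource() {
            private int next = 0;
            public String getNextTerm() {
                return next < terms.length ? terms[next++] : null;
            }
            public boolean endOfDocument() {
                return next >= terms.length;
            }
        };
    }

    /** Collects every non-null term; only endOfDocument() ends the loop. */
    static List<String> collectTerms(TermSource doc) {
        List<String> out = new ArrayList<>();
        while (!doc.endOfDocument()) {
            String term = doc.getNextTerm();
            if (term == null) {
                continue; // a null term is ignored, it does not mean "finished"
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [flat, json, document]
        System.out.println(collectTerms(of("flat", null, "json", "document")));
    }
}
```

        The same loop shape should apply to any Document implementation, since the null-handling rule is part of the interface description rather than specific to this class.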
      • getProperty

        public java.lang.String getProperty​(java.lang.String name)
        Description copied from interface: Document
        Allows access to a named property of the Document. Examples might be URL, filename etc.
        Specified by:
        getProperty in interface Document
        Parameters:
        name - Name of the property. It is suggested, but not required, that this name be case sensitive.
      • getReader

        public java.io.Reader getReader()
        Description copied from interface: Document
        Returns a Reader object so client code can tokenise the document or deal with the document itself. Examples might be extracting URLs, language detection.
        Specified by:
        getReader in interface Document
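
        As an illustration of client-side tokenisation over a Reader, here is a minimal sketch. It is not Terrier's tokeniser: the class name ReaderTokeniseSketch and the rule used (lowercased runs of letters and digits) are assumptions made for the example, and a StringReader stands in for the Reader that getReader() would return.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReaderTokeniseSketch {

    /** Splits the character stream into lowercased runs of letters and digits. */
    static List<String> tokenise(Reader reader) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase((char) c));
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a separator closes the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());     // flush the trailing token
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // StringReader is a stand-in for the Reader obtained from a document.
        // prints [hello, json, world]
        System.out.println(tokenise(new StringReader("Hello, JSON world!")));
    }
}
```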