Terrier Core / TR-252

Update Apache POI versions to parse newer Word/Excel/PowerPoint files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0, 3.5
    • Fix Version/s: 3.6
    • Component/s: None
    • Labels:
      None

      Description

      We can't index .docx, .xlsx, .pptx, etc. (Office Open XML) documents, but Apache POI can parse them.
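      The newer .docx/.xlsx/.pptx files are ZIP-based Office Open XML packages, while the older .doc/.xls/.ppt files are OLE2 compound documents. An indexer could therefore route each file to the right parser by its leading bytes rather than by its extension. The following is a minimal sketch under that assumption, not Terrier's or POI's code: the class and enum names are hypothetical, it needs Java 11+ for readNBytes, and a ZIP signature is only a hint, since any ZIP archive matches it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper (not part of Terrier or Apache POI): classifies an Office
// file by its leading signature bytes instead of trusting its file extension.
final class OfficeFormatSniffer {

    /** Container types this sketch can tell apart. */
    enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 Compound File Binary signature, used by .doc/.xls/.ppt files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header "PK\3\4". OOXML (.docx/.xlsx/.pptx) packages start
    // with this, but so does any other ZIP archive, so it is only a hint.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Container.ZIP;
        }
        return Container.UNKNOWN;
    }

    // Reads up to 8 bytes from the stream; the caller owns (and closes) the stream.
    static Container sniff(InputStream in) throws IOException {
        return sniff(in.readNBytes(OLE2_MAGIC.length));
    }

    // True if data begins with every byte of prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

      OLE2 files would then go to POI's older binary-format extractors and ZIP files to its OOXML ones. Which concrete POI classes to call depends on the POI version chosen.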

      Moreover, our .ppt indexing includes terms like "Click here to edit the title", even though this text isn't visible in the presentation itself (perhaps it comes from the slide master?).
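      The cleaner fix would be to not extract slide-master text at all. If that isn't possible, a crude fallback is to drop extracted lines that consist solely of a known placeholder prompt. The following is a minimal sketch of that fallback. The class name is hypothetical, and apart from "Click here to edit the title" (the string reported above), the prompt list is an assumption that should be checked against real extracted output.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical fallback filter (not Terrier or POI code): removes lines of
// extracted slide text that are nothing but a placeholder prompt.
final class PlaceholderFilter {

    // Only the first entry comes from the issue; the others are assumed
    // examples of master-slide prompts and should be verified.
    static final Set<String> PROMPTS = Set.of(
        "click here to edit the title",
        "click to edit master title style",
        "click to edit master text styles");

    // Keeps every line that is not, after normalisation, exactly a known prompt.
    static List<String> filter(List<String> lines) {
        return lines.stream()
            .filter(line -> !PROMPTS.contains(normalise(line)))
            .collect(Collectors.toList());
    }

    // Trims, collapses runs of whitespace and lower-cases for comparison.
    private static String normalise(String line) {
        return line.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}
```

      Matching whole lines only means genuine text that merely contains the phrase is still indexed. The trade-off is that a slide whose author really typed that exact sentence would also lose it.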

      Finally, Apache POI seems to have delivered improved interfaces for extracting text from Microsoft Office files; perhaps we can switch to these newer interfaces.




            People

            • Assignee: Craig Macdonald (craigm)
            • Reporter: Craig Macdonald (craigm)
            • Watchers: 0

              Dates

              • Created:
                Updated:
                Resolved: