Introduction To Information Retrieval

Authors: Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze

Chapter 10 - XML Retrieval

Page 178

  • Relational databases involve searching structured data
  • Information retrieval (IR) is the searching of unstructured “raw” text without markup or tagging
  • Table 10.1 summarizes differences in searching structured data vs. unstructured data
    • XQuery - Page 197 - a good candidate to become the standard for structured queries
  • Structured data can be represented as structured documents and searched with structured retrieval, which is good for searching “digital libraries, patent databases, blogs, text with persons and entities tagged…. and files from office suites saving as marked up text”
    • Structured queries handle questions that unranked retrieval answers poorly
      • Boolean queries can return many results without ranking the most relevant first
      • Users may not know which structured elements exist and can be used in queries (example: country:Vatican)
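
A minimal sketch of the two problems noted above, assuming a made-up four-document collection with a single structured field. The names DOCS, boolean_search and ranked_search are hypothetical, and raw term frequency only stands in for a real scoring function; none of this is taken from the book.

```python
from collections import Counter

# Hypothetical toy collection: one structured field ("country") plus free text.
DOCS = [
    {"id": 1, "country": "Vatican", "text": "a short sightseeing walk through the Vatican"},
    {"id": 2, "country": "Italy", "text": "sightseeing in Rome near the Coliseum"},
    {"id": 3, "country": "Vatican", "text": "history of the Vatican library"},
    {"id": 4, "country": "Vatican", "text": "sightseeing tours and more sightseeing in the Vatican gardens"},
]


def matches_field(doc, field, value):
    """True if no field constraint is given, or the doc's field equals value."""
    return field is None or doc.get(field) == value


def boolean_search(docs, term, field=None, value=None):
    """Unranked Boolean retrieval: every match is returned in collection order."""
    return [
        doc["id"]
        for doc in docs
        if term in doc["text"].lower().split() and matches_field(doc, field, value)
    ]


def ranked_search(docs, term, field=None, value=None):
    """Same filter, but matches are ordered by term frequency, highest first."""
    scored = []
    for doc in docs:
        tf = Counter(doc["text"].lower().split())[term]
        if tf > 0 and matches_field(doc, field, value):
            scored.append((tf, doc["id"]))
    # Sort by descending term frequency, breaking ties by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


unranked = boolean_search(DOCS, "sightseeing", "country", "Vatican")
ranked = ranked_search(DOCS, "sightseeing", "country", "Vatican")
# A user who guesses a field name the collection does not have gets nothing back.
wrong_field = boolean_search(DOCS, "sightseeing", "state", "Vatican")
```

Here the Boolean search returns the matching documents in collection order, the ranked search puts the document that mentions the term most often first, and the query on the guessed field name "state" returns nothing because no document has that field.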
