Oracle's Universal Server: The Context Server Option   By Steve Roti


Are you confused about exactly what Oracle's Universal Server is? This first part in our series of articles looking at the options available with the Universal Server starts by examining the ConText in-depth.

In the latter half of the 1990s, with relational database servers firmly established as the product of choice for storing large bodies of alphanumeric data, both users and vendors are looking for ways to extend their investments into new areas of database applications. The current hot topic is the "universal server," and as is often the case in the computer industry, the meaning of this term is evolving over time. The two most common conceptions of a universal server are as a relational database server with support for extended datatypes, and as an object-relational database server providing persistent storage for complex objects.
In this series of articles, I will discuss the various options available with Oracle's Universal Server, Enterprise and Workgroup editions, including the ConText Option, Spatial Data Option, and others as they become available. In this, the first part of this series, I look at the currently shipping version of Oracle7 Server Release 7.3 for Windows NT, and I focus on the ConText Option, which is in wide use now.

What is Oracle's Universal Server?

If you're looking for a full-featured object-relational server, you'll have to wait until Oracle8 ships later in 1997. Oracle7 Release 7.3, the first release in the Oracle Universal Server architecture, is still primarily a relational server with a variety of optional components that provide support for storing, retrieving, and handling complex datatypes such as text, spatial data, images, audio, and video.
The full contents of the Oracle7 Release 7.3 for Windows NT CD are shown in the accompanying sidebar "A Laundry List for Oracle's Universal Server" . The two options that are most relevant to the discussion of universal servers are the ConText Option and the Spatial Data Option. (Note that although these two options are included in the Release 7.3 CD, they are separately licensable products.) Additional relevant options that ship separately from Release 7.3 are the Oracle Messaging Option and the Oracle Video Option. Briefly, the Oracle Messaging Option provides a centralized repository for email, documents, and calendaring/scheduling information, as well as providing directory services. The Oracle Video Option supports simultaneous access to realtime video and other corporate information from within the same application.
In this article, I focus on the ConText Option because it is the most widely used Universal Server extended datatype. From a product architecture standpoint, it's important to understand that the ConText Option runs as a separate text-management server that provides data-manipulation services on textual data at the request of the Oracle7 Server. With this in mind, let's take a look at this option in more detail.

The ConText Option

The ConText Option integrates textual data with traditional structured datatypes in an Oracle7 database and enables text queries from most tools that use SQL or PL/SQL. Although text data is stored in the same database as structured data, text queries and data-manipulation requests are handled by a separate ConText Server that works in concert with the Oracle Server. Figure 1 shows the relationship between an Oracle Server process and a ConText Server process communicating through the system global area.


Figure 1
The ConText server process is separate from the Oracle Server process(es). When the user enters a text query, the Oracle Server puts an entry in the text request queue, the ConText server processes the queued request, and the results are returned to the user through the Oracle Server response.


The ConText Option supports both text queries and linguistic queries. Text queries are word and phrase searches performed on the full document text using the CONTAINS function, whereas linguistic queries return theme and/or Gist information, which is pregenerated and stored as structured data in linguistic output tables.
Before you can perform text queries and linguistic queries, the database columns used to store text must be defined as text columns, textual data must be entered or imported into the columns, and text indexes and linguistic output must be generated for the columns. For example, the following SQL statement creates a table with a text column:

CREATE TABLE books (
  title  VARCHAR2(30),
  text   LONG NOT NULL,
  PRIMARY KEY (title));

Oracle recommends using CHAR, VARCHAR2, LONG, or LONG RAW as the datatype for a text column. The example above uses LONG because it can hold up to 2GBs of data, whereas CHAR is limited to 255 bytes and VARCHAR2 is limited to 2000 bytes. After you create the table, explicitly identify the columns that you want to use to store text by creating what the ConText Option refers to as a "policy" for each column. A policy defines the storage method and preferences that the ConText Server uses to create a text index for the column. The default indexing preferences are:

 

For example, the following call to the CTX_DDL.CREATE_POLICY procedure creates a default policy named BOOKS_POLICY for the text column in the BOOKS table owned by the CTXSYS user:

EXECUTE ctx_ddl.create_policy(
'books_policy',
'ctxsys.books.text');

You can load text data into the table using SQL*Plus, SQL*Loader,ctxload (a batch loader supplied with the ConText Option), or a custom application. The ctxload utility works only with LONG and LONG RAW columns.

 

Text Queries

In order to perform text queries, you must first create a text index for each text column. The following call to the CTX_DDL.CREATE_INDEX procedure creates a text index using the preferences previously specified in the BOOKS_POLICY:

EXECUTE ctx_ddl.create_index('books_policy');

After creating text indexes , you can use the CONTAINS function in the WHERE clause of a SQL SELECT statement to find documents that contain target words. In Listing 1, the string "summer | fence" is a query expression that instructs the ConText Server to return documents that contain the words "summer" or "fence."
The SCORE function returns a value based on the number of occurrences of the words in the document (each occurrence has a value of 10). When the words are ORed together in a query expression, the maximum score of the two words is returned; when they are ANDed together, the minimum score is returned. In the previous example, Independence Day has a score of 10 because the word summer occurs once in the document, whereas The Sound and the Fury has a score of 50 because "fence" occurs five times (only the first occurrence is shown because the SET LONG 90 statement displays only the first 90 characters of the document).
In addition to AND and OR operators, you can use many additional operators and special characters in ConText query expressions. ACCUMULATE and MINUS are logical operators that work similarly to AND and OR. ACCUMULATE returns a score by adding the value of its operands, and MINUS returns the score of the right operand subtracted from the left.
The proximity operator NEAR calculates a score based on how close words are to one another rather than by how often they appear in the document. The score is 100 when the query terms are adjacent. For example, the following query returns documents that contain the words "small" and "town," with the highest score going to the document in which they appear closest together:

SELECT SCORE(0), title, text
FROM   books
WHERE  CONTAINS(text, 'small NEAR town') > 0
ORDER BY SCORE(0);

The expansion operators expand the number of query terms based on a single word or phrase supplied by the user. The STEM operator (represented in a query expression by the symbol "$") expands the list of words to include all words having the same stem or root word. For example, "$sing" expands into "sing," "sang", and "sung." The FUZZY operator (represented by the symbol "?") expands the list of words to include all words with similar spelling. For example, "?read" expands into "lead" and "real." The SOUNDEX operator (represented by the symbol "!") expands the list of words to include all words that sound the same. It uses the same logic as the SQL SOUNDEX function to search for words that have a similar sound. Listing 2 shows a FUZZY query on the word "door" returning text containing both "door" and "floor."
The control operators regulate the results returned by the query expression operand on the left based on the value of a qualifier operand on the right. The WEIGHT operator assigns a relative importance (0.1 to 10) to an operand. The THRESHOLD operator returns documents only if their score is greater than a specified threshold value. The MAX operator returns the specified number of highest-scoring documents. Listing 3 shows a query with the word "day" weighted twice as heavily as the word "year" and a threshold of 10. In other words, text with one occurrence of "day" will be returned, but two occurrences of "year" are required to pass the threshold.
Wildcard characters can be used in query expressions to expand word searches into pattern searches. The "_" and "%" characters work just as they do with the SQL LIKE operator, matching one and any number of characters, respectively.
Taken together, all of these text-query operators give you a wide range of flexibility in performing full-text searches of documents. However, it's worth noting that you can use the CONTAINS function only in the WHERE clause of a SELECT statement. It cannot be used in an UPDATE, INSERT, or DELETE statement.

 

Linguistic Queries

In order to perform linguistic queries on documents, you must create linguistic output tables and then generate the output for the text column. Oracle can generate two types of linguistic output: themes (document classifications) and Gists (text summaries). Figure 2 shows the relationship between the tables containing the document text and the linguistic output tables.


Figure2
You use the ConText Server to generate linguistic output from text documents and store it as structured data in output tables. Then you use the Oracle7 Server to query the tables directly, providing fast access to themes and Gists.

The ConText Server has a large knowledgebase of the English language that it uses to classify a document by its major themes. Themes present a profile of the main subjects or topics of a document. In essence, they provide a quick snapshot of what the document is about. Up to 16 themes are generated for each document, and each theme is assigned a relative weight that measures the strength of the theme relative to the other themes in the document.
A Gist is a summary of a document and consists of selected paragraphs that reflect the themes of the document. Whole paragraphs are used to create Gists in order to maintain continuity and provide context. There are two types of Gists: point-of-view Gists that provide a short summary for a single, specific document theme, and generic Gists that provide a summary reflecting all of the themes in the document.
The following statements show how to create the linguistic output tables for themes and Gists.

CREATE TABLE books_themes (
  cid     NUMBER,
  pk      VARCHAR2(64),
  theme   VARCHAR2(256),
  weight  NUMBER);
  
CREATE TABLE books_gist
  cid   NUMBER,
  pk    VARCHAR2(64),
  pov   VARCHAR2(256),
  gist  LONG);

Once you create the tables, you can populate them with linguistic output by calling the procedures CTX_LING.REQUEST_THEMES, CTX_LING.REQUEST_GIST, and CTX_LING.SUBMIT, as follows:

EXECUTE ctx_ling.request_themes(
  'books_policy',
  'Palace of Desire',
  'books_themes')
EXECUTE ctx_ling.request_gist(
  'books_policy',
  'Palace of Desire',
  'books_gist')
VARIABLE handle NUMBER
EXECUTE :handle := ctx_ling.submit
PRINT handle

The first two procedure calls set up the requests and the third call submits them to the queue for processing. Note that you must call the CTX_LING.REQUEST_THEMES and CTX_LING.REQUEST_GIST once for each document for which you want to generate linguistic output.
After you generate the linguistic output, you can run queries against the output tables to find documents based on themes or Gists. For example, to find all documents with a "happiness" theme, you can run the following query:

SELECT pk, weight
FROM   books_themes
WHERE  theme = 'happiness'
ORDER BY weight;

PK                     WEIGHT
---------------------  ------
Consider This, Senora  2028
Grand Opening          2869

Recall that the BOOKS_THEMES table contains only key, theme, and weight information. If you want to display the text of the document, you would need to join the BOOKS_THEMES table to the table containing the text column.

 

Incomplete Integration

Although the ConText Option is a valuable addition to the Oracle Universal Server, it is not integrated as well as it could be with the other server components. In particular, installation is complicated, requiring many manual steps to complete installation. Also, ConText Option Server management is different from Oracle7 Server management. For example, the Oracle7 Server is integrated as an NT service, and you can set it to start automatically in the Services control panel, but you must start the ConText Server manually with a separate ConText Server manager program. The incomplete integration is an annoyance, but it will likely be addressed in the upcoming Oracle8 Server.

 

Oracle8 and the Future of the Universal Server

Object technology will be an integral part of Oracle8, the next major release of the Oracle Universal Server. Oracle8, due to be released for general availability in mid-1997, is said to be a full object-relational database that extends Oracle Server's relational database capabilities to objects and provides further integration of complex datatypes such as text, spatial, video, Web, and multidimensional data. Oracle8 will integrate objects and extended datatypes with the traditional strengths of an RDBMS: parallelism, concurrency, scalability, and distributed processing.



Steve Roti is the owner of Olympic Software, a database consulting firm in Bend, Oregon. He is an active user of SQL DBMSs on Unix, Linux and Windows. You can email him at steveroti@hotmail.com.

 Listing 1

SQL> SET LONG 90
SQL> COLUMN text FORMAT A30 WORD_WRAP
SQL> SELECT SCORE(0), title, text
2 FROM   books
3 WHERE  CONTAINS(text, 'summer | fence') > 0
4 ORDER BY SCORE(0);

SCORE0  TITLE                    TEXT
------  -----------------------  ------------------------------
10      Independence Day         In Haddam, summer floats over
                                 tree-softened streets like a
                                 sweet lotion balm from a

50      The Sound and the Fury   Through the fence, between the
                                 curling flower spaces, I could
                                 see them hitting.

Listing 2 

SELECT title, text
FROM   books
WHERE  CONTAINS(text, '?door') > 0 ;

TITLE                TEXT
-------------------- ------------------------------
The Liars Club       My sharpest memory is of a
                     single instant surrounded by
                     dark. I was seven, and our
                     family doctor knelt before me
                     where I sat on a mattress on
                     the bare floor.

Palace of Desire     Al-Sayyid Ahmad Abd al-Jawad
                     closed the door behind him and
                     crossed the courtyard of his
                     house by the pale light of the
                     stars.

 Listing 3

SELECT title, text
FROM   books
WHERE  CONTAINS(text,'(day*2 | year*1) > 10')>0

TITLE                          TEXT
------------------------------ ------------------------------
Corelli's Mandolin             Dr Iannis had enjoyed a
                               satisfactory day in which none
                               of his patients had died or
                               got any worse. He had attended
                               a surprisingly easy calving,
                               lanced on abcess, extracted a
                               molar,

Grand Opening                  As they followed the
                               Mississippi out of the Twin
                               Cities on U.S. 61, Brendan
                               wondered why his parents and
                               his grandfather seemed not to
                               share his dread. Year after
                               year he had

Sidebar: A Laundry List for Oracle's Universal Server

Features of Oracle7 Universal Server Release 7.3 for Windows NT

Oracle7 Server Release 7.3.2.2.0
Oracle WebServer Release 1.0
Oracle7 Server Options included:
* Distributed Option
* Advanced Replication Option
* Parallel Query Option
Additional Licensable Oracle7 Server Options:
* ConText Option
* Spatial Data Option
* Parallel Server Option
* Advanced Networking Option
* Enterprise Manager Performance Pack
* Workflow Option
* Video Option
* Document Option
* TextServer3 Option
* Messaging Option
Oracle7 Utilities:
* Import/Export
* SQL*Loader
* Oracle Server Manager
SQL*Net Protocol Adapters:
* Oracle Named Pipes Adapter
* Oracle SPX Adapter
* Oracle TCP/IP Adapter
* Oracle DECnet Adapter
* Oracle DCE Adapter
SQL*Plus

Requirements
Server hardware:
32MB RAM recommended
65MB hard-disk space
20MB hard-disk space additional for replication support
60MB hard-disk space additional for ConText Option
CD-ROM drive

Client hardware:
8MB RAM
25MB hard-disk space
CD-ROM drive

Server software:
Microsoft Windows NT V3.51 or V4.0

Client software:
Microsoft Windows NT V3.51 or V4.0
Microsoft Windows 95
Microsoft Windows 3.1 or higher

ConText Option Documentation:
Oracle ConText Option QuickStart (26 pages)
Oracle ConText Option Administrator's Guide (232 pages)
Oracle ConText Option Application Developer's Guide (222 pages)
Oracle ConText Option Messages (48 pages)


Copyright 1997 Miller Freeman Inc. All Rights Reserved
Redistribution without Permission is prohibited