

Are you confused about exactly what Oracle's Universal Server is? This first part in our series of articles looking at the options available with the Universal Server starts by examining the ConText in-depth.
In the latter half of the 1990s, with relational database servers firmly
established as the product of choice for storing large bodies of alphanumeric
data, both users and vendors are looking for ways to extend their investments
into new areas of database applications. The current hot topic is the "universal
server," and as is often the case in the computer industry, the meaning
of this term is evolving over time. The two most common conceptions of a
universal server are as a relational database server with support for extended
datatypes, and as an object-relational database server providing persistent
storage for complex objects.
In this series of articles, I will discuss the various options available
with Oracle's Universal Server, Enterprise and Workgroup editions, including
the ConText Option, Spatial Data Option, and others as they become available.
In this, the first part of this series, I look at the currently shipping
version of Oracle7 Server Release 7.3 for Windows NT, and I focus on the
ConText Option, which is in wide use now.
If you're looking for a full-featured object-relational server, you'll
have to wait until Oracle8 ships later in 1997. Oracle7 Release 7.3, the
first release in the Oracle Universal Server architecture, is still primarily
a relational server with a variety of optional components that provide support
for storing, retrieving, and handling complex datatypes such as text, spatial
data, images, audio, and video.
The full contents of the Oracle7 Release 7.3 for Windows NT CD are shown
in the accompanying sidebar "A Laundry List
for Oracle's Universal Server" . The two options that are most
relevant to the discussion of universal servers are the ConText Option and
the Spatial Data Option. (Note that although these two options are included
in the Release 7.3 CD, they are separately licensable products.) Additional
relevant options that ship separately from Release 7.3 are the Oracle Messaging
Option and the Oracle Video Option. Briefly, the Oracle Messaging Option
provides a centralized repository for email, documents, and calendaring/scheduling
information, as well as providing directory services. The Oracle Video Option
supports simultaneous access to realtime video and other corporate information
from within the same application.
In this article, I focus on the ConText Option because it is the most widely
used Universal Server extended datatype. From a product architecture standpoint,
it's important to understand that the ConText Option runs as a separate
text-management server that provides data-manipulation services on textual
data at the request of the Oracle7 Server. With this in mind, let's take
a look at this option in more detail.
The ConText Option integrates textual data with traditional structured
datatypes in an Oracle7 database and enables text queries from most tools
that use SQL or PL/SQL. Although text data is stored in the same database
as structured data, text queries and data-manipulation requests are handled
by a separate ConText Server that works in concert with the Oracle Server.
Figure 1 shows the relationship between an Oracle Server process and a ConText
Server process communicating through the system global area.

Figure 1
The ConText server process is separate from the Oracle Server process(es).
When the user enters a text query, the Oracle Server puts an entry in the
text request queue, the ConText server processes the queued request, and
the results are returned to the user through the Oracle Server response.
The ConText Option supports both text queries and linguistic queries. Text
queries are word and phrase searches performed on the full document text
using the CONTAINS function, whereas linguistic queries return
theme and/or Gist information, which is pregenerated and stored as structured
data in linguistic output tables.
Before you can perform text queries and linguistic queries, the database
columns used to store text must be defined as text columns, textual data
must be entered or imported into the columns, and text indexes and linguistic
output must be generated for the columns. For example, the following SQL
statement creates a table with a text column:
CREATE TABLE books ( title VARCHAR2(30), text LONG NOT NULL, PRIMARY KEY (title));
Oracle recommends using CHAR, VARCHAR2, LONG, or LONG
RAW as the datatype for a text column. The example above uses LONG
because it can hold up to 2GBs of data, whereas CHAR is limited
to 255 bytes and VARCHAR2 is limited to 2000 bytes. After you
create the table, explicitly identify the columns that you want to use to
store text by creating what the ConText Option refers to as a "policy"
for each column. A policy defines the storage method and preferences that
the ConText Server uses to create a text index for the column. The default
indexing preferences are:
For example, the following call to the CTX_DDL.CREATE_POLICY
procedure creates a default policy named BOOKS_POLICY for the
text column in the BOOKS table owned by the CTXSYS
user:
EXECUTE ctx_ddl.create_policy(
'books_policy',
'ctxsys.books.text');
You can load text data into the table using SQL*Plus, SQL*Loader,ctxload
(a batch loader supplied with the ConText Option), or a custom application.
The ctxload utility works only with LONG and LONG
RAW columns.
In order to perform text queries, you must first create a text index
for each text column. The following call to the CTX_DDL.CREATE_INDEX
procedure creates a text index using the preferences previously specified
in the BOOKS_POLICY:
EXECUTE ctx_ddl.create_index('books_policy');
After creating text indexes , you can use the CONTAINS function
in the WHERE clause of a SQL SELECT statement
to find documents that contain target words. In Listing 1, the string "summer | fence" is a query expression
that instructs the ConText Server to return documents that contain the words
"summer" or "fence."
The SCORE function returns a value based on the number of occurrences
of the words in the document (each occurrence has a value of 10). When the
words are ORed together in a query expression, the maximum
score of the two words is returned; when they are ANDed together,
the minimum score is returned. In the previous example, Independence Day
has a score of 10 because the word summer occurs once in the document, whereas
The Sound and the Fury has a score of 50 because "fence" occurs
five times (only the first occurrence is shown because the SET LONG
90 statement displays only the first 90 characters of the document).
In addition to AND and OR operators, you can use
many additional operators and special characters in ConText query expressions.
ACCUMULATE and MINUS are logical operators that
work similarly to AND and OR. ACCUMULATE returns
a score by adding the value of its operands, and MINUS returns
the score of the right operand subtracted from the left.
The proximity operator NEAR calculates a score based on how
close words are to one another rather than by how often they appear in the
document. The score is 100 when the query terms are adjacent. For example,
the following query returns documents that contain the words "small"
and "town," with the highest score going to the document in which
they appear closest together:
SELECT SCORE(0), title, text FROM books WHERE CONTAINS(text, 'small NEAR town') > 0 ORDER BY SCORE(0);
The expansion operators expand the number of query terms based on a single
word or phrase supplied by the user. The STEM operator (represented
in a query expression by the symbol "$") expands the list of words
to include all words having the same stem or root word. For example, "$sing"
expands into "sing," "sang", and "sung." The
FUZZY operator (represented by the symbol "?") expands
the list of words to include all words with similar spelling. For example,
"?read" expands into "lead" and "real." The
SOUNDEX operator (represented by the symbol "!")
expands the list of words to include all words that sound the same. It uses
the same logic as the SQL SOUNDEX function to search for words
that have a similar sound. Listing 2 shows
a FUZZY query on the word "door" returning text containing
both "door" and "floor."
The control operators regulate the results returned by the query expression
operand on the left based on the value of a qualifier operand on the right.
The WEIGHT operator assigns a relative importance (0.1 to 10)
to an operand. The THRESHOLD operator returns documents only
if their score is greater than a specified threshold value. The MAX
operator returns the specified number of highest-scoring documents. Listing 3 shows a query with the word
"day" weighted twice as heavily as the word "year" and
a threshold of 10. In other words, text with one occurrence of "day"
will be returned, but two occurrences of "year" are required to
pass the threshold.
Wildcard characters can be used in query expressions to expand word searches
into pattern searches. The "_" and "%" characters work
just as they do with the SQL LIKE operator, matching one and
any number of characters, respectively.
Taken together, all of these text-query operators give you a wide range
of flexibility in performing full-text searches of documents. However, it's
worth noting that you can use the CONTAINS function only in
the WHERE clause of a SELECT statement. It cannot
be used in an UPDATE, INSERT, or DELETE
statement.
In order to perform linguistic queries on documents, you must create
linguistic output tables and then generate the output for the text column.
Oracle can generate two types of linguistic output: themes (document classifications)
and Gists (text summaries). Figure 2 shows the relationship between the
tables containing the document text and the linguistic output tables.

Figure2
You use the ConText Server to generate linguistic output from text documents
and store it as structured data in output tables. Then you use the Oracle7
Server to query the tables directly, providing fast access to themes and
Gists.
The ConText Server has a large knowledgebase of the English language
that it uses to classify a document by its major themes. Themes present
a profile of the main subjects or topics of a document. In essence, they
provide a quick snapshot of what the document is about. Up to 16 themes
are generated for each document, and each theme is assigned a relative weight
that measures the strength of the theme relative to the other themes in
the document.
A Gist is a summary of a document and consists of selected paragraphs that
reflect the themes of the document. Whole paragraphs are used to create
Gists in order to maintain continuity and provide context. There are two
types of Gists: point-of-view Gists that provide a short summary for a single,
specific document theme, and generic Gists that provide a summary reflecting
all of the themes in the document.
The following statements show how to create the linguistic output tables
for themes and Gists.
CREATE TABLE books_themes ( cid NUMBER, pk VARCHAR2(64), theme VARCHAR2(256), weight NUMBER); CREATE TABLE books_gist cid NUMBER, pk VARCHAR2(64), pov VARCHAR2(256), gist LONG);
Once you create the tables, you can populate them with linguistic output
by calling the procedures CTX_LING.REQUEST_THEMES, CTX_LING.REQUEST_GIST,
and CTX_LING.SUBMIT, as follows:
EXECUTE ctx_ling.request_themes( 'books_policy', 'Palace of Desire', 'books_themes') EXECUTE ctx_ling.request_gist( 'books_policy', 'Palace of Desire', 'books_gist') VARIABLE handle NUMBER EXECUTE :handle := ctx_ling.submit PRINT handle
The first two procedure calls set up the requests and the third call
submits them to the queue for processing. Note that you must call the CTX_LING.REQUEST_THEMES
and CTX_LING.REQUEST_GIST once for each document for which
you want to generate linguistic output.
After you generate the linguistic output, you can run queries against the
output tables to find documents based on themes or Gists. For example, to
find all documents with a "happiness" theme, you can run the following
query:
SELECT pk, weight FROM books_themes WHERE theme = 'happiness' ORDER BY weight; PK WEIGHT --------------------- ------ Consider This, Senora 2028 Grand Opening 2869
Recall that the BOOKS_THEMES table contains only key, theme,
and weight information. If you want to display the text of the document,
you would need to join the BOOKS_THEMES table to the table
containing the text column.
Although the ConText Option is a valuable addition to the Oracle Universal Server, it is not integrated as well as it could be with the other server components. In particular, installation is complicated, requiring many manual steps to complete installation. Also, ConText Option Server management is different from Oracle7 Server management. For example, the Oracle7 Server is integrated as an NT service, and you can set it to start automatically in the Services control panel, but you must start the ConText Server manually with a separate ConText Server manager program. The incomplete integration is an annoyance, but it will likely be addressed in the upcoming Oracle8 Server.
Object technology will be an integral part of Oracle8, the next major release of the Oracle Universal Server. Oracle8, due to be released for general availability in mid-1997, is said to be a full object-relational database that extends Oracle Server's relational database capabilities to objects and provides further integration of complex datatypes such as text, spatial, video, Web, and multidimensional data. Oracle8 will integrate objects and extended datatypes with the traditional strengths of an RDBMS: parallelism, concurrency, scalability, and distributed processing.
| Listing 1 |
SQL> SET LONG 90
SQL> COLUMN text FORMAT A30 WORD_WRAP
SQL> SELECT SCORE(0), title, text
2 FROM books
3 WHERE CONTAINS(text, 'summer | fence') > 0
4 ORDER BY SCORE(0);
SCORE0 TITLE TEXT
------ ----------------------- ------------------------------
10 Independence Day In Haddam, summer floats over
tree-softened streets like a
sweet lotion balm from a
50 The Sound and the Fury Through the fence, between the
curling flower spaces, I could
see them hitting.
|
| Listing 2 |
SELECT title, text
FROM books
WHERE CONTAINS(text, '?door') > 0 ;
TITLE TEXT
-------------------- ------------------------------
The Liars Club My sharpest memory is of a
single instant surrounded by
dark. I was seven, and our
family doctor knelt before me
where I sat on a mattress on
the bare floor.
Palace of Desire Al-Sayyid Ahmad Abd al-Jawad
closed the door behind him and
crossed the courtyard of his
house by the pale light of the
stars.
|
| Listing 3 |
SELECT title, text
FROM books
WHERE CONTAINS(text,'(day*2 | year*1) > 10')>0
TITLE TEXT
------------------------------ ------------------------------
Corelli's Mandolin Dr Iannis had enjoyed a
satisfactory day in which none
of his patients had died or
got any worse. He had attended
a surprisingly easy calving,
lanced on abcess, extracted a
molar,
Grand Opening As they followed the
Mississippi out of the Twin
Cities on U.S. 61, Brendan
wondered why his parents and
his grandfather seemed not to
share his dread. Year after
year he had
|
Sidebar: A Laundry List for Oracle's Universal Server
Features of Oracle7 Universal Server Release 7.3 for Windows NT
Oracle7 Server Release 7.3.2.2.0
Oracle WebServer Release 1.0
Oracle7 Server Options included:
* Distributed Option
* Advanced Replication Option
* Parallel Query Option
Additional Licensable Oracle7 Server Options:
* ConText Option
* Spatial Data Option
* Parallel Server Option
* Advanced Networking Option
* Enterprise Manager Performance Pack
* Workflow Option
* Video Option
* Document Option
* TextServer3 Option
* Messaging Option
Oracle7 Utilities:
* Import/Export
* SQL*Loader
* Oracle Server Manager
SQL*Net Protocol Adapters:
* Oracle Named Pipes Adapter
* Oracle SPX Adapter
* Oracle TCP/IP Adapter
* Oracle DECnet Adapter
* Oracle DCE Adapter
SQL*Plus
Requirements
Server hardware:
32MB RAM recommended
65MB hard-disk space
20MB hard-disk space additional for replication support
60MB hard-disk space additional for ConText Option
CD-ROM drive
Client hardware:
8MB RAM
25MB hard-disk space
CD-ROM drive
Server software:
Microsoft Windows NT V3.51 or V4.0
Client software:
Microsoft Windows NT V3.51 or V4.0
Microsoft Windows 95
Microsoft Windows 3.1 or higher
ConText Option Documentation:
Oracle ConText Option QuickStart (26 pages)
Oracle ConText Option Administrator's Guide (232 pages)
Oracle ConText Option Application Developer's Guide (222 pages)
Oracle ConText Option Messages (48 pages)
Copyright 1997 Miller Freeman Inc. All Rights Reserved
Redistribution without Permission is prohibited