Home » Server Options » Text & interMedia » Select all strings that appear a substrings of a given string (Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 on Solaris)
icon5.gif  Select all strings that appear a substrings of a given string [message #402873] Tue, 12 May 2009 22:27 Go to next message
ShiningMasamune
Messages: 6
Registered: May 2009
Location: New Jersey
Junior Member

Greetings all, I'm a bit new to databases so please be gentle!

I have a table with a column of strings, and one big string given to me. I would like to select all strings in the column that are substrings of the given big string. Basically I want this:

SELECT str FROM table WHERE CONTAINS(?, str, 1) > 0


Where ? would be replaced with my given string. Except, this query gives me
Error: ORA-20000: Oracle Text error:
DRG-10599: column is not indexed

In an absolutely ideal world, I want only strings that are whole-word substrings. In other words, if my given string is "I sold shoes to make money" and the str column contains the strings "make money" and "old shoes" I would want only the former selected.

Can this be done? And more importantly, can it be done quickly considering my str column has ten million rows?

Thanks!

[Updated on: Tue, 12 May 2009 22:37]

Report message to a moderator

Re: Select all strings that appear a substrings of a given string [message #402878 is a reply to message #402873] Tue, 12 May 2009 23:15 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Welcome to the ORAFaq forums. In order to use the CONTAINS query operator, you have to first create a CONTEXT index, which is what your error message is telling you. However, that would be the syntax that you would use if your search string where "make money" and your column value was "I sold shoes to make money", not the other way around. You could probably just use the built-in INSTR function or LIKE for what you are trying to do. In the future, pleas post a complete test case, including create table and insert statements for sample data, so that it makes the problem as clear as possible, so that we can provide the most helpful response. Please see the forum guide at the top of each forum for what we expect, or use the link below. I see that you are a student, so please understand that you will learn more if we guide you in the right direction, by suggesting things like INSTR, then expecting you to research that in the online documentation and return with what you have tried, if you still have problems, rather than providing a complete solultion.

Forum Guide:

http://www.orafaq.com/forum/t/88153/0/
Re: Select all strings that appear a substrings of a given string [message #402880 is a reply to message #402878] Tue, 12 May 2009 23:34 Go to previous messageGo to next message
ShiningMasamune
Messages: 6
Registered: May 2009
Location: New Jersey
Junior Member

Thanks for the reply, Barbara.

I tried INSTR, but it has a two problems. Firstly, it doesn't have the whole-word only property that (in my understanding) CONTAINS does. Secondly, it seems impossible to avoid a full table scan every time, which is too slow for me to use. I'm looking for a faster method.

Apologies for not posting the full test case, I'll do so now:

CREATE TABLE tab (str VARCHAR(255))
INSERT INTO tab VALUES ('old shoes')
INSERT INTO tab VALUES ('make money')
INSERT INTO tab VALUES ('sold')
INSERT INTO tab VALUES ('unrelated')

For the input "I sold shoes to make money", the result set would be "make money" and "sold", as they match whole-word substrings.
Re: Select all strings that appear a substrings of a given string [message #402889 is a reply to message #402880] Wed, 13 May 2009 00:10 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
I think a CTXRULE index and a MATCHES query operator might suit your needs. Please see the demo below. I added some data from user_objects, just to increase the row count and demonstrate index usage.

SCOTT@orcl_11g> CREATE TABLE tab (str VARCHAR(255))
  2  /

Table created.

SCOTT@orcl_11g> BEGIN
  2    INSERT INTO tab VALUES ('old shoes');
  3    INSERT INTO tab VALUES ('make money');
  4    INSERT INTO tab VALUES ('sold');
  5    INSERT INTO tab VALUES ('unrelated');
  6  END;
  7  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> INSERT INTO tab SELECT object_name FROM user_objects
  2  /

773 rows created.

SCOTT@orcl_11g> CREATE INDEX idx ON tab (str) INDEXTYPE IS CTXSYS.CTXRULE
  2  /

Index created.

SCOTT@orcl_11g> EXEC DBMS_STATS.GATHER_TABLE_STATS (USER, 'TAB')

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> SET AUTOTRACE ON
SCOTT@orcl_11g> SELECT * FROM tab
  2  WHERE  MATCHES (str, 'I sold shoes to make money') > 0
  3  /

STR
------------------------------------------------------------------------------------------------------------------------
make money
sold


Execution Plan
----------------------------------------------------------
Plan hash value: 1948011179

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |    39 |   780 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB  |    39 |   780 |     2   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | IDX  |       |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."MATCHES"("STR",'I sold shoes to make money')>0)


Statistics
----------------------------------------------------------
        238  recursive calls
          0  db block gets
        313  consistent gets
          0  physical reads
          0  redo size
        469  bytes sent via SQL*Net to client
        416  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          2  rows processed

SCOTT@orcl_11g> SET AUTOTRACE OFF

Re: Select all strings that appear a substrings of a given string [message #403126 is a reply to message #402889] Wed, 13 May 2009 19:17 Go to previous messageGo to next message
ShiningMasamune
Messages: 6
Registered: May 2009
Location: New Jersey
Junior Member

Well, I'm having trouble creating the CTXRULE index... Trying to do so apparently crashes the server! So I'll have to talk to the admin before I can test this, but from your demo, that looks like exactly what I need. Many thanks =)
Re: Select all strings that appear a substrings of a given string [message #403133 is a reply to message #403126] Wed, 13 May 2009 22:00 Go to previous message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
You might try a small test case first, like what I provided, to make sure that everything is set up properly on your system. Then see the following section for settings that affect index performance:

http://download.oracle.com/docs/cd/B19306_01/text.102/b14217/aoptim.htm#i1006756
Previous Topic: Help with a CONTEXT index
Next Topic: Oracle Text error
Goto Forum:
  


Current Time: Thu Mar 28 15:58:00 CDT 2024