Delimited list as parameter: what are the options?

0. Introduction

Earlier or later, every sql developer will be confronted with queries like this:

select * from tbl_employees 
                      where ssn in ('123-456-789','777-777-777');

Open in new window

But with the issue that the list should be a parameter at some point (either in the sql procedure itself, or the programming language that calls the sql). And to take the MS SQL Server syntax as example, the following does not return an error, but simply does not work:

-- MS SQL Server
                      CREATE PROCEDURE get_employees_from_ssn_list @ssn_list VARCHAR(2000) 
                      AS SELECT * FROM tbl_employees WHERE ssn IN (@ssn_list)

Open in new window

It will return the one matching employee record if you pass 1 single ssn value to the procedure.
As soon as you pass several items in the list, you will get:
0 records returned.
The explanation is this: for the SQL engine, @ssn_list is 1 single (string/varchar) value, while for the human reader, there is indeed clearly the list. We will see in the next steps the different solutions for this.
Note: in all the examples, we will consider the procedure is called with the list with comma as separator, no spaces, no (additional) quotes inside the list:

exec get_employees_from_ssn_list '123-456-789,777-777-777'

Open in new window

1. Quick and Dirty: Dynamic SQL

A solution can be found very quickly, and on many forums you will get that answer (unfortunately).
For MSSQL, it's this easy:

-- MS SQL Server
                      CREATE PROCEDURE get_employees_from_ssn_list @ssn_list VARCHAR(2000) 
                      AS EXEC('SELECT * FROM tbl_employees 
                      WHERE ssn IN (''' + @ssn_list + ''')');

Open in new window

In short, you build the sql "on-the-fly", and execute it. There are 2 issues with this:

SQL injection (in general: READ IT): Wikipedia. I won't go into the details, but take it as a very serious

Security: in MS SQL Server, the dynamic sql will run under the permissions of the caller of the stored procedure, and not of the owner of the stored procedure.

In Oracle, we cannot return a recordset like in MS SQL Server or MySQL, you need to return a ref cursor, and the calling application has to loop somehow through the cursor.
There is an alternative, for Oracle, using the PIPELINED function method shown later, though, but you might want to explore that yourself
So, the corresponding code would read:

-- Oracle
                      CREATE OR REPLACE PROCEDURE get_employees_from_ssn_list
                      (p_ssn_list IN VARCHAR2, emp_refcur in out SYS_REFCURSOR)
                      AS 
                        v_SQL varchar(2000);
                      BEGIN 
                        v_SQL := 'SELECT * FROM tbl_employees WHERE ssn IN (''' || p_ssn_list || ''')';
                        OPEN emp_refcur FOR v_SQL;
                      END;

Open in new window

As from MySQL 5.x, it's also possible, but a bit more complicated. Here as to be run using the mysql command line:

-- MySQL
                      delimiter //;
                      create procedure get_employees_from_ssn_list(p_ssn_list IN VARCHAR)
                      begin
                      prepare stmp from CONCAT('SELECT * FROM tbl_employees 
                      WHERE ssn IN (\'', p_ssn_list , '\')';
                      execute stmp;
                      deallocate prepare stmp;
                      end//
                      delimiter ;//

Open in new window

Due to the issues noted above, please do NOT use that kind of syntax, but read further!

2. Inefficient Method

Another method you will find is the a syntax using LIKE to find a match in the string. This trick can sometimes be very useful, but used alone, the big issue with this is performance: it will be unable to use any index on the ssn field.

-- MS SQL Server
                      CREATE PROCEDURE get_employees_from_ssn_list @ssn_list VARCHAR(2000) 
                      AS SELECT * 
                          FROM tbl_employees 
                         WHERE ',' + @ssn_list + ',' LIKE '%,' + ssn + ',%';

Open in new window

The explanation of how it works is obvious once you replace, for visualization, the variable and the column value:

WHERE ',123-456-789,777-777-777,' LIKE '%,123-456-789,%'

Open in new window

Hint: if the field ssn was of numerical data type, you would need to write cast(ssn as varchar(100)), for example.

As usual, the translation for MySQL:

-- MySQL
                      delimiter //;
                      create procedure get_employees_from_ssn_list(p_ssn_list IN VARCHAR)
                      begin
                       SELECT * 
                         FROM tbl_employees 
                        WHERE CONCAT(',', p_ssn_list , ',')  LIKE CONCAT('%,', ssn, ',%');
                      end//

Open in new window

You might find the built-in FIND_IN_SET function from MySQL a very tempting alternative:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set

-- MySQL
                      delimiter //;
                      create procedure get_employees_from_ssn_list(p_ssn_list IN VARCHAR)
                      begin
                        SELECT * 
                          FROM tbl_employees 
                         WHERE FIND_IN_SET(ssn, p_ssn_list) > 0;
                      end//

Open in new window

To finish, the Oracle version (I hope you don't get lost in the quotes):

-- Oracle
                      CREATE OR REPLACE PROCEDURE get_employees_from_ssn_list
                      (p_ssn_list IN VARCHAR2, emp_refcur in out SYS_REFCURSOR)
                      AS 
                        v_SQL varchar(2000);
                      BEGIN 
                        v_SQL := 'SELECT * FROM tbl_employees 
                             WHERE '',' || p_ssn_list || ','' LIKE ''%,'' || ssn || '',%'' ';
                        OPEN emp_refcur FOR v_SQL;
                      END;

Open in new window

3. Parse the list into a set

As we understood so far, IN (@parameter) does not work, because the SQL engine only sees 1 value. Though the 2 suggestions above are quick to implement, they have very serious drawbacks.
So, we need to get a true list from that string. In my MS SQL Servers, I have this function implemented:

-- MS SQL Server
                      CREATE FUNCTION dbo.ParmsToList 
                      (@Parameters varchar(8000), @delimiter varchar(10) )
                      -- SQL 2005+ version would change the varchar(8000) into varchar(max)
                      returns @result TABLE (Value varchar(8000), row_num int identity )
                      AS  
                      begin
                        declare @dx varchar(9)
                       
                        if @delimiter is null  set @delimiter = ' '
                        if datalength(@delimiter) < 1 set @delimiter = ' '
                        set @dx = left(@delimiter, datalength(@delimiter)-1)
                       
                        declare @Value varchar(8000), @Pos int
                       
                        set @Parameters = @Parameters + @delimiter
                        set @Pos = CHARINDEX(@delimiter, @Parameters, 1)
                       
                        IF REPLACE(@Parameters, @delimiter, @dx) <> ''
                        BEGIN
                          WHILE @Pos > 0 
                          BEGIN
                            SET @Value = LEFT(@Parameters, @Pos - 1)
                            IF @Value <> ''
                            BEGIN
                              INSERT INTO @result (Value) 
                              VALUES (@Value) 
                            END
                            SET @Parameters = SUBSTRING(@Parameters, @Pos+ datalength(@delimiter),8000)
                            SET @Pos = CHARINDEX(@delimiter, @Parameters, 1)
                       
                          END --WHILE @Pos > 0 
                        END -- IF REPLACE(@Parameters, @delimiter, @dx) <> ''
                         
                        RETURN
                      END

Open in new window

and the function is used like this:

-- MS SQL Server
                      CREATE PROCEDURE get_employees_from_ssn_list @ssn_list VARCHAR(2000) 
                      AS SELECT * 
                           FROM tbl_employees 
                          WHERE ssn IN (SELECT VALUE FROM dbo.ParmsToList(@ssn_list,',') );

Open in new window

I will not discuss (here) on how efficient my function is, as that is out of the scope of this article.

note: the row_num field returned by my function is to allow ordering of the result set by the order in which the items are in the list, if that is needed, using the join syntax:

-- MS SQL Server
                      CREATE PROCEDURE get_employees_from_ssn_list @ssn_list VARCHAR(2000) 
                      AS SELECT e.* 
                           FROM tbl_employees e
                           JOIN dbo.ParmsToList(@ssn_list,',') f
                               ON f.value = e.ssn 
                          ORDER BY f.row_num;

Open in new window

In MySQL, there is no alternative, unfortunately, as:

user-defined functions can only return single values

procedures cannot be used (like in most other databases) directly inside a query

If you find differently, please post, and I will update the article accordingly (with credits).

In Oracle, I implemented a similar function than in SQL Server, using the PIPELINED technique. We will need a preparation, though:

-- Oracle
                      CREATE OR REPLACE TYPE myVarcharTable AS TABLE OF VARCHAR2
                      /

Open in new window

Now, we can create our function:

-- Oracle
                      create or replace FUNCTION get_rows_from_list
                      (l in LONG default null, sep in varchar2 default ',')
                      return myVarcharTable PIPELINED
                      as
                        l_pos INT := 1;
                        l_next INT;
                        l_part VARCHAR(500);
                      begin
                        select instr( l, sep, l_pos) into l_next FROM DUAL;
                        while (l_next>0)
                        loop
                          select substr(l, l_pos, l_next - l_pos) into l_part FROM DUAL;
                          pipe row(l_part);
                      
                          select l_next + 1, instr( l, sep, l_pos) 
                            into l_pos, l_next   FROM DUAL;
                        end loop;
                      
                        select substr(l, l_pos) into l_part FROM DUAL;
                        pipe row(l_part);
                      
                        return;
                      end;
                      /

Open in new window

And the usage would be like this:

-- Oracle
                      CREATE OR REPLACE PROCEDURE get_employees_from_ssn_list
                      (p_ssn_list IN VARCHAR2, emp_refcur in out SYS_REFCURSOR)
                      AS 
                      BEGIN
                        OPEN emp_refcur FOR 
                          SELECT ssn, last_name, first_name 
                            FROM tbl_employees 
                           WHERE ssn IN (SELECT column_value
                                   FROM TABLE(get_rows_from_list(p_ssn_list, ','))
                                        )
                        ;
                      END;

Open in new window

You now have a method to do this without dynamic sql, though still nice and easy-to-read code, and possibly using a index on ssn field. You will see that you can reuse that function quite often in your developments.

4. Parse using XML features

Again, we must leave MySQL out of the picture, as it does not (yet) have the XML features required.
MySQl 5.1 does have some XML support, but there is no method to get a set from the xml:
http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html

With MS SQL Server 2005 or higher, we could delegate the parsing to XML. You can choose the tag names as you wish (XML is case sensitive!), I choose the shortest possible length; the stands for table, while stands for row.

-- MS SQL Server
                      create function dbo.ParmsToListViaXML
                      (@Parameters varchar(8000), @delimiter varchar(10) )
                      returns @result TABLE (Value varchar(8000), row_num int identity )
                      AS
                      Begin
                        declare @XML XML
                        SET @XML  = '<t><r>' +  Replace(@Parameters , @delimiter, '</r><r>') + '</r></t>'
                      
                        INSERT INTO @result (value)
                        SELECT t.r.value('.','VARCHAR(8000)') AS val
                          FROM @XML.nodes('/t/r') as t(r)
                      
                        return
                      END

Open in new window

The usage is the same as for my other version. Of course, this function could have a brother that accepts the xml directly, or you could use that directly in your procedure, avoiding the additional function call.
For SQL 2000, you could use the OpenXML function to implement the same.

As from Oracle 9i, we can implement the same XML parsing:

-- Oracle
                      CREATE OR REPLACE PROCEDURE get_employees_from_ssn_list
                      (p_ssn_list IN VARCHAR2, emp_refcur in out SYS_REFCURSOR)
                      AS 
                       v_ssn_list varchar2(2000);
                      BEGIN
                       v_ssn_list := '<t><r>' || replace( p_ssn_list, ',',  '</r><r>') || '</r></t>';
                       OPEN emp_refcur FOR 
                         SELECT ssn, last_name, first_name 
                           FROM tbl_employees 
                           WHERE ssn IN ( SELECT extract(value(t), '/r/text()').getStringval() 
                                            FROM table(xmlsequence( extract(xmltype(v_ssn_list), '/t/r'))) t 
                                        )
                       ;
                      END;

Open in new window

For both MS SQL Server and Oracle codes, not being used to XML in general, it took me a couple of hours to get both working correctly. Hence, don't be discouraged if you don't understand the syntax right away.

5. Even other alternatives

I got reference of this page: http://vyaskn.tripod.com/passing_arrays_to_stored_procedures.htm (thanks aikimark). Though it's dedicated to MS SQL, it contains another one I wanted to point out to be "complete", however it is a bit "cryptic" to understand on how it works. See Method 5: Using a table of numbers or pivot table, to parse the comma separated list.. The same concept will work for MySQL and Oracle, you only have to change the function names as needed. update: found the "same" method, adjusted for MS SQL Server 2005 in this article

With MS SQL Server, we could explore even another method, using CLR (.net) stored procedures.
However, as with the XML table, for the simple task given here, the overhead will just be not worth the effort.
For Oracle, the corresponding feature would be java stored procedures/functions.
I do agree that T-SQL and PL/SQL are not necessarily more efficient to parse the CSV/XML. However, for the task given, it is efficient enough rather than call for "outside" methods.

6. IN and it's alternatives

For sake of simplicity, I sticked with the IN (subselect) in steps 3 and 4. Of course, for matters of performance, you could first insert the results of the parsing code into a intermediate (temp) table, and then use either of the 3 syntax:

-- MS SQL Server
                      SELECT e.* FROM tmp_employees e 
                      WHERE ssn IN ( select t.SSN FROM temp_table t);

Open in new window

-- MS SQL Server
                      SELECT e.* FROM tmp_employees e 
                      WHERE EXISTS ( select null FROM temp_table t WHERE t.ssn = e.ssn);

Open in new window

-- MS SQL Server
                      SELECT e.* FROM tmp_employees e  
                      JOIN temp_table t 
                         ON t.ssn = e.ssn;

Open in new window

Notes:

the IN() syntax will require that the subquery does NOT return a NULL value.

the JOIN syntax will require the temp table has no duplicates

in MS SQL Server, you can use either a #temp table, or a @table variable

You could also bypass the (temp) table, and join directly to the function, as I showed already above:

-- MS SQL Server
                      CREATE PROCEDURE get_employees_from_ssn_list @ssn_list VARCHAR(2000) 
                      AS SELECT e.* 
                           FROM tbl_employees e
                           JOIN dbo.ParmsToList(@ssn_list,',') f
                               ON f.value = e.ssn 
                          ORDER BY f.row_num;

Open in new window

The JOIN with the function also works in Oracle.

7. Conclusions and Warnings

*WARNING*
Though tempting and quick to implement, you should think at least twice before using one of the two first suggestions.

*XML Overhead*
The XML method is interesting to know, but the XML overhead (more data, more parsing) makes this method (likely) less efficient. Though, for an UPDATE of several rows at once, this can be a very powerful method. See concrete example in MS SQL here (requires registration):
http://www.sqlservercentral.com/articles/Stored+Procedures/2977/
In regards to the overhead, just do the math: consider you want to pass X items with N characters length.
With the CSV format, you will pass exactly (N*X) + (X-1), in short 1/Nth overhead.
For XML, you will need more: (N*X) + (X*7) + 7, in other words, over 7/Nth overhead.
This will negatively impact, with high load, network bandwidth to and from the server, as well as CPU and Memory usage on the server.

*LIMITS*
It is important that you know the string size limits of your database/version.
MS SQL Server 7 and 2000 VARCHAR == 8000
MS SQL Server 2005+ both XML and VARCHAR(MAX) == 2GB

MySQL's VARCHAR size depends on the version:
up to MySQL 5.0.2 == 255
MySQL 5.0.3 or higher == 64K
Note: if you need more data for your list in MySQL, you might use a MEDIUMTEXT or LONGTEXT data type

Oracle's VARCHAR2 == 4000
Note: if you need more data for your list in Oracle, you will need a CLOB data type

I hope you find the article helpful for your current or future developments in any of the databases you are working with.

If you find/know some other method to implement the above, also in other databases, you are welcome to post that information here.

References

MS SQL Server
Create Stored Procedure:
SQL 2000: http://msdn.microsoft.com/en-us/library/aa258259(SQL.80).aspx
SQL 2005: http://msdn.microsoft.com/en-us/library/ms190669(SQL.90).aspx
SQL 2008: http://msdn.microsoft.com/en-us/library/ms190669.aspx

MySQL
Create Stored Procedure syntax:
http://dev.mysql.com/doc/refman/5.0/en/create-procedure.html

sql prepare syntax:
http://dev.mysql.com/doc/refman/5.0/en/sql-syntax-prepared-statements.html

Oracle
Create Stored Procedure syntax:
www.techonthenet.com/oracle/procedures.php

Implement pipelined functions:
http://www.akadia.com/services/ora_pipe_functions.html

Using xml table extractions:
http://stanford.edu/dept/itss/docs/oracle/10g/appdev.101/b10790/xdb13gen.htm
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions228.htm

Comments (12)

Guy Hengel [angelIII / a3]

Billing Engineer

CERTIFIED EXPERT

Most Valuable Expert 2014

Top Expert 2009

Author

Commented: 2014-01-15

Hi,

actually, you should NOT modify the sql function, but pass the data correctly using CDATA tags:
http://en.wikipedia.org/wiki/CDATA

CHeers and thanks for your comment.
a3

ValentinoV

BI Consultant

Most Valuable Expert 2011

Commented: 2014-01-16

Hey A3,

Thanks for the CDATA tip, that makes sense! I actually spent some time yesterday looking for something like that (as my workaround was kinda dirty...) but alas. So thanks for mentioning this!

I still think that the function should be able to handle special XML characters by itself though, otherwise queries (or reports in my particular case) become too annoying to write. In case someone is interested, here's the updated version using the clean CDATA tags:

CREATE FUNCTION [dbo].[ParmsToListViaXML]
(@Parameters varchar(max), @Delimiter varchar(10) )
RETURNS @result TABLE (value varchar(8000), row_num int identity )
AS
BEGIN
	
	declare @xml XML

	SET @xml = '<t><r><![CDATA['
		+ REPLACE(@Parameters , @Delimiter, ']]></r><r><![CDATA[') + ']]></r></t>';

	INSERT INTO @result (value)
	SELECT t.r.value('.','VARCHAR(8000)') AS val
	FROM @xml.nodes('/t/r') as t(r);
	
RETURN
END

Best regards,
VV

Jim Horn

SQL Server Data Dude

Most Valuable Expert 2013

Author of the Year 2015

Commented: 2014-11-09

I greatly enjoyed reading this article. Voted Yes.

aikimark

Get vaccinated; Social distance; Wear a mask

Top Expert 2014

Commented: 2014-11-10

Jeff Moden (of no-RBAR fame) wrote an article in 2011 about parsing a CSV file. He also compares his best algorithm against a CLR splitter. The CLR (.Net app) wins by a wide margin. If you have to pass a lot of delimited data in a high performance setting, then a .Net plug-in is the way to go.
http://www.sqlservercentral.com/articles/Tally+Table/72993/

The discussion about the article is incredible. I learned a lot from reading and participating.
http://www.sqlservercentral.com/Forums/Topic1101315-203-1.aspx

Commented: 2014-12-05

I have also found this page (for oracle) to be very interesting:
http://www.oracle-developer.net/display.php?id=412
but must admit the code is not really "readable" ...