ms-sql-server-group-concat-sqlclr

SQL Server CLR user-defined aggregates that collectively offer similar functionality to the MySQL GROUP_CONCAT function. Specialized functions ensure the best performance based on required functionality. Aggregates implemented using C#; requires .NET Framework 3.5.

View the Project on GitHub orlando-colamatteo/ms-sql-server-group-concat-sqlclr

User Guide

Deployment Instructions:

  • Download GroupConcatInstallation.zip
  • Extract archive contents to a local directory. The archive contains the following files:
    • GroupConcatInstallation.sql Contains the SQL statements to enable SQL CLR in a given database, build the GroupConcat Assembly and User-defined Aggregates.
    • GroupConcatUninstallation.sql Contains the SQL statements to drop the GroupConcat Assembly and User-defined Aggregates.
    • Test.1.BuildTestData.sql Contains SQL statements to build a table containing test data.
    • Test.2.DemoFunctions.sql Contains SQL statements that demonstrate the usefulness of the User-defined Aggregates as well as compares them to the XML methods they can replace.
  • Open GroupConcatInstallation.sql in SSMS.
  • Edit the database name in the USE statement at the top to suit your environment and execute.

Syntax:

  • dbo.GROUP_CONCAT(VALUE NVARCHAR(4000))
    • Description
      • Aggregate function accepts strings to be aggregated and produces a comma-delimited list of grouped strings.
    • Arguments
      • VALUE: any valid string
    • Return type
      • NVARCHAR(MAX)
  • dbo.GROUP_CONCAT_D(VALUE NVARCHAR(4000), DELIMITER NVARCHAR(10)) 
    • Description
      • Aggregate function accepts strings to be aggregated and a custom delimiter and produces a delimited list of grouped strings.
    • Arguments
      • VALUE: any valid string
      • DELIMITER: any valid string
    • Return type
      • NVARCHAR(MAX)
  • dbo.GROUP_CONCAT_DS(VALUE NVARCHAR(4000), DELIMITER NVARCHAR(10), SORT_ORDER TINYINT)
    • Description
      • Aggregate function accepts strings to be aggregated, a custom delimiter and a sort order and produces a delimited list of grouped strings in the desired order.
    • Arguments
      • VALUE: any valid string
      • DELIMITER: any valid string
      • SORT_ORDER: tinyint; 1 = Ascending Order, 2 = Descending order
    • Return type
      • NVARCHAR(MAX)
  • dbo.GROUP_CONCAT_S(VALUE NVARCHAR(4000), SORT_ORDER TINYINT)
    • Description
      • Aggregate function accepts strings to be aggregated and a sort order and produces a comma-delimited list of grouped strings in the desired order.
    • Arguments
      • VALUE: any valid string
      • SORT_ORDER: tinyint; 1 = Ascending Order, 2 = Descending order
    • Return type
      • NVARCHAR(MAX)

Test Instructions:

  • Follow the Deployment Instructions above to make the Aggregates available in your environment.
  • Open script Test.1.BuildTestData.sql in SSMS and execute in test environment to build some test data to be used for exercising the UDAs.
  • Open script Test.2.DemoFunctions.sql in SSMS and execute in test environment to exercise the UDAs as well the XML methods they can replace.

FAQ:

Q: Why should I bother using these UDAs when I can get the same results using T-SQL w/ XML and avoid enabling the SQLCLR?
A: It is true that you can arrive at the same result using T-SQL w/ XML and avoid enabling the SQLCLR. If you are in an environment where SQLCLR is disabled, and the powers-that-be will not allow it to be enabled for one reason or another, then unfortunately this solution will not work for you. However, if SQLCLR is an option, the UDAs offer a more intuitive and concise syntax than do the XML methods and they provide comparable or better performance* than the XML methods.
    * depends on use case, see next FAQ

Q: How do the UDAs stack up with other methods in terms of performance?
A: An extensive performance analysis was done where results of the "XML PATH" *, "XML PATH,TYPE" and UDAs were compared when handling 6 different use cases. When the "XML PATH" method was included in the testing (it is not an option for volatile data because of the escape issue) the UDAs outperformed the XML methods in 3 of 6 use cases. When the "XML PATH" method was excluded due to the character escape issue the UDAs outperformed the "XML PATH,TYPE" method in 4 of 6 use cases.
    *  the "XML PATH" method suffers from an issue where ampersands and angle brackets are escaped as & < and > which disqualifies them from use in many scenarios due to the display problems that can cause

Q: Why provide four UDAs instead of just one with all the parameters one might need?
A: SQL CLR functions do not currently allow for overloading as in .NET methods. For each specialized interface a separate UDA is required. Performance tip: Pick the correct UDA for your use-case. If you want a comma-delimited unsorted list there is no need to use GROUP_CONCAT_D or GROUP_CONCAT_DS because comma is the default delimiter and they will perform slightly worse over the same data than GROUP_CONCAT and GROUP_CONCAT_S respectively.

Q: Why not use the example UDA from Microsoft Books Online?
A: The UDAs in this project utilize a more efficient internal data structure for storing, joining and sorting grouped strings.

Known Limitations:

  1. Delimited lists can only be sorted by the values they contain. If you require sorting by a column other than the values they contain use the XML methods.
  2. Delimited lists are sorted as strings. If the values must be sorted as numeric use the XML method. There is an open Issue that once resolved will give the option to sort values as numerics.

Source Code:

The Main directory contains the source code in the downloadable release bundle. If you are interested only in the code supporting the released solution look in the Main directory.

The Experimental directory contains source code from my research & development that lead to the code contained in the Main directory. However, the code in Main is not related to Experimental  by any branching. It was a pre-cursor to the Main code and is there for your consideration, to show other methods reviewed and attempted before the the technique emerged and Main was developed.