Code Project

Link Unit

Showing posts with label TSQL. Show all posts
Showing posts with label TSQL. Show all posts

Wednesday, July 23, 2014

Easy way to split string

We generally use function to split a delimited string into rows. The code for same is listed below
CREATE FUNCTION
[dbo].[fnSplitString] ( @string NVARCHAR(MAX) ,@delimiter CHAR(1) ) RETURNS@output
TABLE (splitdata NVARCHAR(MAX)) BEGIN DECLARE @start INT ,@end INT SELECT @start = 1 ,@end = CHARINDEX(@delimiter, @string) WHILE @start < LEN(@string) + 1 BEGIN IF @end = 0 SET @end = LEN(@string) + 1 INSERT INTO @output (splitdata) VALUES (SUBSTRING(@string, @start, @end - @start)) SET @start = @end + 1 SET @end = CHARINDEX(@delimiter, @string, @start) END RETURN END GO We will be using XQuery method to split the string in easy and efficient way. Following steps are taken to split the string. #1. Convert the string into formatted XML,
for that replace the delimitor with any tag (used in example). #2. Remove empty nodes like '' or '' #3. list value of all nodes USE tempdb DECLARE @x VARCHAR(MAX) DECLARE @x1 XML DECLARE @t1 DATETIME DECLARE @t2 DATETIME SET @x = '100|101|102|' SET @x = replicate(@x, 1) SELECT @x1 = cast
(
replace
(
replace
(
'<x>' + replace
(
@x, '|', '</x><x>')
+ '</x>', '<x ></x>', '')
, '<x></x>', '')
AS XML)
SET @t1 = getdate() -- start time of process SELECT X.x1.value('.', 'varchar(max)') FROM @x1.nodes('//x') X(x1) -- XQuery SET @t2 = getdate() -- End time of process SELECT datediff(ms, @t1, @t2) -- Find time difference in milliseconds SET @t1 = getdate() -- start time of process SELECT * FROM [dbo].[fnSplitString](@x, '|') SET @t2 = getdate() -- End time of process SELECT datediff(ms, @t1, @t2) Benchmarks for XQuery method to split the string -- XQuery -- 0ms for 3 -- 0ms for 30 -- 6ms for 300 -- 36ms for 3000 -- 350ms for 30000 -- 7s for 300000 Benchmarks for Procedure method to split the string -- 3ms for 3 -- 10ms for 30 -- 100ms for 300 -- Procedure 1m for 3000 -- Procedure 11s for 30000 -- Procedure 2m15s for 300000
Conclusion: Looking at performance and simplicity of XQuery, I will recommend it over procedural way. Hope it helps.

Friday, June 20, 2014

Easy Way to remove HTML tags in SQL Server

We were looking for a way to remove/strip HTML tags from varchar columns. On making search found some code samples, which suggested to create a stored procedure like following.

Click to view Stored Procedure StripHtml

CREATE FUNCTION [dbo].[StripHtml] (@HTMLText VARCHAR (MAX))
RETURNS VARCHAR (MAX)
AS
BEGIN
DECLARE @Start  int
DECLARE @End    int
DECLARE @Length int

-- Replace the HTML entity & with the '&' character (this needs to be done first, as
-- '&' might be double encoded as '&amp;')
SET @Start = CHARINDEX ('&', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1

WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF (@HTMLText, @Start, @Length, '&')
SET @Start = CHARINDEX ('&', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END

-- Replace the HTML entity < with the '<' character
SET @Start = CHARINDEX ('<', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1

WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF (@HTMLText, @Start, @Length, '<')
SET @Start = CHARINDEX ('<', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END

-- Replace the HTML entity > with the '>' character
SET @Start = CHARINDEX ('>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1

WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF (@HTMLText, @Start, @Length, '>')
SET @Start = CHARINDEX ('>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END

-- Replace the HTML entity & with the '&' character
SET @Start = CHARINDEX ('&amp;', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1

WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF (@HTMLText, @Start, @Length, '&')
SET @Start = CHARINDEX ('&amp;', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END

-- Replace the HTML entity   with the ' ' character
SET @Start = CHARINDEX (' ', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1

WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF (@HTMLText, @Start, @Length, ' ')
SET @Start = CHARINDEX (' ', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
END

-- Replace any
tags with a newline
SET @Start = CHARINDEX ('<br />', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1

WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF (@HTMLText, @Start, @Length, CHAR(13) + CHAR(10))
SET @Start = CHARINDEX ('<br />', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END

-- Replace any
tags with a newline
SET @Start = CHARINDEX ('<br />', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1

WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF (@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)')
SET @Start = CHARINDEX ('<br />', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END

-- Replace any
tags with a newline
SET @Start = CHARINDEX ('<br />', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1

WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF (@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)')
SET @Start = CHARINDEX ('<br />', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
END

-- Remove anything between tags
SET @Start = CHARINDEX ('<', @HTMLText)
SET @End = CHARINDEX ('>', @HTMLText, CHARINDEX ('<', @HTMLText))
SET @Length = (@End - @Start) + 1

WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF (@HTMLText, @Start, @Length, '')
SET @Start = CHARINDEX ('<', @HTMLText)
SET @End = CHARINDEX ('>', @HTMLText, CHARINDEX ('<', @HTMLText))
SET @Length = (@End - @Start) + 1
END

RETURN LTRIM( RTRIM(@HTMLText))

END

GO



This stored procedure does what it says, but it is lot of code and procedural. So, thought of using XML data type and value method of SQL Server as embedded below.










NB: HTML text must be properly formatted otherwise we will get XML Parsing error.

Hope it Helps.