This SQL Server trick seems simple, but let’s look at it in detail.
The Basics
The following selects persons which LastName has a Y or Z as the third character from the end of the value:
|
|
About Accuracy
First, LEN() ignores trailing spaces. In SQL Server, LEN() strips trailing spaces but SUBSTRING indexes the full string. If LastName has trailing whitespace (common in CHAR(n) columns or messy imports), the start position is computed against a shorter logical length, while SUBSTRING counts the real characters — so it reads the wrong character.
|
|
So run a trim first:
|
|
Next, let’s talk about case sensitivity and collation. IN ('Y','Z') matches y/z only under a case-insensitive collation (the default). On a _CS_ collation, lowercase third-from-end letters won’t match. Make it explicit if it matters which the collation in the WHERE clause:
|
|
About Performance
The predicate is non-SARGable. Wrapping LastName in SUBSTRING/LEN means SQL Server can’t seek an index. It must evaluate the expression for every row, forcing a full table/clustered index scan. No index on LastName will help as written.
A cleaner equivalent expression (same non-SARGable cost, but clearer intent and avoids the LEN math):
|
|
If this query runs often, add a persisted computed column and index it, turning the scan into a seek:
|
|
About SARGable
SARGable means Search ARGument able. A predicate (a WHERE/JOIN/ON condition) is SARGable if the query optimizer can use it as a search argument to seek through an index rather than scanning every row.
The core rule
A predicate is SARGable when the indexed column is left bare on one side of the comparison — no functions, no calculations wrapped around it.
|
|
Why Does It Matter
An index is sorted by the column’s actual value. The engine can binary search (seek) to Smith instantly. But once you wrap the column — e.g., SUBSTRING(LastName, ...)— the index is sorted byLastName, not by the substring result. The engine has no way to jump to the answer, so it must compute the expression for every row (a scan).
What Can You Do to Fix It
Rewrite the statement so the column stays bare and the computation happens on the other side (or via a range). For example:
|
|
When the logic genuinely can’t be expressed against the bare column (like “third character from the end”), the workaround is a persisted computed column with its own index — you precompute the expression once at write time so there’s a real indexed value to seek against, as shown above.
Follow me
Thanks for reading this article. Make sure to follow me on X, subscribe to my Substack publication and bookmark my blog to read more in the future.
Photo by Microsoft Server