Today’s tip is a relatively simple (but limited) way to programmatically extract the output columns of a SQL Server stored procedure or query along with the tables they come from.
This is something that came up at work. I didn’t want to do it by hand because there were many stored procedures to check, but it was difficult to automate because they could be (and were indeed) quite complex, with many selects and joins, computed columns, aliases and so on, and to make things worse the SQL code had no standard indentation or structure, so string searching and regular expressions were not reliable options. Parsing the Transact-SQL code would have been way harder than anything else, so it was out of discussion too. Unfortunately, the limitations of this solution (that I’ll mention later) stopped us from using and we had to do the job by hand.
Lazy as I am, I wanted to make SQL Server itself give me the information I needed. After meditating about it for a while an idea came to my mind: I could try to extract the information from the execution plan.
You probably know it, but the query execution plan that you can see in Microsoft SQL Server Management Studio (SSMS from now on) comes from the SQL Server engine in XML format -SSMS just displays it in a graphical form. This XML is a real gem that provides lots of information about the query (even things that aren’t displayed in the execution plan tab), and you can see it by setting the “SET STATISTICS XML” option to ON before you run your query. For example, let’s run the following query (just a plain SQL query so you can see what it’s going on, but remember that it’s the same for stored procedures) that joins the Product and ProductSubcategory tables of the AdventureWorks sample database:
Notice that I got two result sets. The first is the query’s result, and the second is the execution plan in XML format (if you don’t see it, make sure the “Include Actual Execution Plan” option isn’t turned on). If you click on the link SSMS will open the XML in a new window:
Look at the OutputList node in there. It contains all the columns that appear in the result, each one within a ColumnReference element:
<ColumnReference Database=“[AdventureWorks]“ Schema=“[Production]“ Table=“[Product]“ Alias=“[p]“ Column=“ProductID“ />
<ColumnReference Database=“[AdventureWorks]“ Schema=“[Production]“ Table=“[Product]“ Alias=“[p]“ Column=“Name“ />
<ColumnReference Database=“[AdventureWorks]“ Schema=“[Production]“ Table=“[Product]“ Alias=“[p]“ Column=“ProductNumber“ />
<ColumnReference Database=“[AdventureWorks]“ Schema=“[Production]“ Table=“[Product]“ Alias=“[p]“ Column=“Color“ />
<ColumnReference Database=“[AdventureWorks]“ Schema=“[Production]“ Table=“[Product]“ Alias=“[p]“ Column=“StandardCost“ />
<ColumnReference Database=“[AdventureWorks]“ Schema=“[Production]“ Table=“[ProductSubcategory]“ Alias=“[psc]“ Column=“Name“ />
As you can see the information is quite detailed: for each column you have the database, the table and the schema it belongs to. A non-intuitive thing to keep in mind is that the Alias attribute is not the column’s alias, but the table’s.
Armed with this knowledge we can now create a tool to extract the information automatically: it’s just a matter of executing the stored procedure and retrieving the execution plan to process it. But let’s do one last thing before jumping to the code. I don’t know about you, but I really hate to work with XML -I’d rather work against an object model. It turns out that we can do just that: the ShowPlan’s schema is publicly available for download, and with it and the XML Schema Definition Tool (xsd.exe, located at C:\Program Files\Microsoft SDKs\Windows\v6.0A\bin\xsd.exe on my machine) we can create the classes needed to take the execution plan’s XML and deserialize into an object graph. Run this command to generate the classes (I renamed the showplanxml.xml.xsd file I downloaded to showplanxml.xsd):
xsd C:\showplanxml.xsd /classes /o:C:\
This will create a file in C: called showplan.cs with several classes within it (the root being ShowPlanXML) that we can now include in a new project in Visual Studio (you may want to add a namespace to the generated classes). The code required to get a ShowPlanXML instance from the execution plan is really simple:
using (XmlReader reader = XmlReader.Create(“ExecutionPlan.xml”, new XmlReaderSettings()))
XmlSerializer serializer = new XmlSerializer(typeof(ShowPlanXML));
ShowPlanXML plan = (ShowPlanXML)serializer.Deserialize(reader);
In the code snippet above I’m loading the XML from a local file just for demo purposes. In the real application I read it from a SqlDataReader after executing the stored procedure. I’m printing the version just to make sure the deserialization worked. But let’s do something more interesting:
foreach (StmtBlockType blocks in plan.BatchSequence)
foreach (StmtBlockType block in blocks)
foreach (BaseStmtInfoType item in block.Items)
StmtSimpleType statement = item as StmtSimpleType;
if (statement != null)
foreach (ColumnReferenceType columnReference in statement.QueryPlan.RelOp.OutputList)
Console.WriteLine(columnReference.Table + ” - “ + columnReference.Column);
Here I go through all the batches and statements until I reach the column references. The BaseStmtInfoType class has 4 derived classes: StmtCursorType, StmtReceiveType, StmtSimpleType, and StmtUseDbType (all these types are generated by xsd.exe too). You can tell more or less what case each one stands for based on the names. Since I’m looking for statements I only handle the case when the item is a StmtSimpleType instance. When I compile and run the code I get the following output:
Voilà! This is just what I wanted. Now some things to consider about the structure of the execution plan:
- For each step in the execution plan you’ll see a RelOp node (with its corresponding OutputList node). These nodes are nested according to the order they were executed. In our case:
We’ll have 3 RelOp nodes: the outermost one will be a Hash Match, containing two others, an Index Scan and a Clustered Index Scan (this is probably easier to understand if you see the graphical execution plan and the XML document side by side).
The good news is that for our purposes we only need to worry about the first RelOp node, since it corresponds to the last operation (the Hash Match).
- If you have more than one SQL statement in your stored procedure you’ll get execution plans for each of them. If you just care about the last result set you only need to process the last execution plan:
- SET STATISTICS XML ON will execute the query. The benefit of this is that you can extract the columns aliases from the SqlDataReader and match them by position with columns in the XML execution plan, provided you don’t have computed columns (see “Limitations” later). If you don’t need the aliases you can use the SET SHOWPLAN_XML option instead, which will generate the execution plan but will not actually execute the query, and therefore should be faster.
This technique has 3 big limitations:
- If you have computed columns the resulting XML will be much more complex. For example, let’s add a new column computed with a CASE statement to our query:
Now the execution plan will look like this:
As you can see the column is referred to as “Expr1004″. To actually find the expression you will have to navigate (possibly deep) into the document:
The entire CASE expression is in the XML, and you can get to the actual column used (MakeFlag) but it can be tedious (and this is intended to be a simple trick).
- You can’t determine the column aliases. The XML document doesn’t contain the aliases. At first I thought I could match the column references in the OuputList node to the column names from the query result by position, but unfortunately the columns in the XML are not necessarily in the same order. I was quite disappointed when I saw this because it makes this technique far less powerful (I’ve only checked in SQL Server 2005, perhaps things are better in SQL Server 2005 SP2 or SP3, or SQL Server 2008, but I don’t have any of them handy to check).
- If you have conditional logic the parameters that you pass to the stored procedure matter. This what you would expect since we are working with an execution plan anyway.
Well that’s all. I’m planning to upload a sample application for all this, but I’m too tired to do it tonight, so wait for the next post if you are interested. Hope this helps, happy coding!