Tuesday, March 3, 2020

Finding unique element types in XML with PowerShell

I recently needed to deal with a large (165MB) XML file, so the first question I had is: How many unique element types are actually in the file? I found a bunch of truly painful-sounding answers online, mainly involving the distinct-values() XPATH 2.0 function. If you've ever dealt with XPATH, you know that just setting up the toolchain environment to use it is pretty labor intensive. However, I then started looking for PowerShell cmdlets that work with XML and I found Select-Xml:

https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/select-xml?view=powershell-7

It looks nice enough, but it turns out that it has a few features that are really nice. Specifically, once you read in an XML document, your variable will have a Node member, and that member contains one member for each type of element in your document! So it automatically shows you the list of distinct elements. You can then maneuver through the members to find the number of each type of element ($myvar.Node.myelementType.count) or to view one of the instances ($myvar.Node.myelementType[0]), etc.

In my case, I found that my 165MB file (produced by the ITNM DLA) only contained about 17 unique element types, and that there were only about 2500 network devices, which I could then easily loop through to get the information I needed. Chained with ConvertTo-Csv, I was able to massage the data exactly as needed.

The moral of the story is: PowerShell is amazing right out of the box. I didn't have to add any new packages or anything, and it gave me precisely what I needed.

No comments: