Showing posts with label PowerShell. Show all posts
Showing posts with label PowerShell. Show all posts

Tuesday, March 3, 2020

Finding unique element types in XML with PowerShell

I recently needed to deal with a large (165MB) XML file, so the first question I had is: How many unique element types are actually in the file? I found a bunch of truly painful-sounding answers online, mainly involving the distinct-values() XPATH 2.0 function. If you've ever dealt with XPATH, you know that just setting up the toolchain environment to use it is pretty labor intensive. However, I then started looking for PowerShell cmdlets that work with XML and I found Select-Xml:

It looks nice enough, but it turns out that it has a few features that are really nice. Specifically, once you read in an XML document, your variable will have a Node member, and that member contains one member for each type of element in your document! So it automatically shows you the list of distinct elements. You can then maneuver through the members to find the number of each type of element ($myvar.Node.myelementType.count) or to view one of the instances ($myvar.Node.myelementType[0]), etc.

In my case, I found that my 165MB file (produced by the ITNM DLA) only contained about 17 unique element types, and that there were only about 2500 network devices, which I could then easily loop through to get the information I needed. Chained with ConvertTo-Csv, I was able to massage the data exactly as needed.

The moral of the story is: PowerShell is amazing right out of the box. I didn't have to add any new packages or anything, and it gave me precisely what I needed.

Friday, September 13, 2019

If you're scripting on Windows, use PowerShell

My last post on PowerShell was in 2008, so I thought I would write an update. If you're writing scripts on Windows, you should probably be using PowerShell. It seems to have 99 to 100% of the tools (especially parsers) that I ever need. I just recently needed to scrape a web page for some data, so I thought I would spend some time messing with PowerShell to get it going. Well, it only took about 30 minutes to develop the entire script that I needed, with absolutely no external dependencies.

Here's the whole script:

# get the web page. Yes, PowerShell has a 'curl' command/alias
$resp = curl -UseDefaultCredentials http://myhostname/mypagename

# Get all of the rows of the table with an ID of "serverTable"
$rows = $resp.ParsedHtml.getElementById("serverTable").getElementsByTagName("TR")

# Loop through the rows, skipping the header row: 
for ($i=1; $i -lt $rows.Length(); $i++) {

# get the hostname of this row
  $thehost = $rows[$i].getElementsByTagName("TD")[2]

# get the date this host was last rebooted

# if the host was rebooted over 20 days ago, print that date

  if ([datetime]::Now.AddDays(-20) -gt [datetime]::parseexact($rebootDateString.innerText,"G", $null))   {
    $rebootDateString.innerText } }

That's it, with no external references and nothing extra to install. It's got date parsing, date arithmetic, HTML parsing and HTTP request capabilities all built in. I realize that this then isn't portable... or is it? There's actually a PowerShell port for Linux available, with instructions here from Microsoft:

I know that Python is a hot language these days, but I don't like it as much as PowerShell. I tried to do the above with Python, and it took quite a bit longer, even though I've used Python more than PowerShell. You have to import some classes and then use XPath to find elements in the HTML. PowerShell was just straightforward and easy, at least for me, with my background and expectations. YMMV, but I like PowerShell.