Ripping SharePoint Metadata Apart

I previously blogged about exporting files and metadata from SharePoint. It worked! Now that all the files are extracted, I have ended up with a CSV whose metadata still needs picking apart.

  1. Some double quotes show up doubled (""), but a proper CSV parser (e.g. Excel) will turn each pair back into one double quote
  2. The Xml column contains an XML node
  3. Add an XML header and a root node around it
    1. <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    2. <root>
    3. <z:row …..
      />
    4. </root>
  4. Then, using Notepad++ with the XML Tools plugin installed, you can show the current node path with Ctrl+Alt+Shift+P
    1. It turns out to be /root/z:row
    2. For a C# app, try this article
  5. Also use Plugins->XML Tools->Pretty Print Attributes
  6. The XPath /root/z:row[@ows_MetaInfo] selects the rows that carry what we require; /root/z:row/@ows_MetaInfo selects the attribute value itself
    1. Parse this crazy thing with CRLF or
    2. The leading number (e.g. 1234;#) is the ID number: remove it
    3. field:TY|val
      1. field name
      2. TY = type
      3. val = value
    4. Use a regex like this to parse
  7. Now, undoing the whole thing by hand
    1. Undo the HTML entities (an XML API will likely do this for you)
      1. Use the Notepad++ plugin HTML Tag (Plugins->HTML Tag->Decode Entities)
    2. &lt; -> < etc.
    3. Decode entities like #x0020 to a space, etc.
  8. To extract thumbnails: they are stored as Base64, and here is a C# app to decode them into JPEGs
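
The steps above can also be scripted instead of done by hand. Here is a rough Python sketch, not the C# approach from the linked articles. It makes a few assumptions: the z: prefix is bound to "#RowsetSchema" (the usual SharePoint rowset namespace, declared on the root here so the parser accepts the prefix), the entries inside ows_MetaInfo are separated by line breaks that survive XML parsing (e.g. written as &#13;&#10;), and each entry looks like field:TY|val. The function names and the regex are made up for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Assumption: z: is bound to SharePoint's usual rowset namespace.
Z_NS = "#RowsetSchema"

# One "field:TY|val" entry: field name, type code, value.
ENTRY = re.compile(r"^(?P<field>[^:|]+):(?P<type>[A-Za-z]+)\|(?P<value>.*)$")


def xml_cells(csv_text, column="Xml"):
    """Yield the Xml column of each CSV record.

    The csv module turns the doubled quotes ("") back into single ones.
    """
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    for record in reader:
        yield record[column]


def meta_info_of(row_xml):
    """Wrap one z:row fragment in a root node and return its ows_MetaInfo.

    The XML parser decodes entities such as &lt; while reading the attribute.
    Returns None if there is no z:row or no ows_MetaInfo attribute.
    """
    root = ET.fromstring(f'<root xmlns:z="{Z_NS}">{row_xml}</root>')
    row = root.find("z:row", {"z": Z_NS})
    return None if row is None else row.get("ows_MetaInfo")


def parse_meta_info(meta):
    """Split a MetaInfo string into {field: (type, value)}."""
    meta = re.sub(r"^\d+;#", "", meta)  # drop the leading "1234;#" id
    fields = {}
    for line in meta.splitlines():  # split on CRLF or any other line break
        match = ENTRY.match(line)
        if match:  # lines that do not fit the pattern are skipped
            fields[match.group("field")] = (match.group("type"), match.group("value"))
    return fields


if __name__ == "__main__":
    # Invented sample record, only to show the call sequence.
    sample_csv = (
        "Name,Xml\r\n"
        'report.doc,"<z:row ows_MetaInfo=""7;#vti_title:SW|Q1 &lt;draft&gt;'
        '&#13;&#10;vti_level:IW|1"" />"\r\n'
    )
    for cell in xml_cells(sample_csv):
        print(parse_meta_info(meta_info_of(cell)))
```

Values are left as strings next to their type code, since what each type code means is not worked out here. If the line breaks in your export turn out to be literal newlines inside the attribute, the XML parser will turn them into spaces and the line split will need a different strategy.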
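
For the thumbnails in step 8, the Base64 decoding itself is short. This is a minimal Python sketch rather than the C# app mentioned above. It assumes you have already pulled the Base64 text out of whichever field holds it (not identified here), and that the result should be a JPEG, which it checks by looking at the first three bytes. The function names are hypothetical.

```python
import base64

# JPEG files start with these three bytes.
JPEG_MAGIC = b"\xff\xd8\xff"


def decode_thumbnail(b64_text):
    """Decode Base64 text to bytes and check that it looks like a JPEG."""
    # Drop any whitespace or line breaks that were wrapped into the text.
    compact = "".join(b64_text.split())
    data = base64.b64decode(compact, validate=True)
    if not data.startswith(JPEG_MAGIC):
        raise ValueError("decoded data does not start with a JPEG signature")
    return data


def save_thumbnail(b64_text, path):
    """Write the decoded thumbnail to the given path."""
    with open(path, "wb") as handle:
        handle.write(decode_thumbnail(b64_text))
```

Calling save_thumbnail(text, "thumb.jpg") writes the file, and a ValueError means the field probably was not a JPEG thumbnail after all.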

 

ELB Solutions.com Inc.