Ripping SharePoint Metadata Apart

I previously blogged about exporting files and metadata from SharePoint. It worked! Now that all the files are extracted, I have ended up with a CSV.

  1. It seems some double quotes come through doubled – but a proper CSV parser (e.g. Excel) will turn each pair back into one double quote
  2. The Xml column contains an XML node
  3. Add an XML header and a root node
    1. <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    2. <root>
    3. <z:row …..
      />
    4. </root>
  4. Then, using Notepad++ with the XML Tools plugin installed, you can surf the node path with Ctrl+Alt+Shift+P
    1. It turns out to be /root/z:row
    2. For a C# app – try this article
  5. Also – use Plugins->XML Tools->Pretty Print Attributes
  6. The XPath /root/z:row[@ows_MetaInfo] selects the rows that carry what we require; /root/z:row/@ows_MetaInfo selects the attribute itself
    1. Parse this crazy thing with CRLF or
    2. The first number 1234;# is the id number – remove it
    3. field:TY|val
      1. field name
      2. TY = type
      3. val = value
    4. Use a regex like this to parse
  7. Now, undoing the whole thing by hand
    1. Undo the HTML entities – though this will likely be done for you by the XML API
      1. Use the Notepad++ plugin HTML Tag (Plugins->HTML Tag->Decode Entities)
    2. &lt; -> < etc.
    3. Decode entities like #x0020 to a space, etc.
  8. To extract thumbnails – they are stored in Base64 – here is a C# app to decode them into JPEGs
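
For anyone who would rather script the steps above than click through Notepad++, here is a rough Python sketch of the same pipeline: wrap the row in a root node, pull out ows_MetaInfo, strip the id prefix, split into field:TY|val lines, and decode the leftovers. It is not the C# code from the linked articles. Several things in it are my assumptions: that the z: prefix maps to the "#RowsetSchema" namespace, that the type code is always two capital letters, that "#x0020" means a character reference like &#x0020;, and all of the function names, which are made up for illustration.

```python
import base64
import html
import re
import xml.etree.ElementTree as ET

# Assumption: the z: prefix is bound to SharePoint's usual rowset namespace.
Z_NS = "#RowsetSchema"

# One metadata line looks like  field:TY|val  (assumed: TY is two capital letters).
META_LINE = re.compile(r"^(?P<name>[^:|]+):(?P<type>[A-Z]{2})\|(?P<value>.*)$")


def wrap_row(row_xml: str) -> str:
    """Wrap a bare <z:row .../> fragment in a root node that declares the z: prefix."""
    return f'<root xmlns:z="{Z_NS}">{row_xml}</root>'


def parse_meta_info(raw: str) -> dict:
    """Turn an ows_MetaInfo string into {field name: (type, value)}."""
    # Drop the leading "1234;#" id prefix if it is there.
    raw = re.sub(r"^\d+;#", "", raw)
    fields = {}
    for line in raw.splitlines():  # splitlines() copes with CRLF as well as LF
        match = META_LINE.match(line)
        if match:
            # The XML parser has already decoded one layer of entities;
            # html.unescape mops up leftovers such as &lt; or &#x0020;
            fields[match["name"]] = (match["type"], html.unescape(match["value"]))
    return fields


def meta_info_for_rows(xml_text: str) -> list:
    """Parse the wrapped XML and return one metadata dict per matching row."""
    root = ET.fromstring(xml_text)
    # Relative form of /root/z:row[@ows_MetaInfo], evaluated from the root element.
    rows = root.findall("z:row[@ows_MetaInfo]", {"z": Z_NS})
    return [parse_meta_info(row.get("ows_MetaInfo")) for row in rows]


def decode_thumbnail(b64_text: str) -> bytes:
    """Decode a Base64 thumbnail into raw bytes, ready to write to a .jpg file."""
    return base64.b64decode(b64_text)
```

Typical use would be meta_info_for_rows(wrap_row(cell)) for each Xml cell read out of the CSV, then writing the result of decode_thumbnail(...) to disk in binary mode. One catch: an XML parser turns literal line breaks inside an attribute value into spaces, so the line splitting only works if the breaks are stored as character references such as &#13;&#10;.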