Ripping SharePoint Metadata Apart

I previously blogged about exporting files and metadata from SharePoint. It worked! Now that all the files are extracted, I have ended up with a CSV.

It seems the double quotes inside fields are doubled up – but a proper CSV parser (e.g. Excel) will turn each pair back into one double quote
The Xml column contains an XML node
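If you would rather not go through Excel, here is a minimal Python sketch of the same idea using the standard csv module. The sample row and the Name column are made up for illustration; only the Xml column name comes from the export described above.

```python
import csv
import io

# Hypothetical two-column export; the real file has whatever columns the export wrote.
sample = 'Name,Xml\r\nreport.docx,"<z:row ows_Title=""Q3 report"" />"\r\n'


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # newline="" leaves line endings alone so the csv module can handle them itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = read_rows(sample)
# The doubled quotes inside the quoted field come back as ordinary double quotes.
print(rows[0]["Xml"])
```

For a real file, open it with open(path, newline="", encoding=...) and hand the file object to csv.DictReader instead of the StringIO.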
Add an XML header and a root node
  <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
  <root>
  <z:row ….. />
  </root>
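The wrapping step can also be scripted. The sketch below is Python rather than anything from the original workflow, and it makes one assumption you should check against your own data: a strict XML parser needs the z: prefix bound to a namespace, and I bind it to "#RowsetSchema", which is what SharePoint web service responses commonly declare. The function names are mine.

```python
import xml.etree.ElementTree as ET

# Assumption: the z: prefix maps to this namespace URI in your export.
ROWSET_NS = "#RowsetSchema"


def wrap_fragment(fragment):
    """Put the XML header and a root node around a bare <z:row ... /> fragment."""
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="no" ?>\n'
        f'<root xmlns:z="{ROWSET_NS}">\n{fragment}\n</root>'
    )


def parse_row(fragment):
    """Return the attributes of the z:row node as a plain dict."""
    doc = wrap_fragment(fragment)
    # Parse as bytes so the encoding named in the header is honoured.
    root = ET.fromstring(doc.encode("utf-8"))
    row = root.find("z:row", {"z": ROWSET_NS})
    return dict(row.attrib)
```

Calling parse_row('<z:row ows_Title="Q3 report" ows_ID="7" />') gives back a dict with the ows_Title and ows_ID keys, which is handy for checking that the wrapped document is well formed before opening it in an editor.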
Then, using Notepad++ with the “XML Tools” plugin installed, you can show the path of the current node with Ctrl+Alt+Shift+P
1. it turns out to be /root/z:row
2. to do the same from a C# app – try this article
Also – use Plugins->XML Tools->Pretty Print Attributes
An XPath of /root/z:row[@ows_MetaInfo] selects the rows that carry what we require (/root/z:row/@ows_MetaInfo returns the attribute value itself)
1. Parse this crazy thing with CRLF or
2. The leading number 1234;# is the ID number – remove it
3. Each entry has the form field:TY|val
  1. field = field name
  2. TY = type
  3. val = value
4. Use a regex like this to parse
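As a rough illustration of those steps, here is a Python sketch that strips the leading ID, splits the value into lines, and matches each line against the field:TY|val shape. The pattern is my own guess at a workable regex, not the one originally used, and the field names in the usage note are only examples.

```python
import re

# Leading "1234;#" style ID prefix.
ID_PREFIX = re.compile(r"^\d+;#")
# One entry per line: field name, a type code, a pipe, then the value.
ENTRY = re.compile(r"(?P<name>[^:]+):(?P<type>[A-Za-z]+)\|(?P<value>.*)")


def parse_meta_info(raw):
    """Turn an ows_MetaInfo string into {field: (type, value)}."""
    body = ID_PREFIX.sub("", raw, count=1)
    fields = {}
    # splitlines() copes with CRLF as well as bare LF.
    for line in body.splitlines():
        match = ENTRY.fullmatch(line)
        if match:  # skip blank or unrecognised lines
            fields[match["name"]] = (match["type"], match["value"])
    return fields
```

For example, parse_meta_info("1234;#vti_author:SW|jdoe\r\nvti_filesize:IR|2048\r\n") returns a dict with two entries, each holding the type code and the value as strings. Lines that do not fit the shape are silently skipped, so it is worth counting the results against the raw text the first time you run it.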
Now, undoing the whole thing by hand
1. undo the HTML entities – though this will likely be done by the XML API
  1. use the Notepad++ plugin HTML Tag (Plugins->HTML Tag->Decode Entities)
2. &lt; -> < etc.
3. decode entities like #x0020 to a space etc.
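The same undoing can be sketched in Python. html.unescape handles named and numeric entities such as &lt; and &#x0020;. The second helper assumes the escapes might instead appear in the _x0020_ style that SharePoint uses in internal field names; that reading is my assumption, so drop it if your data does not look like that.

```python
import html
import re

# Assumed SharePoint-style escape: _x followed by four hex digits and a closing underscore.
HEX_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_entities(text):
    """Decode HTML entities such as &lt; or &#x0020;."""
    return html.unescape(text)


def decode_hex_escapes(text):
    """Decode _xHHHH_ escapes, e.g. _x0020_ becomes a space."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

If the value was read through an XML parser, the entity step has usually happened already, so running decode_entities a second time can over-decode text that legitimately contains an ampersand sequence.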
To extract thumbnails – they are stored in Base64 – here is a C# app to decode them into JPEGs
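For anyone not using C#, the decoding itself is only a few lines in Python. This is a sketch under the assumption that the Base64 text decodes straight to JPEG bytes; the function names and the output path are hypothetical.

```python
import base64
from pathlib import Path


def decode_thumbnail(b64_text):
    """Decode Base64 text and check that the result looks like a JPEG."""
    # By default b64decode ignores characters outside the Base64 alphabet,
    # so line breaks inside the text are harmless.
    data = base64.b64decode(b64_text)
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("decoded bytes do not start with the JPEG SOI marker")
    return data


def save_thumbnail(b64_text, out_path):
    """Write the decoded thumbnail to out_path and return the byte count."""
    data = decode_thumbnail(b64_text)
    Path(out_path).write_bytes(data)
    return len(data)
```

The marker check is there only to fail early if the field turns out to hold something other than a JPEG.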