I previously blogged about exporting files and metadata from SharePoint. It worked! Well, now that all the files are extracted, I have ended up with a CSV.
- Some double quotes come out doubled (""), but a proper CSV parser (e.g. Excel) will turn each pair back into a single double quote
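Python's csv module undoes that quoting the same way Excel does. A minimal sketch, assuming the export has a header row with an Xml column; the sample row is made up for illustration.

```python
import csv
import io

# Hypothetical sample: one row whose Xml value contains doubled double quotes,
# the way a CSV writer escapes " inside a quoted field.
SAMPLE = 'ID,Xml\n1,"<z:row ows_ID=""1"" />"\n'

def read_rows(text):
    """Parse CSV text; with the default dialect each "" inside a quoted field becomes "."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(SAMPLE)
print(rows[0]["Xml"])  # <z:row ows_ID="1" />
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `csv.DictReader`, so line breaks inside quoted fields are kept intact.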
- The Xml column contains an XML node
- Add an XML header and a root node:
  - `<?xml version="1.0" encoding="UTF-8" standalone="no" ?>`
  - `<root>`
  - `<z:row … />`
  - `</root>`
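The wrapping step can be sketched in Python as below. One assumption to flag: a strict XML parser rejects `z:row` unless the `z` prefix is declared, so this sketch declares it on `<root>` with `#RowsetSchema`, the namespace SharePoint's list web services use; the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the z: prefix is bound to "#RowsetSchema". Without some
# declaration, a strict parser reports <z:row> as an unbound prefix.
Z_NS = "#RowsetSchema"

def wrap_rows(fragments):
    """Wrap z:row fragments from the Xml column in an XML header and a <root> node."""
    body = "\n".join(fragments)
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="no" ?>\n'
        f'<root xmlns:z="{Z_NS}">\n{body}\n</root>\n'
    )

# Hypothetical fragments standing in for the real Xml column values.
doc = wrap_rows(['<z:row ows_ID="1" />', '<z:row ows_ID="2" />'])
root = ET.fromstring(doc.encode("utf-8"))  # parse bytes so the declared encoding applies
print(len(root), root[0].tag)  # 2 {#RowsetSchema}row
```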
- Then, using Notepad++ with the XML Tools plugin installed, you can see the path of the current node with Ctrl+Alt+Shift+P
- The path turns out to be `/root/z:row`
- For a C# app, try this article
- Also, use Plugins -> XML Tools -> Pretty Print Attributes to make the rows readable
- The XPath `/root/z:row[@ows_MetaInfo]` selects the `z:row` nodes that carry an `ows_MetaInfo` attribute, which is what we need
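Python's ElementTree supports enough of XPath to run this query. A minimal sketch, assuming the `z` prefix maps to `#RowsetSchema` (adjust the mapping if your export declares something else); the sample document is made up.

```python
import xml.etree.ElementTree as ET

# Assumption: "#RowsetSchema" is the namespace bound to the z: prefix.
NS = {"z": "#RowsetSchema"}

SAMPLE = (
    '<root xmlns:z="#RowsetSchema">'
    '<z:row ows_ID="1" ows_MetaInfo="1;#vti_title:SW|Hello" />'
    '<z:row ows_ID="2" />'
    '</root>'
)

def rows_with_metainfo(xml_text):
    """Equivalent of /root/z:row[@ows_MetaInfo] in ElementTree's XPath subset."""
    root = ET.fromstring(xml_text)  # the returned element is <root> itself
    return root.findall("z:row[@ows_MetaInfo]", NS)

rows = rows_with_metainfo(SAMPLE)
print([r.get("ows_ID") for r in rows])  # ['1']
```

ElementTree only implements a subset of XPath; if you need the full language (absolute paths, functions), lxml's `xpath()` method accepts the expression as written above.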
- Parse this crazy thing by splitting it on CRLF
- The first number (`1234;#`) is the item ID; remove it
- Each remaining line has the form `field:TY|val`, where:
  - field = field name
  - TY = type
  - val = value
- Use a regex to split each line into these three parts
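A Python sketch of this parsing step. The regex here is a guess at the line format, not necessarily the one originally used: it assumes the field name stops at the first colon and that TY is a two-letter uppercase code such as SW. It also assumes the CRLFs survived into the attribute value; an XML parser keeps them only if they were escaped as character references, otherwise it normalizes them to spaces. The sample value is hypothetical.

```python
import re

# Assumed line format: field:TY|val, with TY a two-letter uppercase code.
LINE_RE = re.compile(r"^(?P<field>[^:]+):(?P<ty>[A-Z]{2})\|(?P<val>.*)$")

def parse_metainfo(raw):
    """Drop the leading "1234;#" ID, split on line breaks, parse each line."""
    body = re.sub(r"^\d+;#", "", raw)
    fields = {}
    for line in body.splitlines():  # splitlines handles \r\n as one break
        m = LINE_RE.match(line)
        if m:  # skip blank or unexpected lines instead of failing
            fields[m["field"]] = (m["ty"], m["val"])
    return fields

# Hypothetical ows_MetaInfo value.
sample = "1234;#vti_title:SW|Holiday photo\r\nvti_filesize:IW|2048\r\n"
print(parse_metainfo(sample))
# {'vti_title': ('SW', 'Holiday photo'), 'vti_filesize': ('IW', '2048')}
```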
- Now, undoing the whole thing by hand:
- Undo the HTML entities (with the XML API this will most likely be done for you)
- Use the Notepad++ plugin HTML Tag (Plugins -> HTML Tag -> Decode Entities)
- `&lt;` -> `<`, etc.
- Decode numeric entities like `&#x0020;` to a space, etc.
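In Python, `html.unescape` does the same decoding as the Notepad++ plugin, for both named entities and numeric references. A minimal sketch; keep in mind that an XML parser already decodes entities once when it reads attribute values, so only run this on text that is still escaped afterwards, or you risk decoding twice.

```python
import html

def decode_entities(text):
    """Turn escaped text such as &lt; and &#x0020; back into literal characters."""
    # html.unescape handles named entities (&lt;, &gt;, &amp;, ...) and
    # numeric references (&#x0020;, &#60;, ...).
    return html.unescape(text)

print(decode_entities("&lt;b&gt;Hello&#x0020;World&lt;/b&gt;"))  # <b>Hello World</b>
```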
- Thumbnails are stored as Base64; here is a C# app to decode them into JPEGs
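The Base64 step is short in Python as well. A minimal sketch, not the linked C# app: it assumes each thumbnail is a single Base64 string holding a complete JPEG, and it checks the standard JPEG start bytes before writing so a wrong column fails loudly.

```python
import base64
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with these bytes

def save_thumbnail(b64_text, out_path):
    """Decode a Base64 thumbnail string and write the raw bytes to out_path."""
    data = base64.b64decode("".join(b64_text.split()))  # drop line breaks/whitespace first
    if not data.startswith(JPEG_MAGIC):
        raise ValueError("decoded data does not look like a JPEG")
    Path(out_path).write_bytes(data)
    return len(data)
```

Call it once per row with the Base64 value and a target file name such as `thumb_1234.jpg` (a hypothetical naming scheme based on the item ID).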