So, I had some manually entered non-tables in an AutoCAD document that I needed to get into Excel format. “Non-tables” literally refers to tables made by placing text in a rectangular x,y fashion. These were Bentley ISOs, where the machine generates a material take-off in the upper right corner. Not knowing what the future held in terms of the number of columns, this turned out to be a “get all x values and make a histogram with buckets x millimeters wide” problem. This is basically how communications systems resolve wireless signals in quadrature (the x/y plane). Since humans can enter text boxes alongside the machine-generated ones, the question is: what is the best guess for the grid?
We could use some cool math like the Viterbi algorithm, but that is a little too complicated. Basically, making a histogram and determining the best row and column for each piece of text was the best approach. Unlike in wireless communication theory, the bucket widths are completely known for each drawing that we process AND the number of columns is unknown. What is known is that within a drawing there is a consistent number of rows and columns.
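Here is a rough sketch of the histogram idea in Python. It is not the code I actually ran, just an illustration under some assumptions: the text has already been pulled out of the drawing as `(x, y, text)` tuples, the bucket width is smaller than half the spacing between columns (and rows), and the names `find_peaks`, `build_table`, `bucket_width` and `min_count` are made up for this example.

```python
from collections import defaultdict


def find_peaks(values, bucket_width, min_count=2):
    """Histogram the values and return the centre of each populated cluster.

    Adjacent occupied buckets are merged, so a column that straddles a
    bucket boundary still counts as one peak. Clusters with fewer than
    min_count members are treated as stray text and ignored.
    """
    buckets = defaultdict(list)
    for v in values:
        buckets[int(v // bucket_width)].append(v)

    # Group runs of neighbouring bucket indices into clusters.
    clusters, run = [], []
    for b in sorted(buckets):
        if run and b - run[-1] > 1:
            clusters.append(run)
            run = []
        run.append(b)
    if run:
        clusters.append(run)

    centres = []
    for cluster in clusters:
        members = [v for b in cluster for v in buckets[b]]
        if len(members) >= min_count:
            centres.append(sum(members) / len(members))
    return centres


def nearest(centres, value):
    """Index of the centre closest to value."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - value))


def build_table(items, bucket_width, min_count=2):
    """Snap each (x, y, text) item onto the best-guess grid."""
    cols = find_peaks([x for x, _, _ in items], bucket_width, min_count)
    rows = find_peaks([y for _, y, _ in items], bucket_width, min_count)
    # Assumption: y grows upward, so the largest y is the top row.
    rows.sort(reverse=True)

    table = [["" for _ in cols] for _ in rows]
    for x, y, text in items:
        r, c = nearest(rows, y), nearest(cols, x)
        # Text landing in an occupied cell is appended, not overwritten.
        table[r][c] = (table[r][c] + " " + text).strip()
    return table


if __name__ == "__main__":
    sample = [
        (10.2, 100.0, "ITEM"), (50.1, 100.1, "QTY"), (90.0, 100.0, "DESC"),
        (10.0, 95.0, "1"), (50.3, 94.9, "4"), (90.2, 95.0, "ELBOW"),
        (10.4, 90.0, "2"), (49.8, 90.0, "2"), (90.1, 90.1, "FLANGE"),
    ]
    for row in build_table(sample, bucket_width=2.0):
        print(row)
```

The resulting list of rows can be written straight out with the `csv` module and opened in Excel. Merging neighbouring buckets before counting is a design choice: it keeps a slightly wobbly column from splitting in two, at the cost of letting a stray text box sitting right next to a column get absorbed into it.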
It worked, by the way.
Some links.
- http://stackoverflow.com/questions/8979752/how-to-count-number-of-peaks-in-graph-graph-analysis
- Viterbi algorithm: http://en.wikipedia.org/wiki/Viterbi_algorithm