Mar 13, 2007

Another data mining approach

A few weeks ago I urgently needed a tool to gather data for me from some sites. It wasn't easy to find a good tool for crawling sites and getting the data out of them. Since I love experiments, I decided to develop my own spider. I needed it to behave like a human: a program that can load and analyze a document from a given URL. Once the document is loaded, the application should find a link, click on it, wait for the next page to load, and do something similar there. I used the IE automation approach, as it seemed easier at the time. But it took a lot of pain before all the functionality was actually implemented. I will post some code with explanations when I get home; for now, you can search http://www.codeproject.com and http://msdn2.microsoft.com/en-us/default.aspx, as they seemed to be a good starting point.
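Until that code is posted, here is a rough sketch of the "load a page, find its links, follow them" loop described above. This is not the IE automation approach from the post. It is a swapped-in technique: a plain breadth-first HTTP crawler built on Python's standard library, so it does not run JavaScript or click anything in a real browser. The names `LinkExtractor`, `extract_links`, `fetch` and `crawl`, and the `max_pages` limit, are hypothetical and only for illustration. The page fetcher can be swapped out through `fetch_page`, which also makes the loop easy to test offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of all <a> tags in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # Valueless attributes arrive as None, so skip empty hrefs.
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for every link found in `html`."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def fetch(url):
    """Download a page; assumes the page is UTF-8 encoded."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=10, fetch_page=fetch):
    """Breadth-first crawl: load a page, collect its links, then visit them in turn."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch_page(url)  # blocks until the page is loaded
        pages[url] = html
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

For example, `crawl("http://example.com/", max_pages=5)` would visit at most five pages reachable from the start page. A real spider would also need politeness delays, robots.txt handling and a way to stay on one site.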
