Text Analysis of PDF Documents Using WordStat for Stata

If you have a PDF that you want to bring into Stata for analysis with WordStat, follow these steps:

  • Open Stata

  • Go to the User menu

  • Select WordStat

  • Select Document Conversion Wizard

  • Browse to find the file that you want to convert

  • Tell WordStat how you want to process the file: as a single document, or split into pages, paragraphs or sections.  The simplest option is the whole document; however, if you split it by pages or paragraphs, you can analyse differences between pages or between paragraphs.

  • Click on Next

  • Select the type of file that you want to save.  The options are a QDA Miner Project (.PPJ), a Stata 13 file (.dta) or a Stata 14/15 file (.dta).

  • Save the file to a location on your C:\ drive

If you saved the file in Stata format, go back into Stata and open it.  You will have a Stata dataset with observations and variables.  If you chose to process the file as a single document, there will be only one observation, with the entire text of the document stored in one of the variables.  This is why Provalis Research chose to work with Stata: a single Stata observation can hold up to 2.14 billion characters.  If you chose pages, you will have one observation for each page detected by the Document Conversion Wizard.
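To make the observation layout concrete, here is a minimal Python sketch of how one document's extracted text could be turned into observations under each option. It is a conceptual illustration only, not WordStat's actual code. It assumes that page breaks appear as form-feed characters (\f), as many PDF-to-text tools emit them, and that paragraphs are separated by blank lines; the function name split_into_observations is hypothetical.

```python
import re
from typing import List


def split_into_observations(text: str, mode: str) -> List[str]:
    """Split one document's text into the rows a dataset would hold.

    Hypothetical illustration: mode is "document", "pages" or "paragraphs".
    """
    if mode == "document":
        # The whole document becomes a single observation.
        return [text]
    if mode == "pages":
        # Assumption: pages are separated by form-feed characters.
        parts = text.split("\f")
    elif mode == "paragraphs":
        # Assumption: paragraphs end at a blank line or at a page break.
        parts = re.split(r"\n\s*\n|\f", text)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Drop empty fragments (e.g. a trailing page break) and trim whitespace.
    return [p.strip() for p in parts if p.strip()]


sample = "Intro paragraph.\n\nSecond paragraph.\fPage two text."
for mode in ("document", "pages", "paragraphs"):
    rows = split_into_observations(sample, mode)
    print(mode, len(rows))  # document 1, pages 2, paragraphs 3
```

The same text yields one, two or three observations depending on the option chosen, which mirrors the one-observation-per-document versus one-observation-per-page layout described above.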

You then perform the analysis using WordStat.

Remember also that you can import multiple documents with the Document Conversion Wizard.  For example, if there are three studies that you want to examine, you can import all three at once.  If you import them as documents, you will have (in this case) three cases/observations.  Again, you can instead split them by pages, paragraphs or sections (a section being defined by a particular character in the document).
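Importing several documents at once stacks all their units into one dataset. The Python sketch below, again a hypothetical illustration rather than WordStat's implementation, builds one row per document or per section, keeping a document identifier alongside the text. The file names study1.pdf to study3.pdf, the Observation class, the build_dataset function and the choice of "#" as the section delimiter are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Observation:
    document: str  # name of the source file (hypothetical identifier column)
    unit: int      # 1-based section number within that document
    text: str      # text stored in this observation


def build_dataset(docs: Dict[str, str],
                  delimiter: Optional[str] = None) -> List[Observation]:
    """Stack several documents into one list of observations.

    With delimiter=None each document is one observation; otherwise each
    document is split into sections wherever the delimiter character occurs.
    """
    rows: List[Observation] = []
    for name, text in docs.items():
        parts = [text] if delimiter is None else text.split(delimiter)
        # Ignore empty sections, e.g. from a delimiter at the very end.
        sections = [p.strip() for p in parts if p.strip()]
        for i, section in enumerate(sections, start=1):
            rows.append(Observation(name, i, section))
    return rows


# Hypothetical example: three studies imported at once.
studies = {
    "study1.pdf": "Methods#Results",
    "study2.pdf": "Background#Methods#Results",
    "study3.pdf": "Summary",
}
print(len(build_dataset(studies)))       # 3: one observation per document
print(len(build_dataset(studies, "#")))  # 6: one observation per section
```

Keeping the document name on every row is what lets you compare the three studies with one another even after each has been split into sections.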
