Exploring AIChE 2015 Technical Program (PD2M topic): Implementation Details

The main write-up of this analysis is in this location. This document lists where the code used for the analysis can be found.

Getting data (Scraping data from the web)

The first step of this analysis is getting data on the PD2M program (sessions, talk titles, authors, organizations, abstracts). The rvest package in R makes it easy to scrape websites. The code for scraping the data is here. The tricky part, which required some manual cleaning, was extracting organization names. I used the Stanford Named Entity Recognizer through the R package StanfordCoreNLP to tag the organization names and then cleaned up the names that were tagged incorrectly.
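As a rough illustration of the scraping step, the sketch below uses rvest to pull a session title, talk titles, and authors out of a small inline HTML snippet. The HTML structure and the CSS class names (`session-title`, `talk`, `talk-title`, `authors`) are made up for this example and are not the actual markup of the AIChE program pages, so the selectors would need to be adapted to the real site.

```r
library(rvest)

# Hypothetical HTML standing in for one session page.
# The real AIChE program markup is different; these class names are assumptions.
page_html <- paste0(
  '<div class="session">',
  '<h2 class="session-title">Continuous Processing</h2>',
  '<div class="talk">',
  '<span class="talk-title">Talk A</span>',
  '<span class="authors">A. Author, B. Author</span>',
  '</div>',
  '<div class="talk">',
  '<span class="talk-title">Talk B</span>',
  '<span class="authors">C. Author</span>',
  '</div>',
  '</div>'
)

# Parse the HTML string (a URL could be passed here instead).
page <- read_html(page_html)

# One session title for the page, one node per talk.
session_title <- html_text(html_element(page, ".session-title"), trim = TRUE)
talks <- html_elements(page, ".talk")

# html_element() on a node set returns one match per talk node,
# so titles and authors stay aligned row by row.
talk_data <- data.frame(
  session = session_title,
  title   = html_text(html_element(talks, ".talk-title"), trim = TRUE),
  authors = html_text(html_element(talks, ".authors"), trim = TRUE),
  stringsAsFactors = FALSE
)

print(talk_data)
```

Looping the same extraction over every session URL and binding the resulting data frames together would give one table for the whole program.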

Exploring data

Basic summaries by talk, session, and author are in this file.
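For a sense of what such summaries can look like, here is a minimal base R sketch. The data frame, its column names, and its values are hypothetical placeholders, not the scraped PD2M data, and the counts shown are only examples of possible summaries rather than the ones computed in the linked file.

```r
# Hypothetical data: one row per (talk, author) pair.
talks <- data.frame(
  session = c("Session A", "Session A", "Session A", "Session B", "Session B"),
  title   = c("Talk 1", "Talk 1", "Talk 2", "Talk 3", "Talk 3"),
  author  = c("Author X", "Author Y", "Author X", "Author Z", "Author Y"),
  stringsAsFactors = FALSE
)

# Talks per session: count each title once within its session.
unique_talks <- unique(talks[, c("session", "title")])
talks_per_session <- table(unique_talks$session)

# Talks per author, most active authors first.
talks_per_author <- sort(table(talks$author), decreasing = TRUE)

# Number of authors listed on each talk.
authors_per_talk <- table(talks$title)

print(talks_per_session)
print(talks_per_author)
print(authors_per_talk)
```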

Getting Topics from data

The estimation of the underlying topics and the summaries based on the estimated topics are in this file.
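This document does not say which topic model or package was used, so the following is only one plausible sketch: fitting latent Dirichlet allocation (LDA) with the `tm` and `topicmodels` packages. The four abstracts, the number of topics (`k = 2`), and the seed are invented for the example; the linked file may use a different method and different settings.

```r
library(tm)
library(topicmodels)

# Hypothetical abstracts standing in for the scraped PD2M abstracts.
abstracts <- c(
  "continuous crystallization of the drug substance with particle size control",
  "crystallization kinetics and particle size distribution modeling",
  "continuous tablet manufacturing with process analytical technology",
  "process analytical technology for monitoring tablet coating"
)

# Build a document-term matrix, dropping punctuation and English stopwords.
corpus <- VCorpus(VectorSource(abstracts))
dtm <- DocumentTermMatrix(
  corpus,
  control = list(removePunctuation = TRUE, stopwords = TRUE)
)

# Fit LDA with an assumed number of topics; the seed makes the fit repeatable.
lda_fit <- LDA(dtm, k = 2, control = list(seed = 2015))

# Five highest-probability terms per topic (one column per topic).
top_terms <- terms(lda_fit, 5)

# Most likely topic for each abstract.
doc_topics <- topics(lda_fit)

print(top_terms)
print(doc_topics)
```

With the topic assignments in hand, summaries such as the number of talks per topic or the dominant topic per session follow from tabulating `doc_topics` against the talk metadata.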

Session Info

All of this analysis was done in RStudio (version 0.99.465).