Workflow Reuse: WORKS 2011 Supplementary Materials

We report here on the reuse of a library of text analytics workflows to analyze data from a question-and-answer (Q&A) site. The Madsci Network is an Ask-A-Scientist website that provides a human-mediated Q&A service answering questions in 26 scientific fields. With a store of over 40,000 questions and answers, it serves as a unique repository of scientific knowledge. The work is described in:

  • "Making Data Analysis Expertise Broadly Accessible through Workflows". Matheus Hauder, Yolanda Gil, Ricky Sethi, Yan Liu, and Hyunjoon Jo. Proceedings of the Seventh IEEE International Conference on e-Science, Stockholm, Sweden, December 5-8, 2011. Available as a preprint.

Here we present supplementary materials for the article.


Table 0: Confusion matrix for all 26 labels
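
For readers unfamiliar with how a confusion matrix is tabulated, the following is a minimal sketch, not the workflow used in the paper. It assumes rows are true labels and columns are predicted labels; the function name, the three field names, and the toy data are hypothetical stand-ins for the 26 labels in Table 0.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, labels):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical toy data: three fields instead of the 26 in Table 0.
labels = ["Astronomy", "Chemistry", "Physics"]
true_labels = ["Astronomy", "Chemistry", "Physics", "Physics", "Chemistry"]
predicted_labels = ["Astronomy", "Physics", "Physics", "Physics", "Chemistry"]

matrix = confusion_matrix(true_labels, predicted_labels, labels)
for label, row in zip(labels, matrix):
    print(f"{label:<10} {row}")
```

Entries on the diagonal count correctly classified questions; off-diagonal entries show which fields are mistaken for one another.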




Figure 0.1: Workflow: Document classification (training and testing) on the Madsci Network


Figure 0.2: Workflow: Topic modelling on the Madsci Network, using Reduce


Figure 0.3: Workflow: Topic modelling on the Madsci Network, without Reduce and with Stemmer

