Workflow Reuse: WORKS 2011 Supplementary Materials
We report here on the reuse of a library of text analytics workflows to analyze data from a Question & Answer (Q&A) site. The Madsci Network is an Ask-A-Scientist website that provides a human-mediated Q&A service answering questions in 26 different scientific fields. With a store of over 40,000 questions and answers, it serves as a unique repository of scientific knowledge. The work is described in:
- "Making Data Analysis Expertise Broadly Accessible through Workflows". Matheus Hauder, Yolanda Gil, Ricky Sethi, Yan Liu, and Hyunjoon Jo. Proceedings of the Seventh IEEE International Conference on e-Science, Stockholm, Sweden, December 5-8, 2011. Available as a preprint.
Here we present supplementary materials for the article.
Table 0: Confusion matrix for all 26 labels
Figure 0.1: Workflow: Document Classification, testing, and training on The Madsci Network
Figure 0.2: Workflow: Topic Modelling on The Madsci Network using Reduce
Figure 0.3: Workflow: Topic Modelling on The Madsci Network WITHOUT Reduce, WITH Stemmer