Making Expertise Accessible through Workflows
Workflows capture valuable expertise, so non-expert users can reuse workflows that experts created to represent complex data analysis processes. First, workflows are often assembled from components available in well-known software libraries, which typically only experts are aware of and know how to use. Second, workflows capture expert-level knowledge of how these individual components need to be combined.
We wanted to evaluate whether workflows are understandable and usable by non-experts. Work to date on workflow reuse has focused on expert scientists reusing workflows from other scientists. While reuse by other expert scientists saves them time and effort, reuse by non-experts is enabling: in practice, they would not be able to carry out the analytic tasks at all without the help of workflows.
We conducted two studies in which users with limited data analytics knowledge and only basic programming skills applied workflows to their data. We used a library of workflows for text analytics, which includes workflows for document classification, document clustering, and topic detection. These workflows capture expertise in using supervised and unsupervised statistical learning algorithms: they reflect state-of-the-art methods to prepare data, extract features, down-select features, and train models of the data.
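To make the four stages named above concrete (prepare data, extract features, down-select features, train a model), here is a minimal sketch of a document classification workflow in plain Python. It is not taken from the workflow library used in the studies. All function names, the document-frequency selection rule, and the count-overlap classifier are hypothetical, toy stand-ins chosen only to show how fixed steps chain together so that a user supplies just the data.

```python
import re
from collections import Counter


def prepare(text):
    """Data preparation: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_features(tokens):
    """Feature extraction: bag-of-words term counts."""
    return Counter(tokens)


def select_features(feature_sets, k):
    """Feature down-selection: keep the k terms found in the most documents."""
    doc_freq = Counter()
    for features in feature_sets:
        doc_freq.update(features.keys())
    # Rank by descending document frequency, then alphabetically (deterministic).
    ranked = sorted(doc_freq, key=lambda term: (-doc_freq[term], term))
    return set(ranked[:k])


def train(feature_sets, labels, vocabulary):
    """Model training: sum the selected term counts per class (toy model)."""
    model = {}
    for features, label in zip(feature_sets, labels):
        class_counts = model.setdefault(label, Counter())
        for term, count in features.items():
            if term in vocabulary:
                class_counts[term] += count
    return model


def run_workflow(documents, labels, k=20):
    """Chain the steps in a fixed order; the user only supplies data."""
    feature_sets = [extract_features(prepare(doc)) for doc in documents]
    vocabulary = select_features(feature_sets, k)
    model = train(feature_sets, labels, vocabulary)
    return model, vocabulary


def classify(model, vocabulary, text):
    """Reuse the same preparation steps, then pick the class with most overlap."""
    features = extract_features(prepare(text))

    def score(label):
        return sum(
            min(count, model[label][term])
            for term, count in features.items()
            if term in vocabulary
        )

    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(model), key=score)
```

A user would call `run_workflow(documents, labels)` on their own labeled texts and then pass the returned model and vocabulary to `classify`. The point of the sketch is the packaging: the order of the steps and the choice of methods are fixed by whoever wrote the workflow, which is the kind of expertise the text says workflows capture.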
Our studies support the notion that workflows can be reused by non-experts to carry out sophisticated data analysis tasks, even when they have very limited programming skills. Non-experts can reuse and extend workflows to customize them for new data and new applications.
We have two detailed reports from these studies:
- Workflow Reuse: A report by researchers who are not experts in machine learning or text analytics, describing how they used the workflows in a project that analyzed a text corpus to improve a question-answering web site.
- Workflow Usability: A report on how high school students with limited programming skills reused workflows to analyze Twitter data.
This work is reported in the following publication:
* "Making Data Analysis Expertise Broadly Accessible through Workflows". Matheus Hauder, Yolanda Gil, Ricky Sethi, Yan Liu, and Hyunjoon Jo. Proceedings of the Seventh IEEE International Conference on e-Science, Stockholm, Sweden, December 5-8, 2011. Available as a preprint.