In the past decade, the proliferation of data and the emergence of large language models have presented both opportunities and challenges in academia. The expanding volume of data, which records knowledge from various human activities, enables data-driven approaches to optimizing numerous aspects of industrial manufacturing and people's daily lives. These improvements largely stem from machine learning models trained on this data. However, industry still faces limitations in both extracting knowledge from large, unstructured, or heterogeneous datasets and transforming the extracted knowledge into actionable insights. This challenge is exacerbated in highly specialized domains where only a few analysts possess the expertise to interpret the data. Although recent advances in large language models provide more intelligent assistance for many data analysis tasks, it remains essential to ensure, through human verification, that these machine learning models and the knowledge they encompass are safe to use and employed for social good.
In my dissertation work, I develop visual analytics (VA) and human-computer interaction (HCI) methodologies for representing and interacting with various forms of knowledge and data, particularly text data. I propose a visual knowledge discovery framework that integrates human expertise with computational approaches throughout the knowledge discovery process, while also addressing the limited availability of domain experts and the increasing scale of data. Moreover, I investigate how visual analytics can efficiently and safely harness extensive knowledge from large machine learning models, enabling users to effectively steer the exploration process and make well-informed decisions.
This dissertation presents six published research works organized around my visual knowledge discovery framework and its three key tasks: knowledge exploration, knowledge presentation, and knowledge exploitation. First, I demonstrate how visual analytics can support knowledge exploration with large, high-dimensional, and heterogeneous data in the domain of manufacturing and machine maintenance. Next, I introduce two knowledge presentation solutions for two distinct types of data: numerical data facts and unstructured text data. Lastly, I showcase three visually assisted knowledge exploitation applications in various domains and scenarios, encompassing document summarization, technical text annotation, and data-driven machine learning model validation.
My work demonstrates how mixed-initiative methods, realized through visual analytics applications, can resolve real-world challenges in highly specialized domains. I leverage state-of-the-art machine learning techniques, particularly natural language processing models, while always keeping domain practitioners in the loop. My approach facilitates communication among parties with mismatched knowledge levels, including domain experts, data analysts, computer scientists, and artificial intelligence systems. Meanwhile, I prioritize the critical role of human knowledge and integrate it into intelligent visualization interfaces that undergo qualitative evaluations. I believe that domain experts' insights, supervision, and verification are invaluable, regardless of how advanced machine learning techniques become. Through the projects outlined in this dissertation, I hope to encourage philosophical and social discussions surrounding the rapidly expanding field of artificial intelligence. Ultimately, my objective is to contribute to a future where intelligent visual analytics systems can augment and enhance human capabilities, enabling individuals to navigate the potential challenges brought by advanced AI techniques.