About Me
I'm a doctoral candidate at the University of Michigan School of Information, advised by Libby Hemphill. I received a Bachelor of Science from the UCLA Department of Statistics and Data Science.
Research keywords: Large Language Models, Multi-agent Systems, Data Archives, Biomedical and Health Informatics, Computational Social Science
My main research interests include:
- Large Language Models, Multimodality, and AI Agents. Large Language Models (LLMs) and Multimodal Foundation Models are advanced AI models trained on extensive data, excelling in information generation and comprehension. These models power AI agents capable of complex problem-solving and in-context user interactions. Our focus encompasses both single-agent and multi-agent systems, aiming to advance the algorithms and functionalities of LLMs and their applications in AI agents. In particular, we develop cutting-edge evaluation benchmarks and construct advanced systems that build upon and surpass the current state-of-the-art. We also integrate multimodality, enabling these agents to process and understand diverse inputs like images and audio alongside text. This approach enhances human-computer interaction, making AI agents more versatile and intuitive.
- AI-augmented Archiving and Data Curation. AI models and infrastructures (e.g., vector and graph databases) can enhance the utility of archives and the efficiency of curation. By augmenting metadata and streamlining workflows, from data collection to data recommendation, autonomous archiving and curation systems can bring together humans and machines at scale. Specific use cases include social media archives (e.g., CHSTA & CAAHTA) and scholarly archives (e.g., ICPSR Bibliography of Data-related Literature).
- Health and Biomedical Informatics. Advanced AI and deep learning techniques can transform health provision and auxiliary diagnosis. We seek to understand the behaviors of and the opinions towards in-need populations during health crises. We also develop innovative models for conditions like Autism and Parkinson’s Disease, utilizing multimodal data such as audio-visual inputs and facial expressions. This approach enhances the precision and efficiency of medical diagnoses. Overall, we aim to advance healthcare technology by integrating cutting-edge AI with traditional health services and medical practices, improving patient outcomes and care.
- Infometrics (Bibliometrics and Scientometrics). Knowledge, particularly that from academia and industry, is interlinked and frequently structured as tabular data. We explore ways to comprehend and represent the dissemination of such interconnected information. Using network-based metrics and data visualization techniques, we assess the effects of research data reuse and the benefits of investing in its curation. Recently, I've been pursuing a systematic approach to boosting research data discoverability, including the exploration of altmetrics.
- Anti-Racism with Computational Social Science. Both the real world and the online realm grapple with challenges such as racism, sexism, and other forms of inequity. These challenges often manifest as harassment, offensive behavior, and toxicity (e.g., the anti-Asian hate crimes during the COVID-19 pandemic and the anti-LGBTQ discrimination witnessed during the 2022 Mpox outbreak). Accurate documentation of social media trends and public opinion is essential to prevent history from repeating itself. Immediate analytical insights and policy recommendations can address social issues in a timely manner.
Here’s a PDF of my CV.
Teaching
Employment
Get in Touch
Email is best.
- If you are a student in my class, please follow the course communication guidelines and only use my email as an "emergency" communication channel.
- If you are an undergraduate or graduate student interested in working with me, please be aware that my advisor's lab (Hemphill Research) is NOT currently recruiting students.
- If you are an HR professional from the tech industry, I appreciate your interest. Please make sure the job is located in the U.S. before reaching out.
Publications
The following publications are grouped by research theme. For a complete list of my publications, please visit my Google Scholar page.
Large Language Models, Multimodality, and AI Agents
- Fan, L., Hua, W., Li, L., Ling, H., Zhang, Y., & Hemphill, L. (2024) NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes. In preparation for submission to the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024).
- Hua, W., Fan, L., Li, L., Hemphill, L., & Zhang, Y. (2024) War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars. In preparation for submission to the 41st International Conference on Machine Learning (ICML 2024).
- Li, L., Fan, L., Atreja, S., & Hemphill, L. (2024) "HOT" ChatGPT: The promise of ChatGPT in detecting and discriminating hateful, offensive, and toxic comments on social media. Manuscript accepted by ACM Transactions on the Web.
- Li, L., Ma, Z., Fan, L., Lee, S., Yu, H., & Hemphill, L. (2023) ChatGPT in education: A discourse analysis of worries and concerns on social media. Education and Information Technologies. doi: 10.1007/s10639-023-12256-9
AI-augmented Archiving and Data Curation
- Fan, L., Yin, Z., Yu, H., & Gilliland, A.J. (2022) Using Machine Learning to Enhance Archival Processing of Social Media Archives. Journal on Computing and Cultural Heritage. doi: 10.1145/3547146
- Fan, L., Lafia, S., Li, L., Yang, F., & Hemphill, L. (2023) DataChat: Prototyping a Conversational Agent for Dataset Search and Visualization. Proceedings of the Association for Information Science and Technology (ASIS&T). doi: 10.1002/pra2.820
- Fan, L., Lafia, S., Wofford, M., Thomer, A.K., Yakel, E., & Hemphill, L. (2023) Mining Semantic Relations in Data References to Understand the Roles of Research Data in Academic Literature. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries. doi: 10.1109/JCDL57899.2023.00039
- Lafia, S., Fan, L., & Hemphill, L. (2022) A Natural Language Processing Pipeline for Detecting Informal Data References in Academic Literature. Proceedings of the Association for Information Science and Technology (ASIS&T). doi: 10.1002/pra2.614
Health and Biomedical Informatics
- Fan, L., Yu, H., & Yin, Z. (2020) Stigmatization in social media: Documenting and analyzing hate speech for COVID‐19 on Twitter. Proceedings of the Association for Information Science and Technology (ASIS&T). doi: 10.1002/pra2.313
- Yu, H., Fan, L., & Gilliland, A.J. (2022) Disparities and resilience: analyzing online Health information provision, behaviors and needs of LBGTQ+ elders during COVID-19. BMC Public Health. doi: 10.1186/s12889-022-14783-5
- Fan, L., Li, L., & Hemphill, L. (2023) Characterizing Online Toxicity During the 2022 Mpox Outbreak: A Computational Analysis of Topical and Network Dynamics. Journal of Medical Internet Research. Under review. Manuscript available upon request.
- Li, X., Fan, L., Wu, H., Chen, K., Yu, X., Chao, C., Cai, Z., Niu, X., Cao, A., & Ma, X. (2024) Enhancing Autism Spectrum Disorder Early Detection with the Parent-Child Dyads Block-Play Protocol and a Hybrid Deep Learning Framework. IEEE Journal of Biomedical and Health Informatics. Under review. Manuscript available upon request.
- Lv, C., Fan, L., Li, H., Ma, J., Jiang, W., & Ma, X. (2024) Leveraging Multimodal Deep Learning Framework and a Comprehensive Audio-Visual Dataset to Advance Parkinson’s Early Detection. Biomedical Signal Processing and Control. Under review. Manuscript available upon request.
- Wang, X., Fan, L., Li, H., Jiang, W., Bi, X., & Ma, X. (2024) AttSeqNet: Leveraging Attention-Driven and Time-wise Splitting Seq2seq Model to Enhance Eye Movement Event Detection in Parkinson's Disease. Biomedical Signal Processing and Control. Under review. Manuscript available upon request.
Infometrics (Bibliometrics & Scientometrics)
- Fan, L., Li, L., Ma, Z., Lee, S., Yu, H., & Hemphill, L. (2023) A Bibliometric Review of Large Language Models Research from 2017 to 2023. ACM Transactions on Intelligent Systems and Technology. Under review.
- Lafia, S., Fan, L., Thomer, A.K., & Hemphill, L. (2022) Subdivisions and Crossroads: Identifying Hidden Community Structures in a Data Archive’s Citation Network. Quantitative Science Studies. doi: 10.1162/qss_a_00209
- Hemphill, L., Thomer, A., Lafia, S., Fan, L., Bleckley, D., & Moss, E. (2024) A Dataset for Measuring the Impact of Research Data and their Curation. Scientific Data. Under review. Manuscript available upon request.
Anti-Racism via Computational Social Science
- Fan, L., Yu, H., & Gilliland, A.J. (2022) Aggravated Anti-Asian Hate since COVID-19 and the #StopAsianHate Movement: Connection, Disjointness, and Challenges. In book Hate Speech on Social Media: A Global Approach. doi: 10.25768/654-916-9
- Fan, L., Yu, H., Yin, Z., & Gilliland, A.J. (2021) #StopAsianHate: Archiving and Analyzing Twitter Discourse in the Wake of the 2021 Atlanta Spa Shootings. Proceedings of the Association for Information Science and Technology (ASIS&T). doi: 10.1002/pra2.475
- Yin, Z., Fan, L., Yu, H., & Gilliland, A.J. (2020) Using a Three-step Social Media Similarity (TSMS) Mapping Method to Analyze Controversial Speech Relating to COVID-19 in Twitter Collections. Proceedings of the IEEE International Conference on Big Data (Big Data). doi: 10.1109/BigData50022.2020.9377930
- Fan, L. & Presner, T. (2022) Algorithmic Close Reading: Using Semantic Triplets to Index and Analyze Agency in Holocaust Testimonies. Digital Humanities Quarterly.
- Presner, T. & Fan, L. (2024) Algorithmic Close Reading: Analyzing Vectors of Agency in Holocaust Testimonies. In book Ethics of the Algorithm: Digital Humanities and Holocaust Memory. Princeton University Press.
Presentations and Working Papers
These are selected talks, workshop papers, and conference abstracts. Most of these works were presented informally or have not yet been submitted.
- Hemphill, L., Xing, J., & Fan, L. (2023) Comparing Costs for Cloud-based Data Archives.
- Fan, L., Lafia, S., Bleckley, D., Moss, E., Thomer, A.K., & Hemphill, L. (2022) Librarian-in-the-Loop: A Natural Language Processing Paradigm for Detecting Informal Mentions of Research Data in Academic Literature. Presented at the ACM CHI'22 Workshop on Data Work Across Domains.
- Fan, L. (2021) Archival Data Thinking. An invited talk at a UCLA Ed&IS lecture (Management of Digital Records, Fall 2021).
- Presner, T., Bonazzi, A., Fan, L., Tóth, G., Deblinger, R., & Shepard, D. (2020) Digital Humanities Methods for Analyzing Holocaust and Genocide Testimonies. Presented at Digital Humanities Conference (DH2020).
One More Thing
I'd like to acknowledge my junior high school teacher Chunhong Hu, who kindly and patiently introduced me to doing research and writing academic articles in his spare time. I am honored to include the two high school math research papers he advised in 2010-2011 on my Google Scholar page.