I am an assistant professor in the Philip Merrill College of Journalism and the College of Information Studies at the University of Maryland, College Park. My research interests span Big Data and Data Science, including databases, data mining, and natural language processing. My current research focuses on Computational Journalism and Social Sensing. I direct the Computational Journalism Lab at UMD. Before joining UMCP, I was an assistant professor in the Computer and Information Science department at the University of Mississippi. I earned a doctoral degree in Computer Science from the University of Texas at Arlington in 2016.
For recent research projects, please visit the CJLab's webpage.
Automated Fact-checking: Politicians and media figures make claims about “facts” all the time. The new army of fact-checkers can often expose claims that are false, exaggerated, or half-true. However, technology, social media, and new forms of journalism have made it easier than ever to disseminate falsehoods and half-truths faster than the fact-checkers can expose them. This “gap” in time and availability limits the effectiveness of fact-checking. This project works toward a fully automated fact-checking platform, investigating the technical challenges and proposing potential solutions [C+J 2015]. We are building ClaimBuster [CIKM 2015], a platform that monitors live streams, websites, and social media to catch factual claims, detect matches with a curated repository of fact-checks, and deliver the matches instantly to viewers. The platform's major components are text mining, social media analysis, and collaborative fact-checking. This project has received media attention from multiple news outlets, including The Guardian, Austin American-Statesman, Poynter, and New Scientist.
Significant Fact Monitoring: The goal of this project is to help journalists identify data-backed, attention-seizing facts that serve as leads to news stories. Examples of such facts include “This month the Chinese capital has experienced 10 days with a maximum temperature of around 35 degrees Celsius, the most for the month of July in a decade” and “Michael Jordan had 53 points in the Chicago Bulls' win over the Detroit Pistons. No one before had a better or equal performance in the 1995-96 season”. Given an append-only database, the challenge upon the arrival of each new tuple is to design algorithms that efficiently search for facts without exhaustively testing all possible ones [ICDE 2014, C+J 2014]. We developed FactWatcher [VLDB 2014], a system that finds story leads from ever-growing data and provides features including fact ranking, fact-to-statement translation, and keyword-based fact search. This system won an Excellent Demonstration Award at VLDB 2014.
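To make "exhaustively testing all possible ones" concrete, here is a minimal sketch of the naive baseline that the project aims to avoid. It is not the published algorithm. It assumes a single numeric measure and treats a "context" as any subset of the dimension attributes bound to the new tuple's values; the function name `exhaustive_facts` and the toy records are hypothetical.

```python
from itertools import combinations

def exhaustive_facts(history, new, dims, measure):
    """Naive baseline (assumption, not the published method): test every context.

    A context is a subset of `dims` fixed to the new tuple's values. The new
    tuple yields a fact in a context if at least one earlier tuple shares that
    context and every such tuple has a strictly smaller `measure`.
    """
    facts = []
    for r in range(len(dims) + 1):
        for ctx in combinations(dims, r):          # 2^len(dims) contexts in total
            peers = [t for t in history if all(t[d] == new[d] for d in ctx)]
            if peers and all(t[measure] < new[measure] for t in peers):
                facts.append({d: new[d] for d in ctx})
    return facts

# Made-up records, for illustration only.
history = [
    {"player": "P1", "team": "T1", "season": "1995-96", "points": 48},
    {"player": "P2", "team": "T2", "season": "1995-96", "points": 51},
    {"player": "P1", "team": "T1", "season": "1994-95", "points": 55},
]
new = {"player": "P1", "team": "T1", "season": "1995-96", "points": 53}

facts = exhaustive_facts(history, new, ["player", "team", "season"], "points")
for f in facts:
    print(f)
```

With this toy data the new tuple is a record within the 1995-96 season (and within three narrower contexts), but not over all seasons. The number of contexts doubles with every added dimension attribute, which is why an exhaustive search becomes impractical on ever-growing data.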
Skyline Group: Traditional Pareto frontier (skyline) computation is inadequate for queries that need to analyze not only individual points but also groups of points. To address this gap, we proposed a novel concept, the “Skyline Group” [TKDE 2014, CIKM 2012], which represents groups that are not dominated by any other group. We demonstrated its applications in question answering, expert team formation, and paper reviewer selection through a web-based system, CrewScout [CIKM 2014]. An attractive characteristic of a skyline team is that no other team of equal size can dominate it. In contrast, given a non-skyline team, there is always a better skyline team. This property distinguishes CrewScout from other team recommendation techniques.
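The idea can be illustrated with a small brute-force sketch. It is only an illustration under stated assumptions, not the algorithm from the papers: each group is summarized by summing its members' attribute values (the choice of SUM is an assumption), larger values are better, and all groups of size `k` are enumerated, which is feasible only for tiny inputs. The names `dominates` and `skyline_groups` are hypothetical.

```python
from itertools import combinations

def dominates(a, b):
    """a dominates b: at least as good on every attribute, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline_groups(points, k):
    """Return the size-k groups (as index tuples) not dominated by any other size-k group.

    Assumption: a group is represented by the per-attribute SUM of its members.
    """
    n_attrs = len(points[0])
    groups = list(combinations(range(len(points)), k))
    agg = {g: tuple(sum(points[i][d] for i in g) for d in range(n_attrs))
           for g in groups}
    return [g for g in groups
            if not any(dominates(agg[h], agg[g]) for h in groups if h != g)]

# Made-up experts scored on two skills (higher is better).
experts = [(9, 2), (7, 7), (2, 9), (3, 3)]
print(skyline_groups(experts, 2))  # [(0, 1), (0, 2), (1, 2)]
```

Here the team (0, 3) with aggregate (12, 5) is dominated by team (0, 1) with aggregate (16, 9), matching the property that every non-skyline team has a better skyline team of the same size.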
Crowdsourcing Pareto-optimal Objects: Finding Pareto-optimal objects through crowdsourcing has applications in public opinion collection, group decision making, and information exploration. Departing from prior studies on crowdsourcing skyline and ranking queries, this project considers the case where objects do not have explicit attributes and preference relations on objects are strict partial orders. The partial orders are derived by aggregating crowdsourcers’ responses to pairwise comparison questions. The goal is to find all Pareto-optimal objects with the fewest possible questions [CIKM 2015].
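As a rough sketch of the setting, the code below aggregates pairwise responses and then picks the objects that no other object dominates. Several things here are assumptions rather than the published method: responses are aggregated per criterion by a simple vote margin (`min_margin`), the names `aggregate` and `pareto_optimal` are hypothetical, the toy responses are made up, and the sketch neither enforces transitivity nor tries to minimize the number of questions asked, which is the actual goal of the project.

```python
from collections import Counter

def aggregate(responses, min_margin=1):
    """Turn raw votes into preferences (assumed rule: simple vote margin).

    Each response is (criterion, preferred, other). The result contains
    (criterion, x, y) when x beat y on that criterion by at least min_margin votes.
    """
    votes = Counter(responses)
    return {(c, x, y) for (c, x, y), n in votes.items()
            if n - votes[(c, y, x)] >= min_margin}

def pareto_optimal(objects, criteria, prefs):
    """Objects not dominated by any other object under the aggregated preferences."""
    def dominates(x, y):
        better_somewhere = any((c, x, y) in prefs for c in criteria)
        worse_nowhere = not any((c, y, x) in prefs for c in criteria)
        return better_somewhere and worse_nowhere
    return [o for o in objects
            if not any(dominates(p, o) for p in objects if p != o)]

# Made-up crowd responses about three objects on two criteria.
responses = [
    ("story", "A", "B"), ("story", "A", "B"), ("story", "B", "A"),
    ("visuals", "A", "B"), ("visuals", "A", "B"),
    ("story", "A", "C"), ("story", "A", "C"),
    ("visuals", "C", "A"), ("visuals", "C", "A"),
    ("story", "C", "B"), ("visuals", "C", "B"),
]
prefs = aggregate(responses)
print(pareto_optimal(["A", "B", "C"], ["story", "visuals"], prefs))  # ['A', 'C']
```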
University of Maryland | |
---|---|
JOUR 779V/479V, INST 408I: Computational Journalism | FALL 2019 |
INST 627: Data Analytics for Information Professionals | FALL 2020 |
INST 767: Big Data Analysis | SPRING 2020, 2021 |
University of Mississippi | |
---|---|
Engr 691: Advanced Topics in NLP | FALL 2018 |
Csci 582: Computational Journalism | SPRING 2019, SPRING 2017 |
Csci 581: Data Mining | SPRING 2018, FALL 2016 |
Csci 517: Natural Language Processing | FALL 2018 |
Csci 444: Information Visualization | FALL 2017 |
Csci 387: Software Design and Development | SPRING 2019, SPRING 2018, SPRING 2017 |
University of Texas at Arlington [primarily as a Teaching Assistant] | |
---|---|
CSE 6324: Advanced Topics in Software Engineering | SPRING 2014 |
CSE 5311: Design and Analysis of Algorithms | FALL 2013, FALL 2011 |
CSE 4334/5334: Data Mining | FALL 2014 [Course Instructor] |
CSE 3330: Database Systems and File Structures | SPRING 2012, SUMMER 2011, SPRING 2011, FALL 2010 |
CSE 1310: Introduction to Computers and Programming | SPRING 2011 |
Daffodil International University |
---|
Computer Fundamentals, Numerical Methods, Instrumentation and Control, Electrical Circuit, Compiler, Simulation and Modeling, VLSI. |