A group of MIT researchers led by Indian American doctoral student Darsh Shah has created a system that can automatically update outdated or incorrect facts in Wikipedia articles.
According to MIT, the system could reduce the time and effort that human editors currently spend doing this task manually.
Wikipedia comprises millions of articles that constantly need edits to reflect new information. Those edits can involve article expansions, major rewrites, or more routine changes such as updating numbers, dates, names, and locations, the report, published Feb. 12, said.
In a paper being presented at the AAAI Conference on Artificial Intelligence, the researchers describe a text-generating system that pinpoints and replaces specific information in relevant Wikipedia sentences, while keeping the language similar to how humans write and edit, according to MIT’s report.
The idea is that a person would type an unstructured sentence containing the updated information into an interface, without needing to worry about style or grammar.
The system would then search Wikipedia, locate the appropriate page and the outdated sentence, and rewrite that sentence in a humanlike fashion, the report said.
In the future, the researchers say, it may be possible to build a fully automated system that finds the latest information from around the web and uses it to rewrite the corresponding sentences in Wikipedia articles.
“There are so many updates constantly needed to Wikipedia articles. It would be beneficial to automatically modify exact portions of the articles, with little to no human intervention,” Shah, a student in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and one of the paper’s lead authors, said in the report.
“Instead of hundreds of people working on modifying each Wikipedia article, then you’ll only need a few, because the model is helping or doing it automatically. That offers dramatic improvements in efficiency,” he added.
Many other bots already make automatic edits to Wikipedia. Typically, though, they focus on mitigating vandalism or dropping narrowly defined information into predefined templates, Shah told MIT.
The researchers’ model, he said, solves a harder artificial intelligence problem: Given a new piece of unstructured information, the model automatically modifies the relevant Wikipedia sentence in a humanlike fashion.
“The other [bot] tasks are more rule-based, while this is a task requiring reasoning over contradictory parts in two sentences and generating a coherent piece of text,” according to Shah.
The system can be used for other text-generating applications as well, said co-lead author and CSAIL graduate student Tal Schuster.
In their paper, the researchers also used it to automatically synthesize sentences for a popular fact-checking dataset, which helped reduce bias in that dataset without manually collecting additional data.
Shah and Schuster worked on the paper with their academic advisor Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science and a professor in CSAIL.
The system relies on a fair bit of text-generation ingenuity to identify contradictory information between two separate sentences and then fuse the two together, the report continued.
It takes as input an “outdated” sentence from a Wikipedia article, plus a separate “claim” sentence that contains the updated and conflicting information.
Based on the information in the claim, the system must automatically decide which words in the outdated sentence to delete and which to keep, so that it updates the facts while preserving the original style and grammar.
The system was trained on a popular dataset that contains pairs of sentences, in which one sentence is a claim and the other is a relevant Wikipedia sentence.
Each pair is labeled in one of three ways: “agree,” meaning the sentences contain matching factual information; “disagree,” meaning they contain contradictory information; or “neutral,” where there’s not enough information for either label, the report elaborated.
The system must make every disagreeing pair agree by modifying the outdated sentence to match the claim.
The study also showed that the system can be used to augment datasets in order to reduce bias when training “fake news” detectors.
