How linguistics can help curb fake news using Big Data, AI

September 03, 2019 18:48
According to a study, false news has at least a 20 percent chance of misleading readers with the headline alone. Photo: AFP

How easily would an average individual buy into fake news?

According to a study carried out by the Massachusetts Institute of Technology, false news has at least a 20 percent chance of misleading readers with the headline alone.

The study also estimates that, on average, readers buy into one in every five pieces of fake news that they read.

Over the years, fact-checking has been a major method employed to fight fake news.

Now, as it turns out, some linguists across the world are beginning to tackle the issue from a different angle: using Big Data and artificial intelligence (AI) to identify fake news stories through automatic detection.

Or to put it more precisely, some linguists are now trying to use computers to estimate the probability that a news story is fake rather than authentic.
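To give a rough sense of what estimating such a probability can look like, here is a minimal sketch in Python. It assumes a logistic model, which is one common choice and not necessarily what any of the research teams used. The feature names, weights and bias are all hypothetical and chosen only for illustration.

```python
import math


def fake_probability(features, weights, bias=0.0):
    """Map a weighted sum of feature values to a probability between 0 and 1."""
    score = bias + sum(weights[name] * value for name, value in features.items())
    # The logistic function squashes any real-valued score into (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights: in practice these would be learned from labeled stories.
example_weights = {"first_person_rate": 8.0, "exclamation_rate": 5.0}

# Hypothetical measurements for a single article.
example_features = {"first_person_rate": 0.05, "exclamation_rate": 0.02}

probability = fake_probability(example_features, example_weights, bias=-0.5)
print(f"Estimated probability of being fake: {probability:.2f}")
```

The output is a score between 0 and 1 rather than a yes-or-no verdict, which is why such systems are better described as projecting likelihood than as proving a story false.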

For example, a team of experts specializing in algorithms and linguistics at Cornell University has recently identified some “unique features” regarding the writing styles of fake news:

- Heavy use of words such as “I” in an apparent attempt to explicitly stress that the news content only reflects personal opinions;

- Frequent and abundant use of prepositions in the articles such as “because of”, “due to”, “regarding”, “since”, etc.;

- Fake news stories often use more verbs and pronouns than authentic news stories do, and mention full names less often;

- There are often more punctuation marks and short sentences in fake news stories;

- Fake news articles often contain more content that is based on the writers' own observations, hence the frequent use of “saw” or “heard”;

- More emphasis is placed on how present events are going to affect the future, whereas authentic news stories tend more to compare the past with the present;

- Fake news articles often tend to carry a more positive and sentimental tone, as opposed to the relatively “negative” tone of genuine news stories;

- More frequent use of everyday colloquial expressions and a more certain tone as compared to authentic news coverage, which covers multiple possibilities in a somewhat uncertain tone.
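A few of the features listed above can be counted mechanically. The following Python sketch is an illustration only, not the Cornell team's method: the word lists, the feature names and the crude sentence splitting are assumptions made for this example.

```python
import re

# Hypothetical word lists, chosen only to illustrate the idea.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
PERCEPTION_VERBS = {"saw", "heard"}


def style_features(text):
    """Compute a few simple style measurements for a piece of text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Crude sentence split on runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Any character that is neither a word character nor whitespace.
    punctuation = re.findall(r"[^\w\s]", text)
    n_words = max(len(words), 1)  # avoid division by zero on empty text
    return {
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / n_words,
        "perception_verb_rate": sum(w in PERCEPTION_VERBS for w in words) / n_words,
        "punctuation_per_word": len(punctuation) / n_words,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }


sample = "I saw it. I heard it!"
for name, value in style_features(sample).items():
    print(f"{name}: {value:.2f}")
```

Measurements like these say nothing on their own. They only become useful once compared across a large number of stories already known to be fake or authentic.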

Building on the findings of the Cornell University study, a team made up of a linguist and some other researchers at Simon Fraser University in Canada has recently found that fake news makes greater use of expressions which are common in “hate speech”, and that there are more words relating to sex, death and anxiety in fake news articles.

By contrast, authentic news stories often use a higher percentage of words pertaining to business and the economy.
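Vocabulary comparisons of this kind are often made by counting how many words in a text fall into a given category. The Python sketch below shows the general idea under loud assumptions: the tiny word lists are invented for illustration and are not the lexicons used by the Simon Fraser University researchers.

```python
import re

# Hypothetical mini-lexicons; real studies rely on far larger, curated word lists.
LEXICONS = {
    "economy": {"business", "economy", "market", "trade", "bank"},
    "death_anxiety": {"death", "dead", "fear", "panic", "worry"},
}


def category_rates(text, lexicons=LEXICONS):
    """Return, for each category, the share of words in the text that belong to it."""
    words = re.findall(r"[a-z']+", text.lower())
    n_words = max(len(words), 1)  # avoid division by zero on empty text
    return {
        category: sum(word in vocabulary for word in words) / n_words
        for category, vocabulary in lexicons.items()
    }


sample = "The market fell as fear spread through the bank"
for category, rate in category_rates(sample).items():
    print(f"{category}: {rate:.2f}")
```

A higher rate in one category is at best a weak hint, so figures like these would normally be combined with many other signals before any judgment is made.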

According to linguists who study fake news, their biggest problem at present is that many fake news sources mix fake and authentic content in their articles to trick the AI system and slow down machine learning, making it increasingly difficult for computers to differentiate between fake and genuine news articles.

Meanwhile, people who “create” fake news are also getting the hang of how to cover up the truth, which makes it even harder for people to identify fake news stories.

As fake news is rapidly coming of age, a growing number of academics are now taking the view that using Big Data and AI to identify problem content may be the only way out for mankind in combating disinformation in the days ahead.

After all, it is virtually impossible for us to fact-check each and every news story, given the sheer number of fake news stories and the meteoric pace of their growth.

This article appeared in the Hong Kong Economic Journal on Aug 27

Translation by Alan Lee



HKEJ contributor