Using Archives to Challenge Misinformation


I was asked to speak at a conference around the importance of our cultural heritage, organised by Louise Broch from Dansk Kulturarv and taking place at the offices of DR in Copenhagen.

My title was 

Using cultural archives to challenge ‘fake news’

And the outline was:

 “Those who control the past, control the future; and those who control the present, control the past.” 

Seventy years after the publication of Nineteen Eighty-Four, George Orwell’s observation remains true and relevant – but it does not have to be read as a testament to the power of autocracies. 

Instead we can treat our access to and use of cultural archives as an important tool in pushing against misinformation and ‘fake news’ in the modern world. We can use our ability to shape our access to the past for good, if we choose to. 

The Talk

This is the text I based my talk on. 

A view of the DR offices walking from the metro

First, let’s get rid of the term ‘fake news’. It has been appropriated by a number of politicians, most notably the President of the United States, to undermine good journalism and try to damage people’s belief in the news they read.

As Claire Wardle from First Draft has argued very strongly, the term ‘fake’ cannot cover the many different types of misinformation (the inadvertent sharing of false information) and disinformation (the deliberate creation and sharing of information known to be false), and it also taps into a whole narrative about the ‘mainstream media’ that is designed to undermine and damage the credibility of journalism.


As a journalist myself I’d rather not be part of that process. 

So let’s try to avoid ‘fake news’.  

If I had the choice I’d probably revert to two old-fashioned words to describe the stuff we see shared online: lies and propaganda – but I’ll accept misinformation and disinformation as useful working categories.

So here’s our question, restated: what is to be done to limit the disruption, oppression and political impact caused by mis- and disinformation? And how can we use AV archives to counter deceitful content in all its rich variety?

Begin at the Beginning

Like any good archivists and librarians, we know it’s good to classify. Putting the world in order is a way to understand it better, and that’s a first step to changing it.

So here, also from Claire Wardle, is a useful classification of types of mis- and disinformation:

  • Satire or parody
  • False Connection
  • Misleading Content
  • False Content
  • Imposter Content
  • Manipulated Content
  • Fabricated Content

This classification roughly tracks both potential harm and the degree of technical sophistication needed to create the material.

We can also get a sense of the reasons why this sort of material is created and shared.  

Eliot Higgins and Claire Wardle have offered this list:

  • Poor Journalism
  • Parody
  • to Provoke or ‘Punk’
  • Passion 
  • Partisanship
  • Profit
  • Political Influence or Power
  • Pure Propaganda

And Wardle even has a matrix to show the relationship between the types and the motives.

We also have a good idea of the different ways the content is shared, whether unwittingly by people on social media, clicking retweet without checking, or repeated or used in a story by journalists under pressure, used by groups trying to change public opinion, or developed and spread as part of a sophisticated disinformation campaign.

So it’s not that we don’t know what’s going on. 

We even know who is doing it, with reports from people like Carl Miller, who have interviewed the people creating Trump and Clinton memes for money.

The Mueller report offers a detailed analysis of how the Russian government used Fancy Bear to destabilise the US presidential election.

And there’s an enormous amount being written about the increasing ease with which people can fake audio and video, sometimes called Deep Fakes because they use deep learning techniques, or ‘synthetic media’ if you want to sound more technical.

So it’s not that we don’t know what is going on.  We just don’t seem to know what to do about it.

How then do we manage the issue?  How do we use what we have to counter the tide of disinformation? How do we ensure that the past is used for good and not evil?

One of the main things is obviously to check whether what is claimed is actually… you know… ‘true’.

This is not as simple as it sounds, because we don’t really have a good model of truth to rely on, especially when it comes to political discourse, where there are many competing ‘truths’.  

We don’t have to believe that true propositions correspond to facts in the objective world, as Tarski argued. Instead we might prefer to embrace the Quinean coherence model of truth and argue that a fact is ‘true’ if it coheres with other facts in an interlinked network of supporting assertions, some of which are more core than others.

So if you won’t accept that ‘two plus two equals four (in bases below five)’ then we have no real space for discourse, while if you want to believe that Morrissey isn’t really a racist then there’s scope for discussion. Perhaps.

That is the way that fact checking organisations like Full Fact work: they check what is said against other sources.

And they do a great job and deserve our support.

Tools like Amnesty’s metadata check tool are also useful.

And there are good examples of this sort of thing working in practice.

As a result there’s a lot of advice out there about how to do this sort of checking.

But what role do archives play in this? 

Well, there’s some obvious stuff, that is implicit in what I’ve just said. 

If we want to be able to check what people say against the record, then the record needs to be available.  

So as an absolute minimum we should be opening up our archives.

We know that isn’t as simple as it sounds – yesterday Louise showed me around the wonderful archives here and I saw how you face all the issues that the BBC faces in terms of old formats, partial records and limited funding.

But we can aspire. And we can try to deliver archive collections that are usable both by our own journalists and by third parties.

That means sorting out some big issues:

  • Provenance – the origin and chain of custody – matters
  • Indexing is vital – we need comprehensive metadata, perhaps generated by machine analysis
  • Rights – the ability to use the material
  • Access – people need to be able to find the stuff that is relevant
  • Speed of response – how to make this happen fast enough for newsrooms or fact checkers or even for the general public
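
To make that list a little more concrete, here is a minimal sketch in Python of what a catalogue record might need to carry. Everything in it (the `ArchiveItem` and `CustodyEvent` names, the fields, the `find` helper) is made up for illustration and is not the schema of DR, the BBC or any real archive.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CustodyEvent:
    """One step in an item's chain of custody (hypothetical structure)."""
    holder: str
    received: date


@dataclass
class ArchiveItem:
    """A hypothetical catalogue record, not any real archive's schema."""
    title: str
    broadcast_date: date
    content: bytes          # stand-in for the audiovisual file itself
    rights: str             # e.g. "cleared for news use"
    keywords: set[str] = field(default_factory=set)   # assumed lowercase
    custody: list[CustodyEvent] = field(default_factory=list)

    def checksum(self) -> str:
        # A fixity value: if the stored file changes, the checksum changes.
        return hashlib.sha256(self.content).hexdigest()


def find(items: list[ArchiveItem], keyword: str) -> list[ArchiveItem]:
    """Return the items tagged with the keyword, oldest broadcast first."""
    matches = [item for item in items if keyword.lower() in item.keywords]
    return sorted(matches, key=lambda item: item.broadcast_date)


if __name__ == "__main__":
    item = ArchiveItem(
        title="Evening news",
        broadcast_date=date(1975, 3, 1),
        content=b"placeholder bytes",
        rights="cleared for news use",
        keywords={"election"},
        custody=[CustodyEvent("Broadcaster tape store", date(1975, 3, 2))],
    )
    for hit in find([item], "Election"):
        print(hit.broadcast_date, hit.title, hit.rights, hit.checksum()[:12])
```

The point of the sketch is only that provenance, indexing, rights and access all have to live alongside the recording itself; speed of response then depends on how quickly a search like `find` can be run across the whole collection.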

Simple stuff like The Guardian’s move to add a date to images from old stories can help.

But I think there’s a deeper mission here, one that is not just about putting what we currently have in our collections at the service of those who are challenging the false narratives, important though that is.

Many aspects of our current situation are not new.  In her book The New Propaganda, written in 1937/8, the British feminist and scholar Amber Wells Blanco White dissects the use of modern communications technologies to support the Fascist regimes that then existed in Europe and attempts a psychoanalytic explanation of how mass media can lead populations to support autocratic leaders like Hitler and Mussolini.

My take on it is at

It’s a fascinating and frightening book, and was clearly used by Orwell as he designed the architectures of control used by the Party in Nineteen Eighty-Four.

(more about her here

More significantly, the book talks at length about the ways in which fascist governments must hide or misrepresent the past in order to allow their lies to flourish. This is the genesis of Orwell’s Memory Hole, to which Winston Smith consigns those editions of the Times that contain untruths.

So if we have a mission it’s this:

  • Don’t let the past be overwritten
  • Don’t let the story get unwritten

The audio visual archive lets us challenge those who would impose another narrative, because we have almost 200 years of images, 150 years of sounds and 120 years of moving image to refer to. It may be imperfect, selective, and biased, but it’s there.

And every day we learn more about the ways the past was selectively captured, and we can begin to correct for it, to ensure that the story told tomorrow is a better one.

For example, I learned recently that the wet collodion photographic technique used to capture images of Māori people in the past did not register their blue tā moko tattoos, writing them out of the record. Now that we know that, we can begin to acknowledge it.

So to answer my original question, what can we do?

  • We can collect – today’s broadcasts are tomorrow’s archive
  • We can catalogue
  • We can make what we have accessible
  • We can provide tools that can be used by anyone to find and reference what we hold

And of course

  • We can assert the importance of the record – in all its imperfection
  • We can improve that record, to the best of our ability
  • We can speak truth to (corrupt) power and comfort the oppressed

Our role is to strengthen the connections of the truth as expressed in our archives to the wider world, so that story becomes harder to challenge.

Of course it’s never straightforward. For example, what if what goes into the archive is itself disinformation, but comes from a powerful source and merits inclusion in the historical record?

If we record Trump’s rallies, whose truth are we telling? How do we avoid them becoming a ground truth in ten, fifty or a hundred years? The speeches of many politicians in history have been retrospectively flagged as suspect because they eventually lost – what happens in the time they are winning?

Also, what do we do about the fact that the really good deep fakes have learned from our archives in the first place – without all that well catalogued and properly subtitled footage of Obama, no fake video…

(If it’s any comfort, that one won’t matter soon, as the newest techniques need very little source material – Baidu has new software called Deep Voice that claims to need only 4 seconds of your voice in order to emulate it. A year ago it needed 30 minutes’ worth.)

However, there is worse news… this probably doesn’t matter anyway. However effectively we repudiate the misinformation and disinformation, people won’t change their views.

Recent research by Nieman Lab into competing narratives around politically divisive issues in the US demonstrates that beliefs are value-based, not fact-based.

“Fact-checking can’t do much when people’s ‘dueling facts’ are driven by values instead of knowledge … most disappointing finding … there are no known fixes to this problem”

So why bother?

Because truth matters – or at least an auditable historical record does, for those who will tell the story in the future.

Because if we let the Party control the past then they will control the present and the future.

Empty archive shelves in the DR archive

And finally, because there are new things to consider. I’m starting to think that current models of disinformation might matter less in future because of the wider environment within which we now exist and the growing importance of machine learning. 

An important  use of lots of the archive we now curate is going to be training machine learning systems, and those systems are increasingly going to shape our lives. We will live in an envelope defined by the decisions made by them, and we are not prepared.  

If that’s the case then getting audiovisual archives in good shape for the machine learning revolution will matter enormously and could affect that future, and it might matter more than anything we do today to challenge the conspiracy theories or antivaxxers or populist politicians willing to lie in exchange for power.

Let me explain why.

Just as nothing in the evolution of life on earth can have prepared a human being to drive a car at 80mph/130kph along a motorway [see xkcd], so nothing can have prepared us for working in a time of networked augmented intelligence, when we are so immersed in computation that the boundaries between what our brains are processing and what our silicon augments are processing blur to the point where drawing a line is impossible. There is then no point in talking about ‘virtual’ or ‘augmented’ or ‘extended’ reality, as there is just the reality of the ‘extended human’.

The technologies that became embedded in our construction of the world in the past, like reading glasses or amplifying hearing aids, were not malleable and their function was defined at the time of manufacture. Even the scientific equipment we used to explore the very far or the very small or the very dangerous was a product of physical not logical engineering until relatively recently.

Today data is acquired, and processed, and presented but the processing is both malleable and mysterious, based on assumptions and models that even when made explicit are quickly forgotten. 

Our relationship to the world is now almost entirely mediated by technologies that determine what should and should not be presented to their human operators, and in the process the code that runs those systems shapes the way hypotheses are tested, evidence is analysed, policies are turned into lived reality, and worldviews are challenged.

For the moment most of the code running on the machines was developed, written and tested by other human beings, and the worldview embedded in that code comes at least from human bias and prejudgement. But we’re getting to the point where the systems will incorporate ML models trained on data and configured in ways that are beyond human understanding. 

If a neural network can diagnose an illness when no ophthalmologist can do the same, and on the basis of criteria that human operators have no access to, then what of the science grounded in the work of those same machines analysing particle collisions, pulsar emissions, or political decisions? What of the news algorithms that will schedule bulletins, shape newsfeeds, perhaps even write the stories in an age of automated journalism?

Something more than mere computation has been loosed upon the world, and some rough beast is slouching towards the silicon to be born: it is the soul of the new world.  

I do not think we are remotely prepared for this, but one thing we can do is make sure that the records we have of the past in our audio-visual archives are preserved, digitised, catalogued, annotated and accessible.  It’s the least we can do to honour the past and ensure that we have a chance of asserting some shared version of history in the future.

So to end, I think the best thing we can do to protect ourselves is to ensure that our archives are ready for their most important task: training the machines that will eventually be trusted with our destiny.

Thank you