In the last few years, the term “big data” has become an omnipresent buzzword in academic and professional circles and in the media across the globe. Commentators have called big data “the new oil of the 21st century”, “the world’s most valuable resource” and “the foundation of all of the megatrends that are happening today, from social to mobile to the cloud to gaming”.
The exponential growth in big data analytics can be explained from a market-based perspective. On the supply side, data have become more readily available, and processing power has kept increasing, as predicted by Moore’s Law. Rapid advances in instrumentation and sensors, digital storage and computing, communications, and networks, including the advent of the internet in the mid to late 1990s, have spurred an irreversible march towards the ‘big data revolution,’ generating, transmitting and giving access to more and more data.
Humans directly or indirectly create as much as 2.5 trillion megabytes of data every day, and the volume is only growing. As increasingly large amounts of data are captured from humans, machines, and the environment, the temptation to analyse them grows, a phenomenon sometimes known as datafication. The current deluge of data, spurred by the increased digitisation of information, provides countless opportunities for data mining, a set of techniques seeking to extract hidden patterns from datasets in a variety of contexts.
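As a minimal illustration of what “extracting hidden patterns” can mean in practice, the Python sketch below counts which pairs of events frequently co-occur in a toy activity log. The events, data, and threshold are invented for illustration only; they do not represent any real analytic system.

```python
from collections import Counter
from itertools import combinations

# Toy "transaction" log: each entry is the set of events in one session.
# Entirely hypothetical data, for illustration only.
transactions = [
    {"login", "download", "logout"},
    {"login", "download", "upload"},
    {"login", "logout"},
    {"login", "download", "logout"},
]

# Count how often each pair of events co-occurs across sessions.
pair_counts = Counter()
for session in transactions:
    for pair in combinations(sorted(session), 2):
        pair_counts[pair] += 1

# A crude notion of "hidden pattern": pairs seen in at least half the sessions.
threshold = len(transactions) / 2
frequent = {pair: n for pair, n in pair_counts.items() if n >= threshold}
print(frequent)
```

Real data-mining systems use far more scalable algorithms (frequent-itemset mining, clustering, anomaly detection), but the underlying idea is the same: surface regularities that no human would spot by reading the raw records.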
These new capabilities have started affecting organisations and the core processes they follow. Interest in data analytics has grown with the demand for more reliable intelligence products following the controversies caused by the 9/11 attacks and the absence of weapons of mass destruction in Iraq. Before 9/11, the US intelligence community overlooked specific pieces of information pointing to the terrorist plot.
In 2002, a national intelligence estimate made a series of erroneous assessments regarding Iraq’s WMD programme, which were later used to justify the US decision to go to war in Iraq. These events cast doubt on the intelligence collection and analysis capabilities of the US government, especially in the domain of human intelligence (HUMINT), and increased the pressure on senior decision-makers to adapt intelligence processes to an increasingly complex security environment. Big data capabilities, it was hoped, would compensate for the limitations, and sometimes the absence, of HUMINT.
Consequently, intelligence agencies around the world began to embrace more systematic and sophisticated data collection and analysis techniques. Given the widespread use of the term “big data”, one would expect to find a clear account of what it means, what it does, and how it works in the national security context. However, the field of security studies has, thus far, paid little attention to this concept.

VOLUME, VELOCITY, VARIETY & VERACITY

The expression “big data” is often understood as a set of enormous datasets. But what exactly qualifies as an enormous dataset? Volume is interpreted differently in different fields and at different points in time.
For a social scientist, a dataset of hundreds of thousands of entries may seem significant, but it would not appear so to a computer scientist. Similarly, while computer scientists might have considered a database of hundreds of thousands of entries to be very large in the early days of computing, today’s researchers work with billions of entries. A 2010 study found that the amount of data produced globally was 1.2 zettabytes, or 1.2 trillion gigabytes, and projected that worldwide data production would reach 35 zettabytes by 2020.
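To make these figures concrete, a quick back-of-the-envelope conversion (using decimal SI units, where a zettabyte is 10^21 bytes and a gigabyte is 10^9 bytes) confirms the scale involved:

```python
# Decimal SI units: 1 GB = 10**9 bytes, 1 ZB = 10**21 bytes.
GB = 10**9
ZB = 10**21

# The 2010 figure quoted above: 1.2 zettabytes.
produced_2010_zb = 1.2
produced_2010_gb = produced_2010_zb * ZB / GB   # 1.2 trillion gigabytes

# The projected 2020 figure: 35 zettabytes.
projected_2020_zb = 35
growth_factor = projected_2020_zb / produced_2010_zb

print(f"{produced_2010_gb:.0f} GB produced in 2010")
print(f"roughly {growth_factor:.0f}x growth projected over the decade")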
The desire and ability to process such large volumes of data constitute a significant component of the definition of big data, but volume alone is not sufficient. Early accounts of big data describe how large amounts of data put heavy demands on computing power and resources, causing a ‘big data problem.’ As the world keeps producing more and more data, this problem is still with us: processing capabilities now lag behind storage capabilities.
In other words, we have access to massive amounts of data but are not able to use them all. Volume, therefore, should be considered not in isolation, but in relation to the ability to store and process data. The definition of big data must encompass not only volume but also the capacity to use these data.

COUNTER-INTELLIGENCE & SECURITY

In a narrow sense, counter-intelligence and security aim to protect intelligence agencies against penetration by adversary services.
A broader and more universal understanding of counter-intelligence and security encompasses defence against significant threats to national security, including espionage, but also terrorism and transnational crime. One security application of big data analytics, specifically through NLP capabilities, is the identification of malicious domains and malicious code (malware) in cyberspace. Automated data analytics can be used as part of broader systems to defend computer networks. In the field of cybersecurity, network-based intrusion detection systems monitor internet traffic, looking for specific signatures or code that deviates from the norm or has already been identified as malware.
Such systems help analysts spot advanced persistent threats and automatically block cyberattacks. Cyberattacks unfold at machine speed, and this raises interesting questions about the diminishing role of humans in national security decision-making. When network intrusion detection systems analyse vast amounts of data to block cyber threats automatically, big data analytics effectively replaces humans. Yet big data capabilities are not a panacea: the inability of algorithms to take into account the broader context of an attack can make it hard for machines to detect social engineering scams on their own.
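The signature-matching step described above can be sketched in a few lines of Python. This is a deliberately simplified illustration: the signatures below are invented placeholders, not real malware indicators, and production intrusion detection systems use far richer rule languages and statistical anomaly detection alongside simple pattern matching.

```python
# Hypothetical signature database mapping byte patterns to threat names.
# These values are invented for illustration and are not real indicators.
SIGNATURES = {
    b"\xde\xad\xbe\xef": "example-backdoor",
    b"cmd.exe /c": "suspicious-command",
}

def inspect(payload: bytes) -> list[str]:
    """Return the names of all known signatures found in a packet payload."""
    return [name for sig, name in SIGNATURES.items() if sig in payload]

def should_block(payload: bytes) -> bool:
    # Block automatically, with no human in the loop, whenever any
    # signature matches -- the automated step discussed above.
    return bool(inspect(payload))

print(should_block(b"GET /index.html HTTP/1.1"))      # benign traffic passes
print(should_block(b"...cmd.exe /c whoami..."))       # matching payload is blocked
```

Note what such a sketch cannot do: a phishing email that contains no known byte pattern, only a persuasive lie, sails straight through, which is precisely the social-engineering blind spot mentioned above.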
To date, security studies researchers have not explicitly defined or proposed a framework for assessing the big data phenomenon. To fill this gap in the literature, I have explored the characteristics, technology, and methods of big data and situated them in the context of national security. This exploration of big data across traditional intelligence activities (requirements, collection, processing, analysis, dissemination, and counter-intelligence and security) suggests that technological advances have allowed security professionals to collect and process larger and more diverse amounts of data, sometimes rapidly enough that they can be analysed and intelligence disseminated more effectively.
These strengths, together with the limitations of traditional intelligence disciplines like HUMINT, explain why big data tools have played an increasingly dominant role in national security processes.

Bhola Nath Sharma is a retired Inspector General of Police in the Border Security Force. He has done extensive research on data analytics.