When intelligence analysts are required to understand a complex uncertain situation, one of the techniques they use most often is to simply draw a diagram of the situation. Natural language processing has matured to the point where the conversion of freeform text reports to these diagrams can be largely automated. The diagrams are attributed relational graphs (ARGs), an extension of the abstract directed graph from mathematics. In these ARGs, nodes represent people, organizations, objects, or events. Edges represent relationships like interaction, ownership, or trust. Attributes store the details of each node and edge, like a person's name or an interaction's time of occurrence. ARGs function as external memory aids, which are crucial tools for arriving at unbiased conclusions in the face of uncertain information [4].
The intelligence community's focus over the past 20 years on improving intelligence collection has come at the cost of improving intelligence analysis [4]. The problem today is often not a lack of information, but instead, information overload. Analysts lack tools to locate the relatively few bits of relevant information and tools to support reasoning over that information. Graph-based algorithms help security analysts solve the first problemsifting through a large amount of data to find the small subset that is indicative of threatening activity. This activity is often suspicious not because of the characteristics of a single actor, but because of the dynamics between a group of actors. In contrast with databases and spreadsheets, which tend to facilitate reasoning over the characteristics of individual actors, graph representations facilitate reasoning over the relationships between actors. Subgraph isomorphism and social network analysis are two important graph-based approaches that will help analysts detect suspicious activity in large volumes of data.
Subgraph isomorphism algorithms search through large graphs to find regions that are instances of a specific pattern graph [8]. The analyst defines, as a graph, activity patterns that are believed to be indicative of threatening activity. Once those patterns are defined, the algorithm identifies regions of observed activity that match the patterns. One possible pattern is shown in the figure here, along with an inexact match to that pattern embedded in an activity graph containing both threatening and innocuous activity. Supporting evidence may come from many different sources, but once evidence is incorporated into the global activity graph, the algorithm lets the analyst quickly pinpoint subsets of activity that warrant further attention. Without advanced search algorithms like these, the analyst's task of identifying suspicious activity within a huge body of evidence is much more difficult.
Being able to find inexact pattern matches is critical. Foremost, analysts operate in an environment with limited observability. In addition, the analyst might need to match a general pattern without knowing all of the details, or the analyst may have defined some aspects of the pattern incorrectly. Finding inexact matches also alerts the analyst to activity that "breaks the mold" of previous threats, and can prevent the kind of surprises for which intelligence agencies have been criticized.
In our work, we have developed a set of genetic algorithms that solve the exact and inexact subgraph isomorphism problem. The search algorithm distributes nicely, allowing it to run on a server farm for improved performance. This enables efficient searches for patterns containing as many as 75 nodes and 107 different possible realizations.
Social network analysis (SNA) is the study of human social interaction. Graph representations are ubiquitous throughout SNAsequences of interactions between people are usually represented as an ARG. SNA metrics quantify different aspects of the ARG's topology, and the metric values can be used to characterize the roles of individuals within a group, or the state of a group or organization as a whole. The key opportunity for intelligence analysis is that "normal" social interaction and the social interaction of illicit groups tend to exhibit significantly different SNA metric values.
The geodesic assumption and the redundancy assumption state that for human interaction, "People with strong relationships usually communicate via the shortest path" and "Normal social networks are redundant." Studies have shown that both of these assumptions are typically false for groups trying to hide their activities [1]. For those groups, information compartmentalization and robustness to the compromise of group members overrule the efficiency concerns that otherwise lead to the geodesic and redundancy assumptions. The resulting differences in the groups' structures can be quantified by SNA metrics. The SNA theory of homophily argues that most human communication occurs between people similar to each other. Thus people pursuing illicit activities are likely to be found communicating with others pursuing illicit activities. This "relational autocorrelation" further drives these groups' SNA metrics toward anomalous values [5].
Our work combines SNA metrics with statistical pattern classification to give the analyst a tool for automatically pinpointing suspicious group dynamics within large volumes of data [2]. This combination will yield algorithms that constantly scan incoming information, alerting the analyst to anomalous social patterns that might indicate threatening activity. Some illicit groups (for example, terrorist cells moving from a "sleeper" to an "active" state) finish in a relatively normal configuration, but the history of how they arrived there is highly abnormal [6]. Therefore, it is also important to develop techniques to classify activity based on the evolution of a group's SNA metric values [3]. Because analysts have imperfect visibility into these interactions, we must also consider the sensitivity of various SNA metrics to limited observability [7].
Subgraph isomorphism and statistical classification via SNA metrics are two important classes of techniques that operate on attributed relational graphs, a representation familiar to the intelligence analyst. These two techniques help the analyst solve one of today's most common intelligence problems: finding significant combinations of events in a deluge of information.
1. Baker, W.E. and Faulkner, R.R. The social organization of conspiracy: Illegal networks in the heavy electrical equipment industry. American Sociological Review 58, 6 (Dec. 1993), 837860.
2. Coffman, T. and Marcus, S. Pattern classification in social network analysis: A case study. In Proceedings of the 2004 IEEE Aerospace Conference (Big Sky, MT, Mar. 2004).
3. Coffman, T. and Marcus, S. Dynamic classification of groups using social network analysis and HMMs. In Proceedings of the 2004 IEEE Aerospace Conference (Big Sky, MT, Mar. 2004).
4. Heuer, R.J. Psychology of Intelligence Analysis. Center for the Study of Intelligence, Central Intelligence Agency, 2001.
5. Jensen, D., Rattigan, M., and Blau, H. Information awareness: A prospective technical assessment. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.
6. Krebs, V. Mapping networks of terrorist cells. Connections 24, 3 (Winter 2001), 4352.
7. Thomason, B., Coffman, T., and Marcus, S. Sensitivity of social network analysis metrics to observation noise. In Proceedings of the 2004 IEEE Aerospace Conference (Big Sky, MT, Mar. 2004).
8. Ullman, J.R. An algorithm for subgraph isomorphism. Journal of the ACM 23, 1 (Jan. 1976), 3142.
This research was sponsored by the Defense Advanced Research Projects Agency (DARPA), under the EELD and Genoa 2 programs. The views and conclusions contained in this article are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DARPA or the U.S. government.
©2004 ACM 0002-0782/04/0300 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2004 ACM, Inc.
No entries found