In law enforcement, I like to think of unstructured data as the haystack. Finding the needle can feel like a cumbersome and even fruitless effort. Many years ago, I was supporting a National Law Enforcement Information Exchange program in the United States. Early on, it was recognized that there was intrinsic value in capturing names from the unstructured portion of an incident report in order to establish and expose primary relationships and associations. At the time, Natural Language Processing (NLP) was in its infancy and entity extraction was a relatively new concept.
For names, a variant list was used to identify and classify person names. Tokens that were classified as ‘anonymous’ or ‘fictitious’ were identified through token frequency analysis and removed from consideration. Of course, this was a manual process and not a machine learning exercise with a pre-trained name finder model. Our results were sometimes humorous, as when sifting through AKA names, and sometimes disastrous, as in the early stages when most records were resolving to a common criminal mastermind: ‘Miranda Warning’. Clearly, the technology had some catching up to do.
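To make that approach concrete, here is a minimal Python sketch of variant-list matching combined with a frequency filter. It is an illustration only and not the code used in that program: the `NAME_VARIANTS` table, the function names, and the `max_share` threshold are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical variant list: maps name variants to a canonical given name.
NAME_VARIANTS = {
    "bill": "william", "will": "william", "william": "william",
    "liz": "elizabeth", "beth": "elizabeth", "elizabeth": "elizabeth",
    "miranda": "miranda",
}

def candidate_names(narrative):
    """Return (given, surname) pairs: a capitalized token found in the
    variant list, followed by another capitalized token."""
    tokens = [t.strip(".,;:()\"'") for t in narrative.split()]
    pairs = []
    for first, second in zip(tokens, tokens[1:]):
        if (first[:1].isupper() and second[:1].isupper()
                and first.lower() in NAME_VARIANTS):
            pairs.append((NAME_VARIANTS[first.lower()], second.lower()))
    return pairs

def extract_names(reports, max_share=0.5):
    """Extract candidate names per report, then drop any candidate that
    appears in more than max_share of all reports (assumed threshold)."""
    per_report = [set(candidate_names(r)) for r in reports]
    doc_freq = Counter(name for names in per_report for name in names)
    limit = max_share * len(reports)
    return [sorted(n for n in names if doc_freq[n] <= limit)
            for names in per_report]
```

A candidate such as ‘Miranda Warning’ that shows up in nearly every narrative exceeds the frequency limit and is dropped, while names that appear in only a few reports survive. The threshold would need tuning against real data.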
With advancements in computational linguistics and machine learning, our ability to identify and extract names from unstructured documents and data stores has gone far beyond my earlier experiences. Nevertheless, I am shocked that so few agencies have endeavored to do so. Understanding that a relationship exists between two people is absolutely paramount in a criminal investigation. Sifting through hundreds of occurrence/incident reports trying to establish a relationship between two individuals is not a good use of an investigator’s or crime analyst’s time. I discovered that first-hand during an internship in California where I was certified as a Crime and Intelligence Analyst by the DOJ.
The IMT Intel:ID™ Appliance automates the process of discovery and provides actionable insight to investigators. Intel:ID has native NLP and entity extraction tools that can be trained on a specific type of document or data store to find names that are associated with an incident/occurrence report. The names and metadata pass through a pipeline that checks for redundancy against the structured content and are then loaded as part of the underlying incident/occurrence as explicit relationships. With Intel:ID, real-time entity resolution and non-obvious relationship detection continue on all incoming data. Alerts are generated in real time for investigators and front-line officers, and a comprehensive dossier that includes everything we know about an individual, including relationships and associations, provides the insight necessary to ensure officer safety.
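As a rough illustration of the redundancy check described above, the Python sketch below compares names extracted from a narrative against the names already held in the structured part of a report and emits a relationship record only for the new ones. It is not Intel:ID code and does not reflect its API: the function names, the record fields, and the simple case-and-whitespace normalization are assumptions made for this example.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivially different
    spellings compare equal (a deliberately simple assumption)."""
    return " ".join(name.lower().split())

def link_extracted_names(incident_id, structured_names, extracted_names):
    """Return one relationship record per extracted name that is not
    already present in the structured content of the incident."""
    seen = {normalize(n) for n in structured_names}
    links = []
    for name in extracted_names:
        key = normalize(name)
        if key and key not in seen:
            seen.add(key)  # also suppresses repeats within the narrative
            links.append({
                "incident_id": incident_id,  # hypothetical field names
                "person": key,
                "source": "narrative",
            })
    return links
```

A production pipeline would rely on full entity resolution rather than string normalization, but the shape is the same: deduplicate against what is already known, then attach what remains to the incident as an explicit relationship.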
IMT Intel:ID™ is the thread that connects the needles in your haystack… the end result is intelligent context that you can trust.
Join IMT at the IACP 2018 conference in Orlando, Florida. We will be at Booth #629 (the IBM Booth) to talk about how IMT’s know-how can help your department create intelligent context you can trust.