Over the last 5 years, Jigsaw Security has been working on our OSINT-X platform to bring in information of intelligence value for our analysts. Our analysts review over 20,000 reports, news items and feed data every single day. Over the last few months we have been working with the Jigsaw development team on new capabilities that will allow analysts to monitor even more data, much faster than they can today, by scoring incoming information and removing items that are not relevant.
Bringing Entity Extraction to our OSINT & Closed Source Intelligence
One of our greatest accomplishments with our intelligence platform is the ability to extract entity information. As you can see from the above display, we now have the ability to key in on certain elements of information based on a running list of keywords. For instance, during the recent fighting in Syria, we used keywords such as Kurds, ceasefire and ISIS, and limited our updates to the cities and locations that were of interest to our analysts. By using targeted updates, we can keep up with what is occurring in real time. The same holds true for our cyber-related data. None of this would be possible without our entity extraction techniques, described below.
What is Entity Extraction?
To make sense of millions of lines of news and information, you must be able to extract certain key elements from it. These elements are entity information, and they are critical when monitoring targeted environments. In short, entities are the following:
Locations and Geographical Information
Person names and identifying information
Objects and Things of interest (think munitions, missiles, terrorist or similar terms)
Who - Who is the content about (names, organizations, businesses, etc.)
What - What is occurring in the story or article?
When - Date and time information is critical. To figure out whether something is relevant, you must have dates and timestamps on the data. This allows analysts to filter the information to certain time frames (think the last 2 hours for near real time updates)
How - How was the action carried out?
Keyword highlighting - A list of critical alerts can be created based on keywords. For instance, if you want to be alerted to bombing incidents worldwide, you could build a stream of data using the keywords "bomb" or "bombing" with a time frame of the last 2 hours. In addition to keyword and important event alerting, the platform can count all words in incoming information and rank their occurrences, alerting you that something has occurred even if you haven't put in specific keywords yet. Think of this the same way Twitter does trending topics, but without a need to tell the system what to look for. In short, the intelligence system ranks activity across billions of articles to surface the most frequently occurring words.
Connected Data - This feature takes activity that is occurring and draws connections to other events that may be related. For instance, if there is a news article that says the president has given a speech and then within minutes an executive order is signed, the timeline for the president will show a speech and an executive order that may be connected based on time or the contents of the speech. Data can be connected in many ways.
Important Number Sets - This feature finds content such as social security numbers, zip codes, telephone numbers, ages, date fields and more
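To make the keyword alerting, word ranking and number set ideas above more concrete, here is a minimal Python sketch. It is not the OSINT-X implementation. The `Item` class, the function names, the stopword list and the US-style regular expressions are all assumptions made for illustration, and a production system would need far more robust patterns and tokenization.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Item:
    """Hypothetical ingested record: body text plus a publication timestamp."""
    text: str
    published: datetime


WORD_RE = re.compile(r"[a-z']+")
# Tiny illustrative stopword list (assumption); real lists are much longer.
STOPWORDS = {"the", "a", "an", "in", "of", "and", "to", "is", "was", "on", "at"}


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return WORD_RE.findall(text.lower())


def keyword_alerts(items, keywords, now, window=timedelta(hours=2)):
    """Return items published within `window` before `now` that contain any keyword."""
    wanted = {k.lower() for k in keywords}
    return [
        item for item in items
        if now - window <= item.published <= now
        and wanted & set(words(item.text))
    ]


def trending_terms(items, top_n=3):
    """Rank non-stopword terms by how often they occur across all items."""
    counts = Counter(
        w for item in items for w in words(item.text) if w not in STOPWORDS
    )
    return counts.most_common(top_n)


# Simplified US-style patterns (assumption); they will miss many real-world formats.
NUMBER_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "zip": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}


def find_number_sets(text):
    """Return every match for each number pattern, keyed by pattern name."""
    return {name: rx.findall(text) for name, rx in NUMBER_PATTERNS.items()}
```

Calling `keyword_alerts(items, ["bomb", "bombing"], now=datetime.now())` would give the "bombing within the last 2 hours" stream described above, while `trending_terms` needs no keywords at all, which is the point of the trending-style ranking.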
What document types can we perform entity extraction on?
One of the major limitations with data ingest is that you need to perform some pre-processing to ensure that the right tags, references, keywords and important information (such as our entity extracted information) are captured when data is ingested into Hadoop, Splunk, Elasticsearch, Solr, S3 or any other big data technology. To do this, we must process the documents and source text data at ingest time. The problem is that each document type normally needs its own ingest process to understand and extract the information contained in the document. Instead of building these one at a time, Jigsaw Security has written the world's most extensive ingest platform, which includes entity extraction and native support for over 220 file types, including video, audio and other non-textual files. Ingested files are then indexed so they can be found quickly and easily. Audio files are automatically converted into transcripts so that you can search their content. The audio track of a video is captured in the same way, and still images (with rudimentary image identification) are pulled right from video files. In short, we have figured out the most accurate way to ingest data and to capture the entity and attribute information needed to make your intelligence system actually work.
We currently support 221 file types, including doc, wav, txt, pdf, PowerPoint and xml, to name just a few. Chances are we already have a processor in the platform for your files. In the event that you have a format we cannot process, we can write a custom processor in a few minutes.
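The idea of one processor per file type, with custom processors added as needed, can be sketched as a simple lookup table keyed by file extension. This is only an illustration under stated assumptions and not the Jigsaw ingest code. The `PROCESSORS` registry and the `process_text`, `process_xml`, `register` and `ingest` names are hypothetical, and only two trivial text formats are handled.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def process_text(data: bytes) -> str:
    """Plain text: decode the bytes, replacing anything that is not valid UTF-8."""
    return data.decode("utf-8", errors="replace")


def process_xml(data: bytes) -> str:
    """XML: parse the document and join all non-empty text nodes."""
    root = ET.fromstring(data)
    return " ".join(t.strip() for t in root.itertext() if t.strip())


# Hypothetical registry mapping a file extension to its processor function.
PROCESSORS = {
    ".txt": process_text,
    ".xml": process_xml,
}


def register(extension, processor):
    """Add a custom processor for a file type the registry does not yet handle."""
    PROCESSORS[extension.lower()] = processor


def ingest(filename, data: bytes) -> dict:
    """Pick a processor by file extension and return a record ready for indexing."""
    suffix = Path(filename).suffix.lower()
    processor = PROCESSORS.get(suffix)
    if processor is None:
        raise ValueError(f"no processor registered for {suffix!r}")
    return {"filename": filename, "type": suffix, "text": processor(data)}
```

With this shape, supporting a new format means writing one function and calling `register(".log", process_text)` or similar. The extracted `text` field is what entity extraction and indexing would then run over.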
Why buy inferior products when our intelligence platform can process all of your documents, log files, binary files and more? Contact Jigsaw Security to learn how our intelligence ingest stream beats our competition. Our competitors can't handle binary files, but we can. We also support all major languages and can extract information from the web, RSS, file servers, FTP, SFTP and many other sources, such as the dark web and paste sites. In short, we have the best ingest processors in the industry, so stop missing items of intelligence value and use a system that works.