Enron - Search for sensitive information

The Enron email dataset was used to understand the nature of false positives in a proprietary software system that recognizes personally identifiable information (PII) in email data. The proprietary system tagged emails based on a complex set of rules and regular expression matches. However, the system produced many false positives.
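To illustrate how a regular expression rule can produce a false positive, here is a minimal sketch. The pattern, the `tag_email` helper, and both email bodies are hypothetical and written for illustration only; they are not the proprietary system's rules and are not taken from the Enron corpus.

```python
import re
from typing import List

# Hypothetical rule: digits grouped 3-2-4, the layout of a US Social Security number.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def tag_email(body: str) -> List[str]:
    """Return every substring of the body that the rule flags as possible PII."""
    return SSN_LIKE.findall(body)


# Made-up email bodies for illustration.
true_positive = "Please update payroll: my SSN is 123-45-6789."
false_positive = "Refer to contract 555-12-0034 for the gas delivery schedule."

print(tag_email(true_positive))   # ['123-45-6789']
print(tag_email(false_positive))  # ['555-12-0034']
```

Both emails are tagged, although only the first contains PII. The pattern sees only the shape of the digits, not the words around them, which is why the surrounding linguistic context is useful for separating the two cases.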

The major issue was identifying positive samples that contain personally identifiable information. Hence we devised a methodology that identifies subgroups of emails based on their linguistic information, so that positive samples can be extracted from them and used to develop supervised machine learning models.

Data Description: Metadata about each email was present in the form of raw text files, along with tag information.
Goal: The goal of the project was to identify subgroups of email data that are probable candidates for true positives.
Model: Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space [Wikipedia].
Results: Five prominent clusters of data were identified using the Word2vec model.
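
One way to go from word vectors to clusters of emails can be sketched as follows. This README does not say which clustering algorithm was used, so k-means is an assumption here. The 2-D `TOY_VECTORS`, the tokenized emails, and the helpers `embed` and `kmeans` are all hypothetical stand-ins; in practice the vectors would come from a Word2vec model trained on the corpus and would have far more dimensions.

```python
import math
from typing import Dict, List

# Hypothetical 2-D word vectors standing in for trained Word2vec embeddings.
TOY_VECTORS: Dict[str, List[float]] = {
    "ssn": [0.9, 0.1], "passport": [0.8, 0.2], "birthdate": [0.85, 0.15],
    "pipeline": [0.1, 0.9], "gas": [0.2, 0.8], "contract": [0.15, 0.85],
}


def embed(tokens: List[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Represent an email as the average of its known word vectors."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim  # no known words: fall back to the origin
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def kmeans(points: List[List[float]], k: int, iterations: int = 10) -> List[int]:
    """Plain k-means; the first k points seed the centroids so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up tokenized emails, not taken from the Enron corpus.
emails = [
    ["ssn", "passport"],
    ["gas", "pipeline"],
    ["birthdate", "ssn"],
    ["contract", "gas", "pipeline"],
]
points = [embed(tokens, TOY_VECTORS) for tokens in emails]
labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1]
```

The toy data uses `k=2` only because it has four emails; the project reports five clusters. Emails whose words sit close together in the vector space end up with the same label, and a cluster dominated by PII-related vocabulary is a natural place to look for true positives.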

sreeaurovindh/enron-search
