Method for unsupervised detection of Web Trackers
ISP customers and corporate employees unknowingly expose sensitive information to tracking services. This methodology automatically detects web services that track users. By analyzing HTTP/S network traces generated by real users or bots, the unsupervised algorithm identifies tracking services, which can then be collected into blacklists to block them.
Web tracking services, typically embedded in websites and portals, base their business on collecting information about users browsing the Web. When a user visits a website, her browser is induced to contact the embedded tracking services, which record the visit and collect a variety of information (e.g., IP address, type of device). Because the same tracking services are usually embedded in many websites, the user is monitored almost continuously while browsing. Hence, tracking services can obtain sensitive information and build accurate user profiles (e.g., religious or political preferences). Corporations are potential targets too: by reconstructing employees' browsing activity, trackers can collect information that the corporation would likely want to protect.
By inspecting traffic traces, this methodology identifies services that use dynamic HTTP/S requests to track users. The algorithm analyzes the key-value pairs contained in the requests and identifies the keys whose values show a one-to-one mapping with the user, i.e., each user always sends the same value and no two users share a value. Keys exhibiting this behavior are labeled as tracking keys, and the services using them are labeled as trackers. The output of the algorithm can be used to build blacklists for browser plugins or firewalls, preventing users and employees from contacting trackers and thus preserving their privacy.
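The one-to-one test on key-value pairs can be illustrated with a short Python sketch. This is a simplified illustration, not the authors' implementation. It assumes the traces have already been reduced to (user, URL) pairs with the full URL visible (for HTTPS this requires, for example, a proxy or browser plugin), and it only inspects query-string parameters. The function names `find_tracking_keys` and `build_blacklist` and the `min_users` threshold are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def find_tracking_keys(requests, min_users=2):
    """Return the set of (host, key) pairs whose values map one-to-one to users.

    `requests` is an iterable of (user_id, url) pairs (assumed input format).
    `min_users` is a hypothetical threshold: a key observed with a single user
    is trivially one-to-one, so it is not judged.
    """
    # (host, key) -> value -> set of users that sent this value
    users_per_value = defaultdict(lambda: defaultdict(set))
    # (host, key) -> user -> set of values that user sent
    values_per_user = defaultdict(lambda: defaultdict(set))

    for user, url in requests:
        parts = urlsplit(url)
        host = parts.hostname or ""
        # Only query-string parameters are considered in this sketch.
        for key, value in parse_qsl(parts.query):
            users_per_value[(host, key)][value].add(user)
            values_per_user[(host, key)][user].add(value)

    tracking = set()
    for host_key, by_user in values_per_user.items():
        if len(by_user) < min_users:
            continue
        # Each user always sends the same value for this key...
        one_value_per_user = all(len(vals) == 1 for vals in by_user.values())
        # ...and no value is shared by two different users.
        one_user_per_value = all(
            len(users) == 1 for users in users_per_value[host_key].values()
        )
        if one_value_per_user and one_user_per_value:
            tracking.add(host_key)
    return tracking


def build_blacklist(tracking_keys):
    """Blacklist every host that uses at least one tracking key."""
    return sorted({host for host, _ in tracking_keys})
```

For instance, if two users send `uid=a1` and `uid=b7` to `tracker.example` but the same `v=3` to `cdn.example`, only `("tracker.example", "uid")` is flagged, so only `tracker.example` enters the blacklist. A key whose value changes over time for the same user (e.g., a per-session identifier) fails the one-value-per-user check in this sketch.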
Applications:
- Corporate data protection;
- Unsupervised filter creation to block malicious sites.
Advantages:
- Completely unsupervised;
- Personalized filter lists, built from specific user or corporate traffic logs;
- No pre-built models or rules;
- Compatible with all platforms.