Content-based file matching (Fingerprints)

Magnolia · January 7, 2024, 10:33am

Given advances in image and object recognition, there seems to be another way to do content-based file matching:

Recognizing key elements of a video, such as the actors, gives valuable information about which film we might be dealing with.

Even a single actor helps: once John Wayne is recognized as playing in the movie, the number of candidates quickly narrows from the many thousands of films out there to around 160.
There are even fewer films in which, e.g., Bill Murray, Dan Aykroyd and Sigourney Weaver play together. So with this simple content-based technique the selection can be reduced to Ghostbusters 1 or 2.
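To make the narrowing step concrete, here is a minimal sketch. It assumes a face recognizer has already produced a list of actor names, and it uses a tiny hand-written filmography table as a stand-in for a real cast database; `FILMOGRAPHY` and `candidate_films` are made-up names for illustration only.

```python
# Toy stand-in for a real cast database: actor name -> set of film titles.
# Deliberately incomplete; only meant to illustrate the idea.
FILMOGRAPHY = {
    "Bill Murray": {"Ghostbusters", "Ghostbusters II", "Groundhog Day", "Stripes"},
    "Dan Aykroyd": {"Ghostbusters", "Ghostbusters II", "The Blues Brothers"},
    "Sigourney Weaver": {"Ghostbusters", "Ghostbusters II", "Alien"},
}


def candidate_films(recognized_actors, filmography):
    """Return the films in which ALL recognized actors appear."""
    # Ignore names the database does not know about.
    sets = [filmography[a] for a in recognized_actors if a in filmography]
    if not sets:
        return set()
    # Each additional actor can only shrink the candidate set.
    return set.intersection(*sets)


if __name__ == "__main__":
    seen = ["Bill Murray", "Dan Aykroyd", "Sigourney Weaver"]
    print(sorted(candidate_films(seen, FILMOGRAPHY)))
```

With the toy data above, one actor leaves three or four candidates, while all three together leave only the two Ghostbusters films.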

This approach may be extended to other elements to get further cues, such as recognizing buildings, animals or landscapes, or identifying text in the credits or elsewhere.

This is complementary to using perceptual hashes on the audio track of video files.

Taken together, all these cues should quickly give a very high hit rate.
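One simple way to combine the cues is plain voting: every cue proposes a set of candidate titles, and the title proposed by the most cues wins. This is only a sketch under that assumption; the cue names and the `combine_cues` helper are hypothetical, and a real matcher would probably weight cues by how trustworthy they are.

```python
from collections import Counter


def combine_cues(cue_candidates):
    """Rank titles by how many independent cues proposed them.

    cue_candidates: dict mapping a cue name to the set of titles it suggests.
    Returns a list of (title, votes) pairs, best first.
    """
    votes = Counter()
    for titles in cue_candidates.values():
        votes.update(titles)  # one vote per cue per title
    return votes.most_common()


if __name__ == "__main__":
    cues = {
        "actors": {"Ghostbusters", "Ghostbusters II"},
        "credits_text": {"Ghostbusters II"},
        "audio_hash": {"Ghostbusters II"},
    }
    print(combine_cues(cues))
```

In this made-up example the actors alone cannot separate the two films, but the credits text and the audio hash break the tie.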

Magnolia · October 14, 2024, 2:36pm

To those not familiar with content fingerprinting / perceptual hashes, here is an interesting (although sad) example of what is currently possible.

Not just possible as in “edge case with huge amounts of resources” but as in “built into regular Smart TVs from Samsung and LG.”

From the study:

Smart TVs implement a unique tracking approach called Automatic Content Recognition (ACR) to profile viewing activity of their users. ACR is a Shazam-like technology that works by periodically capturing the content displayed on a TV’s screen and matching it against a content library to detect what content is being displayed at any given point in time.

And further

ACR periodically captures frames (and/or audio), builds a fingerprint of the content, and then shares it with an ACR server for matching it against a database of known content (e.g., movies, ads, live feed). When the fingerprint matches, ACR server can determine exactly what piece of content is being watched on the smart TV. […] Fingerprints in ACR are essentially hash of the content, which can be matched at the server-side to identify the content.
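The study does not say which hash the TV vendors use, so the following is not their algorithm. It is a minimal sketch of the general "fingerprint, then match against a library" idea, using a basic average hash on an 8x8 grayscale thumbnail and Hamming distance for the lookup. The function names, the `max_distance` threshold and the toy frames are all assumptions for illustration.

```python
def average_hash(pixels):
    """Compute a 64-bit average hash from an 8x8 grid of grayscale values."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # One bit per pixel: 1 if brighter than the frame's mean, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def best_match(fingerprint, library, max_distance=5):
    """Return the closest title in the library, or None if nothing is close."""
    best_title, best_dist = None, None
    for title, known in library.items():
        dist = hamming(fingerprint, known)
        if best_dist is None or dist < best_dist:
            best_title, best_dist = title, dist
    if best_dist is None or best_dist > max_distance:
        return None
    return best_title


if __name__ == "__main__":
    frame_a = [[0] * 4 + [255] * 4 for _ in range(8)]    # dark left, bright right
    frame_b = [[0] * 8] * 4 + [[255] * 8] * 4            # dark top, bright bottom
    library = {"Film A": average_hash(frame_a), "Film B": average_hash(frame_b)}

    # A slightly re-encoded copy of frame_a still hashes to the same bits.
    noisy_a = [[10] * 4 + [245] * 4 for _ in range(8)]
    print(best_match(average_hash(noisy_a), library))
```

The point of a perceptual hash, as opposed to a cryptographic one, is that small changes in brightness or compression leave the fingerprint (nearly) unchanged, which is why a distance threshold is used instead of an exact comparison.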

Note: The suggestion for Plex achieves the same purpose minus the imho nefarious aspects.

The required database of known content can be built with the help of volunteer Plex supporters, who already have content that is mapped to titles via the Plex naming conventions.
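As a rough sketch of how such a database could be seeded: a volunteer's file that follows the `Movie Name (Year).ext` convention already carries its own label, so a fingerprint of that file can be stored under that title. The regular expression, `label_from_path` and `build_database` below are hypothetical helpers, and the fingerprints are placeholder integers rather than real hashes.

```python
import re
from pathlib import PurePath

# Matches "Title (1984)" at the start of a file name (extension already removed).
MOVIE_NAME = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)")


def label_from_path(path):
    """Extract (title, year) from a conventionally named file, else None."""
    stem = PurePath(path).stem
    match = MOVIE_NAME.match(stem)
    if match is None:
        return None
    return match.group("title").strip(), int(match.group("year"))


def build_database(entries):
    """Map fingerprint -> (title, year) for every file with a usable name.

    entries: iterable of (file path, fingerprint) pairs.
    """
    database = {}
    for path, fingerprint in entries:
        label = label_from_path(path)
        if label is not None:
            database[fingerprint] = label
    return database


if __name__ == "__main__":
    files = [
        ("/movies/Ghostbusters (1984)/Ghostbusters (1984).mkv", 0x0F0F),
        ("/movies/random_clip.mp4", 0x1234),  # no usable label, skipped
    ]
    print(build_database(files))
```

Files that do not follow the convention are simply skipped, so badly named files never pollute the reference database.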
