Developing custom scanner, when does Scan() recursion happen?

I’m developing a custom scanner/metadata agent as kind of an exercise to teach myself Python.

I’ve found a handful of resources floating around the web, but no comprehensive documentation about scanning, so I’m trying to reverse-engineer my scanner based on other scanners I’ve found.

I suppose my biggest question for the moment is, when/where/how many times is the Scan method called?

I’ve found, in a number of places, the following test code:

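import os
import sys

if __name__ == '__main__':  # command-line test
  path  = sys.argv[1]
  # Full paths of everything directly inside path (files AND folders, not recursive)
  files = [os.path.join(path, file) for file in os.listdir(path)]
  media = []
  Scan(path[1:], files, media, [])  # path[1:] drops the first character (the leading "/" of an absolute path)
  print("Files detected: ", media)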

If I point this to my own library structure, I could potentially get varying results. I’m just trying to understand.

So given this folder structure:

/
	$PLEX_HOME/Library/Application Support/Plex Media Server/	<-- Plex main folder
	(%USERPROFILE%\Local Settings\Application Data\Plex Media Server\) <-- Plex main folder on Windows
	(/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/) <-- Plex main folder on Debian
		Plug-ins/
			PlexSportsAgent.bundle/
				Contents/
					Code/
						__init__.py		<-- Metadata Agent script
					Resources/
					DefaultPrefs.json
					info.plist
		Scanners/
			Series/
				PlexSportsScanner/	<-- Scanner script (not sure if I can put in a folder structure like this, but I'd like to keep it organized)
					Data/
						Leagues/	<-- Should this be moved somewhere else?
							MLB/
								Teams.json	<-- Programmatically cached teams info
							NBA/
								Teams.json	<-- Programmatically cached teams info
							NFL/
								Teams.json	<-- Programmatically cached teams info
							NHL/
								Teams.json	<-- Programmatically cached teams info
						__init__.py
						SportsDataIO.py
						Teams.py
						TheSportsDB.py
					__init__.py		<-- Defines the Scan method
	mnt/Media/
		Video/
			Sports/		<-- Sports library
				MLB/
					2021/
				NBA/
				NFL/
					2004-2005/
						NFL.Super Bowl.XXXIX.Patriots.vs.Eagles.720p.HD.TYT.mp4
						NFL.Super Bowl.XXXIX.Patriots.vs.Eagles.720p.HD.TYT.ts
					2017-2018/
						Super.Bowl.LII.2018.02.04.Eagles.vs.Patriots.1080p.HDTV.x264.Merrill-Hybrid-5.1-PHillySPECIAL.mkv
				NHL/
				UFC/
				.plexignore
				Phillies vs. Red Sox Game Highlights (7_10_21) _ MLB Highlights.mp4
				yt1s.com - Phillies vs Cubs Game Highlights 70821  MLB Highlights.mp4
				yt1s.com - Phillies vs Red Sox Game Highlights 7921  MLB Highlights.mp4

… the aforementioned script yields the following arguments, presuming that sys.argv[1] was the path to the root of my Sports library:

Scan(path, files, media, subdirs, root=None)

path = "mnt/Media/Sports"
files = [
	"/mnt/Media/Sports/MLB",
	"/mnt/Media/Sports/NBA",
	"/mnt/Media/Sports/NFL",
	"/mnt/Media/Sports/NHL",
	"/mnt/Media/Sports/UFC",
	"/mnt/Media/Sports/.plexignore",
	"/mnt/Media/Sports/Phillies vs. Red Sox Game Highlights (7_10_21) _ MLB Highlights.mp4",
	"/mnt/Media/Sports/yt1s.com - Phillies vs Cubs Game Highlights 70821  MLB Highlights.mp4",
	"/mnt/Media/Sports/yt1s.com - Phillies vs Red Sox Game Highlights 7921  MLB Highlights.mp4"
]
subdirs = []

My assumptions here are that:

  • sys.argv[1] is fully-qualified (just for argument’s sake)
  • files only contains the top level of path. If so, when does the recursion take place? The way this is currently set up, this script will yield 5 directories and 4 files (3 Phillies games and a .plexignore file).
    – If I pass this through to my own Scan(), which in turn gets filtered by VideoFiles.Scan(), I am ultimately left with 3 video files to chomp on.
    – Am I responsible for recursing the remaining folder structures myself? Or is Scan() called multiple times from the outside for the next level of depth?
    – If I am responsible for recursing myself, should the subdirs parameter be populated, presumably with the 5 subfolders? Or should I take the test script at face value and discover the subfolders from the files parameter? Should the test script filter the files parameter to include only files, not files AND folders (os.listdir(path) if os.path.isfile(file), or something like that)?
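For the test script itself, one way to handle the filtering question is to split the directory listing before calling Scan(), so that files holds only regular files and the folders go into a separate list. This is only a minimal sketch: split_listing is a hypothetical helper name, and whether Plex passes files and subdirs to a scanner in exactly this shape is an assumption, not something confirmed here.

```python
import os

def split_listing(path):
    """Return (files, subdirs) as full paths for the immediate children of path.

    Hypothetical helper for a command-line test harness; not part of Plex.
    """
    files, subdirs = [], []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            subdirs.append(full)   # folders such as MLB, NBA, NFL
        elif os.path.isfile(full):
            files.append(full)     # regular files, including .plexignore
    return files, subdirs
```

With the library above, files would contain the three highlight videos plus .plexignore, and subdirs would contain the five league folders.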

As it currently stands, Scan() will only yield me 3 Phillies games, and whack everything else out of existence. This is why I’m confused.

Can someone, perhaps a member of the Plex team, assist me in understanding?

Try looking at Absolute Series Scanner.py in the ZeroQI/Absolute-Series-Scanner repository on GitHub (commit 3188799d714c2334c5990a8e2f7508cb0c3fbc38).

My guess is that you need to call it on every root directory.

After tinkering with it a bit, I have come to the same conclusion.
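A command-line test harness that follows this conclusion could walk the library and invoke the scanner once per directory. The sketch below is only an illustration under stated assumptions: scan_tree and fake_scan are hypothetical names, the scan argument stands in for the scanner's own Scan(path, files, media, subdirs), and passing a root-relative path with separate files and subdirs lists is a guess at how the server calls it.

```python
import os

def scan_tree(root, scan):
    """Call scan(path, files, media, subdirs) once for every directory under root."""
    media = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a stable order
        rel = os.path.relpath(dirpath, root)
        path = "" if rel == "." else rel  # root-relative path (assumption)
        files = [os.path.join(dirpath, f) for f in sorted(filenames)]
        # subdirs is a fresh list, so changing it inside scan does not alter the walk
        subdirs = [os.path.join(dirpath, d) for d in dirnames]
        scan(path, files, media, subdirs)
    return media

def fake_scan(path, files, media, subdirs):
    # Stand-in for a real Scan(): collect anything that looks like a video file.
    media.extend(f for f in files if f.lower().endswith((".mp4", ".mkv", ".ts")))

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(scan_tree(sys.argv[1], fake_scan))
```

Run against the Sports library root, this would reach the Super Bowl files under NFL as well as the top-level highlight videos, rather than only the latter.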