@jjross asked how I use SnapRAID and asked me to put together a tutorial: https://forums.plex.tv/discussion/comment/1113441#Comment_1113441
This fits in nicely with this thread so here we go:
From the www.snapraid.it homepage:
SnapRAID is a backup program for disk arrays. It stores parity information of your data and it recovers from up to six disk failures.
SnapRAID is mainly targeted for a home media center, with a lot of big files that rarely change.
Beside the ability to recover from disk failures, other features of SnapRAID are:
- All your data is hashed to ensure data integrity and to avoid silent corruption.
- If the failed disks are too many to allow a recovery, you lose the data only on the failed disks. All the data in the other disks is safe.
- If you accidentally delete some files in a disk, you can recover them.
- You can start with already filled disks.
- The disks can have different sizes.
- You can add disks at any time.
- It doesn’t lock-in your data. You can stop using SnapRAID at any time without the need to reformat or move data.
- To access a file, a single disk needs to spin, saving power and producing less noise.
SnapRAID differs from hardware RAID, unRAID, FlexRAID, ZFS and Storage Spaces in one IMPORTANT way.
All of the other programs/solutions just mentioned use real-time parity creation. SnapRAID only creates/modifies parity when you tell it to.
Why is this distinction important, and is it a PRO or a CON?
Most people think of parity as a way to rebuild an array after a hardware failure. While that’s true, it’s not the only use for parity. First, some quick background: I have my system set up to “SYNC parity” at 4am. I’m usually done for the day well before then, so that time of day works for me.
So, in a nutshell, every day at 4am my system does a quick check to see which files in my “array” are new or modified and recalculates the parity. I can also manually kick off a parity sync any time I want with a simple “snapraid sync” command.
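The post doesn’t say how the 4am job is triggered (on Windows, Task Scheduler is the usual choice). As a rough sketch only, here is the same idea in Python, assuming the `snapraid` executable is on your PATH. `SYNC_HOUR`, `seconds_until_next_sync` and `run_sync` are made-up names for illustration:

```python
import subprocess
from datetime import datetime, timedelta

SYNC_HOUR = 4  # 4am, the sync time used in this post


def seconds_until_next_sync(now: datetime, hour: int = SYNC_HOUR) -> float:
    """Seconds from `now` until the next occurrence of hour:00."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed (or is right now), so use tomorrow's.
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_sync() -> int:
    """Run `snapraid sync` and return its exit code (0 means success)."""
    result = subprocess.run(["snapraid", "sync"])
    return result.returncode
```

A bare-bones loop would be `while True: time.sleep(seconds_until_next_sync(datetime.now())); run_sync()`, but a scheduled task that calls `snapraid sync` directly is simpler and survives reboots.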
Now that you have the “background info” on my sync schedule, I’ll explain a PRO of SnapRAID. Let’s assume I make a big “user error” and delete all my movies that start with the letter “F” from one of my drives. I then copy other data to this drive, overwriting what was once my “F” movies. With other programs/hardware, the operating system happily deletes these files, and since the parity is written in real time, it is immediately updated and knows nothing of the now-missing files.
In this example, where I just whacked one of my libraries, I can get it back without resorting to “undelete” utilities, which probably wouldn’t work anyway since I just overwrote the files. I still have the parity from my last 4am run. So I move the “new” data back off the disk to free up enough room to restore my files, issue the proper SnapRAID command line, and POOF, my files are back. SnapRAID uses the parity together with the data on the other drives to rebuild what’s missing. As long as I don’t remove data from the other drives, I can “restore” my files pretty easily. If I also changed data on other drives in a way that affects the parity calculations, SnapRAID will tell me and skip those files. It always recovers as much as possible rather than just stopping.
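The post doesn’t name the “proper SnapRAID command line”. As I understand the SnapRAID manual, `snapraid fix -m` (`--filter-missing`) restores only files that have gone missing, which is the “undelete” case described above, and `-f PATTERN` (`--filter`) narrows it further by path; please check the manual for your version before relying on this. A small hypothetical Python wrapper, assuming `snapraid` is on your PATH:

```python
import subprocess
from typing import List, Optional


def build_fix_command(pattern: Optional[str] = None) -> List[str]:
    """Build a `snapraid fix` command that restores only missing files.

    -m (--filter-missing) limits the fix to files deleted from the array;
    -f (--filter) optionally restricts it to paths matching `pattern`.
    """
    cmd = ["snapraid", "fix", "-m"]
    if pattern is not None:
        cmd += ["-f", pattern]
    return cmd


def restore_missing(pattern: Optional[str] = None) -> int:
    """Run the fix and return SnapRAID's exit code."""
    return subprocess.run(build_fix_command(pattern)).returncode
```

`restore_missing()` would try to bring back everything that is missing; a pattern such as `"Movies/F*"` is only an example, and whether it matches depends on SnapRAID’s filter rules.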
So that is just one simple example of a PRO. Another is the ability to check for bitrot.
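Bitrot checking works because SnapRAID stores a hash of your data and can later re-read the data and compare (the `scrub` and `check` commands). The sketch below only illustrates the idea with SHA-256 from Python’s standard library; SnapRAID uses its own hash functions and content-file format. A file whose size and modification time are unchanged but whose hash differs was not edited by you, so it is a bitrot suspect:

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

Record = Tuple[int, int, str]  # (size in bytes, mtime in ns, SHA-256 hex digest)


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(path: Path) -> Record:
    """Record what the file looks like now (what a 'sync' would remember)."""
    st = path.stat()
    return (st.st_size, st.st_mtime_ns, sha256_of(path))


def suspected_bitrot(records: Dict[Path, Record]) -> List[Path]:
    """Files whose metadata is unchanged but whose contents no longer match."""
    suspects = []
    for path, (size, mtime_ns, digest) in records.items():
        st = path.stat()
        untouched = st.st_size == size and st.st_mtime_ns == mtime_ns
        if untouched and sha256_of(path) != digest:
            suspects.append(path)
    return suspects
```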
The biggest CON of SnapRAID is that it’s not real-time: any data you add between SYNCs is unprotected and could be lost if a hard drive crashes.
So when I add lots of new data, I typically kick off a manual run instead of waiting until 4am. It’s as easy as running “snapraid sync” from the command line.
Syncing is usually quite fast unless you change or move data around on your drives, which is not ideal with SnapRAID.
SnapRAID is perfect IMHO for media files that don’t change often. I wouldn’t use it for repositories that are used in business where files are constantly changing. But for media libraries where we typically keep adding data without changing what is already there it’s close to perfect. BTW, you can change your files but this just causes the sync to take longer.
With the above said, I also use StableBit’s DrivePool. I think most people are familiar with it, but just in case you’re not: DrivePool lets you take a JBOD (just a bunch of disks) and combine the disks into one large drive. DrivePool goes beyond this and can duplicate data on different drives, so you can have multiple copies of files spread out over different disks. To keep one or two extra copies of your data, you need 2x or 3x the amount of storage. Obviously this protection comes at the cost of double or triple your storage costs!
The great thing about DrivePool is that you can remove a drive (or have one crash) and your POOL survives; only the data on the missing drive disappears from the pool. You can remove a drive, put it in another computer and access (read/write) it. There is no proprietary format on the drive; the data is not striped, for example.
So in the event of a hard drive crash, only the data on the crashed drive is missing from your pool. If you have 5 data drives (holding roughly equal amounts of data) and lose 1 drive, you still have 80% of your media online.
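The 80% figure assumes the data is spread evenly. Because DrivePool doesn’t stripe, the general rule is simply “whatever was on the dead drive is gone from the pool”. A hypothetical helper that computes the share still online from per-drive usage (drive letters and sizes are placeholders):

```python
from typing import Dict, Set


def fraction_online(data_tb: Dict[str, float], failed: Set[str]) -> float:
    """Share of pooled data still readable after the drives in `failed` die.

    No striping means a failure removes exactly the data stored on that
    drive and nothing else.
    """
    total = sum(data_tb.values())
    if total == 0:
        return 1.0  # an empty pool has nothing to lose
    surviving = sum(tb for drive, tb in data_tb.items() if drive not in failed)
    return surviving / total
```

With uneven drives, losing the fullest one costs more than 1/n of the library, which is one more reason to keep drives reasonably balanced by hand.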
So at this point we know DrivePool is pretty great for Windows Plex systems. You can add space super easily and never hit the obstacles you get with other NAS or hardware RAID setups. You don’t need to add drives in groups or worry about striping.
NOW THE PROBLEM WITH DRIVEPOOL
DrivePool’s solution to “backup” is to duplicate your data over multiple drives, which requires double the storage. This isn’t too hard when you only have 2 or 3 disks of original storage, but it becomes more of a problem as your library grows. If you have 8 drives at present, do you want to purchase another 8 drives to store the dupes? I know I don’t!
ENTER SNAPRAID WITH DRIVEPOOL
Let’s assume we have 5 original disks in our pool. Instead of adding another 5 drives to duplicate our data the “DrivePool way”, we can add ONE new drive for SnapRAID to store parity on (the parity drive must be at least as large as your largest data drive). Once set up, SnapRAID looks at those 5 drives, creates the parity for them, and stores it only on the parity drive. You could now lose a drive in your DrivePool, replace it and use SnapRAID to recover your data.
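To see why one extra drive is enough to rebuild any single failed drive, here is the simplest form of parity: XOR, as used by RAID-5. This is a toy illustration of the principle only, not SnapRAID’s on-disk format or code; SnapRAID’s additional parity levels use more elaborate math. Each `bytes` value stands in for one block from one data drive:

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: List[bytes]) -> bytes:
    """The parity block is the XOR of the blocks on every data drive."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: List[bytes], parity: bytes) -> bytes:
    """XOR of the parity and all surviving blocks gives back the lost block."""
    return xor_blocks(surviving_blocks + [parity])
```

Because x XOR x = 0, XOR-ing the parity with the four surviving blocks cancels them out and leaves exactly the fifth one. The same reasoning shows why a single parity level can’t cover two simultaneous failures: one equation can’t solve for two unknowns.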
As we all know, you can sometimes lose multiple drives at the same time. What if you were using DrivePool’s duplication feature and lost two drives? If those two drives happened to hold both the original and the duplicate of a file, you just lost that file!
So with our 5 original drives and 5 “extra” drives (10 drives total) for DrivePool duplication, you could lose two drives and lose data. SnapRAID, on the other hand, lets you have multiple parity drives. With our 5 original drives plus 2 parity drives (7 drives total), we could survive the loss of any two drives without losing any data. With the same 10 drives in total, we could have 5 data drives and 5 parity drives, and it would take the loss of 6 drives for us to lose data.
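The drive-count arithmetic from this comparison, as a sketch assuming equal-sized drives (DrivePool actually spreads duplicate copies across the whole pool, so “drives” here means “drives’ worth of capacity”). The function names are made up for illustration; the limit of six parity levels comes from SnapRAID’s own description above:

```python
from typing import Tuple


def duplication_layout(data_drives: int, copies: int) -> Tuple[int, int]:
    """(total drives, failures always survivable) with DrivePool-style duplication.

    Losing `copies` drives can destroy a file if those drives happened to hold
    every copy of it, so only `copies - 1` failures are guaranteed safe.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return data_drives * copies, copies - 1


def snapraid_layout(data_drives: int, parity_drives: int) -> Tuple[int, int]:
    """(total drives, failures always recoverable) with SnapRAID parity."""
    if not 1 <= parity_drives <= 6:
        raise ValueError("SnapRAID supports 1 to 6 parity levels")
    return data_drives + parity_drives, parity_drives
```

For the numbers in this post: `duplication_layout(5, 2)` gives `(10, 1)`, `snapraid_layout(5, 2)` gives `(7, 2)`, `snapraid_layout(5, 5)` gives `(10, 5)`, and `snapraid_layout(20, 3)` gives `(23, 3)`, i.e. data is only at risk once one more drive than the parity count has failed.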
So as your collection of media grows, you can further protect it by adding another parity drive. For example, 20 data drives are pretty well protected by 3 parity drives: you would have to lose 4 drives in total before losing data. You could technically protect 25+ data drives with 1 parity drive, but I would not consider anything less than 2 parity drives.
Once you get the hang of how to set up and use the combination of DrivePool and SnapRAID, you can get a bit more elaborate. For example, you can use SELECTIVE duplication in DrivePool at the directory level. On my system I have directories such as MOVIES, NFL, TV SHOWS, EDUCATION, etc. that match my Plex libraries. None of these directories are duplicated. I also have a directory called DOCUMENTS where I typically store Word, Excel, PDF and similar files, which are small compared to my media files. DrivePool is set up to duplicate this DOCUMENTS directory to 2 other drives.
This DOCUMENTS directory is really the only directory that will change during the day for me where I’d want to be able to quickly recover/access data in the event of a hard drive crash. Since this directory is only a few GB in size it replicates quickly and I always have the data available even in the event of a couple of drive crashes.
NOW ON TO THE SETUP
The first thing you want to do is TURN OFF all balancing options in DrivePool. You do not want DrivePool automatically moving your files around to keep the drives balanced, as this will cause you grief with SnapRAID. I personally balance my drives by hand, aiming for about 250GB of free space on each drive. This lets me add a lot of media to each drive during the day without DrivePool shifting things around and throwing my parity out of whack.
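A small helper for the manual balancing routine: report which drives have dropped below the free-space target. The 250GB figure comes from the post (read here as decimal gigabytes); the drive letters are only examples, and `shutil.disk_usage` is the standard-library call that reports free space:

```python
import shutil
from typing import Dict, List

TARGET_FREE_BYTES = 250 * 10**9  # ~250GB per drive, assumed decimal GB


def current_free_bytes(roots: List[str]) -> Dict[str, int]:
    """Free bytes for each drive root, e.g. ["E:\\", "G:\\"]."""
    return {root: shutil.disk_usage(root).free for root in roots}


def drives_below_target(free_bytes: Dict[str, int],
                        target: int = TARGET_FREE_BYTES) -> List[str]:
    """Drives that need some media moved off them, sorted by name."""
    return sorted(drive for drive, free in free_bytes.items() if free < target)
```

For example, `drives_below_target(current_free_bytes(["E:\\", "G:\\", "M:\\"]))` lists the drives to rebalance, after which a fresh `snapraid sync` brings the parity back up to date.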
On my system I will typically do something like this:
Drive 1: Movies# to Movies\E
Drive 2: Movies\F to Movies\G
Drive 3: Movies\H to Movies\K
Some letters, like T, span multiple drives; I just keep those “balanced” alphabetically.
Any time I rebalance my drives to get each one back to 250GB free, I rerun a sync, which gets my parity all nice again.
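The alphabetical layout above can be written down as a lookup table. This is a hypothetical helper: “Drive 1” to “Drive 3” are placeholder names, only the three example ranges are included, and letters that span several drives (like T) are not modeled:

```python
from typing import List, Tuple

# (first bucket, last bucket, drive) from the three example lines above.
# "#" is the bucket for titles starting with a digit or symbol; since "#"
# sorts before "A" in ASCII, the first range covers "#" and A through E.
LETTER_RANGES: List[Tuple[str, str, str]] = [
    ("#", "E", "Drive 1"),
    ("F", "G", "Drive 2"),
    ("H", "K", "Drive 3"),
]


def bucket(title: str) -> str:
    """Upper-case first letter of a title, or "#" for anything else."""
    first = title.strip()[:1].upper()
    return first if first.isascii() and first.isalpha() else "#"


def drive_for_title(title: str) -> str:
    """Which drive a movie folder belongs on under the layout above."""
    key = bucket(title)
    for start, end, drive in LETTER_RANGES:
        if start <= key <= end:
            return drive
    raise LookupError(f"no drive assigned for titles starting with {key!r}")
```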
So, back to the SnapRAID setup. Once you have turned off all the balancing options in DrivePool, you are ready to install and configure SnapRAID. This is simple: just install it, then edit the snapraid.conf file. Here is an example from my system, trimmed down to 10 data drives and 2 parity drives for easier reading:
Content Drives are: E, G, M, N, O, P, Q, R, S & T
Parity Drives are Y & Z (these are both network NAS drives)
So in the config file you specify which drives are parity and which are content. In my config, SnapRAID writes a single parity file to the root of each parity drive, and I use a name that makes sense to me. My parity drives are numbered starting at Z and working backwards. My content drives in this example start at E and go upward, and for content files I use the drive letter in the file name. That way I can look at the root directory of any drive at any time and tell which drive letter it is. No guessing in the event of a problem.
SnapRAID only requires one content file, but you can store copies on multiple drives. I write the content file to all drives, just in case. There’s nothing like writing it to 2 drives and having both of those drives fail.
content E:\snapraid.E.content
content G:\snapraid.G.content
content M:\snapraid.M.content
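The parity lines for Y and Z are not part of the listing above. A hypothetical Python helper that prints `parity`/`2-parity` lines (SnapRAID’s keywords for the first and second parity levels) alongside content lines in the same `snapraid.<letter>.content` pattern used here. The parity file names (`snapraid.parity`, `snapraid.2-parity`) follow SnapRAID’s example config, not necessarily the names used on this system:

```python
from typing import List

# SnapRAID's config keywords for parity levels 1 through 6.
PARITY_KEYWORDS = ["parity", "2-parity", "3-parity", "4-parity", "5-parity", "6-parity"]


def parity_lines(parity_letters: List[str]) -> List[str]:
    """One parity line per drive; level 1 goes on the first letter given."""
    if len(parity_letters) > len(PARITY_KEYWORDS):
        raise ValueError("SnapRAID supports at most 6 parity levels")
    return [f"{kw} {letter}:\\snapraid.{kw}"
            for kw, letter in zip(PARITY_KEYWORDS, parity_letters)]


def content_lines(content_letters: List[str]) -> List[str]:
    """A content file on every drive, named after its drive letter."""
    return [f"content {letter}:\\snapraid.{letter}.content"
            for letter in content_letters]


if __name__ == "__main__":
    # Parity numbered from Z backwards, content on E, G, M, ... as in the post.
    for line in parity_lines(["Z", "Y"]) + content_lines(["E", "G", "M", "N"]):
        print(line)
```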
Next you set up the drives/directories you want to protect with parity and give each one a name. I start at “d1” and work my way up. Note that I do not create parity from the root of the drive. Instead, I point each “disk” at the DrivePool PoolPart folder, which is named with a GUID (PoolPart.aaed7454-f99c-4644-8463-cb727acc4eac). I do this because if you lose a drive and replace it, DrivePool creates a new GUID. I then edit the config file to use the new GUID and do a restore/recover operation. This way the restored files are immediately available to DrivePool, since they are restored straight into the new PoolPart folder.
disk d1 E:\PoolPart.aaed7454-f99c-4644-8463-cb727acc4eac
disk d2 G:\PoolPart.1abeb4f3-90e0-4895-82df-50df19529469
disk d3 M:\PoolPart.9cae94bc-8f75-4bf0-8f38-0ff2175a1d8d
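Because the PoolPart GUID changes when a drive is replaced, it can help to regenerate the `disk` lines from whatever PoolPart folder is actually on each drive. A hypothetical sketch: the `{"d1": Path("E:/")}` mapping keeps the d-names fixed to their drives, which matters because SnapRAID identifies disks by these names:

```python
from pathlib import Path
from typing import Dict, List, Optional


def find_poolpart(drive_root: Path) -> Optional[Path]:
    """Return the single PoolPart.* folder at the root of a drive, if any."""
    matches = sorted(p for p in drive_root.glob("PoolPart.*") if p.is_dir())
    if len(matches) > 1:
        raise RuntimeError(f"more than one PoolPart folder on {drive_root}: {matches}")
    return matches[0] if matches else None


def disk_lines(disks: Dict[str, Path]) -> List[str]:
    """Build `disk <name> <PoolPart path>` lines, keeping each name on its drive."""
    lines = []
    for name, root in disks.items():
        poolpart = find_poolpart(root)
        if poolpart is None:
            raise FileNotFoundError(f"no PoolPart folder on {root}")
        lines.append(f"disk {name} {poolpart}")
    return lines
```

After updating the config for a replacement drive, the SnapRAID manual’s recovery procedure runs a fix limited to that disk (along the lines of `snapraid -d d1 -l fix.log fix`); check the manual for the exact steps for your version.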
Next up are the exclusions, the files you want SnapRAID to ignore. I’m using the defaults except for:
exclude \CloudPart.e238a496-4114-4699-9782-d229dafa06ec\
I set autosave so SnapRAID saves its progress every 5000GB processed instead of waiting until the end of the sync:
autosave 5000
Everything else is left at the defaults. Here’s the complete conf file (note: I had to add “.txt” to the file name for it to upload here):
Carlo