As a member of the Truesec incident team, I frequently find myself recovering Microsoft Endpoint Manager Configuration Manager (MEMCM). As the product has evolved, so have the options for recovering the environment.
In my time as a consultant I've seen Hyper-V replicas, SQL Always On, and VMware snapshots defined as the MEMCM recovery plan. However, what happens when those full VM backups are corrupted? What are the minimum requirements to bring Configuration Manager back online, and how do you successfully plan and execute a recovery?
Let's start with understanding the bare minimum required for a successful recovery.
- SQL Backup of the MEMCM Database
- File Level Backup of the CD.Latest Directory
It's a short list, but it's also the honest minimum you need to START the recovery of your environment. You'll also want a copy of your content sources. If you run the built-in MEMCM site maintenance task that executes a backup, you'll find these two things are essentially what the backup task creates, plus some other site settings. However, the built-in backup does suffer from some shortcomings. The backup task tends to run longer than a SQL backup job and tends to impact server availability while it runs.
Obtaining these two items is simple. If you are running the Ola Hallengren maintenance plan in your MEMCM environment, you can use the included SQL backup job, and backing up the CD.Latest directory is as simple as a copy and paste.
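Because the CD.Latest backup is nothing more than a file copy, it is easy to script and schedule alongside the SQL backup. Here is a minimal Python sketch of that copy. The function name `backup_cd_latest`, the timestamped folder naming, and the paths in the usage comment are my own assumptions for illustration, not anything MEMCM provides.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_cd_latest(source: Path, dest_root: Path, stamp: Optional[str] = None) -> Path:
    """Copy the CD.Latest folder into a new, timestamped folder under dest_root."""
    source = Path(source)
    if not source.is_dir():
        raise FileNotFoundError(f"CD.Latest folder not found: {source}")

    # Default to a sortable timestamp so each run lands in its own folder.
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(dest_root) / f"CD.Latest-{stamp}"

    # copytree creates the target (and any missing parents) and raises
    # FileExistsError if the target already exists, so an earlier backup
    # is never silently overwritten.
    shutil.copytree(source, target)
    return target


# Hypothetical usage; adjust both paths to your own environment:
# backup_cd_latest(Path(r"D:\ConfigMgr\cd.latest"), Path(r"\\backupserver\memcm"))
```

Keeping every copy in its own folder means a bad or partial copy never replaces the last good one. How many copies to retain is a separate decision this sketch does not make.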
When you start recovery, remember that the name of the server matters: the recovered primary site server must use the same name as the original. It's also ideal to restore the SQL database to an instance at the same version level. While you can restore to a newer version of SQL, the better practice is to install the same version of SQL, restore your database, and then upgrade SQL. Some common gotchas include:
- System Management ADSI Permissions
- Discovery Configurations
- SQL Certificate Issues (Dependent on environment config)
- Site Server Permissions to Distribution Points (If not also being recovered)
- Client Communications (Certificates)
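The server name and SQL version rules described before this list are easy to verify before you start restoring anything. Below is a minimal Python sketch of such a pre-flight check. The `SiteRecord` type, the `preflight` function, and the version strings are hypothetical, and treating "same level" as "same SQL major version" is my assumption. You would fill in the values yourself from your documentation and from the rebuilt server.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SiteRecord:
    server_name: str  # name of the primary site server
    sql_version: str  # dotted SQL version string, e.g. "15.0.0.0" (made-up value)


def sql_major(version: str) -> int:
    """Return the major version number from a dotted version string."""
    return int(version.split(".")[0])


def preflight(original: SiteRecord, rebuilt: SiteRecord) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # Windows computer names are case-insensitive, so compare without case.
    if original.server_name.casefold() != rebuilt.server_name.casefold():
        problems.append(
            f"Server name mismatch: expected {original.server_name}, "
            f"found {rebuilt.server_name}"
        )

    old, new = sql_major(original.sql_version), sql_major(rebuilt.sql_version)
    if new < old:
        # SQL Server cannot restore a backup taken on a newer version.
        problems.append(f"SQL major version {new} is older than the original {old}")
    elif new > old:
        problems.append(
            f"SQL major version {new} is newer than the original {old}; "
            "prefer restoring to the same version and upgrading afterwards"
        )
    return problems
```

An empty result only tells you these two checks passed. The gotchas in the list above still need to be worked through by hand.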
While doing a recovery is scary, it's also an opportunity to fix things in your environment you otherwise wouldn't have a chance to fix, such as splitting apart the locations where SQL is stored.
When you perform a recovery, start the process the same way you would a new build of Configuration Manager. Begin at the bottom and work your way up. Rebuild SQL first, ensure it's stable and all maintenance is implemented, then move on to recovering the site itself.
This sounds self-explanatory. However, it's important to recognize when the recovery effort is over. When in recovery, you'll be making a large number of changes. These changes often can't wait for things like change control or manager approval. You need to act swiftly and with confidence. As a general rule, you can call recovery completed when:
- Basic OSD works again - don't try to be fancy
- Basic Application installs work - don't stress about complex apps
- Basic Software Update Management works
Recovery is a Marathon
Whatever the reason your environment needs to be recovered, remember it's a marathon, not a sprint. You are attempting to reassemble what was likely years of work in a matter of hours or days. Having a plan for how you intend to run the marathon, and understanding when you can sprint and when you can coast, is vital.