2021.03.08 - Exchange/Hyper-V/MSSql Recovery
And for something different, I try to recover data from a server that won't boot. Also a bonus Netgate SG-1100 recovery!
Intro
Recently there was a rash of particularly severe Microsoft Exchange exploits. A friend of mine has an older Exchange server, and we've been slowly working on getting him switched over to Office365. The project was still solidly in the planning phase when the patches for these exploits were released. We had to reboot his Exchange server to apply the update and decided to reboot the Hyper-V server it was running on as well.
The servers went down and didn't come back up.
We tried to move the VHDX files to another Hyper-V server, and the virtual machines wouldn't even boot. No idea how all three servers managed to die at the same time... our best guess is that it had something to do with a failing hard drive and an unseated stick of RAM. The why wasn't that important to us: we were planning to replace it all soon anyway, and even if we could figure out exactly what messed all the servers up, that wouldn't fix anything. It needed to be fixed, so here are the tools we used and the steps we took to fix things.
The Tools
To start with, here's a quick rundown of all the tools we used for this recovery:
- R-Studio - r-studio.com - Basic file recovery software
- EDB To PST Converter - edbmails.com - Extract mailboxes as PSTs from an Exchange datastore file
- TestDisk - cgsecurity.org - Program for repairing damaged partition tables
- User Profile Wizard - forensit.com - Migrates Active Directory profiles to a new domain
My friend already had a technician license for User Profile Wizard, but it's possible the free version would have been good enough. Had the Active Directory server still been online, I assume Microsoft's own tools could have helped with the migration to Azure AD, but with the local AD server down this tool did a fine job of moving the user profiles to the new domain.
SQL Recovery
The part we spent the longest on, and the part I wanted to document in more detail, was the SQL database recovery. The database had apparently been in the middle of a transaction when the server crashed, and the transaction log file was corrupted. This meant that the normal attach database function in SQL Server would fail. It also meant that there would be some data loss when the database was recovered, with no good way to tell what was lost. We were willing to live with this. If you're not, you should probably get a pro to look at your database.
Step one was to install the same version of SQL Server Express that the server was previously running. Looking at the folder D:\Program Files\Microsoft SQL Server we see a folder in there called MSSQL12. A quick Google search turns up this version table on Wikipedia, and we see that MSSQL12 is SQL Server 2014. Thankfully Microsoft still offers an installer for SQL Server 2014 Express so we can download that here. I installed SQLEXPR_x64_ENU.exe as I already had SQL Server Management Studio installed on my computer.
Once you've got SQL Server Express installed, you need to get the database mounted. The easiest way to do that is to use an apparently undocumented command. First rename the log file to something else, then run this code in SQL Server. Replace TestDB with the name you want the imported database to have, and replace the file name with the path to your database file. This command is copied from mssqltips.com
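If you'd rather not look the version up by hand, the folder-name lookup can be sketched in a few lines of Python. This is only a rough helper, not anything Microsoft ships: the function name `release_from_folder` and the table are my own, and the table only covers the major versions from SQL Server 2008 through 2019. It assumes instance folders are named like MSSQL12 or MSSQL12.SQLEXPRESS.

```python
import re

# Major version number embedded in the folder name -> product release.
# Only 2008 through 2019 are listed; anything else returns None.
SQL_SERVER_RELEASES = {
    "10": "SQL Server 2008",
    "10_50": "SQL Server 2008 R2",
    "11": "SQL Server 2012",
    "12": "SQL Server 2014",
    "13": "SQL Server 2016",
    "14": "SQL Server 2017",
    "15": "SQL Server 2019",
}


def release_from_folder(folder_name):
    """Map a folder such as 'MSSQL12.SQLEXPRESS' to its release name.

    Hypothetical helper: returns None when the folder name does not
    start with MSSQL<number> or the number is not in the table above.
    """
    match = re.match(r"^MSSQL(\d+(?:_\d+)?)(?:\.|$)", folder_name, re.IGNORECASE)
    if not match:
        return None
    return SQL_SERVER_RELEASES.get(match.group(1))
```

Calling `release_from_folder("MSSQL12")` gives back the SQL Server 2014 entry, which matches what the Wikipedia table told us.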
USE master;
CREATE DATABASE [TestDB]
ON ( FILENAME = N'E:\MSSQL\TestDB_1.mdf' )
FOR ATTACH_FORCE_REBUILD_LOG;
GO
Now that the database is attached you'll want to check it for corruption. For that you can run DBCC CHECKDB (TestDB);. Hopefully you won't get any errors, but in my case I got a ton, with a suggestion at the end to run repair_allow_data_loss to attempt to correct them. Obviously running that command can cause data loss, but we had already decided we were OK with that. To run the repair you'll need to put the database in single-user mode, run the repair command, and then change the database back to multi-user mode.
USE master;
ALTER DATABASE TestDB
SET SINGLE_USER
WITH ROLLBACK IMMEDIATE;
GO
DBCC CHECKDB (TestDB, repair_allow_data_loss);
GO
ALTER DATABASE TestDB
SET MULTI_USER;
GO
Hopefully now you can run DBCC CHECKDB (TestDB); without errors. In my case I still got the error "The provided statistics stream is corrupt." To fix the statistics stream errors, my next step was to export the database as one massive file of SQL commands that recreates it from scratch. Right-clicking on the database, I chose Tasks, Generate Scripts. Then under Advanced I changed "Types of data to script" to "Schema and data".
This creates a huge text file that almost nothing wants to open. I was able to open it in SQL Server Management Studio in order to edit the first lines that control where the new database is created. In order to actually run this monstrosity of a file I used sqlcmd -S .\SQLEXPRESS -i TestDB.sql. Fair warning: run this way, there is no error checking whatsoever, so if for some reason the DB creation fails (like a folder doesn't exist) the script will keep going and dump all this data into whatever database it feels like, probably master.
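One way to take some of the risk out of that step is to check the folders before running anything and to pass sqlcmd its -b option, which makes it exit when a batch raises an error instead of carrying on. Here's a minimal Python sketch of that idea. It is not what I actually ran; the helper names (`data_folders`, `missing_folders`, `build_sqlcmd`, `run_script`) are made up for illustration, and it assumes you paste in the first lines of the generated script (the CREATE DATABASE part) as text.

```python
import os
import re
import subprocess
from pathlib import PureWindowsPath


def data_folders(script_head):
    """Folders named by FILENAME = N'...' clauses in the script header."""
    paths = re.findall(r"FILENAME\s*=\s*N?'([^']+)'", script_head, re.IGNORECASE)
    return sorted({str(PureWindowsPath(p).parent) for p in paths})


def missing_folders(script_head, exists=os.path.isdir):
    """Folders the script expects that are not present on this machine."""
    return [folder for folder in data_folders(script_head) if not exists(folder)]


def build_sqlcmd(script_path, server=r".\SQLEXPRESS"):
    """Same command as in the post, plus -b so sqlcmd exits on an error."""
    return ["sqlcmd", "-S", server, "-b", "-i", script_path]


def run_script(script_path, script_head):
    """Refuse to start if a data folder is missing, then run sqlcmd."""
    missing = missing_folders(script_head)
    if missing:
        raise FileNotFoundError(f"Create these folders first: {missing}")
    # check=True raises CalledProcessError if sqlcmd exits with an error code
    subprocess.run(build_sqlcmd(script_path), check=True)
```

The folder check only catches the one failure I hit. The -b option is the more general safety net, since a failed CREATE DATABASE then ends the run before the inserts start.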
Netgate SG-1100 Recovery
In addition to recovering all the data we also wanted to get them set up on a standalone router. They had previously been running pfSense as a virtual machine, which makes remote management of the server basically impossible. Wanting to keep things as simple as possible, we picked a new router that ran pfSense, the Netgate SG-1100. We were able to import the previous pfSense config without too much trouble (the way they have the interfaces laid out on that box is still very odd to me). However, as we were cleaning up the wires we unplugged the power to it and discovered that it really does not like to be hard powered off. (Seems like a design flaw to me...)
The Netgate would boot back up, but only partly, and would hang at the end of the boot process. Unfortunately there is no easy way to reset it to factory defaults (the pinhole on it that you'd expect to reset it only reboots the system). In order to actually factory reset the box from a corrupted config you need the factory image. There is no other way. Other units from Netgate might have a feature to factory reset from the pinhole, but this one does not. (Seems like a design flaw to me...)
Netgate has the directions here: https://docs.netgate.com/pfsense/en/latest/solutions/sg-1100/reinstall-pfsense.html but basically you first need to find the factory image. They say you'll need to open a support ticket. I have no idea what the turnaround time is on their support system, but I assumed it would not be quick, so I just searched the internet for the factory image file name, downloaded it and used that. I searched for "pfSense-plus-SG-1100-recovery-21.02" on Google and found it. This is a bad idea, don't do it. Just open the support ticket and get the official image.
Once you have the image you need a USB drive to write it to. I used Win32 Disk Imager since I already had it installed for writing Raspberry Pi images. You can also use Rufus. You've probably already connected to the console on the router, but if not, now is the time. Plugging into it with a micro-USB cable will add a COM port to your computer. You'll need to connect at 115200 baud.
Power off the router, connect the USB stick to it and power it back up. Interrupt the boot process as early as you can and you should get a Marvell>> prompt. Type run usbrecovery and hit ENTER. The router will appear to boot mostly as normal, just let it happen. When it's done booting it will ask for a destination device and should have something like mmcsd0 already there. Just push ENTER and let it start the installation. When it's done it will say Power cycle or reset to reboot. Remove the USB stick, power cycle the router, and you're done!
-Nick