One of the things I was working on for most of 2012 was mobile security, specifically SE Android.
Before 2012, my exposure to mobile was limited to using smart phones and making minor customizations to third-party ROMs.
That would all change. A group of us started looking at a specific RFP that many believed would be the major mobile entrée into the federal government and military. It wasn't. It turned into a fiasco.
What it did do, however, was solidify many people's belief that hypervisors on mobile platforms were the way to achieve multi-level cross domain access. It convinced me that they couldn't be more wrong.
Since then, my colleagues and I have been on a mission to convince the world that virtualization on mobile for security is the wrong answer. Now that I am out on my own, I want to take this opportunity to share some of my own beliefs on the matter.
First, the CYA part: although these beliefs were formed while I was working elsewhere, I have no reason to believe they are proprietary, and I don't, in any way, represent anyone except myself.
At the beginning I, like everyone else, had no reason to doubt all the vendors out there selling mobile virtualization solutions. After all, they had demos, and they had huge numbers like "deployed on 1.5 billion devices". Many proposal writers would have been satisfied with this and moved on to the next issue. However, because of the insane schedule requirements of this particular RFP, I wanted to make sure there was software available today (which was March 2012) that could be used immediately, on hardware that was already available.
Boy, that is an interesting rabbit hole. Anyone working in mobile knows that information in general is hard to come by. Anecdotally, people working at major mobile device manufacturers sometimes have trouble getting spec sheets for their own phones. Mobile hypervisors are no exception; if anything, they are worse.
Unlike Intel, ARM does not make its own chips. Instead, it licenses chip designs to chip makers. Those makers are, in general, free to pick and choose various capabilities and extended instruction sets, add proprietary instruction sets, and so on. They can even choose the endianness. This does not make for a friendly environment for people who need to work very close to the hardware, like hypervisor developers.
Like Intel, ARM has virtualization extensions, and has had them for quite a while, in fact. ARM also has TrustZone, which is made for DRM but provides a secure world split off from the rest of the system, and can be used for a hypervisor. The challenge at the time was that no phones on the market implemented either of these completely and correctly, or shipped with them enabled. Further, TrustZone configuration is available only to the chip maker. This means that hypervisors built on these technologies quite possibly work on a development board, but not on any phones. Finally, one hardware vendor admitted that it never enables things like TrustZone because it didn't want to get into the key management business. Yay.
Well, there is still paravirtualization, right? Paravirtualization is great, really. Except that it requires modifying the OS; more specifically, it means modifying the drivers. So, on to lesson #2 for the mobile noobs: no one but the chip manufacturers gets driver source code. That includes the people packaging the chips up into boards, the people building phones, the people building the OS for the phones, etc. Some chip manufacturers won't even let Google release binary blobs of their drivers on the website for Nexus devices, despite the fact that Nexus owners can unlock their phones and pull them all off.
One conclusion was that, while someone with deep connections to the chip makers might be able to eventually produce a phone that can run a hypervisor for multiple guest operating systems, it wasn't happening with phones on the market at the time, it wasn't happening within the schedule of this particular RFP, and it certainly wasn't going to be us doing it.
So the belief I went into this with had been broken down. Why did we have that belief at all? Why did we start thinking it was the right answer? We fell into the same trap that everyone else did. The Cross Domain Solution community has forgotten its heritage. Virtualization has become the magic bullet for cross domain access solutions, but why?
Once upon a time there were many trusted operating systems. They were mostly based on their non-trusted siblings. The names were straightforward; you had Trusted Solaris, Trusted IRIX, Trusted Xenix, etc. These were evaluated. You could run applications at multiple levels on them just fine. And people did, just not very many people.
By the end of the UNIX wars, most of the trusted systems still around were Trusted Solaris, but even those weren't in wide use. Most people had settled on Windows for their standard tasks. Windows applications didn't run on Trusted Solaris. In fact, modern Solaris applications didn't run on Trusted Solaris, which was typically a couple of major releases behind.
What to do? Well, SELinux was being developed. Virtualization was becoming possible on x86. The NSA tied the two together and released NetTop. You could now run Windows apps at multiple levels right next to each other on the same system, albeit very slowly. Then there were the various HAP (High Assurance Platform) projects, and other variations. Long story short, people started defaulting to "virtualize for cross domain access" without even thinking.
For mobile this trap can be avoided entirely. You have a platform, and people are using apps written for that platform, not for another platform. You don't need to virtualize to run the apps you want.
Up until now I've been focusing on why hypervisors aren't there yet for mobile, and why they were the default thought anyway, but not much on the security anti-pattern part. So let's get started on that.
Mobile security is an interesting beast. It has all the issues of regular computing and a whole host of its own.
Take power management, for example. On mobile systems you want the main application processor powered down as often as possible. How do you do that while maintaining a nice user experience? You don't want an app sitting in the background constantly polling for email and killing your battery, but you also don't want everything to fire up and download data the instant the user unlocks the device, since that will make it unusable for some amount of time. Google implemented wakelocks for this purpose. They allow apps to tell the kernel that they are doing something and that it shouldn't put the device to sleep. You also want apps to be able to hold wakelocks simultaneously, so that their work overlaps and the system can sleep sooner.
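To make the power argument concrete, here is a minimal sketch that treats each wakelock hold as a (start, end) interval in milliseconds. The helpers are hypothetical and do not model Android's actual wakelock API; they only show why simultaneous holds matter. If apps hold wakelocks at the same time, the processor stays awake for the union of their intervals; if the same work is serialized, awake time is the sum.

```python
def awake_ms(holds):
    """Milliseconds the processor must stay awake, given (start, end) wakelock holds.

    Hypothetical model: the processor may sleep whenever no hold is active,
    so overlapping holds are merged and only their union is counted.
    """
    total = 0
    run_start = run_end = None
    for start, end in sorted(holds):
        if run_end is not None and start <= run_end:
            run_end = max(run_end, end)  # overlaps the current awake run: extend it
            continue
        if run_end is not None:
            total += run_end - run_start  # close the previous awake run
        run_start, run_end = start, end
    if run_end is not None:
        total += run_end - run_start  # close the final awake run
    return total


def serial_awake_ms(holds):
    """Awake time if the same jobs are forced to run one after another."""
    return sum(end - start for start, end in holds)


if __name__ == "__main__":
    # Two hypothetical 100 ms jobs, e.g. an email sync and a weather refresh.
    jobs = [(0, 100), (50, 150)]
    print(awake_ms(jobs))         # 150: overlapping holds share awake time
    print(serial_awake_ms(jobs))  # 200: the same work done serially
```

The difference between the two numbers is what cooperating wakelock holders save on battery.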
Now, instead of just reaching the kernel beneath the apps, those wakelocks would need to make it all the way down to the hypervisor. Further, without sharing wakelocks between VMs, you'll get the same power-draining serial usage of processor awake time.
See what I did there? Automatic information flow between VMs, whether you share wakelocks or not. If you don't share them and VM1 is doing something, it will keep the processor awake, and VM2 will know. If you do share them and VM1 is doing something, VM2 will know.
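The leak can be shown with a toy model. There is one processor, and it is either awake or asleep, regardless of how wakelocks are plumbed between VMs. In the hypothetical sketch below (not modeled on any real hypervisor), a sender VM holds a wakelock during a time slot to signal a 1 and releases it to signal a 0. An idle receiver VM holds no lock at all and only observes whether the processor slept; a real receiver would have to infer this indirectly, which this model simplifies away.

```python
from dataclasses import dataclass, field


@dataclass
class PowerState:
    """Hypothetical hypervisor view: the SoC stays awake while any VM holds a wakelock."""

    holders: set = field(default_factory=set)

    def acquire(self, vm):
        self.holders.add(vm)

    def release(self, vm):
        self.holders.discard(vm)

    def is_awake(self):
        return bool(self.holders)


def leak_bits(bits):
    """Send bits from vm1 to vm2 using nothing but the shared awake/asleep state."""
    power = PowerState()
    received = []
    for bit in bits:
        # Sender (vm1): hold the wakelock for a 1, release it for a 0.
        if bit:
            power.acquire("vm1")
        else:
            power.release("vm1")
        # Receiver (vm2): holds no lock, only observes whether the SoC slept this slot.
        received.append(1 if power.is_awake() else 0)
    return received


if __name__ == "__main__":
    print(leak_bits([1, 0, 1, 1, 0]))  # [1, 0, 1, 1, 0]
```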
The trusty old high-assurance MILS way of fixing this? Fixed time-slice scheduling: give each VM X amount of time, and move on to the next when its time is up. In this world, how will the hypervisor know when to sleep? How do you handle talking on the phone, a somewhat real-time activity, if you are cycling through VMs? Finally, what is someone going to do with a phone that has a one-hour battery life because its VMs can't cooperate?
So you have to choose between punching a hole in the hypervisor and getting miserable power performance.
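The latency side of fixed time-slice scheduling is easy to sketch. Assuming hypothetical parameters (equal slices, VMs served in strict round-robin order), the function below computes how long an event for a given VM, such as an incoming voice frame, waits before that VM gets the processor. The worst case is (num_vms - 1) * slice_ms, and because every slot is handed out whether or not its VM has work, the schedule itself gives the hypervisor no signal that it could sleep.

```python
def slot_owner(t_ms, num_vms, slice_ms):
    """Index of the VM that owns the processor at time t under fixed round-robin slices."""
    return (t_ms // slice_ms) % num_vms


def wait_for_cpu(vm, t_ms, num_vms, slice_ms):
    """Milliseconds an event for `vm` arriving at time t waits until that VM's slice starts."""
    if slot_owner(t_ms, num_vms, slice_ms) == vm:
        return 0  # the VM is already running
    frame_ms = num_vms * slice_ms  # one full pass over every VM
    slot_start = (t_ms // frame_ms) * frame_ms + vm * slice_ms
    if slot_start <= t_ms:
        slot_start += frame_ms  # this VM's slot in the current pass has already gone by
    return slot_start - t_ms


if __name__ == "__main__":
    # Hypothetical phone: 3 VMs, 10 ms slices. A call event for VM 0 arriving at t=10 ms
    # lands at the start of VM 1's slice and must wait through VM 1 and VM 2.
    print(wait_for_cpu(0, 10, 3, 10))                         # 20
    print(max(wait_for_cpu(0, t, 3, 10) for t in range(30)))  # 20, i.e. (3 - 1) * 10
```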
I already mentioned the issue with drivers. The current solution for hypervisors is to pass device access straight through to one (or more) of the VMs.
This is pretty standard for all virtualization-based cross domain solutions, actually. Hypervisors are supposed to be small, so they can't have a graphics subsystem. Using an IOMMU, they assign a single VM to manage graphics (and nothing else) and then assign the DMA memory range of the device to that VM. No other VM can access that memory, so you then build one-way pipelines from all the VMs to the graphics VM and let it do compositing, etc.
As a side note, none of the solutions I know of actually have shared 3D acceleration; if you need 3D acceleration, you have to lock a single VM to a video card.
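The device-assignment pattern described above can be modeled in a few lines. This is a toy sketch with hypothetical names (DeviceAssignment, OneWayPipe), not a real hypervisor or IOMMU interface; in real systems the IOMMU enforces the restriction in hardware by limiting which memory the device can reach through DMA, not through software checks like these. Each device has exactly one owning VM, and every other VM reaches the graphics VM only through a pipe it can write but not read.

```python
from collections import deque


class DeviceAssignment:
    """Toy model of hypervisor device assignment: each device belongs to exactly one VM."""

    def __init__(self):
        self._owner = {}  # device name -> owning VM

    def assign(self, device, vm):
        if device in self._owner:
            raise ValueError(f"{device} is already assigned to {self._owner[device]}")
        self._owner[device] = vm

    def may_access(self, vm, device):
        return self._owner.get(device) == vm


class OneWayPipe:
    """Frames flow from one user-facing VM to the graphics VM and never back."""

    def __init__(self, source_vm, sink_vm):
        self.source_vm = source_vm
        self.sink_vm = sink_vm
        self._frames = deque()

    def send(self, vm, frame):
        if vm != self.source_vm:
            raise PermissionError(f"{vm} cannot write to this pipe")
        self._frames.append(frame)

    def receive(self, vm):
        if vm != self.sink_vm:
            raise PermissionError(f"{vm} cannot read from this pipe")
        return self._frames.popleft() if self._frames else None


if __name__ == "__main__":
    devices = DeviceAssignment()
    devices.assign("gpu", "graphics_vm")
    print(devices.may_access("secret_vm", "gpu"))  # False
    pipe = OneWayPipe("secret_vm", "graphics_vm")
    pipe.send("secret_vm", "frame-1")
    print(pipe.receive("graphics_vm"))  # frame-1
```

Every pipe ends in the graphics VM, which is what makes that VM, rather than the hypervisor alone, responsible for keeping the levels apart.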
So, back to mobile. You might be able to find an SoC with a working IOMMU, but I wouldn't count on it. I could find nothing of the sort back then.
This same method is used for all kinds of devices, such as networking. With networking, however, you now have two-way communication. High-assurance systems will need separate network cards, each assigned to a single VM. How do you do this on a mobile device, where there is generally one cellular radio? Typically the idea is that you have a separate encryption VM for each user-facing VM.
How many device VMs are you willing to have? How many paths through the hypervisor do you open? How much security do you end up with when every device is shared between all VMs? What is the performance and power penalty if a simple action, like downloading some data from a website, storing it to the filesystem and displaying a message on the screen, involves no fewer than four VMs?
In the end you have VMs that are trusted to do the separation between user-facing VMs, instead of the hypervisor doing that alone. This means that all those assurance arguments, and the fact that your hypervisor has only 10k lines of code, are irrelevant. You've just added a bunch of Linux kernels to your trusted base.
In desktop virtualization-based solutions, these device-managing VMs were typically SELinux systems. SE Android could be used similarly to lock down these systems, and at the time it was just being released. I figured that since it had to be done anyway for the hypervisor use case to ever work, I might as well focus on it. At least then there would be a secure platform that could be used until the whole hypervisor thing got resolved. At this point my opinion is that even if mobile hypervisors are available and working, they aren't worth it, but I know many who disagree.
Security is always at odds with functionality, by definition. Mobile users have fairly high expectations of their devices. One example is the ability to receive phone calls. Personally, I've had bad experiences with my smart phones receiving phone calls even without extra security, so this is a serious uphill battle.
So, if a user is using a secret-level VM, and his child's school calls on the unclass VM to tell him that his child has been hurt and is going to the hospital, should he be able to receive that call? I don't think many people would want to use these devices if the answer is no, but what does saying yes entail? More holes punched through the hypervisor? Probably, but you've already made plenty of those.
How about the baseband processor? Oh, right, I haven't mentioned that yet. Pretty much every smart phone has a whole 'nother processor, operating system and software stack connected to the radio. Sometimes this processor is actually primary (it bootstraps and starts the application processor).
So, on some smart phones the baseband processor is connected directly to the microphone. Obviously, while the secret VM is in use, access to the microphone must be restricted. What does this mean? There must be a VM managing communication with the baseband processor, and it is going to have to make the hard decision of whether to allow the microphone to be used while a secret VM is active. This, obviously, opens up all sorts of serious issues.
There are many other examples of this. Switching between five VMs to check email isn't going to make anyone happy. Not getting text messages from the wife while at work may not make your home life better.
You'll notice that the title of this entry specifically mentions user-facing VMs on mobile devices. I do not object to using a hypervisor in conjunction with a specialized integrity measurement and monitoring system for attestation purposes. The device access such a VM needs would be minimal, and it would only need to do something when an attestation request was made, so it has no reason to wake up the phone. It also doesn't add many VMs that need to be scheduled constantly.
I do object to the sentiment that a hypervisor is required for multi-level cross domain access on a mobile device. I believe the sentiment is guided by where the desktop multi-level market landed, without the requirements that got it there. Additionally, I believe that by the time you have a functional implementation, you won't have any more assurance than you would running an SE Android system that implements multi-level access controls. On top of that, the system will be significantly less usable, battery life and performance will suffer, and you'll never get away from having to modify the underlying OS anyway.
SE Android is being developed at a rapid pace and new features are added often. I can't wait to see where it goes, but I hope the hypervisor evangelists aren't able to take the steam out of it before it gets there.