Security Anti-Pattern: Path based access control
For those of you who missed my other Security Anti-Pattern post, an anti-pattern is a commonly reinvented bad solution to a problem. There are many of these in security, but one that seems to be occurring quite often these days is path based access control: an access control system that uses file paths to refer to objects. To the uninitiated this may seem like a good idea at first; hopefully this blog entry will eradicate such beliefs. Apologies in advance for the length of this post.
Not all objects are files
The most obvious problem with this is that not all objects are files and thus do not have paths. The UNIX credo that everything is a file isn’t entirely true. Network resources, processes, some types of IPC, etc. are not files and thus do not have paths. Path based access control systems either do not address these types of objects at all or do not address them well. The inability to mediate access to these kinds of objects is a direct result of the path based design.
Paths are ambiguous identifiers for files
Next, the pathname of any given file can be ambiguous. On most filesystems an inode represents a file and this inode can have any number of names. The inode is not associated with those names; rather, the names are entries in a directory, and it is these directory entries that refer to the inode. This means that you can refer to an inode in many ways. Since the policy specifies access by path, each one of these names can have different permissions granted. The result is that the same subject (the active entity of an access) can have different access to the same object (the target of an access) based on how it is accessed. For example, a user’s shell might only be able to read /etc/hosts but at the same time be able to write to a hard link of that file at /home/user/chroot/etc/hosts. By writing to /home/user/chroot/etc/hosts he is also writing to /etc/hosts, which is clearly wrong. This is analogous to your door being locked when a burglar walks up to it but being unlocked if he walks up to it facing backward.
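To make the hard link case concrete, here is a minimal Python sketch. It assumes a POSIX filesystem that supports hard links; the file names, the `path_policy_allows` helper and the toy policy dictionary are hypothetical stand-ins for a path based policy, not any real system's API.

```python
import os
import tempfile


def path_policy_allows(policy, path, perm):
    """Toy path based check: permissions are looked up by name only."""
    return perm in policy.get(path, set())


def demo_hard_link_ambiguity():
    with tempfile.TemporaryDirectory() as root:
        protected = os.path.join(root, "hosts")
        alias = os.path.join(root, "hosts-alias")
        with open(protected, "w") as f:
            f.write("127.0.0.1 localhost\n")
        os.link(protected, alias)  # a second name for the same inode

        # Hypothetical policy: read-only by one name, read-write by the other.
        policy = {protected: {"read"}, alias: {"read", "write"}}

        same_object = os.path.samefile(protected, alias)
        write_via_protected = path_policy_allows(policy, protected, "write")
        write_via_alias = path_policy_allows(policy, alias, "write")

        # A write through the alias is visible through the protected name.
        with open(alias, "a") as f:
            f.write("10.0.0.1 injected\n")
        with open(protected) as f:
            content = f.read()
    return same_object, write_via_protected, write_via_alias, content
```

Both names resolve to one object, yet the toy policy gives two different answers for the same write, depending only on which name was used.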
Additionally, on some OS’s, an inode can be referred to differently when in a chroot, in a private namespace, on a bind mount or via a relative path. The result is that different objects can be referred to using the same path. For example, if you have a chroot in /var/chroot that contains an /etc/shadow file, then inside the chroot that file has the same path as the /etc/shadow file on the root filesystem. If an application is to be confined to a chroot and needs to access /etc/shadow, it will also get access to the real shadow file via the same policy. This is clearly a problem. Suppose I own a bright red Corvette: my car key can start my car, but because of the way the locks and keys were designed I can also start any other bright red Corvette. Further, an inode can have zero names if it has been unlinked but is still open, or if it is not accessible in the current namespace. In cases like these the path based access control system may not be able to enforce access controls on the object.
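The zero-name case is easy to reproduce. The sketch below assumes POSIX unlink semantics (removing a name while the file is still open); the function name is made up for illustration.

```python
import os
import tempfile


def demo_unlinked_open_file():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"secret")
        os.unlink(path)                   # remove the file's only name
        links = os.fstat(fd).st_nlink     # typically 0 on local POSIX filesystems
        name_exists = os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 6)             # the object is still readable via the descriptor
    finally:
        os.close(fd)
    return links, name_exists, data
```

After the unlink there is no path left for a path based rule to match, but the open descriptor still reaches the data.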
Path based policy is not analyzable
Because the paths to files are ambiguous, there is no way to determine what kind of access and interaction can happen by analyzing the policy alone. When analyzing a security policy one must know unambiguously which subjects can access which objects. Without knowing this it is impossible to answer something seemingly simple like ‘What processes can access my /etc/shadow file?’ While the policy will have rules that say process X can read /etc/shadow, the analysis cannot determine what other names /etc/shadow can be referred to by. If a hard link to /etc/shadow exists at /var/chroot/etc/shadow then my policy analysis can’t reliably determine which processes can actually access that object. The result is that much of the policy ends up being encoded in the filesystem. This has major disadvantages: it isn’t possible to analyze the policy separately from the filesystem, and it may be impossible to analyze it even if you have the filesystem, since bind mounts, chroots and namespaces are set up dynamically at runtime and the filesystem can change underneath the analyzer on a running system. The point is that with path based access control systems you can’t be sure that the policy you are using is actually being enforced the way you intended.
Information flow is important
The alternative to path based access control is label based access control. In label based access control all objects have a single, unambiguous label that identifies them and allows unambiguous policy to be written for them. Historically, label based access control was used with Multi-Level Security (MLS) systems, a security model primarily for government use. Because MLS systems were used to limit information flow between levels, many path based access control advocates claim that all label based access control systems are primarily used for controlling information flow. Information flow is simply the way a piece of information can flow through a system. A simple example: Process A can write to file X, and file X can be read by Process B, so there is said to be an information flow from Process A to Process B. Path based access control advocates, then, claim that information flow does not pertain to them, because they aren’t trying to make government grade MAC but instead simply want to confine some processes on the system. What they really mean to say is that they aren’t interested in confidentiality (e.g., keeping secret data secret) but are instead interested in system integrity. The main point that is being missed is that information flow is as relevant to integrity as it is to confidentiality.
Consider a service running on a system, such as a database. Ideally this database would be considered to be high integrity. The database would need to have its resources protected, including the databases themselves, config files and interfaces to the server. A path based access control advocate might say that they simply need to write a policy for the database server to access these resources while preventing anything else from accessing them. This is only half the story, however. To truly know that the database server has high integrity you have to know not only what can directly affect it (by modifying its config files, connecting to it, etc.) but also what can indirectly affect it. If an untrusted (low integrity) user can write to a file that a trusted database client can read, then the untrusted user can affect the integrity of the database server. To make this more tangible, say a database backup script runs nightly by connecting to the database, locking and querying each database and dumping the data to a backup file. This backup script is a trusted (high integrity) user of the database. The backup script has a config file that tells it which databases to back up. The backup program must write an input file for the database that contains the set of queries for each one of those databases. Naturally it writes the input file to /tmp to send to the database server (a hypothetical example of course, but we all know there are many applications that do very dumb things like this). An untrusted user has the ability to put a file which he owns where the database backup script would normally put it, so that he can modify it before it gets sent to the database server. The information flow here is untrusted user -> database backup tmp file -> database backup program -> database. This indirect flow is impossible to find in path based access control policies. The ‘we don’t care about information flow’ argument is totally bogus.
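The kind of analysis being described amounts to reachability over a graph of permitted flows. Here is a minimal sketch, assuming every node name identifies exactly one subject or object (which is what labels provide and paths do not). The node names and the `FLOWS` table are hypothetical and only mirror the backup example.

```python
from collections import deque

# Hypothetical permitted flows: an edge A -> B means information can move
# from A to B (A writes B, or B reads A).
FLOWS = {
    "untrusted_user": ["backup_tmp_file"],
    "backup_tmp_file": ["backup_script"],
    "backup_script": ["database"],
    "db_admin": ["db_config"],
    "db_config": ["database"],
}


def sources_that_reach(flows, target):
    """Return every node with a direct or indirect flow into target."""
    reverse = {}
    for src, dsts in flows.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)

    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for src in reverse.get(node, []):
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return seen
```

Calling `sources_that_reach(FLOWS, "database")` includes `untrusted_user` even though no rule connects that user to the database directly. The result is only trustworthy if the node names are unambiguous.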
Shared directories
Speaking of /tmp, another problem arises with path based access control on shared directories like this. Namely, since path based access control relies on files having particular names, it can’t protect pathnames that it doesn’t know about. Many programs write files to /tmp. The inability to limit interactions between programs in /tmp is a major pitfall of path based access control systems. Even when an application writes to predictable filenames, that isn’t sufficient. Consider SSH: it writes files to /tmp/ssh-???????. So you can just use a glob to refer to all those, right? Well, the problem is that each of those files might be owned by a different user’s ssh session. If a user can access their own /tmp/ssh-??????? file, with this policy they can also access anyone else’s. Unfortunately such schemes rely on discretionary access control (DAC), i.e. the standard Linux permissions, to separate users in shared directories. This is entirely insufficient if you want to make any assurance about the integrity of any application using a shared directory.
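A small sketch of the glob problem, using Python's `fnmatch` as a stand-in for a policy language's pattern matching (real policy languages may define globs differently). The file names and owners are made up.

```python
import fnmatch

RULE = "/tmp/ssh-???????"  # pattern from the text; "?" matches one character

# Hypothetical files in the shared directory and the users who own them.
TMP_FILES = {
    "/tmp/ssh-aB3dE9x": "alice",
    "/tmp/ssh-Zq81LmN": "bob",
    "/tmp/other-file": "carol",
}


def owners_covered(rule, files):
    """Return the owners of every file that the single path rule matches."""
    return sorted(
        owner for path, owner in files.items()
        if fnmatch.fnmatchcase(path, rule)
    )
```

One rule covers both alice's and bob's files, and nothing in the pattern can tell the two sessions apart.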
Directories aren’t security equivalence classes
In a previous post I talked about equivalence classes. Equivalence classes give us the ability to specify a group of objects which are identical from an access control point of view. Path based access control systems also use equivalence classes to group objects together; unfortunately their grouping mechanism is already used by someone else, the user. Directories on most systems are used for organizational purposes. They group files together that represent similar ideas to the user, not necessarily to the security system. Take /etc for instance. This directory has many files depending on the system, and chances are they aren’t equivalent security-wise. The /etc/passwd file is probably more important than the /etc/services file, but even more important is /etc/profile, which could be used to hijack administrator logins. The passwd program needs to be able to write to /etc/passwd but should not be able to touch either /etc/services or /etc/profile. This is easy to accomplish with path based access control, since you can specify a single file. Now consider the converse: an application needs access to individual files spread across many directories. If the application needs access to /bin/ping, /usr/sbin/traceroute, /sbin/ifconfig, /usr/sbin/tracepath and /usr/local/bin/ping2, the rules would have to be repeated for every one of them instead of merely labeling them as equivalent; any changes in access would also need to be propagated to them all.
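The difference in bookkeeping can be sketched as follows. The subject name `netmon`, the label names and the rule tuples are hypothetical. The `LABELS` dictionary keyed by path only stands in for the labeling step; in a real label based system the label is stored with the object itself, not looked up by name.

```python
NET_TOOLS = [
    "/bin/ping",
    "/usr/sbin/traceroute",
    "/sbin/ifconfig",
    "/usr/sbin/tracepath",
    "/usr/local/bin/ping2",
]

# Path based: one rule per file, each of which must be kept in sync.
PATH_RULES = {("netmon", path, "execute") for path in NET_TOOLS}

# Label based: the files share one label, so one rule covers the class.
LABELS = {path: "netutil_exec" for path in NET_TOOLS}
LABELS["/etc/passwd"] = "passwd_file"
LABEL_RULES = {("netmon", "netutil_exec", "execute")}


def label_allows(subject, path, perm):
    """Check access using the object's label rather than its name."""
    return (subject, LABELS.get(path), perm) in LABEL_RULES
```

Changing what `netmon` may do with these tools means editing one label rule instead of five path rules.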
User directories
User directories are another challenge. Suppose I want my users to be able to download files from the internet into their home directories but not execute them, while still being able to execute files that they build themselves. With path based access control this is not possible. The workaround is to force them to download files into a downloads directory, but forcing the user to act a certain way because of a limitation in your access control system negates the ‘ease of use’ argument that I’ll talk about later. With label based access control the browser, running in its own domain, can write files to the home directory and they will get labeled as browser downloaded files, while files the user unpacks and builds himself would have different labels. Thus label based access control provides more flexibility, while using directories as equivalence classes, as path based access control does, restricts it.
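A toy model of labeling at creation time, assuming a new file's label is derived from the domain of the process that creates it. The domain names, label names and helper functions are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping: creating domain -> label given to new files.
CREATE_LABEL = {
    "browser": "downloaded_file",
    "user_shell": "user_file",
}

# Hypothetical rule: only files with these labels may be executed by the user.
EXEC_ALLOWED = {"user_file"}


@dataclass
class File:
    path: str
    label: str


def create(domain, path):
    """Create a file whose label depends on who created it, not on where."""
    return File(path=path, label=CREATE_LABEL[domain])


def may_execute(f):
    return f.label in EXEC_ALLOWED


download = create("browser", "/home/user/tool")
built = create("user_shell", "/home/user/mytool")
```

Both files live in the same home directory. Only the label, set by the creating domain, separates the download from the user's own build.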
Lack of object tranquility
Path based systems also do not allow for object tranquility. That is, when an object is moved or renamed the security attributes of that object change. Why does it make sense for the object to suddenly have different security attributes after being moved? It is still the same object. Lack of object tranquility introduces a load of problems ranging from races and unintended information flow to revocation issues on Linux. For comparison, Linux permissions (the file mode) are always preserved on move or rename, because these permissions are associated with the inode and not with the path. I would talk about object tranquility more but a colleague of mine, Spencer Shimko, already has a good write-up here (part 1) and here (part 2).
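The contrast between the file mode and a name-keyed rule can be sketched like this. It assumes a POSIX filesystem; the file names and the toy policy dictionary are hypothetical.

```python
import os
import stat
import tempfile


def demo_rename_changes_path_decision():
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "private.key")
        new = os.path.join(root, "notes.txt")
        with open(old, "w") as f:
            f.write("key material")
        os.chmod(old, 0o600)

        # Hypothetical path policy: nothing by the old name, read by the new one.
        policy = {old: set(), new: {"read"}}

        before = os.stat(old)
        readable_before = "read" in policy.get(old, set())

        os.rename(old, new)  # same object, new name

        after = os.stat(new)
        readable_after = "read" in policy.get(new, set())

        same_inode = before.st_ino == after.st_ino
        same_mode = stat.S_IMODE(before.st_mode) == stat.S_IMODE(after.st_mode)
    return same_inode, same_mode, readable_before, readable_after
```

The inode and its mode bits survive the rename unchanged, while the answer from the name-keyed policy flips.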
Standard Linux DAC has a serious issue in that any application a user runs can change the permissions on files that user owns. For example, the user’s web browser can make their .gpg directory world readable and DAC is unable to address that. Path based access control has this problem as well. If an application is able to rename a file (or make a new hard link to it) it is essentially changing the permissions on that file, since permissions are determined by the name of the file. Granted, this can be addressed by not allowing applications to rename or hard link files, but this imposes additional, unnecessary limitations on the users and applications.
Binaries aren’t processes
A common misconception of path based access control advocates is that the binary on the filesystem is the same thing as the process. This is exemplified by the fact that typical policies specify a binary as the subject of a rule. The main problem with this belief is that it doesn’t take into consideration who is running the executable before deciding what access it should have. This is very problematic for user applications. If user A and user B both run their mail client in such a configuration, the mail client has access to both users’ mail directories in their home directories. For example, in the AppArmor evolution policy the ‘subject’ /opt/gnome/bin/evolution-2.4 is given access to read and write /home/*/.evolution/mail/*. Notice that rather than having access to read only the email of the user that ran the client, the mail client is able to read (and write) all users’ email. Once again these types of systems are relying on DAC to separate users. On label based systems each process runs with its own label, which is determined both by the label of the subject executing the binary and the label of the binary itself. This means that the evolution processes for different users will have different labels and thus will only be able to access the corresponding user’s home directory. Some path based access control systems address this pitfall with per-role rules: the binary is still specified as the process, but each role can be given access to only its own files. Such systems still suffer from the other disadvantages.
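A rough sketch of the difference. Python's `fnmatch` approximates the path rule (AppArmor's own glob semantics may differ), and the `process_label` naming scheme and `LABEL_RULES` table are hypothetical, not taken from any real policy.

```python
import fnmatch

PATH_RULE = "/home/*/.evolution/mail/*"  # pattern quoted in the text


def path_rule_allows(path):
    """Binary-as-subject rule: the same answer no matter who runs the client."""
    return fnmatch.fnmatchcase(path, PATH_RULE)


def process_label(user_label, binary_label):
    """Hypothetical transition: the process label depends on both inputs."""
    return f"{user_label}_{binary_label}"


# Hypothetical label rules: (process label, object label) pairs allowed.
LABEL_RULES = {
    ("alice_evolution", "alice_mail"),
    ("bob_evolution", "bob_mail"),
}


def label_allows(user_label, binary_label, object_label):
    return (process_label(user_label, binary_label), object_label) in LABEL_RULES
```

The path rule matches `/home/alice/.evolution/mail/inbox` and `/home/bob/.evolution/mail/inbox` alike, whereas `label_allows("alice", "evolution", "bob_mail")` is false because alice's process carries a different label from bob's.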
New files and security attributes
One ease of use claim for path based systems is that files are always created with the appropriate security attributes, since those attributes are based on the name of the file. This is problematic for two reasons. First, remember the web browser download example earlier? Path based systems can’t differentiate between files downloaded by the browser and files put there by the user (without changing the user’s behavior patterns). Second, and more important from a security perspective, you don’t want a file getting privileged security attributes if an inappropriate process creates the file. Consider the database race example. It could occur because the name of the file in /tmp was itself the security attribute that determined the backup script’s access. Similarly, if a user has the ability to create an /etc/shadow file, that file should not be accessible by login, ssh or anything else. It is a major security problem when applications can create objects with permissions or privileges beyond those granted to the original application, if not explicitly allowed to.
Ease of use arguments are bogus
Finally, most path based access control advocates claim that the main advantage of their system is ease of use. At first this can seem reasonable. Yes, UNIX administrators are already familiar with pathnames, and it’s very tempting to want to use pathnames for security, but it’s a fool’s errand, because pathnames don’t give you the right tool for the job.
Being easy doesn’t make it right. And ironically, when the pathname-based solutions attempt to make it right, it gets less and less easy. For example, I once talked to an author of a privately used path based access control system and asked how they handled the /tmp issue. Their solution was to patch all their apps to write to predictable filenames and use their access control system to prevent those apps from reading/writing to filenames that didn’t match the pattern. I was dumbfounded. How can one claim to have a usable system when the apps have to be changed to work around its limitations? Obviously this doesn’t scale. Another example of this sort of workaround is from Crispin Cowan, in a presentation claiming that an easy way to restrict individual users is to hard link /bin/bash to /bin/fubash, write a restrictive policy for /bin/fubash and set /bin/fubash as the login shell of the restricted user. Without even going into how incredibly easy this is to bypass, this is a very awkward and unscalable workaround that in my opinion isn’t at all easy to use or understand.
Convinced?
There are many issues with using path names in access control systems. While the inability to unambiguously determine the access between a subject and an object is a fundamental flaw, there are also numerous technical and usability pitfalls, such as shared directories being unusable, home directories being difficult to isolate and the inability to analyze the policy. I hope this article has helped to convince the unbelievers out there that using paths in access control systems is indeed a very bad idea.
Note: I’ve deliberately chosen not to go into detail about specific OS or MAC implementations but instead make this a general article about the use of path names in access control systems. Aside from some minor examples this should apply to most implementations. There are other resources about the specific issues with path names in certain OS’s, probably mostly mailing lists.