How dangerous is image file metadata?
Unless your digital camera or camera-equipped cellphone is more than fifteen (15) years old, chances are good that any pictures taken with that device contain metadata, which describes the conditions (who, what, where, when and how) under which the picture was taken. The metadata is stored with the picture in an image file, and goes everywhere the file is copied, uploaded or downloaded. This metadata is meant to help us document the moment a picture was taken, and also to maintain the fidelity of edited or printed copies. But as discussed in my article on Augmented Reality, once an image file containing metadata leaves your possession, there are a variety of ways in which that same metadata can be used against you.
So, just how dangerous is image file metadata? In the past, there have been numerous discussions, examples and demonstrations of how much useful information can be extracted from image files. But publicly, nobody has admitted performing a risk assessment of image file metadata. I suspect that such assessments already exist, but are probably classified as CUI (Controlled Unclassified Information). And so, I have undertaken the task of performing a qualitative risk assessment to answer the question.
For this assessment, our Information Assets are the usable pieces of information that can be derived from the metadata contained in an image file. The amount of metadata contained in any particular image file varies, and depends upon such factors as the make & model of the equipment used to capture or create the image, and whether or not the image has been transformed by other equipment or computer applications, such as Photoshop.
Using the following source references, a baseline of 296 metadata “tags” was selected for use in the assessment.
- JEITA CP-3451, Exif Version 2.2, April 2002
- AWare Systems on-line, TIFF Tag Reference
- Image Property Tag Constants
- Adobe Developers Association, TIFF Revision 6.0 Final, June 1992
- JEIDA-49-2-1998, Design Rule For Camera File System Version 1.0, December 1998
These 296 tags represent the most common pieces of metadata that are likely to be embedded into any JPEG or TIFF image file, or variants thereof, that was created during the last fifteen years. Of the 296 tags used in the baseline, 221 are “public” and 75 are “private.” Public tags are those with standardized meanings and use. Private tags are those which have non-standard meanings and use, and are usually proprietary to a particular equipment or software company.
Each metadata tag relates to a particular aspect or characteristic of the image file and its contents, and it is from these tags that useful information can be derived. Metadata tags can be categorized as being one of the following seven (7) types: Date & Time, Environment, Subject, Media, Primary Image, Product, and Thumbnail.
- Date & Time, relates to when an image was first captured or last modified.
- Environment, relates to physical conditions such as lighting, camera angle and location, that were present during image capture.
- Subject, relates to identification of the persons or objects that appear in the image.
- Media, relates to technical details concerning how information is stored; both internal to the image file, and external, if the image file is stored as part of a media storage system.
- Primary Image, relates to technical details concerning the actual primary image.
- Product, relates to identification of equipment or software used to capture or process the image.
- Thumbnail, relates to technical details concerning any thumbnail or secondary images.
The tags in each metadata type category were then ranked by their “raw harm potential.” Raw harm potential is the amount of damage that could happen if a particular piece of metadata were disclosed, intentionally or unintentionally, to an adversary. For this assessment, there are four (4) levels of harm: Great, Certain, Possible, and Negligible.
- Great harm, implies that an adversary can cause immediate harm with the disclosed metadata.
- Certain harm, implies that an adversary can cause harm with the disclosed metadata, but only after it has been subjected to some kind of analysis & interpretation.
- Possible harm, implies that an adversary may be able to cause harm with the disclosed metadata, but only after a tremendous amount of analysis & interpretation has taken place.
- Negligible harm, implies that it is unlikely that an adversary can do anything to harm you with the disclosed metadata.
Whenever metadata is subjected to analysis and interpretation, it is transformed into derived information. Derived information does not necessarily correspond to any particular piece of metadata. It is a composite result of the analysis and interpretation of one or more pieces of metadata. For this assessment, there are three (3) types of derived information: Private, Personal, and Evidentiary.
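The seven tag types and four harm levels can be written down as a small data model. The following is only a minimal sketch: the names TagType, Harm, BaselineTag and rank_by_harm are illustrative and do not come from the assessment, and the harm levels given to the "Make" and "DateTimeOriginal" sample entries are assumptions made purely to show the ranking step.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class TagType(Enum):
    """The seven metadata tag types used in the assessment."""
    DATE_TIME = "Date & Time"
    ENVIRONMENT = "Environment"
    SUBJECT = "Subject"
    MEDIA = "Media"
    PRIMARY_IMAGE = "Primary Image"
    PRODUCT = "Product"
    THUMBNAIL = "Thumbnail"


class Harm(IntEnum):
    """Raw harm potential; a larger value means more potential harm."""
    NEGLIGIBLE = 0
    POSSIBLE = 1
    CERTAIN = 2
    GREAT = 3


@dataclass(frozen=True)
class BaselineTag:
    name: str          # tag name, e.g. an Exif tag name
    tag_type: TagType  # one of the seven categories
    harm: Harm         # raw harm potential assigned by the assessor
    public: bool       # True for standardized tags, False for private tags


def rank_by_harm(tags):
    """Return the tags ordered from most to least raw harm potential."""
    return sorted(tags, key=lambda tag: tag.harm, reverse=True)


# Illustrative entries only. The harm levels for Make and DateTimeOriginal
# are assumptions, not values taken from the 296-tag baseline.
sample = [
    BaselineTag("Make", TagType.PRODUCT, Harm.POSSIBLE, public=True),
    BaselineTag("GPSLatitude", TagType.ENVIRONMENT, Harm.GREAT, public=True),
    BaselineTag("DateTimeOriginal", TagType.DATE_TIME, Harm.CERTAIN, public=True),
]

for tag in rank_by_harm(sample):
    print(f"{tag.harm.name:<10} {tag.tag_type.value:<14} {tag.name}")
```

Using an ordered enumeration for the harm levels keeps the ranking a plain sort, and leaves the actual judgment (which level a tag deserves) with the person doing the assessment.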
- Private information, is that which, if publicly disclosed, could cause inconvenience and/or embarrassment.
- Personal information, is that which, if disclosed, could adversely affect a person’s lifestyle.
- Evidentiary information, is that which, if disclosed, could gravely affect a person’s finances, freedom and life.
Note: For those who are familiar with my previous article on Personal Security Classification Taxonomies (PSCT), these derived information types are not the same as classification levels. The reason is that only the individual PSCT user is in a position to properly review & classify each and every piece of information. Classification levels cannot be generically assigned outside of your own context. The derived information types are broad categories, that are applicable to all cases.
Metadata in an image file is 100% vulnerable to analysis and interpretation by an adversary. Nowadays, it is hard to find any kind of imaging device that does not create image file metadata. And by design, metadata is intended to facilitate the sharing of information.
Complicating matters is the problem of metadata persistence. As Catherine (Cat) Schwartz learned in 2003, certain kinds of metadata can be rather embarrassing. In the case of every camera, cellphone or other imaging device examined during this assessment, there was no way to opt out from having any kind of metadata embedded in your image files. Some devices do allow you to opt out from having location information embedded in images; but for the most part, there is no master “on/off switch!”
For this assessment, there are three (3) adversarial threat types: Inadvertent Disclosure, Profiling & Stalking, and Forensic Investigation.
- Inadvertent Disclosure, is a threat caused by the innocent disclosure of Private, Personal or Evidentiary Information that can be derived with little or no analysis. An example of Inadvertent Disclosure is when you upload pictures of your daughter to Flickr, not realizing that you also uploaded the GPS coordinates of your house, and that the information is just mouse clicks away from being discovered by someone who thinks your daughter might enjoy his company. In other words, the person using the camera is their own worst enemy.
- Profiling & Stalking, is a threat caused when someone is able to derive Private, Personal or Evidentiary Information from image file metadata using modest to moderate amounts of analysis, often referred to as “casual analysis.” An example of Profiling & Stalking is when you send your picture to a blind date, who then uses a product like Opanda IExif or the Firefox Exif Viewer Add-on to figure out not only that the picture was taken six years ago, but also that you used Photoshop to hide that birthmark on your forehead.
- Forensic Investigation, is a threat caused when somebody who just doesn’t like you uses deep and thorough analysis tools on every last piece of metadata they can find in an image file, in an attempt to find Evidentiary Information. Using computational forensics, it is possible to reconstruct the approximate physical conditions under which a picture was originally taken, using information derived from image file metadata. DNG image files are particularly vulnerable to computational forensics, because they can contain the greatest amount of metadata.
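To show how little analysis the Inadvertent Disclosure example requires, here is a minimal sketch of turning Exif GPS values into map-ready coordinates. In the Exif specification, GPSLatitude and GPSLongitude are each stored as three rational numbers (degrees, minutes, seconds), with GPSLatitudeRef ("N" or "S") and GPSLongitudeRef ("E" or "W") giving the hemisphere. The function name dms_to_decimal is illustrative, and the coordinate values are made up for the example.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert Exif-style (degrees, minutes, seconds) rationals to decimal degrees.

    dms is three (numerator, denominator) pairs; ref is "N", "S", "E" or "W".
    """
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):
        # Southern and western hemispheres are negative in decimal notation.
        value = -value
    return float(value)


# Made-up values, not taken from any real image file.
latitude = dms_to_decimal(((40, 1), (26, 1), (46, 1)), "N")
longitude = dms_to_decimal(((79, 1), (58, 1), (56, 1)), "W")
print(f"{latitude:.6f}, {longitude:.6f}")
```

The resulting pair of numbers can be pasted directly into any online map, which is why location tags need no real analysis at all.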
The adversarial threat types are related to, but are not the same as, the raw harm potential levels. “Raw harm potential” measures the “harmfulness” of metadata. “Adversarial threat” measures the amount of effort an adversary needs to “expend” on deriving any Private, Personal, or Evidentiary Information from the metadata.
All of this analysis and interpretation comes at a cost to our adversaries. This cost is measured in the time, money and other resources necessary to derive useful information from the image file metadata. As the amount of analysis and interpretation increases, so does the cost. Societal factors can also affect the cost incurred by adversaries. In societies such as the United States, the greatest threat is from Inadvertent Disclosure, followed by Profiling & Stalking, and then Forensic Investigation. In other societies, particularly those in Eastern Europe and Asia, threat and cost levels may be different or, in some cases, reversed. The ranking of threat types by cost to the adversary creates a continuum of “threat levels.”
Threat / Risk Probability
To determine the probability of risk from each of the threats, the baseline metadata tags were sorted among the three threat levels, according to the amount of analysis and interpretation needed to derive any useful information from each tag. The tags in each threat level were then ranked, based on each tag’s raw harm potential, into three risk probability groups: High, Medium and Low.
- High risk, implies that an adversary can absolutely cause harm with any information derived from that particular metadata tag.
- Medium risk, implies that an adversary could certainly cause harm with any information derived from that particular metadata tag.
- Low risk, implies that an adversary might possibly be able to cause harm with any information derived from that particular metadata tag.
It is important to realize that just because 46% of the baseline metadata tags are Low risk, it does not follow that any information derived from those tags is worthless to an adversary. Similarly, High risk metadata tags are not the proverbial smoking gun. “Risk levels” (X-axis) are a measure of how harmful the information derived from each metadata tag is, relative to the others. The nine (9) “Risk probabilities” (X-Y coordinates) are a measure of how likely it is that information derived from a metadata tag can harm you, if an adversary expends the required amount of effort to derive Private, Personal, or Evidentiary Information from that metadata tag (Y-axis). For example, GPS coordinate metadata tags have a “High risk, Inadvertent Disclosure probability” because an adversary can derive information of great harm to you, with very little effort.
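The nine risk probabilities are simply every pairing of a risk level (X-axis) with a threat level (Y-axis). A minimal sketch of that grid follows; the names Threat, Risk and risk_probability are illustrative only, and the sketch does not attempt to reproduce how individual tags were assigned to cells.

```python
from enum import Enum
from itertools import product


class Threat(Enum):
    """Y-axis: threat levels, ordered by increasing cost to the adversary."""
    INADVERTENT_DISCLOSURE = "Inadvertent Disclosure"
    PROFILING_STALKING = "Profiling & Stalking"
    FORENSIC_INVESTIGATION = "Forensic Investigation"


class Risk(Enum):
    """X-axis: risk levels, ordered by decreasing harm."""
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


def risk_probability(risk, threat):
    """Label one cell of the grid in the wording used by the assessment."""
    return f"{risk.value} risk, {threat.value} probability"


# All nine X-Y coordinates of the grid.
ALL_CELLS = [risk_probability(risk, threat) for risk, threat in product(Risk, Threat)]

for cell in ALL_CELLS:
    print(cell)
```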
Derived Information Risk Profiles
Now that we have a basis for determining risk probabilities, we can move on to the “fun part” of the assessment!
By trolling websites such as Bing Images, Craigslist, eBay, Flickr, Google Images, Picasa, and WordPress.com, a number of random and “interesting” images were collected and examined for the presence of metadata. Pictures in Craigslist postings do not contain any metadata, because it is automatically removed by the website; but responding to ads and asking for “better pictures” always yielded image files that were rich in metadata. The eBay results varied, depending upon whether or not the seller selected options such as “picture gallery,” “watermarks,” and “self hosting.”
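Checking a JPEG file for the presence of Exif metadata does not require special tools. The following is a rough sketch, not the procedure used in this assessment: it walks the marker segments at the start of a JPEG and looks for an APP1 segment (marker 0xFFE1) whose payload begins with the "Exif" identifier. The function name has_exif is illustrative, and the sketch assumes a well-formed file with no padding bytes between segments.

```python
import struct


def has_exif(data: bytes) -> bool:
    """Return True if the JPEG bytes contain an APP1 segment holding Exif data."""
    if data[:2] != b"\xff\xd8":          # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:            # not positioned on a marker: give up
            return False
        marker = data[pos + 1]
        if marker == 0xDA:               # start of scan: header segments are over
            return False
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length                # jump to the next marker
    return False


# Synthetic header bytes for illustration, not a complete viewable image.
with_exif = (b"\xff\xd8"
             b"\xff\xe0\x00\x04JF"
             b"\xff\xe1\x00\x0aExif\x00\x00MM"
             b"\xff\xd9")
stripped = b"\xff\xd8" b"\xff\xe0\x00\x04JF" b"\xff\xda\x00\x02"

print(has_exif(with_exif), has_exif(stripped))
```

A file that a website has stripped of metadata, as Craigslist does, would simply have no such APP1 segment. TIFF files and other metadata containers are outside the scope of this sketch.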
Using the metadata contained in the image files it was possible, in most cases, to identify the equipment used to capture the image. In some cases, “Maker Note” information contained actual equipment serial numbers. By cross referencing the image file metadata tags to the baseline metadata tags, it was possible to create “risk profiles” for specific devices.
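The cross-referencing step can be sketched as a simple lookup and tally. This is an illustration under stated assumptions: BASELINE is a hypothetical five-entry excerpt, and only the GPS assignment ("High risk, Inadvertent Disclosure") is stated in this assessment; the cells given to the other tags are placeholders. The function name risk_profile is also illustrative.

```python
from collections import Counter

# Hypothetical excerpt: tag name -> (risk level, threat level).
# Only the GPS entries follow the text; the rest are placeholder assumptions.
BASELINE = {
    "GPSLatitude": ("High", "Inadvertent Disclosure"),
    "GPSLongitude": ("High", "Inadvertent Disclosure"),
    "DateTimeOriginal": ("Medium", "Profiling & Stalking"),
    "Software": ("Medium", "Profiling & Stalking"),
    "MakerNote": ("Low", "Forensic Investigation"),
}


def risk_profile(found_tags):
    """Tally the tags found in a device's image files by risk probability cell.

    Returns the tally plus a list of tags that are not in the baseline.
    """
    profile = Counter()
    unknown = []
    for tag in found_tags:
        cell = BASELINE.get(tag)
        if cell is None:
            unknown.append(tag)
        else:
            profile[cell] += 1
    return profile, unknown


# Tag names as they might be read from one device's image file (made up).
profile, unknown = risk_profile(
    ["GPSLatitude", "GPSLongitude", "Software", "VendorSpecificTag"]
)
for (risk, threat), count in profile.items():
    print(f"{risk} risk, {threat}: {count} tag(s)")
print("Not in baseline:", unknown)
```

Comparing the tallies for two devices, cell by cell, is one way to express differences between models as percentages.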
Analysis & Opinion
As risk profiles were created, a trend was observed: as consumer devices became more advanced, the amount of embedded image file metadata increased. The notable exception to this was the Apple iPhone. Between the iPhone 3GS and iPhone 4, the number of embedded tags stayed the same. Interestingly though, the specific tags used had changed slightly between the 3GS and 4, resulting in an 11.5% reduction of High risk, Inadvertent Disclosure. Conversely, the Motorola Droid 2 embeds 16% more metadata in its image files than the original Motorola Droid. This extra metadata increases the chances that an adversary can derive Private, Personal or Evidentiary Information.
This trend towards embedding more metadata is not reasonably justified for consumer devices. Many inexpensive consumer cameras and cellphones do not generate metadata. And yet there are no reports of interoperability problems due to the lack of image metadata.
Until the advent of Windows XP in 2001, the only way for most consumers to examine image file metadata was with third-party applications, such as Exif Reader. Because a moderate amount of effort was required to extract metadata from image files, the Inadvertent Disclosure threat was practically non-existent. The ability of Windows Explorer, in Windows XP, to examine some image file metadata made it possible to derive information with little or no effort, thereby creating the Inadvertent Disclosure threat.
As consumer grade computer applications become more adept at processing metadata, the amount of effort required to analyze and interpret metadata will decrease. This in turn will increase the Inadvertent Disclosure threat, as metadata previously assigned to higher cost threat levels begins to migrate towards the less costly threat levels. At some point, the Forensic Investigation threat may disappear entirely, if computational forensics ever becomes a “killer application.”
Image file metadata is likely to be examined for a number of reasons:
- Credibility verification. Let’s face it, what you see in a picture is not always what you get. This is the digital equivalent of honesty & trust issues. The use of social networks, online dating, and online classifieds such as Craigslist & eBay will fuel the use of credibility verification.
- Identification of subjects. The most obvious use is in criminal investigations. But because metadata has been around for almost a generation, it can also have a legitimate use in genealogical research.
- Legal proceedings, both criminal and civil.
- Trolling for victims and/or targets. It is safe to assume that people on both sides of any given conflict will examine image metadata as part of any intelligence operation. But because of lowered costs for analysis and interpretation, the ability to perform reasonably good intelligence analysis on image metadata is now in the hands of gang members, criminals and terrorists.
- Some people are just plain curious!
Image file metadata is an instrumentality, or means, by which information concerning the image can be communicated. How that information is used depends on who is in possession of the image file, or copy, that contains the metadata. As the costs associated with the analysis and interpretation of metadata decrease, the number of potential adversaries, at all threat levels, will increase.
As consumer and professional grade devices become more advanced, the amount of embedded metadata is constantly increasing, and this growth is occurring at an unchecked rate. Yet manufacturers provide no means or option for consumers to “turn off” metadata. Why is this? Is it a classic case of “Geeks Gone Wild,” or is there some kind of government mandate at work?
As such, image file metadata can be considered “inherently dangerous,” because its very nature can cause harm unless special precautions are taken to mitigate that potential harm. If you are concerned about how metadata can affect you, please contact The Assurer for a consultation.