I wrote this survey for my course called: “Advanced Topics in Data Bases: Cloud Data Management”
ABSTRACT
In this survey, we examine a challenge for cloud service providers: designing privacy into their cloud computing applications. The survey describes different privacy risks and threats that must be taken into consideration when designing privacy-aware cloud applications, including: data loss, legal liabilities, unauthorized access to data, and more. Additionally, we identify privacy requirements for privacy-aware cloud applications, such as: data quality, accountability, openness and transparency, and many others. For software engineers, architects and designers, we provide different guidelines for designing privacy-aware cloud applications which include: recommended practices, tradeoffs of privacy-aware designs, and technologies that are useful for the design stage. Furthermore, we present six privacy designs that present different solutions to different privacy issues. To conclude, we discuss other issues related to designing privacy-aware applications and present our conclusions and opportunities for future research.
1. INTRODUCTION
For users or organizations that do not have the resources or purchasing power to store and manage large amounts of data by themselves, cloud computing is a tempting solution. However, as users store their information in data centers that they do not operate, privacy becomes an issue. Privacy is the right to protect sensitive data and personal information from unintentional and intentional attacks and disclosure [7].
According to [13], in a survey of European citizens regarding perceptions of privacy, two-thirds of the participants expressed concerns that organizations holding their personal information would not handle it properly. Furthermore, the survey also showed that eighty percent of the citizens interviewed feared data leakage. These results demonstrate the need for privacy-aware applications that can combat current and future threats.
In most cases, privacy is not a primary design goal in software development [8]. Consequently, many companies insert privacy as an add-on, which fails to provide enough privacy guarantees.
In this survey we examine guidelines and requirements for designing privacy-aware cloud applications, as well as looking at the latest design mechanisms used to solve different privacy issues on cloud applications. This survey also discusses the different privacy threats and risks that privacy-aware cloud applications are exposed to. This survey will:
- provide an evaluation of privacy risks and threats in cloud computing;
- list a set of privacy requirements software engineers should take into consideration when designing privacy-aware cloud applications;
- provide a detailed review of privacy designs targeted to privacy-aware cloud applications;
- provide an enumeration of design guidelines for software engineers who design privacy into their cloud applications;
- and promote a discussion of future research directions.
The rest of this article is organized as follows: the background and the literature selection process are introduced in section 2; privacy risks and threats are reviewed in section 3; privacy requirements and guidelines for designing privacy-aware cloud applications are discussed in sections 4 and 5, respectively; privacy designs for different privacy issues are explained in section 6; some interesting privacy related issues are discussed in section 7, and this survey presents its conclusions in section 8.
2. BACKGROUND AND LITERATURE SELECTION
Prior to reviewing different designs, requirements and guidelines concerning privacy-aware cloud applications, we must define the problem and note some of the literature that has already been written on the topic.
2.1 Problem Definition
Privacy is one of the main concerns users or companies have about cloud computing. Protecting the privacy of a user is extremely important. Users are frustrated with data systems that do not define the behaviors that impact their privacy. Ignoring this frustration has led to an erosion of trust, negative press and even lawsuits [11].
Privacy-aware data systems need to protect different types of information, such as [14]:
- Personally identifiable information (PII): information that could be used to identify an individual, such as a name, address, phone number, fax number, or email address.
- Sensitive information: information that must be specially protected because it could cause serious harm to a user. Examples include information that could be used to discriminate against an individual based on religion, race, ethnic background, or political opinions. Information that could facilitate identity theft or permit access to a user’s account (passwords or PINs) is also considered sensitive [11].
- Usage data: data collected from devices such as printers, from visited websites, or from the usage history of a product.
- Unique device identities: information that might help to trace a user’s device, such as IP addresses or unique hardware identities.
Nowadays, cloud computing service providers have a very specific problem: designing for privacy in order to decrease privacy risks and threats.
2.2 Literature Selection
The research was conducted in two electronic databases: the ACM Portal, and Google Scholar. These databases were chosen because of the broad range of computing disciplines they cover.
In this survey, the goal is to review research findings, and present a survey for designing privacy-aware cloud applications. The databases returned thousands of articles related to privacy, which were narrowed by reading the abstracts and looking for articles with keywords such as: privacy design, privacy guidelines, and others. Additionally, the list of references was evaluated to identify titles that met our search criteria. The result of this selection process allowed us to find different authors that investigated different aspects of designing privacy-aware cloud applications.
Pearson (2009) focused on the privacy challenges that software engineers face in a cloud computing environment. Additionally, Pearson suggested design principles that should be taken into consideration when software engineers work on cloud applications.
In 2007, Microsoft recognized that there were no industry-wide practices to protect a customer’s privacy. Consequently, they proposed a set of guidelines for respecting customer privacy, data integrity, and improving the level of trust between the industry and customers.
Gu and Cheung (2009) researched the development and testing of privacy-aware systems in a cloud environment. They argued that methodologies to design and test a system are essential and must be established for every cloud application.
Mowbray and Pearson (2009) developed a client-based privacy manager which helps to reduce the risk of storing sensitive information in a cloud.
Nyre and Jaatun (2009) proposed a way of analyzing policy enforcements made by cloud service providers by calculating the probability they will follow privacy policies.
Wang et al. (2009) proposed a privacy-preserving public auditing system that is able to audit data without requiring a copy of the data.
Casassa et al. (2003) worked on a privacy model to address two privacy issues: letting users control their personal information, and making cloud service providers accountable for their behavior while they deal with users’ personal information.
Creese et al. (2009) addressed whether there are opportunities to design data protection in the early stages of software development.
3. PRIVACY RISKS AND THREATS
In this section we discuss privacy threats and risks that software engineers must be aware of when designing cloud computing applications.
3.1 Privacy risks on cloud computing
In cloud computing applications, data is stored in a platform that is shared by multiple users and organizations. As a result, many risks arise from the fact that confidential information is stored outside of the boundaries of a user or an organization.
Pearson [14] correctly identifies the risks for parties involved in a cloud computing application such as: users, companies and cloud computing service providers.
Users of cloud applications face risks such as being obligated or persuaded to give personal information against their wishes. Furthermore, once gathered, their financial details and health data can be exploited against them.
Companies using cloud applications face risks such as data loss or leakage. As a result, companies using the services of a cloud provider are exposed to a loss of reputation and credibility.
Nonetheless, cloud computing service providers are the ones that face the greatest risks, such as legal liability, loss of reputation and credibility, and lack of user trust.
Charlesworth and Pearson identified two privacy risks users are exposed to when using cloud computing applications: outsourcing, and offshoring.
Outsourcing of data processing raises governance and accountability questions [3]. For example, which party is responsible for ensuring legal requirements are observed, or that data is handled properly? To what degree can data processing be outsourced? How can users verify the identities of subcontractors?
It is very likely that cloud service providers that outsource their data processing to third parties will have weak trust relationships with their users. Furthermore, operations such as data deletion will be hard to verify.
Offshoring data processing increases risk factors and legal complexity [3]. Questions of jurisdiction become relevant, including: in which country can a trial be conducted? Whose law applies?
A cloud service provider that combines outsourcing and offshoring may raise very complex issues [3].
3.2 Privacy threats on cloud computing
Privacy threats vary according to their scenario. Cloud applications can face low threats if the information, at some point, will become available to the public. On the other hand, services that are customized (based on location, user preferences, and others) face higher threats.
According to [14] there are several main threats software engineers should be aware of:
- Personal information about a user could be used, stored or propagated in a way that is not acceptable according to a user’s agreement.
- People outside of the cloud could get inappropriate or unauthorized access to personal data. This could happen by taking advantage of security holes or exposed data. Businesses that store sales data on cloud applications face the threat that their data could be sold to competitors, exposing confidential information about their business model.
- Legal non-compliance. For example, restrictions on transborder data flow may apply, and some data may be subject to additional regulations.
4. PRIVACY REQUIREMENTS
The Fair Information Practice Principles (FIPPs) are an effort by the United States Federal Trade Commission to establish privacy policies for online entities that collect personal information. The FIPPs are widely accepted by many foreign nations and international organizations. These principles can be applied to cloud computing, and provide a good foundation for establishing the minimal privacy requirements that every cloud computing application should meet. The FIPPs are:
- Accountability: organizations managing personal information should be accountable for taking steps to ensure that privacy practices and policies are followed. In addition, organizations need to audit their adherence to privacy principles, and monitor the controls used to manage privacy [5]. More details can be found in section 7.2.
- Security safeguards: cloud computing service providers should be responsible for protecting personal information from loss, destruction, misuse, and modification through all phases of the software life cycle.
- Purpose specification: personal information should be limited to the purpose for which it was collected. The purpose of the collection should be specified, and if any changes occur, those changes need to be publicized as well.
- Use limitation: data should not be disclosed or used for anything other than a specified purpose without the consent of the user. Data should only be kept as long as it is needed. To accomplish use limitation, a design needs to define roles that determine who can access the information, audit access and use, manage access based on authorization, and log access requests [5].
- Openness and transparency: cloud computing service providers need to inform users what information they want to gather, how the information is going to be used, and with whom it will be shared, and need to answer any other inquiries. Users should have the means to learn how their personal information will be used.
- Individual participation: users should be able to access the information, request modifications, and challenge the cloud provider’s privacy policies.
- Data quality: users should be able to check the accuracy and completeness of their current personal information. Cloud providers have to guarantee the accuracy of the information held.
- Choice and consent: users must be given a choice whether they want to share certain information or not. Cloud service providers need to create methods to obtain consent from users and document the history of given and denied consents [5].
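The use limitation principle above (roles, authorization-based access, and logging of access requests) can be illustrated with a minimal sketch. It is not a design taken from [5]; the role names, field names, and the AccessController class are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-field permissions; a real system would load these from policy.
ROLE_PERMISSIONS = {
    "support_agent": {"name", "email"},
    "billing_clerk": {"name", "billing_address"},
}


@dataclass
class AccessController:
    audit_log: list = field(default_factory=list)

    def request(self, role: str, field_name: str) -> bool:
        """Grant access only if the role is authorized; log every request."""
        allowed = field_name in ROLE_PERMISSIONS.get(role, set())
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "field": field_name,
            "granted": allowed,
        })
        return allowed


controller = AccessController()
granted = controller.request("support_agent", "email")
denied = controller.request("support_agent", "billing_address")
```

Both the granted and the denied request end up in the audit log, so a later audit can review attempted as well as successful accesses.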
According to [12], privacy legislation varies according to the country. Furthermore, privacy laws can have different views. For example, in the European Union, privacy is a basic right, but in the Asia Pacific region, privacy legislation is focused on avoiding harm. As a result, depending on the country, legislation could impose requirements such as: agreeing to rules regarding data retention and disposal, data access, and more.
5. GUIDELINES FOR DESIGN
This section provides guidelines for software engineers who design and develop privacy-aware cloud applications. It is unrealistic to expect that developers will be trained on privacy standards, but they do have a responsibility to follow a minimum set of development practices to reduce privacy flaws.
5.1 Recommended practices
According to [14] there are six recommendations software engineers, system designers, developers, and architects should take into consideration when designing cloud computing applications.
5.1.1 Minimize personal information sent to and stored in the cloud
The best way to protect a customer’s privacy is to not store his data [11]. However, data needs to be stored. As a result, cloud designers can benefit from analyzing the minimal amount of information required from a customer in order for a cloud to operate. Cloud applications need to store only data which is planned to be used immediately, and is absolutely necessary to achieve a determined business purpose. When data is no longer needed it needs to be deleted [11].
The need for data storage protection mechanisms is reduced if there is less information to store in a cloud. Nonetheless, when personal information is sent to the cloud, it can be protected in the dataset by using encryption or data mining techniques.
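As a minimal sketch of this practice, a client could strip a record down to the fields the service needs before anything is sent to the cloud. The field names and the REQUIRED_FIELDS set below are assumptions made for illustration and do not come from [11] or [14].

```python
# Hypothetical fields the cloud service needs for its stated business purpose.
REQUIRED_FIELDS = {"user_id", "shipping_city"}


def minimize(record: dict, required: set = REQUIRED_FIELDS) -> dict:
    """Return a copy of the record that keeps only the required fields."""
    return {key: value for key, value in record.items() if key in required}


customer = {
    "user_id": "u-1001",
    "shipping_city": "Lisbon",
    "date_of_birth": "1980-02-14",
    "phone": "555-0100",
}
payload = minimize(customer)  # only this reduced record would leave the client
```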
5.1.2 Protect personal information in the cloud
Personal information has to be protected from any loss or theft. Employees or independent companies that access a user’s personal information need to have a business purpose for accessing the data. Additionally, employees or third parties should only be given access to information they need to fulfill their business purpose. To ensure this, security safeguards can be used in order to prevent unauthorized access, copying, or modification of personal information.
5.1.3 Maximize user control
Users or companies must be given access to control the data that is being stored about them. Lack of control generates distrust, whereas giving users control over their information generates trust. There are several ways to give users control of their information. For example, users should be able to access a user interface to modify their personal information on the cloud at any time. Also, users could choose a third party company to audit the way their information is being managed on a cloud. In order to respond to these requests, it is important to design a system that is able to show how data for a specific user is being stored and disclosed.
5.1.4 Allow user choice
Users must be presented with a choice whether they want to share their information or not. A user’s consent must be obtained. To accomplish this, designers can create opt-in and opt-out mechanisms, to allow users to decide if they want to share their information or not. However, legal requirements for opt-in and opt-out mechanisms can vary among the different places a design may be used. It is preferable to adopt the most rigid requirements, which can satisfy most of the places in which a design might be used.
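An opt-in and opt-out mechanism that also keeps a history of given and denied consents could look like the following sketch. The ConsentRegistry class and its method names are hypothetical, and the default of "no record means no consent" is one possible strict choice, not a legal recommendation.

```python
from datetime import datetime, timezone


class ConsentRegistry:
    """Tracks opt-in/opt-out decisions per user and purpose, keeping a history."""

    def __init__(self):
        self.history = []  # append-only list of every decision

    def record(self, user_id: str, purpose: str, granted: bool) -> None:
        self.history.append({
            "user": user_id,
            "purpose": purpose,
            "granted": granted,
            "time": datetime.now(timezone.utc).isoformat(),
        })

    def has_consent(self, user_id: str, purpose: str) -> bool:
        # The most recent decision wins; no record means no consent.
        for entry in reversed(self.history):
            if entry["user"] == user_id and entry["purpose"] == purpose:
                return entry["granted"]
        return False


registry = ConsentRegistry()
registry.record("alice", "newsletter", True)    # opt in
registry.record("alice", "newsletter", False)   # later opt out
```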
5.1.5 Specify and limit the purpose of data usage
When the information is loaded into the cloud, it must be limited to the preferences and conditions set by a user or organization. Data usage has to be restricted only to the user’s specified purpose. A cloud application design should always validate the data usage against the allowed usage intentions.
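The validation step described above can be sketched as a check that runs before any data is released. The record layout, the purpose labels, and the PurposeError exception are assumptions introduced for this example.

```python
class PurposeError(Exception):
    """Raised when data is requested for a purpose the user did not allow."""


def use_data(record: dict, requested_purpose: str):
    """Release the value only if the requested purpose was allowed by the user."""
    if requested_purpose not in record["allowed_purposes"]:
        raise PurposeError(f"purpose {requested_purpose!r} not allowed")
    return record["value"]


# Hypothetical record: the user allowed the e-mail address for order confirmations only.
email_record = {
    "value": "alice@example.org",
    "allowed_purposes": {"order_confirmation"},
}
```

Calling use_data(email_record, "order_confirmation") returns the address, while a request with a purpose such as "marketing" raises PurposeError.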
5.1.6 Provide feedback
Cloud applications should be user friendly and clearly indicate privacy functionality by using icons, providing tutorials, help documents, and visual cues. Applications need to be designed in a way that provides users with feedback, allowing them to make knowledgeable decisions about their privacy.
5.2 Tradeoffs of privacy-aware design
Designers of cloud computing applications need to provide protected and efficient interaction between users and providers.
Nonetheless, some traditional solutions that help software engineers, architects and others build privacy-aware cloud applications introduce tradeoffs into the design. According to [8], solutions such as encryption deprive cloud service providers of the opportunity to merge identical data, which would reduce storage space. Additionally, encryption hinders the capability to index and process the data.
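The deduplication tradeoff can be made concrete with a small sketch. Identical plaintext documents have identical fingerprints, so a provider could store them once; after each user encrypts the same document under a different key and nonce, the fingerprints no longer match. The toy_encrypt function is a deliberately simple stand-in written for this example; it is not secure and is not the encryption discussed in [8].

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy randomized cipher (NOT secure): XOR with a SHA-256 keystream."""
    nonce = os.urandom(16)
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body


def fingerprint(blob: bytes) -> str:
    """Content fingerprint a provider could use to detect identical blobs."""
    return hashlib.sha256(blob).hexdigest()


document = b"quarterly sales report"
plain_match = fingerprint(document) == fingerprint(document)
cipher_match = (
    fingerprint(toy_encrypt(b"alice-key", document))
    == fingerprint(toy_encrypt(b"bob-key", document))
)
```

Here plain_match is true and cipher_match is false, which is why the provider can no longer merge the two copies.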
5.3 Privacy Impact Assessment
According to [14], in the early stages of the design phase, it is recommended that cloud service providers conduct a Privacy Impact Assessment (PIA). The PIA is one tool used to aid an organization in making sure that the choices made in the design stage meet the privacy requirements of a system [2].
There are five reasons why [9] believes organizations should do a PIA:
- Identifying and managing risks: PIA provides means of addressing project risk as part of the overall project management. Organizations may find it useful to plan a PIA within the context of risk management.
- Avoiding unnecessary costs: conducting a PIA helps to identify problems in the early stages of a project. As a result, the cost of making changes decreases, since changes are more expensive at later stages.
- Avoiding inadequate solutions: when solutions for privacy risks are implemented at later stages, they are not as effective as those that are incorporated at the start of the project. Incorporating privacy solutions in the early stages can make the project more resilient and put it in a better position to recover from any possible failure.
- Avoiding loss of trust and reputation: a PIA provides the means to ensure that systems are not deployed with privacy risks or flaws which could surface in the media. As a result, a PIA could help an organization to maintain and improve its reputation.
- Informing the organization’s communications strategy: conducting a PIA should help the organization to understand the project, and evaluate the perspective of stakeholders. By understanding the concerns of the stakeholders, an organization can understand if further information is needed regarding a project, and can handle any misinformation campaigns created by an opponent.
Similar methodologies exist with a legal status in countries such as Australia, Canada, and the United States of America [14].
5.4 Adopt and Integrate Privacy-Enhancing Technologies (PETs)
Privacy-Enhancing Technologies (PETs) are a set of tools or mechanisms that, when integrated into or used alongside an application, reduce the risk of breaking privacy principles or legislation [10]. Additionally, PETs reduce the data a cloud service provider needs to store about a user and allow individuals to control their information [10].
According to [10], when it comes to handling personal information, PETs provide good design goals and offer demonstrable business benefits and a competitive advantage for cloud service providers that adopt them. PETs can be classified into two categories: privacy management tools and privacy protection tools.
5.4.1 Privacy Management Tools
Privacy management tools allow users to look at the procedures and practices used by cloud service providers that handle their information. Additionally, they tell the users the consequences of sharing their information which improves the user’s understanding of privacy-related issues.
5.4.2 Privacy Protection Tools
Privacy protection tools hide a user’s identity, reduce the information revealed to a cloud service provider, and conceal network connection details. Privacy protection tools are able to authenticate online payments while making it impossible to find a connection to the user originating the transaction [10]. Several software tools fall into this category, such as:
- Anonymising tools: these minimize the information exposed to a cloud service provider. For example, they can hide the IP address of a user.
- Information security tools: these prevent unauthorized access to systems, files or communications in a network.
5.4.3 Drawbacks
For cloud service providers that use agile development for their privacy-aware systems, it is very difficult to agree on and develop standards for PETs [10]. Some providers feel that PETs introduce unnecessary complexity or that the technology itself could become obsolete in the near future. Also, PETs are hard to integrate into legacy systems because of incompatibilities.
5.4.4 Future of PETs
According to [10], PETs researchers concur that there is a need to design systems in a privacy-friendly way, and for cloud service providers to incorporate PETs into their systems design.
In the future, [10] believes that research into user-centric identity management (U-Idm) in conjunction with PETs may represent a solution to manage and control personal information in a secure way. A U-Idm framework, for the most part, allows users to control their own data on a personal device they fully control. U-Idm frameworks can update information without revealing unnecessary identifying details. An important milestone for U-Idm frameworks is Microsoft Windows CardSpace, an identity platform that integrates U-Idm technologies.
6. PRIVACY DESIGNS
In this section, we review different designs for cloud computing applications. This section demonstrates different design models that can aid cloud application designers in dealing with different privacy scenarios such as data leakage, and data access.
Since users will use mechanisms to protect themselves against cloud applications, it is important to know the design of those systems a user may use. Considering that cloud applications will interact with any application users may use to communicate, it is worthwhile for designers to understand how these applications are designed.
As a result, this section discusses some privacy designs targeted at users, such as determining the probability that a cloud service provider will enforce its privacy policies, and a privacy model which makes cloud applications accountable for the way a user’s data is handled.
In addition, this section explains a design to protect cloud applications against third party auditing, and introduces the concept of sticky privacy policies.
6.1 A Client-Based Privacy Manager
Mowbray and Pearson worked on a client-based privacy manager, whose goal was to reduce the risk of data leakage and the loss of privacy of sensitive data processed in a cloud. The privacy manager runs on the client side to help the user protect his privacy when accessing cloud services [12]. Nonetheless, the privacy manager requires help from a server-side component for effective operation.
6.1.1 Features
According to [12], the privacy manager provides five important features:
- Obfuscation: the privacy manager provides obfuscation and de-obfuscation of data. Using a key which is chosen by the user (and not revealed to cloud service providers), data can be obfuscated when it is sent to the cloud. As a result, applications in the cloud or attackers will not be able to de-obfuscate the data. Obfuscation techniques are attractive to users since they give users full control over the data, and they hinder the cloud provider’s capability of using the user’s content for advertising purposes.
- Preference setting: this sets the user’s preferences regarding the handling of their personal data stored within the cloud. Nonetheless, for this feature to be useful, it needs policy-enforcing mechanisms within the cloud.
- Data access: a module designed to allow users to access personal information and see what is stored about them and how accurate it is. It serves as an auditing mechanism to detect privacy violations. The module stores logs on the client machine when the personal information is accessed.
- Feedback: provides feedback to a user about the usage of their personal information in the cloud. This module monitors whether the data is transferred outside of the cloud.
- Personae: allows users to choose among multiple personas when interacting with a cloud. In some contexts, a user might want to act in an anonymous manner, whereas in other situations he may want to reveal all or part of his identity.
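To illustrate the obfuscation feature above, the following sketch replaces values with keyed pseudonyms before they leave the client and keeps the reverse mapping locally. This is one simple way to obfuscate with a user-chosen key and is an assumption of this example; it is not the obfuscation method defined in [12], and the Obfuscator class is hypothetical.

```python
import hashlib
import hmac


class Obfuscator:
    """Client-side sketch: swaps values for keyed pseudonyms and remembers the mapping."""

    def __init__(self, user_key: bytes):
        self._key = user_key   # chosen by the user, never sent to the provider
        self._reverse = {}     # pseudonym -> original value, kept on the client

    def obfuscate(self, value: str) -> str:
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        pseudonym = digest[:16]
        self._reverse[pseudonym] = value
        return pseudonym

    def deobfuscate(self, pseudonym: str) -> str:
        return self._reverse[pseudonym]


client = Obfuscator(b"key chosen by the user")
sent_to_cloud = client.obfuscate("Alice Example")
restored = client.deobfuscate(sent_to_cloud)
```

Because the same value always maps to the same pseudonym under one key, the cloud application can still group or count records without learning the underlying value.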
6.1.2 Evaluation of the Client-Based Privacy Manager
Mowbray and Pearson’s client-based privacy manager meets some of the minimal privacy requirements of a cloud application [12]:
- Use limitation is supported by the obfuscation module.
- Purpose specification is expressed using the preference setting module.
- Openness and transparency are provided via the feedback and data access features.
- Choice and consent are provided with a user-centric design [12]. The preference setting feature gives users control over their data and the personae feature makes it simpler.
- Security safeguards can be specified with the assumption that the data access module will be deployed on the server side.
6.1.3 Drawbacks
The solution proposed by [12] is not appropriate for all cloud applications. The privacy manager needs the full cooperation of the cloud service provider. Cloud service providers that sell user data to advertisers may not be willing to allow users to preserve their privacy. Furthermore, some service providers may be willing to respect a user’s privacy wishes, but may not agree to implement the server-side code necessary for the privacy manager’s features to work.
6.2 A Virtual Private Data Repository
Nowadays, it is still a challenge to make a general mechanism to assure data privacy in clouds. In cloud applications both users and developers access data. Usually, users and developers organize data in different ways. Users manage data using file systems while developers use relational databases [8]. Solutions such as Amazon Simple Storage Service or Bigtable take a similar approach, while providing scalability. Nevertheless, these scalable solutions do not provide strong privacy guarantees or a friendly user interface.
Gu and Cheung recognized an opportunity to create an efficient, easy-to-use interface to access privacy-aware applications. They researched the architecture for a privacy-aware data service. Their goal was to design a privacy-aware general mechanism to access data in cloud environment applications.
To achieve a general privacy-aware data access mechanism, Gu and Cheung designed a “virtual private data repository” (VPDR) [8]. The VPDR provides a file system interface which is familiar to both users and developers. The data written into the VPDR is obfuscated and de-obfuscated with the aid of an access token. The VPDR architecture is based on three components: the virtual private disk (VPD), the virtual network buffer (VNB), and a virtual cloud storage (VCS).
The VPD is a privacy component that can reside in the cloud application or in the user’s computer. The VPD serves as an input/output device where the VPDR is constructed [8].
To obfuscate the data the VPD slices the data at a bit level based on the access token. According to [8], there are several benefits to this approach. First, without the access token, an intruder would have to collect a large number of slices and perform many matching tests to get access to the data. Second, if multiple providers are used, the complexity increases for intruders. Third, a bit level slice mechanism creates an illegible but structural sequence. As a result, cloud applications could still perform certain operations including: compression, merging, and removing duplicates.
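The idea of token-driven bit slicing can be sketched as follows. This is only an illustration of the general idea and not the algorithm from [8]: the function names are hypothetical, the number of slices is arbitrary, and Python's random module is used for brevity even though it is not a cryptographic generator.

```python
import random


def to_bits(data: bytes) -> list:
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits: list) -> bytes:
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def slice_bits(data: bytes, token: str, n_slices: int = 3) -> list:
    """Scatter the bits of data over n_slices, with placement driven by the token."""
    rng = random.Random(token)  # illustration only; not a cryptographic generator
    slices = [[] for _ in range(n_slices)]
    for bit in to_bits(data):
        slices[rng.randrange(n_slices)].append(bit)
    return slices


def join_bits(slices: list, token: str) -> bytes:
    """Replay the same token-driven placement to put the bits back in order."""
    rng = random.Random(token)
    total = sum(len(s) for s in slices)
    cursors = [0] * len(slices)
    bits = []
    for _ in range(total):
        index = rng.randrange(len(slices))
        bits.append(slices[index][cursors[index]])
        cursors[index] += 1
    return from_bits(bits)


slices = slice_bits(b"secret", "access-token-123")
recovered = join_bits(slices, "access-token-123")
```

Each slice could be handed to a different provider; only a party that holds the token can replay the placement and reassemble the original bytes.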
Gu and Cheung also argued that their solution should consider ways of preventing cloud service providers from accumulating data from users. If users are able to store their data on multiple cloud service providers, they can prevent any single provider from accumulating their data. The VNB component addresses this problem by preventing cloud service providers from collecting data from users. To accomplish this, it communicates with the providers with a control uncertainty. Furthermore, its main function is to break the link between the bit slices and their users.
The VCS component resides in the cloud, and it makes sure the user data is sliced and stored in different partitions, so an operator cannot easily combine slices to retrieve the original data.
One drawback to this design is that the data could be deciphered with vast computing resources. Additionally, the VCS component complicates the process of deleting and migrating user data. This is a consequence of the uncertainty between the data and its owner.
6.3 Probabilistic Privacy Manager
Nyre and Jaatun designed a system architecture that gives users the probability that a specific cloud service provider will respect their requirements and enforce privacy policies. This model could be used to handle uncertainty in privacy enforcement and as a tool to interact with unreliable entities. The architecture is composed of five components: Personal Data Recorder (PDR), Personal Data Monitor (PDM), Trust Assessment Engine (TAE), Trust Monitor (TM), and a Policy Decision Point (PDP).
The web provides many opportunities for information aggregation. For example, a user who wants to stay unidentified provides his postal code and an anonymous e-mail address; later, the same user uses the same anonymous e-mail address and additionally provides his age and given name. At this point, a given provider can combine the data and identify the anonymous user [13]. The PDR component addresses this problem [13]. The PDR records what information is sent to one or more providers. Also, it gives the user an idea of how a cloud service provider sees this information, which allows the user to judge whether too much information is being sent.
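The aggregation example above can be sketched with a small recorder that tracks which attributes each provider has received under a given pseudonym. The class below is a hypothetical illustration, not the PDR of [13], and the set of attributes treated as identifying in combination is an assumption.

```python
from collections import defaultdict

# Hypothetical combination of attributes treated as identifying when held together.
IDENTIFYING_SET = {"postal_code", "age", "given_name"}


class PersonalDataRecorder:
    def __init__(self):
        # (provider, pseudonym) -> names of the attributes disclosed so far
        self._sent = defaultdict(set)

    def record(self, provider: str, pseudonym: str, attributes: set) -> None:
        self._sent[(provider, pseudonym)].update(attributes)

    def provider_view(self, provider: str, pseudonym: str) -> set:
        """What the provider can link to this pseudonym so far."""
        return set(self._sent[(provider, pseudonym)])

    def is_identifiable(self, provider: str, pseudonym: str) -> bool:
        return IDENTIFYING_SET <= self._sent[(provider, pseudonym)]


pdr = PersonalDataRecorder()
pdr.record("shop.example", "anon@mail.example", {"postal_code"})
after_first = pdr.is_identifiable("shop.example", "anon@mail.example")
pdr.record("shop.example", "anon@mail.example", {"age", "given_name"})
after_second = pdr.is_identifiable("shop.example", "anon@mail.example")
```

After the first disclosure the user is not yet identifiable under this assumed rule; after the second, the combined view crosses the threshold and the user could be warned.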
The PDM calculates the probability that an entity will forward the information to another entity. Also, it updates the PDR with collected knowledge.
The TAE module assesses communicating parties by calculating a trust value for determining their trustworthiness.
The TM module detects events that could affect a perceived trustworthiness. This module decides, based on any given circumstance, if the entity has an acceptable level of trust. Additionally, it contains a repository in which it stores feedback from other entities regarding a provider.
The PDP decides if the information should be shared with an entity and under what conditions.
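How a trust value might feed a sharing decision can be sketched as below. The source does not specify how the TAE computes trust or how the PDP decides, so both the scoring formula (a smoothed share of positive feedback) and the per-category thresholds are assumptions made purely for illustration.

```python
def trust_value(positive: int, negative: int) -> float:
    """Hypothetical TAE score: smoothed share of positive feedback, between 0 and 1."""
    return (positive + 1) / (positive + negative + 2)


# Hypothetical minimum trust required before a data category may be shared.
REQUIRED_TRUST = {"postal_code": 0.5, "health_record": 0.9}


def decide(category: str, positive: int, negative: int) -> bool:
    """Hypothetical PDP rule: share only if trust meets the category threshold."""
    return trust_value(positive, negative) >= REQUIRED_TRUST[category]


share_postal_code = decide("postal_code", positive=8, negative=2)
share_health_record = decide("health_record", positive=8, negative=2)
```

With eight positive and two negative feedback entries, the score is 0.75, so the postal code would be shared while the health record would not.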
6.3.1 Benefits
According to [13], the probabilistic privacy manager design provides four key benefits to a privacy application:
- It informs the user about the trustworthiness of an entity.
- It provides anonymity when necessary.
- It saves the user’s willingness to interact with an entity.
- It calculates the consequences of interacting with an entity.
6.3.2 Drawbacks
The solution designed by [13] has some drawbacks, including:
- The TAE does not take into account risk willingness or the vulnerabilities an entity may present when calculating its trust value.
- The PDR is not capable of handling redistribution of data (a receiver forwarding the data to other receivers).
- The solution does not include a privacy or trust model.
6.4 A Privacy-Preserving Public Auditing Scheme
If a cloud service provider enables public auditability, users can hire a third party auditor (TPA) to audit their data on their behalf. Public auditability means that an external party, other than the service provider, can verify that the remotely stored data is correct and has not been modified.
Wang et al observed that cloud service providers lack schemes that protect privacy against external auditors. As a result, a TPA could introduce new vulnerabilities for users, such as unauthorized leakage of information from their data. Therefore, the design goal of [15] is to allow TPAs to verify the correctness of the data inside a cloud without demanding a copy of the whole data.
Wang et al propose a privacy-preserving public auditing scheme that protects a user’s privacy against TPAs. According to [15], encrypting data before storing it in the cloud complements the proposed scheme but does not replace it. The reasoning is that encryption alone does not solve the problem; it only reduces it to managing encryption keys, which can still be exposed.
The public auditing scheme consists of four algorithms: KeyGen, SigGen, GenProof, and VerifyProof.
KeyGen is a key generation algorithm that is run by the user to set up the scheme. SigGen is used by the user to generate the verification metadata that will be used for auditing [15]. GenProof is run by the cloud service provider to generate a proof of data storage correctness. VerifyProof is run by the TPA to audit the proof from the cloud service provider [15].
Based on the algorithms, the public auditing scheme can be constructed in two phases: setup and audit.
In the setup phase, the user initializes the public and secret parameters by executing KeyGen. Then, the user pre-processes the data file by using SigGen to generate verification metadata. At this point, the user can store the data file in the cloud and publish the verification metadata to the TPA for later audit.
In the audit phase, the TPA issues a challenge to the cloud service provider to verify that the data file has been preserved appropriately. The cloud service provider derives a response message by executing GenProof. The TPA then verifies the response against the verification metadata via VerifyProof.
To support public auditability without retrieving the data blocks, the privacy-preserving public auditing scheme uses the homomorphic authenticator technique. A homomorphic authenticator is unforgeable verification metadata generated from individual data blocks. Because authenticators can be aggregated, an auditor can be assured that a linear combination of data blocks is correct by verifying only the aggregated authenticator [15].
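The four algorithms and the aggregation property can be illustrated with a deliberately simplified sketch. It is not the construction of [15], which is publicly verifiable; the sketch swaps in a secret-key (MAC-style) linear authenticator, so the verifier must hold the user's secret key. The modulus, the tag formula, and the function signatures are assumptions for illustration only.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # prime modulus (illustrative choice, not taken from [15])


def key_gen():
    """KeyGen: a secret multiplier alpha and a key for the pseudo random function."""
    return secrets.randbelow(P - 1) + 1, secrets.token_bytes(32)


def prf(prf_key, index):
    """Pseudo random function on block indices, built from HMAC-SHA256."""
    digest = hmac.new(prf_key, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def sig_gen(key, blocks):
    """SigGen: one authenticator per block, tag_i = PRF(i) + alpha * m_i mod P."""
    alpha, prf_key = key
    return [(prf(prf_key, i) + alpha * m) % P for i, m in enumerate(blocks)]


def gen_proof(blocks, tags, challenge):
    """GenProof (provider): aggregate the challenged blocks and their tags."""
    mu = sum(nu * blocks[i] for i, nu in challenge) % P
    sigma = sum(nu * tags[i] for i, nu in challenge) % P
    return mu, sigma


def verify_proof(key, challenge, proof):
    """VerifyProof: check the aggregate without seeing any individual block."""
    alpha, prf_key = key
    mu, sigma = proof
    expected = (sum(nu * prf(prf_key, i) for i, nu in challenge) + alpha * mu) % P
    return sigma == expected


key = key_gen()
blocks = [11, 22, 33, 44]              # file blocks encoded as integers below P
tags = sig_gen(key, blocks)            # setup phase
challenge = [(0, 5), (2, 7)]           # (block index, random coefficient)
print(verify_proof(key, challenge, gen_proof(blocks, tags, challenge)))
```

The verifier only ever receives the pair (mu, sigma), not the blocks themselves. If the provider has altered a challenged block, the aggregated tag no longer matches and the check fails.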
However, if enough linear combinations of the same data blocks are collected, the TPA could recover the user’s data by solving a system of linear equations. Consequently, the linear combination is masked with randomness generated by a pseudo random function (PRF). As a result, the TPA does not have all the information necessary to build a correct system of linear equations, and therefore cannot learn anything about the data stored in the cloud [15].
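The leakage described above is easy to reproduce for two blocks. The following sketch uses hypothetical helper names and an illustrative modulus; it shows an auditor recovering both blocks from two unmasked responses, and failing once a mask is added. It omits how [15] keeps the masked response verifiable.

```python
P = 2**127 - 1  # prime modulus (illustrative choice)


def combine(blocks, coefficients):
    """Unmasked response: a linear combination of the blocks mod P."""
    return sum(c * m for c, m in zip(coefficients, blocks)) % P


def mask(mu, r):
    """Masked response: the combination is hidden behind a random value r."""
    return (mu + r) % P


def recover_two_blocks(coeffs_a, mu_a, coeffs_b, mu_b):
    """Solve the 2x2 system mod P with Cramer's rule (needs Python 3.8+ for pow)."""
    (a, b), (c, d) = coeffs_a, coeffs_b
    det_inv = pow((a * d - b * c) % P, -1, P)
    m0 = ((mu_a * d - b * mu_b) * det_inv) % P
    m1 = ((a * mu_b - mu_a * c) * det_inv) % P
    return m0, m1


blocks = (1234, 5678)
coeffs_a, coeffs_b = (3, 5), (2, 7)

# Two unmasked responses are enough to recover both blocks.
leaked = recover_two_blocks(coeffs_a, combine(blocks, coeffs_a),
                            coeffs_b, combine(blocks, coeffs_b))
print(leaked == blocks)
```

With masking, each response carries an unknown r, so every new equation also introduces a new unknown and the system can no longer be solved for the blocks.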
According to [15], security analysis and performance benchmarks show that the scheme is provably secure and highly efficient.
6.5 Sticky Privacy Policies
Creese et al consider that every piece of data residing on equipment that is not managed by users needs to have its privacy addressed. As a result, Creese et al explored methods to design data protection into a cloud application in the early stages of development, avoiding costly future issues and poor protection from design decisions that disagree with data protection needs [4].
To build a data protection mechanism in clouds, Creese et al designed a pattern called Sticky Privacy Policies. The intent of the mechanism is to bind a specific privacy policy to data when it is stored, processed, and shared.
The sticky policies make sure that multiple parties are aware of the data’s policies and act in accordance with them. For example, a sticky policy can specify that the data can only be used for a particular purpose, by certain people, or that the user must be contacted before the data is used.
In sticky policies, the personal information is associated with machine-readable policies that can be composed and extended in flexible ways. For example, Casassa et al showed how sticky policies can be represented in an XML-based format. In an XML format a sticky policy can contain:
- An owner tag: expressing information about the owner of the data, including an email address. This information can be encrypted using techniques such as Identifier-based Encryption, which is discussed in section 6.6.
- A validity tag: that contains the expiration date of the policy.
- Constraint and actions tags: the constraint tag can require a requestor or third party to authenticate before accessing the data. The action tag can notify the owner of the information if any usage of its data seems suspicious.
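The tags listed above can be sketched with Python's standard XML library. The element names, the ISO date format, and the example values are assumptions for illustration; they are not the schema used by Casassa et al.

```python
import xml.etree.ElementTree as ET
from datetime import date


def build_sticky_policy(owner_email, expires, constraint, action):
    """Build a policy element with owner, validity, constraint, and action tags."""
    policy = ET.Element("policy")
    owner = ET.SubElement(policy, "owner")
    ET.SubElement(owner, "email").text = owner_email
    ET.SubElement(policy, "validity").text = expires.isoformat()
    ET.SubElement(policy, "constraint").text = constraint
    ET.SubElement(policy, "action").text = action
    return policy


def is_valid(policy, today):
    """A policy is valid up to and including its expiration date."""
    return today <= date.fromisoformat(policy.findtext("validity"))


policy = build_sticky_policy(
    owner_email="owner@example.org",
    expires=date(2030, 1, 1),
    constraint="require-authentication",
    action="notify-owner",
)
print(ET.tostring(policy, encoding="unicode"))
```

In a real deployment the owner element would be encrypted rather than stored in the clear, as noted in the owner tag description above.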
Creese et al identified some design issues their approach needs to address such as:
- At what level of data granularity should a policy be attached? For example, a personal data element such as a name or an address can have a defined policy, but an entire database could also have a policy attached.
- The mechanism needs to be compatible with legacy systems.
- For practicality reasons, it might be better to have a reference to a policy, instead of an actual policy bound to the data.
The solution designed by [4] has several drawbacks. Receivers could use the data in a way that the data owner or user would not like. The policy could also be ignored or detached from the data. Additionally, if the data is bound to the policy, the data becomes larger and may not be compatible with some applications.
Despite these drawbacks, policy specification and verification tools such as The Enterprise Privacy Authorization Language (EPAL), W3C P3P, and others have already adopted the idea of sticky policies.
6.6 An Accountable Management of Identity and Privacy Model
Casassa et al examined the problem that users have little control over the destination of their data once it is released to a third party. Furthermore, organizations are not accountable for the information they share with other organizations.
For example, in an e-commerce scenario, users deal with transactions that span multiple e-commerce websites. It starts when a user provides their identity to an e-commerce website to access its services. When the user interacts with a website, they may also be interacting with other organizations. There is a chance that the website discloses personal data to other organizations in order to fulfill a transaction. To solve this problem, Casassa et al suggest a mechanism that associates disclosure policies with personal data and, most importantly, increases the accountability of the organizations involved.
The model proposed by [1] has several key aspects. First, [1] adopts the sticky policy paradigm, which allows users to agree to an applicable privacy policy along with opt-in and opt-out mechanisms. The proposed model also uses a Tracing Authority component, which tracks the disclosure of data by an organization. Other key aspects include: obfuscation of personal information, disclosure of personal information only if the sticky policies’ constraints are followed, and enforced tracing and auditing of disclosures of personal data, to increase the accountability of the organization receiving the data.
At a high level, the privacy model proposed by [1] to enforce accountability on organizations can be explained in seven steps, based on the e-commerce scenario:
- Users use graphical tools to define their sticky policies, obfuscate their data, and associate the obfuscated data to their customized policies.
- The user can start interacting with an e-commerce website by providing digital packages with the obfuscated data along with their sticky policies.
- The requestor (the e-commerce website) interacts with the Tracing Authority component to demonstrate that the involved terms and conditions are understood.
- The Tracing Authority receives a request and checks the integrity and trustworthiness of the requestor’s credentials.
- In the [1] model, nothing prevents the user from being involved in the disclosure process. As a result, the user can approve or disapprove the disclosure of their information.
- The actual disclosure of obfuscated data to a requestor (the e-commerce website) only happens if the requestor can demonstrate to the Tracing Authority that it can obey the sticky policies set by the user.
- Disclosure of personal information is logged and audited by the Tracing Authority. At this step, the accountability of the requestor is logged, and evidence about their knowledge of the users’ personal information is created. If the information is indiscriminately distributed to other organizations, the Tracing Authority has enough evidence for forensic analysis [1].
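The checks performed in the steps above can be sketched as a small Python class. This is an illustrative model only, not the design from [1]: the class and method names, the representation of policy terms as sets of strings, and the outcome messages are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TracingAuthority:
    """Hypothetical Tracing Authority that gates and logs every disclosure request."""
    trusted_requestors: set
    audit_log: list = field(default_factory=list)

    def request_disclosure(self, requestor, accepted_terms, policy_terms, owner_approves):
        """Return True only if every check passes; log the outcome either way."""
        if requestor not in self.trusted_requestors:
            outcome = "rejected: untrusted credentials"
        elif not policy_terms <= accepted_terms:
            outcome = "rejected: policy terms not accepted"
        elif not owner_approves:
            outcome = "rejected: owner declined"
        else:
            outcome = "disclosed"
        # Each entry is evidence of what the requestor asked for and was given.
        self.audit_log.append((requestor, outcome))
        return outcome == "disclosed"


authority = TracingAuthority(trusted_requestors={"shop.example"})
policy_terms = {"purpose:order-fulfilment", "no-third-party-sharing"}
granted = authority.request_disclosure(
    "shop.example", accepted_terms=set(policy_terms),
    policy_terms=policy_terms, owner_approves=True,
)
print(granted, authority.audit_log)
```

Because rejected requests are logged as well as granted ones, the log records every attempt to obtain the data, which is what makes later forensic analysis possible.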
The accountable management of identity and privacy model proposed by [1] relies on two technologies to accomplish the steps explained above: Identifier-based Encryption (IBE) and the Trusted Computing Platform Alliance (TCPA) platform.
IBE is an emerging cryptographic scheme in which any type of string (containing a name, a role, terms and conditions, and so on) can be used as an encryption key. TCPA makes it possible to check that the receiver’s operating system is a trusted platform. It can also verify that the software installed on the computer conforms to the disclosure policies and can implement the defined privacy management mechanisms.
7. OTHER ISSUES
In sections five and six, we discussed guidelines and privacy designs used for privacy-aware cloud applications. Other related issues that have an impact on privacy-aware cloud applications include: testing, accountability, terms of service and privacy policies. Those issues are discussed in this section.
7.1 Testing Privacy-Aware Clouds
In traditional software applications, users are only able to change configurations or options. In cloud applications, the user’s involvement is more significant, and more instantaneous. The software behavior changes continuously with the user behavior. As a result, users are closely involved with the design of cloud applications, by either changing the program state affecting the system behavior, or changing the logic of the application.
According to [8], in testing, users’ participation is even more active and direct. Users might not realize that they have become a powerful and indispensable part of the testing and quality assurance team. For privacy-aware applications in clouds, it is critical to provide good privacy mechanisms and avoid revealing internal information about the application to the parties involved in the testing.
7.1.1 A new testing paradigm
In traditional software companies, the developer-to-tester ratio is around one to one. However, leading internet application providers have more developers than testers [8]. The disparity shows a new paradigm in quality assurance methodologies. The new paradigm is a result of the capacity of both internet and cloud applications to rapidly release fixes for bugs, and new versions of software. As a result, it is a challenge to ensure that an application still holds high standards of software quality.
Gu and Cheung suggest that, to solve this problem, cloud service providers have taken an incremental-release approach. Usually, a cloud service provider releases a new set of changes to a small group of users. If the testing is successful, the changes are then applied to the production application. Incremental testing is unlikely to affect users if it is carefully managed. Furthermore, it enhances the communication among designers, developers, and software engineers while incorporating the users into the test environment.
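One common way to select the small group of users for an incremental release is deterministic hashing. The sketch below is a general illustration of that idea and is not taken from [8]; the function name, the release label, and the percentage parameter are assumptions.

```python
import hashlib


def in_early_release_group(user_id, release, percent):
    """Deterministically place roughly `percent` percent of users in the early group."""
    digest = hashlib.sha256(f"{release}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < percent


users = [f"user-{i}" for i in range(1000)]
early = [u for u in users if in_early_release_group(u, "release-2", percent=5)]
print(len(early))
```

The same user always lands in the same group for a given release, so the provider can widen the rollout by raising the percentage without reshuffling who has already seen the change.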
7.2 Accountability
Cloud service providers should value accountability and include this principle in the design stages of their privacy-aware applications. Accountability can be defined as placing legal responsibility on an organization that stores PII, and ensuring that if an organization supplies PII to a third party, it abides by the previously agreed privacy policies.
According to [3], accountability may be a good principle for providing privacy in cloud computing applications. Charlesworth and Pearson identify five elements to provide accountability in privacy-aware cloud applications:
- Transparency: the level of openness about a cloud service provider’s handling of PII that allows meaningful accountability. Users should be informed about how their information is used within the cloud.
- Assurance: through privacy policies, cloud service providers can provide assurances to respect contractual measures and audits.
- User Trust: accountability promotes user trust. When users are not clear why their information is requested or how it will be processed, this lack of information leads to suspicion and distrust.
- Responsibility: data protection requires a big share of responsibility from the cloud service provider. Establishing responsible and accountable privacy standards allows providers to assess risks in terms of financial losses and privacy breaches.
- Policy compliance: accountability ensures that cloud service providers comply with applicable laws.
Finally, incorporating accountability into privacy-aware cloud applications helps ensure that the laws that apply to cloud computing are followed.
7.3 Terms of Service and Privacy Policy
According to [6], terms of service, from a privacy perspective, may be the most important feature of cloud computing for a user who is not subject to a legal obligation.
The terms of service is a key document that attempts to define the relationship between a user or customer and the provider of a service, the service itself, and the parameters used to define the performance of the cloud service provider [4].
Cloud service providers offer their services to users without individual contracts, but subject to their terms of service. If the terms of service give providers control over a user’s personal information, the user must respect those terms. As a result, cloud service providers may be able to copy, use, change, publish, distribute, display and share this information with their affiliates [6].
In addition, cloud service providers reserve the right to change policies without any limits at any time. Consequently, if a user agreed to a cloud provider’s terms of service and the terms later change without the user’s awareness, the change could create legal liabilities for the user.
The terms of service also allow a cloud service provider to terminate a user’s account at any time. Therefore, if a user does not have a backup of their information, it can be lost. For organizations or government agencies this could be disastrous [6].
Creese et al believe that by understanding and analyzing the terms of service, software engineers can formulate engineering requirements that help in the process of designing privacy-aware solutions for clouds.
8. CONCLUSIONS
Having reviewed design practices for cloud computing applications, we summarize the lessons learned and the future opportunities for designing privacy-aware applications, based on the existing research.
Taking privacy into account when designing cloud computing applications is critical if personal information is going to be collected, processed or shared. Privacy should be a fundamental design goal, and it should cover both users and service providers.
Furthermore, privacy should be built into every phase of the development process; it cannot be added at a later stage. Privacy-aware clouds need to have privacy testing methodologies to ensure that internal activities in the cloud and product features are not leaked to third parties.
In the context of the conducted research work, future work would be well advised to:
- Explore software development methodologies such as agile development. In cloud computing, requirements for an application change based on the user’s needs. Having a full design specification is not always possible. Applications need to be tested more frequently. As a result, designing privacy-aware cloud applications in agile environments will become relevant [14].
- Incorporate privacy templates. It may be useful for developers and software engineers not only to have guidelines such as the ones described in section 5, but to have privacy templates for the different privacy scenarios that can take place [14].
- Combine good privacy design with accountability. Together they form a practical mechanism to reduce a user’s privacy risks and enhance a cloud service provider’s credibility.
- Create a language for privacy design. Efforts in the future should focus on a privacy language whose goal is to equip executives and technology professionals with a shared vocabulary that allows them to create and discuss privacy requirements in an understandable manner for all parties [10].
Different privacy guidelines, requirements and design models were suggested that may be used by software engineers, architects or developers, in order to reduce the risks and threats on cloud applications. Nonetheless, there are still opportunities for improvement. Future work might address one or more of the following:
- How can consent and revocation of consent be provided within privacy-aware cloud applications? A United Kingdom project called EnCoRe (Ensuring Consent and Revocation) is trying to answer this question by examining solutions in the areas of consent and revocation of personal information [12].
- Cloud applications must consider scalability, exploit parallelism, and at the same time protect a user’s privacy. To address this challenge, parallelization models such as MapReduce have become popular. Considering MapReduce does not provide a mechanism to protect a user’s privacy, how can we enhance existing parallelization frameworks to provide privacy protection?
- Testing a software component to provide privacy guarantees involves designing a set of test scenarios [8]. Can we exhaustively test a piece of software against a set of privacy policies? How can we define a criterion for exhaustiveness? How can we evaluate the quality of the testing performed?
- When software engineers incorporate terms of service into their applications, how can they determine design pattern properties and the details required by the terms of services to satisfy a user?
- How can cloud service providers recruit individuals with accredited skills in privacy management and design? According to [10], in the United Kingdom there is no body that is recognized as providing such accreditation. Therefore, there is a clear need to establish a professional body for privacy professionals not only in the United Kingdom but across the globe.
Designing privacy into cloud applications is a win-win situation, in which users and cloud service providers are the beneficiaries. Designing privacy-aware cloud applications provides more efficient management of personal information, which reduces processing costs, yields more precise data, and creates a competitive advantage gained through trust and responsible management of personal information. When cloud service providers adopt privacy as their design priority, privacy risks and threats will be a thing of the past.
9. REFERENCES
[1] Casassa, Marco., Pearson, Siani., and Bramhall, Pete. “Towards Accountable Management of Identity and Privacy: Sticky Policies and Enforceable Tracing Services”. In: DEXA 2003, pp. 377-382. IEEE Computer Society. 2003.
[2] Cavoukian, Ann. “Privacy By Design”. January 2009. Available via http://www.privacybydesign.ca/pbdbook/PrivacybyDesignBook-ch17.pdf
[3] Charlesworth, Andrew., and Pearson, Siani. “Accountability as a Way Forward for Privacy Protection in the Cloud”. HP Labs. December 2009.
[4] Creese, Sadie., Hopkins, Paul., Pearson, Siani., and Shen, Yun. “Data Protection-Aware Design for Cloud Computing”. HP Labs. August 2009.
[5] EXOCOM Group, Inc. “Privacy Technology Review”. August 2001. Available via http://www.hc-sc.gc.ca/hcs-sss/pubs/ehealth-esante/2001-priv-tech/index-eng.php
[6] Gellman, Robert. “Privacy in the Clouds: Risks to Privacy and Confidentiality from Cloud Computing”. World Privacy Forum. February 2009.
[7] Griffin, Lavonne. “Technology Definitions”. May 2009. Available via http://www.brownfield.org/auditor/index.cfm?a=114449&c=42101
[8] Gu, Lin., and Cheung, Shing-Chi. “Constructing and Testing Privacy-Aware Services in a Cloud Computing Environment – Challenges and Opportunities”. October 2009.
[9] Information Commissioner’s Office. “Privacy Impact Assessment Handbook”. December 2007. Available via http://www.ico.gov.uk/upload/documents/pia_handbook_html_v2/html/1-Chap1-2.html
[10] Information Commissioner’s Office. “Privacy by Design: An overview of privacy enhancing technologies”. November 2008. Available via http://www.ico.gov.uk/upload/documents/pdb_report_html/privacy_by_design_report_v2.pdf
[11] Microsoft Corporation. “Privacy Guidelines for Developing Software Products and Services”. 26th April 2007. Available via http://www.microsoft.com/Downloads/details.aspx?FamilyID=c48cf80f-6e87-48f5-83ec-a18d1ad2fc1f&displaylang=en
[12] Mowbray, Miranda., and Pearson, Siani. “A Client-Based Privacy Manager for Cloud Computing”. HP Labs. June 2009.
[13] Nyre, Asmund., and Jaatun, Martin. “Privacy in a Semantic Cloud: What’s Trust Got to Do with It?”. Proceedings of the 1st International Conference on Cloud Computing. 2009.
[14] Pearson, Siani. “Taking Account of Privacy when Designing Cloud Computing Services”. HP Labs. ICSE’09 Workshop. May 2009.
[15] Wang, Cong., Wang, Qian., Ren, Kui., and Lou, Wenjing. “Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing”. Illinois Institute of Technology. IEEE Infocom. November 2009.