IEEE 2012 Projects: October 2012

Tuesday, 30 October 2012

Slicing: A New Approach to Privacy Preserving Data Publishing

Abstract:

Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that do not have a clear separation between quasi-identifying attributes and sensitive attributes. In this paper, we present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obey the ℓ-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. Our experiments also demonstrate that slicing can be used to prevent membership disclosure.

Algorithm Used:

Slicing Algorithms

An important advantage of slicing is its ability to handle high-dimensional data. By partitioning attributes into columns, slicing reduces the dimensionality of the data. Each column of the table can be viewed as a sub-table with a lower dimensionality. Slicing also differs from the approach of publishing multiple independent sub-tables, because in slicing these sub-tables are linked by the buckets.
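The sketch below illustrates the slicing idea described above. It is minimal and rests on two assumptions: the caller supplies the grouping of attributes into columns, and buckets are simply consecutive runs of tuples. It does not enforce the ℓ-diversity requirement mentioned in the abstract. The function name `slice_table` and the sample table are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]
SlicedRow = Tuple[Tuple[str, ...], ...]  # one sub-tuple per column


def slice_table(
    rows: Sequence[Row],
    columns: Sequence[Sequence[int]],
    bucket_size: int,
    seed: int = 0,
) -> List[List[SlicedRow]]:
    """Partition tuples into buckets (horizontal) and attributes into
    columns (vertical), then permute each column inside each bucket."""
    rng = random.Random(seed)
    sliced_buckets: List[List[SlicedRow]] = []
    for start in range(0, len(rows), bucket_size):
        # Horizontal partition: consecutive rows form a bucket (simplification).
        bucket = rows[start:start + bucket_size]
        # Vertical partition: project the bucket onto each attribute column.
        per_column = [[tuple(row[a] for a in col) for row in bucket] for col in columns]
        # Shuffle every column independently. This breaks the linkage between
        # columns but keeps each column's multiset of values within the bucket.
        for values in per_column:
            rng.shuffle(values)
        # The i-th sliced tuple takes the i-th (permuted) value of each column.
        sliced_buckets.append(
            [tuple(values[i] for values in per_column) for i in range(len(bucket))]
        )
    return sliced_buckets


if __name__ == "__main__":
    table = [
        ("22", "M", "47906", "dyspepsia"),
        ("22", "F", "47906", "flu"),
        ("33", "F", "47905", "flu"),
        ("52", "F", "47905", "bronchitis"),
    ]
    # Columns: {Age, Sex} and {Zipcode, Disease}; buckets of two tuples.
    for b in slice_table(table, [[0, 1], [2, 3]], bucket_size=2, seed=1):
        print(b)
```

Each column keeps its lower-dimensional values within each bucket. Only the association between columns is randomized inside a bucket, which is how the buckets link the sub-tables.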

Risk-Aware Mitigation for MANET Routing Attacks

Abstract:

Mobile Ad hoc Networks (MANETs) are highly vulnerable to attacks due to the dynamic nature of their network infrastructure. Among these attacks, routing attacks have received considerable attention since they can cause the most devastating damage to a MANET. Even though several intrusion response techniques exist to mitigate such critical attacks, existing solutions typically attempt to isolate malicious nodes based on binary or naive fuzzy response decisions.

However, binary responses may result in unexpected network partitions, causing additional damage to the network infrastructure, and naive fuzzy responses could lead to uncertainty in countering routing attacks in MANETs. In this paper, we propose a risk-aware response mechanism to systematically cope with the identified routing attacks. Our risk-aware approach is based on an extended Dempster-Shafer mathematical theory of evidence that introduces a notion of importance factors. In addition, our experiments demonstrate the effectiveness of our approach with respect to several performance metrics.

Fig. Risk-aware response mechanism
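The approach above builds on an extended Dempster-Shafer theory with importance factors. As a reference point, here is a minimal sketch of the classical Dempster's rule of combination that such extensions start from. The importance-factor extension is not reproduced. The evidence sources and mass values in the usage example are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

# A mass function maps focal sets (subsets of the frame of discernment)
# to belief masses that sum to 1.
Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Classical Dempster's rule: multiply masses of intersecting focal
    sets and renormalize by the non-conflicting mass (1 - K)."""
    combined: Dict[FrozenSet[str], float] = {}
    conflict = 0.0  # K: total mass that falls on empty intersections
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting; rule undefined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


if __name__ == "__main__":
    ATTACK, NORMAL = frozenset({"attack"}), frozenset({"normal"})
    EITHER = ATTACK | NORMAL  # the whole frame, i.e. "don't know"
    ids_alert = {ATTACK: 0.6, EITHER: 0.4}                   # hypothetical evidence 1
    routing_check = {ATTACK: 0.5, NORMAL: 0.3, EITHER: 0.2}  # hypothetical evidence 2
    for focal, mass in dempster_combine(ids_alert, routing_check).items():
        print(sorted(focal), round(mass, 4))
```

In this example the conflict is K = 0.6 × 0.3 = 0.18. The combined belief is therefore 0.62 / 0.82 ≈ 0.756 for {attack}, 0.12 / 0.82 ≈ 0.146 for {normal}, and 0.08 / 0.82 ≈ 0.098 for the whole frame.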

Ranking Model Adaptation for Domain-Specific Search

ABSTRACT:

With the explosive emergence of vertical search domains, applying the broad-based ranking model directly to different domains is no longer desirable due to domain differences. Building a unique ranking model for each domain, however, is both laborious in terms of data labeling and time-consuming in terms of model training. In this paper, we address these difficulties by proposing a regularization-based algorithm called ranking adaptation SVM (RA-SVM), through which we can adapt an existing ranking model to a new domain. This reduces the amount of labeled data and the training cost while still guaranteeing performance. Our algorithm only requires the predictions from the existing ranking models, rather than their internal representations or the data from auxiliary domains. In addition, we assume that documents similar in the domain-specific feature space should have consistent rankings, and add some constraints to control the margin and slack variables of RA-SVM adaptively. Finally, a ranking adaptability measurement is proposed to quantitatively estimate whether an existing ranking model can be adapted to a new domain. Experiments performed over LETOR and two large-scale datasets crawled from a commercial search engine demonstrate the applicability of the proposed ranking adaptation algorithms and the ranking adaptability measurement.

Organizing User Search Histories

Abstract:

Users are increasingly pursuing complex task-oriented goals on the Web, such as making travel arrangements, managing finances or planning purchases. To this end, they usually break down the tasks into a few co-dependent steps and issue multiple queries around these steps repeatedly over long periods of time. To better support users in their long-term information quests on the Web, search engines keep track of their queries and clicks while searching online. In this paper, we study the problem of organizing a user’s historical queries into groups in a dynamic and automated fashion. Automatically identifying query groups is helpful for a number of different search engine components and applications, such as query suggestions, result ranking, query alterations, sessionization, and collaborative search. In our approach, we go beyond approaches that rely on textual similarity or time thresholds, and we propose a more robust approach that leverages search query logs. We experimentally study the performance of different techniques, and showcase their potential, especially when combined together.

Algorithm Used:

PageRank Algorithm
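The source lists PageRank as the algorithm used but does not say how it is applied to query groups. As a reference point, the sketch below is a textbook power-iteration PageRank over a directed graph. The usage example treats queries as nodes and reformulations as edges; that modeling is an assumption, and the queries are made up.

```python
from typing import Dict, List


def pagerank(
    graph: Dict[str, List[str]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 100,
) -> Dict[str, float]:
    """Power-iteration PageRank; dangling nodes spread rank uniformly."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is redistributed to everyone.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in graph.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for t in targets:
                    new[t] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical query graph: edges point from a query to its reformulation.
    queries = {
        "cheap flights": ["flights to paris"],
        "flights to paris": ["paris hotels"],
        "paris hotels": ["flights to paris"],
    }
    for q, score in sorted(pagerank(queries).items(), key=lambda kv: -kv[1]):
        print(f"{score:.3f}  {q}")
```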

MULTIPARTY ACCESS CONTROL FOR ONLINE SOCIAL NETWORKS: MODEL AND MECHANISMS

ABSTRACT:

Online social networks (OSNs) have experienced tremendous growth in recent years and have become a de facto portal for hundreds of millions of Internet users. These OSNs offer attractive means for digital social interactions and information sharing, but also raise a number of security and privacy issues. While OSNs allow users to restrict access to shared data, they currently do not provide any mechanism to enforce privacy concerns over data associated with multiple users. To this end, we propose an approach to enable the protection of shared data associated with multiple users in OSNs. We formulate an access control model to capture the essence of multiparty authorization requirements, along with a multiparty policy specification scheme and a policy enforcement mechanism. In addition, we present a logical representation of our access control model, which allows us to leverage the features of existing logic solvers to perform various analysis tasks on our model. We also discuss a proof-of-concept prototype of our approach as part of an application in Facebook and provide a usability study and system evaluation of our method.

Handwritten Chinese Text Recognition by Integrating Multiple Contexts

Abstract:

This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric and linguistic contexts) from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation. In path search, we use a refined beam search algorithm to improve the search efficiency and, meanwhile, use a candidate character augmentation strategy to improve the recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning using a Maximum Character Accuracy criterion. We evaluated the recognition performance on a Chinese handwriting database, CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten texts. The experimental results show that confidence transformation and combining multiple contexts improve the text line recognition performance significantly. On a test set of 1,015 handwritten pages, the proposed approach achieved a character-level accurate rate of 90.75 percent and a correct rate of 91.39 percent, which are far superior to the best results reported in the literature.

System diagram of handwritten Chinese text line recognition

A page of handwritten Chinese text
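The path search described above relies on beam search. The sketch below shows the core idea in a simplified form. It assumes a fixed segmentation, so oversegmentation is not handled. Each character position offers a few candidates with log recognition scores, and each path is scored by adding those scores to a weighted bigram language-model log probability. Geometric contexts, confidence transformation and candidate augmentation are not modeled. All names, scores and the weight are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Candidates at one character position: (character, log recognition score).
Candidates = Sequence[Tuple[str, float]]


def beam_search(
    positions: Sequence[Candidates],
    bigram_logp: Callable[[str, str], float],
    lm_weight: float = 1.0,
    beam_width: int = 3,
) -> Tuple[str, float]:
    """Keep the beam_width best partial strings at each position, scoring
    each path as recognition log score + weighted bigram log probability."""
    beam: List[Tuple[float, str]] = [(0.0, "")]
    for cands in positions:
        expanded: List[Tuple[float, str]] = []
        for score, text in beam:
            prev = text[-1] if text else "<s>"  # sentence-start token
            for ch, rec_logp in cands:
                path_score = score + rec_logp + lm_weight * bigram_logp(prev, ch)
                expanded.append((path_score, text + ch))
        # Prune: retain only the highest-scoring partial paths.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    best_score, best_text = beam[0]
    return best_text, best_score


if __name__ == "__main__":
    # Hypothetical candidates and log probabilities for a two-character line.
    line = [[("中", -0.5), ("申", -0.9)], [("国", -0.7), ("围", -0.7)]]
    bigrams = {("中", "国"): -0.1}

    def lm(prev: str, ch: str) -> float:
        return bigrams.get((prev, ch), -2.3)  # back-off score for unseen pairs

    print(beam_search(line, lm))  # the bigram context breaks the 国/围 tie
```

The recognizer alone cannot decide between 国 and 围 here, because both have the same score. The linguistic context is what selects 国, which is the kind of gain that combining multiple contexts aims for.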

Footprint: Detecting Sybil Attacks in Urban Vehicular Networks

Abstract:

In urban vehicular networks, where privacy (especially the location privacy of anonymous vehicles) is a major concern, anonymous verification of vehicles is indispensable. Consequently, an attacker who succeeds in forging multiple hostile identities can easily launch a Sybil attack, gaining a disproportionately large influence. In this paper, we propose a novel Sybil attack detection mechanism, Footprint, which uses the trajectories of vehicles for identification while still preserving their location privacy. More specifically, when a vehicle approaches a road-side unit (RSU), it actively demands an authorized message from the RSU as proof of its appearance time at this RSU. We design a location-hidden authorized message generation scheme for two objectives. First, RSU signatures on messages are signer-ambiguous, so that the RSU location information is concealed from the resulting authorized message. Second, two authorized messages signed by the same RSU within the same given period of time (temporarily linkable) are recognizable, so that they can be used for identification. Because the linkability of two authorized messages is limited in time, authorized messages cannot be used for long-term identification. With this scheme, vehicles can generate a location-hidden trajectory for location-privacy-preserved identification by collecting a consecutive series of authorized messages. Footprint uses the social relationship among trajectories, as defined by the similarity of two trajectories, to recognize and therefore dismiss “communities” of Sybil trajectories. Rigorous security analysis and extensive trace-driven simulations demonstrate the efficacy of Footprint.

The design of a Sybil attack detection scheme in urban vehicular networks should achieve three goals:

1. Location privacy preservation: a vehicle should not have to expose its location information to other vehicles or to RSUs, since such information can be confidential. The detection scheme should prevent the location information of vehicles from being leaked.

2. Online detection: when a Sybil attack is launched, the detection scheme should react before the attack has terminated. Otherwise, the attacker may already have achieved its purpose.

3. Independent detection: a Sybil attack works because decisions are made through group negotiation. To eliminate the possibility that a Sybil attack is launched against the detection itself, the detection should be conducted independently by the verifier without collaboration with others.

Fast Data Collection in Tree-Based Wireless Sensor Networks

Abstract:

We investigate the following fundamental question: how fast can information be collected from a wireless sensor network organized as a tree? To address this, we explore and evaluate a number of different techniques using realistic simulation models under the many-to-one communication paradigm known as convergecast. We first consider time scheduling on a single frequency channel with the aim of minimizing the number of time slots required (schedule length) to complete a convergecast. Next, we combine scheduling with transmission power control to mitigate the effects of interference. We show that power control helps reduce the schedule length under a single frequency, but scheduling transmissions using multiple frequencies is more efficient. We give lower bounds on the schedule length when interference is completely eliminated, and propose algorithms that achieve these bounds. We also evaluate the performance of various channel assignment methods and find empirically that for moderate-size networks of about 100 nodes, the use of multi-frequency scheduling can suffice to eliminate most of the interference. The data collection rate is then limited no longer by interference but by the topology of the routing tree. To this end, we construct degree-constrained spanning trees and capacitated minimal spanning trees, and show significant improvement in scheduling performance over different deployment densities. Lastly, we evaluate the impact of different interference and channel models on the schedule length.

Algorithm used:

1. BFS TIME SLOT ASSIGNMENT

2. LOCAL TIME SLOT ASSIGNMENT
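The algorithm names above are listed without detail. As an illustration only, the sketch below builds a convergecast schedule greedily, visiting nodes in BFS order from the sink within each time slot. It is not the paper's BFS or local time slot assignment algorithm. It assumes that interference has been eliminated (for example, by using multiple frequencies, as the abstract discusses), that every sensor starts with one packet, and that radios are half-duplex. The function names and the sample tree are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def node_depths(parent: Dict[int, int], sink: int) -> Dict[int, int]:
    """Hop distance of every node from the sink, computed by BFS."""
    children: Dict[int, List[int]] = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    depth = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for c in children.get(u, []):
            depth[c] = depth[u] + 1
            queue.append(c)
    return depth


def greedy_convergecast(parent: Dict[int, int], sink: int = 0) -> List[List[Tuple[int, int]]]:
    """Return one list of (sender, receiver) links per time slot until the
    sink holds every node's packet. With interference assumed eliminated,
    the only constraints are: a node cannot send and receive in the same
    slot, and a node receives at most one packet per slot."""
    depth = node_depths(parent, sink)
    order = sorted(parent, key=lambda v: depth[v])  # BFS order: shallow first
    packets = {v: 1 for v in parent}  # every sensor starts with one packet
    packets[sink] = 0
    schedule: List[List[Tuple[int, int]]] = []
    while packets[sink] < len(parent):
        senders, receivers, slot = set(), set(), []
        for v in order:
            p = parent[v]
            # Parents are visited before children, so p's own decision is known.
            if packets[v] == 0 or p in senders or p in receivers:
                continue
            senders.add(v)
            receivers.add(p)
            slot.append((v, p))
        for v, p in slot:  # packets move only at the end of the slot
            packets[v] -= 1
            packets[p] += 1
        schedule.append(slot)
    return schedule


if __name__ == "__main__":
    # Hypothetical tree: sink 0 with branches 0 <- 1 <- 2 and 0 <- 3.
    tree = {1: 0, 2: 1, 3: 0}
    for t, links in enumerate(greedy_convergecast(tree), start=1):
        print(f"slot {t}: {links}")
```

For the sample tree the schedule takes three slots, one for each packet the sink must receive. On a three-node line (0 ← 1 ← 2 ← 3) it takes five slots, because the single branch forces transmissions toward the sink to alternate.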

Efficient Fuzzy Type-Ahead Search in XML Data

Abstract:

In a traditional keyword-search system over XML data, a user composes a keyword query, submits it to the system, and retrieves relevant answers. When the user has limited knowledge about the data, the user often feels “left in the dark” when issuing queries, and has to use a try-and-see approach for finding information. In this paper, we study fuzzy type-ahead search in XML data, a new information-access paradigm in which the system searches XML data on the fly as the user types in query keywords. It allows users to explore data as they type, even in the presence of minor errors in their keywords. Our proposed method has the following features:

1) Search as you type: It extends autocomplete by supporting queries with multiple keywords in XML data.

2) Fuzzy: It can find high-quality answers that have keywords matching query keywords approximately.

3) Efficient: Our effective index structures and searching algorithms can achieve a very high interactive speed.

We study research challenges in this new search framework. We propose effective index structures and top-k algorithms to achieve a high interactive speed. We examine effective ranking functions and early termination techniques to progressively identify the top-k relevant answers. We have implemented our method on real data sets, and the experimental results show that our method achieves high search efficiency and result quality.
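Fuzzy type-ahead search needs a way to decide whether a partially typed keyword approximately matches the beginning of a data keyword. One common formulation, assumed here, is the edit distance between the typed keyword and the closest prefix of a data keyword. The sketch below computes this with the standard edit-distance dynamic program and scans a small vocabulary linearly. The abstract's index structures, multi-keyword XML answers and top-k ranking are not modeled, and the function names are hypothetical.

```python
from typing import Iterable, List


def prefix_edit_distance(query: str, keyword: str) -> int:
    """Minimum edit distance between `query` and any prefix of `keyword`."""
    # prev[j] = edit distance between the query processed so far and keyword[:j]
    prev = list(range(len(keyword) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i] + [0] * len(keyword)
        for j, kc in enumerate(keyword, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete qc from the query
                cur[j - 1] + 1,            # insert kc into the query
                prev[j - 1] + (qc != kc),  # match or substitute
            )
        prev = cur
    # Every column of the final row is a candidate prefix; take the closest.
    return min(prev)


def fuzzy_complete(query: str, vocabulary: Iterable[str], max_errors: int = 1) -> List[str]:
    """Keywords having some prefix within `max_errors` edits of `query`."""
    return sorted(w for w in vocabulary if prefix_edit_distance(query, w) <= max_errors)


if __name__ == "__main__":
    words = ["luis", "lucy", "mary", "database"]
    print(fuzzy_complete("lus", words))  # ['lucy', 'luis']
```

The typo "lus" still completes to "luis" (one missing letter) and to "lucy" (one substitution against the prefix "luc"). This is the typo-tolerant, as-you-type behavior the abstract describes.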

TRUST MODELING IN SOCIAL TAGGING OF MULTIMEDIA CONTENT

ABSTRACT:

Tagging in online social networks is very popular these days, as it facilitates search and retrieval of multimedia content. However, noisy and spam annotations often make it difficult to perform an efficient search. Users may make mistakes in tagging, and irrelevant tags and content may be maliciously added for advertisement or self-promotion. This article surveys recent advances in techniques for combatting such noise and spam in social tagging. We classify the state-of-the-art approaches into a few categories and study representative examples in each. We also qualitatively compare and contrast them and outline open issues for future research.

CONCLUSION:

In this article, we dealt with one of the key issues in social tagging systems: combatting noise and spam. We classified existing studies in the literature into two categories, i.e., content and user trust modeling. Representative techniques in each category were analyzed and compared. In addition, existing databases and evaluation protocols were reviewed. An example system was presented to demonstrate how trust modeling can be employed in a popular application of image sharing and geotagging. Finally, open issues and future research trends were outlined. As online social networks and content sharing services evolve rapidly, we believe that research on enhancing the reliability and trustworthiness of such services will become increasingly important.

Self-Adaptive Contention-Aware Routing Protocol for Intermittently Connected Mobile Networks

Abstract:

This paper introduces a novel multi-copy routing protocol, called Self Adaptive Utility-based Routing Protocol (SAURP), for Delay Tolerant Networks (DTNs). Such networks may be composed of a vast number of miniature devices, such as smartphones, with heterogeneous capacities in terms of energy resources and buffer space. SAURP is characterized by its ability to identify potential opportunities for forwarding messages to their destinations via a novel utility-function-based mechanism. In this mechanism, a suite of environment parameters, such as wireless channel condition, nodal buffer occupancy, and encounter statistics, are jointly considered. Thus, SAURP can reroute messages around nodes experiencing high buffer occupancy, wireless interference, and/or congestion, while requiring a relatively small number of transmissions. The developed utility function in SAURP is proved to achieve optimal performance, which is further analyzed via a stochastic modeling approach. Extensive simulations are conducted to verify the developed analytical model and to compare the proposed SAURP with a number of recently reported encounter-based routing approaches in terms of delivery ratio, delivery delay, and the number of transmissions required for each message delivery. The simulation results show that SAURP outperforms all the counterpart multi-copy encounter-based routing protocols considered in the study.

Packet-Hiding Methods for Preventing Selective Jamming Attacks

Abstract:

The open nature of the wireless medium leaves it vulnerable to intentional interference attacks, typically referred to as jamming. This intentional interference with wireless transmissions can be used as a launchpad for mounting Denial-of-Service attacks on wireless networks. Typically, jamming has been addressed under an external threat model. However, adversaries with internal knowledge of protocol specifications and network secrets can launch low-effort jamming attacks that are difficult to detect and counter. In this work, we address the problem of selective jamming attacks in wireless networks. In these attacks, the adversary is active only for a short period of time, selectively targeting messages of high importance. We illustrate the advantages of selective jamming in terms of network performance degradation and adversary effort by presenting two case studies: a selective attack on TCP and one on routing. We show that selective jamming attacks can be launched by performing real-time packet classification at the physical layer. To mitigate these attacks, we develop three schemes that prevent real-time packet classification by combining cryptographic primitives with physical-layer attributes. We analyze the security of our methods and evaluate their computational and communication overhead.

Modules:

1. Network module

2. Real Time Packet Classification

3. Selective Jamming Module

4. Strong Hiding Commitment Scheme (SHCS)

5. Cryptographic Puzzle Hiding Scheme (CPHS)
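The hiding schemes above are named without detail. The sketch below illustrates the commit-then-reveal idea behind keeping a packet's content hidden until it has been transmitted, using a generic hash-based commitment from Python's standard library. This is a stand-in, not SHCS or CPHS as specified in the paper, which combine cryptographic primitives with physical-layer attributes. The function names and messages are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

NONCE_BYTES = 16  # fixed length, so nonce + message splits unambiguously


def commit(message: bytes) -> Tuple[bytes, bytes]:
    """Commit phase: publish only the digest; keep the nonce secret."""
    nonce = secrets.token_bytes(NONCE_BYTES)
    commitment = hashlib.sha256(nonce + message).digest()
    return commitment, nonce


def open_commitment(commitment: bytes, nonce: bytes, message: bytes) -> bool:
    """Reveal phase: the receiver recomputes the digest and compares it
    in constant time."""
    if len(nonce) != NONCE_BYTES:
        return False
    expected = hashlib.sha256(nonce + message).digest()
    return hmac.compare_digest(commitment, expected)


if __name__ == "__main__":
    c, r = commit(b"TCP ACK seq=1001")
    # Seeing only `c`, an observer cannot practically tell which packet was sent.
    print(open_commitment(c, r, b"TCP ACK seq=1001"))  # True
    print(open_commitment(c, r, b"RREQ dst=7"))        # False
```

The random nonce is what hides the message. Without it, an adversary could hash likely packet types (a TCP ACK, a route request) and classify the commitment in real time, which is exactly what selective jamming relies on.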

Online Modeling of Proactive Moderation System for Auction Fraud Detection

ABSTRACT:

We consider the problem of building online machine-learned models for detecting auction fraud on e-commerce websites. Since the emergence of the World Wide Web, online shopping and online auctions have become increasingly popular. While people are enjoying the benefits of online trading, criminals are also taking advantage of it to conduct fraudulent activities against honest parties for illegal profit. Hence, proactive fraud-detection moderation systems are commonly applied in practice to detect and prevent such illegal and fraudulent activities. Machine-learned models, especially those that are learned online, are able to catch fraud more efficiently and quickly than human-tuned rule-based systems. In this paper, we propose an online probit model framework which simultaneously takes into account online feature selection, coefficient bounds from human knowledge, and multiple instance learning. Through empirical experiments on real-world online auction fraud detection data, we show that this model can potentially detect more fraud and significantly reduce customer complaints compared to several baseline models and the human-tuned rule-based system.

Modules:

• Rule-based features:

Human experts with years of experience created many rules to detect whether a user is fraudulent or not. An example of such a rule is “blacklist”, i.e., whether the user has previously been detected as or reported for fraud. Each rule can be regarded as a binary feature that indicates the likelihood of fraud.

• Selective labeling:

If the fraud score is above a certain threshold, the case enters a queue for further investigation by human experts. Once it is reviewed, the final result is labeled as a boolean, i.e., fraud or clean. Cases with higher scores have higher priority in the review queue. Cases whose fraud scores are below the threshold are determined to be clean by the system without any human judgment.

• Fraud churn:

Once a case is labeled as fraud by human experts, it is very likely that the seller is not trustworthy and may also be selling other fraudulent items. Hence, all the items submitted by the same seller are labeled as fraud too. The fraudulent seller, along with his/her cases, is removed from the website immediately once detected.

• User Complaint:

Buyers can file complaints to claim a loss if they have recently been deceived by fraudulent sellers. The administrator views the various types of complaints and the percentage of each type. When the complaint value for a product exceeds a threshold, the administrator sets the product's trustability to Untrusted or Banned. If a product is set as Banned, users cannot view it on the website.
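The rule-based features, selective-labeling queue and fraud-churn modules above can be sketched as follows. This is a simplified illustration. The fraud score is a hand-weighted sum of binary rule features, not the online probit model the paper proposes. The `new_account` rule, the weights, the threshold and the case records are hypothetical. Scores exactly equal to the threshold are treated as clean, a case the text leaves open.

```python
import heapq
from typing import Any, Callable, Dict, List, Set, Tuple

Case = Dict[str, Any]
Rule = Callable[[Case], bool]


def rule_features(case: Case, rules: Dict[str, Rule]) -> Dict[str, int]:
    """Each expert rule becomes a binary (0/1) feature."""
    return {name: int(rule(case)) for name, rule in rules.items()}


def fraud_score(features: Dict[str, int], weights: Dict[str, float]) -> float:
    """Hand-set linear score (a stand-in for the paper's online probit model)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def selective_labeling(
    cases: List[Case], rules: Dict[str, Rule], weights: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split cases into a review queue (highest score first) and auto-clean ones."""
    heap: List[Tuple[float, str]] = []
    auto_clean: List[str] = []
    for case in cases:
        score = fraud_score(rule_features(case, rules), weights)
        if score > threshold:
            heapq.heappush(heap, (-score, str(case["id"])))  # max-heap via negation
        else:
            auto_clean.append(str(case["id"]))  # labeled clean without human review
    review_queue = [heapq.heappop(heap)[1] for _ in range(len(heap))]
    return review_queue, auto_clean


def fraud_churn(cases: List[Case], fraud_case_id: str) -> Set[str]:
    """Once one case is confirmed as fraud, flag every case from that seller."""
    seller = next(c["seller"] for c in cases if c["id"] == fraud_case_id)
    return {str(c["id"]) for c in cases if c["seller"] == seller}


if __name__ == "__main__":
    blacklist = {"s9"}
    rules: Dict[str, Rule] = {
        "blacklist": lambda c: c["seller"] in blacklist,
        "new_account": lambda c: c["account_age_days"] < 7,  # hypothetical rule
    }
    weights = {"blacklist": 2.0, "new_account": 1.0}  # hypothetical weights
    cases = [
        {"id": "c1", "seller": "s9", "account_age_days": 3},
        {"id": "c2", "seller": "s2", "account_age_days": 3},
        {"id": "c3", "seller": "s9", "account_age_days": 400},
        {"id": "c4", "seller": "s3", "account_age_days": 400},
    ]
    queue, clean = selective_labeling(cases, rules, weights, threshold=0.5)
    print(queue, clean)  # ['c1', 'c3', 'c2'] ['c4']
    print(sorted(fraud_churn(cases, "c1")))  # ['c1', 'c3']
```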

CONCLUSION:

In this paper, we build online models for the auction fraud moderation and detection system designed for a major Asian online auction website. Through empirical experiments on real-world online auction fraud detection data, we show that our proposed online probit model framework can significantly improve over the baselines and the human-tuned model. The framework combines online feature selection, bounding coefficients from expert knowledge, and multiple instance learning. Note that this online modeling framework can be easily extended to many other applications, such as web spam detection and content optimization. Regarding future work, one direction is to include the adjustment of the selection bias in the online model training process. This adjustment has been proven to be very effective for offline models. The main idea there is to assume that all the unlabeled samples have a response equal to 0 with a very small weight. Since the unlabeled samples are obtained from an effective moderation system, it is reasonable to assume that they are non-fraud with high probability. Another direction for future work is to deploy the online models described in this paper to the real production system, and also to other applications.
