Tuesday, 30 October 2012

Slicing: A New Approach to Privacy Preserving Data Publishing


Abstract:

Several anonymization techniques, such as generalization and bucketization, 

have been designed for privacy preserving microdata publishing. Recent 

work has shown that generalization loses a considerable amount of information, 

especially for high-dimensional data. Bucketization, on the other hand, does 

not prevent membership disclosure and does not apply to data that do not 

have a clear separation between quasi-identifying attributes and sensitive 

attributes. In this paper, we present a novel technique called slicing, which 

partitions the data both horizontally and vertically. We show that slicing 

preserves better data utility than generalization and can be used for 

membership disclosure protection. Another important advantage of slicing is 

that it can handle high-dimensional data. We show how slicing can be used 

for attribute disclosure protection and develop an efficient algorithm for 

computing the sliced data that obey the ℓ-diversity requirement. Our workload 

experiments confirm that slicing preserves better utility than generalization 

and is more effective than bucketization in workloads involving the sensitive 

attribute. Our experiments also demonstrate that slicing can be used to 

prevent membership disclosure.

Algorithm Used:

Slicing Algorithms


 


An advantage of slicing is its ability to handle high-dimensional data. By 

partitioning attributes into columns, slicing reduces the dimensionality of the 

data. Each column of the table can be viewed as a sub-table with a lower 

dimensionality. Slicing is also different from the approach of publishing 

multiple independent sub-tables in that these sub-tables are linked by the 

buckets in slicing.
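
As a rough illustration of the idea above, the following Python sketch partitions a toy table vertically into columns and horizontally into buckets, then shuffles each column's values independently within a bucket. It is a minimal sketch, not the paper's algorithm: the fixed-size bucketing, the choice of column groups, and the names slice_table, columns, and bucket_size are assumptions made for illustration, and no ℓ-diversity check is performed.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]
# One sliced bucket: for each column group, the list of value tuples in that bucket.
SlicedBucket = List[List[Tuple[object, ...]]]


def slice_table(rows: Sequence[Row],
                columns: Sequence[Sequence[str]],
                bucket_size: int,
                seed: int = 0) -> List[SlicedBucket]:
    """Toy slicing: vertical partition by `columns`, horizontal by fixed-size buckets.

    Assumption: tuples are bucketed in input order, `bucket_size` at a time.
    A real implementation would choose buckets that satisfy a privacy requirement.
    """
    rng = random.Random(seed)
    sliced: List[SlicedBucket] = []
    for start in range(0, len(rows), bucket_size):
        bucket_rows = rows[start:start + bucket_size]
        bucket: SlicedBucket = []
        for group in columns:
            # Keep the attributes of one column group together in a tuple ...
            values = [tuple(row[attr] for attr in group) for row in bucket_rows]
            # ... and break the link to the other column groups by shuffling.
            rng.shuffle(values)
            bucket.append(values)
        sliced.append(bucket)
    return sliced


table = [
    {"age": 22, "sex": "M", "zip": "47906", "disease": "dyspepsia"},
    {"age": 22, "sex": "F", "zip": "47906", "disease": "flu"},
    {"age": 52, "sex": "F", "zip": "47905", "disease": "bronchitis"},
    {"age": 60, "sex": "M", "zip": "47302", "disease": "gastritis"},
]
column_groups = [("age", "sex"), ("zip", "disease")]
sliced = slice_table(table, column_groups, bucket_size=2)
```

Each bucket still holds the same column values as the original rows, which is what links the sub-tables together, but the association between the two column groups inside a bucket is hidden.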



Risk Aware Mitigation for MANET Routing Attacks


Abstract:

Mobile Ad hoc Networks (MANETs) are highly vulnerable to attacks 

due to the dynamic nature of their network infrastructure. Among these attacks, 

routing attacks have received considerable attention since they can cause the 

most devastating damage to a MANET. Even though there exist several intrusion 

response techniques to mitigate such critical attacks, existing solutions 

typically attempt to isolate malicious nodes based on binary or naive fuzzy 

response decisions.


However, binary responses may result in unexpected network partitioning, 

causing additional damage to the network infrastructure, and naive fuzzy 

responses could lead to uncertainty in countering routing attacks in MANET. In 

this paper, we propose a risk-aware response mechanism to systematically 

cope with the identified routing attacks. Our risk-aware approach is based on 

an extended Dempster-Shafer mathematical theory of evidence introducing a 

notion of importance factors. In addition, our experiments demonstrate the 

effectiveness of our approach with the consideration of several performance 

metrics.
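
For readers unfamiliar with the theory of evidence mentioned above, the following Python sketch shows the classical Dempster rule of combination for two pieces of evidence. It does not implement the paper's extension with importance factors; the frame of discernment {"attack", "normal"}, the mass values, and the function name combine are assumptions chosen only for illustration.

```python
from itertools import product
from typing import Dict, FrozenSet

# A mass function maps subsets of the frame of discernment to belief mass.
Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Classical Dempster rule of combination (no importance factors)."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two hypotheses.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint hypotheses contribute to the conflict mass.
            conflict += wa * wb
    if 1.0 - conflict < 1e-12:
        raise ValueError("evidence is totally conflicting; rule is undefined")
    # Normalise by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


ATTACK = frozenset({"attack"})
NORMAL = frozenset({"normal"})
EITHER = frozenset({"attack", "normal"})  # ignorance: mass on the whole frame

# Hypothetical evidence from two independent observers.
evidence_1: Mass = {ATTACK: 0.6, EITHER: 0.4}
evidence_2: Mass = {ATTACK: 0.5, NORMAL: 0.2, EITHER: 0.3}
fused = combine(evidence_1, evidence_2)
```

The fused masses sum to one, and the mass left on the whole frame expresses how much uncertainty remains after combining the two sources.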


Fig.  Risk-aware response mechanism






Ranking Model Adaptation For Domain-Specific Search


ABSTRACT:


With the explosive emergence of vertical search domains, applying the broad-

based ranking model directly to different domains is no longer desirable due to 

domain differences, while building a unique ranking model for each domain is 

both laborious for labeling data and time-consuming for training models. In this 

paper, we address these difficulties by proposing a regularization based 

algorithm called ranking adaptation SVM (RA-SVM), through which we can 

adapt an existing ranking model to a new domain, so that the amount of 

labeled data and the training cost are reduced while the performance is still 

guaranteed. Our algorithm only requires the predictions from the existing 

ranking models, rather than their internal representations or the data from 

auxiliary domains. In addition, we assume that documents similar in the 

domain-specific feature space should have consistent rankings, and add some 

constraints to control the margin and slack variables of RA-SVM adaptively. 

Finally, ranking adaptability measurement is proposed to quantitatively 

estimate if an existing ranking model can be adapted to a new domain. 

Experiments performed over LETOR and two large scale datasets crawled from 

a commercial search engine demonstrate the applicability of the proposed 

ranking adaptation algorithms and the ranking adaptability 

measurement.

Organizing User Search Histories

Abstract:

Users are increasingly pursuing complex task-oriented goals on the Web, such as making travel arrangements, managing finances or planning purchases. To this end, they usually break down the tasks into a few co-dependent steps and issue multiple queries around these steps repeatedly over long periods of time. To better support users in their long-term information quests on the Web, search engines keep track of their queries and clicks while searching online. In this paper, we study the problem of organizing a user’s historical queries into groups in a dynamic and automated fashion. Automatically identifying query groups is helpful for a number of different search engine components and applications, such as query suggestions, result ranking, query alterations, sessionization, and collaborative search. In our approach, we go beyond approaches that rely on textual similarity or time thresholds, and we propose a more robust approach that leverages search query logs. We experimentally study the performance of different techniques, and showcase their potential, especially when combined together.

Algorithm Used:

Page Rank Algorithms

MULTIPARTY ACCESS CONTROL FOR ONLINE SOCIAL NETWORKS: MODEL AND MECHANISMS


ABSTRACT:


Online social networks (OSNs) have experienced tremendous growth in recent 

years and become a de facto portal for hundreds of millions of Internet users. 

These OSNs offer attractive means for digital social interactions and 

information sharing, but also raise a number of security and privacy issues. 

While OSNs allow users to restrict access to shared data, they currently do not 

provide any mechanism to enforce privacy concerns over data associated with 

multiple users. To this end, we propose an approach to enable the protection of 

shared data associated with multiple users in OSNs. We formulate an access 

control model to capture the essence of multiparty authorization requirements, 

along with a multiparty policy specification scheme and a policy enforcement 

mechanism. Besides, we present a logical representation of our access control 

model which allows us to leverage the features of existing logic solvers to 

perform various analysis tasks on our model. We also discuss a proof-of-

concept prototype of our approach as part of an application in Facebook and 

provide a usability study and system evaluation of our method.

Handwritten Chinese Text Recognition by Integrating Multiple Contexts


Abstract:


            This paper presents an effective approach for the offline recognition of 

unconstrained handwritten Chinese texts. Under the general integrated 

segmentation-and-recognition framework with character oversegmentation, 

we investigate three important issues: candidate path evaluation, path search, 

and parameter estimation. For path evaluation, we combine multiple contexts 

(character recognition scores, geometric and linguistic contexts) from the 

Bayesian decision view, and convert the classifier outputs to posterior 

probabilities via confidence transformation. In path search, we use a refined 

beam search algorithm to improve the search efficiency and, meanwhile, use a 

candidate character augmentation strategy to improve the recognition 

accuracy. The combining weights of the path evaluation function are optimized 

by supervised learning using a Maximum Character Accuracy criterion. We 

evaluated the recognition performance on a Chinese handwriting database 

CASIA-HWDB, which contains nearly four million character samples of 7,356 

classes and 5,091 pages of unconstrained handwritten texts. The 

experimental results show that confidence transformation and combining 

multiple contexts improve the text line recognition performance significantly. 

On a test set of 1,015 handwritten pages, the proposed approach achieved 

character-level accurate rate of 90.75 percent and correct rate of 91.39 

percent, which are superior by far to the best results reported in the literature.


System diagram of handwritten Chinese text line recognition

A page of handwritten Chinese text

Footprint: Detecting Sybil Attacks in Urban Vehicular Networks

Abstract:    

In urban vehicular networks, where privacy, especially the location privacy of 

anonymous vehicles, is a major concern, anonymous verification of vehicles is 

indispensable. Consequently, an attacker who succeeds in forging multiple 

hostile identities can easily launch a Sybil attack, gaining a disproportionately 

large influence. In this paper, we propose a novel Sybil attack detection 

mechanism, Footprint, using the trajectories of vehicles for identification while 

still preserving their location privacy. More specifically, when a vehicle 

approaches a road-side unit (RSU), it actively demands an authorized message 

from the RSU as the proof of the appearance time at this RSU. We design a 

location-hidden authorized message generation scheme for two objectives: 

first, RSU signatures on messages are signer ambiguous so that the RSU 

location information is concealed from the resulting authorized message; 

second, two authorized messages signed by the same RSU within the same 

given period of time (temporarily linkable) are recognizable so that they can 

be used for identification. With the temporal limitation on the linkability of two 

authorized messages, authorized messages used for long-term identification 

are prohibited. With this scheme, vehicles can generate a location-hidden 

trajectory for location-privacy-preserved identification by collecting a 

consecutive series of authorized messages. Utilizing the social relationship among 

trajectories according to the similarity definition of two trajectories, Footprint 

can recognize and therefore dismiss “communities” of Sybil trajectories. 

Rigorous security analysis and extensive trace-driven simulations demonstrate 

the efficacy of Footprint.


The design of a Sybil attack detection scheme in urban vehicular networks should achieve three goals:


1. Location privacy preservation: a vehicle would not want to expose its 

location information to other vehicles or RSUs, since such 

information can be confidential. The detection scheme should prevent the 

location information of vehicles from being leaked.

2. Online detection: when a Sybil attack is launched, the detection scheme 

should react before the attack has terminated. Otherwise, the attacker could 

already have achieved its purpose.

3. Independent detection: the essence of a Sybil attack is that it exploits 

decisions made through group negotiation. To eliminate the possibility that 

a Sybil attack is launched against the detection itself, the detection should be 

conducted independently by the verifier without collaboration with others.


Fast Data Collection in Tree Based Wireless Sensor Networks


Abstract:

We investigate the following fundamental question: how fast can information 

be collected from a wireless sensor network organized as a tree? To address 

this, we explore and evaluate a number of different techniques using realistic 

simulation models under the many-to-one communication paradigm known as 

convergecast. We first consider time scheduling on a single frequency channel 

with the aim of minimizing the number of time slots required (schedule length) 

to complete a convergecast. Next, we combine scheduling with transmission 

power control to mitigate the effects of interference, and show that while 

power control helps in reducing the schedule length under a single frequency, 

scheduling transmissions using multiple frequencies is more efficient. We give 

lower bounds on the schedule length when interference is completely 

eliminated, and propose algorithms that achieve these bounds. We also 

evaluate the performance of various channel assignment methods and find 

empirically that for moderate size networks of about 100 nodes, the use of 

multi-frequency scheduling can suffice to eliminate most of the interference. 

Then, the data collection rate no longer remains limited by interference but by 

the topology of the routing tree. To this end, we construct degree-constrained 

spanning trees and capacitated minimal spanning trees, and show significant 

improvement in scheduling performance over different deployment densities. 

Lastly, we evaluate the impact of different interference and channel models on 

the schedule length.


Algorithm used:

1. BFS TIME SLOT ASSIGNMENT
2. LOCAL-TIME SLOT ASSIGNMENT
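
The first algorithm named above can be illustrated with a simplified Python sketch. This is one reading of the name, not the paper's pseudocode. It assumes that interference has already been eliminated (for example by using multiple frequencies), that each node transmits once per schedule to its parent, and that the only constraint is that two links sharing a node cannot use the same time slot. The function name bfs_time_slots and the example tree are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Set


def bfs_time_slots(children: Dict[Hashable, List[Hashable]],
                   root: Hashable) -> Dict[Hashable, int]:
    """Visit tree links in BFS order from the sink and give each link
    (child -> parent) the smallest slot not yet used at either endpoint."""
    used: Dict[Hashable, Set[int]] = {root: set()}  # slots already busy at each node
    slot_of: Dict[Hashable, int] = {}               # child -> slot of its uplink
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in children.get(parent, []):
            used.setdefault(child, set())
            slot = 1
            # A node cannot send or receive on two links in the same slot.
            while slot in used[parent] or slot in used[child]:
                slot += 1
            slot_of[child] = slot
            used[parent].add(slot)
            used[child].add(slot)
            queue.append(child)
    return slot_of


# Hypothetical routing tree rooted at the sink "s".
tree = {"s": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
schedule = bfs_time_slots(tree, "s")
schedule_length = max(schedule.values())
```

Under these assumptions the schedule length for the example equals the largest node degree in the tree (node "a" has three links), which is consistent with the abstract's remark that, once interference is removed, the routing tree topology becomes the limiting factor.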

Efficient Fuzzy Type Ahead Search in XML Data


Abstract:       


In a traditional keyword-search system over XML data, a user composes a 

keyword query, submits it to the system, and retrieves relevant answers. In 

the case where the user has limited knowledge about the data, often the user 

feels “left in the dark” when issuing queries, and has to use a try-and-see 

approach for finding information. In this paper, we study fuzzy type-ahead 

search in XML data, a new information-access paradigm in which the system 

searches XML data on the fly as the user types in query keywords. It allows 

users to explore data as they type, even in the presence of minor errors in 

their keywords. Our proposed method has the following features: 


1) Search as you type: It extends autocomplete by supporting queries with multiple keywords in XML data. 

2) Fuzzy: It can find high-quality answers that have keywords matching query keywords approximately. 

3) Efficient: Our effective index structures and searching algorithms can achieve a very high interactive speed. 

We study research challenges in this new search framework. We propose 

effective index structures and top-k algorithms to achieve a high interactive 

speed. We examine effective ranking functions and early termination 

techniques to progressively identify the top-k relevant answers. We have 

implemented our method on real data sets, and the experimental results show 

that our method achieves high search efficiency and result quality.
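
The fuzzy, search-as-you-type matching described above can be sketched in Python as follows. The sketch treats a stored keyword as a match when some prefix of it is within a small edit distance of what the user has typed so far. It is a minimal illustration only: it scans a plain list instead of using the index structures the abstract refers to, and the function names and the max_errors parameter are assumptions.

```python
from typing import Iterable, List


def prefix_edit_distance(query: str, keyword: str) -> int:
    """Smallest edit distance between `query` and any prefix of `keyword`."""
    # prev[j] = edit distance between the query typed so far and keyword[:j].
    prev = list(range(len(keyword) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i]
        for j, kc in enumerate(keyword, start=1):
            cost = 0 if qc == kc else 1
            cur.append(min(prev[j] + 1,          # delete a typed character
                           cur[j - 1] + 1,       # insert a keyword character
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    # Any prefix of the keyword may serve as the match target.
    return min(prev)


def fuzzy_prefix_matches(query: str,
                         keywords: Iterable[str],
                         max_errors: int = 1) -> List[str]:
    """Keywords that have a prefix within `max_errors` edits of `query`."""
    return [k for k in keywords if prefix_edit_distance(query, k) <= max_errors]


vocabulary = ["michael", "mining", "xml"]
matches = fuzzy_prefix_matches("mihc", vocabulary, max_errors=1)
```

Calling the matcher again after every keystroke gives the search-as-you-type behaviour; a practical system would avoid the linear scan.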

TRUST MODELING IN SOCIAL TAGGING OF MULTIMEDIA CONTENT


ABSTRACT:

                Tagging in online social networks is very popular these days, as it 

facilitates search and retrieval of multimedia content. However, noisy and 

spam annotations often make it difficult to perform an efficient search. Users 

may make mistakes in tagging, and irrelevant tags and content may be 

maliciously added for advertisement or self-promotion. This article surveys 

recent advances in techniques for combatting such noise and spam in social 

tagging. We classify the state-of-the-art approaches into a few categories and 

study representative examples in each. We also qualitatively compare and 

contrast them and outline open issues for future research.



CONCLUSION:                               
                               
In this article, we dealt with one of the key issues in social tagging systems: 

combatting noise and spam. We classified existing studies in the literature into 

two categories, i.e., content and user trust modeling. Representative 

techniques in each category were analyzed and compared. In addition, existing 

databases and evaluation protocols were reviewed. An example system was 

presented to demonstrate how trust modeling can be particularly employed in 

a popular application of image sharing and geotagging. Finally, open issues and 

future research trends were outlined. As online social networks and content 

sharing services evolve rapidly, we believe that the research on enhancing 

reliability and trustworthiness of such services will become increasingly 

important.

Self Adaptive Contention Aware Routing Protocol for Intermittently Connected Mobile Networks


Abstract:

            This paper introduces a novel multi-copy routing protocol, called Self 

Adaptive Utility-based Routing Protocol (SAURP), for Delay Tolerant Networks 

(DTNs) that are possibly composed of a vast number of miniature devices, 

such as smartphones, with heterogeneous capacities in terms of energy resources 

and buffer spaces. SAURP is characterized by the ability to identify potential 

opportunities for forwarding messages to their destinations via a novel utility 

function based mechanism, in which a suite of environment parameters, such 

as wireless channel condition, nodal buffer occupancy, and encounter statistics, 

are jointly considered. Thus, SAURP can reroute messages around nodes 

experiencing high buffer occupancy, wireless interference, and/or congestion, 

while requiring a considerably small number of transmissions. The developed 

utility function in SAURP is proved to be able to achieve optimal performance, 

which is further analyzed via a stochastic modeling approach. Extensive 

simulations are conducted to verify the developed analytical model and 

compare the proposed SAURP with a number of recently reported encounter-

based routing approaches in terms of delivery ratio, delivery delay, and the 

number of transmissions required for each message delivery. The simulation 

results show that SAURP outperforms all the counterpart multi-copy encounter-

based routing protocols considered in the study.

Packet-Hiding Methods for Preventing Selective Jamming Attacks


Abstract:

The open nature of the wireless medium leaves it vulnerable to intentional 

interference attacks, typically referred to as jamming. This intentional 

interference with wireless transmissions can be used as a launchpad for 

mounting Denial-of-Service attacks on wireless networks. Typically, jamming 

has been addressed under an external threat model. However, adversaries 

with internal knowledge of protocol specifications and network secrets can 

launch low-effort jamming attacks that are difficult to detect and counter. In 

this work, we address the problem of selective jamming attacks in wireless 

networks. In these attacks, the adversary is active only for a short period of 

time, selectively targeting messages of high importance. We illustrate the 

advantages of selective jamming in terms of network performance degradation 

and adversary effort by presenting two case studies: a selective attack on TCP 

and one on routing. We show that selective jamming attacks can be launched 

by performing real-time packet classification at the physical layer. To mitigate 

these attacks, we develop three schemes that prevent real-time packet 

classification by combining cryptographic primitives with physical-layer 

attributes. We analyze the security of our methods and evaluate their 

computational and communication overhead.




Modules:
1. Network module
2. Real Time Packet Classification
3. Selective Jamming Module
4. Strong Hiding Commitment Scheme (SHCS)
5. Cryptographic Puzzle Hiding Scheme (CPHS)

Online Modeling of Proactive Moderation System for Auction Fraud Detection



ABSTRACT:

We consider the problem of building online machine-learned models for 

detecting auction frauds in e-commerce web sites. Since the emergence of the

World Wide Web, online shopping and online auctions have gained more and 

more popularity. While people are enjoying the benefits from online trading, 

criminals are also taking advantage to conduct fraudulent activities against 

honest parties to obtain illegal profit. Hence proactive fraud-detection 

moderation systems are commonly applied in practice to detect and prevent 

such illegal and fraudulent activities. Machine-learned models, especially those 

that are learned online, are able to catch frauds more efficiently and quickly 

than human-tuned rule-based systems. In this paper, we propose an online 

probit model framework which takes online feature selection, coefficient 

bounds from human knowledge and multiple instance learning into account 

simultaneously. By empirical experiments on a real-world online auction fraud 

detection data set, we show that this model can potentially detect more frauds 

and significantly reduce customer complaints compared to several baseline 

models and the human-tuned rule-based system. 



Modules:

             

• Rule-based features:

Human experts with years of experience created many rules to detect whether 

a user is fraudulent. An example of such a rule is the “blacklist”, i.e., whether 

the user has been detected or reported as fraudulent before. Each rule can be 

regarded as a binary feature that indicates the likelihood of fraud.
   
• Selective labeling:

If the fraud score is above a certain threshold, the case will enter a queue 

for further investigation by human experts. Once it is reviewed, the final 

result will be labeled as boolean, i.e., fraud or clean. Cases with higher scores 

have higher priority in the queue to be reviewed. Cases whose fraud scores 

are below the threshold are determined as clean by the system without any 

human judgment.

  
• Fraud churn:

Once a case is labeled as fraud by human experts, it is very likely that the 

seller is not trustworthy and may also be selling other fraudulent items; 

hence all the items submitted by the same seller are labeled as fraud too. The 

fraudulent seller along with his/her cases will be removed from the website 

immediately once detected.

• User complaint:

Buyers can file complaints to claim loss if they have recently been deceived 

by fraudulent sellers. The administrator views the various types of complaints 

and the percentage of each type. When the complaint value of a product 

exceeds a threshold value, the administrator sets the trustability of the 

product to Untrusted or Banned. If a product is set to Banned, users cannot 

view the product on the website.
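
The selective labeling step in the modules above can be sketched in Python. The sketch scores a case with a probit link (the standard normal CDF applied to a weighted sum of binary rule features) and then splits cases into a review queue ordered by score and a set that is automatically marked clean. The weights, threshold, case identifiers, and function names are hypothetical, and treating a score exactly equal to the threshold as clean is an assumption, since the text only covers scores above and below it.

```python
import heapq
from statistics import NormalDist
from typing import Dict, List, Sequence, Tuple


def fraud_score(features: Sequence[float],
                weights: Sequence[float],
                bias: float = 0.0) -> float:
    """Probit score: standard normal CDF of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return NormalDist().cdf(z)


def selective_labeling(scores: Dict[str, float],
                       threshold: float) -> Tuple[List[str], List[str]]:
    """Return (review order for human experts, cases auto-labeled clean)."""
    # heapq is a min-heap, so negate scores to pop the highest score first.
    heap = [(-score, case_id) for case_id, score in scores.items()
            if score > threshold]
    heapq.heapify(heap)
    review_order = [heapq.heappop(heap)[1] for _ in range(len(heap))]
    auto_clean = sorted(case_id for case_id, score in scores.items()
                        if score <= threshold)
    return review_order, auto_clean


# Hypothetical cases with already-computed fraud scores.
case_scores = {"case-a": 0.9, "case-b": 0.2, "case-c": 0.7}
review_order, auto_clean = selective_labeling(case_scores, threshold=0.5)
```

Only the cases in the review queue ever receive a human label, which is why the conclusion below mentions adjusting for selection bias as future work.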


CONCLUSION:
                       
In this paper we build online models for the auction fraud moderation and 

detection system designed for a major Asian online auction website. By 

empirical experiments on a real world online auction fraud detection data, we 

show that our proposed online probit model framework, which combines online

feature selection, bounding coefficients from expert knowledge and multiple 

instance learning, can significantly improve over baselines and the human-

tuned model. Note that this online modeling framework can be easily 

extended to many other applications, such as web spam detection, content 

optimization and so forth. Regarding future work, one direction is to include

the adjustment of the selection bias in the online model training process. It 

has been proven to be very effective for offline models. The main idea there is

to assume all the unlabeled samples have response equal to 0 with a very 

small weight. Since the unlabeled samples are obtained from an effective 

moderation system, it is reasonable to assume that with high probabilities 

they are non-fraud. Another direction for future work is to deploy the online 

models described in this paper to the real production system, and also to other 

applications.