Tuesday 30 October 2012

Online Modeling of Proactive Moderation System for Auction Fraud Detection



ABSTRACT:

We consider the problem of building online machine-learned models for 

detecting auction frauds in e-commerce web sites. Since the emergence of the

world wide web, online shopping and online auctions have gained more and 

more popularity. While people are enjoying the benefits from online trading, 

criminals are also taking advantage of it to conduct fraudulent activities against 

honest parties to obtain illegal profit. Hence proactive fraud-detection 

moderation systems are commonly applied in practice to detect and prevent 

such illegal and fraudulent activities. Machine-learned models, especially those 

that are learned online, are able to catch frauds more efficiently and quickly 

than human-tuned rule-based systems. In this paper, we propose an online 

probit model framework which takes online feature selection, coefficient 

bounds from human knowledge and multiple instance learning into account 

simultaneously. Through empirical experiments on real-world online auction fraud 

detection data, we show that this model can potentially detect more frauds 

and significantly reduce customer complaints compared to several baseline 

models and the human-tuned rule-based system. 



Modules:

             

       • Rule-based features:
                  
    Human experts with years of experience created many rules 

to detect whether a user is fraudulent or not. An example of such a rule is the 

“blacklist”, i.e. whether the user has been detected or reported as fraudulent 

before. Each rule can be regarded as a binary feature that indicates the 

likelihood of fraud.
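
As a rough illustration of how a rule becomes a binary feature, the sketch below encodes the blacklist rule as a function returning 0 or 1. The `User` fields and the function names are hypothetical and chosen only for this example; the source does not describe how the rules are stored.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical fields, assumed for illustration only.
    user_id: str
    previously_detected: bool   # detected as fraud before
    prior_complaints: int       # complaints filed against this user before


def blacklist_rule(user: User) -> int:
    """Return 1 if the user was detected or complained about before, else 0."""
    return int(user.previously_detected or user.prior_complaints > 0)


# Each rule maps a user to one binary feature; more rules can be appended.
RULES = [blacklist_rule]


def rule_features(user: User) -> list:
    """Evaluate every rule and return the binary feature vector."""
    return [rule(user) for rule in RULES]
```

Calling `rule_features(User("u1", True, 0))` yields a one-element vector because only one rule is registered here.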
   
      • Selective labeling: 

                     If the fraud score is above a certain threshold, the case will 

enter a queue for further investigation by human experts. Once it is 

reviewed, the final result will be labeled as a boolean, i.e. fraud or clean. Cases 

with higher scores have higher priorities in the queue to be reviewed. The 

cases whose fraud scores are below the threshold are determined as clean by 

the system without any human judgment.
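
The triage step described above can be sketched with a priority queue. This is a minimal sketch, not the system's actual implementation: the function name `triage`, the tuple layout, and the choice to treat a score exactly equal to the threshold as clean are all assumptions.

```python
import heapq


def triage(cases, threshold):
    """Split scored cases into a review order and an auto-clean list.

    cases: iterable of (case_id, fraud_score) pairs.
    Returns (review_order, auto_clean), where review_order lists the
    queued case ids from highest to lowest score.
    """
    heap = []
    auto_clean = []
    for case_id, score in cases:
        if score > threshold:
            # heapq is a min-heap, so push the negated score to pop
            # the highest-scoring case first.
            heapq.heappush(heap, (-score, case_id))
        else:
            # Assumption: scores at or below the threshold are clean.
            auto_clean.append(case_id)
    review_order = [heapq.heappop(heap)[1] for _ in range(len(heap))]
    return review_order, auto_clean
```

Only the cases in `review_order` would ever receive a human label, which is why the labels are "selective".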

  
   • Fraud churn:

                      Once one case is labeled as fraud by human experts, it is very 

likely that the seller is not trustworthy and may also be selling other fraudulent items; 

hence all the items submitted by the same seller are labeled as fraud too. The  

fraudulent seller along with his/her cases will be removed from the website 

immediately once detected.
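
The label propagation in this step can be sketched as follows. The dictionary layout and the name `propagate_fraud` are hypothetical; the sketch only shows that one expert-confirmed fraud case marks every case from the same seller.

```python
def propagate_fraud(cases, fraud_case_id):
    """Label every case from the seller of `fraud_case_id` as fraud.

    cases: dict mapping case_id -> {"seller": str, "label": str or None}
           (a hypothetical layout, assumed for illustration).
    Returns (seller, affected_case_ids); the affected cases are the ones
    that would be removed from the website together with the seller.
    """
    seller = cases[fraud_case_id]["seller"]
    affected = []
    for case_id, case in cases.items():
        if case["seller"] == seller:
            case["label"] = "fraud"
            affected.append(case_id)
    return seller, affected
```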

 • User Complaint:
 

                    Buyers can file complaints to claim losses if they have recently 

been deceived by fraudulent sellers. The administrator can view the various types of 

complaints and the percentage of each complaint type. When the complaint 

value of a product exceeds some threshold value, the administrator sets the 

trustability of the product to untrusted or banned. If a product is set to 

banned, users cannot view it on the website.
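
The administrator view described above might be sketched like this. The source mentions only "some threshold value", so the two separate thresholds, the status strings, and all function names below are assumptions made for illustration.

```python
from collections import Counter


def complaint_percentages(complaint_types):
    """Return the percentage share of each complaint type."""
    counts = Counter(complaint_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ctype: 100.0 * n / total for ctype, n in counts.items()}


def trust_status(complaint_value, untrusted_at, banned_at):
    """Map a product's complaint value to a trust status.

    Assumption: two thresholds with untrusted_at <= banned_at; the
    source does not say how untrusted and banned are told apart.
    """
    if complaint_value > banned_at:
        return "banned"
    if complaint_value > untrusted_at:
        return "untrusted"
    return "trusted"


def visible_products(status_by_product):
    """Banned products are hidden from users; all others stay visible."""
    return [p for p, status in status_by_product.items() if status != "banned"]
```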


CONCLUSION:
                       
In this paper we build online models for the auction fraud moderation and 

detection system designed for a major Asian online auction website. By 

empirical experiments on real-world online auction fraud detection data, we 

show that our proposed online probit model framework, which combines online

feature selection, bounding coefficients from expert knowledge and multiple 

instance learning, can significantly improve over baselines and the human-

tuned model. Note that this online modeling framework can be easily 

extended to many other applications, such as web spam detection, content 

optimization and so forth. Regarding future work, one direction is to include

the adjustment of the selection bias in the online model training process. It 

has been proven to be very effective for offline models. The main idea there is

to assume all the unlabeled samples have response equal to 0 with a very 

small weight. Since the unlabeled samples are obtained from an effective 

moderation system, it is reasonable to assume that with high probability 

they are non-fraud. Another direction for future work is to deploy the online models 

described in this paper to the real production system, and also other 

applications.
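
The selection-bias adjustment mentioned above, where unlabeled samples receive response 0 with a very small weight, can be sketched as a weighted probit log-likelihood. This is a minimal sketch under stated assumptions: the weight value 0.01, the function names, and the plain-list feature vectors are illustrative choices, and the paper's actual training procedure is not reproduced here.

```python
import math


def probit_cdf(z):
    """Standard normal CDF, the link function of a probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def build_weighted_training_set(labeled, unlabeled, unlabeled_weight=0.01):
    """Combine labeled and unlabeled samples into (x, y, weight) rows.

    labeled:   list of (features, label) pairs with label in {0, 1}.
    unlabeled: list of feature vectors; each gets response 0 and a
               small weight (0.01 is an assumed value, not from the source).
    """
    rows = [(x, y, 1.0) for x, y in labeled]
    rows += [(x, 0, unlabeled_weight) for x in unlabeled]
    return rows


def weighted_log_likelihood(beta, rows):
    """Weighted probit log-likelihood of coefficients `beta`."""
    total = 0.0
    for x, y, w in rows:
        p = probit_cdf(sum(b * xi for b, xi in zip(beta, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total
```

Because each unlabeled row carries only a small weight, a large pool of unlabeled cases nudges the fit toward "non-fraud" without overwhelming the expert-labeled cases.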
