Outlier Detection and Description
The goal of the workshop on Outlier Detection and Description (ODD) is to address outlier mining as the twofold task of outlier detection and outlier description, that is, the quantitative and qualitative analysis of anomalies in data. These two tasks are rarely considered together, and the literature on them is spread over different research communities. ODD aims to bridge this gap and provide a venue for knowledge exchange between these research areas, uniting quantitative and qualitative analyses in the study of outlier mining.
We are proud to have Charu Aggarwal and Raymond Ng as keynote speakers.
Charu is a Research Scientist at IBM T.J. Watson, New York. His research interests include outlier analysis, graph mining, social networks, data stream mining, and mining high-dimensional data. He has published over 200 papers in refereed conferences and journals as well as 8 books, and has applied for or been granted over 80 patents. His h-index is 56. In January 2013 he published a monograph on Outlier Analysis.
Raymond is a Professor of Computer Science at the University of British Columbia, Canada. His research areas include data mining, health informatics, and databases. In recent years, he has focused on the analysis of genomics data and text data. Among many contributions, he is a co-author of the well-known LOF outlier detection algorithm and of Finding Intensional Knowledge, one of the first outlier description methods.
Each keynote will be 30 minutes long, including questions.
ODD is a half-day workshop on August 11th, organized in conjunction with ACM SIGKDD 2013.
The program for ODD is:
'Outlier Detection in Personalized Medicine'
by Raymond Ng
Personalized medicine has been hailed as one of the main directions for medical research in this century. In the first half of the talk, we give an overview of our personalized medicine projects that use gene expression, proteomics, DNA, and clinical features. In the second half, we present two applications where outlier detection is key to the success of our work. The first focuses on identifying mislabeled patients, and the second deals with quality control of microarrays.
'Enhancing One-class Support Vector Machines for Unsupervised Anomaly Detection' (slides)
by Mennatallah Amer, Markus Goldstein, Slim Abdennadher
'Systematic Construction of Anomaly Detection Benchmarks from Real Data' (slides)
by Andrew Emmott, Shubhomoy Das, Thomas Dietterich, Alan Fern, Weng-Keen Wong
'Outlier Ensembles' [Related Paper]
by Charu Aggarwal
Ensemble analysis is a widely used meta-algorithm for many data mining problems such as classification and clustering. Numerous ensemble-based algorithms have been proposed in the literature for these problems. Compared to clustering and classification, ensemble analysis has been studied only to a limited extent in the outlier detection literature. Many outlier analysis algorithms use ensemble techniques implicitly, but the approach is often buried deep within the algorithm and not formally recognized as a general-purpose meta-algorithm. This is despite the fact that the problem is important in the context of outlier analysis. This talk discusses the various methods used in the literature for outlier ensembles and the general principles by which such analysis can be made more effective. It also discusses how outlier ensembles relate to the ensemble techniques commonly used for other data mining problems.
A summary of this talk was recently published in SIGKDD Explorations.
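Feature bagging is one simple instance of the ensemble idea discussed in this talk: the same base detector is run on several random feature subsets, and the normalised scores are combined. The Python sketch below is illustrative only and is not taken from the talk or the related paper; the k-nearest-neighbour base detector, rank averaging as the combination rule, the function names, and the toy data are all assumptions.

```python
import math
import random

def knn_score(data, idx, features, k):
    """Distance from data[idx] to its k-th nearest neighbour, using only `features`."""
    point = [data[idx][f] for f in features]
    dists = sorted(
        math.dist(point, [other[f] for f in features])
        for j, other in enumerate(data)
        if j != idx
    )
    return dists[k - 1]

def rank_normalise(scores):
    """Map raw scores to ranks in [0, 1] so members built on different subspaces are comparable."""
    if len(scores) < 2:
        return [0.0] * len(scores)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, i in enumerate(order):
        ranks[i] = rank / (len(scores) - 1)
    return ranks

def feature_bagging_scores(data, n_members=10, k=2, seed=0):
    """Average rank-normalised kNN scores over random feature subsets (feature bagging)."""
    rng = random.Random(seed)
    n_features = len(data[0])
    combined = [0.0] * len(data)
    for _ in range(n_members):
        # Each ensemble member sees a random, non-empty subset of the features
        size = rng.randint(1, n_features)
        features = rng.sample(range(n_features), size)
        member = [knn_score(data, i, features, k) for i in range(len(data))]
        for i, r in enumerate(rank_normalise(member)):
            combined[i] += r / n_members
    return combined

# Hypothetical toy data: the corners of a unit cube plus one far-away point (index 8)
data = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
    (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1),
    (10, 10, 10),
]
scores = feature_bagging_scores(data)
print(max(range(len(scores)), key=scores.__getitem__))  # 8
```

Rank normalisation is used here because raw kNN distances computed in subspaces of different dimensionality are not directly comparable; averaging and taking the maximum of normalised scores are both common combination choices.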
'Anomaly Detection on ITS Data via View Association' (slides)
by Junaidillah Fadlil, Hsing-Kuo Pao, Yuh-Jye Lee
'On-line relevant anomaly detection in the Twitter stream: An Efficient Bursty Keyword Detection Model' (slides)
by Jheser Guzman, Barbara Poblete
'Distinguishing the Unexplainable from the Merely Unusual: Adding Explanations to Outliers to Discover and Detect Significant Complex Rare Events'
by Ted Senator, Henry Goldberg, Alex Memory
'Latent Outlier Detection and the Low Precision Problem' (slides)
by Fei Wang, Sanjay Chawla, Didi Surian
12:00 Discussion & Closing
Lunch (on your own)
Submission deadline: 4th of June 2013, 23:59 PST (extended)
Notification to authors: 22nd of June 2013, 23:59 PST
Camera-ready deadline: 3rd of July 2013, 23:59 PST
Workshop day: 11th of August 2013
Traditionally, outlier mining and anomaly detection have focused on the automatic detection of highly deviating objects. They have been studied for several decades in statistics, machine learning, data mining, and database systems, leading to considerable insight as well as automated systems for outlier detection.
However, for today's applications to succeed, identifying anomalies is not enough. As more and more applications use outlier analysis for data exploration and knowledge discovery, the demand for manual verification and understanding of outliers is steadily increasing. Examples include health surveillance, customer segmentation, fraud analysis, and sensor monitoring, where one is particularly interested in why an object seems outlying.
Example: Consider outlier analysis in the domain of health surveillance. An outlier might be a patient who shows high deviation in specific vital signs such as "heart beat rate" and "skin humidity". Merely detecting this patient with a traditional algorithm is not sufficient for health surveillance: health professionals need to verify why this patient stands out in order to provide proper medical treatment. Assisting such manual verification is a major task for outlier analysis. Hence, outlier mining algorithms should provide additional descriptive information. These outlier descriptions should be easy to understand and should highlight the specific deviation of an outlier in contrast to regular patients.
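The following Python sketch illustrates, under simplifying assumptions, what pairing detection with description can look like in this example: a patient is flagged when some attribute deviates strongly from a reference population of regular patients, and the flagged attributes themselves serve as the description. The per-attribute z-score rule, the threshold of 3, the function name, the attribute names, and the toy values are hypothetical choices for illustration, not a method proposed by the workshop.

```python
import statistics

def describe_outlier(reference, candidate, threshold=3.0):
    """Flag `candidate` and explain the flag using per-attribute z-scores.

    reference: list of dicts mapping attribute name -> numeric value (regular patients)
    candidate: dict with the same attribute names
    Returns (is_outlier, explanation), where explanation lists the
    (attribute, z-score) pairs with |z| > threshold, largest deviation first.
    """
    deviations = []
    for attr, value in candidate.items():
        values = [record[attr] for record in reference]
        mean = statistics.mean(values)
        std = statistics.stdev(values)
        if std == 0:
            continue  # constant attribute: no spread to measure deviation against
        z = (value - mean) / std  # deviation measured in standard deviations
        if abs(z) > threshold:
            deviations.append((attr, z))
    explanation = sorted(deviations, key=lambda pair: abs(pair[1]), reverse=True)
    return bool(explanation), explanation

# Hypothetical reference population of regular patients
reference = [
    {"heart_beat_rate": 70, "skin_humidity": 40, "respiration_rate": 16},
    {"heart_beat_rate": 72, "skin_humidity": 42, "respiration_rate": 15},
    {"heart_beat_rate": 68, "skin_humidity": 38, "respiration_rate": 17},
    {"heart_beat_rate": 71, "skin_humidity": 41, "respiration_rate": 16},
    {"heart_beat_rate": 69, "skin_humidity": 39, "respiration_rate": 16},
]
patient = {"heart_beat_rate": 110, "skin_humidity": 45, "respiration_rate": 16}

is_outlier, explanation = describe_outlier(reference, patient)
print(is_outlier, [attr for attr, _ in explanation])
# True ['heart_beat_rate', 'skin_humidity']
```

Because each attribute is scored independently, this sketch misses patients who only stand out in a combination of attributes; subspace outlier mining and contrast mining address such cases.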
Even though outlier detection has been studied for several decades, the need for outlier descriptions has only recently attracted attention in the data mining community. Outlier descriptions are currently studied in different forms in contrast mining, pattern mining, data compression, graph outlier mining, and subspace outlier mining, as well as in other fields including data visualization, image saliency detection, and astronomy. We strongly believe there is significant overlap in the techniques of these fields and that developments in any one of them can have a significant impact on the others. Therefore, the goal of this workshop is to bring together researchers with a shared interest in outlier detection and outlier description methods, whether for use in traditional databases, graph databases, data streams, or in the processing of other large and complex data sources.
Our aim is hence to bring these and other communities together in one venue. With ODD, our objectives are to: 1) further increase general interest in this important topic in the broader research community; 2) bring together experts from closely related areas (e.g., outlier detection and contrast mining) to shed light on how this emerging research direction can benefit from other well-established areas; 3) provide a venue for active researchers to exchange ideas and explore important research issues in this area. Overall, the idea behind ODD is that outlier detection and description together will yield novel techniques that assist humans in manual outlier verification through easy-to-understand descriptions, and will thereby help advance the state of the art and the applicability of outlier mining.
Call for Papers
Topics of interest for the workshop include, but are not limited to:
- Interleaved detection and description of outliers
- Description models for given outliers
- Pattern and local information based outlier description
- Subspace outliers, feature selection, and space transformations
- Ensemble methods for anomaly detection and description
- Descriptive local outlier ranking
- Identification of outlier rules
- Finding intensional knowledge
- Contextual and community outliers
- Human-in-the-loop modeling and learning
- Visualization techniques for interactive exploration of outliers
- Comparative studies on outlier description
- Related research fields
  - Contrast mining
  - Change and novelty detection
  - Causality analysis
  - Frequent itemset mining
  - Compression theory
  - Subgroup mining
  - Subspace learning
- Formal outlier mining models
  - Supervised, semi-supervised, and unsupervised models
  - Statistical models
  - Distance-based models
  - Density-based models
  - Spectral models
  - Constraint-based models
  - Ensemble models
- Outlier mining for complex databases
  - Graph data (e.g. community outliers)
  - Spatio-temporal data
  - Time series and sequential data
  - Online processing of stream data
  - Scalability to high dimensional data
- Applications of outlier detection and description
  - Fraud in financial data
  - Intrusions in communication networks
  - Sensor network analysis
  - Social network analysis
  - Health surveillance
  - Customer profiling
- ... and many more ...
Submission is closed.
- Fabrizio Angiulli, University of Calabria
- Ira Assent, Aarhus University
- James Bailey, University of Melbourne
- Arindam Banerjee, University of Minnesota
- Albert Bifet, Yahoo! Labs Barcelona
- Christian Böhm, LMU Munich
- Rajmonda Caceres, MIT
- Varun Chandola, Oak Ridge Nat. Lab.
- Polo Chau, Georgia Tech
- Sanjay Chawla, University of Sydney
- Tijl De Bie, University of Bristol
- Christos Faloutsos, Carnegie Mellon University
- Jing Gao, University of Buffalo
- Manish Gupta, Microsoft, India
- Jaakko Hollmén, Aalto University
- Eamonn Keogh, University of California – Riverside
- Matthijs van Leeuwen, KU Leuven
- Daniel B. Neill, Carnegie Mellon University
- Naren Ramakrishnan, Virginia Tech
- Spiros Papadimitriou, Rutgers University
- Koen Smets, University of Antwerp
- Hanghang Tong, CUNY
- Ye Wang, The Ohio State University
- Arthur Zimek, LMU Munich
- Leman Akoglu (Stony Brook University)
- Emmanuel Müller (Karlsruhe Institute of Technology)
- Jilles Vreeken (Universiteit Antwerpen)
odd13kdd (at) gmail.com