Monday, May 16, 2005

Analysis of the ANF Paper

ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs

Authors
Christopher R. Palmer,
Phillip B. Gibbons
Christos Faloutsos, KDD 2002

Overview
What is it?
- fast and memory efficient algorithm for approximating the complete neighborhood function for a graph
- can use it answer "questions" on graph-represented data

Sample applications
- How robust is the Internet to failures?
- Analyze calling graph from telephone companies to detect fraud or marketing opportunities
- Compare graphs (&subgraphs)- find similarities
- Analyze structure - is it hierarchical...
- Perform Clustering
- Determine important verticies

Claim to fame
- very, very fast
- approximate algorithm but still very accurate compared to others
- can handle disk resident graphs

Sample Usage Scenario in Paper
Tic Tac Toe
- graph where vertices are valid boards (|V|= ?)
- edge between vertices indicates a possible move
- compute the speed by which each of the 9 possible starting moves can lead to victory
- each position on the board is given an "ANF importance" factor - center had the highest factor

How can this be used for Anomaly detection for IDS?
High Level Approach
Represent your attack free training data as a graph, the testing data as another graph. Perform delta analysis to find anomalies.
Does not seem like a natural fit as what should your vertices and edges represent?
Should vertices be unique ip addresses? Does an undirected edge represent that the two ip addresses have communicated? How does the edge map to the features of that message communication.

0 Comments:

Post a Comment

<< Home