Training Data – Mutual Independence – Is this a myth?

The assumption that one training sample is totally unrelated to another seems to be a myth. Modelling the relationships between samples will lead to the next generation of AI applications, claims the University of Washington in its description of the “Statistical Relational Learning” (SRL) project.

This paper nicely introduces not only SRL, but also several related terms such as:

  • Markov Model: Assumes that the next state depends solely on the present state (a set of variables), so we can predict the next state by modelling only the present one.
  • Monte Carlo Methods: Repeat “random sampling & observation” many times and use the observed results to reach an acceptable numerical approximation.
  • First Order Logic: Also known as predicate logic. In contrast to propositional logic, it assumes that the universe consists of objects, relations and functions. For example: count(riversInIndia) > count(riversInSrilanka).
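
To make the first two ideas concrete, here is a minimal sketch that combines them: a two-state Markov chain simulated with Monte Carlo sampling. The weather states, the transition probabilities and the function names are all made up for illustration; none of them come from the paper.

```python
import random

# Hypothetical two-state chain; the probabilities are invented for illustration.
TRANSITIONS = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def next_state(state, rng):
    """Sample the next state using only the present state (Markov property)."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()))[0]

def estimate_sunny_fraction(steps, seed=0):
    """Monte Carlo estimate of the long-run fraction of sunny days."""
    rng = random.Random(seed)
    state = "sunny"
    sunny = 0
    for _ in range(steps):
        state = next_state(state, rng)
        sunny += state == "sunny"  # True counts as 1
    return sunny / steps

estimate = estimate_sunny_fraction(100_000)
print(f"Estimated sunny fraction: {estimate:.3f}")
```

For this particular chain the exact long-run answer can be worked out by hand as 0.5 / (0.1 + 0.5) = 5/6, so the sampled estimate should land close to 0.83. The point of Monte Carlo is that the same loop still works when the exact answer is too hard to derive.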

In my subsequent posts, I will discuss these items.


ML Projects at CMU

While browsing through the projects at the CMU ML Department, the NELL project titled “Read the Web” particularly caught my attention. Considering the web as a massive collection of facts, learning from it is a very interesting problem. This is a good read on Exploratory Learning (EL). When dealing with huge data, it's natural to expect several clusters to exist. This paper on EL claims to extend existing semi-supervised learning (SSL) algorithms into what they call an “exploratory” version of Expectation Maximization (EM). The paper also gave me a summary of well-known SSL techniques such as:

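  • Naive Bayes: Assuming the features are independent of each other given the class, each feature that is present contributes its own share to the probability of the item's classification.
  • K-Means: Clustering items around their nearest mean. Finding the optimal clustering is NP-hard, so iterative heuristics are used in practice.
  • Von Mises-Fisher: A probability distribution over directions (unit vectors), used to cluster length-normalized data.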
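
As a rough illustration of the K-Means item, here is a minimal sketch of the usual iterative heuristic (Lloyd's algorithm): assign every point to its nearest mean, then recompute the means. This is plain K-Means, not the exploratory variant from the paper, and the sample points and hand-picked starting centroids are assumptions made for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's heuristic: alternate assignment and mean-update steps."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Made-up 2D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

The heuristic only finds a local optimum, so the result depends on the starting centroids. That is one reason seeding with a few labelled examples helps in the semi-supervised setting.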

Another interesting project aims to detect novel topics at time t, given all the topics available at time t – 1.