Word2graph2vec by shashankg7

Project Information

This project was done as the major project for the Information Retrieval and Extraction course at IIIT Hyderabad, session 2016-17.

The course is taught by Prof. Vasudeva Varma and co-taught by Prof. Manish Gupta.

Problem Statement

Recently there has been increasing interest in using deep learning (DL) techniques to analyze social graphs such as Flickr, YouTube, and Twitter. The appeal of this approach is that once node representations have been learned, several network mining tasks, such as node classification, link prediction, node visualization, and node recommendation, can be solved with conventional machine learning algorithms.

In this project, we build a model that captures the network information of a node in an efficient and scalable manner. The learned representations are then used for node classification.

We exploit the label information in the data to learn better representations targeted specifically at the classification task.

Project gist

This project studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale to real-world information networks, which usually contain millions of nodes. In this project, we implemented a network embedding method called PTE (Predictive Text Embedding), which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted.

The method optimizes a carefully designed objective function that preserves both the local and global network structures. It uses an edge-sampling algorithm that addresses a limitation of classical stochastic gradient descent on weighted edges and improves both the effectiveness and the efficiency of inference. We test the method on the IMDB movie review dataset. The novelty of PTE is that it exploits the label information in the graph to fine-tune the embeddings.
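The edge-sampling idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code used in this project: the function names (`sample_edge`, `train_edge_sampling`), the hyperparameters, the uniform choice of negative nodes, and the use of cumulative weights instead of an alias table are all choices made for the example. Edges are drawn with probability proportional to their weight and then treated as unit-weight edges, so the size of each gradient step does not depend on the edge weight.

```python
import itertools
import math
import random


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_edge(rng, edges, cum_weights):
    # Draw one edge with probability proportional to its weight.
    return rng.choices(edges, cum_weights=cum_weights, k=1)[0]


def train_edge_sampling(edges, num_nodes, dim=8, steps=2000,
                        negatives=2, lr=0.025, seed=0):
    """edges: list of (source, target, weight) tuples.

    Returns (vertex_embeddings, context_embeddings) as lists of lists.
    All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    emb = [[(rng.random() - 0.5) / dim for _ in range(dim)]
           for _ in range(num_nodes)]
    ctx = [[0.0] * dim for _ in range(num_nodes)]
    cum_weights = list(itertools.accumulate(w for _, _, w in edges))

    for _ in range(steps):
        u, v, _ = sample_edge(rng, edges, cum_weights)
        # One positive target plus a few negatives drawn uniformly
        # (an assumption; other noise distributions are possible).
        targets = [(v, 1.0)]
        targets += [(rng.randrange(num_nodes), 0.0) for _ in range(negatives)]
        grad_u = [0.0] * dim
        for t, label in targets:
            if label == 0.0 and t == v:
                continue  # skip a negative that collides with the positive
            score = sum(a * b for a, b in zip(emb[u], ctx[t]))
            g = lr * (label - sigmoid(score))  # no edge weight in the step
            for k in range(dim):
                grad_u[k] += g * ctx[t][k]
                ctx[t][k] += g * emb[u][k]
        for k in range(dim):
            emb[u][k] += grad_u[k]
    return emb, ctx
```

Sampling from cumulative weights costs O(log |E|) per draw; an alias table would make each draw O(1) and is the usual choice for very large graphs.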

Applications

Better node representations help in solving various network mining tasks with conventional machine learning algorithms. They can be used for:

Node classification

Link prediction

Node visualization

Node recommendation
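To illustrate node classification, the first application listed above: once embeddings exist, any conventional classifier can be trained on them. The sketch below uses a nearest-centroid classifier written in plain Python only because it needs no dependencies. The function names and the toy two-dimensional embeddings are hypothetical and are not results from this project.

```python
from collections import defaultdict


def fit_centroids(embeddings, labels):
    # Average the embedding vectors of each class.
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for k, x in enumerate(vec):
            sums[label][k] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in total]
            for label, total in sums.items()}


def predict(centroids, vec):
    # Assign the class whose centroid is closest in squared Euclidean distance.
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], vec))


# Toy, made-up embeddings for four labelled nodes.
train_vectors = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_labels = ["neg", "neg", "pos", "pos"]
centroids = fit_centroids(train_vectors, train_labels)
```

In practice a stronger conventional model, such as logistic regression, would typically replace the nearest-centroid rule; the input stays the same, one fixed-length vector per node.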

Support or Contact

For any query related to the project, please feel free to contact any of the authors. Contact information:

Shashank Gupta - shashank.gupta@research.iiit.ac.in

Nishant Prateek - nishant.prateek@research.iiit.ac.in

Karan Chandnani - karanchandnani21@gmail.com

Word2graph2vec

Representation learning for words and labelled documents by modelling them as a graph (part of the IRE project at IIIT Hyderabad).
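One way to model a labelled text corpus as a graph, in the spirit of PTE, is to build three weighted bipartite or co-occurrence networks: word-word, word-document, and word-label. The sketch below is an assumption-laden illustration rather than this project's preprocessing code; the function name `build_text_networks`, the window size, and the toy documents are all hypothetical.

```python
from collections import Counter


def build_text_networks(docs, labels, window=2):
    """docs: list of token lists; labels: one label per document (or None).

    Returns three Counters mapping edges to weights:
    word-word (co-occurrence within a forward window),
    word-document (term frequency), and word-label.
    """
    word_word = Counter()
    word_doc = Counter()
    word_label = Counter()
    for doc_id, (tokens, label) in enumerate(zip(docs, labels)):
        for i, word in enumerate(tokens):
            word_doc[(word, doc_id)] += 1
            if label is not None:  # unlabelled documents add no label edges
                word_label[(word, label)] += 1
            for other in tokens[i + 1:i + 1 + window]:
                if other != word:
                    # Undirected edge: store the pair in sorted order.
                    word_word[tuple(sorted((word, other)))] += 1
    return word_word, word_doc, word_label


# Toy, made-up corpus.
docs = [["great", "movie", "great"], ["bad", "movie"]]
labels = ["pos", "neg"]
word_word, word_doc, word_label = build_text_networks(docs, labels)
```

The word-label network is where the label information enters: words that occur in documents of the same class share a neighbour, which pulls their embeddings together during training.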
