Ravee Malla

News Corpus Analysis and Exploration (IIT Delhi Thesis Project) : Prof. Maya Ramanath & Prof. Amitabha Bagchi, CSE, IIT Delhi
We are working on a system to effectively visualize the time evolution of a news corpus. We want to make the news browsing experience extremely intuitive and contextual for an end-user. We have tried various approaches, and have reached some interesting conclusions. At this stage, we are building an interface based on our experiences and inferences of what should make a good news browsing experience. Our data set consists of New York Times articles ([5]) from 1987 to 2007, as well as articles from The Hindu ([6]). For extracting entities (persons, organizations, topics, etc.), we have experimented with tools such as OpenCalais ([2]), Stanford NER ([3]) and Alchemy API ([4]), and have found that OpenCalais works best across a range of entity types.
The work involves the following broad themes, which we have explored separately.
- News representation as a Directed Graph
- The primary reference for our work has been [1]. We can think of news articles as nodes of a directed graph, arranged sequentially in time. Articles are linked based on the transformations that common "actors" undergo between them. Five types of transformations are defined, on single actors and on pairs of actors.
  
  Birth
  First occurrence of an actor in the recent past
  
  Cease
  Last occurrence of an actor in the near future
  
  Merge
  First co-occurrence of 2 actors in the recent past
  
  Split
  Last co-occurrence of 2 actors in the near future
  
  Continue
  Continued co-occurrence of 2 actors, having co-occurred in the past and future
  
  The overall algorithm takes in a set of articles, mines entities from them, and then finds and scores all the transformations among these entities. From there, we generate a graph whose nodes are the articles, labeled by the entities they contain, with the smallest possible edge set that covers all these transformations. We implemented this algorithm and ran it on NYTimes articles from 2000. Some graphs can be seen here and here. Some interaction graphs between dominant actors involved in US Elections 2000 can be found here.
  Observations: The directed graph, as you can see from the examples, becomes too densely connected and is very sensitive to the quality of the extracted actors. Additionally, the emphasis is still on the nodes, i.e. the articles, and the user is expected to navigate through each one to get a sense of the story. Some more observations have been summarized in this report.
- News representation as topic-actor interaction map
- We are currently working on an interface to visualize news as a sequence of interactions among mined topics and actors, using the graph as a back-end alongside other scoring techniques. For this, we have designed a survey available here. More information can be found here.
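The five transformations listed under the directed-graph theme above can be sketched as follows. This is only a minimal illustration, assuming each article is reduced to a timestamp and a set of actors, and that "recent past" and "near future" mean a fixed time window; the function name, the window parameter and the sample data are hypothetical, and the scoring and edge-selection steps are not shown.

```python
from itertools import combinations

def classify_transformations(articles, window):
    """Label actor transformations for each article.

    articles: list of (time, set_of_actors), sorted by time (assumed format).
    window:   how far back/forward "recent past"/"near future" reach (assumed).
    Returns a list of (time, kind, actors) tuples.
    """
    events = []
    for i, (t, actors) in enumerate(articles):
        # Actor sets of articles inside the window before and after this one.
        past = [a for (u, a) in articles[:i] if t - u <= window]
        future = [a for (u, a) in articles[i + 1:] if u - t <= window]

        for actor in sorted(actors):
            if not any(actor in a for a in past):
                events.append((t, "birth", (actor,)))
            if not any(actor in a for a in future):
                events.append((t, "cease", (actor,)))

        for pair in combinations(sorted(actors), 2):
            seen_before = any(set(pair) <= a for a in past)
            seen_after = any(set(pair) <= a for a in future)
            if not seen_before:
                events.append((t, "merge", pair))
            if not seen_after:
                events.append((t, "split", pair))
            if seen_before and seen_after:
                events.append((t, "continue", pair))
    return events

# Hypothetical toy corpus: four articles with integer timestamps.
sample_articles = [
    (1, {"Bush", "Gore"}),
    (2, {"Bush", "Gore", "Nader"}),
    (3, {"Bush", "Gore"}),
    (10, {"Nader"}),
]
sample_events = classify_transformations(sample_articles, window=3)
```

In this toy corpus the pair (Bush, Gore) merges at time 1, continues at time 2 and splits at time 3, while Nader is born again at time 10 because the earlier mention falls outside the window.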

Image Warping : Prof. Prem Kalra, CSE, IIT Delhi
Given an image which has been deformed using a Projective Transformation, we need to restore it back to the original image. We implemented the SIGGRAPH 2006 paper Image Deformation using Moving Least Squares and compared the result to simple Delaunay Triangulation. Using the implementation of the paper, Marilyn Monroe can be made to smile like this.

Image Sketching : Prof. Prem Kalra, CSE, IIT Delhi
In this assignment, we implemented a tool to create a pencil-sketched version of a city map or a portrait. We studied various references like this and this. The final result is something that makes my favourite guitarist (Brad Delson) look like this.

Image & Video Compression : Prof. Prem Kalra, CSE, IIT Delhi
We implemented and compared different compression schemes for images and videos, such as Gaussian/Laplacian pyramids (as described here) and standard dictionary-based approaches like LZW. For video, we achieved further compression by computing interframe correlation and subtracting all the static components of the video.
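As an illustration of the dictionary-based approach mentioned above, here is a minimal sketch of textbook LZW over bytes. It is not the code from the assignment: the function names are hypothetical, codes are kept as plain integers, and packing them into a bit stream is left out.

```python
def lzw_compress(data):
    """Encode bytes as a list of integer codes using textbook LZW."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            codes.append(table[current])   # emit the longest known prefix
            table[candidate] = next_code   # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the original bytes from the list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code refers to the sequence that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

Repetitive input benefits most: `lzw_compress(b"ABABABA")` yields four codes for seven bytes, and `lzw_decompress` restores the original bytes.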

Cortical Image Flattening : Prof. Shantanu Ghosh & Prof. Sumantra Dutta Roy
We considered the problem of flattening images of the cortical surface of the human brain. Images of the brain are acquired through MRI (structural and functional) scans, which operate on the principle of Nuclear Magnetic Resonance.

The output is a 3D image of the brain, which is analyzed as slices of 2D images. However, since the surface of the brain is highly convoluted, this distorts the distance between two points on the surface and makes it difficult to see tumours that lie deep in a sulcus (depression). Our approach was to think of the flattening as a mapping from a 3D surface to a 2D plane, preserving neighbourhood topology and minimizing surface distortion.

The first three projects were done as part of the course Digital Image Analysis along with Jatinder.
Cortical Surface Flattening was pursued during the summer of 2010 along with Utkarsh.

Assignments on PintOS : Prof. Sorav Bansal, CSE, IIT Delhi
We implemented thread scheduling, system calls (fork, exit, wait, etc.), virtual memory management (using page tables, swap disks & memory maps), the ability to run user programs & file system control. Coding was done in C, and SVN was used for version control. The project helped me understand large-scale code organization, appreciate synchronization protocols (the very, very hard way), the criticality of backing up your code (after Rahul almost murdered me when I lost some of it) and the importance of designing code on paper before coding it.
I did 2 parts each with my batch mates Rahul & Pranay, both of whom have the gift of psychic intuition, and was guided by Ashwin, our unofficial mentor. :~)
Vulnerabilities in C : Prof. Huzur Saran, CSE, IIT Delhi
We exploited certain well-known vulnerabilities that result from unsafe buffer operations in C. Though most of these problems are now solved, legacy code still poses a threat. They include buffer overflows, format string attacks, return-to-libc, GOT-table attacks, double free bugs, etc.
Heap Spraying attack : Prof. Huzur Saran, CSE, IIT Delhi
We performed a Heap Spraying attack on the Iceweasel browser. The user is made to open a specially crafted, malformed HTML file, which sprays the browser's heap with malicious code (a shell) and has the user unknowingly execute it, giving the attacker remote access to the user's shell.
Fuzzing : Prof. Huzur Saran, CSE, IIT Delhi
We explored the various tools and techniques that constitute Fuzzing, a general approach to finding vulnerabilities in software by feeding it malformed or random inputs.

Bandwidth Assignments on Blackberry : Prof. Aaditeshwar Seth, CSE, IIT Delhi
Our group implemented a few applications on the Blackberry RIM platform to study network characteristics for mobile devices inside the IIT campus. In my group were the ever enthusiastic Rajwinder, Narinder and KnowledgeOcean (GyanSagar in Hindi). Our compiled report is available here.
P2P Bluetooth transfers : Prof. Aaditeshwar Seth, CSE, IIT Delhi
We studied some interesting properties of a community network of Bluetooth devices. We investigated the possibility of file transfer in a P2P fashion within the network. The final report is available here.
Activity Recognition using Sun SPOT : Prof. Vinay Ribeiro, CSE, IIT Delhi
We considered the problem of recognizing an individual's activity in real time and classifying it as one of a set of predefined activities such as Standing, Walking, Running, Jumping, etc. We used Sun SPOT, a wireless accelerometer device which communicated the normalized acceleration values along the three axes to a base station over radio. We experimented with feature vector modelling and Bayesian classification. However, since our data was stochastic in nature, we also analyzed the statistical patterns observed for each activity. This project was pursued as part of the Wireless Networks course along with Rahul, Arvind and Jagjeet. More details can be found in our report.
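To make the feature-vector and Bayesian classification idea concrete, here is a minimal sketch of a Gaussian naive Bayes classifier. It rests on assumptions not stated above: each window of accelerometer readings is summarized by the mean and standard deviation of the acceleration magnitude, and features are treated as independent Gaussians per activity. All names and sample values are hypothetical.

```python
import math
from collections import defaultdict

def window_features(window):
    """window: list of (ax, ay, az). Returns (mean, std) of the magnitude."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
    mean = sum(mags) / len(mags)
    var = sum((m - mean) ** 2 for m in mags) / len(mags)
    return (mean, math.sqrt(var))

def train(samples):
    """samples: list of (feature_vector, label).
    Returns label -> (prior, per-feature means, per-feature variances)."""
    by_label = defaultdict(list)
    for vec, label in samples:
        by_label[label].append(vec)
    model = {}
    for label, vecs in by_label.items():
        n, dims = len(vecs), len(vecs[0])
        means = [sum(v[d] for v in vecs) / n for d in range(dims)]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v[d] - means[d]) ** 2 for v in vecs) / n + 1e-6
                     for d in range(dims)]
        model[label] = (n / len(samples), means, variances)
    return model

def predict(model, vec):
    """Return the label with the highest log-posterior for this vector."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, mu, var in zip(vec, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: (mean magnitude, std of magnitude) per window.
training = [
    ((1.00, 0.02), "standing"), ((1.01, 0.03), "standing"), ((0.99, 0.02), "standing"),
    ((1.10, 0.30), "walking"), ((1.15, 0.35), "walking"), ((1.05, 0.25), "walking"),
    ((1.50, 0.90), "running"), ((1.60, 1.00), "running"), ((1.40, 0.80), "running"),
]
activity_model = train(training)
```

With this toy model, a nearly still window such as `(1.0, 0.025)` is classified as standing and a high-variance window such as `(1.55, 0.95)` as running.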
ARP Cache Poisoning and Spoofing : Prof. Huzur Saran, CSE, IIT Delhi
We wrote a small script to poison the ARP cache of a particular host on the LAN by sending it forged ARP reply packets. The unsuspecting host then sends IP packets meant for some other IP to the attacker, since the attacker has changed the IP-to-MAC mapping, writing the attacker's MAC in place of the destination MAC. This can be used to take over any TCP connection and become a Man-in-the-middle.

Logisim H/W Design Tool : Prof. Anshul Kumar, CSE, IIT Delhi
Our group made changes to the Logisim design tool used at IIT Delhi and many other universities as part of the architecture course. The original software, written in Java, has been designed by Prof. C. Burch. The project required us to understand the architecture and design of the software, and build on top of it. In my group were Arunim & Kshiteej, both brilliant programmers. The modified software is available here.
2-D Ping Pong on SVGA : Prof. M Balakrishnan, CSE, IIT Delhi
As part of my digital hardware course, we used the Virtex 2 Pro Digilent board (FPGA) to implement a 2D ping pong game on a VGA screen, coded in VHDL. We also incorporated speed, level and 1 or 2 player modes. I worked with Utkarsh, who incidentally has the ability to communicate and talk in binary.
MIPS Simulator : Prof. Anshul Kumar, CSE, IIT Delhi
We designed a MIPS simulator using Logisim design software which implemented all the basic instructions of MIPS in the Multi-Cycle design approach. We extended this design to include exception handling of all the common exceptions like arithmetic, bad instruction, wrong address and external exceptions.

These assignments were done as part of the software design course along with my group partner Simranjit, who doesn't mind being called Simran :)

VLSI Chip Router : Prof. Preeti Ranjan Panda, CSE, IIT Delhi
We designed a VLSI chip router which took an input grid of size up to 5000 x 5000, along with the pairs of points to be connected, and routed them using the A* algorithm with some heuristics. The output was displayed on the screen as well as written to a text file.
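A minimal sketch of A* routing for a single pair on a grid is given below. It assumes 4-connected moves of unit cost, blocked cells marked with '#', and the Manhattan distance as the heuristic; the heuristics actually used in the project are not described here, and the function name and grid format are hypothetical.

```python
import heapq

def astar_route(grid, start, goal):
    """Find a shortest route between two cells with A*.

    grid:  list of equal-length strings, '#' is blocked, anything else is free.
    start, goal: (row, col) tuples.
    Returns the list of cells on the route, or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper path was found already
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

To route several pairs in sequence, one simple option is to mark the cells of each completed route as blocked before routing the next pair, although the order of routing then affects the result.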
Operation Scheduler : Prof. Preeti Ranjan Panda, CSE, IIT Delhi
The objective was to come up with a time- and/or resource-optimal schedule, given a set of operations of certain types (addition, division, etc.) and a set of resource instances (2 adders, 3 dividers, etc.), along with the delays, prerequisites and maximum resource/time constraints.
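One common way to approach this kind of problem is greedy list scheduling under resource constraints, sketched below. This is a heuristic illustration and not necessarily the method used in the assignment, and it does not guarantee an optimal schedule. The function name and input format are hypothetical; the sketch assumes the prerequisite graph is acyclic and every operation type has at least one unit.

```python
def list_schedule(ops, delay, units):
    """Greedy resource-constrained list scheduling.

    ops:   name -> (op_type, [prerequisite names])
    delay: op_type -> number of cycles the operation takes
    units: op_type -> number of resource instances available
    Returns name -> start cycle.
    """
    start, finish = {}, {}
    busy = {t: [] for t in units}  # finish times of units currently in use
    time = 0
    while len(start) < len(ops):
        # Release units whose operation has completed by this cycle.
        for t in busy:
            busy[t] = [f for f in busy[t] if f > time]
        for name in sorted(ops):
            if name in start:
                continue
            op_type, preds = ops[name]
            ready = all(p in finish and finish[p] <= time for p in preds)
            if ready and len(busy[op_type]) < units[op_type]:
                start[name] = time
                finish[name] = time + delay[op_type]
                busy[op_type].append(finish[name])
        time += 1
    return start

# Hypothetical example: three additions feed a division and a final addition.
example_ops = {
    "a": ("add", []),
    "b": ("add", []),
    "c": ("add", []),
    "d": ("div", ["a", "b"]),
    "e": ("add", ["c", "d"]),
}
example_schedule = list_schedule(
    example_ops, delay={"add": 1, "div": 3}, units={"add": 2, "div": 1}
)
```

With two adders, operation `c` has to wait one cycle for a free unit, and `e` starts at cycle 4 once the three-cycle division has finished.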

Time evolution of stock price correlations using Multidimensional Scaling : Prof. Anirban Chakraborti, MAS, École Centrale Paris
Given a set of companies, we model distances between them using their stock price correlations and visualize the data using Multidimensional Scaling. This analysis was extended for different periods of time to highlight how the correlations vary by generating differential graphs. We found that such a method provides useful insight into how the market is behaving at one glance. A theoretical report was submitted on the general idea as part of my Colloquium presentation.
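The step from correlations to distances can be sketched as below. The distance function is an assumption: the sketch uses d = sqrt(2(1 - rho)), a common choice for turning a correlation into a metric, computed on log returns. The function names and input format are hypothetical, and the Multidimensional Scaling step itself (which would take this distance matrix as input) is not shown.

```python
import math

def log_returns(prices):
    """Logarithmic returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson correlation of two equal-length series (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distances(price_series):
    """price_series: company name -> list of prices over the same period.
    Returns (sorted names, dict mapping (name_a, name_b) -> distance)."""
    names = sorted(price_series)
    returns = {n: log_returns(price_series[n]) for n in names}
    dist = {}
    for a in names:
        for b in names:
            rho = 1.0 if a == b else pearson(returns[a], returns[b])
            rho = max(-1.0, min(1.0, rho))  # guard against rounding error
            # Assumed mapping: 0 for perfectly correlated, 2 for anti-correlated.
            dist[(a, b)] = math.sqrt(2.0 * (1.0 - rho))
    return names, dist
```

Repeating this computation over successive time windows gives one distance matrix per period, which is what allows the embeddings to be compared over time.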
Correlating share prices & company ratios in the Indian market : Prof. Shveta Singh, DMS, IIT Delhi
We studied the correlation patterns between company ratios and the share prices of certain large scale companies in various sectors of the Indian market. The correlation measure gives an insight into how fundamental the Indian market is. The sectors we considered were Banking, Oil, FMCG and Automobile. This idea was an extension of the work above to the Indian market. A report can be found here. Our conclusion was that even though there is a huge frequency difference between stock price change and company ratios, there still is a general correlation between them.

Auto-site
A Python script to create a barebones web presence, based on the template of this page and Bootstrap. Just change the parameters and have the pages ready! Then you can individually tweak the pages to your needs. Version 1 is up here.
Company visualization
An experiment with the d3 Tree layout to visualize the companies that typically open for placements. See it here.

Effects of Domain and Search Expertise on Web Search : Monojit Choudhury and Kalika Bali
We conducted a study to measure how web Search Expertise (SE) and Domain Expertise (DE) affect the overall search performance and search behavior of a user on domain-specific search tasks. Search performance is a measure of how fast a user is able to arrive at an accurate answer for a given fact-finding search task. Domain Expertise is a measure of the user's knowledge of the domain from which the online search task was created.
Clustering Questions using their Answers : Sachindra Joshi, IBM IRL
Our problem was to aggregate related questions asked on multiple forums and create a holistic FAQ/Question repository. This helps in deduplicating similar questions and answering them if a related question was already answered elsewhere. However, tf-idf text clustering performs poorly when the text is in the form of questions, chiefly because questions are usually brief. For example, "How do I activate this product?" and "How do I obtain a license key?" would have similar answers, but have poor word overlap. However, if we have access to their answers, we can learn that questions like these are very similar. We implemented a clustering algorithm which used the information in the answer domain to guide and improve the clustering of the questions. The idea was inspired by [1].
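The intuition can be illustrated with a small sketch that blends question similarity with answer similarity before clustering. It is a simplified stand-in rather than the algorithm we implemented: it uses stopword-filtered term counts instead of tf-idf, a fixed weighting, and a greedy threshold-based grouping. The stopword list, weights, function names and sample answers are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"how", "do", "i", "a", "the", "this", "is", "in", "it",
             "my", "your", "on", "from"}

def bag(text):
    """Lowercased word counts with stopwords removed."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def combined_similarity(pair1, pair2, answer_weight=0.7):
    """Blend question and answer similarity; each pair is (question, answer)."""
    (q1, a1), (q2, a2) = pair1, pair2
    return ((1 - answer_weight) * cosine(bag(q1), bag(q2))
            + answer_weight * cosine(bag(a1), bag(a2)))

def cluster(pairs, threshold=0.5):
    """Greedily add each pair to the first cluster with a similar member."""
    clusters = []
    for i, pair in enumerate(pairs):
        for members in clusters:
            if any(combined_similarity(pair, pairs[j]) >= threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Hypothetical question/answer pairs.
qa_pairs = [
    ("How do I activate this product?",
     "Enter the license key from your confirmation email in the activation dialog."),
    ("How do I obtain a license key?",
     "Your license key is in the confirmation email; paste it into the activation dialog."),
    ("How do I reset my password?",
     "Click the forgot password link on the login page."),
]
qa_clusters = cluster(qa_pairs)
```

Here the first two questions share no content words, yet their answers overlap heavily, so they end up in the same cluster while the password question stays on its own.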
