

Computer Vision / Machine Learning Researcher

Neuschwanstein Castle, Germany - September 2018



Computer Vision Researcher

I am a Research Fellow at the Machine Intelligence in Medical Imaging (MI2) lab, Mayo Clinic, AZ. I received my Ph.D. from the Center for Research in Computer Vision (CRCV) at the University of Central Florida, where I was advised by Dr. Mubarak Shah and Dr. Niels Da Vitoria Lobo. I also worked at the MIT-IBM Watson AI Lab as a Research Scientist Intern in the summers of 2020 and 2021. My research interests include multimodal understanding tasks involving language and vision, with a focus on representation learning, visual question answering, and visual grounding. My recent work focuses on learning joint representations from imaging and clinical reports and on developing efficient AI systems to assist radiologists in their diagnosis.
Some of my previous research topics include semantic segmentation, weakly/unsupervised representation learning, hand detection, and hand-based action recognition.


For many applications in First Person Vision, it is necessary to accurately segment not only the hands of the camera wearer but also the hands of the people they are interacting with. In this project, we take a deep look into hand segmentation and hand-based action recognition.

Aisha Urooj Khan, Ali Borji

Analysis of Hand Segmentation in the Wild. (CVPR 2018) [pdf] [Project]


The quality of scene parsing, particularly sky classification, decreases in nighttime images, in images taken under varying weather conditions, and across scene changes due to seasonal weather. We focus specifically on sky segmentation, the task of labeling each pixel as sky or not-sky. Our approach improves the average MCR by 10-15% over prior methods on the SkyFinder dataset, and improves the average mIOU by nearly 35% over an off-the-shelf model. Further, we analyze our trained models on images with respect to two aspects: time of day and weather.

Cecilia La Place*, Aisha Urooj Khan*, Ali Borji

Segmenting Sky Pixels in Images. (WACV 2019) [arxiv]
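
For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how they can be computed for a binary sky/not-sky mask. It assumes MCR denotes the misclassification rate (the fraction of wrongly labeled pixels) and mIOU the mean intersection-over-union across the two classes; the function names and the flattened-list mask format are illustrative assumptions, not taken from the paper's code.

```python
def misclassification_rate(pred, truth):
    """Fraction of pixels whose predicted label differs from the ground truth."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and of equal length")
    wrong = sum(p != t for p, t in zip(pred, truth))
    return wrong / len(pred)


def mean_iou(pred, truth, classes=(0, 1)):
    """Mean intersection-over-union over the given class labels.

    Classes absent from both masks are skipped so they do not
    distort the average.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and of equal length")
    ious = []
    for c in classes:
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        if union:
            ious.append(inter / union)
    if not ious:
        raise ValueError("none of the requested classes appear in the masks")
    return sum(ious) / len(ious)


# Hypothetical 4-pixel example: 1 = sky, 0 = not-sky (flattened mask).
predicted = [1, 1, 0, 0]
ground_truth = [1, 0, 0, 0]
mcr = misclassification_rate(predicted, ground_truth)
miou = mean_iou(predicted, ground_truth)
```

Lower MCR is better, while higher mIOU is better, so the two improvements reported above point in the same direction.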


Height estimation in first-person videos can be a useful feature for both soft biometrics and object tracking. We propose a method for estimating the height of an egocentric camera without any calibration or reference points. We used both traditional computer vision approaches and deep learning to determine the visual cues that result in the best height estimation. We introduce a framework inspired by two-stream networks, comprising two Convolutional Neural Networks: one based on spatial information, and one based on information given by optical flow. Given an egocentric video as input, our model yields a height estimate as output.

Jessica Finocchiaro, Aisha Urooj Khan, Ali Borji

Egocentric Height Estimation. (WACV 2017) [pdf]

