Alok Singh

Research Associate in Machine Learning, Oxford Sustainable Finance Group


Alok is a research associate in Machine Learning and Data Science at the Oxford Sustainable Finance Group. He completed his PhD in 2022 at the National Institute of Technology (NIT) Silchar, India, where his research focused on bridging the gap between computer vision and natural language processing. He designed models that can understand a visual scene (image or video) and describe it in natural language (specifically Hindi and English). Before this, he received a master’s degree in computer science and engineering from NIT Silchar, India, in 2019, where he worked on shot boundary detection. His research interests lie in natural language processing, multimodal machine learning, video captioning and temporal boundary detection. He is interested in enabling machines to learn from multiple modalities of data, such as text, audio, video, and semantics, as humans naturally do.


  • Meetei, L. S., Singh, A., Singh, T. D., & Bandyopadhyay, S. (2023). Does cues in a video help in handling rare words in a machine translation system under a low-resource setting? Natural Language Processing Journal, 100016.
  • Singh, Alok, Thoudam Doren Singh, and Sivaji Bandyopadhyay. "V2t: video to text framework using a novel automatic shot boundary detection algorithm." Multimedia Tools and Applications 81.13 (2022): 17989-18009.
  • Singh, A., Singh, T.D. & Bandyopadhyay, S. An encoder-decoder based framework for Hindi image caption generation. Multimed Tools Appl (2021). (SCIE, IF 2.757)
  • Singh, A., Singh, T.D. & Bandyopadhyay, S. Attention based video captioning framework for Hindi. Multimedia Systems (2021). (SCI, IF 1.935)
  • Chakraborty, S., Singh, A. & Thounaojam, D.M. A novel bifold-stage shot boundary detection algorithm: invariant to motion and illumination. Vis Comput (2021). (SCI, IF 2.601)
  • Singh, A., Thounaojam, D. M., & Chakraborty, S. (2019). A novel automatic shot boundary detection algorithm: robust to illumination and motion effect. Signal, Image and Video Processing, 1-9. (SCI, IF 2.157). [Code!]
  • Singh, A., Singh, S. M., Meetei, L. S., Das, R., Singh, T. D., & Bandyopadhyay, S. (2023). VATEX2020: pLSTM framework for video captioning. Procedia Computer Science, 218, 1229-1237.
  • Meetei, Loitongbam Sanayai, et al. "Hindi to English Multimodal Machine Translation on News Dataset in Low Resource Setting." Procedia Computer Science 218 (2023): 2102-2109.
  • Singh, A., Meetei, L. S., Singh, S.M., Singh, T.D., & Bandyopadhyay, S. An efficient keyframes selection based framework for video captioning. In Proceedings of the International Conference on Natural Language Processing ICON-2021
  • Meetei, L. S., Singh, S.M., Singh, A., Singh, T.D., & Bandyopadhyay, S. An Experiment on Speech-to-Text Translation Systems for Manipuri to English on Low Resource Setting. In Proceedings of the International Conference on Natural Language Processing ICON-2021
  • Singh, S.M., Meetei, L. S., Singh, A., Singh, T.D., & Bandyopadhyay, S. On the Transferability of Massively Multilingual Pretrained Models in the Pretext of the Indo-Aryan and Tibeto-Burman Languages. In Proceedings of the International Conference on Natural Language Processing ICON-2021
  • Singh, A., Meetei, L.S., Singh, T.D., & Bandyopadhyay, S. Generation and Evaluation of Hindi Image Captioning of Visual Genome. In Proceedings of I3CS 2021 33-4084-8_7.
  • Chakraborty, S., Thounaojam, D.M., Singh, A., & Pal, G. ALO-SBD: A Hybrid Shot Boundary Detection Technique for Video Surveillance System. In Proceedings of ADCOM 2020 (Accepted, Rank B)