Parakrant Sarkar

About Me

Hello, world! Thank you for visiting my page. I am a fourth-year Ph.D. student at the City University of Hong Kong. My research focuses on neural audio effect modelling and music enhancement.

Before starting my Ph.D., I worked as a Software Developer (Level II) at Vocera Communications. I received an MS (by research) degree from the Department of Computer Science and Engineering at the Indian Institute of Technology Kharagpur, India. My master's thesis, Prosody Modeling for Storytelling Style Speech Synthesis, was carried out under the supervision of Dr. Krothapalli Sreenivasa Rao. I received my B.Tech. in Information Technology from the Department of Information Technology at North-Eastern Hill University, India.

My research interests include speech synthesis, speech recognition, natural language processing, and machine learning.

Google Scholar | DBLP

Professional Experience

● Audio Research Intern | Huawei, Hong Kong SAR, China | AudioLab, Huawei Hong Kong Research Center (Apr '24 - Dec '24)

● Speech Recognition Developer Level 2 | Vocera, Bangalore, India | R&D Products Engineering Group (Oct '16 - Dec '20)

● Senior Scientific Officer | SRIC-IIT Kharagpur, West Bengal, India | Department of Information Technology, Govt. of India (Sep '12 - Sep '16)

Thesis

Topic: Prosody Modeling for Storytelling Style Speech Synthesis
Abstract | Demos

Major contributions

Development of a Hindi story TTS system built on a neutral TTS system, with modules that generate and incorporate story-specific information. This includes the design, development, and integration of modules that generate a story-specific prosody rule set and apply it to the neutral TTS system.

Development of a story TTS system using a story speech corpus.

Modeling of story-specific pause patterns, both with and without discourse information.

Modeling of story-specific prosody (duration, intonation, and intensity) based on story genre information.

Publications

● Parakrant Sarkar, and Permagnus Lindborg, "Diff-DEQ: Differentiable Dynamic Equalization for Studio-Quality Speech Processing", in Proceedings of the IEEE 33rd European Signal Processing Conference (EUSIPCO 2025), Palermo, Italy, 08-12 September, 2025. [Soon to be uploaded]

● Parakrant Sarkar, and Permagnus Lindborg, "Neural-Driven Multi-Band Processing for Automatic Equalization and Style Transfer", in Proceedings of the IEEE 27th International Conference on Digital Audio Effects (DAFx 2025), Ancona, Italy, 02-05 September, 2025. [Soon to be uploaded]

● Parakrant Sarkar, and Permagnus Lindborg, "Dynamic EQ Simplified: A Rule-Based Method for Frequency Selective Audio Processing", in the 20th Workshop on Computer Music and Audio Technology (WOCMAT 2024), Ancona, Italy, 02-05 September, 2025. [Soon to be uploaded]

● Kumud Tripathi, Parakrant Sarkar, and K. Sreenivasa Rao, "Sentence Based Discourse Classification for Hindi Story Text-to-Speech (TTS) System", in the Thirteenth International Conference on Natural Language Processing (ICON-2016), IIT (BHU), Varanasi, India, 19-20 December, 2016. pdf

● Parakrant Sarkar, and K. Sreenivasa Rao, "Development of Story Text-to-Speech System based on Story Genres", in Workshop on Machine Learning in Speech and Language Processing (MLSLP 2016), Google, San Francisco, USA, 13 September, 2016. pdf

● Parakrant Sarkar, and K. Sreenivasa Rao, "Analysis and Modeling Pauses for Synthesis of Storytelling Speech based on Discourse modes", in Proceedings of the IEEE International Conference on Contemporary Computing (IC3 2015), JIIT Noida, India, 11-13 August, 2015. pdf

● Parakrant Sarkar, and K. Sreenivasa Rao, "Modeling Pauses for Synthesis of Storytelling Style Speech Using Unsupervised Word Features", Second International Symposium on Computer Vision and the Internet (VisionNet'15), Procedia Computer Science, Volume 58, Pages 42-49, Kochi, India, August, 2015. pdf

● Parakrant Sarkar, and K. Sreenivasa Rao, "Data-Driven Pause Prediction for Synthesis of Storytelling Style Speech based on Discourse Modes", in Proceedings of the IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT 2015), IIIT Bangalore, India, 10-11 July, 2015. pdf

● Parakrant Sarkar, and K. Sreenivasa Rao, "Data-Driven Pause Prediction for Speech Synthesis in Storytelling Style Speech", in Proceedings of the IEEE 21st National Conference on Communication (NCC-2015), IIT Bombay, India, 27 February to 1 March, 2015. pdf

● Rashmi Verma, Parakrant Sarkar, and K. Sreenivasa Rao, "Conversion of Neutral speech to Storytelling Style speech", in Proceedings of the eighth IEEE International Conference on Advances in Pattern Recognition (ICAPR 2015), ISI Kolkata, India, 04-07 January, 2015. pdf

● Parakrant Sarkar, Arijul Haque, Arup Kumar Dutta, Gurunath Reddy M., Harikrishna D. M., Prasenjit Dhara, Rashmi Verma, N. P. Narendra, Sunil Kr. S.B., Jainath Yadav, K. Sreenivasa Rao, "Designing prosody rule-set for converting neutral TTS speech to storytelling style speech for Indian languages Bengali, Hindi and Telugu", in Proceedings of the IEEE International Conference on Contemporary Computing (IC3 2014), JIIT Noida, India, 20-24 August, 2014. pdf