HCI-FACE Backend
This backend provides an API that exposes speech-to-text (STT), text-to-speech (TTS), and chatbot functionality to the HCI-FACE.
Getting Started
To start up the api, run:
The STT, TTS, and chatbot functionality lives in the utils folder. Each module can be used and tested individually.
Detailed descriptions of each module can be found in the Modules Section.
Full code references are included in the Reference Section.
Tasks
- Finish the face control API (AU and left-right control)
Roadmap
Chatbot
- Additional local backends
- Currently only Hugging Face models and pipelines are supported; support for custom PyTorch or other ML models will be added.
- Additional cloud backends
- Currently only OpenAI is supported; future work will add support for other classification and chatbot services, starting with AWS.
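One way the planned local and cloud backends could share a single entry point is a small registry behind a common interface. The following is a minimal sketch only: `ChatBackend`, `EchoBackend`, `BACKENDS`, and `get_backend` are hypothetical names invented for illustration and are not taken from the HCI-FACE code.

```python
from abc import ABC, abstractmethod


class ChatBackend(ABC):
    """Hypothetical common interface; not part of the existing HCI-FACE code."""

    @abstractmethod
    def respond(self, prompt: str) -> str:
        """Return the chatbot's reply to a single prompt."""


class EchoBackend(ChatBackend):
    """Stand-in local backend, used only to make this sketch runnable."""

    def respond(self, prompt: str) -> str:
        return f"You said: {prompt}"


# Registry mapping a backend name to its class, so new local or cloud
# backends can be registered without changing the calling code.
BACKENDS: dict[str, type[ChatBackend]] = {"echo": EchoBackend}


def get_backend(name: str) -> ChatBackend:
    """Instantiate a registered backend, failing loudly on unknown names."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"Unknown chatbot backend: {name!r}") from None


if __name__ == "__main__":
    print(get_backend("echo").respond("hello"))
```

Under this assumption, a Hugging Face, OpenAI, or AWS backend would each subclass the interface and add one entry to the registry.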
STT
The current STT module is stable, but future plans include:
- Live diarization
- Diarization is currently only available for transcribing full audio files. Future work will integrate the STT more closely with the conversation, so that speakers can be tracked as they are transcribed from audio clips.
- Stream handling
- Stream segmentation has been moved to the frontend for now. Future work will enable command-line support for transcribing live streams.
- Cloud STT & Diarization
- Currently the STT module is entirely local, as it doesn't require heavy computational resources or a GPU. In the future we will add support for cloud transcription to better support use on lightweight/edge devices.
TTS
The current TTS module is stable, particularly with Polly. Coqui and viseme generation will need a fair amount of work, but that should not change the fundamental module API.
- Improved viseme processing
- This will be done in conjunction with the frontend, to improve the accuracy and number of visemes provided. Additional work is needed to improve the timing of the custom viseme generation.
- Coqui documentation
- The current module is a mess; I'll clean it up and standardize the documentation.
Recording
In the future we will support more sophisticated logging across all modules.