Assessing ASR performance with meaning preservation


Meaning preservation as an alternative metric

Our study leveraged the Project Euphonia corpus, a repository of disordered speech encompassing over 1.2 million utterances from roughly 2,000 people with various speech impairments. To expand data collection to Spanish speakers, Project Euphonia partnered with the International Alliance of ALS/MND Associations, which facilitated the contribution of speech samples from people living with ALS in Mexico, Colombia, and Peru. Similarly, Project Euphonia expanded to French speakers through a partnership with Romain Gombert from the Paris Brain Institute to collect data from people with atypical speech in France.

For our experiments, we generated a dataset of 4,731 examples consisting of ground truth and transcription error pairs, each with a human label identifying whether the pair is meaning preserving or not (see details in our paper). We split the dataset into training, test, and validation sets (80% / 10% / 10%, respectively), ensuring that the three sets did not overlap at the ground truth phrase level.
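One way to get a split with no ground truth phrase shared across sets is to assign whole phrases, rather than individual examples, to each set. The sketch below is only an illustration under assumptions, not the procedure from the paper: the function name `split_by_ground_truth`, the `ground_truth` field, and the seeded shuffle are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_ground_truth(examples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split examples into train/test/validation by ground truth phrase.

    `examples` is a list of dicts with a "ground_truth" key (an assumed
    schema). All examples sharing a phrase land in the same set.
    """
    # Group examples by their ground truth phrase.
    groups = defaultdict(list)
    for ex in examples:
        groups[ex["ground_truth"]].append(ex)

    # Shuffle the unique phrases deterministically.
    phrases = sorted(groups)
    random.Random(seed).shuffle(phrases)

    # Cut the phrase list at the requested fractions.
    n_train = int(fractions[0] * len(phrases))
    n_test = int(fractions[1] * len(phrases))
    buckets = {
        "train": phrases[:n_train],
        "test": phrases[n_train:n_train + n_test],
        "validation": phrases[n_train + n_test:],
    }

    # Expand each phrase back into its examples.
    return {
        name: [ex for phrase in chosen for ex in groups[phrase]]
        for name, chosen in buckets.items()
    }
```

Because the cut is made over phrases, the 80% / 10% / 10% proportions hold exactly only at the phrase level; the example-level proportions are approximate when phrases have different numbers of transcription errors.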

With this data, we trained a classifier for meaning preservation on top of a base LLM. Using prompt-tuning, a parameter-efficient way to adapt LLMs, we conditioned our base LLM on our training set to predict the labels “yes” or “no” to indicate whether the meaning has been preserved or not.

We use the following format to represent the data to the LLM: