Voice data is the raw material of artificial intelligence systems that process, recognise and synthesise human speech. The quality of an automatic speech recognition model, a text-to-speech system or a voice-enabled virtual assistant depends directly on the quality, diversity and annotation rigour of the data it was trained on. Without good data, there is no good model.
At Voices & Media Solutions, we supply voice data to technology companies, artificial intelligence startups and research teams developing or improving vocal processing systems. Our unique position in the Portuguese language market — with coverage of Portugal, Brazil, Angola, Cape Verde and Mozambique — makes us a difficult partner to replace for those who need data representative of Portuguese variants. We also work with native voices across more than 70 languages.
Voice data consists of sets of human speech recordings that are organised, transcribed and annotated for use in training artificial intelligence models. These are not random recordings or low-quality captures. A voice dataset usable for AI training must meet a rigorous set of technical and linguistic criteria: controlled audio quality, voice diversity, coverage of speech contexts and styles, accurate transcriptions and annotations that enable the model to learn relevant patterns.
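To make "organised, transcribed and annotated" more concrete, the sketch below shows one possible shape for a single entry in a dataset manifest, plus a few automated checks of the kind that can flag obviously unusable recordings. It is a minimal illustration under stated assumptions, not our delivery format: the `Utterance` fields, the `basic_checks` helper, the 16 kHz minimum sample rate and the 30-second maximum clip length are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """One recorded utterance in a hypothetical voice dataset manifest."""
    audio_path: str
    speaker_id: str
    variant: str            # BCP 47 tag, e.g. "pt-PT", "pt-BR", "pt-AO", "pt-CV", "pt-MZ"
    sample_rate_hz: int
    duration_s: float
    transcript: str
    labels: dict = field(default_factory=dict)  # free-form annotations (hypothetical)


def basic_checks(u: Utterance, min_rate_hz: int = 16000,
                 max_duration_s: float = 30.0) -> list[str]:
    """Return the problems found; an empty list means the record passes.

    The thresholds are illustrative assumptions, not fixed requirements.
    """
    problems = []
    if u.sample_rate_hz < min_rate_hz:
        problems.append("sample rate below minimum")
    if not 0 < u.duration_s <= max_duration_s:
        problems.append("duration out of range")
    if not u.transcript.strip():
        problems.append("missing transcript")
    return problems


ok = Utterance("clips/0001.wav", "spk_017", "pt-AO", 16000, 4.2, "Bom dia, como está?")
print(basic_checks(ok))  # → []
```

Checks like these only catch mechanical defects; judging accent authenticity, transcription accuracy or annotation quality still requires human review.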
This data feeds three main categories of systems: automatic speech recognition, which converts speech to text; voice synthesis, which converts text to speech; and spoken language understanding models, which interpret the meaning of what is said. Each of these applications has specific requirements in terms of the type of data needed.
Our service focuses on custom voice data collection. When a client needs data with specific characteristics, we manage the entire process: defining the required speaker profiles, creating reading scripts or spontaneous speech scenarios, conducting recording sessions under controlled conditions, and transcribing and annotating the material produced.
Custom collection is the right solution when a project requires a precise demographic profile, a specific accent, a particular subject domain or a volume of data not available from other sources. It is a more time-intensive process than accessing pre-existing datasets, but it guarantees data fully aligned with the requirements of the model to be trained.
Portuguese is one of the most widely spoken languages in the world, with more than 260 million speakers across several continents. Yet in the context of voice data for AI, it remains under-represented, particularly in its African variants. Most available datasets cover European Portuguese or Brazilian Portuguese reasonably well, but representative data for Angolan, Cape Verdean or Mozambican Portuguese is scarce.
Voices & Media Solutions has the capacity to supply voice data across all major Portuguese variants: European, Brazilian, Angolan, Cape Verdean and Mozambican. This coverage results from years of work with native speakers from each region and a network of voice professionals across Portuguese-speaking countries. For companies developing speech recognition or voice synthesis systems in Portuguese, this coverage is a critical resource.
The quality of a voice dataset is not measured solely by the number of hours recorded. The criteria that determine the real utility of data for AI model training include:
The voice data we supply is used across a wide variety of projects:
Every voice data project begins with a technical conversation. We need to understand the model the client intends to train, the languages and variants required, the volume of data needed, the level of annotation required and the available timeline. With that information, we present a proposal with the most suitable options.
Every project has different requirements. Some need volume. Others need linguistic specificity. Others need both. Our technical team is available to assess what you need and recommend the most efficient approach, whether that is custom collection, access to existing datasets or a combination of both.