GenderCrawl includes monolingual text corpora for Spanish, French, and Italian. These corpora are derived from ParaCrawl, from which we automatically selected sentences containing speaker-dependent words that reveal the speaker’s gender (e.g., Spanish: Soy nueva en esta zona, "I am new in this area", where the feminine adjective nueva marks a female speaker). For each language we collected two gender-specific corpora, one for feminine forms and one for masculine forms.
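
To make the selection idea concrete, the following is a minimal sketch of how sentences with a gender-marked first-person form could be sorted into feminine and masculine sets. It is not the procedure used to build GenderCrawl: the copula pattern, the tiny word lists, and the function names `speaker_gender` and `split_by_gender` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lexicons; a real pipeline would need far broader coverage.
FEMININE = {"nueva", "cansada", "lista"}
MASCULINE = {"nuevo", "cansado", "listo"}

# Assumption: a first-person copula ("soy"/"estoy"), optionally followed by
# "muy", then the candidate gender-marked word.
PATTERN = re.compile(r"\b(?:soy|estoy)\s+(?:muy\s+)?(\w+)", re.IGNORECASE)


def speaker_gender(sentence):
    """Return 'feminine', 'masculine', or None if no marker is found."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    word = match.group(1).lower()
    if word in FEMININE:
        return "feminine"
    if word in MASCULINE:
        return "masculine"
    return None


def split_by_gender(sentences):
    """Group sentences into two gender-specific lists, dropping the rest."""
    corpora = {"feminine": [], "masculine": []}
    for sentence in sentences:
        gender = speaker_gender(sentence)
        if gender is not None:
            corpora[gender].append(sentence)
    return corpora
```

Under these assumptions, `speaker_gender("Soy nueva en esta zona")` falls into the feminine set, while a sentence with no first-person marker is discarded.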

For comprehensive statistics and detailed information about these corpora, see the reference paper below.


These datasets are released under the Creative Commons Attribution 4.0 International license (CC BY 4.0). See the full license terms for details on how you may use and share the data with appropriate attribution.


