3. Subcorpus

This section provides guidance on building a customized subcorpus of EANC:

3.1. Subcorpus window
3.2. Subcorpus criteria
3.3. Authors and Titles
3.4. Year / Period
3.5. Genre
3.6. Subcorpus preview

3.1. Subcorpus window

A subcorpus is a subset of EANC defined by metatext criteria such as author, title, year of creation, or genre. Any EANC query can be run against a custom-built subcorpus instead of the entire corpus.


To create, modify, or reset a subcorpus, click Specify Subcorpus to open the Subcorpus window. To use several subcorpora at the same time, open several search windows by clicking Search in new window and specify a different subcorpus in each of them.

3.2. Subcorpus criteria

The Subcorpus window has six areas:
  • Authors and Titles
  • Period
  • Text Genre
  • Prose/Poetry
  • Original/Translated
  • Orthography 

Selections within a single area add up (their union is taken). For example:
  • If you select 4 authors and 7 titles, your subcorpus will include all titles by those 4 authors plus the 7 selected titles, whether or not any of the 7 titles were written by any of those 4 authors;
  • If you select Short stories and Memoirs in the Text Genre Area, your subcorpus will include all texts tagged as Short stories plus all texts tagged as Memoirs, regardless of whether those genres overlap in EANC.

Selections in different areas intersect. For example:
  • If you select Fiction in the Text Genre Area and Translated in the Original/Translated Area, your subcorpus will include only translated fiction, i.e. only those texts that carry both metatext tags;
  • Similarly, if you select 1920-1930 in the Period Area and Poetry in the Prose/Poetry Area, your subcorpus will include only poetry written between 1920 and 1930.


Because criteria from different areas are intersected, a combination that no text satisfies will produce an empty subcorpus. To check the size of the selected subcorpus, use the Preview button.
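The combination rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not EANC's implementation: the `Text` record, its fields, and the `build_subcorpus` function are hypothetical, and only three of the six areas are modeled. Within an area, selections are joined with OR; across areas, the per-area results are joined with AND, which is also why incompatible criteria leave the subcorpus empty.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified text record. EANC's real metadata schema is an
# assumption; only the fields needed for this illustration are modeled.
@dataclass(frozen=True)
class Text:
    title: str
    author: str
    genres: frozenset
    translated: bool


def in_authors_titles(text: Text, authors: set, titles: set) -> bool:
    # Within one area, selections add up (union).
    # No selection in the area places no restriction.
    if not authors and not titles:
        return True
    return text.author in authors or text.title in titles


def in_genres(text: Text, genres: Optional[set]) -> bool:
    # None models the default state (every genre checked, no restriction).
    # A text matches if it carries ANY selected genre, and is counted once
    # even if it carries several of them.
    return genres is None or bool(text.genres & genres)


def in_original_translated(text: Text, translated: Optional[bool]) -> bool:
    # None models "no restriction" in the Original/Translated Area.
    return translated is None or text.translated == translated


def build_subcorpus(texts, authors=frozenset(), titles=frozenset(),
                    genres=None, translated=None):
    # Across areas, selections intersect: a text must pass every area.
    return [t for t in texts
            if in_authors_titles(t, authors, titles)
            and in_genres(t, genres)
            and in_original_translated(t, translated)]


# Hypothetical sample data.
TEXTS = [
    Text("Title A", "Author 1", frozenset({"Fiction"}), False),
    Text("Title B", "Author 1", frozenset({"Fiction"}), True),
    Text("Title C", "Author 2", frozenset({"Memoirs"}), True),
    Text("Title D", "Author 3", frozenset({"Fiction"}), True),
]

# Union within one area: all titles by Author 1 plus Title C.
print([t.title for t in build_subcorpus(TEXTS, authors={"Author 1"}, titles={"Title C"})])
# prints ['Title A', 'Title B', 'Title C']

# Intersection across areas: translated fiction only.
print([t.title for t in build_subcorpus(TEXTS, genres={"Fiction"}, translated=True)])
# prints ['Title B', 'Title D']

# Incompatible criteria: Author 2 wrote no fiction here, so the result is empty.
print(build_subcorpus(TEXTS, authors={"Author 2"}, genres={"Fiction"}))
# prints []
```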

3.3. Authors and Titles

You can limit your subcorpus to texts written by one or several authors.
To do so, click Select Author and select the individual authors you want to include. To undo your selection, click Deselect All.

You can also limit your subcorpus to one or several titles included in EANC. To do so, click Select Title. In the window that opens, all EANC titles are grouped by four major genres: fiction, nonfiction, oral, and press. Within each genre, titles are split into groups of 50 in alphabetical order.

Click a group header to select individual titles within that group.

To undo your selection, click Deselect All. If no selection is made in the Authors and Titles Area, the subcorpus is not restricted by author or title.

3.4. Year / Period

You can limit your subcorpus to texts created before or after a certain year, or during a given period. Years must be entered in ‘YYYY’ format (e.g. 1973). If you fill in the from field only, the subcorpus will include only texts created in that year or later. If you fill in the to field only, the subcorpus will include only texts created in that year or earlier. If you fill in both fields, only texts created between the two years, inclusive, will be included.

When working with the Period Area, please note that:
  • For many texts in EANC, no reliable data is available on the exact year of creation;
  • Some EANC texts were created over a period of time, not a single year;
  • For some texts, the year when the book was published is used;
  • For yet other texts, a proxy period is used, running from the year the author turned 18 to the year of the author’s death;
  • Translations are dated by the year of translation (or its publication), not the year of the creation of the original text.

Texts dated by a period of several years (rather than a single year) are included in the subcorpus only if that period falls entirely within the period selected for the subcorpus.
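The date rules in this section can be expressed as a small check. This sketch assumes that each text carries a start year and an end year (equal for single-year datings) and that an empty from or to field means "no bound"; `parse_year` and `in_period` are hypothetical helpers written for illustration, not part of EANC.

```python
import re
from typing import Optional


def parse_year(field: str) -> Optional[int]:
    # Hypothetical parser for the from/to fields: an empty field means
    # "no bound"; anything else must be a four-digit year (YYYY).
    field = field.strip()
    if not field:
        return None
    if not re.fullmatch(r"[0-9]{4}", field):
        raise ValueError(f"year must be in YYYY format, got {field!r}")
    return int(field)


def in_period(text_start: int, text_end: int,
              year_from: Optional[int], year_to: Optional[int]) -> bool:
    # A single-year dating has text_start == text_end. A multi-year dating
    # qualifies only if the whole range lies inside the selected period.
    # Both bounds are treated as inclusive.
    if year_from is not None and text_start < year_from:
        return False
    if year_to is not None and text_end > year_to:
        return False
    return True


# from = 1920, to = 1930
print(in_period(1925, 1925, parse_year("1920"), parse_year("1930")))  # True
print(in_period(1918, 1925, parse_year("1920"), parse_year("1930")))  # False
# from = 1973, to left empty
print(in_period(1980, 1980, parse_year("1973"), parse_year("")))      # True
```

Under the rule as stated, a text dated by the author-age proxy (for example, 1918-1960 for a hypothetical author born in 1900 who died in 1960) would be left out of a 1920-1930 subcorpus, because its dating period is not entirely within the selected one.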

3.5. Genre

In the Genre Area you can limit your subcorpus to one or several genres or subgenres.

When you open the Subcorpus window, all genres and subgenres are checked and all dropdown fields are set to ‘any’, so no genre restrictions are applied by default. To limit the subcorpus to specific genres, uncheck the types of texts you want to exclude.

You can also use the Select All and Deselect All buttons to check or uncheck all genres at once.

3.6. Subcorpus preview

Once you have specified the subcorpus selection criteria, you can verify the size of the selected subcorpus by clicking the Preview button.

The Preview window offers a snapshot of your subcorpus:

  • If your subcorpus is empty, the Preview window will notify you;
  • If it is not empty, the Preview window will show the number of titles, sentences, and tokens in your subcorpus, as well as the number of tokens in the subcorpus as a percentage of the total number of EANC tokens (relative size);
  • If your subcorpus contains fewer than 1,000 titles, the Preview window will also let you fine-tune your selection by unchecking individual titles.
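The figures reported in the Preview window can be sketched as follows. The `preview` function, the `PreviewSummary` record, and the numbers in the usage lines are hypothetical; the only rules taken from this section are the relative size (subcorpus tokens as a percentage of all EANC tokens) and the 1,000-title threshold for fine-tuning.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical container for the figures shown in the Preview window.
@dataclass
class PreviewSummary:
    titles: int
    sentences: int
    tokens: int
    relative_size_pct: float  # subcorpus tokens as % of all EANC tokens
    can_fine_tune: bool       # per-title unchecking offered below 1,000 titles


def preview(titles: int, sentences: int, tokens: int,
            total_corpus_tokens: int) -> Optional[PreviewSummary]:
    # An empty subcorpus has nothing to summarize; the window just says so.
    if titles == 0:
        return None
    return PreviewSummary(
        titles=titles,
        sentences=sentences,
        tokens=tokens,
        relative_size_pct=100 * tokens / total_corpus_tokens,
        can_fine_tune=titles < 1000,
    )


# Illustrative numbers only; they are not real EANC figures.
summary = preview(titles=850, sentences=120_000, tokens=2_500_000,
                  total_corpus_tokens=100_000_000)
print(f"{summary.relative_size_pct:.2f}%")  # prints 2.50%
print(summary.can_fine_tune)                # prints True
```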

Click Confirm to close the window and save your subcorpus selection.