A function to obtain variable data and perform transformations on the EDH dataset.

edhw(x = "EDH", vars, as = c("df", "list"), type = c("long", "wide", "narrow"), 
     split, select, addID, limit, id, na.rm, ldf, province, gender, rp, ...)

Arguments

x

a list object name with fragments of the EDH dataset (optional)

vars

vector of variables of interest from x; if x=NULL, the entire EDH dataset is taken (optional)

as

format to return the output; either as a "list" or a data frame "df" object.

type

format of the data frame; either "long" or "wide" ("narrow" not yet implemented)

split

divide the data into groups by id? (optional and logical)

select

vector with "people" variables (optional)

addID

add identification to the output? (optional and logical)

limit

integer or vector to limit the returned output. Ignored if id is specified (optional)

id

select only hd_nr records (optional, integer or character)

na.rm

remove entries with NA data? (optional and logical)

ldf

is x a list of data frames? (optional and logical)

province

name or abbreviation of a Roman province in EDH, as in the rp dataset

gender

gender of people in EDH: male or female

rp

customized list of Roman provinces, as in the rp dataset

...

optional arguments if needed.

Details

This is an interface to extract attribute variables in vars from the EDH dataset attached to this package. However, the input in x can be fragments of the EDH dataset or from the Epigraphic Database Heidelberg API obtained by functions get.edh or get.edhw with the "rjson" format, or transformed data organized, for example, by provinces. When x is explicit, it must be at least a list object with a comparable structure to the EDH dataset.

The variables of interest are chosen through the vars argument, and the output is returned either as a list with "list" or as a data frame with "df" in argument as. If argument vars is missing, then all entries in x are taken.

By default, a list object is returned, with or without an `ID' identification provided by the addID argument. When the input list is converted into a data frame, the variables are ordered alphabetically. If desired, missing data can be removed from the output by activating na.rm, so that only complete cases remain.
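The conversion described above can be sketched in base R. This is only an illustration on made-up toy records, not the code used by edhw; the object names and values are hypothetical, and keeping the identifier in the first column is a choice of this sketch.

```r
# Toy records loosely shaped like EDH entries; all values are made up.
records <- list(
  list(id = "HD000001", not_before = "0071", not_after = "0130"),
  list(id = "HD000002", not_before = "0051", not_after = NA_character_),
  list(id = "HD000003", not_before = NA_character_, not_after = "0200")
)

# Bind the records row by row into a data frame.
df <- do.call(rbind, lapply(records, function(r) {
  as.data.frame(r, stringsAsFactors = FALSE)
}))

# Order the variables alphabetically (identifier kept first in this sketch).
vars <- sort(setdiff(names(df), "id"))
df <- df[, c("id", vars)]

# Keep complete cases only, which is the effect described for na.rm.
df_complete <- df[complete.cases(df), ]
```

Here df keeps all three toy records, while df_complete retains only the record without missing values.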

Arguments id and limit serve to reduce the returned output: id restricts it to one or more Epigraphic Database numbers, which are specified by hd_nr, while limit restricts the amount of the returned output. limit here is like the limit argument of function get.edh, but in this case the offset can be specified as a sequence. While limit is a faster way to get to entries in the EDH dataset, argument id refers precisely to one or more hd_nr values in the Epigraphic Database Heidelberg API.
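The difference between the two arguments can be illustrated with a small base R sketch on a toy list. The helpers take_limit and take_id are hypothetical names introduced here, the record values are made up, and the sketch only mimics the described behaviour; it is not how edhw is implemented and, unlike id, it accepts character identifiers only.

```r
# Toy list of records; identifiers and values are made up.
records <- list(
  list(id = "HD000001", type_of_inscription = "epitaph"),
  list(id = "HD000002", type_of_inscription = "votive inscription"),
  list(id = "HD000003", type_of_inscription = "epitaph"),
  list(id = "HD000004", type_of_inscription = "milestone")
)

# Positional selection, in the spirit of limit:
# a single number takes the first n records, a vector takes those positions.
take_limit <- function(x, limit) {
  idx <- if (length(limit) == 1L) seq_len(limit) else limit
  x[idx[idx <= length(x)]]
}

# Identifier selection, in the spirit of id:
# records are matched on their identifier, whatever their position.
take_id <- function(x, id) {
  ids <- vapply(x, function(r) r$id, character(1))
  x[ids %in% id]
}

first_two <- take_limit(records, 2)
middle    <- take_limit(records, 2:3)
by_id     <- take_id(records, c("HD000002", "HD000004"))
```

Positional selection needs no lookup, whereas identifier selection has to inspect every record, which is consistent with limit being described as the faster option.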

Component "people" is a separate list in the EDH dataset, and it should be treated as a separate case from the rest of the variables. When the output is a data frame, the default output is a `long' type table; that is, a record can appear in different rows and each variable is assigned to a single column. With this option it is possible to select "people" variables like gender and origin.

When choosing people variables with select for a data frame output, the "people" attribute must be in vars.

By setting "wide" in type, the different people from a single entry are placed column by column in the data frame, so that each record has a single row. Finally, argument split divides the data in the data frame into groups by `id', which corresponds to the HD number of the inscription in the EDH dataset.
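The long and wide layouts, and the grouping by `id', can be pictured with base R on a toy table. This is a sketch under assumptions: the data are made up, the person column is a hypothetical helper variable, and the column names that edhw produces for the wide format may differ from those of reshape.

```r
# Toy long table: one row per person, so record HD000002 spans two rows.
long <- data.frame(
  id     = c("HD000001", "HD000002", "HD000002"),
  person = c(1L, 1L, 2L),
  gender = c("male", "female", "male"),
  stringsAsFactors = FALSE
)

# Wide table: one row per record, with people spread column by column.
wide <- reshape(long, idvar = "id", timevar = "person", direction = "wide")

# Groups by id, as argument split is described to do.
groups <- split(long, long$id)
```

In wide, a record with fewer people than the maximum gets NA in the unused columns; groups is a list with one data frame per identifier.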

Arguments province and gender are ad hoc arguments for EDH entries; they serve to enter a Roman province and people's gender with x as a data frame, and otherwise these arguments are ignored. When province is used, it is possible to refer to a customized list of provinces with argument rp; otherwise, dataset rp is the default, where names and abbreviations are accepted.

Argument ldf is a flag indicating that the input in x is a list of data frames organized by variables, rather than by records as in the EDH dataset.

Value

A list or a data frame with a long or wide format, depending on the input arguments.

Argument province with no vars returns a list of lists.

References

https://edh.ub.uni-heidelberg.de/data/api

Author

Antonio Rivero Ostoic

Note

Warning messages are given when the EDH dataset is the input, and when the province argument is chosen alone.

Examples

if (FALSE) {
## load dataset
data(EDH)

## make a list for three variables in 'EDH' for first 4 entries
edhw(vars=c("type_of_inscription", "not_after", "not_before"), limit=4)

## as before, but also select 'gender' from 'people'
edhw(vars=c("people", "not_after", "not_before"), select="gender", limit=4)
}