A function to re-encode Greek (and other) characters and to remove symbols.

cln(x, level, what, na.rm, case, repl, unlist)

Arguments

x

a vector, list or dataframe

level

optional cleaning level, from 0 for no cleaning, through the default 1, to 9 for the strictest (see Details)

what

additional characters to clean (optional)

na.rm

remove entries with NA data? (optional and logical)

case

case for the text: 1 for first letter uppercase, 2 for lowercase, 3 for uppercase (optional)

repl

data frame with text to replace (optional)

unlist

return a vector? (optional and logical, for vector input)

Details

This function is meant to re-encode Greek (and other) characters in the EDH dataset, given as a list, a vector, or a dataframe such as one produced with function edhw.

By default, the symbols "?", "*" and "+" placed at the end of each record are removed after the re-encoding. When level is 0, only the re-encoding is performed. Level 2 either forces an extra iteration in the re-encoding, removes extra spaces, or removes the characters given in what from the end of a record when what is supplied. With level 9, all content after an opening parenthesis is removed, with all the consequences this has for the input text.
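The space and parenthesis clean-up described for the higher levels can be approximated in base R. The sketch below is only illustrative: squeeze_spaces and strip_after_paren are hypothetical helpers, not functions of the package, and they perform no re-encoding.

```r
# Hypothetical helpers, not part of the package: they only mimic the
# clean-up steps described above and do not re-encode anything.

# Collapse runs of whitespace and trim both ends (cf. "remove extra spaces").
squeeze_spaces <- function(x) {
  gsub("\\s+", " ", trimws(x))
}

# Drop everything from the first opening parenthesis onwards (cf. level 9).
strip_after_paren <- function(x) {
  trimws(sub("\\(.*$", "", x))
}

squeeze_spaces("  Marcus   Aurelius ")
strip_after_paren("Iulia (uxor)")
```

The second helper shows why level 9 is described as having consequences for the input: any text that follows the parenthesis is lost as well.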

With repl, it is possible to replace text by supplying a data frame with two columns, one for the "text to replace" and one for the "text that replaces".
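A two-column replacement table of this kind might look as follows. The column names, the sample strings and the helper apply_repl are assumptions made for illustration; the helper is a base R stand-in and not the code used by cln.

```r
# Hypothetical replacement table: the first column holds the text to
# replace, the second the text that replaces it. Column names are assumed.
repl <- data.frame(
  from = c("Aug.", "Imp."),
  to   = c("Augustus", "Imperator"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not the package's code): apply each row in order,
# treating the text to replace as a literal string rather than a regex.
apply_repl <- function(x, repl) {
  for (i in seq_len(nrow(repl))) {
    x <- gsub(repl[[1]][i], repl[[2]][i], x, fixed = TRUE)
  }
  x
}

apply_repl(c("Imp. Caesar Aug.", "Livia"), repl)
```

With the package loaded, a table like this would presumably be passed through the repl argument, for instance cln(x, repl = repl).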

Disabling the unlist option returns a vector when x is also a vector; otherwise, a list with the two versions of the input is returned.

Value

Depending on the input, a vector, list or dataframe.

Author

Antonio Rivero Ostoic

Warning

Re-encoding the same input more than once requires restarting the console; otherwise, the re-encoding is incomplete.

See also

Examples

# clean Greek characters
cln("Caesar?*+")
#> [1] "Caesar*+"