ASCA is a Sound-Change applier written in Java. It is similar to Mark Rosenfelder's Sounds, or MUBA's VSCA, the program to which this is the spiritual successor.
The development of ASCA is currently on hiatus; it has been deprioritized because of my development of the MCA(R)S package, but I hope to move toward an improved ASCA v0.2.0 in the near future.
ASCA is currently in version 0.1.6 and supports a number of useful features:
You can download the latest version (v0.1.6) from here.
ASCA is command-line operated, with no future plans for a GUI; rather than running ASCA through a terminal, it is recommended that you use batch or script files to run it. ASCA is available in JAR format, as ASCA.jar, and requires JRE 1.6 or greater to run.
You will need to specify three parameters for ASCA to run: an input lexicon (-f or --file), an output lexicon (-o or --out), and a rules file (-r or --rules).
java -jar ASCA.jar -f PIE_LEX.txt -o PKK_LEX_ASCA.txt -r PKK_RULES_ASCA.txt
These files must be specified or ASCA will not run. You can use either relative or absolute paths, so these files can be located anywhere on your system. Two more options can be used in addition to these: the CSV switch (-c or --csv), and help (-h or --help); the help text is also displayed by default if the file paths are not specified. I have recently added two new features, which are engaged by the commands --diff and --changes. The former prints the source word in addition to the resulting form:
deh₂-w- > dāu
deh₃- > dō
dēh₃-r- > dōr
deh₄i- > dai
dēh₄-mos > dāmos
The --changes command requires a file parameter, i.e. the file you want to write the log to. This will write your output file normally, but will also print a log showing each word and how it is affected by each applicable rule:
( bʱakʲóh₄- )
bʱakʲóh4- > bʱakʲoh₄- á é í ó ú > a e i o u
bʱakʲoh4- > bʱakʲoh₄ - > 0
bʱakʲoh4 > bʱakʲoʕ h₂ h₄ > ʔ ʕ
bʱakʲoʕ > pʰakʲoʕ bʱ dʱ gʲʱ gʱ gʷʱ > pʰ tʰ kʲʱ kʰ kʷʰ
pʰakʲoʕ > pʰacoʕ kj gj > c ɟ
pʰacoʕ > pʰacoʔ ʕ > ʔ
pʰacoʔ > pʰacōʔ @VS > @VL / _X{C #}
pʰacōʔ > pʰacō X > 0 / @VL_{C # u}
pʰacō
( bʱardʱ-eh₄- )
bʱardʱ-eh₄- > bɦardɦeh4 - > 0
bʱardʱeh₄ > bʱardʱeʕ hₓ h₄ > ʔ ʕ
bʱardʱeʕ > pʰartʰeʕ bʱ dʱ gʲʱ gʱ gʷʱ > pʰ tʰ kʲʰ kʰ kʷʰ
pʰartʰeʕ > pʰartʰaʕ [E] > [A] / _{x ʕ}
pʰartʰaʕ > pʰartʰaʔ ʕ > ʔ
pʰartʰaʔ > pʰartʰāʔ @VS > @VL / _X{C #}
pʰartʰāʔ > pʰartʰā X > 0 / @VL_{C # u}
pʰartʰā > bartʰā @CH > @J / #(C)(C)_VR(s)@CH
barthā
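Since ASCA is meant to be driven from batch or script files, a small wrapper can assemble the command line. The following is a minimal Python sketch and not part of ASCA: the helper name build_asca_command and the log file name changes.log are hypothetical, only the flags described above are used, and --diff is assumed to be a plain switch with no argument.

```python
import shlex

def build_asca_command(lexicon, output, rules, diff=False, changes_log=None, jar="ASCA.jar"):
    """Assemble the argument list for one ASCA run (hypothetical helper)."""
    command = ["java", "-jar", jar, "-f", lexicon, "-o", output, "-r", rules]
    if diff:
        command.append("--diff")                 # assumed to take no argument
    if changes_log is not None:
        command += ["--changes", changes_log]    # --changes needs a log file path
    return command

if __name__ == "__main__":
    cmd = build_asca_command("PIE_LEX.txt", "PKK_LEX_ASCA.txt", "PKK_RULES_ASCA.txt",
                             changes_log="changes.log")
    # Print the command for inspection; hand cmd to subprocess.run(cmd, check=True)
    # to actually launch ASCA (requires java on the PATH and ASCA.jar present).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when a path contains spaces.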
The first thing you will need is a lexicon, whose entries are separated by line breaks. Linux (LF), DOS (CR-LF), and Mac (CR) breaks are all accepted in input, but only DOS breaks are written in the output, as I believe these are compatible with all systems (since a DOS break contains both the CR and the LF character). If you are managing your lexicon in a spreadsheet like LibreOffice, the line-break format should not matter. If you are using a PC, I would recommend that you get Notepad++, which is like Notepad but much more flexible.
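To illustrate the line-break handling described above, here is a minimal Python sketch. It is not ASCA's own code; the function names read_entries and write_entries are hypothetical, and the sketch only shows the idea of accepting all three break styles and writing DOS breaks.

```python
import re

def read_entries(text):
    """Split lexicon text on CR-LF, CR, or LF line breaks."""
    # CR-LF is listed first so that it is consumed as a single break.
    return re.split(r"\r\n|\r|\n", text)

def write_entries(entries):
    """Join entries with DOS (CR-LF) line breaks."""
    return "\r\n".join(entries)
```

When reading and writing real files from Python, opening them with newline="" keeps Python from translating the line breaks on its own.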
Using the -c or --csv switch allows the lexicon to be read and written in comma-separated format, so that word-boundaries are recognized correctly even in tables.
The next thing you will need to run ASCA is the rules file, which is really the meat of the whole program. An ASCA rules file has three types of commands, which I will discuss in turn.
ASCA allows the use of full-line comments as well as inline-comments. Full line comments are delineated by the percent sign %:
% This is a full-line comment
N = m n % This is an inline comment
It remains possible to use the hash sign # for a full-line comment and the double hash sign ## for inline comments, though this functionality should be considered deprecated.
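As a rough illustration of how percent-sign comments can be separated from the rest of a line, consider the following Python sketch. The function name strip_comment is hypothetical, and the deprecated hash-sign forms are deliberately not handled.

```python
def strip_comment(line):
    """Return the part of a rules-file line that precedes the first % sign."""
    return line.split("%", 1)[0].strip()
```

A full-line comment reduces to an empty string, while a line such as N = m n % This is an inline comment reduces to N = m n.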
ASCA allows you to define variables with a command like A = a b c d. There are no restrictions on Variable naming, but there are three recommended formats. Variables identified by a single character can remain as-is; Variables whose names are longer should be preceded by a non-reserved sigil like @ or $ (other characters, namely # _ \ > %, are reserved); Variables with especially long names, such as those which name a phonetic class, are best wrapped in square brackets [ ]. Square-bracket names are not fully protected in the current version, and thus cannot contain reserved characters like _ # > / or spaces.
The assignment command consists of three parts: the Key, the assignment operator =, and the space-separated list of Values. The Values may also consist of other Variables. It is recommended that variable names consist only of capital letters and numbers, unless bracketed. However, ASCA should be able to handle any Unicode values; you just want to avoid any confusion with string-literals and shorter variable names, because in the event of a conflict the longest interpretation is always preferred. Some Variable assignments look like this:
L           = r l
N           = m n
R           = L N
@Q          = kʷʰ kʷ gʷ
@K          = kʰ k g
@KY         = cʰ c ɟ
@T          = tʰ t d
@P          = pʰ p b
[Plosive]   = @P @T @KY @K @Q
[Round]     = o ō u w ū
[-Round]    = e ē i y ī
[Semivowel] = y w
[Consonant] = [Plosive] s R [Semivowel]
Other than the fact that these lists must be space-delimited, ASCA is whitespace insensitive, so you should be able to use multiple space-characters to align elements in the rules file, as you can see above (any more than one space is treated like a single space). Also, when defining variables which contain digraphs or trigraphs, it is good practice to place the longest strings first. Later versions will attempt to ensure that this is done automatically.
Also note that you can redefine variables once you have defined them. This is helpful if a Variable corresponds to a set of sounds in a language, and the membership of this set is altered due to sound changes, such as a merger:
X = ʔ x ʕ
ʕ > ʔ
...
X = ʔ x
It is also possible to add elements to a variable. If you already have a variable [Consonant], you can use the command [Consonant] = [Consonant] S to add the contents of S to the existing variable [Consonant].
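The behaviour of assignments, including nesting, redefinition, and appending, can be pictured with a small Python sketch. This is an illustration under simplifying assumptions and not ASCA's implementation: the function name define_variable is hypothetical, and a Value is only expanded when the whole token is the name of an already defined Variable.

```python
def define_variable(line, variables):
    """Apply one assignment such as 'R = L N' to the variables dictionary."""
    key, _, value_text = line.partition("=")
    values = []
    for token in value_text.split():      # split() treats runs of spaces as one
        # A token that names an existing Variable contributes that Variable's Values.
        values.extend(variables.get(token, [token]))
    variables[key.strip()] = values
    return variables

variables = {}
for line in ["L = r l", "N = m   n", "R = L N",
             "X = ʔ x ʕ", "X = ʔ x",      # redefinition replaces the old Values
             "R = R w"]:                  # appending by referring to the Variable itself
    define_variable(line, variables)
```

Because the right-hand side is resolved before the Key is stored, a self-reference such as R = R w picks up the previous Values of R.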
Rules are the most complicated structure in the ASCA rules file. These consist of two principal parts: the Transformation and the Condition. The Transformation itself consists of a list of Initials and Finals, while the Condition consists of a Precondition and a Postcondition.
It is possible for the Condition to be omitted entirely (for an unconditional change), and thus the simplest rules may consist of a Transformation alone:
h₁ h₂ h₃ h₄ > ʔ x x ʕ
The Initials and Finals are separated from one another by the right-angle-bracket operator >. As with Variables, the Initials and Finals consist of lists of character strings separated by spaces, and ASCA is insensitive to the presence of more than one space character.
It is vital that the number of Initials and Finals match in a Transformation, with one exception. It is possible to write many-to-one transformations, where there is more than one Initial, but only one Final, in which all the Initials will be merged into the Final. Thus the following two rules are equivalent:
ct ɟt > ɕt
ct ɟt > ɕt ɕt
It is also vital that the Initials and Finals contain the same number of Variables, and that these Variables correspond to the same number of Values. The only exception to this rule is if the Finals contain no variables, in which case the rule is treated like a many-to-one transformation:
iX uX > ī ū % i + any @X > i ;; equivalent to iʔ ix uʔ ux > ī ī ū ū
@Ks > ks % Any velar + s becomes voiceless ;; equivalent to khs ks gs > ks ks ks
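The pairing of Initials with Finals, including the many-to-one case and the expansion of Variables, can be sketched in Python as follows. This is only an illustration of the rules stated above, not ASCA's actual algorithm. The function names are hypothetical, and the sketch assumes that when the Final contains Variables, the n-th Variable of the Final takes the Value at the same index as the n-th Variable of the Initial.

```python
from itertools import product

def segment(text, variables):
    """Split text into ('var', name) and ('lit', char) parts, longest names first."""
    names = sorted(variables, key=len, reverse=True)
    parts, i = [], 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                parts.append(("var", name))
                i += len(name)
                break
        else:                                   # no Variable name starts here
            parts.append(("lit", text[i]))
            i += 1
    return parts

def fill(parts, choice, variables):
    """Build a literal string, taking the choice[k]-th Value of the k-th Variable."""
    out, k = [], 0
    for kind, value in parts:
        if kind == "var":
            out.append(variables[value][choice[k]])
            k += 1
        else:
            out.append(value)
    return "".join(out)

def expand_transformation(rule, variables):
    """Expand 'Initials > Finals' into a list of literal (initial, final) pairs."""
    left, right = rule.split(">", 1)
    initials, finals = left.split(), right.split()
    if len(finals) == 1:
        finals = finals * len(initials)         # many-to-one transformation
    if len(initials) != len(finals):
        raise ValueError("Initial/Final Mismatch: " + rule.strip())
    pairs = []
    for initial, final in zip(initials, finals):
        i_parts, f_parts = segment(initial, variables), segment(final, variables)
        i_sizes = [len(variables[n]) for kind, n in i_parts if kind == "var"]
        f_sizes = [len(variables[n]) for kind, n in f_parts if kind == "var"]
        if f_sizes and i_sizes != f_sizes:
            raise ValueError("Variable mismatch: " + rule.strip())
        for choice in product(*(range(size) for size in i_sizes)):
            pairs.append((fill(i_parts, choice, variables),
                          fill(f_parts, choice, variables)))
    return pairs
```

With X defined as ʔ x, expanding iX uX > ī ū yields the pairs iʔ > ī, ix > ī, uʔ > ū, ux > ū, which is the equivalence given in the comment above.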
There is, however, one potential problem in how n-grams are handled by ASCA. If your lexicon contains instances of ph, and a rule p > b, it will affect every instance of p, and you will end up with bh. This is a common problem in SCAs which support n-grams because they have no knowledge of phonetics. Just bear this problem in mind when writing your rules, until I can add UNLESS statements to the Condition or add a command-line switch to perform intelligent segmentation of Unicode strings. The next version of ASCA will avoid this problem entirely.
Most sound-change rules will need a Condition, which is set off from the Transformation by the slash character /. The Condition consists of a Precondition and Postcondition, separated by the underscore character _. Typical Conditions might look like this:
i u > y w / _VC
s > ʃ / @RUKI_
Unlike in the Transformation, there are no restrictions on using Variables in the Condition. As before, the Condition is insensitive to whitespace.
The Condition can also contain word boundaries, represented by the hash-sign #, which can be used to specify that a rule only be applied at the beginning or end of a word:
mr wr ml wl > br br bl bl / #_V
m@X n@X > m. n. / #_C
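To make the roles of the Precondition, the Postcondition, and the word boundary concrete, here is a small Python sketch of applying one literal change in a given environment. It is not how ASCA is implemented: the function name apply_rule is hypothetical, the contexts are passed as already expanded lists of literal alternatives, and the contents of V in the test word are assumed for the example.

```python
def apply_rule(word, source, target, before=None, after=None):
    """Replace source with target wherever the surrounding context matches.

    before and after are lists of alternative literal strings; "#" stands for a
    word boundary, and None means there is no restriction on that side.
    """
    if not source:
        raise ValueError("source must not be empty")

    def context_ok(alternatives, start, end, side):
        if alternatives is None:
            return True
        for alt in alternatives:
            if alt == "#":
                if (side == "before" and start == 0) or (side == "after" and end == len(word)):
                    return True
            elif side == "before" and word.endswith(alt, 0, start):
                return True
            elif side == "after" and word.startswith(alt, end):
                return True
        return False

    result, i = [], 0
    while i < len(word):
        end = i + len(source)
        if (word.startswith(source, i)
                and context_ok(before, i, end, "before")
                and context_ok(after, i, end, "after")):
            result.append(target)
            i = end
        else:
            result.append(word[i])
            i += 1
    return "".join(result)

vowels = ["a", "e", "i", "o", "u"]        # assumed contents of V for this example
print(apply_rule("mrato", "mr", "br", before=["#"], after=vowels))
```

Note that the contexts are checked against the original word, so a change made earlier in the same pass does not feed a later match.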
ASCA was designed to support a number of advanced features not present in some other sound-change appliers. These are by no means exclusive to ASCA, but not all of them are common.
This refers to the use of the zero character 0 to represent an empty string. It can be used to delete characters:
X > 0 / C_ % Delete @X following a consonant
Using Zeroes to insert characters has been temporarily removed, but there are other ways to write epenthesis rules.
ASCA's Sets may be one of its most powerful features. They allow you to define ad hoc sets of symbols which are processed like Variables, but not stored in memory. These are demarcated by curly-brace characters {...} which should enclose a space-delimited list.
ə > u / {@K @Q}_
i u > y w / {X C}_V
The most powerful thing about Sets is that, in addition to Variables, they can also contain word Boundaries:
y w > i u / _{C #}
@VS > @VL / _X{C #}
X > 0 / @VL_{C #}
I may add support for using Sets in the Transformation, but they are currently only permitted in the Condition. Also, do not place more than one variable in the same element of a set: conditions like _{CV #} are not parsed correctly at the present time, but will be in later versions.
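One way to think about a Set is as a Variable that is expanded on the spot. The Python sketch below shows that idea; it is an illustration rather than ASCA's code, the function name expand_set is hypothetical, and in keeping with the restriction above each element is either a single Variable name, a literal, or the boundary sign.

```python
def expand_set(set_text, variables):
    """Turn a Set such as '{C #}' into a flat list of alternatives."""
    inner = set_text.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("not a Set: " + set_text)
    alternatives = []
    for element in inner[1:-1].split():
        # A Variable name contributes its Values; anything else is kept as-is,
        # including the word-boundary sign #.
        alternatives.extend(variables.get(element, [element]))
    return alternatives
```

The resulting list can then be tested one alternative at a time, with # treated as a match at the edge of the word.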
Finally, Optionals are a more common feature of SCAs in general, demarcated by parentheses (...), the contents of which will be both included and excluded from the condition. Thus, every use of Optionals corresponds to two rules:
ph th ch kh > b d j g / _VR(s)@CH
% Is equivalent to the following
ph th ch kh > b d j g / _VRs@CH
ph th ch kh > b d j g / _VR@CH
Optionals in ASCA may only include a single Variable, but may contain a string of Literals. In the future, I will expand the kinds of things one can use inside Optionals.
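The doubling described above can be sketched in Python by expanding each parenthesized group into a version with and a version without its contents. This is an illustrative sketch, not ASCA's parser; the function name expand_optionals is hypothetical, and nested parentheses are not considered.

```python
import re

def expand_optionals(condition):
    """Expand each (...) group into the conditions with and without its contents."""
    match = re.search(r"\(([^()]*)\)", condition)
    if match is None:
        return [condition]
    head, tail = condition[:match.start()], condition[match.end():]
    expanded = (expand_optionals(head + match.group(1) + tail)
                + expand_optionals(head + tail))
    # Two optionals with the same contents can produce the same condition twice,
    # so duplicates are dropped while keeping the original order.
    return list(dict.fromkeys(expanded))
```

For the condition _VR(s)@CH this returns _VRs@CH and _VR@CH, the two conditions listed above.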
It is now possible for rules to be blocked in certain environments, as is possible in VSCA. This can be done in a rule by following it with UNLESS or EXCEPT and the environment in which the rule won't be applied.
Version 1.5.0T added partial support for a new class of operators in the Condition. These operators are based on the functionality of the Kleene Star, which is used in regular languages to represent zero-or-more of a preceding expression. There are five of these operators in ASCA, only two of which are currently supported.
The Geminate Operator < is used to match exactly two of the preceding item. When applied to Sets or Variables, it matches exactly two of the same element of the Set or Variable. Thus, a statement like C< will match pp tt kk ... but not pt tk kp ...
The Star Operator * is used to match zero or more of the preceding item. When applied to Sets or Variables, it matches any sequence of elements of the Set or Variable. Thus, a statement like C* will match nothing, or any sequence of elements, such as any of p t k pp tt kk pt pk tp tk kp kt ppp ttt kkk ppt ppk ptp ptt ptk ...
The Star Operator ** is used to match zero or more of the preceding item. When applied to Sets or Variables, it matches a sequence of the same element of the Set or Variable. Thus, a statement like C** will match nothing, or any sequence of one repeated element, such as any of p t k pp tt kk ppp ttt kkk pppp tttt kkkk ... but not pt pk tp tk kp kt ppt ppk ptp ptt ptk ... How useful this operator is can be debated, but it is available should it be helpful.
The Plus Operator + is used to match one or more of the preceding item. When applied to Sets or Variables, it matches any sequence of elements of the Set or Variable. Thus, a statement like C+ will match any sequence of elements, such as any of p t k pp tt kk pt pk tp tk kp kt ppp ttt kkk ppt ppk ptp ptt ptk ... but not the empty string.
The Plus Operator ++ is used to match one or more of the preceding item. When applied to Sets or Variables, it matches a sequence of the same element of the Set or Variable. Thus, a statement like C++ will match any sequence of one repeated element, such as any of p t k pp tt kk ppp ttt kkk pppp tttt kkkk ... but not pt pk tp tk kp kt ppt ppk ptp ptt ptk ... or the empty string. How useful this operator is can be debated, but it is available should it be helpful.
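For readers familiar with regular expressions, the five operators can be compared to the following Python patterns. This is an analogy only, written with Python's re module; it does not describe how ASCA evaluates these operators, and the function name operator_patterns is hypothetical. The "same element" operators are modelled with a backreference.

```python
import re

def operator_patterns(values):
    """Build regex analogues of the five operators for one Variable's Values."""
    # Longest Values first, so that digraphs win over their first character.
    ordered = sorted(values, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in ordered)
    return {
        "<":  re.compile(rf"({alternation})\1"),          # exactly two of the same element
        "*":  re.compile(rf"(?:{alternation})*"),         # zero or more, any elements
        "**": re.compile(rf"(?:({alternation})\1*)?"),    # zero or more of one element
        "+":  re.compile(rf"(?:{alternation})+"),         # one or more, any elements
        "++": re.compile(rf"({alternation})\1*"),         # one or more of one element
    }

patterns = operator_patterns(["p", "t", "k"])
print(bool(patterns["<"].fullmatch("pp")), bool(patterns["<"].fullmatch("pt")))
```

Using fullmatch tests whether an entire string fits the operator, which mirrors the lists of matching and non-matching sequences given above.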
The major errors that are likely to occur in a rules file will be caught and reported to the user, along with the offending line number (counting from 1, not 0). These error reports will usually also indicate the nature of the error, such as a mismatch in the size of the Initials and Finals, or the number of Values in a Variable.
Error in Rules file on line (18):
Initial/Final Mismatch: a b c > d e
Error in Rules file on line (127):
Malformed Condition: s
Error in Rules file on line (44):
Malformed Transformation: a b c / d e f / _
The system will also wait until you press enter before exiting, which makes it easier to read errors when running ASCA using batches or scripts through your window-manager.
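If you want to catch the most common mistake before running ASCA, a rough pre-check can be scripted. The Python sketch below is a hypothetical stand-alone checker, not ASCA's own validation: the function name check_rules is an assumption, it only tests the count of Initials against Finals, and it merely imitates the report format shown above.

```python
def check_rules(lines):
    """Return mismatch reports for Transformations, numbering lines from 1."""
    errors = []
    for number, line in enumerate(lines, start=1):
        rule = line.split("%", 1)[0].strip()       # ignore % comments
        if ">" not in rule:
            continue                                # blank line or assignment
        transformation = rule.split("/", 1)[0]      # drop the Condition, if any
        initials, _, finals = transformation.partition(">")
        n_initials, n_finals = len(initials.split()), len(finals.split())
        # One Final is always allowed (many-to-one); otherwise counts must match.
        if n_finals != 1 and n_initials != n_finals:
            errors.append(f"Error in Rules file on line ({number}):\n"
                          f"Initial/Final Mismatch: {transformation.strip()}")
    return errors
```

Calling check_rules on the lines of a rules file returns an empty list when no mismatch of this kind is found.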