PICK, a program for analyzing contra dances
-- Stan Swanson, January 2013; License: GPL
PICK helps in the process of picking dances with specific moves or with
a desired set of moves. It transforms dances from a dance card format to
a spreadsheet format, one line per dance. Information available in columns
of the spreadsheet includes title, author, type (improper, proper, etc.), a
summary, and statistics columns definable by the user. Possible statistics
columns include lists of who swings, who allemandes, direction of circles or
stars (R,L), a count of wave figures, type of heys, out of minor set moves,
etc. The order of occurrence of a figure in a dance can be tabulated. An
estimate of the piece count has been attempted. The dance cards are
transformed into shortened symbols which are user definable.
Usage and Output
(default input dance file: "dances.txt", output: "out.csv")
(example input dance file "select.dances",
example output: "select.csv" )
PICK is a command line program. If the default file names are used for
dances, dictionary, and analysis, only the program name need be typed,
"> pick" (">" is the system command prompt)
An alternative dance file can be specified as a single argument:
"> pick mydances.in",
or copied to the default file "dances.txt", or referenced by a link.
Other options can be specified, "> pick [options]".
These options include "-h" to get help, and will be discussed below.
Sample input files for dances.txt, dance-dict.txt, and dance-stat.txt
are furnished with the distribution, and their formats are discussed in greater
detail below. The most important output file is "out.csv" which is in
the portable "comma-separated values" (CSV) format which can be read into most
spreadsheets (e.g. Open Office, Excel). The optional information output file
"info.out" contains five-line listings for each dance, along with two summaries
and debug and diagnostic messages (helpful if the interpretation seems
weird). "info.out" reproduces the counts of output symbols seen on the
terminal screen as the program finishes. These counts give you an idea of
the frequency of figures and qualifiers (who, direction, etc.) in a
collection of dances.
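Because a dance summary itself contains commas, any field written to a CSV
file has to be quoted before most spreadsheets will read it as a single cell.
The following is a minimal sketch of the usual CSV quoting convention, not the
code used in pick.c; the function name "csv_quote" is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy s into out as one CSV field. The field is wrapped in double
   quotes when it contains a comma, a quote, or a line break, and any
   embedded quote is doubled. Returns 1 on success, 0 if out is too
   small. (csv_quote is a hypothetical name, not taken from pick.c.) */
int csv_quote(const char *s, char *out, size_t outsize)
{
    assert(s != NULL && out != NULL && outsize > 0);
    int need_quotes = strpbrk(s, ",\"\r\n") != NULL;
    size_t n = 0;

    if (need_quotes) {
        if (n + 1 >= outsize) return 0;
        out[n++] = '"';
    }
    for (; *s != '\0'; s++) {
        if (*s == '"') {                 /* double an embedded quote */
            if (n + 1 >= outsize) return 0;
            out[n++] = '"';
        }
        if (n + 1 >= outsize) return 0;
        out[n++] = *s;
    }
    if (need_quotes) {
        if (n + 1 >= outsize) return 0;
        out[n++] = '"';
    }
    out[n] = '\0';
    return 1;
}
```

A title such as "Sample Dance" passes through unchanged, while a summary such
as "o-L4, W ch; hey" comes back wrapped in double quotes.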
We now discuss the presentation of the analysis as seen in the sample
spreadsheet, "select.csv", generated from "select.dances" using
"select-dict.txt" and "select-stat.txt". We comment on the columns seen in
"select.csv" under the column headers following "title" and "author".
"type" gives a single character code for the dance type (I P B ...)
and an indication of double or triple progression (2 3)
"summary" is the dance reduced to an abbreviated code (usually between 40 and
100 characters). Figures are separated by commas within a 16 beat
section. Semicolons separate A1: from A2: and so on. The persons
doing the figure are given first in caps (P = partner, N = neighbor,
1 = actives, W = ladies, etc.), then a lower case abbreviation for
the figure (sw = swing, a = allemande, o = circle, * = star, etc.),
followed by the direction (-L = left, -R = right, -x = across, etc.),
and finally a digit giving the amount of turning in quarters
( 3 = 3/4, 6 = 6/4 (1 1/2), etc.). These abbreviations are given in
the file dance-dict.txt and are user definable.
The seventeen columns between the explicitly blank columns headed by
"_" and before the columns headed by "pieces" contain the position(s)
in the dance of the figures indicated by the column headings
("sw a o oR ... out rare").
These columns were generated with the statistic "order".
Note that sometimes a number is absent from the sequence in a given
row. This may happen if a pseudo phrase does not contain a figure, or if
the figure has not been specified in any of the "order" statements.
"sw" says where swings occur. For example, in "The Baby Rose", "' 1 4" tells
us that there are two swings in the dance, one at the beginning.
"a" says whether and where allemandes occur.
"o" and "*" are for circles and stars.
"oR" is a more specific indication for "circle right". See
"select-stat.txt" for how this was done.
... ... ...
"out" are moves out of the minor set, like "contra corners", shadow
interactions, a chain or "R&L thru" on a diagonal.
"rare" attempts to catalog infrequent moves like "contra corners",
"petronella", "mad robin", or "orbit".
"pieces" Two columns, the first of which reproduces any explicit estimate of
the piece count from the title line ("#..."). The second is pick's
calculation according to the scheme discussed in the section
"STATISTICS and ANALYSIS".
Following the "pieces" columns are other statistics on the dance
figures, illustrating the statistics
"qualify count first-char text short group".
"swing" tells which persons swing (P = partners, N = neighbors, S = shadow,
etc.). Uses statistic "qualify".
"allem" tabulates who does the allemande (M = men, W = ladies, ...).
"circle" and "star" indicate the direction (L = left, R = right).
"ch_rl_pr" counts the number of chains (ladies or otherwise), R&L's, and
promenades. Uses statistic "count"
"lines" counts LLFB, or just LL.
"D/R" counts the traditional "down 4 in line", "return", etc. These can be
hard to determine from dance card descriptions. More discussion of
this problem later.
"dosido" gives dosido (d), gypsy (g), and seesaw (z). Uses statistic
"first-char".
"hey" indicates the type of hey (h = full, 2 = half, g = gypsy, ... ).
"wave" counts the appearance of the string "wav" in the dance description.
"rare" indicates the appearance of specific figures (e.g. bfy = butterfly
whirl, pet = petronella, cc = contra corners). Uses statistic "text".
"out**" attempts to flag out of minor set interactions by looking for
shadow (S), diagonal (\), contra corners (c), grand (g) as in
grand chain or grand right and left. We have found this sort of
figure particularly troubling for beginners.
"hwro" uses the "group" statistic which reports whether specified columns are
non-blank. (1 = hey, 2 = wave, 3 = D/R, 4 = out**,
0 = none of these, 12 = both hey and wave, etc.). Sorting on this
column results in relatively large groupings of "similar" dances.
Simple, more or less glossary, dances are indicated by a "0".
"*cD" is another group reporting stars (1 = *),
(2 = chain, r&l, or promenade), and (3 = D/R [down and return] ).
"synopsis" is another single character representation of figures specified in
the "short" statistic. Unspecified (presumably low frequency) figures
appear as "X".
"title" repeats the title for reference with the statistics columns.
"AA BB AABB" [appears optionally with the command line flag "-e"]
An attempt to give sorting keys for the unique figures in
A1+A2, B1+B2, and the dance, coded as the first character of the
figure name. Some ambiguity occurs. It was hoped that these sorts
would give groups of dances with similar moves and structures, but the
reality is that they are more diverse than that. Groupings are better
found with the analysis key "group" to be discussed below.
"0" the last column is the original order, so that a sort on this column
will restore the spreadsheet to its original dance card sequence.
INPUT FORMAT FOR DANCES
(default file: dances.txt)
(example file: "select.dances" )
The format of dances in the input file "dances.txt" is easily convertible
from files specifying figures for A1:, A2:, B1:, and B2: The program works
fairly well with existing dance cards (tested on several collections of 130,
200, and 440 dances). In some cases the syntax can be sharpened to help
with the analysis. Of particular difficulty are waves and balances, and
the traditional "down 4 in line" with a variety of return possibilities.
If the cards are in MS Word format, they can be exported in .txt format and
massaged slightly with a text editor. A comparison of the summaries with
the original input helps to proof read dance cards, as will an examination
of the alphabetized file "occ.out" generated by the "-w" option.
Title, author, and dance type information is expected by the program to be
on a single line starting with the character "=". Title and author are
separated by a space delimited dash, " - ". Two words are taken from the
author field, then the rest of the line is scanned for words like "proper,
improper, mixer, circle, triplet, Sicilian, longways, x face x, square", and
for double or triple progression. Capital letters are preserved in the
title and author, but ignored for type and progression. Some abbreviations
are recognized (imp prop prog). We have added an option for "#nn" on this
line, where "nn" is the piece count, since estimation of "nn" is hard.
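The title line rules above can be sketched in C as follows. This is an
illustration under stated assumptions, not the parser in pick.c: the function
names "parse_title_line" and "explicit_piece_count" are hypothetical, and the
scan for dance type and progression is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Split "=Title - Author Name rest..." into title and a two-word author.
   Returns 1 on success, 0 if the line does not start with "=" or has no
   " - " separator. (Hypothetical helper, not taken from pick.c.) */
int parse_title_line(const char *line, char *title, size_t tsize,
                     char *author, size_t asize)
{
    assert(line != NULL && title != NULL && author != NULL);
    assert(tsize > 0 && asize > 0);
    if (line[0] != '=')
        return 0;
    const char *start = line + 1;
    const char *dash = strstr(start, " - ");   /* space delimited dash */
    if (dash == NULL)
        return 0;

    size_t tlen = (size_t)(dash - start);
    if (tlen >= tsize)
        tlen = tsize - 1;                      /* truncate to fit */
    memcpy(title, start, tlen);
    title[tlen] = '\0';

    /* Take at most two words from the author field. */
    const char *p = dash + 3;
    size_t n = 0;
    int words = 0;
    while (*p != '\0' && words < 2) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0' || *p == '\n' || *p == '\r')
            break;
        if (words > 0 && n + 1 < asize)
            author[n++] = ' ';
        while (*p != '\0' && !isspace((unsigned char)*p)) {
            if (n + 1 < asize)
                author[n++] = *p;
            p++;
        }
        words++;
    }
    author[n] = '\0';
    return 1;
}

/* Return the one or two digit piece count following "#" on the title
   line, or -1 if there is none. (Hypothetical helper.) */
int explicit_piece_count(const char *line)
{
    assert(line != NULL);
    const char *hash = strchr(line, '#');
    if (hash == NULL || !isdigit((unsigned char)hash[1]))
        return -1;
    int n = hash[1] - '0';
    if (isdigit((unsigned char)hash[2]))
        n = n * 10 + (hash[2] - '0');
    return n;
}
```

For the line "=Sample Dance - Anony Mous improper #12" this yields the title
"Sample Dance", the author "Anony Mous", and a piece count of 12.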
Other lines in the file describe either dance figures or are ignored as
comments. Comment lines start with the character "%" in the first column.
Lines between the title lines and A1: and after the B2: sequence are also
treated as comments, even without the "%".
The sequence of lines containing A1:, A2:, B1:, and B2: is taken as dance
information, until broken by a blank line or a comment line following B2:.
The section markers A1:, A2:, B1:, and B2: must have three characters, the
first being "A" or "B" (in either case), the second being a digit ("1" or "2"),
and the last being a punctuation character.
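The three-character rule for section markers can be written as a small test
function. This is a sketch of the rule as stated here, with a hypothetical
function name; the check in pick.c may differ in detail.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return 1 if the first three characters of s form a section marker
   such as "A1:" or "b2.", and 0 otherwise.
   (is_section_marker is a hypothetical name, not taken from pick.c.) */
int is_section_marker(const char *s)
{
    assert(s != NULL);
    if (s[0] == '\0' || s[1] == '\0' || s[2] == '\0')
        return 0;                          /* shorter than three characters */
    int letter = toupper((unsigned char)s[0]);
    if (letter != 'A' && letter != 'B')
        return 0;
    if (s[1] != '1' && s[1] != '2')
        return 0;
    return ispunct((unsigned char)s[2]) != 0;
}
```

With this rule "A1:" and "b2." are accepted, while "A3:", "C1:", and "A1 "
(no punctuation) are not.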
Within the dance description, instructions or hints enclosed within
parentheses "(...)" are ignored. The description is broken into pseudo phrases
delineated by commas, semicolons, or ends of lines. Words are separated
by white space or by one of the separators " , ; end-of-line ( ) ". Other
punctuation is left as written, with the possibility of being incorporated
into strings used in the analysis (e.g. "3/4" "R&L" ). For example,
=Sample Dance - Anony Mous improper
A1: circle left 1x; ladies chain (to partner)
A2: hey (ladies start, R shoulder)
B1: partner balance and swing
B2: circle left 3/4
balance the ring, pass thru (to next neighbor)
is summarized as
o-L4, W ch; hey; P bal_sw; o-L3, ring_b, pass
Note that the amount of a circle, star, or allemande is given by the number
of fourths: "4" means 4/4 or once around, "3" means 3/4.
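The first two parsing steps described above, dropping parenthesized hints and
cutting the text into pseudo phrases, might look like the sketch below. The
function names are hypothetical and the handling of nested or unmatched
parentheses is an assumption; pick.c may treat those cases differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove every "(...)" hint from text, in place.
   (Hypothetical helper, not taken from pick.c.) */
void strip_parens(char *text)
{
    assert(text != NULL);
    char *out = text;
    int depth = 0;
    for (const char *in = text; *in != '\0'; in++) {
        if (*in == '(') {
            depth++;
        } else if (*in == ')') {
            if (depth > 0)
                depth--;
        } else if (depth == 0) {
            *out++ = *in;              /* keep text outside parentheses */
        }
    }
    *out = '\0';
}

/* Cut text in place at commas, semicolons, and line ends. Pointers to
   the non-empty pseudo phrases are stored in phrases[] (at most max of
   them), with surrounding blanks trimmed. Returns the number stored. */
size_t split_phrases(char *text, char *phrases[], size_t max)
{
    assert(text != NULL && phrases != NULL);
    size_t n = 0;
    char *p = text;
    while (*p != '\0' && n < max) {
        p += strspn(p, ",;\r\n \t");   /* skip separators and blanks */
        if (*p == '\0')
            break;
        char *end = p + strcspn(p, ",;\r\n");
        char *next = (*end != '\0') ? end + 1 : end;
        while (end > p && (end[-1] == ' ' || end[-1] == '\t'))
            end--;                     /* trim trailing blanks */
        *end = '\0';
        phrases[n++] = p;
        p = next;
    }
    return n;
}
```

Applied to the B2 section of the sample dance, this gives the three pseudo
phrases "circle left 3/4", "balance the ring", and "pass thru".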
THE DANCE DICTIONARY
(default file: dance-dict.txt)
(example file: "select-dict.txt")
The words or word fragments which are extracted from the input text are
specified in the file "dance-dict.txt". A sample dictionary is furnished,
but it can be customized by the user. To get a count of the unique "words"
in a dance file, the program has the option "-w" which counts separate
words and fragments, putting the information in "words.out" and "occ.out".
Separating characters are space, tab, end-of-line, comma, semicolon, and
parentheses. Punctuation other than separating characters and the comment
flag "%" is left as written and can be part of a word (or string). For
example: colon (e.g. A1:), slash (e.g. 3/4), ampersand (e.g. R&L).
The general structure of a dictionary line is
outsymbol # category = insymbol[s]
e.g.
P # who = partner ptr p % multiple inputs all go to "P"
sw # action = swing sw
<thr # part = thr % "thr" matches through, thru (and three !!)
Comments follow the symbol "%" either at the start of a line or later.
Input symbols are in lower case, since the dance descriptions are converted
to lower case. Some output symbols have been capitalized to highlight them,
notably P, N, ... (who) and R, L, ... (direction). Choice of the output
symbols (and input symbols) is left up to the writer of the dictionary.
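A dictionary line of the form "outsymbol # category = insymbol[s]" can be
taken apart with a few calls to strtok. The sketch below is an illustration
only: the struct, the function name, and the limit of 16 input symbols are
assumptions, not details of pick.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INPUTS 16                  /* assumed limit, for illustration */

struct dict_entry {                    /* hypothetical layout */
    char *out;                         /* output symbol, e.g. "P"    */
    char *category;                    /* e.g. "who", "action"       */
    char *in[MAX_INPUTS];              /* input symbols              */
    size_t n_in;
};

/* Parse one dictionary line in place. Returns 1 if an entry was found,
   0 for blank, comment-only, or malformed lines. */
int parse_dict_line(char *line, struct dict_entry *e)
{
    assert(line != NULL && e != NULL);
    static const char ws[] = " \t";

    line[strcspn(line, "%\r\n")] = '\0';   /* drop comment and line end */

    char *tok = strtok(line, ws);
    if (tok == NULL)
        return 0;
    e->out = tok;

    tok = strtok(NULL, ws);                /* "#" flanked by white space */
    if (tok == NULL || strcmp(tok, "#") != 0)
        return 0;

    e->category = strtok(NULL, ws);
    if (e->category == NULL)
        return 0;

    tok = strtok(NULL, ws);                /* "=" flanked by white space */
    if (tok == NULL || strcmp(tok, "=") != 0)
        return 0;

    e->n_in = 0;
    while (e->n_in < MAX_INPUTS && (tok = strtok(NULL, ws)) != NULL)
        e->in[e->n_in++] = tok;
    return 1;
}
```

The line 'P # who = partner ptr p % ...' yields the output symbol "P", the
category "who", and the three input symbols "partner", "ptr", and "p".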
Categories currently defined are "who", "direction", "how-much",
"action", "part", "combo", and "ignore".
"who" specifies the persons involved,
"direction" includes right, left, up, down, across,
along, forward, back, diagonal.
"how-much" gives the amount of turn for
allemandes, circles, stars (in 1/4 increments).
"action" is the designator
for a major dance figure (e.g. swing, circle, wave) and
"part" modifies an action, or may be one of several words used together to
specify an action. Usually parts will be combined with other symbols
with "combo", and replaced with a single symbol designating an action.
When several words are needed to specify a figure, these are given in a
"combo" line. All the symbols to the right of "=" must be present for
substitution of the outsymbol to the left of "#". The symbols on the right
must be previously defined output names of various input symbols. For example
r&l # combo = R L <thr % reduces "right and left thru" to "r&l"
The trick in setting up the dictionary is to find word fragments which will
match all of the possible words and abbreviations for a figure or modifier
on the dance card. For example, "thr" will match both "thru" and "through",
but also the word "three", which rarely occurs in a dance description.
"sw" will match "swing" and "sw", but also "swat" as in "swat the flea".
"ll" will match "LL" (for "long lines") but also "all" and "hall".
To circumvent possible unwanted multiple matches, the longer fragments are
tested first, and any
single character fragments must match a single character in the dance
description. Finally, there is the "ignore" directive which specifies
particular strings to ignore. The words "start" or "starting" are sometimes
used in a hint for who begins a hey. They are selected by "star" used to
search for stars. To fix this, either edit the dance, putting the hint
inside parentheses, or use:
?st # ignore = start
The symbol "?st" will appear in a list at the end of the run with a count
of the times "start" was ignored.
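The matching rules can be illustrated with a short sketch: fragments are
sorted longest first, fragments of two or more characters match anywhere
inside a word, and single character fragments must equal the whole word. The
use of substring matching via strstr is an assumption about how pick.c
behaves, and all names below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fragment {                      /* hypothetical layout */
    const char *in;                    /* input fragment, e.g. "star" */
    const char *out;                   /* output symbol, e.g. "*"     */
};

/* qsort comparison: longer input fragments sort first. */
static int by_length_desc(const void *a, const void *b)
{
    size_t la = strlen(((const struct fragment *)a)->in);
    size_t lb = strlen(((const struct fragment *)b)->in);
    return (la < lb) - (la > lb);
}

void sort_fragments(struct fragment *f, size_t n)
{
    qsort(f, n, sizeof *f, by_length_desc);
}

/* Return the output symbol of the first fragment that matches word, or
   NULL if none does. The table must already be sorted longest first. */
const char *match_word(const char *word, const struct fragment *f, size_t n)
{
    assert(word != NULL && f != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strlen(f[i].in) == 1) {
            if (strcmp(word, f[i].in) == 0)    /* whole word only */
                return f[i].out;
        } else if (strstr(word, f[i].in) != NULL) {
            return f[i].out;
        }
    }
    return NULL;
}
```

With the fragments "star", "start", "sw", and "p" in the table, the word
"starting" is caught by the longer fragment "start" (and so ignored as "?st")
before "star" is tried, while "promenade" is not mistaken for the single
character "p".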
To help in the design or revision of the dictionary, the command line option
"-w" is furnished to generate lists of strings (words and fragments)
used in a collection of dances (i.e. a dance archive). One list, "words.out",
is sorted by frequency of occurrence. The second list, "occ.out",
is in alphabetical order.
STATISTICS and ANALYSIS
(default statistics file: dance-stat.txt)
(example file: "select-stat.txt", with output "select.csv")
The analysis can be customized by the user in the file dance-stat.txt.
Options include counting the number of times a move (or symbol) occurs,
listing either the first character or full output text of special moves,
tabulating which of a specified set of symbols "qualifies" an action, noting
which groups of moves occur, and in what sequence the moves occur.
General form:
col_header # statistic = output_symbol[s]
As in the dictionary lines, the "#" and "=" must be flanked by white space
(space or tab), and symbols to the right of "=" must be separated by spaces.
"statistic" can be "count", "first-char", "text", "short", "qualify",
"group", "order", or "pieces". All of the symbols to the right
of "=" must have been previously defined in the dictionary file.
Examples from file "select-stat.txt" which give some of the results discussed
above in the section on "Usage and Output":
wave # count = wv
Counts the number of occurrences of the symbol "wv" in the dance. The symbol
"wv" has previously been translated from the fragment "wav" as specified
in the dictionary file:
wv # action = wav
Tip: you can introduce a blank column with the line " _ # count =".
swing # qualify = sw > P N 1 2 M W
Individual pseudo phrases (delimited by separators) are checked for the
action "sw" and the occurrence of one of the person designators "P" ... "W".
A list of the first characters of any of the symbols to the right of ">"
appearing in the dance is put into the spreadsheet cell for that dance.
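One way to picture the "qualify" statistic is a function that is called once
per pseudo phrase and appends to the spreadsheet cell. This is a sketch under
assumptions: the names are hypothetical, the phrase is taken to be already
translated into output symbols, and whether pick.c removes duplicates or
orders the list differently is not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if symbol s occurs among the n symbols of a pseudo phrase. */
static int contains(const char *symbols[], size_t n, const char *s)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(symbols[i], s) == 0)
            return 1;
    return 0;
}

/* If the phrase contains the action, append the first character of each
   qualifier found in the same phrase to cell (a NUL-terminated buffer
   of cellsize bytes). (Hypothetical helper, not taken from pick.c.) */
void qualify_phrase(const char *symbols[], size_t n, const char *action,
                    const char *qualifiers[], size_t nq,
                    char *cell, size_t cellsize)
{
    assert(symbols != NULL && action != NULL && qualifiers != NULL);
    assert(cell != NULL && cellsize > 0);
    if (!contains(symbols, n, action))
        return;
    size_t len = strlen(cell);
    for (size_t q = 0; q < nq; q++) {
        if (contains(symbols, n, qualifiers[q]) && len + 1 < cellsize) {
            cell[len++] = qualifiers[q][0];
            cell[len] = '\0';
        }
    }
}
```

For a dance with the phrases "P bal sw" and "N sw" and the statistic line
shown above, the cell under "swing" would read "PN" in this sketch.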
out** # first-char = S \ cc gr-ch gr-rl gr-RL
We take the occurrence of "shadow", "diagonal", "contra corners" or a "grand
chain" or "grand right and left" as indicating an out of the minor set
interaction (there may be other markers). We find these figures hard for
beginners.
hwro # group = 31 32 33 34
Reports whether the columns designated on the right are non-blank.
Column 31 is non-blank if there is a hey, column 32 is for waves, column 33
is for "rare" movements, and column 34 is for the out-of-minor-set figures.
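The code reported by "group" can be sketched as follows: each listed column
that is non-blank contributes its position in the list (1, 2, 3, ...), and
"0" is reported when all are blank. The function name is hypothetical, and
the limit of nine columns is an assumption made so that each position stays
a single digit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* cells[i] holds the text of the i-th column named in the "group" line.
   Write the 1-based positions of the non-blank cells into out, or "0"
   if every cell is blank. Only the first nine columns are considered.
   (group_code is a hypothetical name, not taken from pick.c.) */
void group_code(const char *cells[], size_t n, char *out, size_t outsize)
{
    assert(cells != NULL && out != NULL && outsize >= 2);
    size_t len = 0;
    for (size_t i = 0; i < n && i < 9; i++) {
        const char *c = cells[i];
        if (c == NULL)
            continue;
        c += strspn(c, " \t");             /* blanks do not count */
        if (*c != '\0' && len + 1 < outsize)
            out[len++] = (char)('1' + i);
    }
    if (len == 0)
        out[len++] = '0';
    out[len] = '\0';
}
```

A dance with entries in the first two listed columns gets the code "12", and
a dance with none of the listed figures gets "0".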
A new statistic "order" has been added, which records the position in the
dance where a figure occurs. The syntax is similar to that for "qualify":
swing # order = sw > sw % records when any swing occurs
o* # order = o * > o * % occurrences of circles and stars
oR # order = o > R % only circle rights
The first part of "select-stat.txt" gives position or sequence of various
moves in the dances, using the "order" statistic. The "position" in
the dance is only approximate, since the descriptions do not adhere to a
rigid format from which 8 beat phrases can be deduced.
The second part illustrates various counts and other statistics.
Strain boundaries (16 beats) are defined by the sections A1:, etc.
But the number of pseudo moves or figures within a strain is not restricted
(e.g. wave balances and allemandes, hey hints, down and return (with turns)).
We have taken the divisions indicated by commas, semicolons, and line ends
to define pseudo phrases. We count these from the beginning of the dance.
The first move will always be "1", and others will be in sequence.
Occasionally a pseudo phrase will not contain an "action" and will not
appear in the sequence. Some moves like "balance and swing" will be counted
as one move, even though they take 16 beats.
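The counting scheme for "order" can be sketched as below. It assumes that each
pseudo phrase has already been reduced to a single action symbol (or to none),
and that every pseudo phrase advances the counter whether or not it contains
an action. The function name is hypothetical and the code is not taken from
pick.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* actions[i] is the action symbol of the i-th pseudo phrase, counted
   from the beginning of the dance, or NULL if the phrase has no action.
   Write the 1-based positions where target occurs into out, separated
   by spaces. Returns the number of positions written.
   (order_positions is a hypothetical name.) */
size_t order_positions(const char *actions[], size_t n, const char *target,
                       char *out, size_t outsize)
{
    assert(actions != NULL && target != NULL && out != NULL && outsize > 0);
    size_t used = 0, hits = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (actions[i] == NULL || strcmp(actions[i], target) != 0)
            continue;
        int w = snprintf(out + used, outsize - used, "%s%lu",
                         hits > 0 ? " " : "", (unsigned long)(i + 1));
        if (w < 0 || (size_t)w >= outsize - used) {
            out[used] = '\0';          /* no room: drop the partial entry */
            break;
        }
        used += (size_t)w;
        hits++;
    }
    return hits;
}
```

For a dance whose pseudo phrases reduce to "o ch hey sw o (none) pass", the
circle column would read "1 5" and the swing column "4"; position 6 appears
in no column because that phrase has no action.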
Another requested statistic is piece count, similar to that given in
Zesty Contras by Larry Jennings. He states "piece count is subjective and
dependent on the locale". Moreover, we have the problems discussed above
with the "order" statistic, and problems of combining identical or similar
moves (such as a Ladies chain; Ladies chain back). A preliminary
implementation is given with the statistic "pieces".
pc # pieces = ret % plus other flags
This produces two columns. The first contains any title line strings in
the format "#nnxxx" as in Zesty Contras. The "#" is mandatory, "nn" is a one
or two digit number, and "xxx" are any additional letter codes, to a limit of
ten characters. The second column contains our naive estimate of the piece
count of the dance. Any strains with a single figure (e.g. balance and swing,
hey) have one piece. Any strains with only one type of figure (e.g. circle L,
circle R) are also one additional piece. Finally, any strain containing any
of the flags to the right of "=" counts as one piece. In this example,
"ret" signifies "return" so that "down 4-in-line, turn alone; return,
bend the line" counts as one piece, even though it nominally has four.
RUNNING and COMPILING the PROGRAM
In addition to "> pick" with all default file names and "> pick dances",
other options can be specified. All output goes to the default
files: out.csv, info.out, words.out, and occ.out.
These default output file names cannot be changed from the command line.
The resulting files must be renamed to preserve them.
I still STRONGLY recommend making backup copies of your dance archive.
Command line:
"> pick [options]".
These options include
"-h" to get help and exit.
"-s your_statistics_file" reads an alternative statistics file.
"-d your_dictionary_file" reads an alternative dictionary.
"-i" prints additional parsing and debug information to "info.out".
"-w" tabulates word/fragments, and writes sorted output to default
files "words.out" and "occ.out". Useful for proof reading the dance file
and for modifying your dictionary.
"-e" enables experimental features. At this writing, this is the "AA BB AABB"
single character figure lists as discussed above.
"-b [nn]" prints detailed parsing information for the first "nn" lines
of input. Output to "info.out".
The command line "> pick [options] dances" also works and should
pose no danger to the file "dances", since one cannot redefine output files.
Alternatively, one can copy or rename input files to the default names,
or use soft links, "> ln -s mydict.txt dance-dict.txt", in Linux
or Mac OS.
[What is the equivalent in Windows???]
--------------------
The distribution can be downloaded as a single ZIP file, or as individually
downloadable files. It is more thoroughly described in the
README file.
The distribution includes executables for Linux (Fedora), Mac OS 10.6 (Snow Leopard),
and Windows (compiled with gcc in MinGW, which run on 2000, XP, Vista Home, and
Windows 7 Starter), along with the compilable code in C.
Compilation is straightforward:
"> gcc -g -o pick pick.c".
There are a number of spreadsheet tricks involving sorting of the entire
sheet (or parts thereof), cutting, and pasting, which I am still learning.
Suggestions are solicited, along with comments on the program and on
the dictionary and statistics files.