A few days ago, while searching for good online dictionaries, I stumbled upon the Oxford Dictionary API for developers. I decided to check it out and registered for a free account, which allows me to make 3000 API calls per month. Since I am not planning to use this service commercially, that limit is more than sufficient for me.
I decided to write a convenient Lisp client-side wrapper to access the various features exposed by the API layer. With Edi Weitz’s Drakma and a couple of other utilities, this is a fairly straightforward job. I used LispWorks 7.0 for this implementation.
Although I have not implemented the full API set, I thought I would share what I have done so far. Hence this post.
I started with the wordlist API because that would allow me to fetch words using filters (specific lexical categories, domains, etc.). I also decided to support the common options in the filters group.
Let us try to fetch 10 conjunctions:
CL-USER 1 > (get-wordlist (make-wordlist-filter :lex-category '(conjunction)) :limit 10)
(("%27cept" . "'cept") ("acause" . "acause") ("after" . "after") ("albeit" . "albeit") ("although" . "although") ("and" . "and") ("as" . "as") ("assuming" . "assuming") ("because" . "because") ("before" . "before"))
((:LIMIT . 10) (:PROVIDER . "Oxford University Press") (:OFFSET . 0) (:SOURCE-LANGUAGE . "en") (:TOTAL . 68))
This function returns two values. The first is a list of (<word-id> . <word>) pairs; the second is metadata about the result. In this case, the metadata shows that the dictionary contains 68 conjunctions, but our request was limited to just 10.
It is possible to specify multiple categories, in which case they are ANDed together.
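To work with both return values in code rather than at the listener, MULTIPLE-VALUE-BIND is the usual tool. The following is a minimal sketch; STUB-GET-WORDLIST and SUMMARIZE-WORDLIST are hypothetical names, and the stub simply returns canned data in the same shape as the transcript above instead of calling the API.

```lisp
;; Hypothetical stand-in for GET-WORDLIST: no network access, canned data only.
;; It returns two values in the same shape as the transcript: pairs, then metadata.
(defun stub-get-wordlist ()
  (values '(("after" . "after") ("albeit" . "albeit"))
          '((:limit . 2) (:offset . 0) (:total . 68))))

;; Capture both values and report how many matching words were not fetched.
(defun summarize-wordlist ()
  (multiple-value-bind (pairs metadata) (stub-get-wordlist)
    (let ((total (cdr (assoc :total metadata))))
      (list :words (mapcar #'cdr pairs)          ; the word is the CDR of each pair
            :fetched (length pairs)
            :remaining (- total (length pairs))))))
```

Calling (summarize-wordlist) returns a plist with the words, the number fetched, and the number still remaining according to :TOTAL.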
CL-USER 1 > (get-wordlist (make-wordlist-filter :lex-category '(noun verb)) :limit 10)
(("abandon" . "abandon") ("abode" . "abode") ("abort" . "abort") ("about-turn" . "about-turn") ("abseil" . "abseil") ("abstract" . "abstract") ("abuse" . "abuse") ("accent" . "accent") ("access" . "access") ("accession" . "accession"))
((:OFFSET . 0) (:LIMIT . 10) (:SOURCE-LANGUAGE . "en") (:PROVIDER . "Oxford University Press") (:TOTAL . 5709))
The result shows 10 nouns that are also verbs (out of a total of 5709 such words).
Filtering based on domains is also possible.
CL-USER 6 > (get-wordlist (make-wordlist-filter :domains '(Buddhism)) :limit 10)
(("acharya" . "acharya") ("ahimsa" . "ahimsa") ("ananda" . "ananda") ("arhat" . "arhat") ("asoka" . "asoka") ("asoka pillar" . "asoka_pillar") ("bardo" . "bardo") ("bhikkhu" . "bhikkhu") ("bodhgaya" . "bodhgaya") ("bodhisattva" . "bodhisattva"))
((:OFFSET . 0) (:LIMIT . 10) (:SOURCE-LANGUAGE . "en") (:PROVIDER . "Oxford University Press") (:TOTAL . 87))
We can combine domain and lexical category when filtering words.
CL-USER 7 > (get-wordlist (make-wordlist-filter :lex-category '(verb) :domains '(Palaeontology)) :limit 10)
(("fossilize" . "fossilize") ("permineralize" . "permineralize") ("procline" . "procline"))
((:OFFSET . 0) (:LIMIT . 10) (:SOURCE-LANGUAGE . "en") (:PROVIDER . "Oxford University Press") (:TOTAL . 3))
The dictionary has information about just three words that are verbs in the domain of palaeontology.
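For readers curious how a combined filter might be turned into part of a request URL, here is a minimal sketch. It is not the code from my source file: MAKE-FILTER-SKETCH, JOIN-STRINGS and FILTER->STRING are hypothetical names, and the output format (filter groups separated by semicolons, values separated by commas, lowercased values) is an assumption about what the wordlist endpoint expects, so check it against the API documentation.

```lisp
;; Hypothetical filter representation: a plist mapping each filter group
;; to a list of symbols or strings.
(defun make-filter-sketch (&key lex-category domains registers)
  (list :lex-category lex-category :domains domains :registers registers))

;; Join STRINGS with SEPARATOR between consecutive elements.
(defun join-strings (strings separator)
  (with-output-to-string (out)
    (loop for (s . more) on strings
          do (write-string s out)
          when more do (write-string separator out))))

;; Render only the non-empty filter groups, e.g.
;; "lexicalCategory=verb;domains=palaeontology" (assumed format).
(defun filter->string (filter)
  (let ((names '((:lex-category . "lexicalCategory")
                 (:domains . "domains")
                 (:registers . "registers"))))
    (join-strings
     (loop for (key . name) in names
           for vals = (getf filter key)
           when vals
             collect (format nil "~A=~A" name
                             (join-strings
                              (mapcar (lambda (v) (string-downcase (string v)))
                                      vals)
                              ",")))
     ";")))
```

Omitting empty groups keeps the string short, and a filter with no groups at all renders as the empty string.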
One more filter type is called Registers. This allows us to select words in the categories of Archaic, Euphemism, Formal, etc.
CL-USER 27 > (get-wordlist (make-wordlist-filter :registers '(formal)) :limit 10)
(("ab_initio" . "ab initio") ("abjuration" . "abjuration") ("abjure" . "abjure") ("ablution" . "ablution") ("ablutionary" . "ablutionary") ("abnegate" . "abnegate") ("abnegator" . "abnegator") ("abode" . "abode") ("abominate" . "abominate") ("abominator" . "abominator"))
((:LIMIT . 10) (:PROVIDER . "Oxford University Press") (:OFFSET . 0) (:SOURCE-LANGUAGE . "en") (:TOTAL . 678))
OK, how do we know the allowed values for lexical categories, domains and registers? There are APIs for that as well.
CL-USER 28 > (get-lexical-categories)
("Adjective" "Adverb" "Combining Form" "Conjunction" "Contraction" "Determiner" "Idiomatic" "Interjection" "Noun" "Numeral" …. "Prefix" "Preposition" "Pronoun" "Residual" "Suffix" "Verb")
CL-USER 8 > (get-registers)
("Allusive" "Allusively" "Archaic" "Army Slang" "Black English" "Cant" "Children%27S Slang" "Coarse Slang" "College Slang" "Concrete" "Contemptuous" "Criminals%27 Slang" "Dated" "Depreciative" "Depreciatively" "Derogatory" "Dialect" "Dismissive" "Disused" "Emphatically" "Especially" "Euphemism" "Euphemistic" "Figurative" "Formal" "Generally" "Historical" "Humorous" …….)
CL-USER 5 > (get-domains)
("Air Force" "Alcoholic" "American Civil War" "American Football" "Amerindian" "Anatomy" "Ancient History" "Angling" "Anthropology" "Archaeology" "Archery" "Architecture" "Art" "Artefacts" "Arts And Humanities" "Astrology" "Astronomy" "Athletics" "Audio" "Australian Rules" "Aviation" "Ballet" "Baseball" "Basketball" "Bellringing" "Biblical" "Billiards" "Biochemistry" "Biology" "Bird" ….)
I have omitted several entries in the result set to conserve space.
Sometimes we might want to fetch words that start with a known prefix, say “sug”, perhaps restricted to a specific lexical category. We can do this easily:
CL-USER 30 > (get-wordlist-prefix "sug" (make-wordlist-filter :lex-category '(noun)) :limit 20)
(("sugan" . "sugan") ("sugar" . "sugar") ("sugar_apple" . "sugar apple") ("sugar_bag" . "sugar bag") ("sugar_bean" . "sugar bean") ("sugar_beet" . "sugar beet") ("sugarbird" . "sugarbird") ("sugar_bush" . "sugar bush") ("sugar_cane" . "sugar cane") ("sugar_cookie" . "sugar cookie") ("sugarcraft" . "sugarcraft") ("sugar_cube" . "sugar cube") ("sugar_daddy" . "sugar daddy") ("sugar_glider" . "sugar glider") ("sugar_gum" . "sugar gum") ("sugariness" . "sugariness") ("sugaring" . "sugaring") ("sugar_kelp" . "sugar kelp") ("sugarloaf" . "sugarloaf") ("sugar_loaf_mountain" . "sugar loaf mountain"))
((:OFFSET . 0) (:LIMIT . 20) (:SOURCE-LANGUAGE . "en") (:PROVIDER . "Oxford University Press") (:TOTAL . 42))
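The metadata above reports 42 matches while only 20 were returned, so collecting everything means paging with :OFFSET and :LIMIT until :TOTAL is reached. Here is a minimal sketch of that loop. FETCH-ALL and MAKE-STUB-FETCHER are hypothetical names, and the fetcher is an in-memory stand-in; whether the wrapper functions accept an offset argument is an assumption you would need to check against the source file.

```lisp
;; FETCH-PAGE is any function of (offset limit) that returns two values:
;; a list of pairs and a metadata alist containing :TOTAL.
(defun fetch-all (fetch-page &key (limit 20))
  (loop with offset = 0
        with results = '()
        do (multiple-value-bind (pairs metadata)
               (funcall fetch-page offset limit)
             (setf results (append results pairs))
             (incf offset limit)
             ;; Stop on an empty page or once the offset passes :TOTAL.
             (when (or (null pairs)
                       (>= offset (cdr (assoc :total metadata))))
               (return results)))))

;; Hypothetical in-memory stand-in for a network call, for experimentation.
(defun make-stub-fetcher (words)
  (lambda (offset limit)
    (let* ((total (length words))
           (end (min total (+ offset limit)))
           (page (if (< offset total) (subseq words offset end) '())))
      (values (mapcar (lambda (w) (cons w w)) page)
              (list (cons :offset offset)
                    (cons :limit limit)
                    (cons :total total))))))
```

Keep in mind that every page is a separate API call, which counts against the monthly quota of the free account.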
You can even restrict it to a specific domain:
CL-USER 35 > (get-wordlist-prefix "lith" (make-wordlist-filter :lex-category '(noun) :domains '(medicine)) :limit 20)
(("lithotomy_position" . "lithotomy position"))
((:OFFSET . 0) (:LIMIT . 20) (:SOURCE-LANGUAGE . "en") (:PROVIDER . "Oxford University Press") (:TOTAL . 1))
As a final example, let us try to get the lemma of a word:
CL-USER 38 > (get-lemma "fumbling")
("fumble" . "fumble")
The first element of the result pair is the word ID and the second is the word itself.
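Judging only from the results shown earlier (such as "ab_initio" for "ab initio" and "%27cept" for "'cept"), a word ID looks like the lowercased word with spaces turned into underscores and the apostrophe percent-encoded. The sketch below encodes that guess. WORD->ID-SKETCH is a hypothetical helper, not part of the API or of my source file, and the real rules may well differ, so prefer the IDs the API returns.

```lisp
;; Guess at the word-ID convention, inferred from sample results only:
;; lowercase, space -> underscore, apostrophe -> %27.
(defun word->id-sketch (word)
  (with-output-to-string (out)
    (loop for ch across (string-downcase word)
          do (cond ((char= ch #\Space) (write-char #\_ out))
                   ((char= ch #\') (write-string "%27" out))
                   (t (write-char ch out))))))
```

For a single plain word such as "fumble" the function returns the word unchanged, which matches the lemma result above.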
It was great fun working on this implementation. You can download the source file and play with it. Remember to substitute your <APP-ID> and <APP-KEY> before trying the code.