Joseph Corneli
Copyright (C) 2005-2010 Joseph Corneli <>, transferred to the public domain.
(Last revised: December 26, 2020)

A tool for building hackable semantic hypertext platforms. Source code and mailing lists are at

1 Introduction

Note 1.1 (What is “Arxana”?).

Arxana is the name of a “next generation” hypertext system that emphasizes annotation. Every object in this system is annotatable. Because of this, I sometimes call Arxana’s core “the scholium system”, but the name “Arxana” better reflects our aim: to explore the mysterious world of links, attachments, correspondences, and side-effects.

Note 1.2 (The idea).

A scholia-based document model for commons-based peer production will inform the development of our system. In this model, texts are made up of smaller texts until you get to atomic texts; user actions are built in the same way. Multiple users should interact with a shared persistent data-store through functional annotation, not destructive modification. We should pursue the asynchronous interaction model until we arrive at live, synchronous settings, where we facilitate real-time computer-mediated interactions between users, and between users and running hackable programs.

Note 1.3 (The data model).

Start by storing a collection of strings. Now add in pairs and triples which point at 2 and 3 objects respectively. (We can extend to n-tuples if that turns out to be convenient.) Finally, we will maintain a collection of lists, each of which points at an unlimited number of objects.
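As a rough sketch of this data model (in Python rather than Lisp; the class and method names here are illustrative stand-ins, not Arxana's actual API), consider:

```python
# A toy model of the store: strings, pairs, triples, and lists, each
# addressed by a (code, ref) coordinate pair.  Codes follow Note 3.1:
# 0 = list, 1 = string, 2 = pair, 3 = triple.

class Store:
    def __init__(self):
        self.lists = []     # each list points at any number of objects
        self.strings = []
        self.pairs = []     # each pair points at 2 objects
        self.triples = []   # each triple points at 3 objects

    def add_string(self, text):
        self.strings.append(text)
        return (1, len(self.strings) - 1)   # coordinates: (code, ref)

    def add_triple(self, beg, mid, end):
        self.triples.append((beg, mid, end))
        return (3, len(self.triples) - 1)

    def resolve(self, coords):
        code, ref = coords
        return {0: self.lists, 1: self.strings,
                2: self.pairs, 3: self.triples}[code][ref]

store = Store()
subj = store.add_string("Arxana")
link = store.add_triple(subj, store.add_string("is"),
                        store.add_string("a hypertext system"))
# The triple itself has coordinates, so it too can be annotated.
```

Since `add_triple` returns coordinates, a further triple can point at `link` itself; that is the annotation mechanism the rest of this document builds out.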

Note 1.4 (History).

Thinking about how to improve existing systems for peer-based collaboration in 2004, I designed a simple version of the scholium system that treated textual commentary and markup as scholia. In 2006, I put together a single-user version of this system that ran exclusively under Emacs. The current system is an almost-completely rewritten variant, bringing in a shared database and various other enhancements to support multi-user interaction.

Note 1.5 (A brisk review of the programming literature).

Many years before I started working on this project, there was something called the Emacs HyperText System. What we're doing here updates that work with modern database methods, uses a more interesting data storage format, and also considers multiple front-ends to the same database (for example, a web interface).

Contemporary Emacs-based hypertext creation systems include Muse and Emacs Wiki. The browsing side features old standbys, Info and Emacs/w3m (not to be confused with Emacs-w3m, which is not entirely “Emacs-based”). These packages provide ways to author or view what we should now call “traditional” hypertext documents.

Another legacy tool worth mentioning is HyperCard. This system was oriented around the idea of using hypertext to create software, a vision we share, but like just about everyone else working in the field at the time, it used uni-directional links.

Hypertext nouveau is based on semantic triples. The Semantic Web standard provides one specification of the features we can expect from triples. Triples provide a framework for knowledge representation with more depth and flexibility than the popular ‘‘tagging’’ methodology; for example, suitable collections of triples implement AI-style ‘‘frames’’. The idea of using triples to organize archival material is generating some interest as Semantic Web ideas spread (cf. recent museum and library conferences), even among academic computer scientists! (Josh Grochow, p.c.)

An abstractly similar project to Arxana with some grand goals is being developed by Chris Hanson at MIT under the name ‘‘Web-scale Environments for Deduction Systems’’.

Another technically similar project is Freebase, a hand-rolled database of open content, organized on frame-based, triple-driven principles. The developer of the Freebase graphd database has some interesting things to say about old and new ways of handling triples.

Note 1.6 (Fitting in).

My current development goal is to use this system to create a more flexible multiuser interaction platform than those currently available to web-based collaborative projects (such as PlanetMath). As an intermediate stage, I'm using Arxana to help organize material for a book I'm writing. Arxana's theoretical generality, active development status, detailed documentation, and superlatively liberal terms of use may make it an attractive option for you to try as well!

Note 1.7 (What you get).

Arxana has an Emacs frontend, a Common Lisp middle-end, and a SQL backend. If you want to do some work, any one of these components can be swapped out and replaced with the engine of your choice. I’ve released all of the implementation work on this system into the public domain, and it runs on an entirely free/libre/open source software platform.

Note 1.8 (Acknowledgements).

Ted Nelson’s ‘‘Literary Machines’’ and Marvin Minsky’s ‘‘Society of Mind’’ are cornerstones in the historical and social contextualization of this work. Alfred Korzybski’s ‘‘Science and Sanity’’ and Gilles Deleuze’s ‘‘The Logic of Sense’’ provided grounding and encouragement. TeX and GNU Emacs have been useful not just in prototyping this system, but also as exemplary projects in the genre I’m aiming for. John McCarthy’s Elephant 2000 was an inspiring thing to look at and think about, and of course Lisp has been a vital ingredient.

Thanks also to everyone who’s talked about this project with me!

2 Using the program

Note 2.1 (Dependencies).

Our interface is embedded in Emacs. Backend processing is done with Common Lisp. We are currently using the PostgreSQL database. These packages should be available to you through the usual channels. (I’ve been using SBCL, but any Lisp should do; please make sure you are using a contemporary Emacs version.)

We will connect Emacs to Lisp via Slime, and Lisp to PostgreSQL via CLSQL. CLSQL also talks directly to the Sphinx search engine, which we use for text-based search. Once all of these things are installed and working together, you should be able to begin to use Arxana.

Setting up all of these packages can be a somewhat time-consuming and confusing task, especially if you haven’t done it before! See Appendix A for help.

Note 2.2 (Export code and set up the interface).

If you are looking at the source version of this document in Emacs, evaluate the following s-expression (type C-x C-e with the cursor positioned just after its final parenthesis). This exports the Common Lisp components of the program to suitable files for subsequent use, and prepares the Emacs environment. (The code that does this is in Appendix B.)

  (let ((beg (search-forward "\\begin{verbatim}"))
        (end (progn (search-forward "\\end{verbatim}")
                    (match-beginning 0))))
    (eval-region beg end))
Note 2.3 (To load Common Lisp components at run-time).

Link arxana.asd somewhere where Lisp can find it. Then run commands like these in your Lisp; if you like, you can place all of this stuff in your config file to automatically load Arxana when Lisp starts. The final form is only necessary if you plan to use CLSQL’s special syntax on the Lisp command-line.

(asdf:operate 'asdf:load-op 'clsql)
(asdf:operate 'asdf:load-op 'arxana)
(in-package arxana)
Note 2.4 (To connect Emacs to Lisp).

Either run M-x slime RET to start and connect to Lisp locally, or M-x slime-connect RET RET after you have opened a connection to your remote server with a command like this: ssh -L 4005:localhost:4005 <username>@<host> and started Lisp and the Swank server on the remote machine. To have Swank start automatically when you start Lisp, put commands like this in your config file.

(asdf:operate 'asdf:load-op 'swank)
(setf swank:*use-dedicated-output-stream* nil)
(setf swank:*communication-style* :fd-handler)
(swank:create-server :dont-close t)
Note 2.5 (To define database structures).

If you haven’t yet defined the basic database structures, make sure to load them now (using tabledefs.lisp, or the SQL code in Section 3)!

Note 2.6 (Importing this document into the system).

You can browse this document inside Arxana: after loading the code, run M-x autoimport-arxana.

3 SQL tables

Note 3.1 (Objects and codes).

Every object in the system is identified by an ordered pair: a code and a reference. The code says which table contains the indicated object, and the reference provides that object’s id. (For example, with the codes below, the string with id 17 has coordinates (1, 17).) To address a specific element of a list or n-tuple, a third number, that element’s offset, is required. The codes are as follows:

0 list
1 string
2 pair
3 triple
-- (The id columns here are reconstructed; 'lists' below references strings(id).)

CREATE TABLE strings (
   id SERIAL PRIMARY KEY,
   text TEXT UNIQUE
);

CREATE TABLE pairs (
   id SERIAL PRIMARY KEY,
   code1 INT NOT NULL,
   ref1 INT NOT NULL,
   code2 INT NOT NULL,
   ref2 INT NOT NULL,
   UNIQUE (code1, ref1,
           code2, ref2)
);

CREATE TABLE triples (
   id SERIAL PRIMARY KEY,
   code1 INT NOT NULL,
   ref1 INT NOT NULL,
   code2 INT NOT NULL,
   ref2 INT NOT NULL,
   code3 INT NOT NULL,
   ref3 INT NOT NULL,
   UNIQUE (code1, ref1,
           code2, ref2,
           code3, ref3)
);
Note 3.2 (A list of lists).

As a central place to manage our collections, we first create a list of lists. The ‘heading’ is the list’s name, and its ‘header’ is metadata.

CREATE TABLE lists (
  id SERIAL PRIMARY KEY,
  heading INT REFERENCES strings(id) UNIQUE,
  header INT REFERENCES strings(id)
);
Note 3.3 (Lists on demand).

Whenever we want to create a new list, we first add to the ‘lists’ table, and then create a new table “listk” (where k is equal to the new maximum id on ‘lists’).

CREATE TABLE listk (
   offset INT NOT NULL,
   code INT NOT NULL,
   ref INT NOT NULL
);
Note 3.4 (Side-note on containers via triples).

To model a basic container, we can just use triples like “(A in B)”. This is useful, but the elements of B are of course unordered. In Section 5.3, we make extensive use of triples like (B 1 α), (B 2 β), etc., to indicate that B’s first component is α, second component is β, and so on; so we can make ordered list-like containers as well.

This is an example of the difference in expressive power of tags (which only provide a sense of unordered containment in “virtual baskets”) and triples (which here are seen to at least provide the additional sense of ordered containment in “virtual filing cabinets”, although they have much more in store for us); cf. Note 1.5.

As useful as models based on these two principles are, the user could easily be overloaded by looking at lots of different containers encoded in raw triples, all at once.
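A sketch of the “virtual filing cabinet” idea (in Python; the data and helper name are hypothetical):

```python
# Triples of the form (B, k, x): "B's k-th component is x".
# Order is recovered by sorting on the middle element.
triples = [
    ("B", 2, "beta"),
    ("B", 1, "alpha"),
    ("B", 3, "gamma"),
    ("C", 1, "delta"),   # a different container
]

def ordered_contents(container, triples):
    """Return the elements of `container` in component order."""
    relevant = [t for t in triples if t[0] == container]
    return [elt for (_, _, elt) in sorted(relevant, key=lambda t: t[1])]

print(ordered_contents("B", triples))  # alpha, beta, gamma, in order
```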

Note 3.5 (Sense of containment).

Note that every element of a list is in the list in the same “sense” – for example, we can’t instantly distinguish elements that are “halfway in” from those that are “all the way in”, the same way we could with pure triples.

Note 3.6 (Uniqueness of strings and triples).

An attempt to create duplicate contents in a string or triple generates a warning. This saves storage, given possible repetitive use – and avoids confusion. We can, however, reference duplicate “copies” on the lists.
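The warn-on-duplicate behavior can be sketched with SQLite from Python (the schema mirrors the triples table of Section 3; this is an illustration, not Arxana’s actual code, which goes through CLSQL):

```python
import sqlite3
import warnings

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE triples (
    id INTEGER PRIMARY KEY,
    code1 INT, ref1 INT, code2 INT, ref2 INT, code3 INT, ref3 INT,
    UNIQUE (code1, ref1, code2, ref2, code3, ref3))""")

def add_triple(row):
    """Insert a (code1, ref1, ..., ref3) row; warn and refuse duplicates."""
    try:
        conn.execute("INSERT INTO triples "
                     "(code1, ref1, code2, ref2, code3, ref3) "
                     "VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        warnings.warn("%r already exists." % (row,))
        return False

add_triple((1, 1, 1, 2, 1, 3))   # stored
add_triple((1, 1, 1, 2, 1, 3))   # duplicate: warns, nothing stored
```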

Note 3.7 (Change).

Notice also that since neither strings nor triples “change”, we have to account for change in other ways. In particular, the contents of lists can change. (We may subsequently add some metadata to indicate that certain lists are “locked”, or that they can only be changed by adding, etc., so that their contents can be cited stably and reliably.)

Note 3.8 (Provenance and other metadata).

We could of course add much more structure to the database, starting with simple adjustments like adding provenance metadata or versioning into the records for each stored thing. For the time being, I assume that such metadata will appear in the application or content layer, as triples. (The exceptions are the “headings” and “headers” associated with lists.)

4 Common Lisp-side

4.1 Preliminaries

System definition

(defsystem "arxana"
    :version "1"
    :author "Joe Corneli <>"
    :licence "Public Domain"
    :components
    ((:file "packages")
     (:file "utilities" :depends-on ("packages"))
     (:file "database" :depends-on ("utilities"))
     (:file "queries" :depends-on ("packages"))))

Package definition

(defpackage :arxana
  (:use #:cl #:clsql #:clsql-sys))


Note 4.1 (Useful things).

These definitions are either necessary or useful for working with the database and manipulating triple-centric and/or theory-situated data. The implementation of theories given here is inspired by Lisp’s streams. This is perhaps the most gnarly part of the code; the pay-off of doing things the way we do them here is that subsequently theories can sit “transparently” over other structures.

(in-package arxana)

;; (defun connect-to-database ()
;;    (connect '("localhost" "joe" "joe" "")
;;             :database-type :postgresql-socket))

(defun connect-to-database ()
   (connect '("localhost" "joe" "joe" "joe")
            :database-type :mysql))

(defmacro select-one (&rest args)
  `(car (select ,@args :flatp t)))

(defmacro select-flat (&rest args)
  `(select ,@args :flatp t))

(defun resolve-ambiguity (stuff)
  (first stuff))

(defun isolate-components (content i j)
  (list (nth (1- i) content)
        (nth (1- j) content)))

(defun isolate-beginning (triple)
  (isolate-components (cdr triple) 1 2))

(defun isolate-middle (triple)
  (isolate-components (cdr triple) 3 4))

(defun isolate-end (triple)
  (isolate-components (cdr triple) 5 6))

(defvar *read-from-heading* nil)

(defvar *write-to-heading* nil)
Note 4.2 (On ‘datatype’).

Just translate coordinates into their primary dimension. (How should this change to accommodate codes 4, 5, 6, possibly etc.?)

(defun datatype (data)
  (cond ((eq (car data) 0) 0)
        ((eq (car data) 1) 1)
        ((eq (car data) 2) 2)
        ((eq (car data) 3) 3)))

Note 4.3 (Resolving ambiguity).

Often more than one item will be returned when we are only prepared to deal with one item. In order to handle this sort of ambiguity, it would be great to have either a non-interactive notifier that says that some ambiguity has been dealt with, or an interactive tool that will let the user decide which of the ambiguous options to choose from. For now, we provide the simplest non-interactive tool: just choose the first item from a possibly ambiguous list of items.

Note 4.4 (Using a different database).

See Note A.3 for instructions on changes you will want to make if you use a different database.

Note 4.5 (Use of the “count” function).

The SQL count function is thought to be inefficient with some backends; workarounds exist. (And it’s considered to be efficient with MySQL.)

Note 4.6 (Abstraction).

While it might be in some ways “nice” to allow people to chain together ever-more-abstract references to elements from other theories, I actually think it is better to demand that there just be one layer of abstraction (since we can then quickly translate back and forth, rather than running through a chain of translations).

This does not imply that we cannot have a theory superimposed over another theory (or over multiple theories) that draws input from throughout a massively distributed interlaced system – rather, just that we assume we will need to translate to “base coordinates” when building such structures. However, we’ll certainly want to explore the possibilities for running links between theories (abstractly similar in some sense to pointing at a component of a triple, but here there’s no uniform beg, mid, end scheme to refer to).

4.2 Main table definitions

Note 4.7 (Defining tables from within Lisp).

This is Lisp code to define the permanent SQL tables described in Section 3.

;; PostgreSQL variant (kept for reference; the id columns throughout are
;; reconstructed, since 'lists' and 'theories' reference strings(id)):
;; (execute-command "CREATE TABLE strings (
;;    id SERIAL PRIMARY KEY,
;;    text TEXT UNIQUE
;; );")

(execute-command "CREATE TABLE strings (
   id SERIAL PRIMARY KEY,
   text TEXT,
   UNIQUE INDEX (text(255))
);")

(execute-command "CREATE TABLE places (
   id SERIAL PRIMARY KEY,
   code INT NOT NULL,
   ref INT NOT NULL
);")

(execute-command "CREATE TABLE triples (
   id SERIAL PRIMARY KEY,
   code1 INT NOT NULL,
   ref1 INT NOT NULL,
   code2 INT NOT NULL,
   ref2 INT NOT NULL,
   code3 INT NOT NULL,
   ref3 INT NOT NULL,
   UNIQUE (code1, ref1,
           code2, ref2,
           code3, ref3)
);")

(execute-command "CREATE TABLE theories (
  id SERIAL PRIMARY KEY,
  name INT UNIQUE REFERENCES strings(id)
);")
Note 4.8 (Eliminating tables).

In case you ever need to redefine these tables, you can run code like this first, to delete the existing copies. (Additional tables are added whenever a theory is created; code for deleting theories or their contents will appear in Section 4.)

(dolist (view (list-views)) (drop-view view))
(execute-command "DROP TABLE strings")
(execute-command "DROP TABLE triples")
(execute-command "DROP TABLE places")
(execute-command "DROP TABLE theories")

4.3 Modifying the database

(in-package arxana)

Processing strings

Note 4.9 (On ‘string-to-id’).

Return the id of ‘text’, if present, otherwise nil.

There was a segmentation fault with clisp here at one point, maybe because I hadn’t gotten the clsql sql reader syntax loaded up properly. Note that calling the code without the function wrapper did not produce the same segfault.

(defun string-to-id (text)
  (select-one [id]
              :from [strings]
              :where [= [text] text]))
Note 4.10 (On ‘add-string’).

Add the argument ‘text’ to the list of strings. If the string is successfully created, its coordinates are returned. Otherwise, and in particular, if the request was to create a duplicate, nil is returned.

Should this give a message “Adding text to the strings table” when the string is added by an indirect function call, such as through ‘massage’? (Note 4.12.)

(defun add-string (text)
  (handler-case
      (progn (insert-records :into [strings]
                             :attributes '(text)
                             :values `(,text))
             `(1 ,(string-to-id text)))
    (sql-database-data-error ()
      (warn "\"~a\" already exists." text))))
Note 4.11 (Error handling bug).

The function ‘add-string’ (Note 4.10) exhibits the first of several error handling calls designed to ensure uniqueness (Note 3.6). Experimentally, this works, but I’m observing that, at least sometimes, if the user tries to add an item that’s already present in the database, the index tied to the associated table increases even though the item isn’t added. This is annoying. I haven’t checked whether this happens on all possible installations of the underlying software.

Parsing general input

Note 4.12 (On ‘massage’).

User input to functions like ‘add-triple’ and so on can be strings, integers (which the function “serializes” as the string versions of themselves), or coordinates – lists of the form (code ref). This function converts all of these input forms into the last one! It takes an optional argument ‘addstr’ which, if supplied, says to add string data to the database if it wasn’t there already.

(defun massage (data &optional addstr)
  (cond
    ((integerp data)
     (massage (format nil "~a" data) addstr))
    ((stringp data)
     (let ((id (string-to-id data)))
       (if id
           (list 0 id)
           (when addstr
             (add-string data)))))
    ((and (listp data)
          (equal (length data) 2))
     data)
    (t nil)))

Processing triples

Note 4.13 (On ‘triple-to-id’).

Return the id of the triple (beg mid end), if present, otherwise nil.

(defun triple-to-id (beg mid end)
  (let ((b (massage beg))
        (m (massage mid))
        (e (massage end)))
    (select-one [id]
                :from [triples]
                :where [and [= [code1] (first b)]
                            [= [ref1] (second b)]
                            [= [code2] (first m)]
                            [= [ref2] (second m)]
                            [= [code3] (first e)]
                            [= [ref3] (second e)]])))
Note 4.14 (On ‘add-triple’).

Elements of triples are parsed by ‘massage’ (Note 4.12). If the triple is successfully created, its coordinates are returned. Otherwise, and in particular, if the request was to create a duplicate, nil is returned.

(defun add-triple (beg mid end)
  "Add a triple comprised of BEG MID and END."
  (let ((b (massage beg t))
        (m (massage mid t))
        (e (massage end t)))
    (when (and b m e)
      (handler-case
          (progn
            (insert-records
             :into [triples] :attributes '(code1 ref1
                                           code2 ref2
                                           code3 ref3)
             :values `(,(first b) ,(second b)
                       ,(first m) ,(second m)
                       ,(first e) ,(second e)))
            `(2 ,(triple-to-id b m e)))
        (sql-database-data-error ()
          (warn "\"~a\" already entered as [~a ~a ~a]."
                (list beg mid end) b m e))))))

Processing theories

Note 4.15 (Things to do with theories).

For the record, we want to be able to create a theory, add elements to that theory, remove or change elements in the theory, and, for convenience, zap everything in a theory. Perhaps we will also want functions to remove the tables associated with a theory as well, swap the position of two theories, or change the name of a theory. We will also want to be able to export and import theories, so they can be “beamed” between installations. At appropriate places in the Emacs interface, we’ll need to set ‘*write-to-heading*’ and ‘*read-from-heading*’.

Note 4.16 (What can go in a theory).

Notice that there is no rule that says that a triple or place that’s part of a theory needs to point only at strings that are in the same theory.

Note 4.17 (On ‘list-to-id’).

Return the id of the theory with given ‘heading’, if present, otherwise, nil.

(defun list-to-id (heading)
  (let ((string-id (string-to-id heading)))
    (select-one [id]
                :from [lists]
                :where [= [heading] string-id])))
Note 4.18 (On ‘add-theory’).

Add a theory to the theories table, and all the new dimensions of the frame that comprise this theory. (Theories have names that are strings – it seems a little funny to always have to translate submitted strings to ids for lookup, but this is what we do.)

(defun add-list (heading)
  (let ((string-id (second (massage heading t))))
    (handler-case
        (progn (insert-records :into [lists]
                               :attributes '(heading)
                               :values `(,string-id))
               (let ((k (list-to-id heading)))
                 (execute-command
                  (format nil "CREATE TABLE list~A (
   offset INT NOT NULL,
   code INT NOT NULL,
   ref INT NOT NULL
);" k))
                 `(0 ,k)))
      (sql-database-data-error ()
        (warn "The list \"~a\" already exists."
              heading)))))
Note 4.19 (On ‘get-lists’).

Find all lists that contain ‘symbol’.

(defun get-lists (symbol)
  (let* ((data (massage symbol))
         (type (datatype data))
         (id (second data))
         (n (caar
             (query "select count(*) from lists")))
         (results nil))
    (loop for k from 1 upto n
          do (let ((present
                    (query (concatenate
                            'string
                            "select offset from list"
                            (format nil "~A" k)
                            " where ((code = "
                            (format nil "~A" type)
                            ") and (ref = "
                            (format nil "~A" id)
                            "));"))))
               (when present
                 ;; bit of a problem if there are multiple
                 ;; entries of that item on the given
                 ;; list.
                 (setq results (cons (list 0 k present)
                                     results)))))
    results))
Note 4.20 (On ‘save-to-list’).

Record ‘symbol’ on list named ‘name’.

(defun save-to-list (symbol name)
  (let* ((data (massage symbol t))
         (string-id (string-to-id name))
         (k (select-one [id]
                        :from [lists]
                        :where [= [heading] string-id]))
         (tablek (concatenate 'string
                              "list" (format nil "~A" k))))
    (insert-records :into (sql-expression :table tablek)
                    :attributes '(code ref)
                    :values `(,(first data) ,(second data)))))

Lookup by id or coordinates

Note 4.21 (The data format that’s best for Lisp).

It is a reasonable question to ask whether or not an item’s id should be considered part of that item’s defining data when that data is no longer in the database. For the functions defined here, the id is an input, and so by default I’m not including it in the output here, because it is already known. However, for functions like ‘triples-given-beginning’ (See Note 4.35), the id is not part of the known data, and so it is returned. Therefore I am providing the ‘retain-id’ flag here, for cases where output should be consistent with that of these other functions.

(defun string-lookup (id &optional retain-id)
  (let ((ret (select-one [text]
                         :from [strings]
                         :where [= [id] id])))
    (if retain-id
        (list id ret)
        ret)))

(defun triple-lookup (id &optional retain-id)
  (let ((ret (select-one [code1] [ref1]
                         [code2] [ref2]
                         [code3] [ref3]
                         :from [triples]
                         :where [= [id] id])))
    (if retain-id
        (cons id ret)
        ret)))

(defun list-lookup (id &optional retain-id)
  (let ((ret (select-one [heading]
                         :from [lists]
                         :where [= [id] id])))
    (if retain-id
        (list id ret)
        ret)))
Note 4.22 (Succinct idioms for following pointers).

Here are some variants on the functions above which save us from needing to extract the id of the item from its coordinates.

(defun string-contents (coords)
  (string-lookup (second coords)))

(defun place-contents (coords)
  (place-lookup (second coords)))

(defun triple-contents (coords)
  (triple-lookup (second coords)))
Note 4.23 (Switchboard).

Even more succinctly, one function that can get the object indicated by any set of coordinates.

(defun switchboard (coords)
  (cond ((eq (first coords) 0)
         (string-contents coords))
        ((eq (first coords) 1)
         (place-contents coords))
        ((eq (first coords) 2)
         (triple-contents coords))))
Note 4.24 (Anti-pasti).

The readability of this code could perhaps be improved if we used functions like ‘switchboard’ more frequently. (More to the point, it seems it’s not currently used.) In particular, it would be nice if we could sweep idioms like `(2 ,(car triple)) under the rug.


4.4 Queries

Note 4.25 (The use of views).

It is easy enough to select those triples which match simple data, e.g., those triples which have the same beginning, middle, or end, or any combination of these. It is a little more complicated to find items that match criteria specified by several different triples; for example, to find all the books by Arthur C. Clarke that are also works of fiction.

Suppose our collection of triples contains a portion as follows:

Profiles of the Future is a book
2001: A Space Odyssey is a book
Ender’s Game is a book
Profiles of the Future has genre non-fiction
2001: A Space Odyssey has genre fiction
Ender’s Game has genre fiction
Profiles of the Future has author Arthur C. Clarke
2001: A Space Odyssey has author Arthur C. Clarke
Ender’s Game has author Orson Scott Card

One way to solve the given problem would be to find those items that are written by Arthur C. Clarke (* “has author” “Arthur C. Clarke”), that are books (* “is a” “book”), and that are classified as fiction (* “has genre” “fiction”). We are looking for items that match all of these conditions.

Our implementation strategy is: collect the items matching each criterion into a view, then join these views. (See the function ‘satisfy-conditions’, Note 4.39.)

If we end up working with large queries and a lot of data, this use of views may not be an efficient way to go – but we’ll cross that bridge when we come to it.
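The view-then-join strategy can be sketched with SQLite from Python (subjects, verbs, and objects stored directly as text for brevity; Arxana itself goes through CLSQL and stores coordinates):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (beg TEXT, mid TEXT, fin TEXT)")
db.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("Profiles of the Future", "is a", "book"),
    ("2001: A Space Odyssey", "is a", "book"),
    ("Ender's Game", "is a", "book"),
    ("Profiles of the Future", "has genre", "non-fiction"),
    ("2001: A Space Odyssey", "has genre", "fiction"),
    ("Ender's Game", "has genre", "fiction"),
    ("Profiles of the Future", "has author", "Arthur C. Clarke"),
    ("2001: A Space Odyssey", "has author", "Arthur C. Clarke"),
    ("Ender's Game", "has author", "Orson Scott Card"),
])

# One view per criterion ...
db.execute("CREATE VIEW books AS SELECT beg FROM t "
           "WHERE mid = 'is a' AND fin = 'book'")
db.execute("CREATE VIEW clarke AS SELECT beg FROM t "
           "WHERE mid = 'has author' AND fin = 'Arthur C. Clarke'")
db.execute("CREATE VIEW fiction AS SELECT beg FROM t "
           "WHERE mid = 'has genre' AND fin = 'fiction'")

# ... then join the views: only items matching all three conditions survive.
rows = db.execute("SELECT books.beg FROM books "
                  "JOIN clarke ON books.beg = clarke.beg "
                  "JOIN fiction ON books.beg = fiction.beg").fetchall()
print(rows)  # [('2001: A Space Odyssey',)]
```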

Note 4.26 (Search queries).

In Note A.5 et seq., we give some instructions on how to set up the Sphinx search engine to work with Arxana. However, a much tighter integration of Sphinx into Arxana is possible, and will be coming soon.

(in-package arxana)


Note 4.27 (On ‘print-system-object’).

The function ‘print-system-object’ bears some resemblance to ‘massage’, but is for printing instead, and therefore has to be recursive (because triples and places can point to other system objects, printing can be a long and drawn-out ordeal).

(defun print-system-object (data &optional components)
  (cond
    ;; just return strings
    ((stringp data)
     data)
    ;; printing from coordinates (code, ref)
    ((and (listp data)
          (equal (length data) 2))
     ;; we'll need some hack to deal with
     ;; elements-of-theories, which, right now, are two
     ;; elements long but are not (code, ref) pairs but
     ;; rather (local_id, ref) pairs, or maybe actually if
     ;; we take context into consideration, they're
     ;; actually (k, table, local_id, ref) quadruplets.
     ;; Obviously with *that* data we can translate to
     ;; (code, ref).  On the other hand, if we *don't*
     ;; take it into consideration, we probably can't do
     ;; much of anything.  So we should be careful to be
     ;; aware of just what sort of information we're
     ;; passing around.
     (cond ((equal (first data) 0)
            (string-lookup (second data)))
           ((equal (first data) 1)
            (print-system-object
             (place-lookup (second data) t)))
           ((equal (first data) 2)
            (let ((triple (triple-lookup (second data) t)))
              (if components
                  (list (print-beginning triple)
                        (print-middle triple)
                        (print-end triple))
                  (concatenate 'string
                               (format nil "T~a[" (second data))
                               (print-beginning triple) "."
                               (print-middle triple) "."
                               (print-end triple) "]"))))
           ((equal (first data) 3)
            (concatenate 'string "List printing not implemented yet."))))
    ;; place
    ((and (listp data)
          (equal (length data) 3))
     (concatenate 'string
                  (format nil "P~a|" (first data))
                  (print-system-object (cdr data)) "|"))
    ;; triple
    ((and (listp data)
          (equal (length data) 7))
     (if components
         (list (print-beginning data)
               (print-middle data)
               (print-end data))
         (concatenate 'string
                      (format nil "T~a[" (first data))
                      (print-beginning data) "."
                      (print-middle data) "."
                      (print-end data) "]")))
    (t nil)))

(defun print-beginning (triple)
  (print-system-object (isolate-beginning triple)))

(defun print-middle (triple)
  (print-system-object (isolate-middle triple)))

(defun print-end (triple)
  (print-system-object (isolate-end triple)))
Note 4.28 (Depth).

If we are going to have complicated recursive references, our printer, and anything else that gives the system some semantics, should come with some sort of “layers” switch that can be used to limit the amount of recursion we do in any given computation.
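A depth-limited printer might look like this (Python sketch; `objects` and all names here are hypothetical stand-ins for database lookups):

```python
# Recursive (kind, ref) references, rendered with a recursion budget.
objects = {
    ("string", 1): "alpha",
    ("string", 2): "points at",
    ("triple", 1): (("string", 1), ("string", 2), ("triple", 2)),
    ("triple", 2): (("string", 1), ("string", 2), ("string", 1)),
}

def print_object(coords, depth=3):
    """Render `coords`, recursing at most `depth` levels."""
    if depth == 0:
        return "..."                      # budget exhausted: elide
    kind, ref = coords
    value = objects[coords]
    if kind == "string":
        return value
    # a triple: render each component with a smaller budget
    return "[" + ".".join(print_object(c, depth - 1) for c in value) + "]"

print(print_object(("triple", 1), depth=2))
```

With a small budget the nested triple is elided to dots; with a larger one it is rendered in full.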

Note 4.29 (Printing objects as they appear in Lisp).

With the following functions we provide facilities for printing an object, either from its id or from the expanded form of the data that represents it in Lisp. (This is one good reason to have one standard form for this data; compare Note 4.21. These functions assume that the id is part of what’s printed, so if using functions like ‘triple-lookup’ to retrieve data for printing, you’ll have to graft the id back on before printing with these functions.)

Note 4.30 (Printing theories).

We’ll want to both print all of the content of a theory, and print from the theory in a more limited way. (Perhaps we get the second item for free, already?)

(defun print-string (string &optional components)
  (print-system-object string components))

(defun print-place (place &optional components)
  (print-system-object place components))

(defun print-triple (triple &optional components)
  (print-system-object triple components))

(defun print-string-from-id (id &optional components)
  (print-system-object (list 0 id) components))

(defun print-place-from-id (id &optional components)
  (print-system-object (list 1 id) components))

(defun print-triple-from-id (id &optional components)
  (print-system-object (list 2 id) components))
Note 4.31 (Printing some stuff but not other stuff).

These functions are good for printing lists as they come out of the database. See Note 4.34 on printing strings.

(defun print-strings (strings)
  (mapcar 'second strings))

(defun print-places (places &optional components)
  (mapcar (lambda (item)
            (print-system-object item components))
          places))

(defun print-triples (triples &optional components)
  (mapcar (lambda (item)
            (print-system-object item components))
          triples))

(defun print-theories (theories &optional components)
  (mapcar (lambda (item)
            (print-system-object item components))
          theories))
Note 4.32 (Printing everything in each table).

These functions collect human-readable versions of everything in each table. Notice that ‘all-strings’ is written differently.

(defun all-strings ()
  (mapcar 'second (select [*] :from [strings])))

(defun all-places ()
  (mapcar 'print-system-object
          (select [*] :from [places])))

(defun all-triples ()
  (mapcar 'print-system-object
          (select [*] :from [triples])))

(defun all-theories ()
  (mapcar 'print-system-object
          (select [*] :from [theories])))
Note 4.33 (Printing on particular dimensions).

One possible upgrade to the printing functions would be a built-in way to “curry” the printout – for example, to print just the source nodes from a list of triples. Of course, it is also possible to do this sort of processing in Lisp after the printout has been made; the point is that it is presumably more efficient to retrieve and format only the data we’re actually looking for.

Note 4.34 (Strings and ids).

Unlike other objects, strings don’t get printed with their ids. We should probably provide an option to print with ids (this could be helpful for subsequent work with the strings in question; on the other hand, since strings are kept unique, we can immediately exchange a string and its id, so I’m not sure an explicit “option” is necessary).

Functions that establish basic graph structure

Note 4.35 (Thinking about graph-like data).

Here we have in mind one or more objects (e.g. a particular source and sink) that are associated with potentially any number of triples (e.g. all the possible middles running between these two identified objects). These functions establish various forms of locality or neighborhood within the data.

The results of such queries can be optionally cached in a view, which is useful for further processing (cf. 4.39).

These functions take input in the form of strings and/or coordinates (cf. Note 4.12).

(defun triples-given-beginning (node &optional view)
  "Get triples outbound from the given NODE.  Optional
argument VIEW causes the results to be selected into a
view with that name."
  (let ((data (massage node))
        (window (or view "internal-view"))
        ret)
    (when data
      ;; `create-view' (CLSQL) is assumed here
      (create-view window
        :as (select [*]
                    :from [triples]
                    :where [and [= [code1] (first data)]
                                [= [ref1] (second data)]]))
      (setq ret (select [*] :from window))
      (unless view
        (drop-view window)))
    ret))

(defun triples-given-end (node &optional view)
  "Get triples inbound into NODE.  Optional argument VIEW
causes the results to be selected into a view with
that name."
  (let ((data (massage node))
        (window (or view "internal-view"))
        ret)
    (when data
      ;; `create-view' (CLSQL) is assumed here
      (create-view window
        :as (select [*]
                    :from [triples]
                    :where [and [= [code3] (first data)]
                                [= [ref3] (second data)]]))
      (setq ret (select [*] :from window))
      (unless view
        (drop-view window)))
    ret))

(defun triples-given-middle (edge &optional view)
  "Get the triples that run along EDGE.  Optional argument
VIEW causes the results to be selected into a view
with that name."
  (let ((data (massage edge))
        (window (or view "internal-view"))
        ret)
    (when data
      ;; `create-view' (CLSQL) is assumed here
      (create-view window
        :as (select [*]
                    :from [triples]
                    :where [and [= [code2] (first data)]
                                [= [ref2] (second data)]]))
      (setq ret (select [*] :from window))
      (unless view
        (drop-view window)))
    ret))

(defun triples-given-middle-and-end (edge node &optional view)
  "Get the triples that run along EDGE into NODE.
Optional argument VIEW causes the results to be
selected into a view with that name."
  (let ((edgedata (massage edge))
        (nodedata (massage node))
        (window (or view "internal-view"))
        ret)
    (when (and edgedata nodedata)
      ;; `create-view' (CLSQL) is assumed here
      (create-view window
        :as (select [*]
                    :from [triples]
                    :where [and [= [code2] (first edgedata)]
                                [= [ref2] (second edgedata)]
                                [= [code3] (first nodedata)]
                                [= [ref3] (second nodedata)]]))
      (setq ret (select [*] :from window))
      (unless view
        (drop-view window)))
    ret))

(defun triples-given-beginning-and-middle (node edge
                                           &optional view)
  "Get the triples that run from NODE along EDGE.
Optional argument VIEW causes the results to be selected
into a view with that name."
  (let ((nodedata (massage node))
        (edgedata (massage edge))
        (window (or view "internal-view"))
        ret)
    (when (and nodedata edgedata)
      ;; `create-view' (CLSQL) is assumed here
      (create-view window
        :as (select [*]
                    :from [triples]
                    :where [and [= [code1] (first nodedata)]
                                [= [ref1] (second nodedata)]
                                [= [code2] (first edgedata)]
                                [= [ref2] (second edgedata)]]))
      (setq ret (select [*] :from window))
      (unless view
        (drop-view window)))
    ret))

(defun triples-given-beginning-and-end (node1 node2
                                        &optional view)
  "Get the triples that run from NODE1 to NODE2.  Optional
argument VIEW causes the results to be selected
into a view with that name."
  (let ((node1data (massage node1))
        (node2data (massage node2))
        (window (or view "internal-view"))
        ret)
    (when (and node1data node2data)
      ;; `create-view' (CLSQL) is assumed here
      (create-view window
        :as (select [*]
                    :from [triples]
                    :where [and [= [code1] (first node1data)]
                                [= [ref1] (second node1data)]
                                [= [code3] (first node2data)]
                                [= [ref3] (second node2data)]]))
      (setq ret (select [*] :from window))
      (unless view
        (drop-view window)))
    ret))

;; This one uses `select-one' instead of `select'
(defun triple-exact-match (node1 edge node2 &optional view)
  "Get the triples that run from NODE1 along EDGE to
NODE2.  Optional argument VIEW causes the results to be
selected into a view with that name."
  (let ((node1data (massage node1))
        (edgedata (massage edge))
        (node2data (massage node2))
        (window (or view "internal-view"))
        ret)
    (when (and node1data edgedata node2data)
      ;; `create-view' (CLSQL) is assumed here
      (create-view window
        :as (select [*]
                    :from [triples]
                    :where [and [= [code1] (first node1data)]
                                [= [ref1] (second node1data)]
                                [= [code2] (first edgedata)]
                                [= [ref2] (second edgedata)]
                                [= [code3] (first node2data)]
                                [= [ref3] (second node2data)]]))
      (setq ret (select-one [*] :from window))
      (unless view
        (drop-view window)))
    ret))
Note 4.36 (Becoming flexible about a string’s status).

One possible upgrade would be to provide versions of these functions that will flexibly accept either a string or a “placed string” as input (since frequently we’re interested in content of that sort; see 5.17).

Finding places that satisfy some property

Note 4.37 (On ‘get-places-subject-to-constraint’).

Like ‘get-places’ (Note LABEL:get-places), but this time taking an extra condition of the form (A B C) where exactly one of A, B, and C is ‘nil’. We test each of the candidate places in place of this ‘nil’, to see if a triple matching that criterion exists.

(defun get-places-subject-to-constraint (symbol condition)
  (let ((candidate-places (get-places symbol))
        accepted-places)
    (dolist (place candidate-places)
      (let ((filled-condition
             (map 'list (lambda (elt) (or elt
                                          `(1 ,place)))
                  condition)))
        (when (apply 'triple-relaxed-match
                     filled-condition)
          (setq accepted-places
                (cons place accepted-places)))))
    accepted-places))


Note 4.38 (Caution: compatibility with theories?).

For the moment, I’m not sure how compatible this function is with the theories apparatus we’ve established, or with the somewhat vaguer notion of trans-theory questions or concerns. Global queries should work just fine, but theory-local questions may need some work. Before getting into compatibility of these questions with the theory apparatus, I want to make sure that apparatus is working properly. Note that the questions here do rely on functions for graph-like thinking (Note 4.35 et seq.), and it would certainly make sense to port to “subgraphs” as represented by theories.

Note 4.39 (On ‘satisfy-conditions’).

This function finds the items which match constraints. Constraints take the form (A B C), where precisely one of A, B, or C should be ‘nil’, and any of the others can be either input suitable for ‘massage’, or ‘t’. The ‘nil’ entry stands for the object we’re interested in. Any ‘t’ entries are wildcards.

The first thing that happens as the function runs is that views are established exhibiting each group of triples satisfying each predicate. The names of these views are then massaged into a large SQL query. (It is important to “typeset” all of this correctly for our SQL ‘query’.) Finally, once that query has been run, we clean up, dropping all of the views we created.
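Before the SQL machinery below, the intended semantics of these constraints can be stated in miniature. In this sketch (Python for concreteness; all names invented), ‘None’ marks the slot we are solving for and ‘True’ is a wildcard:

```python
def satisfy(triples, constraints):
    """Return the values that can fill the (single) None slot
    of every constraint; True entries are wildcards."""
    def matches(triple, constraint, candidate):
        for got, want in zip(triple, constraint):
            if want is True:
                continue              # wildcard slot
            if want is None:
                want = candidate      # the slot we solve for
            if got != want:
                return False
        return True

    hole = constraints[0].index(None)   # position of the None slot
    candidates = {t[hole] for t in triples}
    return {x for x in candidates
            if all(any(matches(t, c, x) for t in triples)
                   for c in constraints)}

triples = [("fred", "has content", "..."),
           ("fred", "in", "arxana.tex"),
           ("barney", "has content", "...")]
print(satisfy(triples, [(None, "has content", True),
                        (None, "in", "arxana.tex")]))  # -> {'fred'}
```

The real implementation does the same filtering inside the database, one view per constraint, rather than scanning triples in Lisp.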

(defun satisfy-conditions (constraints)
  (let* ((views (generate-views constraints))
         (formatted-list-of-views (format-views views))
         (where-condition (generate-where-condition
                           views constraints))
         ;; Let's see what the query is, first of all.
         ;; (the string-concatenation scaffolding here is
         ;; an assumption; `query' is CLSQL's)
         (ret (query (concatenate 'string
                "select v1.code1, v1.ref1, "
                              "v1.code2, v1.ref2, "
                              "v1.code3, v1.ref3 "
                "from " formatted-list-of-views
                "where " where-condition))))
    (mapc (lambda (name) (drop-view name)) views)
    ret))
Note 4.40 (Subroutines for ‘satisfy-conditions’).

The functions below produce bits and pieces of the SQL query that ‘satisfy-conditions’ submits. The point of ‘generate-views’ is to create a series of views centered on the term(s) we’re interested in (the ‘nil’ slots in each submitted constraint). With ‘generate-where-condition’, we insist that all of these interesting terms should, in fact, be equal to one another.

Note 4.41 (On ‘generate-views’).

In a ‘cond’ form, for each constraint we must select the appropriate function to generate the view; at the very end of the cond form, we spit out the viewname (for ‘mapcar’ to add to the list of views).

(defun generate-views (constraints)
  ;; the per-branch calls to the triples-given-* functions
  ;; are assumed, matching each constraint shape
  (let ((counter 0))
    (mapcar
     (lambda (constraint)
       (setq counter (1+ counter))
       (let ((viewname (format nil "v~a" counter)))
         (cond
          ;; A * ? or A ? *
          ((or (and (eq (second constraint) t)
                    (eq (third constraint) nil))
               (and (eq (second constraint) nil)
                    (eq (third constraint) t)))
           (triples-given-beginning
            (first constraint) viewname))
          ;; * B ? or ? B *
          ((or (and (eq (first constraint) t)
                    (eq (third constraint) nil))
               (and (eq (first constraint) nil)
                    (eq (third constraint) t)))
           (triples-given-middle
            (second constraint) viewname))
          ;; * ? C or ? * C
          ((or (and (eq (first constraint) t)
                    (eq (second constraint) nil))
               (and (eq (first constraint) nil)
                    (eq (second constraint) t)))
           (triples-given-end
            (third constraint) viewname))
          ;; ? B C
          ((eq (first constraint) nil)
           (triples-given-middle-and-end
            (second constraint)
            (third constraint) viewname))
          ;; A ? C
          ((eq (second constraint) nil)
           (triples-given-beginning-and-end
            (first constraint)
            (third constraint) viewname))
          ;; A B ?
          ((eq (third constraint) nil)
           (triples-given-beginning-and-middle
            (first constraint)
            (second constraint) viewname)))
         viewname))
     constraints)))

(defun format-views (views)
  ;; (the `concatenate' accumulation is an assumption)
  (let ((formatted-list-of-views ""))
    (mapc (lambda (view)
            (setq formatted-list-of-views
                  (concatenate 'string formatted-list-of-views
                               (format nil "~a," view))))
          (butlast views))
    (setq formatted-list-of-views
          (concatenate 'string formatted-list-of-views
                       (format nil "~a " (car (last views)))))))

(defun generate-where-condition (views conditions)
  (let ((where-condition "")
        (c (select-component (first conditions))))
    ;; there should be one less "=" condition than there
    ;; are things to compare; until we get to the last
    ;; view, everything is joined together by an `and'.
    ;; -- this needs to consider (map over) both `views'
    ;; and `conditions'.
    (loop for i from 1 upto (- (length views) 2)
          do (let ((compi (select-component (nth i conditions)))
                   (viewi (nth i views)))
               (setq where-condition
                     (concatenate 'string where-condition
                      "(v1.code" c " = " viewi ".code" compi ") and "
                      "(v1.ref" c " = " viewi ".ref" compi ") and "))))
    ;; the last view gets no trailing `and'
    (let ((viewn (nth (1- (length views)) views))
          (compn (select-component
                  (nth (1- (length views)) conditions))))
      (concatenate 'string where-condition
       "(v1.code" c " = " viewn ".code" compn ") and "
       "(v1.ref" c " = " viewn ".ref" compn ")"))))

(defun select-component (condition)
  (cond ((eq (first condition) nil) "1")
        ((eq (second condition) nil) "2")
        ((eq (third condition) nil) "3")))
Note 4.42 (Even more complicated logic).

In order to conveniently manage complex queries, it would be nice if we could store the results of earlier queries into views, so that we can combine several such views for further processing.

5 Emacs-side

5.1 The interface to Common Lisp

Note 5.1 (On ‘Defun’).

A way to define Elisp functions whose bodies are evaluated by Common Lisp. Trust me, this is a good idea. Besides, it exhibits some fascinating backquote and comma tricks. But be careful: this definition of ‘Defun’ did not work on Emacs version 21.

If we want to be able to feed a standard arglist to Common Lisp (with optional elements and so forth), we’d have to define how those arguments are handled here!

(defmacro Defun (name arglist &rest body)
  (declare (indent defun))
  ‘(defun ,name ,arglist
     (let* ((outbound-string
              (format "%S"
                        (append (list ’lambda ’,arglist)
                        (lambda (arg) ‘’,arg)
                                 (lambda (testelt)
                                   (eq testelt
              ;; we now specify the right package!
               (list ’swank:eval-and-grab-output
       (process-slime-output returned-string))))
Note 5.2 (On ‘process-slime-output’).

This should downcase all constituent symbols, but for expediency I’m just downcasing ‘NIL’ at the moment. Will come back for more testing and downcasing shortly. (I suspect the general case is just about as easy as what happens here.)

(defun process-slime-output (str)
  (condition-case nil
      (let ((read-value (read str)))
        (if (symbolp read-value)
            (read (downcase str))
          (nsubst nil 'NIL read-value)))
    (error str)))
(defun translate-emacs-syntax-to-common-syntax (str)
  (with-temp-buffer
    (insert str)
    (dolist (swap '(("(\\` " "`")
                    ("(\\, " ",")))
      (goto-char (point-min))
      (while (search-forward (first swap) nil t)
        ;; delete the closing paren of the printed
        ;; backquote form (`forward-sexp' is assumed here)
        (goto-char (match-beginning 0))
        (forward-sexp)
        (delete-char -1)
        ;; then swap out the opening delimiter itself
        (goto-char (match-beginning 0))
        (delete-region (match-beginning 0)
                       (match-end 0))
        (insert (second swap))))
    (buffer-substring-no-properties (point-min)
                                    (point-max))))
Note 5.3 (Interactive ‘Defun’).

Note that an improved version of this macro would allow me to specify that some Defuns are interactive and some are not. This could be done by examining the submitted body, and adjusting the defun if its car is an ‘interactive’ form. Most of the Defuns will be things that people will want to use interactively, so making this change would probably be a good idea. What I’m doing in the meantime is just writing two functions each time I need to make an interactive function that accesses Common Lisp data!

Note 5.4 (Common Lisp evaluation of code chunks).

Another potentially beneficial and simple approach is to write a form like ‘progn’ that evaluates its contents in Common Lisp. This saves us from having to rewrite all of the ‘defun’ facilities into ‘Defun’ (e.g. interactivity). But… the problem with this is that Common Lisp doesn’t know the names of all the variables that are defined in Emacs! I’m not sure how to get the values of these variables substituted in before the call to Common Lisp is made.

Note 5.5 (Debugging ‘Defun’).

In order to make debugging easier, it might be nice to have an option to make the code that is supposed to be evaluated by Defun actually print on the REPL instead of being processed through an invisible back-end. There could be a couple of different ways to do that: one would be to simulate just what a user might do; the other would be a happy medium between that and what we’re doing now: just put our computery auto-generated code on the REPL and evaluate it. (To some extent, I think the *slime-events* buffer captures this information, but it is not particularly easy to read.)

Note 5.6 (Interactive Common Lisp?).

Suppose we set up some kind of interactive environment in Common Lisp; how would we go about passing this environment along to a user interacting via Emacs? (Note that SLIME’s presentation of the debugging loop is one good example.)

5.2 Database interaction

Note 5.7 (The ‘article’ function).

You can use this function to create an article with a given name and contents. If you like, you can also file it under a heading.

(Defun article (name contents &optional heading place)
  ;; `place' as a fourth optional argument is an assumption
  (let ((coordinates (add-triple name
                                 "has content"
                                 contents)))
    (when heading (add-triple coordinates "in" heading))
    (when place (if (numberp place)
                    (put-in-place coordinates place)
                  (put-in-place coordinates)))
    coordinates))
Note 5.8 (The ‘scholium’ function).

You can use this function to link annotations to objects. As with the ‘article’ function, you can optionally file the connection under a given heading (cf. Note 5.7).

(Defun scholium (beginning link end &optional heading place)
  ;; `place' as a fifth optional argument is an assumption
  (let ((coordinates (add-triple beginning link end)))
    (when heading (add-triple coordinates "in" heading))
    (when place (if (numberp place)
                    (put-in-place coordinates place)
                  (put-in-place coordinates)))
    coordinates))
Note 5.9 (Uses of coordinates).

Note that, if desired, you can feed input of the form ’(code ref) into ‘article’ and ‘scholium’. It’s convenient to do any further processing of the object we’ve created while we still have hold of the coordinates returned by ‘add-triple’ (cf. Note 5.25 for an example).

Note 5.10 (Finding all the members of a list by type?).

We just narrow according to type.

Note 5.11 (On ‘get-article’).

Get the contents of the article named ‘name’. Optional argument ‘heading’ lets us find the position under that heading which holds the name, and use that position instead of the name itself.

We do not yet deal well with the ambiguous case in which several positions corresponding to the given name appear under the same heading.

Note also that out of the data returned by ‘triples-given-beginning-and-middle’, we should pick the (hopefully just) ONE that corresponds to the given heading.

This means we need to pick over the list of triples returned here, and test each one to see if it is in our heading. As to WHY there might be more than one “has content” for a place that we know to be in our heading… I’m not sure. I guess we can go with the assumption that there is just one, for now.

(Defun get-article (name &optional heading)
  (let* ((place-pseudonyms
          (if heading
              (get-places-subject-to-constraint
               name `(nil "in" ,heading))
            (get-places name)))
         (goes-by (cond
                   ((eq (length place-pseudonyms) 1)
                    `(1 ,(car place-pseudonyms)))
                   ;; per the Note above, we assume at most
                   ;; one matching place for now
                   ((not heading) name)
                   (t nil))))
    (when goes-by
      ;; it might be nice to also return `goes-by'
      ;; so we can access the appropriate place again.
      (third (print-triple
              (car (triples-given-beginning-and-middle
                    goes-by "has content")))))))
Note 5.12 (On ‘get-names’).

This function simply gets the names of articles that have names – in other words, every triple built around the “has content” relation.

(Defun get-names (&optional heading)
  (let ((conditions (list (list nil "has content" t))))
    (when heading
      (setq conditions
            (append conditions
                    (list (list nil "in" heading)))))
    (mapcar
     (lambda (place-or-string)
       (cond
        ;; place case
        ((eq (first place-or-string) 1)
         ;; the wrapping of `place-lookup' is an assumption
         (print-system-object
          (place-lookup (second place-or-string))))
        ;; string case
        ((eq (first place-or-string) 0)
         (print-system-object place-or-string))))
     (mapcar
      (lambda (triple)
        (isolate-beginning triple))
      (satisfy-conditions conditions)))))
Note 5.13 (Contrasting cases).

Consider the difference between

(? “has author” “Arthur C. Clarke”)
(? “has genre” “fiction”)

and

(name “has content” *)
(name “in” “heading”)

where, in the latter case, we know who we’re talking about, and we just want to limit the list of items generated by the “*” by the second condition. This should help illustrate the difference between ‘get-names’ (which is making a general query) and ‘get-article’ (which already knows the name of a specific article), and the logic that they use.

Note 5.14 (Placing items from Emacs).

We periodically need to place items from within Emacs. The function ‘place-item’ is a wrapper for ‘put-in-place’ that makes this possible (it also provides the user with an extra option, namely to put the place itself under a given heading).

Notice that when the symbol is placed in some pre-existing place (which can only happen when ‘id’ is not nil), that place may already be under some other heading. We will ignore this case for now (since it seems that putting objects into new places will be the preferred action), but later we will have to look at what to do in this other case.

(Defun place-item (symbol &optional id heading)
  (let ((coordinates (put-in-place symbol id)))
    (when heading (add-triple coordinates "in" heading))
    coordinates))
Note 5.15 (Automatic classifications).

It will presumably make sense to offer increasingly “automatic” classifications for new objects. At this point, we’ve set things up so that the user can optionally supply the name of one heading that their new object is a part of.

It may make more sense to allow an ‘&rest theories’ argument, and add the triple to all of the specified theories. This would require modifying ‘Defun’ to accommodate the ‘&rest’ idiom; see Note 5.1.

Note 5.16 (Postconditions and provenance).

After adding something to the database, we may want to do something extra; perhaps generating provenance information, perhaps checking or enforcing database consistency, or perhaps running a hook that causes some update in the frontend (cf. Note 3.8). Provisions of this sort will come later, as will short-hand convenience functions for making particularly common complex entries.

5.3 Importing LaTeX documents

Note 5.17 (Importing sketch).

The code in this section imports a document as a collection of (sub-)sections and notes. It gathers the sections, sub-sections, and notes recursively and records their content in a tree whose nodes are places (Note LABEL:places) and whose links express the “component-of” relation described in Note 5.19.

This representation lets us see the geometric, hierarchical structure of the document we’ve imported. It exemplifies a general principle: geometric data should be represented by relationships between places, not direct relationships between strings. This is because “the same” string often appears in “different” places in any given document (e.g. a paper’s many sub-sections titled “Introduction” will not all have the same content).

What goes into the places is in some sense arbitrary. The key is that whatever is in or attached to these places must tell us everything we need to know about the part of the document associated with that place (e.g. in the case of a note, its title and contents). That’s over and above the structural links which say how the places relate to one another. Finally, all of these places and structural links will be added to a heading that represents the document as a whole.

A natural convention we’ll use will be to put the name of any document component that’s associated with a given place into that place, and add all other information as annotations.
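As a rough illustration of this convention (Python for concreteness; all names invented): every document component gets a fresh place that holds its name, and structure is recorded between places, so two sections that share a name remain distinct:

```python
import itertools

fresh_id = itertools.count(1)
places = {}        # place id -> the name stored in that place
triples = []       # (parent place, component index, child place)

def new_place(name):
    pid = next(fresh_id)
    places[pid] = name
    return pid

doc = new_place("paper")
for index, title in enumerate(["Introduction", "Model", "Introduction"], 1):
    section = new_place(title)      # same name, different place
    triples.append((doc, index, section))

# Two sections share a name but occupy distinct places:
print(len({p for (_, _, p) in triples}))  # -> 3
```

Annotations (titles, contents, labels) would then attach to the places rather than to the bare name strings.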

Note 5.18 (Ordered versus unordered data).

The code in this section is an example of one way to work with ordered data (i.e. LaTeX documents are not just hierarchical, but the elements at each level of the hierarchy are also ordered).

Since many artifacts are hierarchical (e.g. Lisp code), we should try to be compatible with native methods for working with order (in the case of Lisp, feed the code into a Lisp processor and use CAR and CDR, etc.).

We can use triples such as (“rank” “1” “Fred”) and (“rank” “2” “Barney”) to talk about order. There may be some SQL techniques that would help. (FYI, order can be handled very explicitly in Elephant!)

In order to account for different orderings, we need one more piece of data – some explicit treatment of where the order is; in other words, theories. (This table illustrates the fact that a heading is not so different from “an additional triple”; indeed, the only reason to make them different is to have the extra convenience of having their elements be numbered.)

rank 1 Fred Friday
rank 2 Barney Friday
rank 1 Barney Saturday
rank 2 Fred Saturday
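The table above amounts to a per-theory lookup; in miniature (Python for concreteness; all names invented):

```python
# Each row is a rank triple plus the theory in which it holds.
ranks = [("rank", 1, "Fred",   "Friday"),
         ("rank", 2, "Barney", "Friday"),
         ("rank", 1, "Barney", "Saturday"),
         ("rank", 2, "Fred",   "Saturday")]

def ordering(theory):
    """Recover the order asserted within one theory."""
    return [who for (_, n, who, t) in sorted(ranks, key=lambda r: r[1])
            if t == theory]

print(ordering("Friday"))    # -> ['Fred', 'Barney']
print(ordering("Saturday"))  # -> ['Barney', 'Fred']
```

The same facts support both orderings at once; only the theory column tells them apart.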
Note 5.19 (The order of order).

The triples (“rank” “1” “Fred”) and (“rank” “2” “Barney”) mentioned in Note 5.18 are easy enough to read and understand; it might be more natural in some ways for us to say (“Fred” “rank” “1”) – Fred has rank 1. In this section, we’re concerned with talking about the ordered parts of a document, and (A n B) seems like an intuitive way to say “A’s nth component is B”.

Note 5.20 (It’s not overdoing it, right?).

When importing this document, we see links like the following. I hope that’s not “overdoing it”. (Take a look at Note 5.11 and Note 5.12 to see how we go about getting information out of the database.) We could get rid of one link if theories were database objects (cf. Note 8.2).

"T557[P135|Web interface|.in.arxana.tex]"
"T558[Future plans.9.P135|Web interface|]"
"T559[T558[Future plans.9.P135|Web interface|].in.arxana.tex]"
Note 5.21 (Importing in general).

We will eventually have a collection of parsers to get various kinds of documents into the system in various different ways (Note 8.12). For now, this section gives a simple way to get some sorts of LaTeX documents into the system, namely documents structured along the same lines as the document you’re reading now!

An interesting approach to parsing math documents has been undertaken in the LaTeXML project. Eventually it would be nice to get that level of detail here, too! Emacspeak is another example of a LaTeX parser that deals with large-scale textual structures as well as smaller bits and pieces.

It would probably be useful to put together some parsers for HTML and wiki code soon.

Note 5.22 (On ‘import-buffer’).

This function imports LaTeX documents, taking care of the non-recursive aspects of this operation. It imports frontmatter (everything up to the first \begin{section}), but assumes “backmatter” is trivial, and does not import it. The imported material is classified as a “document” with the same name as the imported buffer.

(defun import-buffer (&optional buffername)
    (set-buffer (get-buffer (or buffername
    (goto-char (point-min))
    (search-forward-regexp "\\\\begin{document}")
    (search-forward-regexp "\\\\section")
    (goto-char (match-beginning 0))
    ;; other links will be made in the "heading of this
    ;; document", but here we make a broader assertion.
    (scholium buffername "is a" "document")
    (scholium buffername
              "has frontmatter"
    ;;; These should maybe be scholia attached to
    ;; root-coords (below), but for some reason that
    ;; wasn’t working so well -- investigate later --
    ;; maybe it just wasn’t good to run after running
    ;; ‘import-within’.
    (let* ((root-coords (place-item buffername nil
            ’("section" "subsection" "subsubsection"))
           (current-parent buffername)
           (level-end nil)
           (sections (import-within levels))
           (index 0))
      (while sections
        (let ((coords (car sections)))
          (setq index (1+ index))
          (scholium root-coords
        (setq sections (cdr sections))))))
Note 5.23 (On ‘import-within’).

Recurse through levels of sectioning to import LaTeX code.

It would be good if we could do something about sections that contain neither subsections nor notes (for example, a preface), or, more generally, about text that is not contained in any environment (possibly that appears before any section). We’ll save things like this for another editing round!

For the moment, we’ve decided to build the document hierarchy with links that are blind to whether the kth component of a section is a note or a subsection. Children that are notes are attached in the subroutine ‘import-notes’ and those that are sections are attached in ‘import-within’. Users can find out what type of object they are looking at based on whether or not it “has content”.

Incidentally, when looking for the end of an importing level, ‘nil’ is an OK result – if this is the last section at this level and there is no subsequent section at a higher level.

(defun import-within (levels)
  (let ((this-level (car levels))
        (next-level (car (cdr levels))) answer)
    (while (re-search-forward
             "^\\\\" this-level "{\\([^}\n]*\\)}"
             "\\( +\\\\label{\\)?"
            level-end t)
      (let* ((name (match-string-no-properties 1))
             (at (place-item name nil buffername))
              (or (save-excursion
                     (concat "^\\\\" this-level "{.*")
                     level-end t))
              (if next-level
                  (or (progn (point)
                                (concat "^\\\\"
                                        next-level "{.*")
                                level-end t)))
             (index (let ((current-parent at))
                      (import-notes notes-end)))
             (subsections (let ((current-parent at))
                            (import-within (cdr levels)))))
        (while subsections
          (let ((coords (car subsections)))
            (setq index (1+ index))
            (scholium at
            (setq subsections (cdr subsections))))
        (setq answer (cons at answer))))
    (reverse answer)))
Note 5.24 (On ‘import-notes’).

We’re going to make the daring assumption that the “textual” portions of incoming LaTeX documents are contained in “Notes”. That assumption is true, at least, for the current document. The function returns the number of notes imported, so that ‘import-within’ knows where to start counting this section’s non-note children.

Would this same function work to import all notes from a buffer without examining its sectioning structure? Not quite, but close! (Could be a fun exercise to fix this.)

(defun import-notes (end)
  (let ((index 0))
    (while (re-search-forward (concat "\\\\begin{notate}"
                                      "{\\(.*?\\)}"
                                      "\\( +\\\\label{\\(.*?\\)}\\)?")
                              end t)
      (let* ((name
              (match-string-no-properties 1))
             (tag (match-string-no-properties 3))
             (beg (progn (forward-line 1)
                         (point)))
             (note-end (progn (search-forward-regexp
                               "\\\\end{notate}" nil t)
                              (match-beginning 0)))
             (coords (place-item name nil buffername)))
        (setq index (1+ index))
        (scholium current-parent index coords)
        ;; not in the heading
        (scholium coords
                  "has content"
                  (buffer-substring-no-properties
                   beg note-end))
        (import-code-continuations coords)))
    index))
Note 5.25 (On ‘import-code-continuations’).

This runs within the scope of ‘import-notes’, to turn the series of Lisp chunks or other code snippets that follow a given note into a scholium attached to that note. Each separate snippet becomes its own annotation.

The “conditional regexps” used here only work with Emacs version 23 or higher.

I’m noticing a problem with the way the ‘looking-at’ form behaves. It matches the expression in question, but then the match-end is reported as one character less than it is supposed to be. Maybe ‘looking-at’ is just not as good as ‘re-search-forward’? But it’s what seems easiest to use.

(defun import-code-continuations (coords)
  (let ((possible-environments
         ;; environments whose contents we treat as code attachments;
         ;; this particular list is an assumption
         (regexp-opt '("lisp" "verbatim") t)))
    (while (looking-at
            (concat "\n*?\\\\begin{"
                    possible-environments "}"))
      (let* ((beg (match-end 0))
             (environment (match-string 1))
             (end (progn (search-forward-regexp
                          (concat "\\\\end{" environment "}")
                          nil t)
                         (match-beginning 0)))
             (content (buffer-substring-no-properties
                       beg end)))
        ;; annotate the note with the snippet, then annotate
        ;; that link itself with the snippet's type
        (scholium (scholium coords
                            "has attachment"
                            content)
                  "has type"
                  environment)))))
Note 5.26 (On ‘autoimport-arxana’).

This just calls ‘import-buffer’, and imports this document into the system.

(defun autoimport-arxana ()
  (import-buffer "arxana.tex"))
Note 5.27 (Importing textual links).

Of course, it would be good to import the links that users make between articles, since then we can quickly navigate from an article to the various articles that cite that article, as well as follow the usual forward-directional links. Indeed, we should be able to browse each article within a “neighborhood” of other related articles. (We’ll need to import labels as well, of course.)
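As a sketch of how this might go, the following hypothetical helper scans a region for ‘\ref{...}’ commands and attaches a link scholium for each one found; the “cites” middle and the exact ‘scholium’ calling convention are assumptions here, not settled parts of the system.

```elisp
;; Hypothetical sketch: harvest \ref{...} links from a region and
;; record each one as a triple.  The "cites" middle is an assumption.
(defun import-textual-links (coords beg end)
  (save-excursion
    (goto-char beg)
    (while (re-search-forward "\\\\ref{\\([^}]+\\)}" end t)
      (scholium coords
                "cites"
                (match-string-no-properties 1)))))
```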

5.4 Browsing database contents

Note 5.28 (Browsing sketch).

This section facilitates browsing of documents represented with structures like those created in Section 5.3, and sets the ground for browsing other sorts of contents (e.g. collections of tasks, as in Section 6.1).

In order to facilitate general browsing, it is not enough to simply use ‘get-article’ (Note 5.11) and ‘get-names’ (Note 5.12), although these functions provide our defaults. We must provide the means to find and display different things differently – for example, a section’s table of contents will typically be displayed differently from its actual contents.

Indeed, the ability to display and select elements of document sections (Note 5.32) is basically the core browsing deliverable. In the process we develop a re-usable article selector (Note 5.30; cf. Note 6.9). This in turn relies on a flexible function for displaying different kinds of articles (Note 5.29).

Note 5.29 (On ‘display-article’).

This function takes in the name of the article to display. Furthermore, it takes optional arguments ‘retriever’ and ‘formatter’, which tell it how to look up and/or format the information for display, respectively.

Thus, either we make some statement up front (choosing our ‘formatter’ based on what we already know about the article), or we decide what to display after making some investigation of information attached to the article, some of which may be retrieved and displayed (this requires that we specify a suitable ‘retriever’ and a complementary ‘formatter’).

For example, the major mode in which to display the article’s contents could be stored as a scholium attached to the article; or we might maintain some information about “areas” of the database that would tell us up front which mode is associated with the current area. (The default is to simply insert the data with no markup whatsoever.)

Observe that this works when no heading argument is given, because in that case ‘get-article’ looks for all place pseudonyms. (But of course that won’t work well when we have multiple theories containing things with the same names, so we should get used to using the heading argument.)

(The business about requiring the data to be a sequence before engaging in further formatting is, of course, just a matter of expediency for making things work with the current dataset.)

(defun display-article
  (name &optional heading retriever formatter)
  (interactive "Mname: ")
  (let* ((data (if retriever
                   (funcall retriever name heading)
                 (get-article name heading))))
    (when (and data (sequencep data))
      (if formatter
          (funcall formatter data heading)
        (pop-to-buffer (get-buffer-create
                        "*Arxana Display*"))
        (delete-region (point-min) (point-max))
        (insert "NAME: " name "\n\n")
        (insert data)
        (goto-char (point-min))))))
Note 5.30 (An interactive article selector).

The function ‘get-names’ (Note 5.12) and similar functions can give us a collection of articles. The next few functions provide interactive functionality for moving through this collection to find the article we want to look at.

We define a “display style” that the article selector uses to determine how to display various articles. These display styles are specified by text properties attached to each option the selector provides. Similarly, when we’re working within a given heading, the relevant heading is also specified as a text property.

At selection time, these text properties are checked to determine which information to pass along to ‘display-article’.

(defvar display-style '((nil . (nil nil))))

(defun thing-name-at-point ()
  (buffer-substring-no-properties (line-beginning-position)
                                  (line-end-position)))

(defun get-display-type ()
  (get-text-property (line-beginning-position)
                     'display-style))

(defun get-relevant-heading ()
  (get-text-property (line-beginning-position)
                     'heading))

(defun arxana-list-select ()
  (interactive)
  (apply 'display-article
         (thing-name-at-point)
         (get-relevant-heading)
         (cdr (assoc (get-display-type)
                     display-style))))

(define-derived-mode arxana-list-mode fundamental-mode
  "arxana-list" "Arxana List Mode.
\\{arxana-list-mode-map}")

(define-key arxana-list-mode-map (kbd "RET")
  'arxana-list-select)
Note 5.31 (On ‘pick-a-name’).

Here ‘generate’ is the name of a function to call to generate a list of items to display, and ‘format’ is a function to put these items (including any mark-up) into the buffer from which individual items can then be selected.

One simple way to get a list of names to display would be to reuse a list that we had already produced (this would save querying the database each time). We could, in fact, store a history list of lists of names that had been displayed previously (cf. Note 5.39).

We’ll eventually want versions of ‘generate’ that provide various useful views into the data, e.g., listing all of the elements of a given section (Note 5.32).

Finding all the elements that match a given search term, whether that’s just normal text search or some kind of structured search would be worthwhile too. Upgrading the display to e.g. color-code listed elements according to their type would be another nice feature to add.

(defun pick-a-name (&optional generate format heading)
  (let ((items (if generate
                   (funcall generate)
                 (get-names heading))))
    (when items
      (set-buffer (get-buffer-create "*Arxana Articles*"))
      (toggle-read-only -1)
      (delete-region (point-min) (point-max))
      (if format
          (funcall format items)
        (mapc (lambda (item) (insert item "\n")) items))
      (toggle-read-only t)
      (goto-char (point-min))
      (pop-to-buffer (get-buffer "*Arxana Articles*")))))
Note 5.32 (On ‘display-section’).

When browsing a document, if you select a section, you should display a list of that section’s constituent elements, be they notes or subsections. The question comes up: when you go to display something, how do you know whether you’re looking at the name of a section, or the name of an article?

When you get the section’s contents out of the database (Note 5.33), each element comes with an indication of whether or not it is a note, and the formatter (Note 5.34) records this in a text property for the selector to use.

(defun display-section (name heading)
  (interactive (list (read-string
                      (concat "name (default "
                              (buffer-name) "): ")
                      nil nil (buffer-name))
                     (buffer-name)))
  ;; should this pop to the Articles window?
  (pick-a-name `(lambda ()
                  (get-section-contents ,name ,heading))
               `(lambda (items)
                  (format-section-contents items ,heading))
               heading))

(add-to-list 'display-style
             '(section . (get-section-contents
                          format-section-contents)))
Note 5.33 (On ‘get-section-contents’).

Sent by ‘display-section’ (Note 5.32) to ‘pick-a-name’ as a generator for the table of contents of the section with the given name in the given heading.

This function first finds the triples that begin with the (placed) name of the section, then checks to see which of these are in the heading of the document we’re examining (in other words, which of these links represent structural information about that document). It also looks at the items found at the end of these links to see if they are sections or notes (“noteness” is determined by their having content). The links are then sorted by their middles (which give the order these components have in the section we’re examining). After this ordering information has been used for sorting, it is deleted, and we’re left with just a list of names in the appropriate order together with an indication of their noteness.

(defun get-section-contents (name heading)
  (let (contents)
    (dolist (triple (triples-given-beginning
                     `(1 ,(resolve-ambiguity
                           (get-places name)))))
      (when (triple-exact-match
             `(2 ,(car triple)) "in" heading)
        (let* ((number (print-middle triple))
               (site (isolate-end triple))
               (noteness
                (when (triples-given-beginning-and-middle
                       site "has content")
                  t)))
          (setq contents
                (cons (list number
                            (place-contents site)
                            noteness)
                      contents)))))
    (mapcar 'cdr
            (sort contents
                  (lambda (component1 component2)
                    (< (string-to-number (car component1))
                       (string-to-number (car component2))))))))
Note 5.34 (On ‘format-section-contents’).

A formatter for document contents, used by ‘display-document’ (Note 5.35) as input for ‘pick-a-name’ (Note 5.31).

Instead of just printing the items one by one, like the default formatter in ‘pick-a-name’ does, this version adds appropriate text properties, which we determine based on the second component of each item.

(defun format-section-contents (items heading)
  ;; just replicating the default and building on that.
  (mapc (lambda (item)
          (insert (car item))
          (let* ((beg (line-beginning-position))
                 (end (1+ beg)))
            (unless (second item)
              (put-text-property beg end
                                 'display-style 'section))
            (put-text-property beg end
                               'heading heading))
          (insert "\n"))
        items))
Note 5.35 (On ‘display-document’).

When browsing a document, you should first display its top-level table of contents. (Most typically, a list of all of that document’s major sections.) In order to do this, we must find the triples that begin at the node representing this document and that are in the heading of this document. This boils down to treating the document’s root as if it were a section and using the function ‘display-section’ (Note 5.32).

(defun display-document (name)
  (interactive (list (read-string
                      (concat "name (default "
                              (buffer-name) "): ")
                      nil nil (buffer-name))))
  (display-section name name))
Note 5.36 (Work with ‘heading’ argument).

We should make sure that if we know the heading we’re working with (e.g. the name of the document we’re browsing), this information gets communicated in the background of the user’s interaction with the article selector.

Note 5.37 (Selecting from a hierarchical display).

A fancier “article selector” would be able to display several sections with nice indenting to show their hierarchical order.

Note 5.38 (Browser history tricks).

I want to put together (or put back together) something similar to the multihistoried browser that I had going in the previous version of Arxana and my Emacs/Lynx-based web browser, Nero (joe/nero.el). The basic features are: (1) forward, back, and up inside the structure of a given document; (2) switch between tabs. More advanced features might include: (3) forward and back globally across all tabs; (4) explicit understanding of paths that loop.

These sorts of features are independent of the exact details of what’s printed to the screen each time something is displayed. So, for instance, you could flip between section manifests a la Note 5.32, or between hierarchical displays a la Note 5.37, or some combination; the key thing is just to keep track in some sensible way of whatever’s been displayed!

Note 5.39 (Local storage for browsing purposes).

Right now, in order to browse the contents of the database, you need to query the database every time. It might be handy to offer the option to cache names of things locally, and only sync with the database from time to time. Indeed, the same principle could apply in various places; however, it may also be somewhat complicated to set up. Using two systems for storage, one local and one permanent, is certainly more heavy-duty than just using one permanent storage system and the local temporary display. However, one thing in favor of local storage systems is that that’s what I used in the previous prototype of Arxana – so some code already exists for local storage! (Caching the list of names we just made a selection from would be one simple expedient, see Note 5.31.)
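A minimal version of the caching idea might look like the following sketch; ‘arxana-names-cache’ is a hypothetical variable, and the sync policy (refresh only on demand) is just one of several possibilities.

```elisp
;; Hypothetical sketch of local caching for `get-names' results.
(defvar arxana-names-cache nil
  "Locally cached list of names, to avoid repeated database queries.")

(defun cached-get-names (&optional heading refresh)
  "Return cached names, querying the database only when needed.
With REFRESH non-nil, re-query the database and update the cache."
  (when (or refresh (null arxana-names-cache))
    (setq arxana-names-cache (get-names heading)))
  arxana-names-cache)
```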

Note 5.40 (Hang onto absolute references).

Since ‘get-article’ (Note 5.11) translates strings into their “place pseudonyms”, we may want to hang onto those pseudonyms, because they are, in fact, the absolute references to the objects we end up working with. In particular, they should probably go into the text-property background of the article selector, so it will know right away what to select!

5.5 Exporting LaTeX documents*

Note 5.41 (Roundtripping).

The easiest test is: can we import a document into the system and then export it again, and find it unchanged?
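That test might eventually be automated along these lines; ‘export-document’ does not exist yet, so this is only a sketch of the intended check.

```elisp
;; Sketch only: `export-document' is a hypothetical exporter that
;; writes the named document back out as LaTeX.
(defun roundtrip-test (file)
  "Import FILE, export it again, and show a diff of the two versions."
  (import-buffer file)
  (export-document file "/tmp/arxana-roundtrip.tex")
  (shell-command
   (format "diff %s %s" file "/tmp/arxana-roundtrip.tex")))
```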

Note 5.42 (Data format).

We should be able to stably import and export a document, as well as export any modifications to the document that were generated within Arxana. This means that the exporting functions will have to read the data format that the importing functions use, and that any functions that edit document contents (or structure) will also have to use the same format. Furthermore, browsing functions will have to be somewhat aware of this format. So, this is a good time to ask – did we use a good format?

5.6 Editing database contents*

Note 5.43 (Roundtripping, with changes).

Here, we should import a document into the system and then make some simple changes, and after exporting, check with diff to make sure the changes are correct.

Note 5.44 (Re-importing).

One nice feature would be a function to “re-import” a document that has changed outside of the system, and make changes in the system’s version wherever changes appeared in the source version.

Note 5.45 (Editing document structure).

The way we have things set up currently, it is one thing to make a change to a document’s textual components, and another to change its structure. Both types of changes must, of course, be supported.

6 Applications

6.1 Managing tasks

Note 6.1 (What are tasks?).

Each task tends to have a name, a description, a collection of prerequisite tasks, a description of other material dependencies, a status, some justification of that status, a creation date, and an estimated time of completion. There might actually be several “estimated times of completion”, since the estimate would tend to improve over time. To really understand a task, one should keep track of revisions like this.

Note 6.2 (On ‘store-task-data’).

Here, we’re just filling in a frame. Since “filling in a frame” seems like the sort of operation that might happen over and over again in different contexts, to save space, it would probably be nice to have a macro (or similar) that would do a more general version of what this function does.

(defun store-task-data
  (name description prereqs materials status
        justification submitted eta)
  (add-triple name "is a" "task")
  (add-triple name "description" description)
  (add-triple name "prereqs" prereqs)
  (add-triple name "materials" materials)
  (add-triple name "status" status)
  (add-triple name "status justification" justification)
  (add-triple name "date submitted" submitted)
  (add-triple name "estimated time of completion" eta))
Note 6.3 (On ‘generate-task-data’).

This is a simple function to create a new task matching the description above.

(defun generate-task-data ()
  (let* ((name (read-string "Name: "))
         (description (read-string "Description: "))
         (prereqs (read-string
                   "Task(s) this task depends on: "))
         (materials (read-string "Material dependencies: "))
         (status (completing-read
                  "Status (tabled, in progress, completed): "
                  '("tabled" "in progress" "completed")))
         (justification (read-string "Why this status? "))
         (submitted (read-string
                     (concat "Date submitted (default "
                             (substring (current-time-string) 0 10)
                             "): ")
                     nil nil (substring (current-time-string) 0 10)))
         (eta (read-string "Estimated date of completion: ")))
    (store-task-data name description prereqs materials
                     status justification submitted eta)))
Note 6.4 (Possible enhancements to ‘generate-task-data’).

In order to make this function very nice, it would be good to allow “completing read” over known tasks when filling in the prerequisites. Indeed, it might be especially nice to offer a type of completing read that is similar in some sense to the tab-completion you get when completing a file name, i.e., quickly completing certain sub-strings of the final string (in this case, these substrings would correspond to task areas we are progressively zooming down into).

As for the task description, rather than forcing the user to type the description into the minibuffer, it might be nice to pop up a separate buffer instead (a la the Emacs/w3m textarea). If we had a list of all the known tasks, we could offer completing-read over the names of existing tasks to generate the list of ‘prereqs’. It might be nice to systematize date data, so we could more easily e.g. sort and display task info “by date”. (Perhaps we should be working with predefined database types for dates and so on; but see Note 8.13.)

Also, before storing the task, it might be nice to offer the user the chance to review the data they entered.
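The “completing read over known tasks” idea could be prototyped as follows; this reuses the ‘triples-given-middle-and-end’ query that appears elsewhere in this document, and the prompt wording is arbitrary.

```elisp
;; Sketch: complete over the names of known tasks when entering
;; prerequisites.  Assumes `triples-given-middle-and-end' returns
;; (beg mid end) triples, as elsewhere in this document.
(defun read-prereq-with-completion ()
  (completing-read "Task this task depends on: "
                   (mapcar #'first
                           (triples-given-middle-and-end
                            "is a" "task"))))
```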

Note 6.5 (On ‘get-filler’).

Just a wrapper for ‘triples-given-beginning-and-middle’. (Maybe add ‘heading’ as an option here.)

(defun get-filler (frame slot)
  (third (first
          (triples-given-beginning-and-middle frame
                                              slot))))
Note 6.6 (On ‘get-task’).

Uses ‘get-filler’ (Note 6.5) to assemble the elements of a task’s frame.

(defun get-task (name)
  (when (triple-exact-match name "is a" "task")
    (list (get-filler name "description")
          (get-filler name "prereqs")
          (get-filler name "materials")
          (get-filler name "status")
          (get-filler name "status justification")
          (get-filler name "date submitted")
          (get-filler name
                      "estimated time of completion"))))
Note 6.7 (On ‘review-task’).

This is a function to review a task by name.

(defun review-task (name)
  (interactive "MName: ")
  (let ((task-data (get-task name)))
    (if task-data
        (display-task task-data)
      (message "No data."))))

(defun display-task (data &optional heading)
  ;; `name' is dynamically bound by callers such as `review-task'
  ;; and `display-article'.
  (pop-to-buffer (get-buffer-create
                  "*Arxana Display*"))
  (delete-region (point-min) (point-max))
  (insert "NAME: " name "\n\n")
  (insert "DESCRIPTION: " (first data) "\n\n")
  (insert "PREREQUISITES: " (second data) "\n\n")
  (insert "MATERIAL DEPENDENCIES: " (third data) "\n\n")
  (insert "STATUS: " (fourth data) "\n\n")
  (insert "WHY THIS STATUS?: " (fifth data) "\n\n")
  (insert "DATE SUBMITTED: " (sixth data) "\n\n")
  (insert "ESTIMATED TIME OF COMPLETION: "
          (seventh data) "\n\n")
  (goto-char (point-min))
  (fill-individual-paragraphs (point-min) (point-max)))
Note 6.8 (Possible enhancements to ‘review-task’).

Breaking this down into a function to select the task and another function to display the task would be nice. Maybe we should have a generic function for selecting any object “by name”, and then special-purpose functions for displaying objects with different properties.

Using text properties, we could set up a “field-editing mode” that would enable you to select a particular field and edit it independently of the others. Another more complex editing mode would know which fields the user had edited, and would store all edits back to the database properly. See Section 5.6 for more on editing.

Note 6.9 (Browsing tasks).

The function ‘pick-a-name’ (Note 5.31) takes two functions, one that finds the names to choose from, and the other that says how to present these names. We can therefore build ‘pick-a-task’ on top of ‘pick-a-name’.

(defun get-tasks ()
  (mapcar #'first
          (triples-given-middle-and-end "is a" "task")))

(defun pick-a-task ()
  (interactive)
  (pick-a-name
   'get-tasks
   (lambda (items)
     (mapc (lambda (item)
             (let ((pos (line-beginning-position)))
               (insert item)
               (put-text-property pos (1+ pos)
                                  'display-style 'task)
               (insert "\n")))
           items))))

(add-to-list 'display-style
             '(task . (get-task display-task)))
Note 6.10 (Working with theories).

Presumably, like other related functions, ‘get-tasks’ should take a heading argument.

Note 6.11 (Check display style).

Check if this works, and make style consistent between this usage and earlier usage.

Note 6.12 (Example tasks).

It might be fun to add some tasks associated with improving Arxana, just to show that it can be done… maybe along with a small importer to show how importing something without a whole lot of structure can be easy.

6.2 Other ideas*

Note 6.13 (A browser within a browser).

All the stuff we’re doing with triples can be superimposed over the existing web and existing web interfaces, by, for example, writing a web browser as a web app, and in this “browser within a browser” offer the ability to annotate and rewrite other people’s web pages, produce 3rd-party redirects, and so forth, sharing these mods with other subscribers to the service. (Some, often short-lived, websites have offered limited versions of “web annotation”, but, so far, what one can do with such services seems quite weak compared with what’s possible.)

Note 6.14 (Improvements to the PlanetMath backend).

From one point of view, the SQL tables are the main thing in Noosphere. We could say that getting the things out of SQL and storing new things there is what Noosphere mainly does. Following this line of thought, anything that adjusts these tables will do just as well, e.g., it shouldn’t be terribly hard to develop an email-based front-end. But rather than making Arxana work with the Noosphere relational table system, it is probably advantageous to translate the data from these tables into the scholium system.

Note 6.15 (A new communication platform).

One of the premier applications I have in mind is a new way to handle communications in an online forum. I have previously called this “subchanneling”, but really, joining channels is just as important.

Note 6.16 (Some tutorials).

It would be interesting to write a tutorial for Common Lisp or just about any other topic with this system. For example, some little “worksheets” or “gymnasia” that will help solidify user knowledge in topics on which questions keep appearing.

7 Topics of philosophical interest

Note 7.1 (Research and development).

In Note 1.2, I mentioned a model that could apply in many contexts; it is an essentially metaphysical conception. I’m pretty sure that the data model of Note 1.3 provides a general-enough framework to represent anything we might find “out there”. However, even if this is the case, questions as to efficient means of working with such data still abound (cf. Note 3.3, Note 4.25).

I propose that along with development of Arxana as a useful system for doing “commons-based peer production” should come a research programme for understanding in much greater detail what “commons-based peer production” is. Eventually we may want to change the name of the subject of study to reflect still more general ideas of resource use.

While the “frontend” of this research project is anthropological, the “backend” is much closer to artificial intelligence. On this level, the project is about understanding effective means for solving human problems. Often this will involve decomposing events and processes into constituent elements, making increasingly detailed treatments along the lines described in Note 1.1.

Note 7.2 (The relationship between text and commentary).

Text under revision might be marked up by a copyeditor: in cases like these, the interpretation is clear. However, what about marginalia with looser interpretations? These seem to become part of the copy of the text they are attached to. What about steering processes applied to a given course of action? How about the relationship of thoughts or words to perception and action? How can we lower the barrier between conception and action, while still maintaining some purchase on wisdom?

You see, a lot of issues in life have to do with overlays, multi-tracking, interchange between different systems; and in these terms, a lot of philosophy reduces to “media awareness” which extends into more and more immediate contexts (Note 1.2).

Note 7.3 (Heuristic flow).

Continuing the notion above: one does not need a fully-developed “heading” of work in order to do work – instead, one wants some straightforward heuristics that will enable the desired work to get done. So, even supposing the work is “heading building”, it can progress without becoming overwhelmed in abstractions – because theories and heuristics are different things.

Note 7.4 (Limits of simple languages).

Triples are frequently “subject, verb, object” statements, although with the annotation features, we can modify any part of any such statement; for example, we can apply an adverb to a given verb.

“Tags”, of course, already provide “subject, predicate” relationships. It will be interesting to examine the degree to which human languages can be mapped down into these sorts of simple languages. What features are needed to make such languages useful? (Lisp’s ‘car’ and ‘cdr’ seem related to the idea of making predicates useful.)

How are triples and predicates “enough”? What, if anything, do they lack? The difference between triples and predicates illustrates the issue. How should we characterize Arxana’s additions to Lisp?

Note 7.5 (Higher dimensions).

Why stop with three components? Why not have (A,B,C,D,T) represent a semantic relationship between all of A, B, C, and D (in heading T, of course)? Actually, there is no reason to stop apart from the fact that I want to explore simple languages (Note 7.4). In real life, things are not as simple, and we should be ready to deal with the complexities! (Cf., for example, Note 8.4).

8 Future plans

Note 8.1 (Development pathways).

To the extent that it’s possible, I’d like to maintain a succinct non-linear roadmap in which tasks are outlined and prioritized, and some procedural details are made concrete. Whenever relevant, this map should point into the current document. I’ll begin by revising the plans I’ve used so far! Over the next several months, I’d like to see these plans develop into a genuine production machine, and see the machine begin to stabilize its operations.

Note 8.2 (Theories as database objects).

We’re just beginning to treat theories as database objects; I expect there will be more work to do to make this work really well. We’ll want to make some test cases, like building a “theory of chess”, or even just describing a particular chess board; cf. Note LABEL:partial-image.

Note 8.3 (Search engine/elements).

One of the features that came very easily in the Emacs-only prototype was textual search. With the strings stored in a database, Sphinx seems to be the most suitable search engine to use. It is tempting to try to make our own inverted index using triples, so that text-based search can be even more directly integrated with semantic search. (Since the latest version(s) of Sphinx can act to some extent like a MySQL database, we almost have a direct connection in the backend, but since Sphinx is not the same database, one would at least need some glue code to effect joins and so forth.)

More to the point, it is important for this project that the scholia-based document model be transparently extended down to the level of words and characters. It may be helpful to think about text as always being hypertext; a document as a heading; and a word in the inverted index as a frame.
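The triple-based inverted index could begin as simply as this sketch, which records an “occurs in” triple for each word of a note’s content; the “occurs in” middle and the crude tokenization are assumptions.

```elisp
;; Hypothetical sketch of a triple-based inverted index.
(defun index-words (coords content)
  "Record an \"occurs in\" triple for each word in CONTENT."
  (dolist (word (split-string (downcase content) "\\W+" t))
    (scholium word "occurs in" coords)))
```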

Note 8.4 (Pointing at database elements and other things).

We will want to be able to point at other tables and at other sorts of objects and make use of their contents. The plan is that our triples will provide a sort of guide or backbone superimposed over a much larger data system.

Note 8.5 (Feature-chase).

There are lots of different features that could be explored, for example: multi-dimensional history lists; a useful treatment of ‘‘clusions’’; MS Word-like colorful annotations; etc. Many of these features are already prototyped.2323 23 See footnote 3.

Note 8.6 (Regression testing).

Along with any major feature chase, we should provide and maintain a regression testing suite.

Note 8.7 (Deleting and changing things).

How will we deal with unlinking, disassociating, forgetting, entropy, and the like? Changes can perhaps be modeled by an insertion following a deletion, and, as noted, we’ll need effective ways to represent and manage change (Note 3.7).

Note 8.8 (Tutorial).

Right now the system is simple enough to be pretty much self-explanatory, but if it becomes much more complicated, it might be helpful to put together a simple guide to some likely-to-be-interesting features.

Note 8.9 (Computing possible paths and connections).

If we can find all the direct paths from one node to another using ‘triples-given-beginning-and-end’, can we inject some algorithms for finding longer, indirect paths into the system, and find ways to make them useful?

Similarly, we can satisfy local conditions (Note 4.39), but we’ll want to deal with increasingly “non-local” conditions (even just using the logical operator “or”, instead of “and”, for example).
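One naive way to look for longer paths is a bounded breadth-first search over triples, as in this sketch; it assumes ‘triples-given-beginning’ returns (beginning middle end) lists, and it does not yet guard against loops.

```elisp
;; Sketch: bounded breadth-first search for indirect paths.
;; Assumes `triples-given-beginning' returns (beg mid end) triples.
(defun find-indirect-paths (start goal &optional max-depth)
  "Return lists of nodes leading from START to GOAL.
The search is cut off after MAX-DEPTH steps (default 3)."
  (let ((queue (list (list start)))   ; each path is stored reversed
        (bound (or max-depth 3))
        answers)
    (while queue
      (let* ((path (pop queue))
             (node (car path)))
        (cond ((equal node goal)
               (push (reverse path) answers))
              ((< (1- (length path)) bound)
               (dolist (triple (triples-given-beginning node))
                 (push (cons (third triple) path) queue))))))
    answers))
```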

Note 8.10 (Monster Mountain).

In Summer 2007, we checked out the Monster Mountain MUD server, which would enable several users to interact with one Lisp, instead of just one database. This would have a number of advantages, particularly for exploring “scholiumific programming”, but also towards fulfilling the user-to-user interaction objective stated in Note 1.2. I plan to explore this after the primary goal of multi-user interaction with the database has been solidly completed.

Note 8.11 (Web interface).

A finished web interface may take a considerable amount of work (if the complexity of an interesting Emacs interface is any indication), but the basics shouldn’t be hard to put together soon.

Note 8.12 (Parsing input).

Complicated objects specified in long-hand (e.g. triples pointing to triples) can be read by a relatively simple parser – which we’ll have to write! The simplest goal for the parser would be to be able to distinguish between a triple and a string – presumably that much isn’t hard. And of course, building complexes of triples that represent statements from natural language is a good long-term goal. (Right now, our granularity level is set much higher.)
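The “triple vs. string” distinction could be bootstrapped with Emacs’s own reader, as in this sketch; treating any readable three-element list as a triple is, of course, only a first approximation.

```elisp
;; Sketch: classify an input string as a triple or a plain string.
(defun parse-input (string)
  "Return (triple B M E) if STRING reads as a three-element list,
otherwise (string . STRING)."
  (let ((form (ignore-errors
                (car (read-from-string string)))))
    (if (and (consp form) (= (safe-length form) 3))
        (cons 'triple form)
      (cons 'string string))))
```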

Note 8.13 (Choice of database).

I expect Elephant25 may become our preferred database at some point in the future; we are currently awaiting changes to Elephant that make nested queries possible and efficient. Some core queries related to managing a database of semantic links with the current Elephant were constructed by Ian Eslick, Elephant’s maintainer.26

On the other hand, it might be reasonable to use an Emacs database and redo the whole thing to work in Emacs (again), e.g. for single-user applications or users who want to work offline a lot of the time.

Note 8.14 (Different kinds of theories).

Theories or variants thereof are of course already popular in other knowledge representation contexts.27,28 We’ll want to adopt some useful techniques for knowledge management as soon as the core systems are ready.

Various notions of a mathematical theory exist.29 It would be nice to be able to assign a specific logic to theories in Arxana, following the “little theories” design of e.g. IMPS.30

9 Conclusion

Note 9.1 (Ending and beginning again).

This is the end of the Arxana system itself; the appendices provide some ancillary tools, and some further discussion. Contributions that support the development of the Arxana project are welcome.

Appendix A Appendix: Auto-setup

Note A.1 (Setting up auto-setup).

This section provides code for satisfying dependencies and setting up the program. This code assumes that you are using a Debian/APT-based system (but things are not so different using, say, Fedora or Fink; writing a multi-package-manager-friendly installer shouldn’t be hard). Of course, feel free to set things up differently if you have something else in mind!

(defalias 'set-up 'shell-command)

(defun alternative-set-up (string)
  "Show STRING in the *Arxana Help* buffer instead of running it."
  (pop-to-buffer (get-buffer-create "*Arxana Help*"))
  (goto-char (point-max))
  (insert string "\n"))

(defun set-up-arxana-environment ()
  (if (y-or-n-p
       "Run commands (y) (or just show instructions)? ")
      (fset 'set-up 'shell-command)
    (fset 'set-up 'alternative-set-up))
  (when (y-or-n-p "Install dependencies? ")
    (set-up "mkdir ~/arxana")
    (set-up "cd arxana"))

  (when (y-or-n-p "Download latest Arxana? ")
    (set-up "wget")) ; download URL elided in the source

  (unless (y-or-n-p "Is your emacs good enough?... ")
    (set-up (concat "cvs -z3 -d" ; repository address elided in the source
                    "/sources/emacs co emacs"))
    (set-up "mv emacs ~")
    (set-up "cd ~/emacs")
    (set-up "./configure && make bootstrap")
    (set-up "cd ~/arxana"))

  (defvar pac-man nil)

  (cond ((y-or-n-p
          "Do you use an apt-based package manager? ")
         (setq pac-man "apt-get"))
        (t (message
            "OK, get Lisp and SQL on your own, then!")))

  (when pac-man
    (when (y-or-n-p "Install Common Lisp? ")
      (set-up (concat pac-man " install sbcl")))

    (when (y-or-n-p "Install Postgresql? ")
      (set-up (concat pac-man " install postgresql"))
      (when (y-or-n-p "Help setting up PostgreSQL? ")
          (pop-to-buffer (get-buffer-create "*Arxana Help*"))
          (insert "As superuser (root),
edit /etc/postgresql/7.4/main/pg_hba.conf
make sure it says this:
host all all trust
then edit /etc/postgresql/7.4/main/postgresql.conf
and make it say
tcpip_socket = true
then restart:
/etc/init.d/postgresql-7.4 restart
su postgres
createuser username
as username, run
createdb -U username\n")))))

  (when (y-or-n-p "Install SLIME...? ")
    (set-up (concat "cvs -d :pserver:anonymous" ; server address elided in the source
                    "/project/slime/cvsroot co slime"))
    (set-up ; prepend the SLIME configuration to ~/.emacs
     (concat "echo \";; Added to ~/.emacs for Arxana:\n\n"
             "(add-to-list 'load-path \"~/slime/\")\n"
             "(setq inferior-lisp-program \"/usr/bin/sbcl\")\n"
             "(require 'slime)\n"
             "(slime-setup '(slime-repl))\n\n\" "
             "| cat - ~/.emacs > ~/updated.emacs && "
             "mv ~/updated.emacs ~/.emacs")))

  (when (y-or-n-p "Set up Common Lisp environment? ")
    (set-up "mkdir ~/.sbcl")
    (set-up "mkdir ~/.sbcl/site")
    (set-up "mkdir ~/.sbcl/systems")
    (set-up "cd ~/.sbcl/site")
    (set-up "wget") ; CLSQL tarball URL elided in the source
    (set-up "tar -zxf clsql-4.0.3.tar.gz")
    (set-up "wget") ; UFFI tarball URL elided in the source
    (set-up "tar -zxf uffi-1.6.0.tar.gz")
    (set-up "wget") ; MD5 tarball URL elided in the source
    (set-up "tar -zxf md5-1.8.5.tar.gz")
    (set-up "cd ~/.sbcl/systems")
    (set-up "ln -s ../site/md5-1.8.5/md5.asd .")
    (set-up "ln -s ../site/uffi-1.6.0/uffi.asd .")
    (set-up "ln -s ../site/clsql-4.0.3/clsql.asd .")
    (set-up "ln -s ../site/clsql-4.0.3/clsql-uffi.asd .")
    (set-up (concat "ln -s ../site/clsql-4.0.3/"
                    "clsql-postgresql-socket.asd ."))
    (set-up "ln -s ~/arxana/arxana.asd ."))

  (when (y-or-n-p "Modify ~/.sbclrc so CL always starts Arxana? ")
    (set-up ; append the snippet to ~/.sbclrc
     (concat "echo \";; Added to ~/.sbclrc for Arxana:\n\n"
             "(require 'asdf)\n\n"
             "(asdf:operate 'asdf:load-op 'swank)\n"
             "(setf swank:*use-dedicated-output-stream* nil)\n"
             "(setf swank:*communication-style* :fd-handler)\n"
             "(swank:create-server :port 4006 :dont-close t)\n\n"
             "(asdf:operate 'asdf:load-op 'clsql)\n"
             "(asdf:operate 'asdf:load-op 'arxana)\n"
             "(in-package arxana)\n\" " ; closing quote restored
             "| cat ~/.sbclrc - > ~/updated.sbclrc && "
             "mv ~/updated.sbclrc ~/.sbclrc")))

  (when (y-or-n-p "Install Monster Mountain? ")
    (set-up "cd ~/.sbcl/systems")
    ;; repository URL elided in the source (presumably bordeaux-threads):
    (set-up "darcs get")
    (set-up (concat "svn checkout svn://" ; host elided in the source
                    "usocket/svn/usocket/trunk usocket-svn"))
    ;; I've had problems with this approach to setting the cclan
    ;; mirror... (the "wget" command and its URL are elided here)
    (set-up "wget") ; split-sequence tarball URL elided in the source
    (set-up "tar -zxf split-sequence.tar.gz")
    (set-up (concat "svn checkout " ; repository URL elided in the source
                    "svn/trunk/ mmtn-read-only"))
    (set-up "ln -s ~/bordeaux-threads/bordeaux-threads.asd .")
    (set-up "ln -s ~/usocket-svn/usocket.asd .")
    (set-up "ln -s ~/split-sequence/split-sequence.asd .")
    (set-up "ln -s ~/mmtn/src/mmtn.asd .")))

Note A.2 (Postgresql on Fedora).

There are some slightly different instructions for installing PostgreSQL on Fedora; the above will be changed to include them, but for now, check them out on the web.31

Note A.3 (Using MySQL and CLISP instead).

Since my OS X box seems to have a variety of confusing PostgreSQL systems already installed (which I’m not sure how to configure), and CLISP is easy to install with Fink, I thought I’d try a different setup for simplicity and variety.

In order to make it work, I enabled the root user on Mac OS X per instructions on the web, and installed and configured MySQL; used a slight modification of the strings table described previously; downloaded and installed cffi;32 changed the definition of ‘connect-to-database’ in Arxana’s utilities.lisp; doctored up my ~/.clisprc.lisp; and changed how I started Lisp. Details below.

;; on the shell prompt
sudo apt-get install mysql
sudo mysqld_safe --user=mysql &
sudo daemonic enable mysql
sudo mysqladmin -u root password root
mysql --user=root --password=root -D test
create database joe; grant all on joe.* to joe@localhost
identified by 'joe';

;; in tabledefs.lisp
(execute-command "CREATE TABLE strings (
   text TEXT,
   UNIQUE INDEX (text(255)))")

;; in ~/asdf-registry/ or whatever you’ve designated as
;; your asdf:*central-registry*
ln -s ~/cffi_0.10.4/cffi-uffi-compat.asd .
ln -s ~/cffi_0.10.4/cffi.asd .

;; In utilities.lisp
(defun connect-to-database ()
   (connect '("localhost" "joe" "joe" "joe")
            :database-type :mysql))

;; In ~/.clisprc.lisp
(asdf:operate 'asdf:load-op 'clsql)
(push "/sw/lib/mysql/"

;; From SLIME prompt, and not in ~/.clisprc.lisp
(in-package #:arxana)

Note A.4 (Installing Sphinx).

Here are some tips on how to install and configure Sphinx.

;; Fedora/Postgresql flavor
yum install postgresql-devel
./configure --without-mysql

;; Fink/MySQL flavor
./configure --with-mysql

Note A.5 (Getting Sphinx set up).

Here are some instructions I’ve used to get Sphinx set up.

Note A.6 (Create a sphinx.conf).

I want a very minimal sphinx.conf; this seems to work. (We should probably set this up so that it gets written to a file when Arxana is set up.)

## Copy this to /usr/local/etc/sphinx.conf when you want
## to use it.

source strings
{
  type            = mysql
  sql_host        = localhost
  sql_user        = joe
  sql_pass        = joe
  sql_db          = joe
  sql_query       = SELECT id, text FROM strings
}

## index definition

index strings
{
  source          = strings
  path            = /Users/planetmath/sphinx/search-testing
  morphology      = none
}

## indexer settings

indexer
{
  mem_limit       = 32M
}

## searchd settings

searchd
{
  listen          = 3312
  listen          = localhost:3307:mysql41
  log             = /Users/planetmath/sphinx/searchd.log
  query_log       = /Users/planetmath/sphinx/searchd_query.log
  read_timeout    = 5
  max_children    = 30
  pid_file        = /Users/planetmath/sphinx/
  max_matches     = 1000
}

Note A.7 (Working from the command line).

Then you can run commands like these.

/usr/local/bin/indexer strings
/usr/local/bin/search "but, then"

% mysql -h -P 3307
mysql> SELECT * FROM strings WHERE MATCH('but, then');

Note A.8 (Integrating this with Lisp).

Since we can talk to Sphinx via the MySQL protocol, it seems reasonable that we should be able to talk to it from CLSQL, too. With a little fussing to get the format right, I found something that works!

(connect '("" "" "" "" "3307") :database-type :mysql)
(mapcar (lambda (elt) (floor (car elt)))
  (query "select * from strings where match('text')"))

Note A.9 (Some added difficulty with Postgresql).

When I try to index things on the server, I get an error, as below. The question is a good one… I’m not sure how PostgreSQL is set up on the server, actually…

ERROR: index ’strings’: sql_connect: could not connect to server:
Connection refused
Is the server running on host "localhost" and accepting
TCP/IP connections on port 5432?

Appendix B Appendix: A simple literate programming system

Note B.1 (The literate programming system used in this paper).

This code defines functions that grab all the Lisp portions of this document, evaluate the Emacs Lisp sections in Emacs, and save the Common Lisp sections in suitable files.33 It requires that the LaTeX be written in a certain consistent way. The function assumes that this document is the current buffer.

;; (The regexp values were lost in extraction; the beginning regexp
;; captures an optional target filename as match group 1.)
(defvar lit-code-beginning-regexp)

(defvar lit-code-end-regexp)

(defun lit-process ()
  ;; Reconstructed: several lines of the original were lost in
  ;; extraction; the bookkeeping below follows the description above
  ;; (evaluate Elisp, write other sections to files under ~/arxana/).
  (let ((to-buffer "*Lit Code*")
        (from-buffer (buffer-name (current-buffer)))
        (start-buffers (buffer-list)))
    (set-buffer (get-buffer-create to-buffer))
    (set-buffer (get-buffer-create from-buffer))
    (goto-char (point-min))
    (while (re-search-forward
            lit-code-beginning-regexp nil t)
      (let* ((file (match-string 1))
             (beg (match-end 0))
             (end (save-excursion
                    (re-search-forward
                     lit-code-end-regexp nil t)
                    (match-beginning 0)))
             (match (buffer-substring-no-properties
                     beg end)))
        (let ((to-buffer
               (if file
                   (concat "*Lit Code*: " file)
                 "*Lit Code*")))
          (set-buffer (get-buffer-create to-buffer))
          (insert match)
          (set-buffer from-buffer))))
    (dolist (buffer (set-difference (buffer-list)
                                    start-buffers))
      (set-buffer buffer)
      (if (string= (buffer-name buffer)
                   "*Lit Code*")
          (eval-buffer)                 ; Emacs Lisp: evaluate here
        (write-region (point-min) (point-max)
                      (concat "~/arxana/"
                              ;; strip the "*Lit Code*: " prefix
                              (substring (buffer-name buffer) 12))))
      (kill-buffer buffer))))

Note B.2 (Emacs-export?).

It wouldn’t be hard to export the Elisp sections so that those who wanted to could ditch the literate wrapper.
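
A minimal sketch of such an export, assuming the code-gathering pass has left the Elisp sections in the *Lit Code* buffer (the destination path is illustrative):

```elisp
;; Write the gathered Elisp sections out to a standalone file, so
;; users can ditch the literate wrapper.  "~/arxana/arxana.el" is
;; an assumed destination, not an established convention.
(defun lit-export-elisp ()
  "Save the *Lit Code* buffer (the Elisp sections) to a file."
  (with-current-buffer "*Lit Code*"
    (write-region (point-min) (point-max) "~/arxana/arxana.el")))
```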

Note B.3 (Bidirectional updating).

Eventually it would be nice to have a code repository set up, and make it so that changes to the code can get snarfed up here.

Note B.4 (A literate style).

Ideally, each function will have its own Note to introduce it, and will not be called before it has been defined. I sometimes make an exception to this rule, for example, functions used to form recursions may appear with no further introduction, and may be called before they are defined.

Appendix C Appendix: Hypertext platforms

Note C.1 (The hypertextual canon).

There is a core library of texts that come up in discussions of hypertext.

  • The Rosetta stone

  • The Talmud (Judah haNasi, Rav Ashi, and many others)

  • Monadology (Gottfried Wilhelm Leibniz)

  • The Life and Opinions of Tristram Shandy, Gentleman (Laurence Sterne)

  • Middlemarch (George Eliot)

  • The Nova Trilogy (William S. Burroughs)

  • The Logic of Sense (Gilles Deleuze)

  • Labyrinths (Jorge Luis Borges)

  • Literary Machines (Ted Nelson)

  • Lila (Robert M. Pirsig)

  • Dirk Gently’s Holistic Detective Agency (Douglas Adams)

  • Pussy, King of the Pirates (Kathy Acker)

At the same time, it is somewhat ironic that none of the items on this list are themselves hypertexts in the contemporary sense of the word. It’s also a bit funny that certain other works (even some by the same authors) aren’t on this list. Perhaps we begin to get a sense of what’s going on in this quote from Kathleen Burnett:34

“Multiplicity, as a hypertextual principle, recognizes a multiplicity of relationships beyond the canonical (hierarchical). Thus, the traditional concept of literary authorship comes under attack from two quarters–as connectivity blurs the boundary between author and reader, multiplicity problematizes the hierarchy that is canonicity.”

It seems quite telling that non-hypertextual canons remain mostly-non-hypertextual even today, despite the existence of catalogs, indexes, and online access.35

Note C.2 (A geek’s guide to literature).

This title is a riff on Slavoj Žižek’s “The Pervert’s Guide to Cinema”. Taking Note C.1 as a jumping-off point, why don’t we make a survey of historical texts from the point of view of an aficionado of hypertext! Just what does one have to do to “get on the list”? Just what is “the hypertextual perspective”? And, if Žižek is correct and we’re to look for the hyperreal in the world of cinematic fictions – what’s left over for the world of literature? (Or mathematics?)

Note C.3 (The number 3).

This is the number of things present if we count carefully the items A, B, and a connection C between them. [Picture of A𝐶B.]

(Or even: given A and B, we use Wittgenstein counting, and intuit that C exists as the collection {A,B}; after all, some connection must exist precisely because we were presented with A and B together – and lest the connections proliferate infinitely, we lump them all together as one. [Picture of A, B, with the frame labeled C.])

Note C.4 (Surfaces).

Deleuze talks about a theory of surfaces associated with verbs and events. His surfaces represent the evanescence of events in time, and of their descriptions in language. An event is seen as a vanishingly-thin boundary between one state of being and another.

Certainly, a statement that is true now may not be true five minutes from now. It is easier to think and talk about things that are coming up and things that have already happened. “Living in the moment” is regarded as special or even “Zen”.

We can begin to put these musings on a more solid mathematical basis. We first examine two types of interfaces:

  1. A𝐶B, A𝐷B, A𝐸B (the interface of A and B across C, D, and E);

  2. A𝐶B, D𝐶E, F𝐶G (the interface of various terms across C).

Note C.5 (Comic books).

No geek’s guide to literature would be complete without putting comics in a hallowed place. [Framed picture of A, B next to framed picture of A, B, a.] What happened?

Note C.6 (Intersecting triples).

Diagrammatically, it is tempting to portray (ACB)midDE as if it were closely related to A(CDE)begB, despite the fact that they are notationally very different. I’ll have to think more about what this means.

Appendix D Appendix: Computational Linguistics

Note D.1 (What is this?).

It might be reasonable to make annotating sentences part of our writeup on hypertext platforms – but I’m putting it here for now. If hypertext is what deals with language artifacts on the “bulky” level (saying, for example, that a subsection is part of a section, and so on), then computational linguistics is what deals with the finer levels. However, the distinction is in some ways arbitrary, and many of the techniques should be at least vaguely similar.

Note D.2 (Annotation sensibilities).

We will want to be able to make at least two different kinds of annotations of verbs. For example, given the statement

  • S.

    (“Who” “is on” “first”),

I’d like to be able to say

  • I.

    (“is on” “means” “the position of a base runner in baseball”).

However, I’d also like to be able to say

  • II.

    (“is on” “because” “he was walked”).

Annotation I is meant to apply to the term “is on” itself (in a context that might be more general than just this one sentence). If Who is also on steroids, that’s another matter – as this type of annotation helps make clear!

Annotation II is meant to apply to the term “is on” as it appears in sentence S. In particular, Annotation II seems to work best in a context in which we’ve already accepted the ontological status of the verb-phrase “is on first”.

Whereas Annotation I should presumably exist before statement S is ever made (and it certainly helps make that statement make sense), Annotation II is most properly understood with reference to the fully-formed statement S. However, Annotation II is different from a statement like (S “has truth value” F) in that it looks into the guts of S.
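
To make the contrast concrete, here is one way the two annotations might be represented as data (the component-index device in Annotation II is an illustrative assumption, not the system’s actual format):

```elisp
;; Statement S, and the two annotation sensibilities.
(setq S '("Who" "is on" "first"))

;; Annotation I attaches to the term "is on" itself, wherever it occurs:
(setq annotation-i
      '("is on" "means" "the position of a base runner in baseball"))

;; Annotation II attaches to the occurrence of "is on" *inside S*,
;; modeled here (hypothetically) by pointing at S together with the
;; index of the component being annotated:
(setq annotation-ii
      `((,S 1) "because" "he was walked"))
```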

Note D.3 (Comparison of places and ontological status).

The difference between (I) a “global” annotation, and (II) the annotation of a specific sentence is analogous to the difference between (a) relationships between objects without a place, and (b) relationships between objects in specific places. (Cf. Note D.2: “global” statements are of course made “local” by the theories that scope them.)

For example, in a descriptive ontology of research documents, I might make the “placeless” statement,

  • a.

    (“Introduction” “names” “a section”)

On the other hand, the statement

  • b.

    (“Introduction” “has subject” “American History”),

seems likely to be about a specific Introduction. (And somewhere in the backend, this triple should be expressed in terms of places!)

Note D.4 (Semantics).

In a sentence like

(((“I” “saw” “myself”)mid “as if” “through a glass”)beg “but” “darkly”)

first of all, there may be different parenthesizations, and second of all, the semantics of links like “as if” and “but” may shape, to some extent, the ways in which we parenthesize.
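
For reference, the parenthesization written above can be transcribed directly into nested lists, with dotted ‘mid’ and ‘beg’ markers standing in for the subscripts (a sketch of the notation only, not a committed representation):

```elisp
;; ((("I" "saw" "myself")mid "as if" "through a glass")beg "but" "darkly")
;; as nested lists; the dotted symbol records which component of the
;; inner triple the outer link hangs from.
(setq sentence
      '((((("I" "saw" "myself") . mid)
          "as if" "through a glass") . beg)
        "but" "darkly"))
```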

Appendix E Appendix: Resource use

Note E.1 (Free culture in action).

I thought it worthwhile to include this quote from a joint paper with Aaron Krowne:36

“[F]ree content typically manifests aspects of a common resource as well as an open access resource; while anyone can do essentially whatever they wish with the content offline, in its online life, the content is managed in a socially-mediated way. In particular, rights to in situ modification tend to be strictly controlled. […] By finding new ways to support freedom of speech within CBPP documents, we embrace subjectivity as a way to enhance the content of an intersubjectively valued corpus. In the context of “hackable” media and maintenance protocols, the semantics with which scholia are handled can be improved upon indefinitely on a user-by-user basis and a resource-wide basis. This is free culture in action.”

Note E.2 (Learning).

The learner, confronted with a learning resource, or the consumer of any other information resource (or indeed, practically any resource whatsoever) may want a chance to respond to the questions “was this what you were looking for?” and “did you find this helpful?”. In some cases, an independent answer to that question could be generated (e.g. if a student is seen to come up with a correct answer, or not).

Note E.3 (Connections).

A useful communication goal is to expose some of the connections between disparate resources. Some existing connections may be far more explicit than others. It’s important to facilitate the making and explicating of connections by “third parties” (Note 6.13). The search for connections between ostensibly unrelated things is a key part of both creativity and learning. In addition, connecting with what others are doing is an important part of being a social animal.

Note E.4 (Boundaries).

Notice that the departmentalization of knowledge is similar to any regime that oversees and administers boundaries. In addition to bridging different areas, learning often involves pushing one’s boundaries and getting out of one’s comfort zone. The “sociological imagination” involves seeing oneself as part of something bigger; this goes along with the idea of a discourse that lowers or transcends the boundaries between participants. Imagination of any form can challenge myopic patterns of resource use, although there are also myopic fictions which neglect to look at what’s going on in reality!