Internet-Draft
Date: 4. 4. 04
Expires: when 1.0 is ready
©1995 - 2011
C. von Loesch



Modular Message Protocol

<draft-vonLoesch-mmp-00>
http://www.psyc.eu/mmp.html

Status of this Memo

This document describes portions of version 0.99 of the PSYC Protocol for SYnchronous Conferencing. You may consider it an Internet-Draft and as such subject to all provisions of Section 10 of RFC2026 except that the right to produce derivative works is not granted. In all matters of intellectual property rights and procedures, the intention is to benefit the Internet community and the public at large, while respecting the design decision to keep the original author of the protocol in charge. We don't like the bloat effects other efforts are experiencing. Only if you have developed an application in accordance with this specification may you call it PSYC-compliant with the version mentioned above.

Abstract

This document describes a generic extensible object-oriented message protocol whose structure and features can be negotiated and renegotiated during communication. It is designed to be easily implementable, text-based but, at the same time, low in overhead.

It can operate on top of virtual circuits like TCP, on reliable or unreliable messaging interfaces like UDP, and on top of multicast protocols of varying nature.

Introduction

In its simplest form, MMP is just a very compact non-binary messaging protocol. Its strength is vested in the modularity, the capability to be extended, and in the optional conferencing modules themselves.

Several generic message protocols are available but no standardized or established protocol exists which supports an amalgam of both unicast and multicast messaging.

In the field of unicast you can use protocols like SMTP, HTTP, XMPP, and various forms of RPC. For reliable multicast, some non-standard Multicast IP based solutions like RMP or RTP/SRM do exist. RMP is also capable of unicast UDP but it is a design choice not to be totally dependent on Multicast IP.

UDP or TCP based multicasting protocols also exist but no generic message protocol exists which can be unicasted/multicasted by conservative means such as TCP and UDP and multicasted via the MBONE using Multicast IP.

The de-facto standard for TCP-based multicasting is IRC, and there are dramatic reasons why IRC cannot do the job. That is why MMP is here to interface PSYC with varying network infrastructures.

Packet Format

    {packet} :=
	[ {modifier} ]*
		# optional variable modifiers
	[ "\n" ]
		# head to body delimiter. may be missing when the packet
		# ends without a data body, or when the sending implementation
		# was too lazy to distinguish MMP from PSYC. in that case you
		# may encounter a PSYC method. this may however never happen
		# in combination with _length. whoever sends that, implies he
		# is fully capable of distinguishing MMP and PSYC. try to be
		# relaxed on what you accept and rigid on what you transmit.  :)
		# in fact the case of leaving out the head to body delimiter
		# has proven to be unnecessary: even stupid templates can
		# contain the proper vars at the proper place - so we might
		# as well enforce the delimiter in all cases where there is
		# data.
	[ {data} "\n" ]
		# packet without data sets default variables for communication
	".\n"	# end of packet delimiter is a period in a line of its own
An MMP message is defined as the defragmented concatenation of data. The distinction between packet and message, however, is irrelevant to you if your implementation does not provide fragments.
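
The packet grammar above can be read as a small line-oriented parser. The following is a minimal sketch in Python, not a reference implementation. It assumes that modifier lines start with one of the glyphs "=", ":", "+" or "-" followed by a variable name beginning with "_" (as in the examples of this document), and that name and value are separated by whitespace. The function name split_packets is hypothetical. The sketch ignores _length, so it is not transparent for binary data.

```python
# Assumption: modifier glyphs as seen in this document's examples, plus "-".
MODIFIER_GLYPHS = "=:+-"


def split_packets(stream):
    """Yield (modifiers, data) for every complete packet in a text stream.

    modifiers is a list of (glyph, name, value) tuples, data is the body
    as one string without the trailing newline.
    """
    modifiers, data, in_body = [], [], False
    for line in stream.split("\n"):
        if line == ".":
            # A period on a line of its own ends the packet.
            yield modifiers, "\n".join(data)
            modifiers, data, in_body = [], [], False
        elif in_body:
            data.append(line)
        elif line == "":
            # Empty line: head to body delimiter.
            in_body = True
        elif line[0] in MODIFIER_GLYPHS and line[1:2] == "_":
            fields = line[1:].split(None, 1)
            value = fields[1].strip() if len(fields) > 1 else ""
            modifiers.append((line[0], fields[0], value))
        else:
            # Relaxed acceptance: the delimiter was left out and the
            # body (for instance a PSYC method) starts right away.
            in_body = True
            data.append(line)
```

Feeding it the three packets of the second example in the Examples section yields an empty packet, a packet with two modifiers and the body "_request_enter", and a packet whose body starts with "_message".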

Variables

MMP defines the following variables:
    {mmp variable} :=
	  "_source"			# packet source entity
	| "_source_identification"	# the UNI of the source, if applicable
	| "_source_location"		# the UNL of the source, if applicable
	| "_source_relay"		# the original sender of the message
	| "_target"			# recipient object
	| "_context"			# psyc context (a group or channel)
	| "_counter"			# packet number, see below for details
	| "_length"			# length of data (be transparent)
	| "_initialize"			# initialization instruction
	| "_fragment"			# current fragment number
	| "_encoding"			# family of args for encodings like
					# "_encoding_pgp_keySize" maybe
	| "_amount_fragments"		# total number of fragments
	| "_list_using_modules"		# MMP modules being used
	| "_list_require_modules"	# modules i require you to use
	| "_list_understand_modules"	# modules you and i could be using
	| "_list_using_encoding"	# MMP encoding being used
	| "_list_require_encoding"	# encoding i require you to use
	| "_list_understand_encoding"	# encoding you and i could be using
	| "_list_using_protocols"	# switching to other protocol(s)
	| "_list_require_protocols"	# i want you to switch protocol(s)
	| "_list_understand_protocols"	# protocols i will let you switch to
	| "_trace"			# trace path of packet delivery
	| "_tag"                        # a tag that should be referred to
	|                               # upon any reply or forward
	| "_tag_relay"                  # the original tag this message is a
	|                               # reply or forward of.
	|                               # should a _tag_relay require another
	|                               # relay, it will simply inherit this
	|                               # as a replacement for _tag, thus the
	|                               # _tag_relay stays the same further on,
	|                               # unless a specific new _tag was set
	| "_relay"                      # this is a reply or forward of a
	|                               # given packet-id
	| {group variable}		# a variable as specified by PSYC
	| {experimental extension variable}
Variables can be set or modified within every MMP packet according to the definition of Variable Modifiers.

The logical target of a packet is defined as either the context of the packet or, if no context is defined, as the physical target(s) of the packet.

The source variable must contain one UNL, typically reduced to the path on the remote entity (whose hostname and portnumber are available through underlying transport protocols) unless the packet is coming in through a proxy.

The context variable may contain the name of a PSYC group manager object, or experimental forms of context encoding.

The target variable may contain one or more UNLs, typically reduced to the path on the local entity (unless the local entity is a proxy), separated by the "," character.

The counter variable is encoded as a hexadecimal string and increments as much as you consider necessary (for now). The counter variable is needed for packet identification.

The initialize variable is actually an instruction to clear the variable state for this communication.

Variables used by the conferencing extension PSYC are specified in their own document.

TODO: Short forms of variables will have to be picked someday soon.

Modules

The variables ending in "_modules" may contain any subset of the following:
    {module types} :=
	  "_state"		# capable of maintaining state
	| "_fragments"		# capable of sending or receiving fragments
	| "_length"		# be transparent for arbitrary binary messages
	| "_context"		# understands _context variable
	| "_counter"		# understands packet ids
				# (and sorts out duplicates)
	| "_ordered"		# passes messages to upper layer
				# in correct order
	| "_targets_multiple"	# capable of handling multiple targets
	| "_groups"		# capable of psyc group extensions
	| {experimental module}

Packet Identification

    {packet id} :=
	{source} {logical target} {counter} [ {fragment} ]
A packet is identified by the sending object in its full UNL form (that is, including the hostname and port number of the sending entity) combined with the logical target of the packet, being either the context or the physical target(s).

A different counter is used per source and logical target, in order to make the identifications utilizable in both unicasting and multicasting infrastructures.

If a fragment is given it is appended to the packet id.

If you are going to support multiple infrastructures, even if all of them are reliable, you MUST be capable of sorting out duplicate packets since PSYC group managers will make use of that ability.
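
As an illustration of the packet id and of sorting out duplicates, here is a minimal sketch in Python. It assumes the packet's variables are available as a dict of strings, that _source has already been expanded to its full UNL form, and that the hexadecimal counter may be compared numerically. The names logical_target, packet_id and DuplicateFilter are hypothetical. The set of seen ids grows without bound here, so a real implementation would need some expiry policy.

```python
def logical_target(variables):
    """Return the context if one is set, otherwise the physical target(s)."""
    return variables.get("_context") or variables["_target"]


def packet_id(variables):
    """Build {source} {logical target} {counter} [ {fragment} ] as a tuple."""
    parts = [
        variables["_source"],
        logical_target(variables),
        int(variables["_counter"], 16),  # counter is a hexadecimal string
    ]
    if "_fragment" in variables:
        parts.append(variables["_fragment"])
    return tuple(parts)


class DuplicateFilter:
    """Remember packet ids and report whether a packet is new."""

    def __init__(self):
        self._seen = set()

    def is_new(self, variables):
        pid = packet_id(variables)
        if pid in self._seen:
            return False
        self._seen.add(pid)
        return True
```

Because the logical target is part of the id, the same counter value arriving for two different contexts is not treated as a duplicate.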

Virtual Circuits

Virtual circuits (VCs) such as TCP give both sides the ability to safely maintain state concerning the format of communication in the circuit. Transmitting MMP packets over VCs enables you to reduce packet counting to "clock synchronization" or, if you dare, leave it out totally.

VCs also permit you to send packets of arbitrary length, for instance to transfer files. However, you will want to use the _fragments module to be able to intertwine file transfer packets with conversation.

Unreliable Transmission

When using MMP over unreliable infrastructures, you MUST be able to identify and sort out packets, plus you need a packet recovery strategy when packets are missing.

Packet recovery strategies are not yet specified. They will probably turn out different according to the infrastructure being used and may need practical investigation. So, please, feel free to go ahead, investigate, and suggest something.

Here is a suggestion for a strategy: Introduce a _received_ids variable that mentions the last successfully received packet id(s), and include it with ongoing communication. If there is no outgoing message within a certain span of time, send an acknowledging packet without body.
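
The suggested strategy could be sketched as follows in Python. This is only one possible reading of the suggestion: the _received_ids variable is not specified anywhere, and the ":" glyph, the comma-separated value format, the class name AckTracker and the default timeout are all assumptions made for illustration. Time is passed in explicitly so the caller decides which clock to use.

```python
class AckTracker:
    """Collect received packet ids and hand them out as acknowledgements."""

    def __init__(self, idle_timeout=5.0):
        self.idle_timeout = idle_timeout  # seconds of silence before a bare ack
        self.pending = []                 # ids received but not yet acknowledged
        self.last_sent = 0.0              # time of our last outgoing packet

    def note_received(self, counter):
        self.pending.append(counter)

    def header(self):
        """Modifier line for the next outgoing packet, or "" if nothing is pending."""
        if not self.pending:
            return ""
        line = ":_received_ids\t" + ", ".join(self.pending) + "\n"
        self.pending = []
        return line

    def on_send(self, now):
        self.last_sent = now

    def idle_ack(self, now):
        """Return a packet without body if we have been silent for too long."""
        if self.pending and now - self.last_sent >= self.idle_timeout:
            self.last_sent = now
            return self.header() + ".\n"
        return None
```

An implementation would call header() whenever it composes a regular packet and poll idle_ack() from a timer.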

See also other implementations of reliable multicast like SRM and RMP.

Multicasting Infrastructures

Multicast MMP, regardless of packet delivery reliability, makes it a MUST for you to implement _context. You will probably want to also implement PSYC _groups or an experimental alternative conference control extension.

Maintaining State

If neither the infrastructure you are delivering MMP packets with nor your implementation is willing to maintain state, you may reject the state module, and then both you and your counterpart will have to send the full state with every message.

You SHOULD make use of this feature only if your implementation is not likely to be too popular and thus will not handle much traffic.

All MMP implementations should be able to handle requests for stateless communication.

If the network transport infrastructure does not give you the possibility of maintaining state, the method to do so yourself is by processing all packets ordered by their counter. If you find packets missing you will have to activate packet recovery strategies.
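
Processing packets ordered by their counter might look like the following Python sketch. It assumes that the sender increments the counter by exactly one per packet, which this document does not guarantee (the counter "increments as much as you consider necessary"), and that one such buffer is kept per source and logical target. The class name Reorderer is hypothetical.

```python
class Reorderer:
    """Release packets in counter order and report gaps."""

    def __init__(self, first=0):
        self.expected = first  # next counter value to hand to the upper layer
        self.held = {}         # out-of-order packets, keyed by counter value

    def push(self, counter_hex, packet):
        """Accept one packet and return the list of packets now deliverable."""
        n = int(counter_hex, 16)
        if n < self.expected or n in self.held:
            return []  # duplicate, already delivered or already buffered
        self.held[n] = packet
        ready = []
        while self.expected in self.held:
            ready.append(self.held.pop(self.expected))
            self.expected += 1
        return ready

    def missing(self):
        """Counter values that must be recovered before delivery can continue."""
        if not self.held:
            return []
        return [n for n in range(self.expected, max(self.held)) if n not in self.held]
```

A non-empty result from missing() is the point at which a packet recovery strategy would be activated.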

Negotiation

The _understand variable family tells the other side which abilities we can handle while the _using family tells the other side what we are doing or about to do. To switch on a protocol module, or switch to a different protocol, simply pick something from the list of the other side's understood options, and send it in a +_using. In the case of symmetric changes like protocol switches or activation of TLS you may have to refrain from sending anything after that, and wait for the other side to start the actual switch. You must however finish the current packet. All negotiations take effect after the next packet delimiter.
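
The selection step can be sketched in Python as below. The sketch assumes list values are comma-separated as in the examples of this document, and it uses the _list_using_ variable names from the Variables section. The function name negotiate is hypothetical.

```python
def negotiate(peer_understands, we_want, family="modules"):
    """Pick the options we want from what the peer understands.

    peer_understands is the peer's comma-separated _list_understand_* value,
    we_want is our own preference list. Returns the modifier line to send,
    or None if there is nothing in common.
    """
    offered = [item.strip() for item in peer_understands.split(",") if item.strip()]
    chosen = [item for item in we_want if item in offered]
    if not chosen:
        return None
    return "+_list_using_" + family + "\t" + ", ".join(chosen) + "\n"
```

The returned line belongs in the head of the current packet. As stated above, the change only takes effect after the next packet delimiter.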

Encoding

The encoding list variables allow you to negotiate encoding of the entire communication stream like compression or encryption. You can place any of gzip, zip, pgp, ssl, ssh or whatever encoding you may be able to provide in the list of understood encodings. You may specify the latest version of the encoding algorithm, like this: ssh/1.2.17.

Your counterpart can then pick encoding techniques using _list_using_encoding. Order is relevant in this case: "gzip pgp" is encryption after compression, which makes sense, while "pgp gzip" is compression after encryption, which is a bad idea.
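
To see why the order matters, consider this Python sketch. It applies the listed encodings left to right and undoes them right to left. Two stand-ins are used and should not be mistaken for the real thing: zlib's deflate takes the place of gzip, and toy_cipher, a hypothetical XOR keystream built from SHA-256, takes the place of pgp. The toy cipher is NOT secure and only serves to produce random-looking bytes.

```python
import hashlib
import zlib


def toy_cipher(data, key=b"demo key"):
    """Stand-in for real encryption (NOT secure): XOR with a SHA-256 keystream.

    Applying it twice with the same key restores the input.
    """
    stream = bytearray()
    block = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        block += 1
    return bytes(a ^ b for a, b in zip(data, stream))


ENCODERS = {"gzip": zlib.compress, "pgp": toy_cipher}
DECODERS = {"gzip": zlib.decompress, "pgp": toy_cipher}


def encode(data, using):
    """Apply the space-separated encodings in the order they are listed."""
    for name in using.split():
        data = ENCODERS[name](data)
    return data


def decode(data, using):
    """Undo the encodings in reverse order."""
    for name in reversed(using.split()):
        data = DECODERS[name](data)
    return data
```

With repetitive input, encode(text, "gzip pgp") is much shorter than encode(text, "pgp gzip"), because encrypted data looks random and no longer compresses.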

Additional information like keys and key sizes may be added with custom variables in the family of _encoding like for instance _encoding_pgp_publicKey containing a URL for the public key.

Examples

  1. Talking to a PSYC client via TCP using telnet.
    User input is highlighted in bold.
    joe@gorilla> telnet pc283.pcpool.blurb.edu 4404
    Trying ...
    Connected to pc283.pcpool.blurb.edu.
    Escape character is '^]'.
    .
    =_source        psyc://orangutan.dyndns.example.org
    =_source_identification	psyc://ve.popular.example.com/~jack/
    =_using_protocols       PSYC/0.9 TCP IP/4
    =_understand_protocols  PSYC/0.9:4404 TCP IP/4, PSYC/0.9:4404 UDP IP/4, HTTP/1.1:80
    =_understand_modules    _context, _encrypt, _fragments, _state
    =_using_modules		_context, _state
    
    =_name	Cute-User-Interface serving Jack Rübenfresser
    =_encoding              ISO-8859-1
    =_nickname		Jack
    =_description		http://www.popular.example.com/~jack/.vedl
    =_implementation	i86/ms-win3.11/CUI/1.0beta2
    _notice_circuit_established
    Connection to [_source] established.
    Protocols accepted: [_understand_protocols].
    Gateways provided: [_understand_schemes].
    .
    
    .
    m
    hi there jack
    this is joe..
    .
    
    =_mood	worried
    .
    m
    how you're doing?
    .
    :_length	44 
    
    _message_conversation
    huh? is it really you?
    .
    i
    .
    
    :_method	i
    _error_unsupported_method
    No such method '[_method]' defined here.
    .
    m
    i'm using telnet to talk to you
    cause i haven't got a psyc client here
    .
    :_length	200 
    
    _message_conversation
    Aah! That's why it says "non-authenticated connection"..
    Why don't you type " =_nickname  joe", that would look
    a lot nicer than just "@gorilla.zoo.net:31822"!!   :-)))
    .
    
    =_nickname	joe
    m
    like this?
    .
    :_length	78 
    
    :_mood	happy
    _message_conversation
    Yeah! Much better!
    Hi there joe! What's up?
    .
    
  2. Entering a PSYC conference via TCP using telnet.
    Only user input is shown. You can see the results if you fire up a telnet yourself and dump these lines into it.
    .
    =_nick		telnetPSYCer
    =_target	psyc://ve.symlynx.com/@test
    
    _request_enter
    .
    
    _message
    just wanted to say hello
    .
    
    _request_leave
    .
  3. More examples are in the Protocol Dumps document.

Extension

MMP is open to be extended with further experimental modules. For instance look at PSYC, the conference control extension to MMP.

Additionally, with the _protocols negotiation facility you can always make two MMP implementations switch away to another protocol.

Future

MMP does not support synchronization, time stamps... Feel free to experiment with extensions.

This spec does not propose algorithms for reliable delivery. Let's just say it's as reliable or unreliable as underlying protocols and by means of packet numbering you can implement your own algorithms... for now.

Security Aspects

The current implementation in psyced essentially deals with _source, _context and _target. Looks like we will always set up a _context for any group of recipients, so there is no necessity to also introduce _list_targets.

psyced implements a notion of trustworthiness, whereby data from localhost is trusted absolutely, signed data from a known host is trusted according to configuration, and data from random sources is generally not trusted. Also, the web of trust of hosts being trusted by trusted hosts needs consideration. While sources of high trust level are typically allowed to proxy or request relaying operation from us, lower trust level hosts need to be treated as follows.

When a packet contains no _context, you should look up the _target in your local object list, whereas the _source MUST NOT be a local object; in fact it MUST be an object on the peer host, thus you need to ensure its UNI reflects that. Preferably you would make sure that the host given in the _source UNI is a valid name or alias of the peer host. Should you find out that this is not the case, you can bail out with a relaying denied message. If time does not permit you to do hostname checks, you may modify the variable to contain the hostname or IP address you are sure of, and live with potential consequences.
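
The check for packets without _context might be sketched like this in Python. It is not how psyced does it. The sketch assumes _source has already been expanded to a full UNI, that peer_names is a set of names and aliases you have verified for the peer host by whatever lookup you trust, and that local_objects is your local object list. The function names and the returned strings are hypothetical labels.

```python
from urllib.parse import urlsplit


def source_is_on_peer(source, peer_names):
    """True if the host part of the source UNI is a known name of the peer."""
    host = urlsplit(source).hostname  # lowercased by urlsplit, None if absent
    return host is not None and host in {name.lower() for name in peer_names}


def vet_unicast(variables, peer_names, local_objects):
    """Decide what to do with a packet that carries no _context."""
    if variables.get("_target") not in local_objects:
        return "unknown target"
    if not source_is_on_peer(variables.get("_source", ""), peer_names):
        return "relaying denied"
    return "deliver"
```

A packet claiming a _source on some third host is refused here, since a low trust peer is not allowed to relay on behalf of others.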

When an unknown host provides a _context, it must reside on that host (we have dropped the idea of _context being an opaque string, _context is a place object). Ensure this in the same way as you would do for a _source without a _context (See above).

Whenever a _context is given you may want to check whether the given _source is an object on the local server (typically happens when entering a remote place) and you MAY NOT apply any modifications to it if that is not the case. The recipient(s) of the message decided to join this _context, thus trust that it will not lie to them. However, it would be favourable to restrict the accepted packet types to those a _context may produce. A _context MUST NOT be able to fake a private message from random sources. Also it MUST be ensured that the recipient actually entered such a _context.

Attempts to break such rules, like an invalid context or an inappropriate method type for a context, MUST be brought to the attention of the server administrator. The recipient MUST NOT be disturbed by such activity, and the implementation may choose to publicly blacklist the host that originated such a protocol breach.

When an entity resends a message on behalf of another entity, the real origin of the message is put into _source_relay. Obviously it depends on the sender's trustworthiness whether you believe this information.

Discussion

I don't like _initialize. There must be some smarter way to do it - Maybe one could use _counter for this: a _counter of "-(something)" means the message count is resynchronized at (something) and variables are reset. After that (something) is incremented per message. Sounds a bit like a hack but still nicer than having an MMP instruction. Yet another approach would be to make MMP itself unable to reset its state, and leave that to PSYC (introduce an _initialize method).

TODO: Reply flags need to be in MMP!

Suggestions for more eloquent variable names are very welcome.