NAME

Lumas - Language for Universal Message Abstraction and Specification

SYNOPSIS

    use Lumas;

    $in_message = "123 a = b";

    get_next( $in_message, $out_type, $out_value );

    skip_value( $in_message );

    get_named( $in_message, '=1a', $out_type, $out_value );

    get_named_nm( $in_message, 'a', $out_type, $out_value );

    get_named_cm( $in_message, '1a', $out_type, $out_value );

DESCRIPTION

The Lumas module reads Lumas-encoded messages. For more on Lumas and its encoding format, see:

        http://www.tech-know-ware.com/lumas

USER FUNCTIONS

These functions provide the user interface to Lumas.

get_next ( IN_MESSAGE, OUT_TYPE, OUT_VALUE )

Each call to get_next retrieves the next message token. This allows the message to be parsed in a manner akin to XML's SAX. However, it is the responsibility of the user's parsing code to call get_next, rather than the user's code being called by the library as is the case for SAX. This approach allows the context of the read to be easily represented by the function from which get_next is called.

IN_MESSAGE is the message to be parsed. The module uses progressive matching, so once parsing has started, the string used for IN_MESSAGE should not have the user's own regular expressions applied to it, as these can disturb the stored match position. In some circumstances it may be appropriate to make a copy of the message, and use that copy when calling get_next.
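The following is a minimal sketch of why user regular expressions can interfere with progressive matching. It assumes that "progressive matching" refers to Perl's \G anchor and pos() mechanism; the regular expressions shown are illustrative only and are not taken from the Lumas source.

```perl
use strict;
use warnings;

my $message = "123 a = b";

# Progressive matching: a /gc match anchored with \G resumes at pos($message).
$message =~ /\G\s*(\d+)/gc or die "expected a number";
my $number = $1;                 # the token just read
my $saved  = pos($message);      # position just after the token

# A user regex applied to a COPY leaves the original's position untouched.
my $copy       = $message;
my $has_equals = ( $copy =~ /=/ ) ? 1 : 0;
my $pos_after_copy_match = pos($message);

# A failed /g match on the ORIGINAL (without /c) resets its position,
# so any parser relying on pos() would lose its place.
my $hit = ( $message =~ /zzz/g ) ? 1 : 0;
my $pos_after_user_match = pos($message);
```

After this runs, pos() on the original is undefined again, which is why working on a copy is the safer choice when your own pattern matching is needed mid-parse.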

When the function is complete, the variable OUT_TYPE will be set to one of the LMS_XXXXX constants and the variable OUT_VALUE will contain the value read.

The return value is 1 for success and 0 for failure.

get_named ( IN_MESSAGE, IN_PATH, OUT_TYPE, OUT_VALUE )

get_named allows a named parameter from a message to be read. The parameters IN_MESSAGE, OUT_TYPE and OUT_VALUE have the same meanings as for get_next.

IN_PATH specifies the parameter to be read, in much the same way that a path specifies a file within a directory hierarchy. Each level in the hierarchy is made up of three separate, optional parts: the untagged count, the text tag and the instance.

The untagged count specifies either the 0-based index of the untagged parameter being sought or, if a text tag is specified, the total number of untagged values. (Note: the untagged count can be omitted from the specification if it is known that the parser cannot confuse an untagged parameter with the sought tagged parameter.)

The text tag specifies any text tag that is to be looked for.

The instance specifies which instance of the parameter is to be looked for, starting at 0. For example, if there are 3 occurrences of a particular value in a message section, the 2nd instance can be referenced by setting the instance part of the specification to 1.

The parts are combined into a single string using the format '<untagged count><text tag>,<instance>'. For example, '2mine,1' skips 2 untagged values and then looks for the 2nd instance (instance 1) of the 'mine' parameter.

Levels of the hierarchy are separated by an = sign. For example, '2mine,1=3' gets the 4th untagged parameter within the '2mine,1' struct.

If the path starts with an '=' (e.g. '=2mine,1=3'), the path is relative to the start of the message; otherwise it is relative to the position at which the previous match operation ended.
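To make the path format concrete, here is a minimal sketch that splits an IN_PATH string into its levels and parts. parse_path is a hypothetical helper written for illustration; it is not part of Lumas and does not reflect how the module parses paths internally. It assumes that the untagged count is a run of digits, that a text tag contains no ',' or '=', and that an omitted instance means instance 0.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Lumas function): break an IN_PATH into levels.
sub parse_path {
    my ($path) = @_;

    # A leading '=' marks the path as relative to the start of the message.
    my $absolute = ( $path =~ s/^=// ) ? 1 : 0;

    my @levels;
    for my $level ( split /=/, $path ) {
        # '<untagged count><text tag>,<instance>', every part optional.
        my ( $count, $tag, $instance ) =
            $level =~ /^(\d*)([^,]*?)(?:,(\d+))?$/
            or die "Malformed path level '$level'";

        push @levels, {
            untagged_count => length($count)   ? $count + 0    : undef,
            text_tag       => length($tag)     ? $tag          : undef,
            instance       => defined $instance ? $instance + 0 : 0,
        };
    }
    return { absolute => $absolute, levels => \@levels };
}

my $parsed = parse_path('=2mine,1=3');
for my $level ( @{ $parsed->{levels} } ) {
    printf "count=%s tag=%s instance=%d\n",
        $level->{untagged_count} // '-',
        $level->{text_tag}       // '-',
        $level->{instance};
}
```

For '=2mine,1=3' the sketch yields an absolute path with two levels: the first with count 2, tag 'mine' and instance 1, and the second with count 3 and no tag.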

get_named_nm ( IN_MESSAGE, IN_PATH, OUT_TYPE, OUT_VALUE )

get_named_nm (nm = no move) is similar to get_named except that it records the location in the string before an attempt is made to locate the specified parameter, and returns to that position once the location operation is complete.

This allows you to locate the start of a struct using get_named and then do searches relative to the start of the struct using get_named_nm after that. This will be more efficient, and probably easier to code.

get_named_cm ( IN_MESSAGE, IN_PATH, OUT_TYPE, OUT_VALUE )

get_named_cm (cm = conditional move) is similar to get_named_nm except that it will return to the stored location only if the parameter is NOT found. If the parameter is found, it will leave the string position at the location where the search ended. This allows you to conditionally move into a struct depending on whether it is present or not.

You may also use Perl's pos( IN_MESSAGE ) to record positions in the message during a parsing operation.
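The difference between the "no move" and "conditional move" behaviours can be sketched with pos() alone. In the sketch below, find_tag, find_tag_nm and find_tag_cm are hypothetical stand-ins written for illustration; the toy pattern they use is not the Lumas grammar, and the real functions may be implemented differently.

```perl
use strict;
use warnings;

# Hypothetical search: advances pos() past "tag =" on success.
# Operates on $_[0] so that the caller's string (and its pos) is used.
sub find_tag {
    my $tag = $_[1];
    return $_[0] =~ /\G.*?\b\Q$tag\E\s*=\s*/g ? 1 : 0;
}

# "No move": always return to the position recorded before the search.
sub find_tag_nm {
    my $start = pos( $_[0] );
    my $found = find_tag( $_[0], $_[1] );
    pos( $_[0] ) = $start;
    return $found;
}

# "Conditional move": return to the recorded position only on failure.
sub find_tag_cm {
    my $start = pos( $_[0] );
    my $found = find_tag( $_[0], $_[1] );
    pos( $_[0] ) = $start unless $found;
    return $found;
}

my $message = "123 a = b c = d";

my $peek = find_tag_nm( $message, 'c' );     # found, but position unchanged
my $move = find_tag_cm( $message, 'a' );     # found, position now after "a = "
my $miss = find_tag_cm( $message, 'zzz' );   # not found, position restored
```

Recording pos() before a search and assigning to it afterwards is the same technique you can use directly in your own parsing code.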

skip_value ( IN_MESSAGE )

skip_value skips the next value, including whole structs and embedded items. When using get_next to parse a message it can be used to skip over items that you are not interested in.

N.B. You must not call skip_value for void types.

is_uascii( IN_TYPE )

An unquoted ascii value might look like one of a number of other types, such as an integer or a date. If the unquoted ascii value looks like one of the other types, the tokeniser will label it as that other type. If you know you are looking for an unquoted ascii value that may look like one of the other types, call is_uascii with the type returned from the parser. It will return 1 if the type could actually be unquoted ascii, and 0 otherwise.

is_float( IN_TYPE )

A float value might look like a regular integer. If the float value looks like an integer the tokeniser will label it as such. If you know you are looking for a float value that may look like an integer, call is_float with the type returned from the parser. It will return 1 if the type could actually be float, and 0 otherwise.

EXPORTS

Lumas exports a number of constant functions of the form LMS_XXXXX that are used to identify the read types. These are:

LMS_ERROR, LMS_FINISHED, LMS_TAG, LMS_LIST_TAG, LMS_VOID_CONST_UASCII, LMS_BOOL, LMS_INT, LMS_DATE, LMS_TIME, LMS_OID, LMS_IPV4, LMS_IPV6, LMS_ASCII, LMS_UNICODE, LMS_BYTES, LMS_STRUCT_START, LMS_STRUCT_END, LMS_EMBEDDED, LMS_NOTHING.

BUGS

Fixed in Version 0.7:
(When an ascii or unicode string ended in something like \\' the tokeniser would only look at the \' and decide that it was not the end of the string. The rule needed is: if there is an even number of \ before a ', it is the end of the string, whereas if there is an odd number, it is not.)
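The even/odd backslash rule can be expressed as a single regular expression that consumes each backslash together with the character it escapes. The sketch below is an illustration of that rule only; read_quoted is a hypothetical helper, and this is not necessarily how the fix was implemented in Lumas.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Lumas function): return the body of a
# single-quoted string at the start of $text, or undef if unterminated.
sub read_quoted {
    my ($text) = @_;

    # Each alternative consumes either one ordinary character or a
    # backslash plus the character it escapes. Backslashes are therefore
    # used up in pairs, so a quote preceded by an even number of
    # backslashes ends the string and one preceded by an odd number does not.
    return $text =~ /\A'((?:[^\\']|\\.)*)'/s ? $1 : undef;
}

my $even = read_quoted( q{'abc\\\\' rest} );   # string ends after two backslashes
my $odd  = read_quoted( q{'it\'s' rest} );     # escaped quote is part of the body
my $open = read_quoted( q{'abc\'} );           # never terminated
```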

AUTHOR

Pete Cordell <lumas@tech-know-ware.com>