Lumas - Language for Universal Message Abstraction and Specification
use Lumas;
$in_message = "123 a = b";
get_next( $in_message, $out_type, $out_value );
skip_value( $in_message );
get_named( $in_message, '=1a', $out_type, $out_value );
get_named_nm( $in_message, 'a', $out_type, $out_value );
get_named_cm( $in_message, '1a', $out_type, $out_value );
Lumas allows for the reading of Lumas-encoded messages. For more on Lumas and its
encoding format, see:
http://www.tech-know-ware.com/lumas
These functions provide the user interface to Lumas.
On each call to get_next the next message token is retrieved. This allows the message to be parsed in a manner akin to XML's SAX interface. However, it is the responsibility of the user's parsing code to call get_next, rather than the user's code being called by the library as is the case for SAX. This approach allows the context of the read to be easily represented by the function from which get_next is called.
IN_MESSAGE is the message to be parsed. The module uses progressive matching, so once set, the string used for IN_MESSAGE should not have user regular expressions applied to it. In some circumstances it may be appropriate to make a copy of the message, and use that when calling get_next.
When the function is complete, the variable OUT_TYPE will be set to one of the LMS_XXXXX constants and the variable OUT_VALUE will contain the value read.
The return value is 1 for success and 0 for failure.
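The caution about applying your own regular expressions to IN_MESSAGE can be illustrated with plain Perl. The sketch below does not use Lumas at all; it assumes that "progressive matching" means the usual \G and /gc idiom, where the current offset is held in pos() of the message variable. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $message = "123 a = b";

# Progressive match: \G anchors at pos($message); /gc advances pos() on
# success and leaves it untouched on failure.
$message =~ /\G\s*(\d+)/gc or die "expected a number";
my $first_token = $1;               # "123"
my $saved_pos   = pos($message);    # offset just past "123"

# A user regex with /g but without /c that fails to match resets pos().
my $user_match = ( $message =~ /no-such-text/g ) ? 1 : 0;
my $pos_after_user_regex = pos($message);    # now undef

# Restoring the saved offset lets progressive matching carry on.
pos($message) = $saved_pos;
$message =~ /\G\s*(\w+)/gc or die "expected a word";
my $second_token = $1;              # "a"

# pos() belongs to the variable, so a copy starts with no position set.
my $copy     = $message;
my $copy_pos = pos($copy);          # undef
```

Because the offset is attached to the variable rather than to the string value, working on a copy keeps your own matching from disturbing the position held for the original.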
get_named allows a named parameter from a message to be read. The parameters IN_MESSAGE, OUT_TYPE and OUT_VALUE have the same meanings as for get_next.
IN_PATH specifies the parameter to be read in a similar way to how a file is specified within a directory hierarchy. Each level in the hierarchy is made up of 3 separate and optional parts: the untagged count, the text tag and the instance.
The untagged count specifies either the 0-based index of the untagged parameter that is being sought or, if a text tag is specified, the total number of untagged values to be skipped. (Note: the untagged count can be omitted from the specification if it is known that the parser cannot confuse an untagged parameter with a sought tagged parameter.)
The text tag specifies any text tag that is to be looked for.
The instance specifies the instance of the parameter that is to be looked for, starting at 0. For example, if there are 3 occurrences of a particular value in a message section, the 2nd instance can be referenced by setting the instance part of the specification to 1.
The parts are combined into a single string, using the format '<untagged count><text tag>,<instance>'. For example, '2mine,1' skips 2 untagged values and then looks for the 2nd instance (instance 1) of the 'mine' parameter.
Each level of the hierarchy is separated by an = sign. For example, '2mine,1=3' gets the 4th untagged parameter within the '2mine,1' struct.
If the path starts with an '=' (e.g. '=2mine,1=3'), the path is relative to the start of the message; otherwise it is relative to the match operation just ended.
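To make the path format concrete, here is a minimal sketch that splits a path string into its levels and parts. It is not how Lumas itself processes IN_PATH: parse_path is a hypothetical helper, and it assumes that a text tag never starts with a digit and never contains ',' or '='.

```perl
use strict;
use warnings;

# parse_path is a hypothetical helper for illustration; it is not part of
# Lumas. Assumption: a text tag never starts with a digit and never
# contains ',' or '='.
sub parse_path {
    my ($path) = @_;

    # A leading '=' makes the path relative to the start of the message.
    my $from_start = ( $path =~ s/^=// ) ? 1 : 0;

    my @levels;
    for my $level ( split /=/, $path ) {
        # <untagged count><text tag>,<instance> with every part optional.
        my ( $count, $tag, $instance ) =
            $level =~ /^(\d*)([^\d,=][^,=]*)?(?:,(\d+))?$/
            or die "malformed path level '$level'\n";

        push @levels, {
            untagged => ( $count eq '' ? undef : $count + 0 ),
            tag      => $tag,
            instance => ( defined $instance ? $instance + 0 : undef ),
        };
    }
    return ( $from_start, \@levels );
}

# '=2mine,1=3': from the start of the message, skip 2 untagged values,
# take instance 1 of 'mine', then untagged parameter 3 inside that struct.
my ( $absolute, $parsed ) = parse_path('=2mine,1=3');

# '1a': relative to the match operation just ended.
my ( $relative_flag, $relative ) = parse_path('1a');
```

Parts that are omitted from a level are left undefined here rather than defaulted, since the defaults the module applies are not stated above.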
get_named_nm (nm = no move) is similar to get_named except that it records the location in the string before an attempt is made to locate the specified parameter, and returns to that position once the location operation is complete.
This allows you to locate the start of a struct using get_named and then do searches relative to the start of the struct using get_named_nm after that. This will be more efficient, and probably easier to code.
get_named_cm (cm = conditional move) is similar to get_named_nm except that it will return to the stored location only if the parameter is NOT found. If the parameter is found, it will leave the string position at the location where the search ended. This allows you to conditionally move into a struct depending on whether it is present or not.
You may also use Perl's pos( IN_MESSAGE ) to record positions in the message during a parsing operation.
skip_value skips the next value, including whole structs and embedded items. When using get_next to parse a message it can be used to skip over items that you are not interested in.
N.B. You must not call skip_value for void types.
An unquoted-ascii value might look like a number of other types, such as integers, dates etc. If the unquoted ascii value looks like one of the other types, the tokeniser will label it as the other type. If you know you are looking for an unquoted ascii value that may look like one of the other types, call is_uascii with the type returned from the parser. It will return 1 if the type could actually be unquoted ascii, and 0 otherwise.
A float value might look like a regular integer. If the float value looks like an integer the tokeniser will label it as such. If you know you are looking for a float value that may look like an integer, call is_float with the type returned from the parser. It will return 1 if the type could actually be float, and 0 otherwise.
Lumas exports a number of constant functions of the form LMS_XXXXX that are used to identify the read types. These are:
LMS_ERROR, LMS_FINISHED, LMS_TAG, LMS_LIST_TAG, LMS_VOID_CONST_UASCII, LMS_BOOL, LMS_INT, LMS_DATE, LMS_TIME, LMS_OID, LMS_IPV4, LMS_IPV6, LMS_ASCII, LMS_UNICODE, LMS_BYTES, LMS_STRUCT_START, LMS_STRUCT_END, LMS_EMBEDDED, LMS_NOTHING.
Fixed in Version 0.7:
(When an ascii or unicode string ends in something like \\' the tokeniser
will only grab \' and decide that that is not the end of the string. Really
need something that says if there's an even number of \ before a ', then
it's the end of the string, whereas, if there's an odd number, it's not the
end of the string.)
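The even/odd rule described in the note above can be sketched as follows. This is only an illustration of the rule, not the tokeniser's actual code; quote_ends_string is a hypothetical helper.

```perl
use strict;
use warnings;

# quote_ends_string is a hypothetical helper, not the tokeniser's actual
# code. $quote_pos is the 0-based offset of a ' character inside $text.
sub quote_ends_string {
    my ( $text, $quote_pos ) = @_;

    # Capture the run of backslashes immediately before the quote.
    my ($slashes) = substr( $text, 0, $quote_pos ) =~ /(\\*)\z/;

    # Even count: the backslashes pair up, so the quote is not escaped.
    return length($slashes) % 2 == 0 ? 1 : 0;
}

my $bs = chr 92;    # a single backslash, built this way to avoid quoting doubts

my $two_backslashes = 'abc' . $bs x 2 . "'";    # abc, 2 backslashes, quote
my $one_backslash   = 'abc' . $bs . "'";        # abc, 1 backslash, quote

my $ends_after_two = quote_ends_string( $two_backslashes, 5 );    # 1
my $ends_after_one = quote_ends_string( $one_backslash,   4 );    # 0
```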
Pete Cordell <lumas@tech-know-ware.com>
Copyright (c) 2003 Tech-Know-Ware Ltd. All rights reserved.