Internal Class Representation

Microparser

class microparser.Element(*, name, id, line_nr, parent_id)[source]

Bases: Serializer

XML Element container class.

Provides methods for xml parsing and inherits from the Serializer class. Hierarchical assignment by parent element or numerical id is also part of this class.

__init__(*, name, id, line_nr, parent_id)[source]
Parameters:
  • name (str) – xml element name (id)

  • id (int) – xml element internal processing id

  • line_nr (int) – line number of found xml opening tag in payload data

  • parent_id (int) – parent element numerical id

Variables:
  • _name (str) – xml element id

  • _id (int) – internal numerical element id

  • _parent_id (int) – internal numerical parent id (self._id reference)

  • _attributes (dict[str]) – xml tag attributes var/value pairs

  • _content (str) – tag inner content (if no sub elements found)

  • _line_start (int) – start line of found xml opening tag

  • _line_end (int) – end line of found xml closing tag
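As a rough illustration of the parameters and variables listed above, a container with keyword-only arguments could look like the sketch below. SketchElement is a hypothetical stand-in, not the library's Element class; the initial values chosen for _attributes, _content and _line_end, and the None result for a missing attribute, are assumptions.

```python
class SketchElement:
    """Hypothetical stand-in for microparser.Element (illustration only)."""

    def __init__(self, *, name, id, line_nr, parent_id):
        self._name = name            # xml element name
        self._id = id                # internal numerical element id
        self._parent_id = parent_id  # numerical id of the parent element
        self._attributes = {}        # assumption: starts as an empty dict
        self._content = None         # assumption: unset until content is added
        self._line_start = line_nr   # line of the opening tag
        self._line_end = None        # assumption: unset until the closing tag is found

    def add_attribute(self, name, value):
        # store one attribute/value pair
        self._attributes[name] = value

    def get_attribute_by_name(self, name):
        # assumption: a missing attribute yields None
        return self._attributes.get(name)


element = SketchElement(name='tag2', id=2, line_nr=1, parent_id=1)
element.add_attribute('a', '1')
```

The bare `*` in the signature forces callers to pass every argument by keyword, which matches the signature shown for Element.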

add_attribute(name, value)[source]

Add attribute/value pair to self._attributes dictionary.

Parameters:
  • name (str) – attribute name

  • value (str) – attribute value

add_content(content)[source]

Add content (value found between xml opening and closing tag) to self._content.

Parameters:

content (str) – tag value

get_attribute_by_name(name)[source]

Get attribute value by given attribute name.

Returns:

attribute value

Return type:

str

get_attributes()[source]

Get attributes.

Returns:

attributes dictionary

Return type:

dict

get_content()[source]

Get content.

Returns:

content (value found between xml opening and closing tag)

Return type:

str

get_id()[source]

Get numerical element id.

Returns:

internal numerical id

Return type:

int

get_line_end()[source]

Get line number of found end tag.

Returns:

line number of found end tag

Return type:

int

get_line_start()[source]

Get line number of found start tag.

Returns:

line number of found start tag

Return type:

int

get_name()[source]

Get element name.

Returns:

element xml id

Return type:

str

get_parent_element()[source]

Get parent element.

Returns:

parent element

Return type:

Element

get_parent_id()[source]

Get numerical id of the element's parent.

Returns:

internal numerical parent id

Return type:

int

set_line_end(line_nr)[source]

Set end line number (line in payload at which the closing tag was found).

Parameters:

line_nr (int) – line number

set_parent_element(element)[source]

Set parent element.

Parameters:

element (Element) – parent element

class microparser.Parser(payload)[source]

Bases: object

XML MicroParser class.

Parses simple raw XML data that does not include a DTD or XSLT model description. Such data has to be omitted; providing it leads to misbehaviour. XML namespace parsing is also not supported.

The XML will be transformed to internal JSON structs which can easily be iterated over or printed out.

Currently all lines MUST end with a “\n”; otherwise the input will be treated as a single line and closing tags will not be recognized correctly.

See Examples section for valid input, generated output and supported features.

Processing flow:

  • Process xml input data line by line (add to internal processing list).

  • Setup Element() instances for each found element in list, add properties.

  • Add elements to Parser._elements list.

  • Run Serializer (add/link child elements in OOP based manner).

  • Easily iterate (automatically recursive) over the given result elements.
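The flow above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the library's implementation: the regular expressions, the dictionary layout and the function name sketch_parse are all hypothetical, and self-closing tags are not handled.

```python
import re

# hypothetical patterns: an opening tag with optional attributes, and a closing tag
OPEN_TAG = re.compile(r'<([A-Za-z_][\w.-]*)([^>]*)>')
CLOSE_TAG = re.compile(r'</([A-Za-z_][\w.-]*)>')


def sketch_parse(payload):
    """Return a flat list of element dicts linked by parent_id (sketch)."""
    elements = []
    open_ids = []   # stack of ids of elements still waiting for a closing tag
    next_id = 1
    # step 1: process the input line by line
    for line_nr, line in enumerate(payload.split('\n')):
        opened = OPEN_TAG.search(line)
        if opened:
            # steps 2 and 3: set up one element per opening tag and store it
            elements.append({
                'id': next_id,
                'name': opened.group(1),
                'parent_id': open_ids[-1] if open_ids else None,
                'line_start': line_nr,
                'line_end': None,
                'children': [],
            })
            open_ids.append(next_id)
            next_id += 1
        if CLOSE_TAG.search(line):
            # a closing tag closes the most recently opened element
            elements[open_ids.pop() - 1]['line_end'] = line_nr
    # step 4: link child elements to their parents
    for element in elements:
        if element['parent_id'] is not None:
            elements[element['parent_id'] - 1]['children'].append(element)
    return elements


payload = (
    '<tag1>\n'
    '    <tag2 a="1">\n'
    '        <tag3>value3</tag3>\n'
    '    </tag2>\n'
    '</tag1>\n'
)
elements = sketch_parse(payload)
```

Because ids are handed out sequentially starting at 1, the element with id n sits at list index n - 1, which keeps the parent lookup trivial in this sketch.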

Class Inheritance/Dependencies:

  • microparser.Element->microparser.Serializer->transformer.JSONTransformer

The microparser.Serializer class provides members/methods for recursive transformation processing for different transformer module/class types.

Currently only the transformation of xml to JSON is provided; the transformer module is built for future expansion (adding further formats, e.g. yaml).

__init__(payload)[source]
Parameters:

payload (str) – xml payload data

Variables:
  • _elements (list[Element]) – runtime xml item object storage

  • _current_element (Element) – current processed element

  • _current_line_nr (int) – current processed source line number

  • _current_item_id (int) – current processed numerical internal xml item id

Example:

>>> import microparser
>>>
>>> payload = '' \
...     '<tag1>\n' \
...     '    <tag2 a="1" b="value1">\n' \
...     '        <tag3 b="1" c="value2">value3</tag3>\n' \
...     '    </tag2>\n' \
...     '</tag1>\n'
>>>
>>> parser = microparser.Parser(payload)
>>>
>>> parser.build_serializer()
>>> parser.process_json()
>>>
>>> r1 = parser.get_root_element().get_json_dict()
>>> r2 = parser.get_element_by_name('tag2').get_json_dict()
>>> r3 = parser.get_element_by_id(2).get_json_dict()
>>>
>>> print(r1)
>>> print(r2)
>>> print(r3)

_add_child_elements_recursive(element)[source]

Add child elements recursively.

Parameters:

element (Element) – xml start element

_current_item_id_gen()[source]

Current item id generator. Simple iterator. Start value 1, incremented by 1 (while True).

Return type:

Iterator[int]

_current_line_nr_gen()[source]

Current line number generator. Simple iterator. Start value 0, incremented by 1 (while True).

Return type:

Iterator[int]
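The two generators described above can be sketched as follows. The function names are hypothetical stand-ins; only the start values (1 for item ids, 0 for line numbers) and the endless increment by 1 are taken from the descriptions.

```python
def item_id_gen():
    """Yield 1, 2, 3, ... without end."""
    value = 1
    while True:
        yield value
        value += 1


def line_nr_gen():
    """Yield 0, 1, 2, ... without end."""
    value = 0
    while True:
        yield value
        value += 1


ids = item_id_gen()
lines = line_nr_gen()
first_ids = [next(ids) for _ in range(3)]
first_lines = [next(lines) for _ in range(3)]
```

The standard library's itertools.count(1) and itertools.count(0) would produce the same sequences.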

_get_last_unclosed_element_id()[source]

Get last unprocessed/unclosed element id.

Returns:

last non closed element numerical id

Return type:

int or None
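One plausible way to find the last unclosed element is to walk the element list backwards and return the first element without an end line. The sketch below assumes that "unclosed" means the end line is still unset and uses plain dictionaries in place of Element instances; both are assumptions, not confirmed library behaviour.

```python
def last_unclosed_id(elements):
    """Return the id of the most recently opened element that has no end line."""
    for element in reversed(elements):
        if element['line_end'] is None:
            return element['id']
    return None  # every element is already closed


elements = [
    {'id': 1, 'line_end': None},  # still open
    {'id': 2, 'line_end': None},  # still open, opened later
    {'id': 3, 'line_end': 2},     # already closed
]
found = last_unclosed_id(elements)
```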

_parse_attributes(attributes)[source]

Parse attribute var/value pairs from string and add them to the currently processed element.

Parameters:

attributes (str) – tag attributes unparsed string
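Splitting an unparsed attribute string into pairs can be done with a single regular expression. The following is a minimal sketch, assuming double-quoted values only; the pattern and the function name are hypothetical and not taken from the library.

```python
import re

# hypothetical pattern: name="value" with optional whitespace around "="
ATTRIBUTE = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')


def parse_attributes(attributes):
    """Return a dict of attribute name/value pairs found in the string."""
    # findall returns one (name, value) tuple per match
    return dict(ATTRIBUTE.findall(attributes))


parsed = parse_attributes(' a="1" b="value1"')
```

All values stay strings, which is consistent with add_attribute(name, value) taking a str value.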

_parse_line(line)[source]

Parse single xml data line.

build_serializer()[source]

Build hierarchical serializer by adding all found elements starting with root element.

dump()[source]

Send self (__repr__) to the logger at debug level.

get_child_element(name)[source]

Return an element's children, searched by element name.

Parameters:

name (str) – element tag name

Yield:

found item

Return type:

element or None

get_child_elements_by_id(id)[source]

Return an element's children, searched by element numerical id.

Parameters:

id (int) – element internal numerical id

Yield:

found item

Return type:

elements (list of objects) or None

get_element_by_id(id)[source]

Get element by internal numerical id.

Parameters:

id (int) – internal numerical element id

Returns:

found element

Return type:

Element or None

get_element_by_name(name)[source]

Get element by xml element id.

Parameters:

name (str) – xml tag id

Returns:

found element

Return type:

Element or None

get_elements()[source]

Return processed elements list.

Returns:

processed elements

Return type:

list[Element] (self._elements)

get_root_element()[source]

Return root element.

Returns:

root element

Return type:

Element (self._elements[0])

process_json()[source]

Process json transformation.

class microparser.Serializer[source]

Bases: JSONTransformer

Serializer class.

Provides methods for element dependency handling.

__init__()[source]

Elements hierarchical connector.

__iter__()[source]

Overloaded iterator.

Will be called on ‘yield self’; the iterate() method makes it recursive.

Returns:

iter(list[Element])

Return type:

iterator

_clear_child_elements()[source]

Reset self._child_elements list.

_remove_child_element(index)[source]

Remove element from self._child_elements list.

Parameters:

index (int) – child element list position (index)

add_child_element(element)[source]

Append object to self._child_elements list.

Parameters:

element (Element) – append element

get_child_element_count()[source]

Return child element count.

Returns:

current child element count

Return type:

int

get_child_elements()[source]

Get child elements.

Returns:

child elements list

Return type:

list[Element]

get_element_by_element_id(element_id)[source]

Get element by element numerical id.

Returns:

found element

Return type:

Element or None

get_element_by_element_name(element_name)[source]

Get element by element name.

Returns:

found element

Return type:

Element or None

iterate()[source]

Recursively iterate through hierarchical objects.
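The interplay of __iter__() and iterate() can be illustrated with a small stand-alone class. Node is a hypothetical stand-in for Serializer, and the depth-first, parent-before-children order is an assumption about how the recursion proceeds.

```python
class Node:
    """Hypothetical stand-in for the Serializer class (illustration only)."""

    def __init__(self, name):
        self.name = name
        self._child_elements = []

    def add_child_element(self, element):
        self._child_elements.append(element)

    def __iter__(self):
        # iterating over a node yields its direct children only
        return iter(self._child_elements)

    def iterate(self):
        # yield the node itself, then descend into every child
        yield self
        for child in self:
            yield from child.iterate()


root, a, a1, b = Node('root'), Node('a'), Node('a1'), Node('b')
root.add_child_element(a)
root.add_child_element(b)
a.add_child_element(a1)
names = [node.name for node in root.iterate()]
```

Keeping __iter__() non-recursive and putting the recursion into iterate() lets callers choose between visiting direct children and walking the whole hierarchy.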

Transformer

class transformer.JSONTransformer[source]

Bases: object

JSON transformer class.

Transforms parsed xml element data into JSON format. In the first step the data is transformed into internal JSON structs (python dictionaries), which are built recursively from the connected Element hierarchy and can afterwards be dumped as JSON text.

This class will be inherited by the microparser.Serializer class which provides base members/methods for recursive transformation processing.

__init__()[source]

Builds json from serialized (connected) Element object hierarchy.

_set_json_attribute(key, value)[source]

Set single json attribute.

Parameters:
  • key (str) – attribute key

  • value (mixed) – attribute value (str or dict)

_set_json_value()[source]

Set json value.

get_json()[source]

Return json result.

Returns:

json result dictionary (json dumped)

Return type:

str

get_json_dict()[source]

Return internal json dictionary.

Returns:

json result dictionary

Return type:

dict

json_transform()[source]

Transform xml elements to python dictionary.
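A recursive transformation of an element hierarchy into a python dictionary might look like the sketch below. The dictionary layout (the keys name, attributes, children and content) is purely hypothetical and not the library's actual output format; plain dictionaries stand in for Element instances.

```python
import json


def to_dict(element):
    """Recursively convert an element (here a plain dict) into a result dict."""
    node = {
        'name': element['name'],
        'attributes': dict(element['attributes']),
    }
    if element['children']:
        # elements with sub elements carry children instead of content
        node['children'] = [to_dict(child) for child in element['children']]
    else:
        node['content'] = element['content']
    return node


tag3 = {'name': 'tag3', 'attributes': {'b': '1', 'c': 'value2'},
        'content': 'value3', 'children': []}
tag2 = {'name': 'tag2', 'attributes': {'a': '1', 'b': 'value1'},
        'content': None, 'children': [tag3]}

result = to_dict(tag2)       # comparable in spirit to get_json_dict()
dumped = json.dumps(result)  # comparable in spirit to get_json()
```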

set_json_attributes()[source]

Set json attributes.

Helper

class helper.Looper(*, payload, function, methods=None)[source]

Bases: object

Looper Class.

Processes a list of input items (the type should be irrelevant, currently set to type string) by applying multiple processing method references (list) to each item.

After a single item has been processed by the methods specified in the methods list (e.g. strip), it is sent to the final processing function.

__init__(*, payload, function, methods=None)[source]

Loops over payload items. For each item:

  • applies methods given in methods list.

  • calls function reference given in function argument using item as argument.

Parameters:
  • payload (list[str]) – payload list

  • function (callable) – function reference for item processing after methods processing

  • methods (list[str]) – list of methods applied to item

Variables:
  • _payload (list[str]) – list of payload items to be processed

  • _function (callable) – stored function reference

  • _methods (list[str]) – list of methods applied to payload items

Example:

>>> from microparser import Looper
>>>
>>> def myfunction(payload):
...     print(payload)
>>>
>>> payload = 'one,two,three'
>>>
>>> args = {
...     'payload': payload.split(','),
...     'function': myfunction,
...     'methods': ['strip']
... }
>>>
>>> Looper(**args).process()

generate_methods(element)[source]

Generate methods when provided.

process()[source]

Process payload elements.

static process_methods(methods, element)[source]

Loop over methods.
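The behaviour described for Looper can be approximated with two small functions. This is a sketch under assumptions, not the library's code: it assumes that each entry in methods is the name of a method on the item, resolved with getattr and called without arguments, and that the result of one method is passed on to the next.

```python
def process_methods(methods, element):
    """Apply each named method to the element in turn and return the result."""
    for method in methods or []:
        element = getattr(element, method)()
    return element


def process(payload, function, methods=None):
    """Run every payload item through the methods, then hand it to function."""
    for item in payload:
        function(process_methods(methods, item))


collected = []
process(' one, two ,three'.split(','), collected.append, ['strip'])
```

Passing collected.append as the final function shows that any callable taking one argument fits, not only a function that prints.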