Internal Class Representation
Microparser
- class microparser.Element(*, name, id, line_nr, parent_id)
Bases: Serializer
XML element container class.
Provides methods for XML parsing and inherits from the Serializer class. It also handles the hierarchical assignment of elements by numerical parent id.
- __init__(*, name, id, line_nr, parent_id)
- Parameters:
name (str) – XML element (tag) name
id (int) – internal numerical processing id of the element
line_nr (int) – line number of the XML opening tag in the payload data
parent_id (int) – numerical id of the parent element
- Variables:
_name (str) – XML element (tag) name
_id (int) – internal numerical element id
_parent_id (int) – internal numerical id of the parent element (refers to the parent's _id)
_attributes (dict[str, str]) – XML tag attribute name/value pairs
_content (str) – inner content of the tag (only set if the tag has no child elements)
_line_start (int) – line number of the XML opening tag
_line_end (int) – line number of the XML closing tag
- add_attribute(name, value)
Add an attribute name/value pair to the self._attributes dictionary.
- Parameters:
name (str) – attribute name
value (str) – attribute value
- add_content(content)
Store the content (the value found between the XML opening and closing tags) in self._content.
- Parameters:
content (str) – tag value
- get_attribute_by_name(name)
Get an attribute value by its attribute name.
- Parameters:
name (str) – attribute name
- Returns:
attribute value
- Return type:
str
- get_content()
Get the element content.
- Returns:
content (the value found between the XML opening and closing tags)
- Return type:
str
- get_line_end()
Get the line number of the XML closing tag.
- Returns:
line number of the closing tag
- Return type:
int
- get_line_start()
Get the line number of the XML opening tag.
- Returns:
line number of the opening tag
- Return type:
int
- get_parent_id()
Get the numerical id of the element's parent.
- Returns:
internal numerical parent id
- Return type:
int
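A minimal sketch of what such an element container might look like, assuming attributes are kept in a plain dict and the closing line number is set after construction. The class name ElementSketch and its set_line_end() helper are hypothetical and not part of microparser; the sample values follow the tag3 element of the Parser example (line numbers starting at 0, ids starting at 1).

```python
# Hypothetical sketch of an Element-like container; ElementSketch and
# set_line_end() are illustrative names, not the microparser API.
class ElementSketch:
    def __init__(self, *, name, id, line_nr, parent_id):
        self._name = name            # XML element (tag) name
        self._id = id                # internal numerical element id
        self._parent_id = parent_id  # numerical id of the parent element
        self._attributes = {}        # attribute name -> value
        self._content = None         # inner text, only set for leaf elements
        self._line_start = line_nr   # line of the opening tag
        self._line_end = None        # line of the closing tag, set later

    def add_attribute(self, name, value):
        self._attributes[name] = value

    def add_content(self, content):
        self._content = content

    def get_attribute_by_name(self, name):
        # Returns None for unknown attributes instead of raising KeyError.
        return self._attributes.get(name)

    def get_content(self):
        return self._content

    def set_line_end(self, line_nr):
        self._line_end = line_nr

    def get_line_start(self):
        return self._line_start

    def get_line_end(self):
        return self._line_end

    def get_parent_id(self):
        return self._parent_id


# <tag3 b="1" c="value2">value3</tag3> on line 2, child of element 2
tag3 = ElementSketch(name='tag3', id=3, line_nr=2, parent_id=2)
tag3.add_attribute('b', '1')
tag3.add_content('value3')
tag3.set_line_end(2)
```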
- class microparser.Parser(payload)
Bases: object
XML MicroParser class.
Parses simple raw XML data that contains no DTD or XSLT model description. Such data must be omitted; providing it leads to incorrect behavior. XML namespaces are not supported either.
The XML is transformed into internal JSON structures that can easily be iterated over or printed.
Currently every line MUST end with “\n”; otherwise the input is treated as a single line and closing tags are not recognized correctly.
See the Examples section for valid input, generated output and supported features.
Processing flow:
Process the XML input data line by line (add each line to an internal processing list).
Set up an Element() instance for each element found in the list and add its properties.
Add the elements to the Parser._elements list.
Run the Serializer (add/link child elements in an object-oriented manner).
Iterate (automatically recursively) over the resulting elements.
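The first processing steps can be sketched as follows. This is a simplified illustration, not the microparser implementation: it assumes each line holds exactly one opening tag, one closing tag, or one complete leaf element, and it tracks unclosed elements with a stack instead of searching the element list. The function parse_lines(), the regular expressions and the dict-based element layout are hypothetical.

```python
import itertools
import re

# Hypothetical patterns: a line is an opening tag, a closing tag,
# or a complete leaf element such as <a x="1">text</a>.
OPEN_RE = re.compile(r'^<(\w+)([^>]*)>(.*?)(?:</\1>)?$')
CLOSE_RE = re.compile(r'^</(\w+)>$')


def parse_lines(payload):
    """Return a list of element dicts built from a newline-terminated payload."""
    elements = []
    open_stack = []               # ids of elements whose closing tag is pending
    ids = itertools.count(1)      # element ids start at 1
    for line_nr, raw in enumerate(payload.split('\n')):  # line numbers start at 0
        line = raw.strip()
        if not line:
            continue
        if CLOSE_RE.match(line):
            # Closing tag: finish the innermost unclosed element.
            elements[open_stack.pop() - 1]['line_end'] = line_nr
            continue
        match = OPEN_RE.match(line)
        if match is None:
            raise ValueError(f'unsupported line {line_nr}: {line!r}')
        name, attrs, content = match.groups()
        element = {
            'name': name,
            'id': next(ids),
            'parent_id': open_stack[-1] if open_stack else None,
            'attributes': attrs.strip(),
            'content': content or None,
            'line_start': line_nr,
            'line_end': None,
        }
        elements.append(element)
        if line.endswith(f'</{name}>'):
            element['line_end'] = line_nr  # leaf element closed on the same line
        else:
            open_stack.append(element['id'])
    return elements
```

In this sketch, splitting on "\n" is what separates the tags from each other, which illustrates why the parser expects every line to end with "\n".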
Class inheritance/dependencies:
microparser.Element -> microparser.Serializer -> transformer.JSONTransformer
The microparser.Serializer class provides members/methods for recursive transformation processing for different transformer module/class types.
Currently only XML transformation is provided; the transformer module is designed for future expansion (adding further formats, e.g. YAML).
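The inheritance chain can be illustrated with three stand-in classes. The names JSONTransformerStub, SerializerStub and ElementStub are hypothetical, and the JSON handling is reduced to a dict plus json.dumps; the point is only that an element reaches its transformation methods through the serializer layer from the transformer base class.

```python
import json


# Hypothetical stand-ins that only mirror the documented chain
# microparser.Element -> microparser.Serializer -> transformer.JSONTransformer.
class JSONTransformerStub:
    """Format-specific base: holds the JSON data and serializes it."""

    def __init__(self):
        self._json = {}

    def get_json(self):
        return json.dumps(self._json)


class SerializerStub(JSONTransformerStub):
    """Adds the child element list used for recursive processing."""

    def __init__(self):
        super().__init__()
        self._child_elements = []


class ElementStub(SerializerStub):
    """Adds the XML element data itself."""

    def __init__(self, *, name):
        super().__init__()
        self._name = name
        self._json[name] = {}


element = ElementStub(name='tag1')
```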
- __init__(payload)
- Parameters:
payload (str) – xml payload data
- Variables:
- Example:
>>> import microparser
>>>
>>> payload = '' \
...     '<tag1>\n' \
...     ' <tag2 a="1" b="value1">\n' \
...     ' <tag3 b="1" c="value2">value3</tag3>\n' \
...     ' </tag2>\n' \
...     '</tag1>\n'
>>>
>>> parser = microparser.Parser(payload)
>>>
>>> parser.build_serializer()
>>> parser.process_json()
>>>
>>> r1 = parser.get_root_element().get_json_dict()
>>> r2 = parser.get_element_by_name('tag2').get_json_dict()
>>> r3 = parser.get_element_by_id(2).get_json_dict()
>>>
>>> print(r1)
>>> print(r2)
>>> print(r3)
- _add_child_elements_recursive(element)
Recursively add child elements.
- Parameters:
element (Element) – XML start element
- _current_item_id_gen()
Current item id generator: a simple infinite iterator that starts at 1 and increments by 1 (while True loop).
- Return type:
Iterator[int]
- _current_line_nr_gen()
Current line number generator: a simple infinite iterator that starts at 0 and increments by 1 (while True loop).
- Return type:
Iterator[int]
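Both generators can be written as plain infinite generators. The function names below are hypothetical module-level equivalents of the documented private methods; the second one uses itertools.count, which yields the same sequence as an explicit while True loop.

```python
import itertools
from typing import Iterator


def current_item_id_gen() -> Iterator[int]:
    """Infinite id generator: 1, 2, 3, ..."""
    current = 1
    while True:
        yield current
        current += 1


def current_line_nr_gen() -> Iterator[int]:
    """Infinite line number generator: 0, 1, 2, ..."""
    return itertools.count(0)


ids = current_item_id_gen()
line_nrs = current_line_nr_gen()
```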
- _get_last_unclosed_element_id()
Get the id of the last unprocessed (unclosed) element.
- Returns:
numerical id of the last unclosed element
- Return type:
int or None
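One way to find the last unclosed element is a reverse scan over the elements seen so far, assuming an element counts as unclosed while its closing line number is still unset. With well-formed nesting, the most recently opened element that is still unclosed is the innermost open element, i.e. the parent of the next opening tag. The function name and the dict layout are hypothetical.

```python
from typing import Optional


def get_last_unclosed_element_id(elements: list) -> Optional[int]:
    """Return the id of the most recently opened element without a closing tag.

    Assumes each element is a dict with 'id' and 'line_end' keys, where
    'line_end' stays None until the closing tag has been seen.
    """
    for element in reversed(elements):
        if element['line_end'] is None:
            return element['id']
    return None


elements = [
    {'id': 1, 'line_end': None},  # <tag1> still open
    {'id': 2, 'line_end': None},  # <tag2> still open
    {'id': 3, 'line_end': 2},     # <tag3>...</tag3> closed on its own line
]
```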
- _parse_attributes(attributes)
Parse attribute name/value pairs from a string and add them to the element currently being processed.
- Parameters:
attributes (str) – unparsed tag attribute string
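Attribute parsing can be sketched with a regular expression over name="value" pairs. This is an assumption about the approach: parse_attributes() is a hypothetical helper that returns a dict, whereas the documented method adds each pair to the current element (for example via add_attribute()). Single-quoted values are also accepted here; escaped quotes inside values are not handled.

```python
import re

# name="value" or name='value'; group 2 remembers the quote character
ATTRIBUTE_RE = re.compile(r'([\w.-]+)\s*=\s*(["\'])(.*?)\2')


def parse_attributes(attributes: str) -> dict:
    """Parse an unparsed attribute string into a name -> value dict."""
    return {name: value for name, _quote, value in ATTRIBUTE_RE.findall(attributes)}
```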
- build_serializer()
Build the hierarchical serializer by adding all found elements, starting with the root element.
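The linking performed by build_serializer() and _add_child_elements_recursive() can be sketched as a recursive search for elements whose parent id matches the current element. Node, build_serializer() and add_child_elements_recursive() below are hypothetical stand-ins, and the root is assumed to be the element whose parent id is None.

```python
class Node:
    """Hypothetical minimal element with a child list."""

    def __init__(self, id, name, parent_id):
        self.id = id
        self.name = name
        self.parent_id = parent_id
        self.children = []


def add_child_elements_recursive(element, elements):
    # Attach every element whose parent_id points at this element,
    # then descend into each newly attached child.
    for candidate in elements:
        if candidate.parent_id == element.id:
            element.children.append(candidate)
            add_child_elements_recursive(candidate, elements)


def build_serializer(elements):
    # The root is assumed to be the element without a parent.
    root = next(e for e in elements if e.parent_id is None)
    add_child_elements_recursive(root, elements)
    return root


elements = [Node(1, 'tag1', None), Node(2, 'tag2', 1), Node(3, 'tag3', 2)]
root = build_serializer(elements)
```

This sketch rescans the whole list for every element, which is quadratic in the number of elements but simple; a dict from parent id to children would make it linear.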
- get_child_element(name)
Return an element's children, looked up by element name.
- Parameters:
name (str) – element tag name
- Yield:
found item
- Return type:
Element or None
- get_child_elements_by_id(id)
Return an element's children, looked up by numerical element id.
- Parameters:
id (int) – internal numerical element id
- Yield:
found item
- Return type:
list[Element] or None
- get_element_by_id(id)
Get element by internal numerical id.
- Parameters:
id (int) – internal numerical element id
- Returns:
found element
- Return type:
Element or None
- get_element_by_name(name)
Get an element by its XML tag name.
- Parameters:
name (str) – XML tag name
- Returns:
found element
- Return type:
Element or None
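Both lookups can be sketched as a linear search over the processed elements list. Tag and the two functions are hypothetical; the sketch assumes that when several elements share a tag name, the first one in document order is returned.

```python
from typing import Optional


class Tag:
    """Hypothetical stand-in for Element with just an id and a name."""

    def __init__(self, id: int, name: str):
        self.id = id
        self.name = name


def get_element_by_id(elements: list, id: int) -> Optional[Tag]:
    # Linear search; returns None when no element has the given id.
    return next((e for e in elements if e.id == id), None)


def get_element_by_name(elements: list, name: str) -> Optional[Tag]:
    # Returns the first element in document order with the given tag name.
    return next((e for e in elements if e.name == name), None)


elements = [Tag(1, 'tag1'), Tag(2, 'tag2'), Tag(3, 'tag3')]
```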
- get_elements()
Return the list of processed elements.
- Returns:
processed elements (self._elements)
- Return type:
list[Element]
- class microparser.Serializer
Bases: JSONTransformer
Serializer class.
Provides methods for handling element dependencies.
- __iter__()
Overridden iterator.
Called on ‘yield self’; the iterate() method makes the iteration recursive.
- Returns:
iter(list[Element])
- Return type:
iterator
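A sketch of this iteration pattern, assuming that iterate() yields the element itself and then recurses into each child (depth-first, pre-order). TreeNode is hypothetical, and since iterate() is not documented in this section its behavior here is an assumption.

```python
class TreeNode:
    """Hypothetical node mirroring the overridden __iter__ described above."""

    def __init__(self, name, children=None):
        self.name = name
        self._child_elements = list(children or [])

    def __iter__(self):
        # Plain iteration only covers the direct children.
        return iter(self._child_elements)

    def iterate(self):
        # Yield this node, then recurse into every child (depth-first, pre-order).
        yield self
        for child in self:
            yield from child.iterate()


tree = TreeNode('tag1', [TreeNode('tag2', [TreeNode('tag3')])])
```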
- _remove_child_element(index)
Remove an element from the self._child_elements list.
- Parameters:
index (int) – position (index) of the child element in the list
- add_child_element(element)
Append an element to the self._child_elements list.
- Parameters:
element (Element) – element to append
- get_child_element_count()
Return the number of child elements.
- Returns:
current child element count
- Return type:
int
- get_child_elements()
Get the child elements.
- Returns:
list of child elements
- Return type:
list[Element]
- get_element_by_element_id(element_id)
Get an element by its numerical element id.
- Parameters:
element_id (int) – numerical element id
- Returns:
found element
- Return type:
Element or None
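The child element bookkeeping above maps naturally onto a Python list. SerializerSketch and Child are hypothetical; in this sketch get_element_by_element_id() searches the direct children only, which may differ from the actual method.

```python
from typing import Optional


class Child:
    """Hypothetical element stand-in carrying only a numerical id."""

    def __init__(self, element_id: int):
        self._id = element_id


class SerializerSketch:
    def __init__(self):
        self._child_elements = []

    def add_child_element(self, element):
        self._child_elements.append(element)

    def _remove_child_element(self, index: int):
        # Removes by list position, not by element id.
        del self._child_elements[index]

    def get_child_element_count(self) -> int:
        return len(self._child_elements)

    def get_child_elements(self) -> list:
        return self._child_elements

    def get_element_by_element_id(self, element_id: int) -> Optional[Child]:
        # Searches the direct children only in this sketch.
        for element in self._child_elements:
            if element._id == element_id:
                return element
        return None


s = SerializerSketch()
for element_id in (2, 3, 4):
    s.add_child_element(Child(element_id))
s._remove_child_element(1)  # removes the child at index 1 (id 3)
```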
Transformer
- class transformer.JSONTransformer
Bases: object
JSON transformer class.
Transforms the given XML (text) data into JSON format. In the first step the data is transformed into internal JSON structures, which are afterwards converted (recursively) into the JSON result.
This class is inherited by the microparser.Serializer class, which provides base members/methods for recursive transformation processing.
- _set_json_attribute(key, value)
Set a single JSON attribute.
- Parameters:
key (str) – attribute key
value (str or dict) – attribute value
- get_json()
Return the JSON result.
- Returns:
JSON result dictionary, dumped to a JSON string
- Return type:
str
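A sketch of how a recursive JSON transformation could combine _set_json_attribute() and get_json(). The output layout (an attributes key, an optional content key, and one key per child tag) is an assumption, as are the class name and the to_dict() method; in this layout, repeated child tags with the same name would overwrite each other.

```python
import json


class JSONTransformerSketch:
    """Hypothetical element-like node; the JSON layout is an assumption."""

    def __init__(self, name, attributes=None, content=None, children=None):
        self._name = name
        self._attributes = dict(attributes or {})
        self._content = content
        self._child_elements = list(children or [])
        self._json = {}

    def _set_json_attribute(self, key, value):
        # value is either a str (content) or a dict (attributes, child data)
        self._json[key] = value

    def to_dict(self):
        # Rebuild this element's dict, recursing into every child element.
        self._json = {}
        self._set_json_attribute('attributes', self._attributes)
        if self._content is not None:
            self._set_json_attribute('content', self._content)
        for child in self._child_elements:
            self._set_json_attribute(child._name, child.to_dict())
        return self._json

    def get_json(self):
        # Serialize the whole tree, keyed by the root tag name.
        return json.dumps({self._name: self.to_dict()})


tag3 = JSONTransformerSketch('tag3', {'b': '1', 'c': 'value2'}, 'value3')
tag2 = JSONTransformerSketch('tag2', {'a': '1', 'b': 'value1'}, children=[tag3])
tag1 = JSONTransformerSketch('tag1', children=[tag2])
```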
Helper
- class helper.Looper(*, payload, function, methods=None)
Bases: object
Looper class.
Processes a list of input items (the item type should be irrelevant; currently str is used) by applying a list of processing method references to each item.
After an item has been processed by all methods in the methods list (e.g. strip), it is passed to the final processing function.
- __init__(*, payload, function, methods=None)
Loops over the payload items. For each item it:
applies the methods given in the methods list;
calls the function reference given in the function argument with the item as argument.
- Parameters:
payload (list[str]) – payload list
function (callable) – function reference called for each item after the methods have been applied
methods (list[str]) – list of method names applied to each item (e.g. 'strip')
- Variables:
_payload (list[str]) – list of payload items to be processed
_function (callable) – stored function reference
_methods (list[str]) – list of method names applied to the payload items
- Example:
>>> from microparser import Looper
>>>
>>> def myfunction(payload):
...     print(payload)
...
>>> payload = 'one,two,three'
>>>
>>> args = {
...     'payload': payload.split(','),
...     'function': myfunction,
...     'methods': ['strip']
... }
>>>
>>> Looper(**args).process()
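The documented Looper behavior can be sketched with getattr(), assuming each entry in methods names a zero-argument method of the item (as str.strip does). LooperSketch is hypothetical; the internals of the real process() method are not shown in this section.

```python
from typing import Callable, Iterable, List, Optional


class LooperSketch:
    """Hypothetical re-implementation of the documented Looper behavior."""

    def __init__(self, *, payload: Iterable[str], function: Callable,
                 methods: Optional[List[str]] = None):
        self._payload = list(payload)
        self._function = function
        self._methods = methods or []

    def process(self):
        for item in self._payload:
            # Apply each named method in order, e.g. 'strip' -> item.strip().
            for method in self._methods:
                item = getattr(item, method)()
            # Hand the fully processed item to the final function.
            self._function(item)


collected = []
LooperSketch(payload=' one, two ,three'.split(','),
             function=collected.append,
             methods=['strip', 'upper']).process()
```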