XML::LibXML::Simple - XML::LibXML clone of XML::Simple::XMLin()
XML::LibXML::Simple is an Exporter
my $xml = ...; # filename, fh, string, or XML::LibXML-node
Imperative:
use XML::LibXML::Simple qw(XMLin);
my $data = XMLin $xml, %options;
Or the Object Oriented way:
use XML::LibXML::Simple ();
my $xs = XML::LibXML::Simple->new(%options);
my $data = $xs->XMLin($xml, %options);
This module is a blunt rewrite of XML::Simple (by Grant McLean) to use the XML::LibXML parser for XML structures, where the original uses plain Perl or SAX parsers.
For descriptions of the %options see the DETAILS section of this manual page.
The functions XMLin (exported implicitly) and xml_in (exported on request) simply call XML::LibXML::Simple->new->XMLin() with the provided parameters.
As first parameter to XMLin(), you must provide the XML message to be translated into a Perl structure. Choose one of the following:
A filename: if the filename contains no directory components, XMLin() will look for the file in each directory in the SearchPath (see OPTIONS below) and in the current directory. eg:
$data = XMLin('/etc/params.xml', %options);
Note: the filename '-' (dash) can be used to parse from STDIN.
undef: XMLin() will check the script directory and each of the SearchPath directories for a file with the same name as the script but with the extension '.xml'. Note: if you wish to specify options, you must pass the value 'undef' explicitly as first parameter. eg:
$data = XMLin(undef, ForceArray => 1);
A string of XML:
$data = XMLin('<opt username="bob" password="flurp" />', %options);
An open file handle:
$fh = IO::File->new('/etc/params.xml');
$data = XMLin($fh, %options);
XML::LibXML::Simple supports most options defined by XML::Simple, so the interface is quite compatible. Minor changes apply. This explanation is extracted from the XML::Simple manual-page.
Check out ForceArray, because you'll almost certainly want to turn it on.
Make sure you know what the KeyAttr option does and what its default value is, because it may surprise you otherwise.
Option names are case-insensitive, so you can use the mixed case versions shown here; you can also add underscores between the words (eg: key_attr) if you like.
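The rule above can be pictured as a small normalization step. The following is a minimal sketch, not the module's actual code: the helper name normalize_option_name is hypothetical, and it only assumes that spellings are compared after lower-casing and dropping underscores.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the XML::LibXML::Simple API:
# fold an option name to one canonical spelling by lower-casing it
# and dropping underscores.
sub normalize_option_name {
    my ($name) = @_;
    my $key = lc $name;
    $key =~ s/_//g;
    return $key;
}

# All of these spellings collapse to the same canonical key.
my @spellings = qw(KeyAttr keyattr key_attr KEY_ATTR);
my %canonical = map { normalize_option_name($_) => 1 } @spellings;

print join(', ', sort keys %canonical), "\n";   # prints "keyattr"
```

So KeyAttr, keyattr and key_attr can be used interchangeably in the option list.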
In alphabetic order:
XMLin('<opt one="1">Two</opt>', ContentKey => 'text')
will parse to:
{ one => 1, text => 'Two' }
instead of:
{ one => 1, content => 'Two' }
You can also prefix your selected key name with a '-' character to have XMLin() try a little harder to eliminate unnecessary 'content' keys after array folding. For example:
XMLin( '<opt><item name="one">First</item><item name="two">Second</item></opt>', KeyAttr => {item => 'name'}, ForceArray => [ 'item' ], ContentKey => '-content' )
will parse to:
{ item => { one => 'First', two => 'Second' } }
rather than this (without the '-'):
{ item => { one => { content => 'First' }, two => { content => 'Second' } } }
With ForceArray => 1, this XML:
<opt> <name>value</name> </opt>
would parse to this:
{ name => [ 'value' ] }
instead of this (the default):
{ name => 'value' }
This option is especially useful if the data structure is likely to be written back out as XML and the default behaviour of rolling single nested elements up into attributes is not desirable.
If you are using the array folding feature, you should almost certainly enable this option. If you do not, single nested elements will not be parsed to arrays and therefore will not be candidates for folding to a hash. (Given that the default value of 'KeyAttr' enables array folding, this option should probably have been enabled by default as well.)
It is also possible to include compiled regular expressions in the ForceArray list: any element names which match the pattern will be forced to arrays. If the list contains only a single regex, then it is not necessary to enclose it in an arrayref. Eg:
ForceArray => qr/_list$/
When XMLin() parses elements which have text content as well as attributes, the text content must be represented as a hash value rather than a simple scalar. This option allows you to force text content to always parse to a hash value even when there are no attributes. So for example:
XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)
will parse to:
{ x => { content => 'text1' }, y => { a => 2, content => 'text2' } }
instead of:
{ x => 'text1', y => { 'a' => 2, 'content' => 'text2' } }
<opt> <searchpath> <dir>/usr/bin</dir> <dir>/usr/local/bin</dir> <dir>/usr/X11/bin</dir> </searchpath> </opt>
Would normally be read into a structure like this:
{ searchpath => { dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ] } }
But when read in with the appropriate value for 'GroupTags':
my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });
It will return this simpler structure:
{ searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ] }
The grouping element (<searchpath> in the example) must not contain any attributes or elements other than the grouped element.
You can specify multiple 'grouping element' to 'grouped element' mappings in the same hashref. If this option is combined with KeyAttr, the array folding will occur first and then the grouped element names will be eliminated.
XMLin() normally discards the root element name. Setting the 'KeepRoot' option to '1' will cause the root element name to be retained. So after executing this code:
$config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)
You'll be able to reference the tempdir as $config->{config}->{tempdir} instead of the default $config->{tempdir}.
For example, this XML:
<opt> <user login="grep" fullname="Gary R Epstein" /> <user login="stty" fullname="Simon T Tyson" /> </opt>
would, by default, parse to this:
{ user => [ { login => 'grep', fullname => 'Gary R Epstein' }, { login => 'stty', fullname => 'Simon T Tyson' } ] }
If the option 'KeyAttr => "login"' were used to specify that the 'login' attribute is a key, the same XML would parse to:
{ user => { stty => { fullname => 'Simon T Tyson' }, grep => { fullname => 'Gary R Epstein' } } }
The key attribute names should be supplied in an arrayref if there is more than one. XMLin() will attempt to match attribute names in the order supplied.
Note 1: The default value for 'KeyAttr' is ['name', 'key', 'id']. If you do not want folding on input or unfolding on output, you must set this option to an empty list to disable the feature.
Note 2: If you wish to use this option, you should also enable the ForceArray option. Without 'ForceArray', a single nested element will be rolled up into a scalar rather than an array and therefore will not be folded (since only arrays get folded).
Two further variations are made possible by prefixing a '+' or a '-' character to the attribute name:
The option 'KeyAttr => { user => "+login" }' will cause this XML:
<opt> <user login="grep" fullname="Gary R Epstein" /> <user login="stty" fullname="Simon T Tyson" /> </opt>
to parse to this data structure:
{ user => { stty => { fullname => 'Simon T Tyson', login => 'stty' }, grep => { fullname => 'Gary R Epstein', login => 'grep' } } }
The '+' indicates that the value of the key attribute should be copied rather than moved to the folded hash key.
A '-' prefix would produce this result:
{ user => { stty => { fullname => 'Simon T Tyson', -login => 'stty' }, grep => { fullname => 'Gary R Epstein', -login => 'grep' } } }
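To make the three variants concrete, here is a minimal pure-Perl sketch of the folding step itself. It is an illustration under stated assumptions, not the module's implementation: the function fold_by_key is hypothetical, and it works on an already parsed array of hashes rather than on XML.

```perl
use strict;
use warnings;

# Illustrative only: fold an array of hashes into a hash keyed on one
# attribute. $spec is the attribute name, optionally prefixed with
# '+' (keep a copy of the key) or '-' (keep it under a '-' prefixed name).
sub fold_by_key {
    my ($items, $spec) = @_;
    my ($prefix, $attr) = $spec =~ /^([+-]?)(.+)$/;

    my %folded;
    for my $item (@$items) {
        my %rest = %$item;                 # shallow copy, input untouched
        my $key  = delete $rest{$attr};    # take the key attribute out
        $rest{$attr}    = $key if $prefix eq '+';
        $rest{"-$attr"} = $key if $prefix eq '-';
        $folded{$key} = \%rest;
    }
    return \%folded;
}

my @users = (
    { login => 'grep', fullname => 'Gary R Epstein' },
    { login => 'stty', fullname => 'Simon T Tyson'  },
);

my $moved  = fold_by_key(\@users, 'login');    # key moved out of the record
my $copied = fold_by_key(\@users, '+login');   # key also kept as 'login'
my $hidden = fold_by_key(\@users, '-login');   # key kept as '-login'
```

The three results correspond to the three data structures shown above: without a prefix the key attribute disappears from the record, with '+' it is copied, and with '-' it is kept under a name that starts with a dash.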
With the NoAttr option set, XMLin() will ignore any attributes in the XML.
Note: you can spell the NormaliseSpace option with a 'z' if that is more natural for you.
The internally created parser object is configured in safe mode. Read the XML::LibXML::Parser manual about security issues with certain parameter settings. The default configuration of XML::LibXML itself is unsafe!
Parser parameter.
If you pass XMLin() a filename, but the filename includes no directory component, you can use the SearchPath option to specify which directories should be searched to locate the file. You might use this option to search first in the user's home directory, then in a global directory such as /etc.
If a filename is provided to XMLin() but SearchPath is not defined, the file is assumed to be in the current directory.
If the first parameter to XMLin() is undefined, the default SearchPath will contain only the directory in which the script itself is located. Otherwise the default SearchPath will be empty.
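The lookup described above can be sketched in plain Perl. This is a rough model under stated assumptions, not the module's own code: the helper find_xml_file is hypothetical, it returns the first existing candidate, and it falls back to the current directory only when no search path is given.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: resolve $file against a list of directories, in
# order. Names that already carry a directory component are used as given.
sub find_xml_file {
    my ($file, @search_path) = @_;

    my (undef, $dirs, undef) = File::Spec->splitpath($file);
    if (length $dirs) {
        return (-e $file) ? $file : undef;
    }

    # With no search path, fall back to the current directory.
    @search_path = (File::Spec->curdir) unless @search_path;

    for my $dir (@search_path) {
        my $candidate = File::Spec->catfile($dir, $file);
        return $candidate if -e $candidate;
    }
    return;    # not found
}

# Look in the user's home directory first, then in /etc.
my $path = find_xml_file('params.xml', $ENV{HOME} // '.', '/etc');
print defined $path ? "found $path\n" : "params.xml not found\n";
```

With the module itself, the equivalent request would be written as SearchPath => [ $ENV{HOME}, '/etc' ] in the option list of XMLin().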
<opt> <colour value="red" /> <size value="XXL" /> </opt>
Setting ValueAttr => [ 'value' ] will cause the above XML to parse to:
{ colour => 'red', size => 'XXL' }
instead of this (the default):
{ colour => { value => 'red' }, size => { value => 'XXL' } }
NsExpand option. The downside, however, is that the labels get very long.
Without this option:
<record xmlns:x="http://xyz"> <x:field1>42</x:field1> </record> <record xmlns:y="http://xyz"> <y:field1>42</y:field1> </record>
translates into
{ 'x:field1' => 42 } { 'y:field1' => 42 }
but both source components have exactly the same meaning. When NsExpand is used, the result is:
{ '{http://xyz}field1' => 42 } { '{http://xyz}field1' => 42 }
Of course, addressing these fields is more work. It is advised to implement it like this:
my $ns = 'http://xyz';
$data->{"{$ns}field1"};
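When many fields live in the same namespace, the key construction can be wrapped in a small helper. A minimal sketch follows; the function ns_key is hypothetical, and $data is written out by hand as a stand-in for what XMLin() returns with NsExpand enabled, so that the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: build the '{namespace}local' key used in
# NsExpand-style results.
sub ns_key {
    my ($ns, $local) = @_;
    return "{$ns}$local";
}

# Stand-in for the parsed <record> example, typed in by hand.
my $data = { '{http://xyz}field1' => 42 };

my $ns = 'http://xyz';
print $data->{ ns_key($ns, 'field1') }, "\n";   # prints 42
```

Keeping the namespace in one variable means the long URI is spelled only once, however many fields are addressed.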
To handle namespaces carefully, use NsExpand. To do it sloppily, use NsStrip. With this option set, the above example will return:
{ field1 => 42 } { field1 => 42 }
When XMLin() reads the following very simple piece of XML:
<opt username="testuser" password="frodo"></opt>
it returns the following data structure:
{ username => 'testuser', password => 'frodo' }
The identical result could have been produced with this alternative XML:
<opt username="testuser" password="frodo" />
Or this (although see 'ForceArray' option for variations):
<opt> <username>testuser</username> <password>frodo</password> </opt>
Repeated nested elements are represented as anonymous arrays:
<opt> <person firstname="Joe" lastname="Smith"> <email>joe@smith.com</email> <email>jsmith@yahoo.com</email> </person> <person firstname="Bob" lastname="Smith"> <email>bob@smith.com</email> </person> </opt>
{ person => [ { email => [ 'joe@smith.com', 'jsmith@yahoo.com' ], firstname => 'Joe', lastname => 'Smith' }, { email => 'bob@smith.com', firstname => 'Bob', lastname => 'Smith' } ] }
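Note that in the structure above, 'email' is an arrayref for the first person and a plain scalar for the second. Code that walks the result has to cope with both shapes unless ForceArray is used. A minimal sketch, in which the helper as_list is hypothetical and the data is a hand-written copy of the structure shown:

```perl
use strict;
use warnings;

# Hypothetical helper: return a value as a list, whether it was parsed
# as a single scalar or as an arrayref of repeated elements.
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

# Hand-written copy of the structure shown above.
my $data = {
    person => [
        { firstname => 'Joe', lastname => 'Smith',
          email => [ 'joe@smith.com', 'jsmith@yahoo.com' ] },
        { firstname => 'Bob', lastname => 'Smith',
          email => 'bob@smith.com' },
    ],
};

for my $person (as_list($data->{person})) {
    my @emails = as_list($person->{email});
    printf "%s %s: %s\n", $person->{firstname}, $person->{lastname},
        join(', ', @emails);
}
```

Passing ForceArray => [ 'person', 'email' ] to XMLin() avoids the need for such a helper, because those elements are then always parsed to arrays.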
Nested elements with a recognised key attribute are transformed (folded) from an array into a hash keyed on the value of that attribute (see the KeyAttr option):
<opt> <person key="jsmith" firstname="Joe" lastname="Smith" /> <person key="tsmith" firstname="Tom" lastname="Smith" /> <person key="jbloggs" firstname="Joe" lastname="Bloggs" /> </opt>
{ person => { jbloggs => { firstname => 'Joe', lastname => 'Bloggs' }, tsmith => { firstname => 'Tom', lastname => 'Smith' }, jsmith => { firstname => 'Joe', lastname => 'Smith' } } }
The <anon> tag can be used to form anonymous arrays:
<opt> <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head> <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data> <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data> <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data> </opt>
{ head => [ [ 'Col 1', 'Col 2', 'Col 3' ] ], data => [ [ 'R1C1', 'R1C2', 'R1C3' ], [ 'R2C1', 'R2C2', 'R2C3' ], [ 'R3C1', 'R3C2', 'R3C3' ] ] }
Anonymous arrays can be nested to arbitrary levels. As a special case, if the surrounding tags for an XML document contain only an anonymous array, the arrayref will be returned directly rather than the usual hashref:
<opt> <anon><anon>Col 1</anon><anon>Col 2</anon></anon> <anon><anon>R1C1</anon><anon>R1C2</anon></anon> <anon><anon>R2C1</anon><anon>R2C2</anon></anon> </opt>
[ [ 'Col 1', 'Col 2' ], [ 'R1C1', 'R1C2' ], [ 'R2C1', 'R2C2' ] ]
Elements which only contain text content will simply be represented as a scalar. Where an element has both attributes and text content, the element will be represented as a hashref with the text content in the 'content' key (see the ContentKey option):
<opt> <one>first</one> <two attr="value">second</two> </opt>
{ one => 'first', two => { attr => 'value', content => 'second' } }
Mixed content (elements which contain both text content and nested elements) will not be represented in a useful way: element order and significant whitespace will be lost. If you need to work with mixed content, then XML::Simple is not the right tool for your job; check out the next section.
In general, the output and the options are equivalent, although this module has some differences with XML::Simple to be aware of.
If you need to write XML, have a look at XMLout() as implemented by XML::Simple or any of a zillion template systems.
There are small differences in the result of the forcearray option, because XML::Simple seems to behave inconsistently.
the XML::Compile manpage for processing XML when a schema is available. When you have a schema, the data and structure of your message get validated.
the XML::Simple manpage, the original implementation, whose interface is followed as closely as possible.
The interface design and large parts of the documentation were taken from the XML::Simple module, written by Grant McLean <grantm@cpan.org>
Copyrights 2008-2014 on the perl code and the related documentation by [Mark Overmeer]. For other contributors see ChangeLog.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See http://www.perl.com/perl/misc/Artistic.html