Problem description
I am parsing an XML file generated by an external program. I would then like to add custom annotations to this file, using my own namespace. My input looks like this:
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4">
  <model metaid="untitled" id="untitled">
    <annotation>...</annotation>
    <listOfUnitDefinitions>...</listOfUnitDefinitions>
    <listOfCompartments>...</listOfCompartments>
    <listOfSpecies>
      <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
      <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>...</listOfReactions>
  </model>
</sbml>
The issue is that lxml only declares a namespace on the element where it is used, which means the declaration is repeated many times, like so (simplified):
<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4">
  <listOfSpecies>
    <species>
      <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>
      <celldesigner:data>Some important data which must be kept</celldesigner:data>
    </species>
    <species>
      <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>
    </species>
    ....
  </listOfSpecies>
</sbml>
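The repetition can be reproduced in a few lines. This is a minimal sketch, not the original program: the two-species document is a cut-down stand-in for the real input, and each new element is given its own `nsmap` so that the `kjw` prefix is used.

```python
from lxml import etree

NS = "http://this.is.some/custom_namespace"
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Cut-down stand-in for the real SBML input (illustrative only).
root = etree.fromstring(
    b'<listOfSpecies xmlns="http://www.sbml.org/sbml/level2/version4">'
    b"<species/><species/>"
    b"</listOfSpecies>"
)

# Append one annotation element per species. No ancestor declares the
# custom namespace, so every new element carries its own declaration.
for species in root.iter("{%s}species" % SBML_NS):
    species.append(etree.Element("{%s}test" % NS, nsmap={"kjw": NS}))

serialized = etree.tostring(root, pretty_print=True).decode()
print(serialized)

# One declaration per appended element.
declaration_count = serialized.count("xmlns:kjw=")
```

Each `kjw:test` element ends up with its own `xmlns:kjw` attribute because the declaration lives on the element that was created with it, and nothing higher up in the tree declares the same URI.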
Is it possible to force lxml to write this declaration only once, on a parent element such as sbml or listOfSpecies? Or is there a good reason not to do so? The result I want would be:
<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4" xmlns:kjw="http://this.is.some/custom_namespace">
  <listOfSpecies>
    <species>
      <kjw:test/>
      <celldesigner:data>Some important data which must be kept</celldesigner:data>
    </species>
    <species>
      <kjw:test/>
    </species>
    ....
  </listOfSpecies>
</sbml>
The important constraint is that the existing data read from the file must be kept, so I cannot just make a new root element (I think?).
The code is attached below.
def annotateSbml(sbml_input):
    from lxml import etree

    checkSbml(sbml_input)  # Makes sure the input is valid sbml/xml.

    ns = "http://this.is.some/custom_namespace"
    etree.register_namespace('kjw', ns)

    sbml_doc = etree.ElementTree()
    root = sbml_doc.parse(sbml_input, etree.XMLParser(remove_blank_text=True))
    nsmap = root.nsmap

    nsmap['sbml'] = nsmap[None]  # Makes code more readable, but seems ugly. Any alternatives to this?
    nsmap['kjw'] = ns
    ns = '{' + ns + '}'
    sbmlns = '{' + nsmap['sbml'] + '}'

    for species in root.findall('sbml:model/sbml:listOfSpecies/sbml:species', nsmap):
        species.append(etree.Element(ns + 'test'))

    sbml_doc.write("test.sbml.xml", pretty_print=True, xml_declaration=True)

    return
Recommended answer
Modifying the namespace mapping of a node is not possible in lxml. See this open ticket, which lists the feature as a wishlist item.
It originated from this thread on the lxml mailing list, where replacing the root node is given as a workaround. Replacing the root node has some issues of its own, though: see the ticket above.
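A note for readers on a more recent lxml: to the best of my knowledge, `etree.cleanup_namespaces()` has accepted a `top_nsmap` argument since lxml 3.5. It declares the given prefixes on the top element and then strips the redundant declarations further down. It is not part of the workaround discussed in this answer, and the sketch below assumes a recent lxml and uses a shortened, made-up sample document.

```python
from io import BytesIO
from lxml import etree

NS = "http://this.is.some/custom_namespace"
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Shortened stand-in for the SBML input from the question (illustrative only).
DOC = b"""<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4">
  <model metaid="untitled" id="untitled">
    <listOfSpecies>
      <species id="s1"><annotation><celldesigner:extension/></annotation></species>
      <species id="s2"><annotation><celldesigner:extension/></annotation></species>
    </listOfSpecies>
  </model>
</sbml>"""

tree = etree.parse(BytesIO(DOC))
root = tree.getroot()

# Annotate every species; each new element initially declares kjw itself.
for species in root.iter("{%s}species" % SBML_NS):
    species.append(etree.Element("{%s}test" % NS, nsmap={"kjw": NS}))

# Declare kjw once on the root and drop the now-redundant declarations
# on the descendants. Assumes lxml >= 3.5 (top_nsmap keyword).
etree.cleanup_namespaces(tree, top_nsmap={"kjw": NS})

result = etree.tostring(tree, pretty_print=True).decode()
print(result)
```

The root element is modified in place, so its attributes and all existing children are untouched. One thing to watch: `cleanup_namespaces()` also removes declarations that nothing in the tree uses, so pass `keep_ns_prefixes` if an unused prefix has to survive.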
For completeness, here is the suggested root-replacement workaround code:
>>> DOC = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4">
...   <model metaid="untitled" id="untitled">
...     <annotation>...</annotation>
...     <listOfUnitDefinitions>...</listOfUnitDefinitions>
...     <listOfCompartments>...</listOfCompartments>
...     <listOfSpecies>
...       <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
...         <annotation>
...           <celldesigner:extension>...</celldesigner:extension>
...         </annotation>
...       </species>
...       <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
...         <annotation>
...           <celldesigner:extension>...</celldesigner:extension>
...         </annotation>
...       </species>
...     </listOfSpecies>
...     <listOfReactions>...</listOfReactions>
...   </model>
... </sbml>"""
>>>
>>> from lxml import etree
>>> from StringIO import StringIO
>>> NS = "http://this.is.some/custom_namespace"
>>> tree = etree.ElementTree(element=None, file=StringIO(DOC))
>>> root = tree.getroot()
>>> nsmap = root.nsmap
>>> nsmap['kjw'] = NS
>>> new_root = etree.Element(root.tag, nsmap=nsmap)
>>> new_root[:] = root[:]
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test')))
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test')))
>>> print etree.tostring(new_root, pretty_print=True)
<sbml xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" xmlns:kjw="http://this.is.some/custom_namespace" xmlns="http://www.sbml.org/sbml/level2/version4"><model metaid="untitled" id="untitled">
    <annotation>...</annotation>
    <listOfUnitDefinitions>...</listOfUnitDefinitions>
    <listOfCompartments>...</listOfCompartments>
    <listOfSpecies>
      <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
      <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>...</listOfReactions>
  </model>
<kjw:test/><kjw:test/></sbml>
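The session above is Python 2, and its output shows that the new root has lost the `level` and `version` attributes of the original `sbml` element. Below is a hedged Python 3 sketch of the same root-replacement idea that also copies the root's attributes and leading text. The helper name `replace_root_with_nsmap` and the shortened sample document are made up for illustration. Anything outside the root element (comments, processing instructions, a DOCTYPE) is still not carried over.

```python
from io import BytesIO
from lxml import etree

NS = "http://this.is.some/custom_namespace"

# Shortened stand-in for the SBML input from the question (illustrative only).
DOC = b"""<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model metaid="untitled" id="untitled">
    <listOfSpecies>
      <species id="s1"/>
      <species id="s2"/>
    </listOfSpecies>
  </model>
</sbml>"""


def replace_root_with_nsmap(tree, extra_nsmap):
    """Hypothetical helper: build a new root that declares extra prefixes.

    Copies tag, attributes and leading text from the old root, then moves
    the children across. Returns a new ElementTree around the new root.
    """
    old_root = tree.getroot()
    nsmap = dict(old_root.nsmap)  # nsmap is read-only, so work on a copy
    nsmap.update(extra_nsmap)
    new_root = etree.Element(
        old_root.tag, attrib=dict(old_root.attrib), nsmap=nsmap
    )
    new_root.text = old_root.text
    new_root[:] = old_root[:]  # slice assignment moves the child elements
    return etree.ElementTree(new_root)


tree = replace_root_with_nsmap(etree.parse(BytesIO(DOC)), {"kjw": NS})
root = tree.getroot()

# Elements appended below the new root reuse its kjw declaration.
sbml_ns = root.nsmap[None]
for species in root.iter("{%s}species" % sbml_ns):
    species.append(etree.Element("{%s}test" % NS))

output = etree.tostring(tree, pretty_print=True).decode()
print(output)
```

Because the custom namespace is already declared on an ancestor by the time the new elements are appended, lxml reuses that declaration instead of writing a fresh one on each `kjw:test` element.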