Using
Python 3.11.7
pandasdmx 1.10.0
I am getting an XMLParseError while attempting to get data using a dictionary key from "ABS_XML".
import pandasdmx as sdmx

abs_xml = sdmx.Request("ABS_XML")
resp = abs_xml.data('ABS_ANNUAL_ERP_LGA2022',
                    key=dict(SEX_ABS='1'),
                    params=dict(startPeriod='2021'))

Traceback
c:\Users\timot\anaconda3\envs\SDMX\Lib\site-packages\pandasdmx\remote.py:11: RuntimeWarning: optional dependency requests_cache is not installed; cache options to Session() have no effect
  warn(
--- SS without DSD ---
{1: False}
--- <class 'pandasdmx.message.StructureMessage'> ---
{2: <pandasdmx.StructureMessage>
<Header>
id: 'IDREF59600'
prepared: '2024-02-05T17:21:23.770127+11:00'
receiver: <Agency Unknown>
sender: <Agency Unknown>
source:
test: False}
--- <class 'pandasdmx.model.DataStructureDefinition'> ---
{'ABS_ANNUAL_ERP_LGA2022': <DataStructureDefinition ABS:ABS_ANNUAL_ERP_LGA2022(1.2.0): ERP by LGA (2022), Age and Sex, 2001 to 2022>}
--- <class 'pandasdmx.model.Agency'> ---
{'ABS': <Agency ABS>}
--- <class 'pandasdmx.model.DataflowDefinition'> ---
{'ABS_ANNUAL_ERP_LGA2022': <DataflowDefinition ABS:ABS_ANNUAL_ERP_LGA2022(1.2.0): ERP by LGA (2022), Age and Sex, 2001 to 2022>}
--- <class 'pandasdmx.model.CategoryScheme'> ---
{87: <CategoryScheme ABS:PEOPLE(1.0.0) (13 items): People>, 88: <CategoryScheme ABS:PEOPLE(1.0.0) (1 items)>}
--- <class 'pandasdmx.model.Categorisation'> ---
{'CAT_ANNUAL_ERP_LGA2022': <Categorisation ABS:CAT_ANNUAL_ERP_LGA2022(1.2.0): ERP by LGA (2022), Age and Sex, 2001 to 2022>}
--- <class 'pandasdmx.model.Codelist'> ---
{'CL_AGE': <Codelist ABS:CL_AGE(1.0.0) (194 items): Age>, 'CL_ERP': <Codelist ABS:CL_ERP(1.0.0) (1 items): Measure>, 'CL_FREQ': <Codelist ABS:CL_FREQ(1.0.0) (9 items): Frequency>, 'CL_LGA_2022': <Codelist ABS:CL_LGA_2022(1.0.0) (578 items): Local Government Areas - 2022>, 'CL_OBS_STATUS': <Codelist ABS:CL_OBS_STATUS(1.0.0) (16 items): Observation Status>, 'CL_REGION_TYPE': <Codelist ABS:CL_REGION_TYPE(1.0.0) (43 items): Region Type>, 'CL_SEX': <Codelist ABS:CL_SEX(1.0.0) (3 items): Sex>, 'CL_UNIT_MEASURE': <Codelist ABS:CL_UNIT_MEASURE(1.0.0) (88 items): Unit of Measure>}
--- <class 'pandasdmx.model.ConceptScheme'> ---
{11693: <ConceptScheme ABS:CS_COMMON(1.0.0) (5 items): Common Concepts>, 11694: <ConceptScheme ABS:CS_COMMON(1.0.0) (1 items)>, 11700: <ConceptScheme ABS:CS_DEMOG(1.0.0) (25 items): Demographic Concepts>, 11701: <ConceptScheme ABS:CS_DEMOG(1.0.0) (1 items)>, 'CS_DEMOG': <ConceptScheme ABS:CS_DEMOG(1.0.0) (1 items)>, 11712: <ConceptScheme ABS:CS_GEOGRAPHY(1.0.0) (25 items): Geography Concepts>, 11713: <ConceptScheme ABS:CS_GEOGRAPHY(1.0.0) (1 items)>, 'CS_GEOGRAPHY': <ConceptScheme ABS:CS_GEOGRAPHY(1.0.0) (1 items)>, 'CS_COMMON': <ConceptScheme ABS:CS_COMMON(1.0.0) (3 items)>, 11734: <ConceptScheme ABS:CS_ATTRIBUTE(1.0.0) (6 items): Attribute Concepts>, 11735: <ConceptScheme ABS:CS_ATTRIBUTE(1.0.0) (1 items)>, 'CS_ATTRIBUTE': <ConceptScheme ABS:CS_ATTRIBUTE(1.0.0) (2 items)>}
--- <class 'pandasdmx.model.Annotation'> ---
{'obs_count': Annotation(id='obs_count', title='698478', type='sdmx_metrics', url=None, text=), 11758: Annotation(id=None, title='A', type='ReleaseVersion', url=None, text=)}
--- Name ---
{11759: ('en', 'Availability (A) for ABS_ANNUAL_ERP_LGA2022')}
--- <class 'pandasdmx.reader.sdmxml.Reference'> ---
{'ABS_ANNUAL_ERP_LGA2022': <pandasdmx.reader.sdmxml.Reference object at 0x0000023E19BFFF50>}
--- <class 'pandasdmx.model.MemberSelection'> ---
{11762: <MemberSelection MEASURE in {'ERP'}>, 11766: <MemberSelection SEX_ABS in {'1', '2', '3'}>, 11786: <MemberSelection AGE in {'8599', 'A04', 'A10', 'A15', 'A20', 'A25', 'A30', 'A35', 'A40', 'A45', 'A50', 'A55', 'A59', 'A60', 'A65', 'A70', 'A75', 'A80', 'TOT'}>, 12344: <MemberSelection LGA_2022 in {'1', '10050', '10180', '10250', '10300', '10470', '10500', '10550', '10600', '10650', '10750', '10800', '10850', '10900', '10950', '11150', '11200', '11250', '11300', '11350', '11400', '11450', '11500', '11520', '11570', '11600', '11650', '11700', '11720', '11730', '11750', '11800', '12000', '12150', '12160', '12350', '12380', '12390', '12700', '12730', '12750', '12850', '12870', '12900', '12930', '12950', '13010', '13310', '13340', '13450', '13550', '13660', '13800', '13850', '13910', '14000', '14100', '14170', '14220', '14300', '14350', '14400', '14500', '14550', '14600', '14650', '14700', '14750', '14850', '14870', '14900', '14920', '14950', '15050', '15240', '15270', '15300', '15350', '15520', '15560', '15650', '15700', '15750', '15800', '15850', '15900', '15950', '15990', '16100', '16150', '16200', '16260', '16350', '16380', '16400', '16490', '16550', '16610', '16700', '16900', '16950', '17000', '17040', '17080', '17100', '17150', '17200', '17310', '17350', '17400', '17420', '17550', '17620', '17640', '17650', '17750', '17850', '17900', '17950', '18020', '18050', '18100', '18200', '18250', '18350', '18400', '18450', '18500', '18710', '19399', '2', '20110', '20260', '20570', '20660', '20740', '20830', '20910', '21010', '21110', '21180', '21270', '21370', '21450', '21610', '21670', '21750', '21830', '21890', '22110', '22170', '22250', '22310', '22410', '22490', '22620', '22670', '22750', '22830', '22910', '22980', '23110', '23190', '23270', '23350', '23430', '23670', '23810', '23940', '24130', '24210', '24250', '24330', '24410', '24600', '24650', '24780', '24850', '24900', '24970', '25060', '25150', '25250', '25340', '25430', '25490', '25620', '25710', '25810', 
'25900', '25990', '26080', '26170', '26260', '26350', '26430', '26490', '26610', '26670', '26700', '26730', '26810', '26890', '26980', '27070', '27170', '27260', '27350', '27450', '27630', '29399', '3', '30250', '30300', '30370', '30410', '30450', '30760', '30900', '31000', '31750', '31820', '31900', '31950', '32080', '32250', '32260', '32270', '32310', '32330', '32450', '32500', '32600', '32750', '32770', '32810', '33100', '33200', '33220', '33360', '33430', '33610', '33620', '33800', '33830', '33960', '33980', '34420', '34530', '34570', '34580', '34590', '34710', '34770', '34800', '34830', '34860', '34880', '35010', '35250', '35300', '35600', '35670', '35740', '35760', '35780', '35790', '35800', '36070', '36150', '36250', '36300', '36370', '36510', '36580', '36630', '36660', '36720', '36820', '36910', '36950', '36960', '37010', '37300', '37310', '37340', '37400', '37550', '37570', '37600', '4', '40070', '40120', '40150', '40220', '40250', '40310', '40430', '40520', '40700', '40910', '41010', '41060', '41140', '41190', '41330', '41560', '41750', '41830', '41960', '42030', '42110', '42250', '42600', '42750', '43080', '43220', '43360', '43650', '43710', '43790', '44000', '44060', '44210', '44340', '44550', '44620', '44830', '45040', '45090', '45120', '45290', '45340', '45400', '45540', '45680', '45890', '46090', '46300', '46450', '46510', '46670', '46860', '46970', '47140', '47290', '47490', '47630', '47700', '47800', '47910', '47980', '48050', '48130', '48260', '48340', '48410', '48540', '48640', '48750', '48830', '49399', '5', '50080', '50210', '50250', '50280', '50350', '50420', '50490', '50560', '50630', '50770', '50840', '50910', '50980', '51080', '51120', '51190', '51260', '51310', '51330', '51400', '51470', '51540', '51610', '51680', '51710', '51750', '51820', '51860', '51890', '51960', '52030', '52100', '52170', '52240', '52310', '52380', '52450', '52520', '52590', '52660', '52730', '52800', '52870', '52940', '53010', '53080', '53150', '53220', '53290', 
'53360', '53430', '53570', '53640', '53710', '53780', '53800', '53920', '53990', '54060', '54130', '54170', '54200', '54280', '54310', '54340', '54410', '54480', '54550', '54620', '54690', '54760', '54830', '54900', '54970', '55040', '55110', '55180', '55250', '55320', '55390', '55460', '55530', '55600', '55670', '55740', '55810', '55880', '55950', '56090', '56160', '56230', '56300', '56370', '56460', '56580', '56620', '56730', '56790', '56860', '56930', '57000', '57080', '57140', '57210', '57280', '57350', '57420', '57490', '57630', '57700', '57770', '57840', '57910', '57980', '58050', '58190', '58260', '58330', '58400', '58470', '58510', '58540', '58570', '58610', '58680', '58760', '58820', '58890', '59030', '59100', '59170', '59250', '59310', '59320', '59330', '59340', '59350', '59360', '59370', '6', '60210', '60410', '60610', '60810', '61010', '61210', '61410', '61510', '61610', '61810', '62010', '62210', '62410', '62610', '62810', '63010', '63210', '63410', '63610', '63810', '64010', '64210', '64610', '64810', '65010', '65210', '65410', '65610', '65810', '7', '70200', '70420', '70540', '70620', '70700', '71000', '71150', '71300', '72200', '72300', '72330', '72800', '73600', '74050', '74550', '74560', '74660', '74680', '79399', '8', '89399', '9', '99399', 'AUS'}>, 12348: <MemberSelection REGION_TYPE in {'AUS', 'LGA2022', 'STE'}>, 12350: <MemberSelection FREQUENCY in {'A'}>}
--- <class 'pandasdmx.model.RangePeriod'> ---
{12353: RangePeriod(start=Period(is_inclusive=True, period=datetime.datetime(2001, 1, 1, 0, 0)), end=Period(is_inclusive=True, period=datetime.datetime(2022, 12, 31, 0, 0)))}
<common:KeyValue xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:structure="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure" id="TIME_PERIOD">
<common:TimeRange/></common:KeyValue>
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File c:\Users\timot\anaconda3\envs\SDMX\Lib\site-packages\pandasdmx\reader\sdmxml.py:299, in Reader.read_message(self, source, dsd)
    297 try:
    298 # Parse the element
--> 299 result = func(self, element)
    300 except TypeError:

File c:\Users\timot\anaconda3\envs\SDMX\Lib\site-packages\pandasdmx\reader\sdmxml.py:1190, in _ms(reader, elem)
   1189 else:
-> 1190 raise RuntimeError
   1192 if arg["values_for"] is None:

RuntimeError:

The above exception was the direct cause of the following exception:

XMLParseError                             Traceback (most recent call last)
File c:\Pystuff\pandasdmx\Fresh.py:6
      2 import pandasdmx as sdmx
      4 abs_xml = sdmx.Request("ABS_XML")
----> 6 resp = abs_xml.data('ABS_ANNUAL_ERP_LGA2022',
      7 key = dict(SEX_ABS='1'),
      8 params = dict(startPeriod='2021'))

File c:\Users\timot\anaconda3\envs\SDMX\Lib\site-packages\pandasdmx\api.py:457, in Request.get(self, resource_type, resource_id, tofile, use_cache, dry_run, **kwargs)
    455 req = self._request_from_url(kwargs)
    456 else:
--> 457 req = self._request_from_args(kwargs)
    459 req = self.session.prepare_request(req)
    461 # Now get the SDMX message via HTTP

File c:\Users\timot\anaconda3\envs\SDMX\Lib\site-packages\pandasdmx\api.py:287, in Request._request_from_args(self, kwargs)
    283 raise ValueError(f"unrecognized arguments: {kwargs!r}")
    285 if validate:
    286 # Make the key, and retain the DSD (if any) for use in parsing
--> 287 key, dsd = self._make_key(resource_type, resource_id, key, dsd)
    288 kwargs["dsd"] = dsd
    290 url_parts.append(key)

File c:\Users\timot\anaconda3\envs\SDMX\Lib\site-packages\pandasdmx\api.py:184, in Request._make_key(self, resource_type, resource_id, key, dsd)
    180 pass
    181 elif self.source.supports[Resource.datastructure]:
    182 # Retrieve the DataStructureDefinition
    183 dsd = (
--> 184 self.dataflow(
    185 resource_id, params=dict(references="all"), use_cache=True
    186 )
    187 .dataflow[resource_id]
    188 .structure
    189 )
    191 if dsd.is_external_reference:
    192 # DataStructureDefinition was not retrieved with the Dataflow
    193 # query; retrieve it explicitly
    194 dsd = self.get(resource=dsd, use_cache=True).structure[dsd.id]

File c:\Users\timot\anaconda3\envs\SDMX\Lib\site-packages\pandasdmx\api.py:514, in Request.get(self, resource_type, resource_id, tofile, use_cache, dry_run, **kwargs)
    511 reader = Reader()
    513 # Parse the message, using any provided or auto-queried DSD
--> 514 msg = reader.read_message(response_content, dsd=kwargs.get("dsd", None))
    516 # Store the HTTP response with the message
    517 msg.response = response

File c:\Users\timot\anaconda3\envs\SDMX\Lib\site-packages\pandasdmx\reader\sdmxml.py:317, in Reader.read_message(self, source, dsd)
    315 self._dump()
    316 print(etree.tostring(element, pretty_print=True).decode())
--> 317 raise XMLParseError from exc
    319 # Parsing complete; count uncollected items from the stacks, which represent
    320 # parsing errors
    321
    322 # Remove some internal items
    323 self.pop_single("SS without DSD")
XMLParseError: RuntimeError

The error appears to occur while retrieving the DSD structure information:
dsd = abs_xml.dataflow('ABS_ANNUAL_ERP_LGA2022', params=dict(references="all"), use_cache=True).dataflow['ABS_ANNUAL_ERP_LGA2022'].structure

Specifying references="descendants" instead, and then passing the returned DSD to the data request, allows the request to complete successfully:
dsd = abs_xml.dataflow('ABS_ANNUAL_ERP_LGA2022', params=dict(references="descendants"), use_cache=True).dataflow['ABS_ANNUAL_ERP_LGA2022'].structure
resp = abs_xml.data('ABS_ANNUAL_ERP_LGA2022',
                    key=dict(SEX_ABS='1'),
                    params=dict(startPeriod='2021'),
                    dsd=dsd)

<pandasdmx.DataMessage>
<Header>
id: 'IREF030445'
prepared: '2024-02-08T02:01:51'
sender: <Agency _Stat_V8>
source:
test: False
response: <Response [200]>
DataSet (1)
dataflow: <DataflowDefinition (missing id)>
observation_dimension: <TimeDimension TIME_PERIOD>
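
The workaround above can be wrapped in a small helper. This is only a sketch of the two calls shown in this report, not a fix in pandasdmx itself; the function name `data_with_descendants_dsd` is made up for illustration, and `req` is assumed to be an object such as `sdmx.Request("ABS_XML")`.

```python
def data_with_descendants_dsd(req, flow_id, key=None, params=None):
    """Fetch the DSD with references="descendants", then request the data.

    `req` is assumed to behave like pandasdmx.Request: it must provide
    dataflow() and data() with the keyword arguments used in this issue.
    """
    # Ask only for descendants, so the content constraints that come back
    # with references="all" are not part of the structure message.
    structure_msg = req.dataflow(
        flow_id, params=dict(references="descendants"), use_cache=True
    )
    dsd = structure_msg.dataflow[flow_id].structure

    # Passing dsd explicitly means the data request reuses this DSD
    # instead of querying the dataflow again with references="all".
    return req.data(flow_id, key=key, params=params, dsd=dsd)
```

With the objects from this report, it would be called as `data_with_descendants_dsd(abs_xml, 'ABS_ANNUAL_ERP_LGA2022', key=dict(SEX_ABS='1'), params=dict(startPeriod='2021'))`.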
My main suspicion is the parsing of the two content constraints returned from
https://api.data.abs.gov.au/dataflow/ABS/ABS_ANNUAL_ERP_LGA2022/latest?references=all.
These are generated automatically during a point-in-time release:
https://sis-cc.gitlab.io/dotstatsuite-documentation/using-api/embargo-management/#point-in-time-release-feature
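
To check that suspicion against a saved copy of the response, something like the following could be used. It is a rough sketch using only the standard library, and it assumes the problem element is the empty `<common:TimeRange/>` inside the `TIME_PERIOD` key value that was printed just before the traceback; the function name `empty_time_range_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the element printed before the traceback.
COMMON_NS = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common"


def empty_time_range_ids(xml_text):
    """Return the id of each common:KeyValue holding a childless common:TimeRange."""
    root = ET.fromstring(xml_text)
    ids = []
    # iter() also visits the root element itself, so a bare KeyValue works too.
    for key_value in root.iter(f"{{{COMMON_NS}}}KeyValue"):
        for time_range in key_value.findall(f"{{{COMMON_NS}}}TimeRange"):
            if len(time_range) == 0:  # no start/end period elements inside
                ids.append(key_value.get("id"))
    return ids


# The element from the debug output above, reduced to the one namespace it uses.
SAMPLE = (
    f'<common:KeyValue xmlns:common="{COMMON_NS}" id="TIME_PERIOD">'
    "<common:TimeRange/></common:KeyValue>"
)
print(empty_time_range_ids(SAMPLE))
```

Running the same function over the body of the `references=all` response would list any key values whose time range is empty.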