This repository was archived by the owner on Sep 16, 2022. It is now read-only.

Commit 736a649

Merge test to stable: v0.1.3-rc1
* Changed datacite output according to comments
* CSCMETAX-61: [REF] Update requirements.txt. Brings about a warning about gevent monkey-patching, but according to gevent/gevent#1016 this should not affect us, since we do not import gevent in the code.
* CSCMETAX-379: [ADD] Special handling of syke datasets for urnresolver
* Update syke.py
* CSCMETAX-61: [REF] Refactor modifying the Request object in CommonViewSet to be more sensible
* CSCMETAX-393: [ADD] Add permission class ServicePermissions, which reads app_config and controls general read and write access for services per each api
* CSCMETAX-401: [ADD] More data to the oai_dc metadataformat output
* Update minimal_api.py
* CSCMETAX-359: [ADD] research_dataset.relation.entity.type value is auto-populated from resource_type reference data, if given. Adjust test data and tests accordingly.
* CSCMETAX-394: [ADD] Remove sensitive fields from datasets apis
* CSCMETAX-61: [FIX] Restrict /datasets/pid/files from public
* CSCMETAX-407: [ADD] Query parameter to filter datasets by field metadata_owner_org
* CSCMETAX-398: [ADD] Datasets api: Allow the first addition of files to an empty dataset to occur without creating a new dataset version (update from 0 files to N files)
* CSCMETAX-61: [REF] Change Travis deploy user
* CSCMETAX-400: [ADD] Datasets api: Add query param ?migration_override=bool which enables passing a custom preferred_identifier when creating datasets
* CSCMETAX-395: [FIX] Datasets api /datasets/pid/files now also supports ?file_fields=x,y,z parameter
* CSCMETAX-394: [ADD] Remove sensitive fields (email, phone, telephone) from OAI-PMH api outputs
* CSCMETAX-394: [FIX] Uncomment the part that does the actual cleaning... In tests, search for known sensitive values instead of field names
* CSCMETAX-61: [FIX] Try a fix to an error in initial data loading during travis deployment
* CSCMETAX-408: [ADD] Remove exclusion of migrations/ directory from .gitignore. Generate first migration files. Fix .flake8 to ignore migrations/ directory, since it contains autogenerated files
* Schema changes required by the Etsin migration.
* Etsin migration related schema changes (part 2)
* CSCMETAX-280: [ADD] Datacatalog harvesting
* Update minimal_api.py
* CSCMETAX-406: [REF] Update data catalog and dataset schemas
* CSCMETAX-280: [FIX] According to review comments.
* CSCMETAX-280: [FIX] Proper handling of language field data.
* CSCMETAX-280: [FIX] Proper handling of language field data.
* CSCMETAX-406: [REF] Update data catalog and dataset schemas
* CSCMETAX-406: [REF] Update test data to conform with latest schema changes (description field cardinality change, some other)
* CSCMETAX-406: [FIX] OAI-PMH tests after schema updates
* CSCMETAX-280: [ADD] Limit for metadata prefixes for datacatalog set.
* CSCMETAX-406: [ADD] Update data catalog and production catalog schemas also
Parent: 9885627
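Several of the commits above add query parameters to the datasets API (CSCMETAX-407, CSCMETAX-395, CSCMETAX-400). The following is a rough client-side sketch of how those parameters might be exercised with the requests library; the base URL, dataset pid, and payload are placeholders, not values taken from this commit.

# Hypothetical client-side sketch of the query parameters named in the
# commit message (metadata_owner_org, file_fields, migration_override).
# The host, pid, and identifiers below are placeholders.
import requests

BASE = 'https://metax.example.com/rest'  # placeholder host

# CSCMETAX-407: filter datasets by metadata_owner_org
r = requests.get(BASE + '/datasets', params={'metadata_owner_org': 'org.example'})
r.raise_for_status()

# CSCMETAX-395: /datasets/pid/files now also supports ?file_fields=x,y,z
r = requests.get(BASE + '/datasets/some-pid/files',
                 params={'file_fields': 'identifier,file_path'})

# CSCMETAX-400: ?migration_override allows passing a custom
# preferred_identifier when creating a dataset
payload = {'research_dataset': {'preferred_identifier': 'urn:example:123'}}
r = requests.post(BASE + '/datasets', json=payload,
                  params={'migration_override': 'true'})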


55 files changed: +2238 -1411 lines

.flake8 (+2 -2)

@@ -27,5 +27,5 @@ ignore =
     # do not use bare except
     E722
 
-# often contains "unused" imports
-exclude = __init__.py, src/metax_api/migrations, src/static
+# often contains "unused" imports, too long lines (generated files), and such
+exclude = __init__.py,src/metax_api/migrations/*,src/static

.gitignore (-2)

@@ -6,6 +6,4 @@
 /src/.coverage
 .idea/
 ubuntu-xenial-16.04-cloudimg-console.log
-/src/metax_api/migrations/*
-!/src/metax_api/migrations/*__keep*
 .ropeproject/

.travis-deploy.sh (+2 -2)

@@ -13,11 +13,11 @@ cd metax-ops/ansible/
 if [[ "$TRAVIS_BRANCH" == "test" && "$TRAVIS_PULL_REQUEST" == "false" ]]; then
     echo "Deploying to test.."
     ansible-galaxy -r requirements.yml install --roles-path=roles
-    ansible-playbook -vv -i inventories/test/hosts site_deploy.yml --extra-vars "ssh_user=metax-user"
+    ansible-playbook -vv -i inventories/test/hosts site_deploy.yml --extra-vars "ssh_user=metax-deploy-user"
 elif [[ "$TRAVIS_BRANCH" == "stable" && "$TRAVIS_PULL_REQUEST" == "false" ]]; then
     echo "Deploying to stable.."
     ansible-galaxy -r requirements.yml install --roles-path=roles
-    ansible-playbook -vv -i inventories/stable/hosts site_deploy.yml --extra-vars "ssh_user=metax-user"
+    ansible-playbook -vv -i inventories/stable/hosts site_deploy.yml --extra-vars "ssh_user=metax-deploy-user"
 fi
 
 # Make sure the last command to run before this part is the ansible-playbook command

.travis.yml (+1 -1)

@@ -17,7 +17,7 @@ services:
   - postgresql
 
 before_install:
-  - openssl aes-256-cbc -K $encrypted_596a6d1c4f83_key -iv $encrypted_596a6d1c4f83_iv -in deploy-key.enc -out deploy-key -d
+  - openssl aes-256-cbc -K $encrypted_62ed3fb8af4c_key -iv $encrypted_62ed3fb8af4c_iv -in deploy-key.enc -out deploy-key -d
   - rm deploy-key.enc
   - chmod 600 deploy-key
   - mv deploy-key ~/.ssh/id_rsa

deploy-key.enc (binary, -16 bytes; file not shown)

requirements.txt (+5 -5)

@@ -1,24 +1,24 @@
 coveralls==1.3.0 # code coverage reportin in travis
 dicttoxml==1.7.4
-python-dateutil==2.7.1
+python-dateutil==2.7.3
 Django==2.0 # BSD-license
 elasticsearch<6.0.0
 hiredis==0.2.0 # Used by redis (redis-py) for parser
 djangorestframework==3.8.2 # BSD-license
 django-rainbowtests==0.6.0 # colored test output
 flake8==3.5.0 # MIT-license
-gevent==1.2.2 # gunicorn dep
-gunicorn==19.7.1 # MIT-license
+gevent==1.3.1 # gunicorn dep
+gunicorn==19.8.1 # MIT-license
 ipdb==0.11 # dev tool
 jsonschema==2.6.0
 lxml==4.2.1
 pika==0.11.2
 psycopg2-binary==2.7.4 # LGPL with exceptions or ZPL
 pyoai==2.5.0
 python-simplexquery==1.0.5.3
-pytz==2018.3
+pytz==2018.4
 pyyaml==3.12
 redis==2.10.6
 requests==2.18.4 # Apache 2.0-license
-simplejson==3.13.2 # MIT-license
+simplejson==3.15.0 # MIT-license
 urllib3==1.22
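A note on the gevent bump: the commit message cites gevent/gevent#1016 and argues the new monkey-patching warning is harmless because this codebase never imports gevent itself (gunicorn's gevent worker does the patching). A minimal illustration of why patch ordering matters, assuming a hypothetical standalone script rather than anything in this repository:

# Illustrative only: when an app does import gevent itself, the stdlib must
# be patched before other modules import it. This project sidesteps the
# issue by leaving the patching to gunicorn's gevent worker.
from gevent import monkey
monkey.patch_all()  # must run before socket/ssl/etc. are imported elsewhere

import socket  # after patch_all(), this is gevent's cooperative socket

def check_patched():
    # gevent replaces socket.socket with a cooperative implementation
    return 'gevent' in repr(socket.socket)

if __name__ == '__main__':
    print(check_patched())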

src/metax_api/api/oaipmh/base/metax_oai_server.py (+185 -36)
Original file line numberDiff line numberDiff line change
@@ -7,14 +7,16 @@
 from oaipmh.error import IdDoesNotExistError
 from oaipmh.error import BadArgumentError
 
-from metax_api.models.catalog_record import CatalogRecord
+from metax_api.models.catalog_record import CatalogRecord, DataCatalog
 from metax_api.services import CatalogRecordService as CRS
 
+syke_url_prefix_template = 'http://metatieto.ymparisto.fi:8080/geoportal/catalog/search/resource/details.page?uuid=%s'
+
 
 class MetaxOAIServer(ResumptionOAIPMH):
 
     def _is_valid_set(self, set):
-        if not set or set == 'urnresolver' or set in settings.OAI['SET_MAPPINGS']:
+        if not set or set in ['urnresolver', 'datacatalogs'] or set in settings.OAI['SET_MAPPINGS']:
            return True
        return False

@@ -30,16 +32,20 @@ def _get_filtered_records(self, set, cursor, batch_size, from_=None, until=None)
         if not self._is_valid_set(set):
             raise BadArgumentError('invalid set value')
 
-        query_set = CatalogRecord.objects.all()
+        proxy = CatalogRecord
+        if set == 'datacatalogs':
+            proxy = DataCatalog
+
+        query_set = proxy.objects.all()
         if from_ and until:
-            query_set = CatalogRecord.objects.filter(date_modified__gte=from_, date_modified__lte=until)
+            query_set = proxy.objects.filter(date_modified__gte=from_, date_modified__lte=until)
         elif from_:
-            query_set = CatalogRecord.objects.filter(date_modified__gte=from_)
+            query_set = proxy.objects.filter(date_modified__gte=from_)
         elif until:
-            query_set = CatalogRecord.objects.filter(date_modified__lte=until)
+            query_set = proxy.objects.filter(date_modified__lte=until)
 
         if set:
-            if set == 'urnresolver':
+            if set in ['urnresolver', 'datacatalogs']:
                 pass
             else:
                 query_set = query_set.filter(
@@ -48,38 +54,154 @@ def _get_filtered_records(self, set, cursor, batch_size, from_=None, until=None)
             query_set = query_set.filter(data_catalog__catalog_json__identifier__in=self._get_default_set_filter())
         return query_set[cursor:batch_size]
 
+    def _handle_syke_urnresolver_metadata(self, record):
+        identifiers = []
+        preferred_identifier = record.research_dataset.get('preferred_identifier')
+        identifiers.append(preferred_identifier)
+        for id_obj in record.research_dataset.get('other_identifier', []):
+            if id_obj.get('notation', '').startswith('{'):
+                uuid = id_obj['notation']
+                identifiers.append(syke_url_prefix_template % uuid)
+        return identifiers
+
     def _get_oai_dc_urnresolver_metadata(self, record):
         """
         Preferred identifier is added only for ida and att catalog records
         other identifiers are added for all.
+
+        Special handling for SYKE catalog.
         """
+
         identifiers = []
-        identifiers.append(settings.OAI['ETSIN_URL_TEMPLATE'] % record.identifier)
 
-        # assuming ida and att catalogs are not harvested
-        if not record.catalog_is_harvested():
-            preferred_identifier = record.research_dataset.get('preferred_identifier')
-            identifiers.append(preferred_identifier)
-            for id_obj in record.research_dataset.get('other_identifier', []):
-                if id_obj.get('notation', '').startswith('urn:nbn:fi:csc-kata'):
-                    other_urn = id_obj['notation']
-                    identifiers.append(other_urn)
+        data_catalog = record.data_catalog.catalog_json.get('identifier')
+        if data_catalog == 'urn:nbn:fi:att:data-catalog-harvest-syke':
+            identifiers = self._handle_syke_urnresolver_metadata(record)
+
+        else:
+            identifiers.append(settings.OAI['ETSIN_URL_TEMPLATE'] % record.identifier)
+
+            # assuming ida and att catalogs are not harvested
+            if not record.catalog_is_harvested():
+                preferred_identifier = record.research_dataset.get('preferred_identifier')
+                identifiers.append(preferred_identifier)
+                for id_obj in record.research_dataset.get('other_identifier', []):
+                    if id_obj.get('notation', '').startswith('urn:nbn:fi:csc-kata'):
+                        other_urn = id_obj['notation']
+                        identifiers.append(other_urn)
 
         meta = {
             'identifier': identifiers
         }
         return meta
 
-    def _get_oai_dc_metadata(self, record):
-        identifier = record.research_dataset.get('preferred_identifier')
+    def _get_oaic_dc_value(self, value, lang=None):
+        valueDict = {}
+        valueDict['value'] = value
+        if lang:
+            valueDict['lang'] = lang
+        return valueDict
+
+    def _get_oai_dc_metadata(self, record, json, type):
+        identifier = []
+        if 'preferred_identifier' in json:
+            identifier.append(self._get_oaic_dc_value(json.get('preferred_identifier')))
+        if 'identifier' in json:
+            identifier.append(self._get_oaic_dc_value(json.get('identifier')))
+
+        title = []
+        title_data = json.get('title', {})
+        for key, value in title_data.items():
+            title.append(self._get_oaic_dc_value(value, key))
+
+        creator = []
+        creator_data = json.get('creator', [])
+        for value in creator_data:
+            if 'name' in value:
+                creator.append(self._get_oaic_dc_value(value.get('name')))
+
+        subject = []
+        subject_data = json.get('keyword', [])
+        for value in subject_data:
+            subject.append(self._get_oaic_dc_value(value))
+        subject_data = json.get('field_of_science', [])
+        for value in subject_data:
+            for key, value2 in value.get('pref_label', {}).items():
+                subject.append(self._get_oaic_dc_value(value2, key))
+        subject_data = json.get('theme', [])
+        for value in subject_data:
+            for key, value2 in value.get('pref_label', {}).items():
+                subject.append(self._get_oaic_dc_value(value2, key))
+
+        desc = []
+        desc_data = json.get('description', {}).get('name', {})
+        for key, value in desc_data.items():
+            desc.append(self._get_oaic_dc_value(value, key))
+
+        publisher = []
+        publisher_data = json.get('publisher', {})
+        for key, value in publisher_data.get('name', {}).items():
+            publisher.append(self._get_oaic_dc_value(value, key))
+
+        contributor = []
+        contributor_data = json.get('contributor', [])
+        for value in contributor_data:
+            if 'name' in value:
+                contributor.append(self._get_oaic_dc_value(value.get('name')))
+
+        date = self._get_oaic_dc_value(str(record.date_created))
+
+        language = []
+        language_data = json.get('language', [])
+        for value in language_data:
+            if 'identifier' in value:
+                language.append(self._get_oaic_dc_value(value['identifier']))
+
+        relation = []
+        relation_data = json.get('relation', [])
+        for value in relation_data:
+            if 'identifier' in value.get('entity', {}):
+                relation.append(self._get_oaic_dc_value(value['entity']['identifier']))
+
+        coverage = []
+        coverage_data = json.get('spatial', [])
+        for value in coverage_data:
+            if 'geographic_name' in value:
+                coverage.append(self._get_oaic_dc_value(value['geographic_name']))
+
+        rights = []
+        rights_data = json.get('access_rights', {})
+        rights_desc = rights_data.get('description', {}).get('name', {})
+        for key, value in rights_desc.items():
+            rights.append(self._get_oaic_dc_value(value, key))
+
+        for value in rights_data.get('license', []):
+            if 'identifier' in value:
+                rights.append(self._get_oaic_dc_value(value['identifier']))
+
+        types = []
+        types.append(self._get_oaic_dc_value(type))
+
         meta = {
-            'identifier': [identifier]
+            'identifier': identifier,
+            'title': title,
+            'creator': creator,
+            'subject': subject,
+            'description': desc,
+            'publisher': publisher,
+            'contributor': contributor,
+            'date': [date],
+            'type': types,
+            'language': language,
+            'relation': relation,
+            'coverage': coverage,
+            'rights': rights
         }
         return meta
 
-    def _get_oai_datacite_metadata(self, record):
+    def _get_oai_datacite_metadata(self, json):
         datacite_xml = CRS.transform_datasets_to_format(
-            {'research_dataset': record.research_dataset}, 'datacite', False
+            {'research_dataset': json}, 'datacite', False
         )
         meta = {
             'datacentreSymbol': 'Metax',
@@ -88,13 +210,20 @@ def _get_oai_datacite_metadata(self, record):
         }
         return meta
 
-    def _get_metadata_for_record(self, record, metadata_prefix):
+    def _get_metadata_for_record(self, record, json, type, metadata_prefix):
+        if type == 'Datacatalog' and metadata_prefix != 'oai_dc':
+            raise BadArgumentError('Invalid set value. DataCatalogs can only be harvested using oai_dc format.')
+
         meta = {}
+        json = CRS.strip_catalog_record(json)
+
         if metadata_prefix == 'oai_dc':
-            meta = self._get_oai_dc_metadata(record)
+            meta = self._get_oai_dc_metadata(record, json, type)
         elif metadata_prefix == 'oai_datacite':
-            meta = self._get_oai_datacite_metadata(record)
+            meta = self._get_oai_datacite_metadata(json)
         elif metadata_prefix == 'oai_dc_urnresolver':
+            # This is a special case. Only identifier values are retrieved from the record,
+            # so strip_catalog_record is not applicable here.
             meta = self._get_oai_dc_urnresolver_metadata(record)
         return self._fix_metadata(meta)

@@ -106,9 +235,14 @@ def _get_header_timestamp(self, record):
             timestamp = record.date_created
         return timezone.make_naive(timestamp)
 
-    def _get_oai_item(self, record, metadata_prefix):
-        identifier = record.identifier
-        metadata = self._get_metadata_for_record(record, metadata_prefix)
+    def _get_oai_item(self, identifier, record, metadata_prefix):
+        metadata = self._get_metadata_for_record(record, record.research_dataset, 'Dataset', metadata_prefix)
+        item = (common.Header('', identifier, self._get_header_timestamp(record), ['metax'], False),
+                common.Metadata('', metadata), None)
+        return item
+
+    def _get_oai_catalog_item(self, identifier, record, metadata_prefix):
+        metadata = self._get_metadata_for_record(record, record.catalog_json, 'Datacatalog', metadata_prefix)
         item = (common.Header('', identifier, self._get_header_timestamp(record), ['metax'], False),
                 common.Metadata('', metadata), None)
         return item
@@ -161,18 +295,24 @@ def listMetadataFormats(self, identifier=None):
 
     def listSets(self, cursor=None, batch_size=None):
         """Implement OAI-PMH verb ListSets."""
-        data = []
+        data = [('datacatalogs', 'datacatalog', '')]
         for set_key in settings.OAI['SET_MAPPINGS'].keys():
             data.append((set_key, set_key, ''))
         return data
 
+    def _get_record_identifier(self, record, set):
+        if set == 'datacatalogs':
+            return record.catalog_json['identifier']
+        else:
+            return record.identifier
+
     def listIdentifiers(self, metadataPrefix=None, set=None, cursor=None,
                         from_=None, until=None, batch_size=None):
         """Implement OAI-PMH verb listIdentifiers."""
         records = self._get_filtered_records(set, cursor, batch_size, from_, until)
         data = []
         for record in records:
-            identifier = record.research_dataset.get('preferred_identifier')
+            identifier = self._get_record_identifier(record, set)
             data.append(common.Header('', identifier, self._get_header_timestamp(record), ['metax'], False))
         return data

@@ -182,18 +322,27 @@ def listRecords(self, metadataPrefix=None, set=None, cursor=None, from_=None,
         data = []
         records = self._get_filtered_records(set, cursor, batch_size, from_, until)
         for record in records:
-            data.append(self._get_oai_item(record, metadataPrefix))
+            identifier = self._get_record_identifier(record, set)
+            if set == 'datacatalogs':
+                data.append(self._get_oai_catalog_item(identifier, record, metadataPrefix))
+            else:
+                data.append(self._get_oai_item(identifier, record, metadataPrefix))
         return data
 
     def getRecord(self, metadataPrefix, identifier):
         """Implement OAI-PMH verb GetRecord."""
         try:
-            record = CatalogRecord.objects.get(
-                data_catalog__catalog_json__identifier__in=self._get_default_set_filter(),
-                identifier__exact=identifier
-            )
+            record = CatalogRecord.objects.get(identifier__exact=identifier)
+            json = record.research_dataset
+            type = 'Dataset'
         except CatalogRecord.DoesNotExist:
-            raise IdDoesNotExistError("No dataset with id %s available through the OAI-PMH interface." % identifier)
-        metadata = self._get_metadata_for_record(record, metadataPrefix)
+            try:
+                record = DataCatalog.objects.get(catalog_json__identifier__exact=identifier)
+                json = record.catalog_json
+                type = 'Datacatalog'
+            except DataCatalog.DoesNotExist:
+                raise IdDoesNotExistError("No record with id %s available." % identifier)
+
+        metadata = self._get_metadata_for_record(record, json, type, metadataPrefix)
         return (common.Header('', identifier, self._get_header_timestamp(record), ['metax'], False),
                 common.Metadata('', metadata), None)
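Taken together, these changes publish data catalogs through a new 'datacatalogs' OAI-PMH set which, per _get_metadata_for_record() above, can only be harvested in the oai_dc format. A minimal harvesting sketch using pyoai, the same library pinned in requirements.txt; the endpoint URL is a placeholder:

# Sketch of harvesting the new 'datacatalogs' set with pyoai (pyoai==2.5.0).
# The endpoint URL below is a placeholder, not this deployment's real URL.
from oaipmh.client import Client
from oaipmh.metadata import MetadataRegistry, oai_dc_reader

registry = MetadataRegistry()
registry.registerReader('oai_dc', oai_dc_reader)

client = Client('https://metax.example.com/oai/', registry)  # placeholder

# DataCatalogs are restricted to oai_dc; other metadata prefixes raise
# BadArgumentError in _get_metadata_for_record().
for header, metadata, about in client.listRecords(metadataPrefix='oai_dc',
                                                  set='datacatalogs'):
    print(header.identifier(), metadata.getField('title'))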
