Dataset Card for OneStopEnglish corpus
Dataset Summary
OneStopEnglish is a corpus of texts written at three reading levels. Its usefulness has been demonstrated through two applications: automatic readability assessment and automatic text simplification.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information Needed]
Dataset Structure
Data Instances
An instance example:
{
"text": "When you see the word Amazon, what’s the first thing you think...",
"label": 0
}
Note that each instance contains the full text of the document.
Data Fields
text: Full document text.
label: Reading level of the document: ele/int/adv (Elementary/Intermediate/Advanced).
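Because the instance example stores the label as an integer, a reader usually needs to map it back to a level name. The sketch below shows one way to do that. It assumes that the integer labels follow the order in which the levels are listed above (0 = ele, 1 = int, 2 = adv); that ordering is not stated in this card and should be checked against the loaded dataset's features. The names `LEVEL_NAMES`, `LEVEL_DESCRIPTIONS`, and `describe_instance` are hypothetical and introduced only for illustration.

```python
# Assumption: integer labels follow the ele/int/adv order used in the
# field description. Verify against the dataset's own label names.
LEVEL_NAMES = ["ele", "int", "adv"]
LEVEL_DESCRIPTIONS = {
    "ele": "Elementary",
    "int": "Intermediate",
    "adv": "Advanced",
}


def describe_instance(instance: dict) -> str:
    """Return a short summary: the reading level plus a 40-character preview."""
    name = LEVEL_NAMES[instance["label"]]
    preview = instance["text"][:40]
    return f"{LEVEL_DESCRIPTIONS[name]} ({name}): {preview}"


# The instance shown in the Data Instances section.
example = {
    "text": "When you see the word Amazon, what’s the first thing you think...",
    "label": 0,
}
summary = describe_instance(example)
print(summary)
```

If the dataset is loaded with the `datasets` library, the label names exposed by its features are the authoritative source and should replace the hard-coded list.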
Data Splits
The OneStopEnglish dataset has a single train split.
| Split | Number of instances |
|---|---|
| train | 567 |
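Since there is only a train split, any held-out evaluation requires splitting it yourself, and it helps to check how instances are distributed across reading levels first. The following is a minimal sketch of that check. The dataset identifier `onestop_english` passed to `load_dataset` is an assumption, as are the helper names `load_train_split` and `count_by_label`; the toy records are invented placeholders, not corpus data. Loading needs the `datasets` library and network access, so it is kept in a function that is not called here.

```python
from collections import Counter
from typing import Iterable, Mapping


def load_train_split():
    """Hypothetical helper: load the single train split.

    Assumes the dataset is published under the identifier
    "onestop_english"; requires the `datasets` library and network access.
    """
    from datasets import load_dataset

    return load_dataset("onestop_english", split="train")


def count_by_label(instances: Iterable[Mapping]) -> Counter:
    """Count how many instances carry each integer label."""
    return Counter(instance["label"] for instance in instances)


# Toy records with the same fields as the real instances (invented data).
toy_instances = [
    {"text": "Short and simple text.", "label": 0},
    {"text": "Another short text.", "label": 0},
    {"text": "A somewhat more demanding text.", "label": 1},
    {"text": "A considerably more sophisticated text.", "label": 2},
]
counts = count_by_label(toy_instances)
print(dict(counts))
```

With the real data, `count_by_label(load_train_split())` would give the per-level counts, which should sum to the 567 instances listed in the table.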
Dataset Creation
Curation Rationale
[More Information Needed]
Source Data
Initial Data Collection and Normalization
[More Information Needed]
Who are the source language producers?
[More Information Needed]
Annotations
Annotation process
[More Information Needed]
Who are the annotators?
[More Information Needed]
Personal and Sensitive Information
[More Information Needed]
Considerations for Using the Data
Social Impact of Dataset
[More Information Needed]
Discussion of Biases
[More Information Needed]
Other Known Limitations
[More Information Needed]
Additional Information
Dataset Curators
[More Information Needed]
Licensing Information
Creative Commons Attribution-ShareAlike 4.0 International License
Citation Information
[More Information Needed]
Contributions
Thanks to @purvimisal for adding this dataset.