Codalab Migration #4565

Open
AndrewJGaut opened this issue Nov 1, 2023 · 6 comments
Assignees
Labels
p1 Do it in the next two weeks.

Comments

@AndrewJGaut
Contributor

We are still experiencing some bugs with the migration. We are tracking some progress here.

@AndrewJGaut AndrewJGaut added the p1 Do it in the next two weeks. label Nov 1, 2023
@AndrewJGaut
Contributor Author

Here is a doc with some issues we are experiencing: https://docs.google.com/document/d/1A9MoAOnbf7ALGgHic6emy7Svc44ACPJORylbwel3NrY/edit

The good news is that the errors appear to come mostly from the zip_directory and tar_gzip_directory functions, so at least the problem is localized. The bad news is that some errors appear to be caused by bugs in those functions (which, by the way, weren't changed in any migration script PR), so this may be tough to fix.

@AndrewJGaut
Contributor Author

We tried both zip_directory and tar_gzip_directory. We tried the latter first and kept getting errors of the form

Exception in thread Thread-140:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/codalab-worksheets/codalab/lib/upload_manager.py", line 308, in create_index
    indexFilePath=tmp_index_file.name,
  File "/opt/codalab-worksheets/codalab/lib/beam/SQLiteIndexedTar.py", line 273, in __init__
    self._createIndex(self.tarFileObject)
  File "/opt/codalab-worksheets/codalab/lib/beam/SQLiteIndexedTar.py", line 619, in _createIndex
    for tarInfo in loadedTarFile:
  File "/opt/conda/lib/python3.7/tarfile.py", line 2403, in __iter__
    tarinfo = self.next()
  File "/opt/conda/lib/python3.7/tarfile.py", line 2279, in next
    self.fileobj.seek(self.offset - 1)
  File "/opt/conda/lib/python3.7/tarfile.py", line 517, in seek
    self.read(self.bufsize)
  File "/opt/conda/lib/python3.7/tarfile.py", line 537, in read
    buf = self._read(size)
  File "/opt/conda/lib/python3.7/tarfile.py", line 545, in _read
    return self.__read(size)
  File "/opt/conda/lib/python3.7/tarfile.py", line 568, in __read
    buf = self.fileobj.read(self.bufsize)
  File "indexed_gzip/indexed_gzip.pyx", line 797, in indexed_gzip.indexed_gzip._IndexedGzipFile.readinto
indexed_gzip.indexed_gzip.ZranError: zran_read returned error: ZRAN_READ_FAIL (file: n/a)

so we tried zip_directory as well.

With zip_directory, we got errors for about 15.5% of bundles. (Again, pretty much every error was due to zip_directory.) We are still collecting data for tar_gzip_directory.
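
One cheap way to narrow this down might be to check that an archive decompresses end to end before it is indexed or uploaded. The following is a minimal sketch using only the standard library; `tar_gz_is_readable` is a hypothetical helper, not a function in codalab-worksheets, and it does not exercise indexed_gzip, so it can only rule a truncated or corrupt archive in or out as the cause.

```python
import tarfile
import zlib


def tar_gz_is_readable(fileobj):
    """Walk every member of a gzip-compressed tar and report whether it parses.

    Returns (ok, detail). `fileobj` is any binary file-like object positioned
    at the start of the archive. Hypothetical helper for illustration only.
    """
    try:
        with tarfile.open(fileobj=fileobj, mode="r:gz") as tf:
            # getnames() forces a full pass over the archive headers.
            names = tf.getnames()
    except (tarfile.TarError, EOFError, OSError, zlib.error) as exc:
        # Truncated or corrupt input surfaces as one of these exception types.
        return False, f"{type(exc).__name__}: {exc}"
    return True, f"{len(names)} member(s)"
```

Running this on the temporary archive before handing it to the indexer would at least separate "the archive is bad" from "the indexer cannot read a good archive".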

@AndrewJGaut
Contributor Author

Collecting some extra results right now with some extra logging we set up. (I had meant to run that overnight, but there was an error.)
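
For reference, logging of this kind could be structured by grouping failures under their error message and keeping the affected UUIDs, one traceback, and a count per group. This is a sketch under that assumption; `ErrorLog` is a hypothetical class, not the code in the migration script.

```python
import json
import traceback
from collections import defaultdict


class ErrorLog:
    """Group exceptions by message. Hypothetical helper for illustration."""

    def __init__(self):
        self._groups = defaultdict(
            lambda: {"uuid": [], "traceback": "", "count": 0}
        )

    def record(self, bundle_uuid, exc):
        # str(exc) is the grouping key, e.g. "[Errno 12] Cannot allocate memory".
        group = self._groups[str(exc)]
        group["uuid"].append(bundle_uuid)
        # Keep the most recent traceback seen for this message.
        group["traceback"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        group["count"] += 1

    def to_json(self):
        return json.dumps(self._groups, indent=4, sort_keys=True)
```

Because the key is the full message, errors that embed a random temporary path each land in their own group; stripping such paths from the key would merge them.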

@AndrewJGaut
Contributor Author

Some more errors:

{
    "unexpected end of data": {
        "uuid": [
            "0x0010b52d853b46bbb522bff4c80c21b4",
            "0x004192f64515479ebe14145cf4eefc53",
            "0x00517d4ac0e448b1b5d6299af1fe5c86",
            "0x005410fe6f5f48088f10e7b197c77d52",
            "0x00613a51b5ae48b4b9e501556e64865b",
            "0x006e83c2761e4c43b9b88bfbed725f35",
            "0x007c8af66f684838afe617c13a16ce8d",
            "0x007dbf30e3a145f483789f3e2d1889aa",
            "0x00b34a5b271a4e7c9bdafb1b606838f0",
            "0x00e15e636ed449ada2eee4d76ddc103b",
            "0x011a84546043492c96371835ebaed2d2",
            "0x013e2f4933a74ae38b2d6a1be18509ca",
            "0x0162e3ce38774e81a9bbff7baf254f78"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"codalab/migration.py\", line 316, in migrate_bundle\n    bundle_uuid, disk_location, bundle_info, is_dir, target_location\n  File \"codalab/migration.py\", line 219, in sanity_check\n    new_file_list = tarfile.open(fileobj=f, mode='r:gz').getnames()\n  File \"/opt/conda/lib/python3.7/tarfile.py\", line 1769, in getnames\n    return [tarinfo.name for tarinfo in self.getmembers()]\n  File \"/opt/conda/lib/python3.7/tarfile.py\", line 1761, in getmembers\n    self._load()        # all members, we first have to\n  File \"/opt/conda/lib/python3.7/tarfile.py\", line 2348, in _load\n    tarinfo = self.next()\n  File \"/opt/conda/lib/python3.7/tarfile.py\", line 2281, in next\n    raise ReadError(\"unexpected end of data\")\ntarfile.ReadError: unexpected end of data\n",
        "count": 13
    },
    "SanityCheck failed with Directory file lists differ.": {
        "uuid": [
            "0x0010e4ab1f7f4eff9a500ba20c2d7320",
            "0x001898c9d2604666a559fee8d0fbda85",
            "0x0038a2efb4874b878a86b2d5f9836d22",
            "0x003d435154ec4df9bd0ab3bf8a6766af",
            "0x0044f1ee3fd64bce840ee54f812656e9",
            "0x004b29fe995441d28d5d89de847e4bb9",
            "0x004cafc5b31c4084a2567e75b7b17eb4",
            "0x004dd2ff13ac428bac38980b38d407ac",
            "0x004f15e956964564b6387840bb1505e5",
            "0x004fa10dd0c2476492f5c096d7d09a75",
            "0x0050001b39b24e4284e4eb14bef6d7e6",
            "0x005085de214946f5b97f686be86d9a37",
            "0x00509cde2a804b21892dcf36b2eda997",
            "0x00525a4fcb6745f2b768aa45919f9266",
            "0x0057a5b7324c4c9b881d026efce4fdc3",
            "0x00597a24d4f5442ebc50e24d4d23ee09",
            "0x0059d68d14534d0aa7d4467531e071b5",
            "0x005aed6ccc944da68041e31c6c63427f",
            "0x0060045d9d2846c79eae1c5ec5b5184b",
            "0x00611d941cb740cfba607d4684a0b63b",
            "0x0068903074b94a09b495049e44fca559",
            "0x006c7bf8d4ff479792137a348247f849",
            "0x00708c13508d480e9a790dc0af8e372b",
            "0x0071fbb750f8442f9759df7208119490",
            "0x007766928c7d462ab887c46f7b288cf3",
            "0x007bde03ce7c41daad26756e2363a0aa",
            "0x007d85a1aca5426cbd8538332ca93a33",
            "0x007e38e1a0634568a36129f652ca4a72",
            "0x007ee2059de345c2925467682cc0276a",
            "0x007fb65bc59b42579fbad6008c516d05",
            "0x0081f9980fe241f9b5f55f8a46def771",
            "0x00825aed573b423cb519775dcb9f17a9",
            "0x00851a7c2faa4de2916d750f7337a139",
            "0x00876d7f0cf44f53af417c5a182ee175",
            "0x009026eb4efa4b718b7ec12b62fbdb39",
            "0x00916a691d154d1b85ae1e6ba1663229",
            "0x0095d83e1f034ac38a998850a81e6abf",
            "0x0096a0b710454da9a1bd0caa293ebf7b",
            "0x00985fcf23e94a31a588eccb22546849",
            "0x009a5673b0e94a5e9584fa5a884ee8d8",
            "0x009ba6f437db49feb90f1c031408cff1",
            "0x009c210d353c40ea993ff65a18cb499a",
            "0x009f7b9c569d4a03a44b9e92d47a1fdf",
            "0x00a0790236534408ab7bbcd7a4d16512",
            "0x00a597d820434eab9092c2db1264a908",
            "0x00a5f4fa48d94eafaa17b1d2807696e9",
            "0x00a75ce62fd44790b95a43b72416a006",
            "0x00af45439c494c448787a0399a1de439",
            "0x00b3640159114b0dba22f506b22f8583",
            "0x00b5d8b7209840fdbb46c567f4a0c35d",
            "0x00b6ccffdbf74836a85832488eec79ed",
            "0x00b700acad894081afb1982bb70441b3",
            "0x00b789454f944d66bdef3df24aeb3fdd",
            "0x00c0e43b827c48b5b2bbd54ffbe1ea26",
            "0x00c1b2bdbb664a619f31bb9d886a5ec9",
            "0x00d7870ee97d4c66ada684c9d05a3abf",
            "0x00d92261571a44199ad8892d148f5a4f",
            "0x00d94fd4181049d4a510b361f471628f",
            "0x00db269a2edf46aea4c9fa452d9f2ba9",
            "0x00dc3b5a0120470bac0d3e5555c4fb31",
            "0x00dceca9a6264651a458c2b5d21e1897",
            "0x00dd456a28b5494ca66c3a6002f3b2c2",
            "0x00de547f450f409fa416cc5943e31a19",
            "0x00de8bd75d4a4fdb900ea38433aa55ef",
            "0x00dfc7d6295e41329778ee076bdc6ca6",
            "0x00e17a70b7be450785c512b8a7bd2de3",
            "0x00e500a3b0d44b70b21a69be60e4a970",
            "0x00e5654a090a42ae99b6fd1afdcdb238",
            "0x00e5e4e5e5eb4f4db5f4a82e2727dc81",
            "0x00e679ad24b7402eaff5fddd3bdc16cf",
            "0x00eb1a9d5e0d43c49f78c0a761800a59",
            "0x00ec129c1d2d4ea4a0594ee4b24dc6ed",
            "0x00eec6267f254d67a8a8ab5677ba4a90",
            "0x00ef1f5baa3d4c0392779dd400f8d752",
            "0x00f0d34f68804eb7b1e157fe380223fc",
            "0x00f4951357664b0ca2f34f585b9a0007",
            "0x00f9fd2a04af4ccc92ac4eebdfb2385e",
            "0x01043683f0c846229726c664e87f2b4a",
            "0x0106c35d373d4b84a07be46cec0469c6",
            "0x0109b2fb6b37444fb24e5a931d42e1fc",
            "0x010c1ed286574f7d99369f100f6042b0",
            "0x01131b5ab40e485bb0ad4ad5e586c485",
            "0x0113f3fa97cb4608808f0b020b985efc",
            "0x01181a7ff63442569fa63cff9dd99f84",
            "0x0119a7aeb11d4104a6966e1f1c3f4461",
            "0x011b7b7fcda14a1d9d6c1baa2b02bee6",
            "0x011bd45c470543a08384969ea700546f",
            "0x011c63e4d3f44242ac007edac9800e92",
            "0x01214df9525e4e9984f65b192c2416a6",
            "0x01217e5fe7254999a10d92dc44f7fc04",
            "0x0122335234754c63aa71b2341353b194",
            "0x01241eb5981a4ad8bdbca88e04fcf60c",
            "0x0129964ee0d84cc7a6b3c57d109654ca",
            "0x012b51155fc94c018ac8834aa0c05057",
            "0x012d8af8fb4348cf889c8b49128bfc49",
            "0x012ddb5cde824ce4a02eaf944cbe4cbd",
            "0x012de7e99bf3429199543daad8586394",
            "0x013318e4e045463191ecec9396ce3f77",
            "0x0134488cef794a89a3b8c8fbf4bc5f01",
            "0x013678c91a524e148308eafc4dd2475a",
            "0x0137db98bce5455e80d9454e6a924075",
            "0x013a0b81d90f487cab86889e4daaa3ba",
            "0x013a18fca7184a1393545226a60cbf9f",
            "0x013e175cbae7499baa88806982046e49",
            "0x014026f6c2214278bbbf015fc5773d9d",
            "0x01418037e26b4b83bf31571240ed0fd2",
            "0x0142694796994d2aa33867b1699fa5d4",
            "0x0144f44509d04813bab4eacf6876aa4c",
            "0x01455518e895415a8e24d5e5a72121f8",
            "0x014827da8ada4bbf8506fdaa6d7510a7",
            "0x014a4f9d63d148008978cd98a5321863",
            "0x014a9c4ef2064edcbcc8404561e96851",
            "0x014e324451b443a7927809cf897abf43",
            "0x015019e6336444229fffd3a1248db525",
            "0x01541ba573ac4514807a5b855697b64a",
            "0x015bf158154942038a886905aae2ebe5",
            "0x015cb56e4cec4bf3af39b42cbd630b54",
            "0x0162316c61824791b10405e3139458b8",
            "0x01641fb1d00e4a50bdac14867fad0c78",
            "0x0167127a13df48b09989a7e229253890",
            "0x016af2642a074d5fb05e9b84a1dd8819",
            "0x016af319b10a4322856bc882550fe2b4",
            "0x0170c7ab747d42e0b59a298fc9fcf44e",
            "0x01752222dce64922affb4e01e06c2f72",
            "0x017737fb205c4b70966de8ed6449cc7a",
            "0x017c2937e6ca4638b32fc8836f049b94",
            "0x017cd4e69e264838856702b3d315304f",
            "0x017e68b52d6e4d478d2ced186d1d6ec1",
            "0x017ee05da06f4841810c59bcf9392ed9",
            "0x0180c9fc0c71441695cfb29547099ef5",
            "0x018433a81ded4c0aa9d76ad7639d106e",
            "0x0184c2466da84ede904236176a3deb6f",
            "0x018c974512cb42cc938530c4455577c4",
            "0x018d14c95dec4e6592e2e9e638d03d64",
            "0x0191d7fac5f74562a47a8b4a498d7744",
            "0x0192153c5a854e6891a1b8b259d5bca1",
            "0x0193c70f24b2449490f62664b41caaac",
            "0x0194b024169c4648b376160bcaed076f",
            "0x0194f58607674c55b487afa43339cbf9",
            "0x0196606710c34e1a9a6a2ff6c315a079",
            "0x0197db8eab7c43ae873c9ae93b70ebf1",
            "0x01986938d66a4d2a905c819909371ad6",
            "0x019997531e9d46099541034af02b87fd",
            "0x019a2f6b0d06407eb69913f5ec8e1e67",
            "0x019ce13038084d5d9154e15b03510cdc",
            "0x019d667c0a6f4567a15161c2a1ae162f",
            "0x01a2a82c24704536b9cd2c87292fca0c"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"codalab/migration.py\", line 326, in migrate_bundle\n    raise ValueError(f\"SanityCheck failed with {reason}\")\nValueError: SanityCheck failed with Directory file lists differ.\n",
        "count": 147
    },
    "[Errno 2] No such file or directory: '/tmp/tmpynvttqoz/tmp.zip'": {
        "uuid": [
            "0x004b29f3e7d544d3b61b04259e40dc9a"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"codalab/migration.py\", line 320, in migrate_bundle\n    self.adjust_quota_and_upload_to_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 288, in adjust_quota_and_upload_to_blob\n    raise e  # still raise the expcetion to outer try-catch wrapper\n  File \"codalab/migration.py\", line 280, in adjust_quota_and_upload_to_blob\n    self.upload_to_azure_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 164, in upload_to_azure_blob\n    source_fileobj = zip_directory(bundle_location)\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 150, in zip_directory\n    with open(tmp_zip_name, \"rb\") as out:\nFileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpynvttqoz/tmp.zip'\n",
        "count": 1
    },
    "[Errno 2] No such file or directory: '/tmp/tmp8xj8ykwv/tmp.zip'": {
        "uuid": [
            "0x00672de37b284331bb845d1d25b47e91"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"codalab/migration.py\", line 320, in migrate_bundle\n    self.adjust_quota_and_upload_to_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 288, in adjust_quota_and_upload_to_blob\n    raise e  # still raise the expcetion to outer try-catch wrapper\n  File \"codalab/migration.py\", line 280, in adjust_quota_and_upload_to_blob\n    self.upload_to_azure_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 164, in upload_to_azure_blob\n    source_fileobj = zip_directory(bundle_location)\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 150, in zip_directory\n    with open(tmp_zip_name, \"rb\") as out:\nFileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp8xj8ykwv/tmp.zip'\n",
        "count": 1
    },
    "[Errno 2] No such file or directory: '/tmp/tmpvl9n7bq8/tmp.zip'": {
        "uuid": [
            "0x0097a58f2edf46318389c205c50f9cab"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"codalab/migration.py\", line 320, in migrate_bundle\n    self.adjust_quota_and_upload_to_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 288, in adjust_quota_and_upload_to_blob\n    raise e  # still raise the expcetion to outer try-catch wrapper\n  File \"codalab/migration.py\", line 280, in adjust_quota_and_upload_to_blob\n    self.upload_to_azure_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 164, in upload_to_azure_blob\n    source_fileobj = zip_directory(bundle_location)\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 150, in zip_directory\n    with open(tmp_zip_name, \"rb\") as out:\nFileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpvl9n7bq8/tmp.zip'\n",
        "count": 1
    },
    "[Errno 2] No such file or directory: '/tmp/tmp5l77pcba/tmp.zip'": {
        "uuid": [
            "0x00a7a75182d2455e8f71362c68a826da"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"codalab/migration.py\", line 320, in migrate_bundle\n    self.adjust_quota_and_upload_to_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 288, in adjust_quota_and_upload_to_blob\n    raise e  # still raise the expcetion to outer try-catch wrapper\n  File \"codalab/migration.py\", line 280, in adjust_quota_and_upload_to_blob\n    self.upload_to_azure_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 164, in upload_to_azure_blob\n    source_fileobj = zip_directory(bundle_location)\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 150, in zip_directory\n    with open(tmp_zip_name, \"rb\") as out:\nFileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp5l77pcba/tmp.zip'\n",
        "count": 1
    },
    "[Errno 12] Cannot allocate memory": {
        "uuid": [
            "0x00bb612c9e104683a9e4355cbff89543",
            "0x00be9044ed9f4e17ab5e048ad9c2df40"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"codalab/migration.py\", line 320, in migrate_bundle\n    self.adjust_quota_and_upload_to_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 288, in adjust_quota_and_upload_to_blob\n    raise e  # still raise the expcetion to outer try-catch wrapper\n  File \"codalab/migration.py\", line 280, in adjust_quota_and_upload_to_blob\n    self.upload_to_azure_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 164, in upload_to_azure_blob\n    source_fileobj = zip_directory(bundle_location)\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 148, in zip_directory\n    proc = subprocess.Popen(args, stdout=subprocess.PIPE, cwd=directory_path)\n  File \"/opt/conda/lib/python3.7/subprocess.py\", line 756, in __init__\n    restore_signals, start_new_session)\n  File \"/opt/conda/lib/python3.7/subprocess.py\", line 1430, in _execute_child\n    restore_signals, start_new_session, preexec_fn)\nOSError: [Errno 12] Cannot allocate memory\n",
        "count": 2
    },
    "[Errno 2] No such file or directory: '/tmp/tmpskc2htsa/tmp.zip'": {
        "uuid": [
            "0x00f20b8728c44c0a9b297cb794275ece"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"codalab/migration.py\", line 320, in migrate_bundle\n    self.adjust_quota_and_upload_to_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 288, in adjust_quota_and_upload_to_blob\n    raise e  # still raise the expcetion to outer try-catch wrapper\n  File \"codalab/migration.py\", line 280, in adjust_quota_and_upload_to_blob\n    self.upload_to_azure_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 164, in upload_to_azure_blob\n    source_fileobj = zip_directory(bundle_location)\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 150, in zip_directory\n    with open(tmp_zip_name, \"rb\") as out:\nFileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpskc2htsa/tmp.zip'\n",
        "count": 1
    },
    "[Errno 2] No such file or directory: '/tmp/tmpfvh4g25t/tmp.zip'": {
        "uuid": [
            "0x0119e1a1d86c4845b46fb77ef2a9f72a"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"codalab/migration.py\", line 320, in migrate_bundle\n    self.adjust_quota_and_upload_to_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 288, in adjust_quota_and_upload_to_blob\n    raise e  # still raise the expcetion to outer try-catch wrapper\n  File \"codalab/migration.py\", line 280, in adjust_quota_and_upload_to_blob\n    self.upload_to_azure_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 164, in upload_to_azure_blob\n    source_fileobj = zip_directory(bundle_location)\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 150, in zip_directory\n    with open(tmp_zip_name, \"rb\") as out:\nFileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpfvh4g25t/tmp.zip'\n",
        "count": 1
    },
    "[Errno 2] No such file or directory: '/tmp/tmpa5cotuv5/tmp.zip'": {
        "uuid": [
            "0x0130e6a3aa654a2a99fb5b6d03ef0a40"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"codalab/migration.py\", line 320, in migrate_bundle\n    self.adjust_quota_and_upload_to_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 288, in adjust_quota_and_upload_to_blob\n    raise e  # still raise the expcetion to outer try-catch wrapper\n  File \"codalab/migration.py\", line 280, in adjust_quota_and_upload_to_blob\n    self.upload_to_azure_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 164, in upload_to_azure_blob\n    source_fileobj = zip_directory(bundle_location)\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 150, in zip_directory\n    with open(tmp_zip_name, \"rb\") as out:\nFileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpa5cotuv5/tmp.zip'\n",
        "count": 1
    },
    "[Errno 2] Not found: azfs://storageclwsprod1/bundles/0x0137e17add6648b893945ed29752bd6c/index.sqlite": {
        "uuid": [
            "0x0137e17add6648b893945ed29752bd6c"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"/opt/conda/lib/python3.7/site-packages/azure/storage/blob/_blob_client.py\", line 999, in get_blob_properties\n    **kwargs)\n  File \"/opt/conda/lib/python3.7/site-packages/azure/storage/blob/_generated/operations/_blob_operations.py\", line 393, in get_properties\n    raise models.StorageErrorException(response, self._deserialize)\nazure.storage.blob._generated.models._models_py3.StorageErrorException: Operation returned an invalid status 'The specified blob does not exist.'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/conda/lib/python3.7/site-packages/apache_beam/io/azure/blobstorageio.py\", line 609, in __init__\n    properties = self._get_object_properties()\n  File \"/opt/conda/lib/python3.7/site-packages/apache_beam/utils/retry.py\", line 253, in wrapper\n    return fun(*args, **kwargs)\n  File \"/opt/conda/lib/python3.7/site-packages/apache_beam/io/azure/blobstorageio.py\", line 623, in _get_object_properties\n    return self._blob_to_download.get_blob_properties()\n  File \"/opt/conda/lib/python3.7/site-packages/azure/core/tracing/decorator.py\", line 83, in wrapper_use_tracer\n    return func(*args, **kwargs)\n  File \"/opt/conda/lib/python3.7/site-packages/azure/storage/blob/_blob_client.py\", line 1001, in get_blob_properties\n    process_storage_error(error)\n  File \"/opt/conda/lib/python3.7/site-packages/azure/storage/blob/_shared/response_handlers.py\", line 147, in process_storage_error\n    raise error\nazure.core.exceptions.ResourceNotFoundError: Operation returned an invalid status 'The specified blob does not exist.'\nErrorCode:BlobNotFound\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"codalab/migration.py\", line 316, in migrate_bundle\n    bundle_uuid, disk_location, bundle_info, is_dir, target_location\n  File \"codalab/migration.py\", line 232, in sanity_check\n    new_content = read_file_section(new_location, 5, 10)\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 460, in read_file_section\n    if offset >= get_file_size(file_path):\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 435, in get_file_size\n    with OpenFile(linked_bundle_path.bundle_path, 'rb') as fileobj:\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 273, in __enter__\n    with OpenIndexedArchiveFile(linked_bundle_path.bundle_path) as tf:\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 216, in __init__\n    compression_type=CompressionTypes.UNCOMPRESSED,\n  File \"/opt/conda/lib/python3.7/site-packages/apache_beam/io/filesystems.py\", line 244, in open\n    return filesystem.open(path, mime_type, compression_type)\n  File \"/opt/conda/lib/python3.7/site-packages/apache_beam/io/azure/blobstoragefilesystem.py\", line 177, in open\n    return self._path_open(path, 'rb', mime_type, compression_type)\n  File \"/opt/conda/lib/python3.7/site-packages/apache_beam/io/azure/blobstoragefilesystem.py\", line 138, in _path_open\n    path, mode, mime_type=mime_type)\n  File \"/opt/conda/lib/python3.7/site-packages/apache_beam/io/azure/blobstorageio.py\", line 139, in open\n    self.client, filename, buffer_size=read_buffer_size)\n  File \"/opt/conda/lib/python3.7/site-packages/apache_beam/io/azure/blobstorageio.py\", line 612, in __init__\n    raise IOError(errno.ENOENT, 'Not found: %s' % self._path)\nFileNotFoundError: [Errno 2] Not found: azfs://storageclwsprod1/bundles/0x0137e17add6648b893945ed29752bd6c/index.sqlite\n",
        "count": 1
    },
    "[Errno 2] No such file or directory: '/tmp/tmpg621_i8v/tmp.zip'": {
        "uuid": [
            "0x014015464f1b4cd592ed07119914b1cc"
        ],
        "traceback": "Traceback (most recent call last):\n  File \"codalab/migration.py\", line 320, in migrate_bundle\n    self.adjust_quota_and_upload_to_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 288, in adjust_quota_and_upload_to_blob\n    raise e  # still raise the expcetion to outer try-catch wrapper\n  File \"codalab/migration.py\", line 280, in adjust_quota_and_upload_to_blob\n    self.upload_to_azure_blob(bundle_uuid, bundle_location, is_dir)\n  File \"codalab/migration.py\", line 164, in upload_to_azure_blob\n    source_fileobj = zip_directory(bundle_location)\n  File \"/opt/codalab-worksheets/codalab/worker/file_util.py\", line 150, in zip_directory\n    with open(tmp_zip_name, \"rb\") as out:\nFileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpg621_i8v/tmp.zip'\n",
        "count": 1
    }
}
--------------------------------------------------------------------------------
{
    "adjust_quota_and_upload_to_blob": {
        "max": 315.96052861213684,
        "mean": 13.76251951648343,
        "median": 0.5536258220672607,
        "min": 0.337327241897583,
        "range": 315.62320137023926,
        "std": 47.48400868441128
    },
    "migrate_bundle": {
        "max": 121.69679307937622,
        "mean": 1.7871173399066493,
        "median": 0.3955371379852295,
        "min": 0.10390400886535645,
        "range": 121.59288907051086,
        "std": 6.455369480992315
    }
}
skipped 0(ready) 0(linked bundle) 0(on Azure) bundles, skipped delete due to path DNE 0, PathException 146, error 171 bundles. Succeeed 884 bundles
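
Since "Directory file lists differ" accounts for most of the errors above, it may help to log which entries differ rather than only that they do. A minimal sketch, assuming the archive stores paths relative to the bundle root (optionally with a leading "./"); `list_directory` and `diff_file_lists` are hypothetical helpers and are not the comparison that sanity_check in migration.py performs.

```python
import os
import tarfile


def list_directory(root):
    """Relative, '/'-separated paths of every file and subdirectory under root."""
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            entries.add(rel.replace(os.sep, "/"))
    return entries


def _normalise(name):
    # Drop trailing slashes and a leading "./" so both sides compare equal.
    name = name.rstrip("/")
    if name.startswith("./"):
        name = name[2:]
    return name


def diff_file_lists(root, archive_fileobj):
    """Return (missing_from_archive, extra_in_archive) as sorted lists."""
    with tarfile.open(fileobj=archive_fileobj, mode="r:gz") as tf:
        archived = {_normalise(n) for n in tf.getnames()}
    # The archive root itself is not an entry on the disk side.
    archived -= {"", "."}
    on_disk = list_directory(root)
    return sorted(on_disk - archived), sorted(archived - on_disk)
```

If the two lists turn out to differ only in things like symlinks, empty directories, or path prefixes, that would point at the comparison rather than at the upload.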

@AndrewJGaut
Contributor Author

AndrewJGaut commented Nov 20, 2023

OK, looking at the current status of the script: it appears to hang on certain bundles. Here's some logging output from where the script is currently stuck:

2023-11-15 06:57:21,810 [migration] [process 1], status: 1995 / 48027
2023-11-15 06:57:21,872 [migration] Error: Path '' in bundle 0x42711f47d19940cf9839bbd11ba1016a not found
2023-11-15 06:57:21,872 [migration] [process 1], status: 1996 / 48027
2023-11-15 06:57:22,505 [migration] Uploading from /home/azureuser/codalab-worksheets/var/codalab/home/partitions/codalab7/bundles/0x427178e181f94512a53cbb3fb476e112 to Azure Blob Storage azfs://storageclwsprod1/bundles/0x427178e181f94512a53cbb3fb476e112/contents.tar.gz, uploaded file size is 4207
                Uploading 1 0.00MiB [0.04MiB/sec]
Error for 0x427178e181f94512a53cbb3fb476e112: Traceback (most recent call last):
  File "codalab/migration.py", line 351, in migrate_bundle
    raise ValueError(f"SanityCheck failed with {reason}")
ValueError: SanityCheck failed with Directory file lists differ.

2023-11-15 06:57:22,717 [migration] [process 1], status: 1997 / 48027
2023-11-15 06:57:22,872 [migration] Uploading from /home/azureuser/codalab-worksheets/var/codalab/home/partitions/codalab6/bundles/0x42719e54664049518a4de1dc1e5fa81f to Azure Blob Storage azfs://storageclwsprod1/bundles/0x42719e54664049518a4de1dc1e5fa81f/contents.gz, uploaded file size is 264
                Uploading 1 0.00MiB [0.01MiB/sec]                       2023-11-15 06:57:23,079 {'path': '', 'name': 'contents', 'offsetheader': None, 'offset': 0, 'size': 264, 'mtime': 0, 'mode': 511, 'type': None, 'linkname': None, 'uid': 0, 'gid': 0, 'istar': 0, 'issparse': 0}
2023-11-15 06:57:23,079 tf.index_file_name: /tmp/tmp6dwv4cn5.sqlite

2023-11-15 06:57:23,674 [migration] Modifying bundle info 0x42719e54664049518a4de1dc1e5fa81f in database
2023-11-15 06:57:23,724 [migration] [process 1], status: 1998 / 48027
2023-11-15 06:58:53,505 [migration] Uploading from /home/azureuser/codalab-worksheets/var/codalab/home/partitions/codalab9/bundles/0x4271d090b8cc405483725d7bcdcc5c05 to Azure Blob Storage azfs://storageclwsprod1/bundles/0x4271d090b8cc405483725d7bcdcc5c05/contents.tar.gz, uploaded file size is 1346187918
                Uploading 1 518.92MiB [10.39MiB/sec]

We see that it's been uploading this bundle since November 15th. It's now November 20th. Here is the time output from when I killed the process:

real    8255m27.175s
user    1m27.820s
sys     2m48.028s

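It's clear that it's stuck: roughly 8,255 minutes (about 5.7 days) of wall-clock time against only about 4 minutes of CPU time.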

Looking at that bundle, I don't see anything special about it. I wonder if there's an issue with uploading certain bundles to Azure? Still exploring.
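
Until the cause is known, one way to keep a single stuck bundle from blocking the run for days would be to give each bundle's work a wall-clock limit. This is a sketch under the assumption that one bundle's migration can be launched as its own command; `run_with_timeout` is a hypothetical helper and the migration script is not known to expose such a per-bundle command.

```python
import subprocess


def run_with_timeout(argv, timeout_seconds):
    """Run argv as a child process; return 'ok', 'failed', or 'timeout'."""
    try:
        completed = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so a hung
        # upload does not linger after the limit is reached.
        return "timeout"
    return "ok" if completed.returncode == 0 else "failed"
```

A child process is used here because a thread that is blocked inside an upload call cannot be forcibly stopped from Python, whereas a process can be killed and the bundle recorded for a later retry.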

@pranavjain
Contributor

@dma1dma1 Do we need this mega issue? Can you file individual issues for the tasks you are working on and close this?
