layout: page
title: PCjs DiskImage Utility
permalink: /tools/diskimage/

DiskImage is a Node command-line application that reads/writes PCjs v2 disk images, using the DiskInfo PCx86 machine module to parse the data, and it supersedes the older PCjs DiskDump utility.

PCjs Disk Images

PCjs v2 (version 2.x) disk images are JSON objects with the following properties:

  • imageInfo: an object describing the type of disk (eg, "CHS") and other characteristics
  • volTable: an array of VolInfo objects, one for each logical FAT volume on the disk (optional)
  • fileTable: an array of FileInfo objects, one for each file across all FAT volumes on the disk (optional)
  • diskData: arrays of sector data; for CHS disk images, the data is organized as an array of sector objects within an array of heads within an array of cylinders

Sector objects contain the raw sector data, using 32-bit signed decimal values, and for sectors that contain data from a file in fileTable, the object will also contain an index into fileTable, along with the data's offset within the file.
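
As a rough illustration of that encoding, the sketch below unpacks a sector's signed 32-bit values back into bytes. It assumes each value packs four bytes in little-endian order, which is not stated here but is consistent with the x86-looking bytes that the sample sector values shown below decode to. `dwordsToBytes` is a hypothetical helper, not part of DiskImage.

```javascript
// Hypothetical helper: unpack 32-bit signed values into raw bytes.
// Assumption: each value packs four bytes in little-endian order.
function dwordsToBytes(dwords) {
    const bytes = new Uint8Array(dwords.length * 4);
    const view = new DataView(bytes.buffer);
    dwords.forEach((dw, i) => view.setInt32(i * 4, dw, true));
    return bytes;
}

// First two values of the PC DOS 2.00 sector object shown below.
const bytes = dwordsToBytes([-402170023, 521076672]);
console.log(Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" "));
// prints "59 5f 07 e8 c0 ff 0e 1f"
```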

For example, look at the PC DOS 2.00 diskette and examine the following sector object:

{"c":6,"h":0,"s":1,"l":512,"d":[-402170023,521076672,...],"f":2,"o":1024}

The f property is a zero-based index into fileTable that refers to the following file entry:

{"hash":"f0314c0d450f343fdc6d1778be088ffc","path":"/COMMAND.COM","attr":"0x20","date":"1983-03-08 12:00:00","size":17664}

and the o property indicates that sector's byte offset (1024) within COMMAND.COM. Having all this information about the FAT file system encoded in the disk image makes it easy to observe low-level I/O and immediately know which portions of which files are being accessed. Other tasks, such as cataloging the contents of disk images, locating identical files, finding files older or newer than a certain date, or extracting other information about the disks or their files, become much simpler as well.
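
To make the cataloging idea concrete, here is a minimal sketch that lists the sectors belonging to one file. It assumes diskData is nested as cylinders, then heads, then sectors, as described above; `sectorsOfFile` and the tiny stand-in image are hypothetical and contain made-up data, not the real diskette contents.

```javascript
// Hypothetical sketch: list the sectors that belong to one file.
// Assumption: diskData is nested as cylinders -> heads -> sectors, and each
// sector object may carry "f" (fileTable index) and "o" (offset in file).
function sectorsOfFile(image, path) {
    const index = image.fileTable.findIndex((file) => file.path === path);
    if (index < 0) return [];
    const hits = [];
    for (const cylinder of image.diskData) {
        for (const head of cylinder) {
            for (const sector of head) {
                if (sector.f === index) {
                    hits.push({ c: sector.c, h: sector.h, s: sector.s, offset: sector.o });
                }
            }
        }
    }
    // Order the hits by their position within the file.
    return hits.sort((a, b) => a.offset - b.offset);
}

// A tiny stand-in image (made-up data, one cylinder, one head).
const sample = {
    fileTable: [{ path: "/A.COM" }, { path: "/B.COM" }, { path: "/COMMAND.COM" }],
    diskData: [[[
        { c: 0, h: 0, s: 1, l: 512 },
        { c: 0, h: 0, s: 2, l: 512, f: 2, o: 512 },
        { c: 0, h: 0, s: 3, l: 512, f: 2, o: 0 }
    ]]]
};
console.log(sectorsOfFile(sample, "/COMMAND.COM"));
```

Sectors without an f property (such as boot, FAT, or free sectors) are simply skipped.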

Unusual sectors, like those used in copy-protection schemes, may have non-standard sector IDs or additional properties that simulate special behavior; for example, sectors with a dataError property can trigger a read or write failure at a certain point within the sector. Some of these properties have been discussed in PCjs blog posts and may be documented more fully at a later date.

Older PCjs v1 (version 1.x) disk images were basically just an array of CHS sector data (what is now called the diskData object), without any other information. Such disk images are still supported, but all the disk images now stored on PCjs disk servers, such as diskettes.pcjs.org, have been converted to the v2 format.
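
If you need to tell the two formats apart in your own scripts, a check along these lines may be enough. It is only a sketch, assuming that a parsed v1 image is a bare array and a parsed v2 image is an object whose diskData property is an array; `imageVersion` is a hypothetical name.

```javascript
// Hypothetical check, assuming a v1 image parses to a bare array of sector
// data and a v2 image parses to an object with a diskData array.
function imageVersion(image) {
    if (Array.isArray(image)) return 1;
    if (image && typeof image === "object" && Array.isArray(image.diskData)) return 2;
    return 0; // not recognized
}

console.log(imageVersion([]));                                // prints 1
console.log(imageVersion({ imageInfo: {}, diskData: [] }));   // prints 2
```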

Building PCjs Disk Images from IMG files

To build a PCjs disk image, such as this PC DOS 2.00 diskette, from an IMG file:

node diskimage.js /diskettes/pcx86/sys/dos/2.00/PCDOS200-DISK1.img PCDOS200-DISK1.json

In addition to IMG files, DiskImage also includes (experimental) support for PSI (PCE Sector Image) files, which can in turn be built from Kryoflux RAW files. Here are the basic steps, using tools from PCE:

  1. From the Kryoflux RAW files, create a PFI ("PCE Flux Image") file
  2. Next, create a PRI ("PCE Raw Image") file, with the flux reversal pulses converted to bits
  3. From the PRI file, create a PSI ("PCE Sector Image") file
  4. From the PSI file, create a JSON-encoded disk image file, using the PCjs DiskImage utility

which translates to these commands (using a 360K PC diskette named "disk1" as an example):

pfi disk1.00.0.raw disk1.pfi
pfi disk1.pfi -p double-step -r 600000 -p decode pri disk1.pri
pri disk1.pri -c 40-99 -p delete disk1.pri
pri disk1.pri -p decode mfm disk1.psi
node diskimage.js disk1.psi disk1.json

Building PCjs Disk Images from Directories

To build a VisiCalc diskette from a directory containing VC.COM, specify the name of the directory, including a trailing slash, like so:

node diskimage.js /miscdisks/pcx86/app/other/visicalc/1981/VISICALC-1981/ VISICALC-1981.json

By default, the diskette will be given an 11-character volume label derived from the directory name (eg, "VISICALC-19"); however, you can use --label to specify your own label (eg, --label=VISICALC81), or use --label=none to suppress the volume label.
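
The default label in that example can be reproduced with a few lines. This is a sketch, not DiskImage's actual code: it assumes the label is simply the last path component, upper-cased (an assumption) and cut to the 11 characters a FAT volume label can hold. `defaultLabel` is a hypothetical name.

```javascript
// Hypothetical sketch of the default label: the last path component,
// upper-cased (assumption) and truncated to 11 characters.
function defaultLabel(dir) {
    const parts = dir.split("/").filter((part) => part.length > 0);
    const name = parts.length ? parts[parts.length - 1] : "";
    return name.toUpperCase().slice(0, 11);
}

console.log(defaultLabel("/miscdisks/pcx86/app/other/visicalc/1981/VISICALC-1981/"));
// prints "VISICALC-19"
```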

The smallest standard PC diskette format that can accommodate all the files will be automatically selected, but you can specify a different target size (in KB) using --target=N, where N is 160, 180, 320, 360, 720, 1200, or 1440. For example, if your diskette must work with PC DOS 1.0, use --target=160.
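
The selection logic can be pictured roughly as follows. This sketch only compares the total file size, plus an optional allowance for file system overhead, against each format's raw capacity; the real tool has to account for the boot sector, FATs, root directory, and cluster rounding, so treat the result as an estimate. `smallestTarget` and `overheadBytes` are hypothetical names.

```javascript
// Target sizes (in KB) accepted by --target, smallest first.
const TARGETS_KB = [160, 180, 320, 360, 720, 1200, 1440];

// Assumption: capacity is compared against the summed file sizes plus a
// caller-supplied overhead allowance. Real formats differ in overhead and
// cluster size, so this is an estimate only.
function smallestTarget(fileSizes, overheadBytes = 0) {
    const needed = fileSizes.reduce((sum, size) => sum + size, overheadBytes);
    const fit = TARGETS_KB.find((kb) => kb * 1024 >= needed);
    return fit === undefined ? null : fit;
}

console.log(smallestTarget([17664]));        // prints 160
console.log(smallestTarget([200 * 1024]));   // prints 320
```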

Another useful option is --normalize, which will transform the line-endings in all recognized text files from LF to CR/LF. A recognized text file is any file that ends with one of these extensions (.MD, .ME, .BAS, .BAT, .ASM, .LRF, .MAK, .TXT, or .XML) and contains only 7-bit ASCII characters. The second condition matters because some files, such as .BAS files, can contain either ASCII or non-ASCII data. The list of recognized text file extensions is likely to grow over time.
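
The two-part test and the line-ending conversion can be sketched like this. The extension list is copied from the paragraph above; the function names are hypothetical, and leaving existing CR/LF pairs untouched is an assumption about how the conversion behaves.

```javascript
const TEXT_EXTENSIONS = [".MD", ".ME", ".BAS", ".BAT", ".ASM", ".LRF", ".MAK", ".TXT", ".XML"];

// A file qualifies only if its extension is listed AND every byte is 7-bit.
function isRecognizedText(name, bytes) {
    const dot = name.lastIndexOf(".");
    const ext = dot < 0 ? "" : name.slice(dot).toUpperCase();
    return TEXT_EXTENSIONS.includes(ext) && bytes.every((b) => b < 0x80);
}

// Convert bare LF to CR/LF, leaving existing CR/LF pairs alone (assumption).
function lfToCrLf(bytes) {
    const out = [];
    for (let i = 0; i < bytes.length; i++) {
        if (bytes[i] === 0x0a && (i === 0 || bytes[i - 1] !== 0x0d)) out.push(0x0d);
        out.push(bytes[i]);
    }
    return Uint8Array.from(out);
}

const text = Uint8Array.from([0x48, 0x69, 0x0a]); // "Hi\n"
if (isRecognizedText("README.md", text)) {
    console.log(Array.from(lfToCrLf(text))); // [72, 105, 13, 10]
}
```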

Building PCjs Disk Images from ZIP (and ARC) files

There are many large software collections where the diskette contents have been archived as ZIP files rather than as disk images, and in theory, it's trivial to unzip them into separate folders and then use DiskImage to build new images from those folders (see above).

For example, I originally recreated all the PC-SIG Library diskette images from the "PC-SIG Library Eighth Edition" CD-ROM files stored at cd.textfiles.com. Some of the diskettes on the CD-ROM had been completely archived as single ZIP files -- probably because the diskettes contained filenames that were not allowed on CD-ROM -- so I used unzip on macOS to extract those ZIP files to folders, and then recreated disk images from those folders.

However, this process doesn't always work well. DISK0798 highlights a few issues that have already been discussed on GitHub.

First, the original order of the filenames was not preserved. Modern operating systems (eg, macOS) list files alphabetically, and as a result, the files on the recreated diskettes were sorted alphabetically as well.

Second, while the ZIP archives appeared to more-or-less preserve non-ASCII filenames, unzip did not. IBM PCs used a character set now known as Code Page 437 (CP437), which included a variety of line-drawing characters and other symbols that unzip failed to translate to their modern (UTF-8) counterparts.
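
For reference, translating CP437 bytes to Unicode takes little more than a lookup table for the upper half of the character set. The sketch below is an illustration rather than DiskImage's own code; `cp437ToString` is a hypothetical name, and bytes below 0x80 are passed through as ASCII, which ignores the glyphs CP437 assigns to control codes.

```javascript
// Upper half of CP437 (0x80-0xFF) as Unicode, 16 characters per row.
const CP437_HIGH =
    "ÇüéâäàåçêëèïîìÄÅ" +
    "ÉæÆôöòûùÿÖÜ¢£¥₧ƒ" +
    "áíóúñÑªº¿⌐¬½¼¡«»" +
    "░▒▓│┤╡╢╖╕╣║╗╝╜╛┐" +
    "└┴┬├─┼╞╟╚╔╩╦╠═╬╧" +
    "╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀" +
    "αßΓπΣσµτΦΘΩδ∞φε∩" +
    "≡±≥≤⌠⌡÷≈°∙·√ⁿ²■\u00a0";

// Decode a sequence of CP437 bytes; bytes below 0x80 are treated as ASCII.
function cp437ToString(bytes) {
    let text = "";
    for (const b of bytes) {
        text += b < 0x80 ? String.fromCharCode(b) : CP437_HIGH[b - 0x80];
    }
    return text;
}

console.log(cp437ToString([0xc9, 0xcd, 0xbb])); // prints "╔═╗"
```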

To resolve all these issues, I updated DiskImage with an option (--zip) to read ZIP archives directly. I started with an NPM package called node-stream-zip, which is essentially a module that understands the ZIP file format, identifies all the compressed files inside the ZIP file, and uses Node's built-in zlib functionality to decompress them.

Unfortunately, I quickly discovered that zlib could not decompress the contents of many old ZIP files, because instead of the now-popular Deflate compression algorithm, older ZIP files used compression methods such as Shrink, Reduce, and Implode. So I imported node-stream-zip into PCjs as StreamZip, modernized it, and then extended it with a new decompression module named LegacyZip, which I wrote by hand-translating the excellent C code at hanshq.net into JavaScript.
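
To see which archives are affected, you can inspect the compression method stored in each local file header. The sketch below uses the method IDs and header layout from the ZIP format specification; the function names are hypothetical and unrelated to the StreamZip or LegacyZip APIs.

```javascript
// Compression method IDs from the ZIP format specification.
const ZIP_METHODS = {
    0: "Store", 1: "Shrink", 2: "Reduce1", 3: "Reduce2",
    4: "Reduce3", 5: "Reduce4", 6: "Implode", 8: "Deflate"
};

// Read the method field of a local file header: signature "PK\x03\x04",
// with the 16-bit compression method at byte offset 8 (little-endian).
function localHeaderMethod(bytes) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    if (view.getUint32(0, true) !== 0x04034b50) throw new Error("not a local file header");
    return view.getUint16(8, true);
}

// zlib only helps with Deflate; Store needs no decompression at all.
function needsLegacyDecoder(method) {
    return method >= 1 && method <= 6;
}

// Start of a header for an Imploded file (version 1.0, no flags).
const header = Uint8Array.from([0x50, 0x4b, 0x03, 0x04, 0x0a, 0x00, 0x00, 0x00, 0x06, 0x00]);
const method = localHeaderMethod(header);
console.log(ZIP_METHODS[method], needsLegacyDecoder(method)); // prints "Implode true"
```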

Here's an example of --zip in action:

node diskimage.js --zip=/Volumes/PCSIG_13B/BBS/DISK0042.ZIP --output=DISK0042.json --verbose

DiskImage v2.11
Copyright © 2012-2023 Jeff Parsons <[email protected]>
options: --zip=/Volumes/PCSIG_13B/BBS/DISK0042.ZIP --output=DISK0042.json --verbose

/Volumes/PCSIG_13B/BBS/DISK0042.ZIP
Filename        Length   Method      Size  Ratio   Date       Time       CRC
--------        ------   ------      ----  -----   ----       ----       ---
MSVIBM.EXE      131392   Implode    73651    44%   1990-02-19 19:30:30   ac9163ba
MSR300.UPD       20338   Implode     8627    58%   1990-02-19 19:35:12   61372fc6
MSKERM.HLP       35263   Implode    13799    61%   1990-02-19 19:39:16   1c61d95c
MSKERM.BWR       27985   Implode    12132    57%   1990-02-19 19:42:28   353a76ed
MSKERMIT.INI      4760   Implode     2309    51%   1990-02-19 20:07:20   00e884a5
GO.BAT              40   Shrink        38     5%   1980-01-01 06:00:08   75d72756
FILE0042.TXT      3870   Implode      896    77%   1990-11-12 01:46:16   3a817bda
GO.TXT            1002   Implode      307    69%   1990-11-09 06:21:54   e64455e9
processing DISK0042: 327680 bytes (checksum -1217186896, hash bba045788185bc8284f5e4cde0929b70)
writing DISK0042.json...

The --verbose option generates the PKZIP-style file listing, displaying the individual file names, compressed and uncompressed file sizes, compression ratio, etc.
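
The Ratio column can be reproduced from the Length and Size columns. The sketch below assumes the figure is the space saved, rounded to the nearest percent; that assumption matches every row in the listing above, but `ratioPercent` is a hypothetical name and not necessarily how DiskImage computes it.

```javascript
// Assumption: ratio = percentage of space saved, rounded to nearest integer.
function ratioPercent(length, size) {
    return length === 0 ? 0 : Math.round((1 - size / length) * 100);
}

console.log(ratioPercent(131392, 73651)); // MSVIBM.EXE, prints 44
console.log(ratioPercent(40, 38));        // GO.BAT, prints 5
```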

In fact, creating a disk image is entirely optional; you can use DiskImage to simply examine the contents of a ZIP file:

node diskimage.js --zip=/Volumes/PCSIG_13B/BBS/DISK0042.ZIP --verbose

To simplify dealing with large collections of files, I also added an --all option:

node diskimage.js --all="/Volumes/PCSIG_13B/**/*.ZIP" --verbose

That command will locate all matching ZIP files and process each one with any other options you specify (eg, --verbose to display their contents). --all also supports file extensions JSON and IMG; --disk is assumed for any file ending with one of those extensions, whereas --zip is assumed for any file ending with a ZIP extension.

If you want to create a disk image for every ZIP file:

node diskimage.js --all="/Volumes/PCSIG_13B/**/*.ZIP" --output=tmp --type=img

--output specifies the output folder and --type specifies the output file type (either IMG or JSON). Each output file will have the same basename as the ZIP file. You can also use "%d" anywhere in the --output value to represent the directory of the corresponding input file (eg, --output=%d, --output=%d/tmp, etc).
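
As an illustration of how "%d" and the basename rule combine, here is a small sketch that derives an output file name from an input file. It assumes POSIX-style paths and a lower-case output extension; `outputPath` is a hypothetical helper, not a DiskImage function.

```javascript
// Hypothetical sketch: build the output file name for one input file.
// Assumptions: "/" path separators, lower-case output extension.
function outputPath(inputFile, output, type) {
    const slash = inputFile.lastIndexOf("/");
    const dir = slash < 0 ? "." : inputFile.slice(0, slash) || "/";
    const file = inputFile.slice(slash + 1);
    const dot = file.lastIndexOf(".");
    const base = dot > 0 ? file.slice(0, dot) : file;
    const folder = output.split("%d").join(dir); // replace every "%d"
    return folder + "/" + base + "." + type.toLowerCase();
}

console.log(outputPath("/Volumes/PCSIG_13B/BBS/DISK0042.ZIP", "%d/tmp", "img"));
// prints "/Volumes/PCSIG_13B/BBS/tmp/DISK0042.img"
```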

Any ZIP files inside disk images can be automatically expanded during disk image processing as well; just add the new --expand option. Each ZIP file will be replaced with a folder of the same name, and that folder will contain the entire uncompressed contents of the archive; the original ZIP file will not be included in the disk image:

node diskimage.js --all="/Volumes/PCSIG_13B/**/*.ZIP" --expand --output=tmp

Finally, support for the ARC file format (ZIP's predecessor) is now available. Just use --arc instead of --zip, or specify input files with .ARC extensions instead of .ZIP. All the same capabilities apply.

Extracting Files from PCjs Disk Images

You can extract the contents of a single disk image to your current directory, or to a specific directory using --extdir:

node diskimage.js DISK0001.IMG --extract
node diskimage.js DISK0001.IMG --extract --extdir=tmp

You can also extract the contents of an entire collection of disk images, placing the contents of each either in the same directory as the original disk image or in a specific directory:

node diskimage.js --all="*.IMG" --extract --extdir=%d
node diskimage.js --all="*.IMG" --extract --extdir=tmp

You can also expand any ZIP files during the extraction process, by including the --expand option:

node diskimage.js --all="*.IMG" --extract --expand --extdir=tmp

Also, while the --normalize option was originally created to "normalize" files read from the host (eg, to convert LF to CR/LF in text files), it can also be used during extraction now, when files are being written to the host.

For example, if you want any filenames with CP437 characters to be created properly on the host, or you want the contents of any CP437 text files, BASIC files, etc, to be stored in readable form on the host, use the --normalize option along with the --extract option; eg:

node diskimage.js --all="/Volumes/PCSIG_13B/**/*.ZIP" --extract --expand --normalize --extdir=tmp

In addition to converting line-endings back from CR/LF to LF, --normalize will also convert any tokenized .BAS files to plain-text UTF-8 files on the host, as well as decrypt any .BAS files that have been "protected" by BASIC with the P option of the SAVE command.

The --output option is available with all of the above commands as well, but that option only affects disk image creation, not file extraction. If you don't want any disk images created at the same time, don't use --output.

Examining PCjs Disk Images

Both local and remote diskette images can be examined. To examine a remote image, you must use the --disk option, with either an explicit URL, as in:

node diskimage.js --disk=https://diskettes.pcjs.org/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json

or with one of PCjs' implicit diskette paths, such as /diskettes, which currently maps to disk server https://diskettes.pcjs.org:

node diskimage.js --disk=/diskettes/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json

If you happen to have a local file that exists in the same location as the implicit diskette path, use --server to force the server mapping. The set of implicit paths for PC disks includes /diskettes (shown above) but is not limited to it.

NOTE: Implicit disk paths should normally begin with /disks, because when running PCjs locally, that's where any local copies of PCjs disk repositories are assumed to exist; however, PCjs and DiskImage will allow you to omit that portion, for convenience (and backward compatibility). In other words, /diskettes will be automatically mapped to /disks/diskettes for local access and https://diskettes.pcjs.org for remote access.
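
The mapping described in that note can be sketched as a small lookup. Only the /diskettes entry comes from this document; other implicit paths would be added to the table the same way. `resolveDiskPath` and `DISK_SERVERS` are hypothetical names.

```javascript
// Only the /diskettes mapping is taken from the text above.
const DISK_SERVERS = { "/diskettes": "https://diskettes.pcjs.org" };

// Map an implicit disk path to its local (/disks/...) or remote location.
function resolveDiskPath(diskPath, remote) {
    // Accept paths with or without the leading /disks portion.
    const path = diskPath.startsWith("/disks/") ? diskPath.slice("/disks".length) : diskPath;
    const root = Object.keys(DISK_SERVERS).find((r) => path === r || path.startsWith(r + "/"));
    if (!root) return diskPath; // not an implicit path; leave untouched
    return remote ? DISK_SERVERS[root] + path.slice(root.length) : "/disks" + path;
}

console.log(resolveDiskPath("/diskettes/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json", true));
// prints "https://diskettes.pcjs.org/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json"
```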

Commonly Used DiskImage Options

To get a DOS-compatible directory listing of a disk image:

node diskimage.js /diskettes/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --list

To display all the unused bytes of a disk image (JSON-encoded disk images only):

node diskimage.js /diskettes/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --list=unused

NOTE: Unused bytes are a superset of free bytes. Free bytes are always measured in terms of unused clusters, multiplied by the cluster size, whereas unused bytes are the combination of all completely unused cluster space plus any partially unused cluster space. Being able to see all the unused bytes on a disk can be useful for studying disk image usage, or simply making sure that a disk is free of any unwanted data.
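
The difference between the two figures is easy to see with a little arithmetic. This sketch works from file sizes alone and ignores FAT and directory sectors; `diskUsage` is a hypothetical helper, and the numbers in the example are made up.

```javascript
// Free bytes: whole unused clusters only.
// Unused bytes: free bytes plus the slack at the end of each file's
// last (partially used) cluster.
function diskUsage(fileSizes, clusterSize, totalClusters) {
    let usedClusters = 0;
    let slackBytes = 0;
    for (const size of fileSizes) {
        const clusters = Math.ceil(size / clusterSize);
        usedClusters += clusters;
        slackBytes += clusters * clusterSize - size;
    }
    const freeBytes = (totalClusters - usedClusters) * clusterSize;
    return { freeBytes, unusedBytes: freeBytes + slackBytes };
}

// Two files on a made-up volume of 100 clusters of 1024 bytes each.
console.log(diskUsage([17664, 100], 1024, 100));
// prints { freeBytes: 82944, unusedBytes: 84636 }
```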

TODO: Update the unused byte report to include unused bytes, if any, in all FAT sectors and directory sectors.

To extract all the files from a disk image:

node diskimage.js /diskettes/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --extract

To extract a specific file from a disk image:

node diskimage.js /diskettes/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --extract=COMMAND.COM

To extract files from a disk image into a specific directory (eg, tmp):

node diskimage.js /diskettes/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --extract --extdir=tmp

To dump a specific (C:H:S) sector from a disk image:

node diskimage.js /diskettes/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --dump=0:0:1

To dump multiple (C:H:S) sectors from a disk image track, follow the C:H:S values with a sector count; eg:

node diskimage.js /diskettes/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --dump=0:0:1:4
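
If you script around DiskImage, the --dump value is straightforward to parse or validate before passing it along. The sketch below assumes the count defaults to 1 when omitted; `parseDump` is a hypothetical helper, not part of the utility.

```javascript
// Parse "C:H:S" or "C:H:S:count" into numbers (count defaults to 1).
function parseDump(value) {
    const parts = value.split(":").map((part) => Number.parseInt(part, 10));
    if (parts.length < 3 || parts.length > 4 || parts.some(Number.isNaN)) {
        throw new Error("expected C:H:S or C:H:S:count");
    }
    const [c, h, s, count = 1] = parts;
    return { c, h, s, count };
}

console.log(parseDump("0:0:1:4")); // prints { c: 0, h: 0, s: 1, count: 4 }
```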