CS3103-Lecture-9

File Concept

  • File is a contiguous logical address space for storing information

    • database, audio, video, web pages…
  • There are different types of file

    • data: numeric, character, binary, multimedia

    • program

    • device

    • proc file system

      • file-system interface to retrieve system information

File Attributes

  • Name - the only information kept in human-readable form
  • Identifier - uniuqe tag (number) identifies file within file system
  • Type - needed for systems that support different types
  • Location - pointer to file location on device
  • Size - current file size
  • Protection - controls who can do reading, writing, executing
  • Time, date, and user identification - data for protection, securtiy, and usage monitoring
  • Many variations, including extended file attributes such as file checksum

File Operations

  • OS provides file operations to
    • create:
      • space in the file system should be found
      • an entry must be allocated in the directory
    • open:
      • return a handler for other perations
      • most operations need the file to be opened first
    • read/write:
      • need to maintain a pointer
    • repostion within file -seek
    • delete
      • release file space
    • truncate:
      • shortening the length of a file. When truncated, its contents are preserved up to a certain point, after which any data that falls beyond that point is deleted or lost
    • other operations can be implemented using these ones
      • copy: create and read/write

Open Files

  • Data needed to manage open files:
    • File handle: a data structure that represents an open file. It countains information such as the file’s name, its current position in the file, and any file-related flags or attributes (such as read-only or write-through mode)

    • File descriptor: a small integer that identifier an open file within a process. It is used by the operating system to manage the file’s access permissions and to track its usage by various processes.

    • File table: a data structure that maintains a list of all open files in the system, along with their associated file handles and file descriptors.

    • File cache: A file cache is a portion of memory that the operating system uses to store recently accessed portions of open files.

    • Lock table: keeps track of which processes have locked which files and ensures that conflicting lock requests are resolved appropriately.

    • I/O buffer: a portion of memory that is used to temporarily store file data as it is read or written. The OS typically uses a pool of I/O buffers to improve file access performance and to minimize disk accesses.

File Types - Name, Extension

Access Methods

  • Sequential access
    • a group of elements is access in a predetermined order
    • for some media types, the only access mode (e.g., tape)
  • Direct access
    • access an element at an arbitrary position in a sequence in (roughly) equal time,
      independent of sequence size
      • it is possible to emulate random access in a tape, but access time varies
    • sometime called random access

Directory Structure

  • Directory: a collection of nodes containing information about files

A Typical File-system Organization

  • Operations performed on directory
    • Search for a file
    • Create a file
    • Delete a file
    • List a directory
    • Rename a file
    • Traverse the file system

Directory Organization

  • Organize the directory (logically) to obtain
    • Efficiency – locating a file quickly
    • Naming – convenient to users
      • Two users can have the same name for different files
      • The same file can have several different names
    • Grouping – logical grouping of files by properties, (e.g., all Java programs, all games, …)

Single-Level Directory

  • A single directory for all users
    • naming problems and grouping problems
      • Two users want to have same file names
      • Hard to group files

Two-Level Directory

  • Separate directory for each user
    • different users can have the same name for different files
      • Each user has his own user file directory (UFD), it is in the master file directory (MFD)
    • efficient to search
    • cannot group files
    • How to share files between different users, and how to share the system files?

Tree-Structured Directories

  • Files organizaed into trees

    • efficient in searching, can group files, convenient naming
  • File can be accessed using absolute or relative path name

    • absolute path name: /home/alice/..
    • relative path is relative to the current directory (pwd)
      • creating a new file, delete a file, or create a sub-directory
        • e.g., if current directory is /mail, a mkdir count will create /mail/count
    • How to share a file/directory?
      • It’s not allowed
  • Delete directory

    • If directory is empty, then it’s easy to handle
    • If not
      • Option I: directory cannot be deleted, unless it’s empty
      • Option II: delete all the files, directories and sub-directories
      • rm -rf /

Acyclic-Graph Directories

  • Organize directories into acyclic-graphs
    • Allow links to a directory entry/files for
      aliasing (no longer a tree)
  • Dangling pointer problem
    • e.g., if delete file /dict/all, then /dict/w/list and /spell/words/list are dangling pointers
    • Solution: backpointers/reference counter
  • backpointers record all the pointers to the entity, a variable size record
  • Or count # of links to it and only (physically) delete it when counter is zero
  • Accesses the data available in the original file
  • If earlier selected file deleted, the hard link still
    contain the data of that file

  • Does not access the data available in the original file
  • If earlier file deleted, the soft link will point to a file that does not exist anymore

General Graph Directory

  • Allowing arbitrary links may generate cycles in the directory structure
  • Solution
    • Allow cycles, but use garbage collection to reclaim disk spaces
    • Detect the cycle every time a new link is added

File

File System Mounting

  • A file system must be mounted before it can be accessed
    • Mounting a file system attaches that file system to a directory (mount point) and makes it available to the system
  • The root (/) file system is always mounted. Any other file system can be connected or disconnected from the root (/) file system
    • The old directories connected to the mount point becomes invisible

File Sharing

  • Sharing of files on multi-user systems is desirable
    • Sharing must be done through a protection scheme
      • User IDs identify users, allowing protections to be per user
      • Group IDs allow users to be in groups, permitting group access rights
    • On distributed systems, files may be shared across a network
      • Network File System (NFS) is a common distributed filesharing method

Remote File Sharing

  • Use networking to allow file system access between systems
    • manually via programs like FTP (File Transfer Protocol) clients
    • automatically, seamlessly using distributed file systems
    • semi-automatically via the world wide web (e.g., hyperlinked documents)
  • Client-server model allows clients to mount remote FS from servers
    • a server can serve multiple clients
    • client and user-on-client identification is complicated
  • server cannot assume the client is trusted
  • standard OS file calls are translated into remote calls
  • NFS (Network File System) is standard for UNIX
  • CIFS (Common Internet File System) is standard for Windows

Protection

  • File owner/creator should be able to control
    – what can be done
    – by whom
  • Types of access
    – read
    – write, append
    – execute
    – delete
    – list

Access Countrol List

  • Assign each file and directory with an access control list (ACL)
  • Fine-grained control
  • Issues
    • How to construct the list?
    • How to store the list in directory?
  • Not widely used in modern file systems

Unix Access Control

  • Three access modes: read, write, execute (encoded in three bits)
  • Three classes of users: owner, group and others
  • To grant access to users, change access mode
    • E.g., use chmod

File Access Control Examples

File-System Structure

  • File structure
    – Logical storage unit
    – Collection of related information
  • File system on secondary storage (disks)
    – Provides user interface to storage, mapping logical to physical
    – Provides efficient and convenient access to disk by allowing data to be stored,
    located retrieved easily
  • File control block (FCB): structure consisting of information of a file
    – E.g., Unix inode structure
  • Device driver controls the physical device

File System Layers

  • File system organized into layers

  • Device drivers manage I/O devices at the I/O control layer

    • Receive commands like “read drive1, cylinder 72, track 2, sector 10, into memory location 1060”
    • Outputs low-level hardware-specific commands to hardware controller
  • Basic file system given command like “retrieve block 123” translates to device driver

    • Also manages memory buffers and caches
      (allocation, freeing, replacement)
  • Buffers hold data in transit

  • Caches hold frequently used data

  • File-organization module understands files, logical address, and physical blocks

    • Translates logical block # to physical block #
    • Manages free space
  • maintains the list of free blocks, and allocates free blocks to files as needed

  • Logical file system manages metadata information

    • The level at which users can request file operations by system calls
    • Provides a consistent view of what might be multiple physical file systems and multiple file system implementations
    • Translates file name into file number, file handle, location by maintaining file control blocks
    • Directory management
    • Protection
  • Many file systems, sometimes many within an OS

    • CD-ROM is ISO 9660
    • Unix has UFS, FFS
    • Windows has FAT, FAT32, NTFS
    • Linux has more than 130 types, with extended file system ext3 and ext4 leading; plus distributed file systems, etc.
    • New ones still arriving – ZFS, GoogleFS, Oracle ASM, FUSE
    • … …

File Control Block (FCB)

  • OS maintains FCB per file, contains many details about the file
    • E.g., inode number, permissions, size, dates
    • Example

Structures to Implement a File System

  • On-Disk
    • Boot control block per volume
    • Volume control block per volume
    • Directory structure per file system
    • A FCB per file
  • In-memory
    • In-memory mount table with entries for each mounted volume
    • In-memory directory structure cache
    • System-wide open-file table : contain FCB, etc

Inode

  • An inode is a data structure in Unix that denotes a file or a directory on file system
    • It contains information about file, such as location of file on the disk, access mode, ownership, file type …
  • Each inode has a number that is used in the index table
    • Unix kernel uses inode number to access the contents of an inode

Directory Implementation

  • Linear list of file names with pointer to the data blocks
    • simple to program
    • time-consuming to execute
  • Hash Table: linear list with hash data structure
    • hash function on the file name determines the key directory
  • Potential collisions
    • decreases directory search time
    • fixed size

File Creation

  • Application process requests the creation of a new file
  • Logical file system allocates a new FCB
  • Appropriate directory updated with the new file name and FCB
  • Disk blocks allocated for files

Allocation Methods

  • Allocation method: how disk blocks are allocated for files
  • The main problem is how to allocate space to files so that disk space is utilized effectively and files can be accessed quickly
  • Three major methods:
    • Contiguous allocation
    • Linked allocation
    • Indexed allocation

Contiguous Allocation

  • Each file occupies a set of contiguous blocks on the disk

  • Simple – only starting location (block #) and length (number of blocks) are required

  • Random access

    • the address of the kth block starting at block b can easily be obtained as (b+k)
  • Advantages:

    • Both the Sequential and Random Accesses are supported
    • Extremely fast since the number of seeks are minimal because of contiguous allocation of file blocks
  • Disadvantages:

    • Suffers external fragmentation (similar to memory allocation)
  • Increasing file size is difficult as it depends on the availability of contiguous space

Linked Allocation

  • Linked allocation: each file is a linked list of disk blocks

    • each block contains pointer to next block, file ends at null pointer
    • blocks may be scattered anywhere on the disk (no external fragmentation)
  • Advantages

    • Flexible in terms of file size
    • Does not suffer from external fragmentation
  • Disadvantages

    • Slow: locating a file block can take many I/Os and disk seeks
  • Possible improvements: cluster blocks; however, has internal fragmentation

    • Does not support random/direct access
    • Pointer size overhead
    • Reliability: what if the pointer has corrupted

Flie Allocation Table

  • A file allocation table (FAT) is a table that an OS maintains on a disk that provides a map of the blocks that a file is stored in
  • Example:

Index Allocation

  • Indexed allocation: each file
    has its own index blocks of
    pointers to its data blocks

  • Advantages:

    • Fast
    • Supports random/direct access to the blocks
    • No external fragmentation
  • Disadvantages:

    • The pointer overhead for indexed allocation is greater than linked allocation
    • for very small files, say files that expand only 2-3 blocks, the indexed allocation would keep one entire block (index block) for the pointers
  • Multiple-level index blocks (e.g., 2-level)

  • Combined approach

    • Used in Unix

Extent-Based Systems

  • Many new file systems (e.g., Veritas File System) use a partially contiguous allocation scheme
  • Extent-based file systems allocate disk blocks in extents
  • An extent is a contiguous block of disks
    • Extents are allocated for file allocation
    • A file consists of one or more extents
  • Mitigate external fragmentation

Free-Space Management

  • File system maintains free-space list to track available blocks
    • The space of deleted files should be reclaimed
  • Many management methods:
    • bit vector (bit map)
    • linked free space

Bitmap Free-Space Management

  • Use one bit for each block, track its allocation status
    • Searching contiguous blocks is easy/efficient
    • requires extra space
  • example:
    • block size = 4KB = 212 bytes
    • disk size = 240 bytes (1 terabyte)
    • n = 240/212 = 228 bits (or 256 MB)
      >> if clusters of 4 blocks -> 64MB of memory

Linked Free Space

  • Keep free blocks in linked list

    • no wate of space, just use the memory in the free block for pinters
    • usually no need to traverse the entire list:
      return the first one
    • cannot get contiguous space easily
  • Simple linked list of free-space is inefficient

    • one extra disk I/O to allocate one free block disk

      • I/O is extremely slow
    • allocating multiple free blocks require traverse the list

    • difficult to allocate contiguous free blocks

Grouping

  • Grouping: using indexs to group free blocks
    • stroe address of n-1 free blocks in the first free block, and the next index block