CS3103-Lecture-9
File Concept
- File is a contiguous logical address space for storing information - database, audio, video, web pages…
 
- There are different types of file - data: numeric, character, binary, multimedia 
- program 
- device 
- proc file system - file-system interface to retrieve system information
 
- … 
 
File Attributes
- Name - the only information kept in human-readable form
- Identifier - uniuqe tag (number) identifies file within file system
- Type - needed for systems that support different types
- Location - pointer to file location on device
- Size - current file size
- Protection - controls who can do reading, writing, executing
- Time, date, and user identification - data for protection, securtiy, and usage monitoring
- Many variations, including extended file attributes such as file checksum
File Operations
- OS provides file operations to - create:- space in the file system should be found
- an entry must be allocated in the directory
 
- open:- return a handler for other perations
- most operations need the file to be opened first
 
- read/write:- need to maintain a pointer
 
- repostion within file -seek
- delete- release file space
 
- truncate:- shortening the length of a file. When truncated, its contents are preserved up to a certain point, after which any data that falls beyond that point is deleted or lost
 
- other operations can be implemented using these ones- copy: create and read/write
 
 
- create:
Open Files
- Data needed to manage open files:- File handle: a data structure that represents an open file. It countains information such as the file’s name, its current position in the file, and any file-related flags or attributes (such as read-only or write-through mode) 
- File descriptor: a small integer that identifier an open file within a process. It is used by the operating system to manage the file’s access permissions and to track its usage by various processes. 
- File table: a data structure that maintains a list of all open files in the system, along with their associated file handles and file descriptors. 
- File cache: A file cache is a portion of memory that the operating system uses to store recently accessed portions of open files. 
- Lock table: keeps track of which processes have locked which files and ensures that conflicting lock requests are resolved appropriately. 
- I/O buffer: a portion of memory that is used to temporarily store file data as it is read or written. The OS typically uses a pool of I/O buffers to improve file access performance and to minimize disk accesses. 
 
File Types - Name, Extension

Access Methods
- Sequential access- a group of elements is access in a predetermined order
- for some media types, the only access mode (e.g., tape)
 
- Direct access- access an element at an arbitrary position in a sequence in (roughly) equal time,
 independent of sequence size- it is possible to emulate random access in a tape, but access time varies
 
- sometime called random access 
 
- access an element at an arbitrary position in a sequence in (roughly) equal time,
Directory Structure
- Directory: a collection of nodes containing information about files 
A Typical File-system Organization
- Operations performed on directory- Search for a file
- Create a file
- Delete a file
- List a directory
- Rename a file
- Traverse the file system
- … 
 
Directory Organization
- Organize the directory (logically) to obtain- Efficiency – locating a file quickly
- Naming – convenient to users- Two users can have the same name for different files
- The same file can have several different names
 
- Grouping – logical grouping of files by properties, (e.g., all Java programs, all games, …)
 
Single-Level Directory
- A single directory for all users- naming problems and grouping problems- Two users want to have same file names
- Hard to group files 
 
 
- naming problems and grouping problems
Two-Level Directory
- Separate directory for each user- different users can have the same name for different files- Each user has his own user file directory (UFD), it is in the master file directory (MFD)
 
- efficient to search
- cannot group files
- How to share files between different users, and how to share the system files? 
 
- different users can have the same name for different files
Tree-Structured Directories
- Files organizaed into trees - efficient in searching, can group files, convenient naming 
 
- efficient in searching, can group files, convenient naming
- File can be accessed using absolute or relative path name - absolute path name: /home/alice/..
- relative path is relative to the current directory (pwd)- creating a new file, delete a file, or create a sub-directory- e.g., if current directory is /mail, a mkdir count will create /mail/count
 
 
- creating a new file, delete a file, or create a sub-directory
- How to share a file/directory?- It’s not allowed
 
 
- Delete directory - If directory is empty, then it’s easy to handle
- If not- Option I: directory cannot be deleted, unless it’s empty
- Option II: delete all the files, directories and sub-directories
- rm -rf /
 
 
Acyclic-Graph Directories
- Organize directories into acyclic-graphs- Allow links to a directory entry/files for
 aliasing (no longer a tree)
 
- Allow links to a directory entry/files for
- Dangling pointer problem- e.g., if delete file /dict/all, then /dict/w/list and /spell/words/list are dangling pointers
- Solution: backpointers/reference counter
 
- backpointers record all the pointers to the entity, a variable size record
- Or count #of links to it and only (physically) delete it when counter is zero
Hardlink
- Accesses the data available in the original file
- If earlier selected file deleted, the hard link still
 contain the data of that file  
Softlink(Symblic link)
- Does not access the data available in the original file
- If earlier file deleted, the soft link will point to a file that does not exist anymore  
General Graph Directory
- Allowing arbitrary links may generate cycles in the directory structure 
- Solution- Allow cycles, but use garbage collection to reclaim disk spaces
- Detect the cycle every time a new link is added
 
File
File System Mounting
- A file system must be mounted before it can be accessed- Mounting a file system attaches that file system to a directory (mount point) and makes it available to the system
 
- The root (/) file system is always mounted. Any other file system can be connected or disconnected from the root (/) file system- The old directories connected to the mount point becomes invisible 
 
- The old directories connected to the mount point becomes invisible
File Sharing
- Sharing of files on multi-user systems is desirable- Sharing must be done through a protection scheme- User IDs identify users, allowing protections to be per user
- Group IDs allow users to be in groups, permitting group access rights
 
- On distributed systems, files may be shared across a network- Network File System (NFS) is a common distributed filesharing method
 
 
- Sharing must be done through a protection scheme
Remote File Sharing
- Use networking to allow file system access between systems- manually via programs like FTP (File Transfer Protocol) clients
- automatically, seamlessly using distributed file systems
- semi-automatically via the world wide web (e.g., hyperlinked documents)
 
- Client-server model allows clients to mount remote FS from servers- a server can serve multiple clients
- client and user-on-client identification is complicated
 
- server cannot assume the client is trusted
- standard OS file calls are translated into remote calls
- NFS (Network File System) is standard for UNIX
- CIFS (Common Internet File System) is standard for Windows
Protection
- File owner/creator should be able to control
 – what can be done
 – by whom
- Types of access
 – read
 – write, append
 – execute
 – delete
 – list
Access Countrol List
- Assign each file and directory with an access control list (ACL)
- Fine-grained control
- Issues- How to construct the list?
- How to store the list in directory?
 
- Not widely used in modern file systems
Unix Access Control
- Three access modes: read, write, execute (encoded in three bits)
- Three classes of users: owner, group and others 
- To grant access to users, change access mode- E.g., use chmod 
 
- E.g., use chmod
File Access Control Examples

File-System Structure
- File structure
 – Logical storage unit
 – Collection of related information
- File system on secondary storage (disks)
 – Provides user interface to storage, mapping logical to physical
 – Provides efficient and convenient access to disk by allowing data to be stored,
 located retrieved easily
- File control block (FCB): structure consisting of information of a file
 – E.g., Unix inode structure
- Device driver controls the physical device
File System Layers
- File system organized into layers  
- Device drivers manage I/O devices at the I/O control layer - Receive commands like “read drive1, cylinder 72, track 2, sector 10, into memory location 1060”
- Outputs low-level hardware-specific commands to hardware controller
 
- Basic file system given command like “retrieve block 123” translates to device driver - Also manages memory buffers and caches
 (allocation, freeing, replacement)
 
- Also manages memory buffers and caches
- Buffers hold data in transit 
- Caches hold frequently used data 
- File-organization module understands files, logical address, and physical blocks - Translates logical block # to physical block #
- Manages free space
 
- maintains the list of free blocks, and allocates free blocks to files as needed 
- Logical file system manages metadata information - The level at which users can request file operations by system calls
- Provides a consistent view of what might be multiple physical file systems and multiple file system implementations
- Translates file name into file number, file handle, location by maintaining file control blocks
- Directory management
- Protection
 
- Many file systems, sometimes many within an OS - CD-ROM is ISO 9660
- Unix has UFS, FFS
- Windows has FAT, FAT32, NTFS
- Linux has more than 130 types, with extended file system ext3 and ext4 leading; plus distributed file systems, etc.
- New ones still arriving – ZFS, GoogleFS, Oracle ASM, FUSE
- … …
 
File Control Block (FCB)
- OS maintains FCB per file, contains many details about the file- E.g., inode number, permissions, size, dates
- Example 
 
Structures to Implement a File System
- On-Disk- Boot control block per volume
- Volume control block per volume
- Directory structure per file system
- A FCB per file
 
- In-memory- In-memory mount table with entries for each mounted volume
- In-memory directory structure cache
- System-wide open-file table : contain FCB, etc 
 
Inode
- An inode is a data structure in Unix that denotes a file or a directory on file system- It contains information about file, such as location of file on the disk, access mode, ownership, file type …
 
- Each inode has a number that is used in the index table- Unix kernel uses inode number to access the contents of an inode 
 
- Unix kernel uses inode number to access the contents of an inode
Directory Implementation
- Linear list of file names with pointer to the data blocks- simple to program
- time-consuming to execute 
 
- Hash Table: linear list with hash data structure- hash function on the file name determines the key directory
 
- Potential collisions- decreases directory search time
- fixed size 
 
File Creation
- Application process requests the creation of a new file
- Logical file system allocates a new FCB
- Appropriate directory updated with the new file name and FCB
- Disk blocks allocated for files
Allocation Methods
- Allocation method: how disk blocks are allocated for files
- The main problem is how to allocate space to files so that disk space is utilized effectively and files can be accessed quickly
- Three major methods:- Contiguous allocation
- Linked allocation
- Indexed allocation
 
Contiguous Allocation
- Each file occupies a set of contiguous blocks on the disk 
- Simple – only starting location (block #) and length (number of blocks) are required 
- Random access - the address of the kth block starting at block b can easily be obtained as (b+k) 
 
- the address of the kth block starting at block b can easily be obtained as (b+k)
- Advantages: - Both the Sequential and Random Accesses are supported
- Extremely fast since the number of seeks are minimal because of contiguous allocation of file blocks
 
- Disadvantages: - Suffers external fragmentation (similar to memory allocation)
 
- Increasing file size is difficult as it depends on the availability of contiguous space 
Linked Allocation
- Linked allocation: each file is a linked list of disk blocks - each block contains pointer to next block, file ends at null pointer
- blocks may be scattered anywhere on the disk (no external fragmentation) 
 
- Advantages - Flexible in terms of file size
- Does not suffer from external fragmentation
 
- Disadvantages - Slow: locating a file block can take many I/Os and disk seeks
 
- Possible improvements: cluster blocks; however, has internal fragmentation - Does not support random/direct access
- Pointer size overhead
- Reliability: what if the pointer has corrupted
 
Flie Allocation Table
- A file allocation table (FAT) is a table that an OS maintains on a disk that provides a map of the blocks that a file is stored in
- Example:   
Index Allocation
- Indexed allocation: each file 
 has its own index blocks of
 pointers to its data blocks
- Advantages: - Fast
- Supports random/direct access to the blocks
- No external fragmentation
 
- Disadvantages: - The pointer overhead for indexed allocation is greater than linked allocation
- for very small files, say files that expand only 2-3 blocks, the indexed allocation would keep one entire block (index block) for the pointers 
 
- Multiple-level index blocks (e.g., 2-level)  
- Combined approach - Used in Unix 
 
- Used in Unix
Extent-Based Systems
- Many new file systems (e.g., Veritas File System) use a partially contiguous allocation scheme
- Extent-based file systems allocate disk blocks in extents
- An extent is a contiguous block of disks- Extents are allocated for file allocation
- A file consists of one or more extents
 
- Mitigate external fragmentation 
Free-Space Management
- File system maintains free-space list to track available blocks- The space of deleted files should be reclaimed
 
- Many management methods:- bit vector (bit map)
- linked free space
- …
 
Bitmap Free-Space Management
- Use one bit for each block, track its allocation status- Searching contiguous blocks is easy/efficient
- requires extra space
 
- example:- block size = 4KB = 212 bytes
- disk size = 240 bytes (1 terabyte)
- n = 240/212 = 228 bits (or 256 MB)>>if clusters of 4 blocks -> 64MB of memory 
 
Linked Free Space
- Keep free blocks in linked list - no wate of space, just use the memory in the free block for pinters
- usually no need to traverse the entire list:
 return the first one
- cannot get contiguous space easily
 
- Simple linked list of free-space is inefficient - one extra disk I/O to allocate one free block disk - I/O is extremely slow
 
- allocating multiple free blocks require traverse the list 
- difficult to allocate contiguous free blocks  
 
Grouping
- Grouping: using indexs to group free blocks- stroe address of n-1 free blocks in the first free block, and the next index block
 
- …