CS3103-Lecture-9
File Concept
File is a contiguous logical address space for storing information
- database, audio, video, web pages…
There are different types of file
data: numeric, character, binary, multimedia
program
device
proc file system
- file-system interface to retrieve system information
…
File Attributes
- Name - the only information kept in human-readable form
- Identifier - uniuqe tag (number) identifies file within file system
- Type - needed for systems that support different types
- Location - pointer to file location on device
- Size - current file size
- Protection - controls who can do reading, writing, executing
- Time, date, and user identification - data for protection, securtiy, and usage monitoring
- Many variations, including extended file attributes such as file checksum
File Operations
- OS provides file operations to
- create:
- space in the file system should be found
- an entry must be allocated in the directory
- open:
- return a handler for other perations
- most operations need the file to be opened first
- read/write:
- need to maintain a pointer
- repostion within file -seek
- delete
- release file space
- truncate:
- shortening the length of a file. When truncated, its contents are preserved up to a certain point, after which any data that falls beyond that point is deleted or lost
- other operations can be implemented using these ones
- copy: create and read/write
- create:
Open Files
- Data needed to manage open files:
File handle: a data structure that represents an open file. It countains information such as the file’s name, its current position in the file, and any file-related flags or attributes (such as read-only or write-through mode)
File descriptor: a small integer that identifier an open file within a process. It is used by the operating system to manage the file’s access permissions and to track its usage by various processes.
File table: a data structure that maintains a list of all open files in the system, along with their associated file handles and file descriptors.
File cache: A file cache is a portion of memory that the operating system uses to store recently accessed portions of open files.
Lock table: keeps track of which processes have locked which files and ensures that conflicting lock requests are resolved appropriately.
I/O buffer: a portion of memory that is used to temporarily store file data as it is read or written. The OS typically uses a pool of I/O buffers to improve file access performance and to minimize disk accesses.
File Types - Name, Extension
Access Methods
- Sequential access
- a group of elements is access in a predetermined order
- for some media types, the only access mode (e.g., tape)
- Direct access
- access an element at an arbitrary position in a sequence in (roughly) equal time,
independent of sequence size- it is possible to emulate random access in a tape, but access time varies
- sometime called random access
- access an element at an arbitrary position in a sequence in (roughly) equal time,
Directory Structure
- Directory: a collection of nodes containing information about files
A Typical File-system Organization
- Operations performed on directory
- Search for a file
- Create a file
- Delete a file
- List a directory
- Rename a file
- Traverse the file system
- …
Directory Organization
- Organize the directory (logically) to obtain
- Efficiency – locating a file quickly
- Naming – convenient to users
- Two users can have the same name for different files
- The same file can have several different names
- Grouping – logical grouping of files by properties, (e.g., all Java programs, all games, …)
Single-Level Directory
- A single directory for all users
- naming problems and grouping problems
- Two users want to have same file names
- Hard to group files
- naming problems and grouping problems
Two-Level Directory
- Separate directory for each user
- different users can have the same name for different files
- Each user has his own user file directory (UFD), it is in the master file directory (MFD)
- efficient to search
- cannot group files
- How to share files between different users, and how to share the system files?
- different users can have the same name for different files
Tree-Structured Directories
Files organizaed into trees
- efficient in searching, can group files, convenient naming
- efficient in searching, can group files, convenient naming
File can be accessed using absolute or relative path name
- absolute path name: /home/alice/..
- relative path is relative to the current directory (pwd)
- creating a new file, delete a file, or create a sub-directory
- e.g., if current directory is /mail, a mkdir count will create /mail/count
- creating a new file, delete a file, or create a sub-directory
- How to share a file/directory?
- It’s not allowed
Delete directory
- If directory is empty, then it’s easy to handle
- If not
- Option I: directory cannot be deleted, unless it’s empty
- Option II: delete all the files, directories and sub-directories
- rm -rf /
Acyclic-Graph Directories
- Organize directories into acyclic-graphs
- Allow links to a directory entry/files for
aliasing (no longer a tree)
- Allow links to a directory entry/files for
- Dangling pointer problem
- e.g., if delete file /dict/all, then /dict/w/list and /spell/words/list are dangling pointers
- Solution: backpointers/reference counter
- backpointers record all the pointers to the entity, a variable size record
- Or count
#
of links to it and only (physically) delete it when counter is zero
Hardlink
- Accesses the data available in the original file
- If earlier selected file deleted, the hard link still
contain the data of that file
Softlink(Symblic link)
- Does not access the data available in the original file
- If earlier file deleted, the soft link will point to a file that does not exist anymore
General Graph Directory
- Allowing arbitrary links may generate cycles in the directory structure
- Solution
- Allow cycles, but use garbage collection to reclaim disk spaces
- Detect the cycle every time a new link is added
File
File System Mounting
- A file system must be mounted before it can be accessed
- Mounting a file system attaches that file system to a directory (mount point) and makes it available to the system
- The root (/) file system is always mounted. Any other file system can be connected or disconnected from the root (/) file system
- The old directories connected to the mount point becomes invisible
- The old directories connected to the mount point becomes invisible
File Sharing
- Sharing of files on multi-user systems is desirable
- Sharing must be done through a protection scheme
- User IDs identify users, allowing protections to be per user
- Group IDs allow users to be in groups, permitting group access rights
- On distributed systems, files may be shared across a network
- Network File System (NFS) is a common distributed filesharing method
- Sharing must be done through a protection scheme
Remote File Sharing
- Use networking to allow file system access between systems
- manually via programs like FTP (File Transfer Protocol) clients
- automatically, seamlessly using distributed file systems
- semi-automatically via the world wide web (e.g., hyperlinked documents)
- Client-server model allows clients to mount remote FS from servers
- a server can serve multiple clients
- client and user-on-client identification is complicated
- server cannot assume the client is trusted
- standard OS file calls are translated into remote calls
- NFS (Network File System) is standard for UNIX
- CIFS (Common Internet File System) is standard for Windows
Protection
- File owner/creator should be able to control
– what can be done
– by whom - Types of access
– read
– write, append
– execute
– delete
– list
Access Countrol List
- Assign each file and directory with an access control list (ACL)
- Fine-grained control
- Issues
- How to construct the list?
- How to store the list in directory?
- Not widely used in modern file systems
Unix Access Control
- Three access modes: read, write, execute (encoded in three bits)
- Three classes of users: owner, group and others
- To grant access to users, change access mode
- E.g., use chmod
- E.g., use chmod
File Access Control Examples
File-System Structure
- File structure
– Logical storage unit
– Collection of related information - File system on secondary storage (disks)
– Provides user interface to storage, mapping logical to physical
– Provides efficient and convenient access to disk by allowing data to be stored,
located retrieved easily - File control block (FCB): structure consisting of information of a file
– E.g., Unix inode structure - Device driver controls the physical device
File System Layers
File system organized into layers
Device drivers manage I/O devices at the I/O control layer
- Receive commands like “read drive1, cylinder 72, track 2, sector 10, into memory location 1060”
- Outputs low-level hardware-specific commands to hardware controller
Basic file system given command like “retrieve block 123” translates to device driver
- Also manages memory buffers and caches
(allocation, freeing, replacement)
- Also manages memory buffers and caches
Buffers hold data in transit
Caches hold frequently used data
File-organization module understands files, logical address, and physical blocks
- Translates logical block # to physical block #
- Manages free space
maintains the list of free blocks, and allocates free blocks to files as needed
Logical file system manages metadata information
- The level at which users can request file operations by system calls
- Provides a consistent view of what might be multiple physical file systems and multiple file system implementations
- Translates file name into file number, file handle, location by maintaining file control blocks
- Directory management
- Protection
Many file systems, sometimes many within an OS
- CD-ROM is ISO 9660
- Unix has UFS, FFS
- Windows has FAT, FAT32, NTFS
- Linux has more than 130 types, with extended file system ext3 and ext4 leading; plus distributed file systems, etc.
- New ones still arriving – ZFS, GoogleFS, Oracle ASM, FUSE
- … …
File Control Block (FCB)
- OS maintains FCB per file, contains many details about the file
- E.g., inode number, permissions, size, dates
- Example
Structures to Implement a File System
- On-Disk
- Boot control block per volume
- Volume control block per volume
- Directory structure per file system
- A FCB per file
- In-memory
- In-memory mount table with entries for each mounted volume
- In-memory directory structure cache
- System-wide open-file table : contain FCB, etc
Inode
- An inode is a data structure in Unix that denotes a file or a directory on file system
- It contains information about file, such as location of file on the disk, access mode, ownership, file type …
- Each inode has a number that is used in the index table
- Unix kernel uses inode number to access the contents of an inode
- Unix kernel uses inode number to access the contents of an inode
Directory Implementation
- Linear list of file names with pointer to the data blocks
- simple to program
- time-consuming to execute
- Hash Table: linear list with hash data structure
- hash function on the file name determines the key directory
- Potential collisions
- decreases directory search time
- fixed size
File Creation
- Application process requests the creation of a new file
- Logical file system allocates a new FCB
- Appropriate directory updated with the new file name and FCB
- Disk blocks allocated for files
Allocation Methods
- Allocation method: how disk blocks are allocated for files
- The main problem is how to allocate space to files so that disk space is utilized effectively and files can be accessed quickly
- Three major methods:
- Contiguous allocation
- Linked allocation
- Indexed allocation
Contiguous Allocation
Each file occupies a set of contiguous blocks on the disk
Simple – only starting location (block #) and length (number of blocks) are required
Random access
- the address of the kth block starting at block b can easily be obtained as (b+k)
- the address of the kth block starting at block b can easily be obtained as (b+k)
Advantages:
- Both the Sequential and Random Accesses are supported
- Extremely fast since the number of seeks are minimal because of contiguous allocation of file blocks
Disadvantages:
- Suffers external fragmentation (similar to memory allocation)
Increasing file size is difficult as it depends on the availability of contiguous space
Linked Allocation
Linked allocation: each file is a linked list of disk blocks
- each block contains pointer to next block, file ends at null pointer
- blocks may be scattered anywhere on the disk (no external fragmentation)
Advantages
- Flexible in terms of file size
- Does not suffer from external fragmentation
Disadvantages
- Slow: locating a file block can take many I/Os and disk seeks
Possible improvements: cluster blocks; however, has internal fragmentation
- Does not support random/direct access
- Pointer size overhead
- Reliability: what if the pointer has corrupted
Flie Allocation Table
- A file allocation table (FAT) is a table that an OS maintains on a disk that provides a map of the blocks that a file is stored in
- Example:
Index Allocation
Indexed allocation: each file
has its own index blocks of
pointers to its data blocksAdvantages:
- Fast
- Supports random/direct access to the blocks
- No external fragmentation
Disadvantages:
- The pointer overhead for indexed allocation is greater than linked allocation
- for very small files, say files that expand only 2-3 blocks, the indexed allocation would keep one entire block (index block) for the pointers
Multiple-level index blocks (e.g., 2-level)
Combined approach
- Used in Unix
- Used in Unix
Extent-Based Systems
- Many new file systems (e.g., Veritas File System) use a partially contiguous allocation scheme
- Extent-based file systems allocate disk blocks in extents
- An extent is a contiguous block of disks
- Extents are allocated for file allocation
- A file consists of one or more extents
- Mitigate external fragmentation
Free-Space Management
- File system maintains free-space list to track available blocks
- The space of deleted files should be reclaimed
- Many management methods:
- bit vector (bit map)
- linked free space
- …
Bitmap Free-Space Management
- Use one bit for each block, track its allocation status
- Searching contiguous blocks is easy/efficient
- requires extra space
- example:
- block size = 4KB = 212 bytes
- disk size = 240 bytes (1 terabyte)
- n = 240/212 = 228 bits (or 256 MB)
>>
if clusters of 4 blocks -> 64MB of memory
Linked Free Space
Keep free blocks in linked list
- no wate of space, just use the memory in the free block for pinters
- usually no need to traverse the entire list:
return the first one - cannot get contiguous space easily
Simple linked list of free-space is inefficient
one extra disk I/O to allocate one free block disk
- I/O is extremely slow
allocating multiple free blocks require traverse the list
difficult to allocate contiguous free blocks
Grouping
- Grouping: using indexs to group free blocks
- stroe address of n-1 free blocks in the first free block, and the next index block
- …