[ Table Of Contents ][ Answer Guy Current Index ] greetings   bios   1   2   3   4   5   6   7   8   9   10   12   13   14   15   16   17   18   19   20 [ Index of Past Answers ]

(?) The Answer Gang (!)


By Jim Dennis, Ben Okopnik, Dan Wilder, Breen, Chris, and the Gang, the Editors of Linux Gazette... and You!
Send questions (or interesting answers) to tag@ssc.com

There is no guarantee that your questions here will ever be answered. You can be published anonymously - just let us know!


(?) LFS: Large File Summit/Support

From Albert

Answered By Jim Dennis

Hi,

I have an Intel-based box running RedHat 7.x with a 2.4.x kernel, and I'm trying to write code to support large file (>4GB) writes and seeks. According to the manual pages, llseek() would handle 64-bit seeks if the kernel supported them. However, I can't get my compiler to recognize the llseek() call, which is perhaps an indication that the 2.4 kernel still doesn't support large files. Do you know of anything else I could try? Is there any other way of manipulating large files on 32-bit Linux? Is there going to be a 64-bit Linux version anytime soon? Please help. Thanks!

-Albert

(!) [JimD] I have to say that I'm surprised that this question hasn't come up before and more often.
As you are aware, Linux on 32-bit platforms (x86, SPARC/classic, PowerPC, MIPS, etc.) uses a signed 32-bit value for off_t (the type for expressing and returning offsets for lseek(), ftello(), and related system calls and library functions). You may also be aware that off_t on 64-bit platforms (Alpha, UltraSPARC, IA64/Merced) is already 64 bits wide.
Clearly a signed 32-bit value can only express an offset up to about 2GB (negative offsets seek backwards, either from the end of the file or from the current file offset, towards the beginning of the file). This has led to Linux's historical 2GB file size limit on the most common platforms.
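To put a number on that limit, here is a minimal C sketch. The helper names max_signed32_offset and fits_in_signed32 are made up for illustration; they are not part of any library.

```c
#include <stdint.h>

/* Largest byte offset a signed 32-bit off_t can express: 2^31 - 1. */
int64_t max_signed32_offset(void)
{
    return (int64_t)INT32_MAX;          /* 2147483647 bytes, just under 2 GiB */
}

/* Returns 1 if `offset` would fit in a signed 32-bit off_t, 0 otherwise. */
int fits_in_signed32(int64_t offset)
{
    return offset >= INT32_MIN && offset <= INT32_MAX;
}
```

So an offset of 2147483647 is the last one that fits, and 2147483648 (exactly 2 GiB) is the first one that does not.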
This 2GB limit was common for UNIX on 32-bit systems. At some point a number of UNIX vendors (well, some engineers from the major UNIX vendors and some major database and other applications vendors) got together and held a "summit" to discuss some way to overcome this limitation and to agree on a reasonably portable interface, so that the ISVs (independent software vendors) could write reasonably portable code to cope with this change. The specification that they agreed upon has been called the LFS ("large file summit" or "large file support").
Linus used to say that anyone who needed to work with larger files really should migrate to Alpha or to Merced or some other 64 bit system. This was around the time that someone had submitted LFS patches to him. However, somewhere over the years since then he changed his mind.
I suspect that his change had a couple of elements (though I hate to second-guess him, I'd hate even more to waste his time asking about it). First, I think it became apparent that the need for large file support was growing much faster than the market for 64-bit systems. The 64-bit platforms haven't seen nearly the growth that Linux has, and the cheap availability of very large hard drives and RAID arrays has exacerbated that need (the number and size of files tend to grow as disk capacity makes room for them; demand grows to exceed supply). The increasing use of Linux in imaging compute farms (Hollywood animation production) and for scientific clustering (Beowulf), together with the continued preference for commodity PC/x86 hardware for those applications, has also underscored the need for Linux to support LFS.
I suspect that another thing that helped influence Linus' opinion is that someone submitted a different or cleaned-up version of the LFS patches. I seem to recall that Linus didn't like the implementation of one of the early submissions, so his rejection was on both grounds: implementation (the surmountable one) and perceived need/elegance (a design judgement call).
Anyway, the 2.4 kernels do support LFS. Now you need to be able to actually compile software to use this support.
What you need to do is sit down and read the libc Texinfo pages (from a shell prompt, issue the command 'info libc' or just 'info'; from within EMACS or XEmacs, use the M-x info function, usually bound to [F1],[i] or C-h,i).
Here's an excerpt:
 - Macro: _LARGEFILE_SOURCE
     If this macro is defined some extra functions are available which
     rectify a few shortcomings in all previous standards.  More
     concrete the functions `fseeko' and `ftello' are available.
     Without these functions the difference between the ISO C interface
     (`fseek', `ftell') and the low-level POSIX interface (`lseek')
     would lead to problems.

     This macro was introduced as part of the Large File Support
     extension (LFS).

 - Macro: _LARGEFILE64_SOURCE
     If you define this macro an additional set of function gets
     available which enables to use on 32 bit systems to use files of
     sizes beyond the usual limit of 2GB.  This interface is not
     available if the system does not support files that large.  On
     systems where the natural file size limit is greater than 2GB
     (i.e., on 64 bit systems) the new functions are identical to the
     replaced functions.

     The new functionality is made available by a new set of types and
     functions which replace existing.  The names of these new objects
     contain `64' to indicate the intention, e.g., `off_t' vs.
     `off64_t' and `fseeko' vs. `fseeko64'.

     This macro was introduced as part of the Large File Support
     extension (LFS).  It is a transition interface for the time 64 bit
     offsets are not generally used (see `_FILE_OFFSET_BITS').


 - Macro: _FILE_OFFSET_BITS
     This macro lets decide which file system interface shall be used,
     one replacing the other.  While `_LARGEFILE64_SOURCE' makes the
     64 bit interface available as an additional interface
     `_FILE_OFFSET_BITS' allows to use the 64 bit interface to replace
     the old interface.

     If `_FILE_OFFSET_BITS' is undefined or if it is defined to the
     value `32' nothing changes.  The 32 bit interface is used and
     types like `off_t' have a size of 32 bits on 32 bit systems.

     If the macro is defined to the value `64' the large file interface
     replaces the old interface.  I.e., the functions are not made
     available under different names as `_LARGEFILE64_SOURCE' does.
     Instead the old function names now reference the new functions,
     e.g., a call to `fseeko' now indeed calls `fseeko64'.

     This macro should only be selected if the system provides
     mechanisms for handling large files.  On 64 bit systems this macro
     has no effect since the `*64' functions are identical to the
     normal functions.

... this is in a discussion about "feature test macros" (allowing you to code up your #ifdef blocks). You may also need to define some macros to include support for the LFS functions and APIs.
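As a rough sketch of what that looks like in practice, assuming a glibc system with LFS support: defining _FILE_OFFSET_BITS to 64 before the first #include should be enough to make off_t and lseek() 64-bit on a 32-bit box. The function names and the 3 GiB offset below are illustrative choices, not anything taken from the libc manual.

```c
/* Feature-test macros must be defined before any system header is included. */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Width of off_t in bits under the feature-test macros in effect. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * 8);
}

/*
 * Seek 3 GiB into a scratch file and write one byte there.
 * Returns the resulting file offset (3 GiB + 1), or -1 on error.
 * `path` is a scratch file name chosen by the caller (hypothetical).
 */
off_t write_byte_past_2gb(const char *path)
{
    off_t target;
    off_t result = -1;
    int fd;

    if (sizeof(off_t) < 8)
        return -1;                  /* the macro had no effect here */
    target = (off_t)3 << 30;        /* 3 GiB, beyond the old 2GB limit */

    fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (lseek(fd, target, SEEK_SET) == target && write(fd, "x", 1) == 1)
        result = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return result;
}
```

The same effect can be had without touching the source by passing -D_FILE_OFFSET_BITS=64 on the compiler command line. On filesystems that support holes the resulting file should be sparse, so it ought not to eat 3 GB of disk.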
You see in these excerpts hints about the FSF/Glibc maintainers' view of LFS. They consider the adoption of LFS to be a three-stage process: old/legacy code first, then transitional code that explicitly calls the *64 functions, and finally a future where LFS is the default (controlled by a #define?) and there is optional support for the older interfaces.
Further evidence of this is seen in the following:
     When the sources are compiling with `_FILE_OFFSET_BITS == 64' on a
     32 bits machine this function is in fact `fopen64' since the LFS
     interface replaces transparently the old interface.
(in a discussion on "Opening Streams" and the fopen() function).
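A hedged sketch of that transparent replacement for stdio streams, again assuming glibc and _FILE_OFFSET_BITS set to 64: the code calls plain fseeko() and ftello() with a plain off_t, and the library is expected to substitute the 64-bit versions behind the scenes. fseek() and ftell() are no help here because they traffic in long. The helper name seek_and_tell is hypothetical.

```c
/* Feature-test macros must be defined before any system header is included. */
#define _LARGEFILE_SOURCE 1     /* makes fseeko() and ftello() available */
#define _FILE_OFFSET_BITS 64    /* makes off_t 64 bits on 32-bit systems */

#include <stdio.h>
#include <sys/types.h>

/*
 * Move the stream to absolute offset `where` and report the position
 * the stream now claims to be at. Returns -1 on error.
 */
off_t seek_and_tell(FILE *fp, off_t where)
{
    if (fseeko(fp, where, SEEK_SET) != 0)
        return -1;
    return ftello(fp);
}
```

Called on a stream from fopen() with an offset of, say, 3 GiB, this should hand the same offset back where an ftell()-based version on a 32-bit system could not represent it.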
There is a subtle gotcha in using the LFS support with some of the f* functions, fgetpos() for example. Many people would use off_t (or even long int!) for storing the position that fgetpos() fills in. That would be a bug. You should explicitly declare your variables for storing file positions as fpos_t (an opaque type that glibc builds around off_t or off64_t, as appropriate to your system and the #define settings in your sources).
That's why I say you should read the libc info pages. Be meticulous in following the prototypes that they offer for these functions.
There is a portion of these info pages which describes some of these problems and recommends that you use the fgetpos() and fsetpos() functions in preference to the ftell() and fseek() functions.
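For what it's worth, here is a small sketch of that advice, assuming glibc with _FILE_OFFSET_BITS set to 64. The position goes into an fpos_t and is only ever handed back to fsetpos(), never treated as a number. The function name reread_with_fpos is made up for this example.

```c
/* Feature-test macros must be defined before any system header is included. */
#define _FILE_OFFSET_BITS 64

#include <stdio.h>

/*
 * Remember the current position, read one byte, go back to the
 * remembered position with fsetpos(), and read the byte again.
 * Returns 1 if both reads saw the same byte, 0 if they differ,
 * and -1 on error or end of file.
 */
int reread_with_fpos(FILE *fp)
{
    fpos_t mark;                /* not long, and not off_t either */
    int first, second;

    if (fgetpos(fp, &mark) != 0)
        return -1;
    first = fgetc(fp);
    if (first == EOF)
        return -1;
    if (fsetpos(fp, &mark) != 0)
        return -1;
    second = fgetc(fp);
    return first == second;
}
```

Because fpos_t is opaque, there is no arithmetic to get wrong: the same code should compile unchanged whether the underlying offset is 32 or 64 bits.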


This page edited and maintained by the Editors of Linux Gazette Copyright © 2001
Published in issue 67 of Linux Gazette June 2001
HTML script maintained by Heather Stern of Starshine Technical Services, http://www.starshine.org/

