4. File Buffers

# 1. Basic theory

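A buffer is simply a region of memory. Memory is much faster to read and write than disk, so staging disk data in memory keeps the CPU from waiting too long on slow I/O.
In C, `struct _IO_FILE` (glibc's definition; locate it with `grep -ER 'struct _IO_FILE' /usr/include/`) also holds the bookkeeping for a buffer, so file reads and writes done through the C library go through this buffer first. This is the user-level buffer. The C library functions do not operate on the file on disk directly; they call the operating system's interfaces, and the OS keeps its own buffer for reads and writes, called the kernel-level buffer.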
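
To make the user-level buffer visible, here is a minimal sketch (not part of the original post) that writes through stdio and asks the kernel for the file size before and after `fflush`. The names `observe_user_buffer` and `file_size` are made up for illustration. It assumes the message is small enough to fit in the stdio buffer and that the stream is a regular file, which stdio fully buffers by default.

```c
#include <stdio.h>
#include <sys/stat.h>

// Size of `path` as the kernel currently sees it, or -1 on error.
static long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

// Writes `msg` through stdio and records the file size before and after
// fflush(). Returns 0 on success, -1 if the file cannot be opened.
int observe_user_buffer(const char *path, const char *msg,
                        long *before_flush, long *after_flush)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    fputs(msg, fp);                   // lands in the FILE's user-level buffer
    *before_flush = file_size(path);  // the kernel has not seen the data yet

    fflush(fp);                       // hand the buffer to the kernel via write()
    *after_flush = file_size(path);

    fclose(fp);
    return 0;
}
```

Called with a short message such as "hello world\n", `before_flush` should come back as 0 and `after_flush` as the length of the message, which is the gap between the two buffer levels described above.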

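Flushing the buffer: when a process exits normally, the buffers held by its C `FILE` streams are flushed to the kernel, and the kernel writes the data back to disk on its own schedule. Some C functions, such as `fflush` and `fclose`, also flush the buffer. There are three policies for flushing data from user space to the kernel:
- Unbuffered: data is flushed immediately.
- Line-buffered: the current line is flushed when '\n' is seen. Used when printing to the display.
- Fully buffered: data is flushed only when the buffer is full. Used when writing to a regular file.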
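
The three policies can be tried out on the same file with the standard `setvbuf` call, which overrides the default mode of a stream. This is only a sketch: `bytes_in_kernel_before_flush` and `file_size` are hypothetical helper names, and the text is assumed to be shorter than `BUFSIZ`.

```c
#include <stdio.h>
#include <sys/stat.h>

// Size of `path` as the kernel currently sees it, or -1 on error.
static long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

// Opens `path`, applies the buffering `mode` (_IONBF, _IOLBF or _IOFBF),
// writes `text`, and returns how many bytes reached the kernel before any
// explicit flush. Returns -1 on error.
long bytes_in_kernel_before_flush(const char *path, int mode, const char *text)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    // setvbuf must be called before any other operation on the stream.
    if (setvbuf(fp, NULL, mode, BUFSIZ) != 0) {
        fclose(fp);
        return -1;
    }

    fputs(text, fp);
    long visible = file_size(path);  // measured before fclose() flushes

    fclose(fp);
    return visible;
}
```

With `_IONBF` the text is visible right away, with `_IOLBF` only once a '\n' has been written, and with `_IOFBF` a short text stays in the user-level buffer until the stream is flushed or closed.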

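#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(void)
{
    close(1);
    // O_TRUNC keeps stale content from an earlier run out of the result.
    int fid = open("./test.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    // With fd 1 closed, open() returns the lowest free descriptor, which is 1.
    // stdout still writes to fd 1, so what used to go to the display now goes to this file.
    // The flush target changed (display -> file), so the policy changes too:
    // line-buffered -> fully buffered.

    fprintf(stdout, "hello world\n");
    // The FILE structure contains the user-level buffer.
    // Standard output to a display is line-buffered and flushes on '\n';
    // output to a file on disk is fully buffered and flushes only when the buffer is full.
    // The buffer is not full here, so the string does not show up in the file yet.

    // fflush(stdout); // flush immediately

    // System call: bypasses the C buffer and writes straight into the OS cache.
    write(1, "I am father\n", strlen("I am father\n"));

    close(fid);
    // "hello world\n" is still in the C buffer; at exit the C library tries to flush it to the kernel.
    // Because fid (fd 1) is already closed, that flush has no valid descriptor to write to
    // and the data is lost, so only the write() output ends up in the file.
    // Without this close(fid), test.txt would also contain "hello world\n" after the process exits.
    return 0;
}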

Output:

[test@VM-12-4-centos dup2]$ ./test 
[test@VM-12-4-centos dup2]$ cat test.txt 
I am father

# 2. `fclose` flushes the buffer

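#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE* pf = fopen("./test.txt", "w+");
    if (pf == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(pf, "hello world\n");
    // close(fileno(pf));  // if pf's descriptor is closed first, the flush below fails
    fclose(pf); // the process has not exited, yet fclose still flushes the C buffer to the kernel
    while (1); // keep the process alive; check test.txt from another terminal
    return 0;
}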
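
The commented-out `close` can be checked in isolation. The sketch below is an illustration rather than the author's code: `fclose_after_close` is a made-up name, and the message is assumed to fit in the stdio buffer. It closes the descriptor behind stdio's back and then reports what `fclose` returns and how many bytes actually reached the file.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

// Writes `msg` to `path` through stdio, closes the underlying descriptor
// directly, then calls fclose(). Returns fclose()'s result and stores the
// final file size in *size_out (-1 if it cannot be determined).
int fclose_after_close(const char *path, const char *msg, long *size_out)
{
    *size_out = -1;

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return EOF;

    fputs(msg, fp);         // still sitting in the user-level buffer
    close(fileno(fp));      // the descriptor is gone; the buffer is untouched

    int rc = fclose(fp);    // tries to flush, but write() has no valid fd

    struct stat st;
    if (stat(path, &st) == 0)
        *size_out = (long)st.st_size;
    return rc;
}
```

Here `fclose` should report failure by returning `EOF`, and the file should stay empty, because the buffered bytes never made it to the kernel.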

# 3. Output redirection revisited

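In bash, `>` redirects what a process would print to the display into a file. Under the hood the shell uses the system's redirection calls, such as `dup2`. By default only standard output is redirected; standard error has to be redirected explicitly (for example with `2>`). Sending display output to a file also switches the stream from line-buffered to fully buffered, so it is no longer flushed at each newline: short output like the one below is flushed only when the process exits. If file descriptor 1 is closed before that, the flush at exit has no valid target and the output is lost.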

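#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("hello world\n"); // on a terminal: flushed at '\n'; redirected to a file: stays in the buffer
    close(1);                // fd 1 is gone, so the flush at exit cannot write anything
    return 0;
}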

Output:

[test@VM-12-4-centos dup2]$ ./test 
hello world
[test@VM-12-4-centos dup2]$ ./test > test.txt
[test@VM-12-4-centos dup2]$ cat test.txt

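
As a rough sketch of both halves of this section, the function below imitates what a shell does for `prog > file` (fork, open the file, `dup2` it onto fd 1) and then runs the program body with an `fflush` added before the `close`. The name `run_redirected` and the exit codes are invented for this example, and a real shell would go on to `exec` the program instead of running the body inline.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs the "program body" in a child whose stdout is redirected to `path`.
// Returns the child's exit status (0 on success), or -1 on error.
int run_redirected(const char *path, const char *msg)
{
    fflush(NULL);  // do not let the child inherit unflushed parent output

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        // Child: make fd 1 refer to the file, as the shell would before exec.
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
            _exit(1);
        if (fd != STDOUT_FILENO)
            close(fd);

        // Program body: same as the example above, plus the fflush().
        printf("%s", msg);
        if (fflush(stdout) != 0)   // push the user-level buffer to the kernel
            _exit(2);
        close(STDOUT_FILENO);      // harmless now: nothing is left to flush
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Flushing before `close(1)` is the whole fix: once the data is in the kernel, closing the descriptor no longer loses it.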

# 4. Verifying the buffer

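A child process inherits its parent's code and data, and that includes whatever is sitting in the stdio buffers. Combined with the results above, this means the child ends up writing out its own copy of the buffered data.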

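#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE* pf = fopen("./test.txt", "w+");
    if (pf == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(pf, "hello world\n"); // written to a file: fully buffered
    fork();                       // the child gets a copy of the unflushed buffer
    return 0;                     // parent and child each flush their copy at exit
}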

Output:

[test@VM-12-4-centos dup2]$ ./test 
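[test@VM-12-4-centos dup2]$ cat test.txt  # "hello world\n" appears twice
hello world
hello world

One of the two lines comes from the child. fork() duplicates the parent's memory, including the unflushed buffer inside the `FILE`, so the parent and the child each flush their own copy of "hello world\n" to the kernel when they exit. The child also holds a descriptor for the same open file (its reference count goes up by one), so the second copy lands after the first instead of overwriting it. The memory is shared copy-on-write after fork() and is physically copied only when one of the processes modifies it.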
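
To double-check that explanation, the sketch below counts the bytes in the file with and without an `fflush` before `fork`. The function name `write_then_fork` is hypothetical, and the child calls `fclose` followed by `_exit` so that it flushes only this one stream; the message is assumed to fit in the stdio buffer.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

// Writes `msg` to `path` through stdio, forks, lets parent and child both
// close the stream, and returns the resulting file size (-1 on error).
long write_then_fork(const char *path, const char *msg, int flush_before_fork)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    fputs(msg, fp);          // buffered in user space (regular file)
    if (flush_before_fork)
        fflush(fp);          // the buffer is empty when fork() copies it

    pid_t pid = fork();
    if (pid < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {
        fclose(fp);          // child flushes its own copy, if any
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    fclose(fp);              // parent flushes its copy, if any

    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}
```

Without the flush the file should hold the message twice; with `fflush` before `fork` there is nothing left in the buffer to duplicate, so the message appears once.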
