Google File System

Google File System
Operating system	Linux kernel
Type	Distributed file system
License	Proprietary

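The Google File System (GFS or GoogleFS) is a distributed file system that Google developed for its own internal use.^[1] It links a large number of servers built from ordinary commodity hardware and provides efficient, reliable access to the data stored on them. The new version of the Google File System is code-named Colossus.^[2]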

Design

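GFS was optimized for Google's core data storage and for the Google search engine, both of which must retain enormous amounts of data.^[3] It grew out of "BigFiles", a system developed by Larry Page and Sergey Brin in Google's early days.^[4] Files are divided into fixed-size chunks of 64 MB, similar to clusters or sectors in conventional file systems. Files are very rarely overwritten or shrunk; they are usually only appended to or read. GFS is optimized to run well on Google's dense computing clusters, which are built from inexpensive commodity computers. Because it was designed for cheap servers, it was built to tolerate hardware failures and data loss, and it favors high data throughput even at the cost of somewhat higher latency.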
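Because every chunk has the same fixed size, finding the chunk that holds a given byte of a file reduces to integer division. The SOSP '03 paper listed under References describes the GFS client translating an application's byte offset into a chunk index before asking the master where that chunk is stored. The Python sketch below shows only that arithmetic, assuming the 64 MB chunk size stated above; the function names (`chunk_index`, `chunk_range`, `offset_in_chunk`) are hypothetical and do not come from any GFS code.

```python
# Hypothetical sketch: only the 64 MB chunk size comes from the article;
# the function names are illustrative, not part of any real GFS API.
CHUNK_SIZE = 64 * 1024 * 1024  # fixed chunk size: 64 MB


def chunk_index(offset: int) -> int:
    """Return the index of the chunk that holds the byte at `offset`."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // CHUNK_SIZE


def offset_in_chunk(offset: int) -> int:
    """Return the position of `offset` inside its own chunk."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset % CHUNK_SIZE


def chunk_range(offset: int, length: int) -> range:
    """Return the indices of every chunk touched by a read of `length`
    bytes starting at `offset` (a read may cross a chunk boundary)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = chunk_index(offset)
    last = chunk_index(offset + length - 1)  # index of the last byte read
    return range(first, last + 1)


if __name__ == "__main__":
    mb = 1024 * 1024
    print(chunk_index(200 * mb))                   # 3
    print(offset_in_chunk(200 * mb) // mb)         # 8 (MB into chunk 3)
    print(list(chunk_range(60 * mb, 10 * mb)))     # [0, 1]
```

A 10 MB read starting at the 60 MB mark crosses the first 64 MB boundary, so it touches two chunks; that is why `chunk_range` computes the index of the last byte as well as the first. The paper also notes that a large chunk size keeps the number of chunks per file, and with it the amount of chunk-location metadata, small.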

See also

Notes

Carr2 2006: ‘Despite having published details on technologies like the Google File System, Google has not released the software as open source and shows little interest in selling it. The only way it is available to another enterprise is in embedded form—if you buy a high-end version of the Google Search Appliance, one that is delivered as a rack of servers, you get Google's technology for managing that cluster as part of the package’
〈Google's Colossus Makes Search Real-Time by Dumping MapReduce〉, 《High Scalability》 (web log), September 11, 2010.
Carr3 2006: ‘All this analysis requires a lot of storage. Even back at Stanford, the Web document repository alone was up to 148 gigabytes, reduced to 54 gigabytes through file compression, and the total storage required, including the indexes and link database, was about 109 gigabytes. That may not sound like much today, when you can buy a Dell laptop with a 120-gigabyte hard drive, but in the late 1990s commodity PC hard drives maxed out at about 10 gigabytes.’
Carr4 2006: ‘To cope with these demands, Page and Brin developed a virtual file system that treated the hard drives on multiple computers as one big pool of storage. They called it BigFiles. Rather than save a file to a particular computer, they would save it to BigFiles, which in turn would locate an available chunk of disk space on one of the computers in the server cluster and give the file to that computer to store, while keeping track of which files were stored on which computer. This was the start of what essentially became a distributed computing software infrastructure that runs on top of Linux.’

References

Carr, David F. (July 6, 2006), “How Google Works”, 《Baseline》.
Ghemawat, S.; Gobioff, H.; Leung, S. T. (2003). 〈The Google file system〉. 《Proceedings of the nineteenth ACM Symposium on Operating Systems Principles - SOSP '03》 (PDF). p. 29. doi:10.1145/945445.945450. ISBN 1581137575. CiteSeerX 10.1.1.125.789.

External links

〈GFS: Evolution on Fast-forward〉, 《Queue》, ACM.
〈Google File System Eval: Part I〉, 《Storage mojo》.
〈NetGFS〉, 《Code》, Google.
〈Distributed Systems〉, 《Code》 (recordings from a course), Google, also featuring a lecture on GFS.
《GFS: the secret at the heart of Google》 (article), UK: ZDnet, March 7, 2005.
