Code-hosting site GitLab has suffered an outage after a “serious” incident on Tuesday involving one of its databases, which required emergency maintenance.
The company said today that it had lost six hours of database data for GitLab.com, including issues, merge requests, users, comments, and snippets, and was in the process of restoring data from a backup. The data was accidentally deleted, according to a Twitter message.
“Losing production data is unacceptable, and in a few days we’ll post the five whys of why this happened and a list of measures we will implement,” GitLab said in a bulletin this morning. Git and wiki repositories, as well as self-hosted installations, were unaffected.
Because of the restoration, any database data written between 17:20 UTC and 23:25 UTC will be lost when GitLab.com goes live again. In its chronology of events, GitLab said it detected on Monday that spammers were hammering its database by creating snippets, making it unstable. GitLab blocked the spammers by IP address and removed a user who had been using a repository as a form of CDN, which had resulted in 47,000 IPs signing in with the same account and causing a high database load. GitLab also removed users for spamming.
The company provided a statement this morning: “This outage did not affect our Enterprise customers or the wide majority of our users. As part of our ongoing recovery efforts, we are actively investigating a potential data loss. If confirmed, this data loss would affect less than one percent of our user base, and specifically peripheral metadata that was written during a six-hour window,” the company said. “We have been working around the clock to resume service on the affected product, and set up long-term measures to prevent this from happening again. We will continue to keep our community updated through Twitter, our blog and other channels.”
While dealing with the problem, GitLab found that database replication had fallen so far behind that it had effectively stopped. “This happened because there was a spike in writes that were not processed on time by the secondary database.” GitLab has been dealing with a series of database issues, including the secondary database refusing to replicate.
GitLab.com went down at 6:28 pm PST on Tuesday and was back up at 9:57 am PST today, said Tim Anglade, interim vice president of marketing at the company.
[Source: JavaWorld]