2022 Census
IBGE will have dedicated server with cloud computing and artificial intelligence
June 27, 2022 10h00 AM | Last Updated: June 30, 2022 08h18 AM
The processing of 2022 Population Census information is already guaranteed with a robust world-class data center, developed by the IBGE in 2019. Cloud computing, high performance databases, security with encryption, redundant fiber optic links, duplicated environment for disaster recovery and even artificial intelligence are some of the capabilities that make up the technological infrastructure dedicated to processing the census. There will be 200 virtual machines, bought in the last two years, running on a national private cloud.
For Carlos Renato Cotovio, IBGE's data processing director, the major challenge is to go from a company of 10,000 employees to one of 200,000. The first initiative was to move the data center on Rua General Canabarro, in the North Zone of Rio, to the second floor of the building, in order to avoid flooding, which is common in the area. The main data center has a Tier 3 classification rating, a security level that identifies a high availability, high performance and low latency (response time) facility. The secondary data center in São Paulo has a Tier 2 rating, also considered a good level of security and performance for contingency solutions. Both data centers have features such as cold and hot aisles, with optimized airflow to keep the equipment cool while reducing energy consumption. In addition, the fire detection and firefighting systems are optimized.
“This is the census of a lifetime. I am a career employee at BNDES, where I implemented the digital transformation project, and I was transferred to IBGE to live the experience and challenge of the 2022 Census, a giant project that takes the IBGE from 10,000 employees to 200,000. This rise is vertiginous and required the acquisition of equipment, the hiring of personnel and distribution of technological resources. It's a professional and personal experience. This is a short-term operation – just four months of the survey of the surroundings, which began in June, and the collection from August to October –, which is constantly adjusted while being conducted. It takes a huge sense of purpose, and it's a unique experience for anyone,” says Mr. Cotovio.
Two environments, one for regular surveys and one for the Census
The IBGE's IT Services Coordinator, José Luiz Thomaselli, points out that the 2022 Census is different because there are currently large continuous surveys. The Continuous PNAD is the IBGE's largest survey after the census and will continue to be collected, processed and disseminated in parallel. In 2010, such surveys were annual, and carried out in the intercensal periods.
“So, we decided to create two data centers within the IBGE Data Processing Center. There is equipment for the regular work of the IBGE and a dedicated infrastructure for the work of the IBGE in the census. The model is the same, the big difference is that all the equipment for processing the census is new”, says Mr. Thomaselli.
He explains that all 200 servers are virtualized and redundant, in clusters of four machines so that, if one or two fail, the others will continue to run. The storage systems are high-performance SSD disks, similar to memory cards.
“In the case of a physical server, the systems are installed on the equipment, and we become dependent on it. The advantage of the virtual server is that it works like a file stored on a pen drive: you can take it anywhere. That gives you flexibility. If a physical machine had its disk damaged, it would be necessary to install all the content on another physical machine. Now you have one piece of physical equipment hosting several virtual machines. If the equipment fails, the virtual machines migrate to other equipment and everything goes on running”, explains Mr. Thomaselli.
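The migration Mr. Thomaselli describes can be illustrated with a minimal sketch. It is not the IBGE's virtualization platform: the host and VM names, the `migrate_on_failure` function and the least-loaded placement rule are all assumptions made for illustration.

```python
def migrate_on_failure(placement, hosts, failed_hosts):
    """Return a new VM-to-host placement after some hosts fail.

    placement: dict mapping VM name -> host name (hypothetical names).
    VMs on surviving hosts stay put; VMs on failed hosts move to the
    surviving host that currently carries the fewest VMs.
    """
    survivors = [h for h in hosts if h not in failed_hosts]
    if not survivors:
        raise RuntimeError("no surviving hosts in the cluster")

    load = {h: 0 for h in survivors}
    new_placement = {}

    # Keep every VM whose host is still running.
    for vm, host in placement.items():
        if host in load:
            new_placement[vm] = host
            load[host] += 1

    # Move displaced VMs to the least-loaded survivor (ties broken by name).
    for vm, host in sorted(placement.items()):
        if host not in load:
            target = min(survivors, key=lambda h: (load[h], h))
            new_placement[vm] = target
            load[target] += 1

    return new_placement


# A cluster of four hosts, as in the article, each running two VMs.
hosts = ["host1", "host2", "host3", "host4"]
placement = {f"vm{i}": hosts[i % 4] for i in range(8)}
after = migrate_on_failure(placement, hosts, failed_hosts={"host1", "host2"})
```

With two of the four hosts down, every virtual machine still has a home on the two survivors, which is the behavior the cluster design aims for.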
For that to happen, the IBGE implemented a private cloud that comprises between 800 and 900 machines. The cloud is not restricted to the data center on General Canabarro Street, in Rio de Janeiro: it also reaches the State Branches, making up a national internal cloud. “We can move a machine from São Paulo to Rio and vice versa. We also use the Microsoft Azure cloud to download inputs and applications to the Mobile Collection Devices (DMC), the smartphones that enumerators will use in data collection”, adds the data processing manager.
Census will have 10 Gbps connectivity, 100 times faster
Today the IBGE's internet links run at 100 Mbps. The ones bought for the Census run at 10 Gbps, 100 times faster. The connection between the data center in Rio de Janeiro and the contingency site in São Paulo also uses two 10 Gbps fiber optic circuits in a LAN-to-LAN configuration, as if it were a local connection.
“We set up a triangular structure forming a redundant ring. Canabarro’s data center, in Rio, is linked to that of Urussuí, in São Paulo, by a 10 Gbps link; São Paulo’s data center is linked to that of Chile Av., another entry point in Rio, which, in turn, is linked to Canabarro’s data center. It is a ring. In case there is a problem in São Paulo or in Canabarro’s data center, we can get in through Chile”, highlights Mr. Thomaselli.
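The redundancy of the ring can be checked with a short sketch. The three site names come from the article; the link list is a simplified model, and the helper functions `reachable` and `survives_single_link_failure` are hypothetical names introduced here for illustration.

```python
# Simplified model of the ring: each pair is one bidirectional link.
RING = [
    ("Canabarro", "Urussui"),
    ("Urussui", "Chile"),
    ("Chile", "Canabarro"),
]


def reachable(start, links):
    """Return the set of sites reachable from `start` over `links`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for a, b in links:
            if node == a:
                neighbor = b
            elif node == b:
                neighbor = a
            else:
                continue
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def survives_single_link_failure(links):
    """True if all sites stay connected after any one link is cut."""
    sites = {site for link in links for site in link}
    start = next(iter(sites))
    for failed in links:
        remaining = [link for link in links if link != failed]
        if reachable(start, remaining) != sites:
            return False
    return True


ring_ok = survives_single_link_failure(RING)        # closed ring
chain_ok = survives_single_link_failure(RING[:2])   # ring with one side missing
```

The closed ring keeps all three sites connected when any one link is cut, while a simple chain of two links does not, which is why the third side of the triangle matters.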
“Today's world demands speed, and the IBGE works with information that portrays the country. The Census needs to be fast, and the collected data must be made available quickly to the population, to researchers and to public policy-makers. The investment is necessary”, emphasizes Mr. Cotovio.
Security against attacks during Census
Every day the IBGE receives dozens of attack attempts, but it has an infrastructure of security layers. The first one is the firewall, which isolates the internal network from the external one. The second layer is the application firewall, which protects the systems against unauthorized access. There is also a set of equipment gathered within an internal barrier, which separates it from the other IBGE machines.
“The IBGE acquired specialized software to monitor the environment and hired a company to check the websites for vulnerabilities. It also bought administration tools. The IBGE works with sensitive data and has a strong security system when compared to other enterprises, managing to achieve good isolation. In addition to separating the census data center, we hired specialized companies to give us support”, guarantees Mr. Thomaselli.
The databases holding microdata run on the high-performance Exadata server, a computing platform optimized to run the Oracle database. The server has a dedicated firewall for data protection. The database is installed at Canabarro’s data center and replicated in the secondary data center in São Paulo.
“We have another database, Microsoft's SQL Server, which holds the data on the hiring of professionals to work in the census. It is also in Rio and is replicated in São Paulo. All the IBGE machines are of open architecture. We shut down the IBM mainframe in 2017”, says the data processing coordinator.
Redundant data center for disaster recovery
Mr. Thomaselli explains that the systems are more and more dependent on the internet. For the Census, data collection continuity must be guaranteed: the enumerator has to reach households and collect people’s data. But there is great risk for data stored in the DMC: the device can be stolen, fall from the enumerator's hands or stop working. Therefore, the faster data is transmitted, the safer the operation, preventing data loss. “São Paulo’s data center has the purpose of guaranteeing the field operation, in case there is failure in the main data center. Collection takes place in real time”, says Mr. Thomaselli.
Artificial intelligence
The coordinator explains that all transmitted data enter the data centers at Canabarro and in São Paulo, where they feed the databases used by the technicians to verify data consistency and run the editing software. In parallel, there is a data lake structure, a large repository of databases where the Business Intelligence software is run. This software cross-checks the data against other surveys and raises a warning whenever there is something irregular.
“That will generate dashboards with graphs to display the data. Besides, we have SAS, a statistical tool widely used by the directorate of surveys to run queries and check whether the information has quality or needs some adjustment”, adds the coordinator.
The IBGE will also use AI capabilities to assess the information based on the coding used in previous censuses. It is a solution developed internally by the technical teams of the IBGE to identify inconsistencies such as a 10-year-old person claiming to be retired.
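The article does not describe how the internal solution works, so the sketch below only illustrates the kind of inconsistency it is meant to catch, using a plain rule rather than AI. The record fields, the `MIN_RETIREMENT_AGE` threshold and the `find_inconsistencies` function are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only to make the example concrete.
MIN_RETIREMENT_AGE = 18


@dataclass
class PersonRecord:
    person_id: str
    age: int
    occupation_status: str  # e.g. "retired", "student", "employed"


def find_inconsistencies(records):
    """Return (person_id, reason) pairs for records that fail the rule."""
    flagged = []
    for record in records:
        if (record.occupation_status == "retired"
                and record.age < MIN_RETIREMENT_AGE):
            reason = f"age {record.age} is inconsistent with status 'retired'"
            flagged.append((record.person_id, reason))
    return flagged


records = [
    PersonRecord("p1", 10, "retired"),   # the case cited in the article
    PersonRecord("p2", 70, "retired"),
    PersonRecord("p3", 10, "student"),
]
flagged = find_inconsistencies(records)
```

Only the first record is flagged for review; the other two pass the rule unchanged.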
“With so much editing, we can have a census of quality, a much higher quality than that of the collected data, which is a huge advantage, not only for the census operation, but also for the census results, which will be the foundation of so many public policies in Brazil”, says Mr. Cotovio, director of technology.
DMC chips allow real time collection transmission
The whole technological investment in the IBGE’s infrastructure was carried out in 2019. For 2022, the remaining expenses are running costs, such as the communication links or DMC chips. Mr. Thomaselli highlights that the technology area was already challenged during the pandemic to quickly give support to the teams working from home. Now the challenge is to manage the work of 200,000 people.
“This census brings improvements, such as internal DMC chips that allow transmission as soon as enumerators find a signal. Before that, it was a blind collection. Today, supervisors can see each enumerator’s productivity and make decisions. And, if enumerators are in doubt, they can establish a VoIP connection (Voice over IP, or voice over the internet) and clear up their doubts. They can perform interviews by phone as well. The census is now much more interactive. All moves in the field are recorded, as well as the time spent completing the questionnaire. Monitoring, administration and decision-making are all in real time”, concludes Mr. Thomaselli.
Technical challenge and unique opportunity in taking part in the Census
For the manager of IT administration, Flavia Marinho de Lima, a telecommunications engineer, there are several technical challenges in the 2022 Census. Currently, technology is separated into many layers: infrastructure, storage, database, monitoring, security and application. The administrative area managed by Ms. Lima deals with the application layer, which integrates all the previous ones and accounts for technology services such as the login databases (AD), web services, domain name servers (DNS), application servers, and the evaluation and configuration of equipment.
In the 2022 Census, the area will be responsible for the whole infrastructure of the Administration System of Census Personnel (SAPC) and of the System of Collection Managing Indicators (SIGC), which works as a support and management control tool and will be available to coordinators.
The DMCs, on the other hand, will have many inputs, such as maps. The novelty in this Census is that these resources will be available both in the public cloud and in the IBGE private cloud, in the data center. First, the application used by enumerators tries to access the public cloud. If that fails, it tries the IBGE data centers. Ms. Lima explains that, in the Census of Agriculture, loading the inputs into the DMCs ended up causing congestion in the IBGE network; for this reason, the public cloud is also being used.
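The order of attempts described above, public cloud first and IBGE data centers second, follows a common fallback pattern. The sketch below is not the enumerators' application: the source names, the `download_inputs` function and the `fake_fetch` stand-in are hypothetical and serve only to show the pattern.

```python
def download_inputs(sources, fetch):
    """Try each source in order and return (source, payload) on first success.

    sources: ordered list of source names (hypothetical labels).
    fetch:   callable that returns the payload or raises ConnectionError.
    """
    errors = []
    for source in sources:
        try:
            return source, fetch(source)
        except ConnectionError as exc:
            # Remember why this source failed, then fall through to the next.
            errors.append(f"{source}: {exc}")
    raise ConnectionError("all sources failed: " + "; ".join(errors))


def fake_fetch(source):
    """Stand-in for a real download: the public cloud is unreachable here."""
    if source == "public-cloud":
        raise ConnectionError("timeout")
    return f"maps from {source}"


# Public cloud first, then the IBGE data centers, as the article describes.
SOURCES = ["public-cloud", "ibge-datacenter"]
used_source, payload = download_inputs(SOURCES, fake_fetch)
```

Putting the public cloud first keeps most download traffic off the internal network, which is the congestion problem Ms. Lima mentions from the Census of Agriculture.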
“Taking part in the 2022 Census is an enormous challenge and a responsibility. As a professional, it is a unique opportunity. The Census of Agriculture was an enriching experience in itself. The 2022 Population Census will be carried out in a short period, in which there is no room for error, since days not worked have a very high cost”, Ms. Lima concludes.