Enhanced CIDR enrichment with GeoIP and mmdb #24411

@naa0yama

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

Hello,
I routinely process large volumes of logs, and my current focus is on structuring them.

So far I have implemented many ingest pipelines with the common Elasticsearch and Filebeat stack, but I can't unit test them, which makes updating Grok patterns a heavy mental burden.

I then discovered that Vector lets you write tests in configuration files, which might make TDD (Test-Driven Development) possible, so I gave it a try.

As you know, IP addresses are very common in logs. When logs are fed into a SIEM, the ASN, organization, and CIDR associated with each IP address are very useful for aggregating and displaying data.
They let you group vast numbers of IP addresses by CIDR and organization, which is convenient for graphs and for viewing the logs of a specific CIDR for security purposes, without having to specify each IP address one by one.
Doing this enrichment in the log ingestion agent is especially beneficial because it lowers the barrier to using analysis tools.

Attempted Solutions

When I used Vector to replace what Elasticsearch's GeoIP processor does, I noticed that some fields were missing. The current lookup code emits only the following fields:

    match self.dbkind {
        DatabaseKind::Asn | DatabaseKind::Isp => {
            let data = lookup_value::<Isp>(&self.dbreader, ip).ok()??;
            add_field!("autonomous_system_number", data.autonomous_system_number);
            add_field!(
                "autonomous_system_organization",
                data.autonomous_system_organization
            );
            add_field!("isp", data.isp);
            add_field!("organization", data.organization);
        }
        DatabaseKind::City => {
            let data: City = lookup_value::<City>(&self.dbreader, ip).ok()??;
            add_field!("city_name", self.take_translation(&data.city.names));
            add_field!("continent_code", data.continent.code);
            let country = data.country;
            add_field!("country_code", country.iso_code);
            add_field!("country_name", self.take_translation(&country.names));
            let location = data.location;
            add_field!("timezone", location.time_zone);
            add_field!(
                "latitude",
                location.latitude.map(|latitude| Value::Float(
                    NotNan::new(latitude).expect("latitude cannot be Nan")
                ))
            );
            add_field!(
                "longitude",
                location
                    .longitude
                    .map(|longitude| NotNan::new(longitude).expect("longitude cannot be Nan"))
            );
            add_field!("metro_code", location.metro_code);
            // last subdivision is most specific per https://github.com/maxmind/GeoIP2-java/blob/39385c6ce645374039450f57208b886cf87ade47/src/main/java/com/maxmind/geoip2/model/AbstractCityResponse.java#L96-L107
            let subdivision = data.subdivisions.last();
            add_field!(
                "region_name",
                subdivision.map(|s| self.take_translation(&s.names))
            );
            add_field!(
                "region_code",
                subdivision.and_then(|subdivision| subdivision.iso_code)
            );
            add_field!("postal_code", data.postal.code);
        }
        DatabaseKind::ConnectionType => {
            let data = lookup_value::<ConnectionType>(&self.dbreader, ip).ok()??;
            add_field!("connection_type", data.connection_type);
        }
        DatabaseKind::AnonymousIp => {
            let data = lookup_value::<AnonymousIp>(&self.dbreader, ip).ok()??;
            add_field!("is_anonymous", data.is_anonymous);
            add_field!("is_anonymous_vpn", data.is_anonymous_vpn);
            add_field!("is_hosting_provider", data.is_hosting_provider);
            add_field!("is_public_proxy", data.is_public_proxy);
            add_field!("is_residential_proxy", data.is_residential_proxy);
            add_field!("is_tor_exit_node", data.is_tor_exit_node);
        }
    }
    Some(map)
}

In particular, the network field is missing. It returns the CIDR block in the database that contains the input IP address, which gives a value that logs can be grouped by, such as 192.0.2.0/24.
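To make the shape of the requested value concrete, here is a minimal sketch of how a network string could be derived once a lookup reports the prefix length of the matching database record. It uses only the Rust standard library. The function name `containing_network` is hypothetical, and the prefix length is passed in by the caller as a stand-in for whatever the database reader returns; this is not Vector's or MaxMind's actual API.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical helper: returns the CIDR block (network address plus prefix
/// length) that contains `ip`. `prefix_len` is assumed to come from the
/// database lookup. Returns `None` if the prefix is wider than the address.
fn containing_network(ip: IpAddr, prefix_len: u8) -> Option<String> {
    match ip {
        IpAddr::V4(v4) => {
            if prefix_len > 32 {
                return None;
            }
            // checked_shl returns None for a shift of 32 (the /0 case),
            // in which case the mask is all zeros.
            let mask = u32::MAX
                .checked_shl(32 - u32::from(prefix_len))
                .unwrap_or(0);
            let network = Ipv4Addr::from(u32::from(v4) & mask);
            Some(format!("{network}/{prefix_len}"))
        }
        IpAddr::V6(v6) => {
            if prefix_len > 128 {
                return None;
            }
            // Same masking logic on the 128-bit representation.
            let mask = u128::MAX
                .checked_shl(128 - u32::from(prefix_len))
                .unwrap_or(0);
            let network = Ipv6Addr::from(u128::from(v6) & mask);
            Some(format!("{network}/{prefix_len}"))
        }
    }
}
```

With this sketch, an input of 192.0.2.57 and a prefix length of 24 produces the string 192.0.2.0/24, which is the kind of value the network field would carry.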

According to the documentation, the following databases support this network field:

Proposal

The proposal is the same as described in "Attempted Solutions" above.

References

source

MaxMind Databases docs

Version

No response

Labels

type: feature