Log Parsing (Logparsing)
Published: 2021-06-28 21:02:41  Category: Technical articles


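Apache HTTPD and NGINX access log parser

This is a log-parsing framework intended to simplify parsing Apache HTTPD and NGINX access log files.

The basic idea is that you should be able to construct a parser simply by telling it which configuration options were used to write the log lines. Those configuration options are the schema of the access log lines.

GitHub: https://github.com/nielsbasjes/logparser

The Lombok plugin must be installed in IntelliJ IDEA first, because the entity class below uses Lombok's @Getter and @Setter.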

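Add the dependency

<dependency>
    <groupId>nl.basjes.parse.httpdlog</groupId>
    <artifactId>httpdlog-parser</artifactId>
    <version>5.2</version>
</dependency>

NGINX log sample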

The log output format can be configured in the nginx.conf file under nginx's conf directory, for example:

#log_format  main  '$remote_addr - $remote_user [$time_local] [$msec] [$request_time] [$http_host] "$request" '
#                  '$status $body_bytes_sent "$request_body" "$http_referer" '
#                  '"$http_user_agent" $http_x_forwarded_for';

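$remote_addr: the address of the client.

$remote_user: the user name the client supplied for authentication; it is empty (logged as "-") when the authentication module is not enabled.

$time_local: the nginx server's local time in Common Log Format, including the time zone offset.

$msec: the time the log entry is written, as a Unix timestamp in seconds with millisecond resolution.

$request_time: the request processing time in seconds with millisecond resolution, measured from reading the first bytes from the client until the log entry is written after the response has been sent.

$http_host: the Host request header, that is, the requested domain.

$request: the full original request line (method, URL, and HTTP protocol).

$status: the response status code, for example 200 on success.

$body_bytes_sent: the number of body bytes sent from the server to the client, not counting the response header.

$request_body: the request body, for example the parameters submitted with a POST.

$http_referer: the Referer header, which records the page the request was linked from.

$http_user_agent: the User-Agent header, which records information about the client's browser.

$http_x_forwarded_for: the X-Forwarded-For header, which carries the addresses the request was forwarded through.

$upstream_response_time: the time spent receiving the response from the upstream server, in seconds with millisecond resolution. This variable is not part of the log_format shown above.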
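
The relationship between $msec and $request_time can be worked through with plain JDK classes. The sketch below takes the two values from the sample line in this article (1541140129.431 and 0.095) and estimates when the request started. The class name RequestStartSketch and its helper methods are hypothetical, and because $msec is recorded when the log entry is written, the result is only an approximation.

```java
import java.math.BigDecimal;
import java.time.Instant;

class RequestStartSketch {
    // Converts a "seconds.millis" string to whole milliseconds
    // without going through floating point.
    static long toMillis(String seconds) {
        return new BigDecimal(seconds).movePointRight(3).longValueExact();
    }

    // Approximate request start: log-write time minus processing time.
    static Instant requestStart(String msec, String requestTime) {
        return Instant.ofEpochMilli(toMillis(msec) - toMillis(requestTime));
    }

    public static void main(String[] args) {
        String msec = "1541140129.431";   // $msec from the sample line
        String requestTime = "0.095";     // $request_time from the sample line

        System.out.println("logged at  : " + Instant.ofEpochMilli(toMillis(msec)));
        System.out.println("started at : " + requestStart(msec, requestTime));
    }
}
```

Instant prints in UTC, so the logged time appears eight hours earlier than the +0800 value of $time_local in the same line.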

Example:

125.88.xxx.xx - - [02/Nov/2018:14:28:49 +0800] [1541140129.431] [0.095] [ma.xx.game.com] "POST /?log3/gameReport HTTP/1.1" 200 358 "type=1&id=NaN" "http://ma.xx.game.com/?log3/gameReport2" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36" -
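
To make the mapping between the log_format variables and the sample line concrete, here is a minimal sketch that splits the line with a regular expression, one capture group per variable. It uses only the JDK and is not how logparser works internally; the class name NginxLineSketch, the pattern, and the parse helper are all assumptions made for illustration, and the sample is written with straight double quotes as nginx would emit them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NginxLineSketch {
    // Variable names in the order they appear in the log_format above.
    private static final String[] NAMES = {
        "remote_addr", "remote_user", "time_local", "msec", "request_time",
        "http_host", "request", "status", "body_bytes_sent", "request_body",
        "http_referer", "http_user_agent", "http_x_forwarded_for"
    };

    // One capture group per variable; bracketed and quoted fields are
    // matched up to their closing delimiter.
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) - (\\S+) \\[([^\\]]+)\\] \\[([^\\]]+)\\] \\[([^\\]]+)\\] \\[([^\\]]+)\\] "
      + "\"([^\"]*)\" (\\d{3}) (\\d+) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" (.*)$");

    static final String SAMPLE =
        "125.88.xxx.xx - - [02/Nov/2018:14:28:49 +0800] [1541140129.431] [0.095] "
      + "[ma.xx.game.com] \"POST /?log3/gameReport HTTP/1.1\" 200 358 \"type=1&id=NaN\" "
      + "\"http://ma.xx.game.com/?log3/gameReport2\" \"Mozilla/5.0 (Windows NT 6.1; Win64; x64) "
      + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36\" -";

    // Returns variable name -> value, or throws if the line does not fit the format.
    static Map<String, String> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Line does not match the expected format");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], m.group(i + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        parse(SAMPLE).forEach((name, value) -> System.out.println("$" + name + " = " + value));
    }
}
```

A hand-written pattern like this has to be edited whenever the log_format changes, which is the maintenance burden the framework described in this article aims to remove.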

Clickstream log sample

2001-980:91c0:1:8d31:a232:25e5:85d 222.68.172.190 - [05/Sep/2010:11:27:50 +0200] "GET /images/my.jpg HTTP/1.1" 404 23617 "http://www.angularjs.cn/A00n" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; nl-nl) AppleWebKit/533.17.8 (KHTML, like Gecko) Version/5.0.1 Safari/533.17.8" "jquery-ui-theme=Eggplant; BuI=SomeThing; Apache=127.0.0.1.1351111543699529" "beijingshi"

Define the format string

Reference: https://httpd.apache.org/docs/current/mod/mod_log_config.html

%u %h %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Addr}i\"
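
Each directive in this format string corresponds to one field of the log line. The sketch below uses only the JDK to list the directives and pair each with a field path that appears later in this article. The pairing is my reading of the article's own parse targets, not output from logparser, and the class name FormatDirectiveSketch, the regular expression, and the map are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormatDirectiveSketch {
    static final String LOGFORMAT =
        "%u %h %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Addr}i\"";

    // Matches %{Name}x directives first, then short ones such as %h or %>s.
    private static final Pattern DIRECTIVE =
        Pattern.compile("%\\{[^}]+\\}[a-zA-Z]|%[<>]?[a-zA-Z]");

    // Assumed directive -> field path pairing, using paths quoted in this article.
    static final Map<String, String> FIELD_PATHS = new LinkedHashMap<>();
    static {
        FIELD_PATHS.put("%u", "STRING:connection.client.user");
        FIELD_PATHS.put("%h", "IP:connection.client.host");
        FIELD_PATHS.put("%l", "NUMBER:connection.client.logname");
        FIELD_PATHS.put("%t", "TIME.STAMP:request.receive.time");
        FIELD_PATHS.put("%r", "HTTP.FIRSTLINE:request.firstline");
        FIELD_PATHS.put("%>s", "STRING:request.status.last");
        FIELD_PATHS.put("%b", "BYTESCLF:response.body.bytes");
        FIELD_PATHS.put("%{Referer}i", "HTTP.URI:request.referer");
        FIELD_PATHS.put("%{User-Agent}i", "HTTP.USERAGENT:request.user-agent");
        FIELD_PATHS.put("%{Cookie}i", "HTTP.COOKIES:request.cookies");
        FIELD_PATHS.put("%{Addr}i", "HTTP.HEADER:request.header.addr");
    }

    // Returns the directives in the order they occur in the format string.
    static List<String> directives(String format) {
        List<String> found = new ArrayList<>();
        Matcher m = DIRECTIVE.matcher(format);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String d : directives(LOGFORMAT)) {
            System.out.println(d + " -> " + FIELD_PATHS.getOrDefault(d, "(not mapped in this sketch)"));
        }
    }
}
```

The authoritative list of paths for a given format is whatever the parser's getPossiblePaths() reports, as the usage section shows.
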
Usage

Print every field that a log format string can produce

import nl.basjes.parse.core.Parser;
import nl.basjes.parse.core.exceptions.DissectionFailure;
import nl.basjes.parse.core.exceptions.InvalidDissectorException;
import nl.basjes.parse.core.exceptions.MissingDissectorsException;
import nl.basjes.parse.httpdlog.HttpdLoglineParser;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;

public class Test {
    private static final Logger LOG = LoggerFactory.getLogger(Test.class);

    String logformat = "%u %h %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Addr}i\"";

    String logline = "2001-980:91c0:1:8d31:a232:25e5:85d 222.68.172.190 - [05/Sep/2010:11:27:50 +0200] "
            + "\"GET /images/my.jpg HTTP/1.1\" 404 23617 \"http://www.angularjs.cn/A00n\" "
            + "\"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; nl-nl) AppleWebKit/533.17.8 "
            + "(KHTML, like Gecko) Version/5.0.1 Safari/533.17.8\" "
            + "\"jquery-ui-theme=Eggplant; BuI=SomeThing; Apache=127.0.0.1.1351111543699529\" \"beijingshi\"";

    public static void main(String[] args) throws MissingDissectorsException, NoSuchMethodException, DissectionFailure, InvalidDissectorException {
        new Test().run();
    }

    private void run() throws InvalidDissectorException, MissingDissectorsException, NoSuchMethodException, DissectionFailure {
        // Print the list of fields available for this log format
        printAllPossibles(logformat);
    }

    // Print every field path the format can produce, with its possible casts
    private void printAllPossibles(String logformat) throws NoSuchMethodException, MissingDissectorsException, InvalidDissectorException {
        // A parser for Object.class is enough to discover the paths
        Parser<Object> dummyParser = new HttpdLoglineParser<>(Object.class, logformat);
        List<String> possiblePaths = dummyParser.getPossiblePaths();

        // Register a throwaway target for all paths so the casts can be queried
        dummyParser.addParseTarget(String.class.getMethod("indexOf", String.class), possiblePaths);

        LOG.info("==================================");
        System.out.println("==================================");
        LOG.info("Possible output:");
        System.out.println("Possible output:");
        for (String path : possiblePaths) {
            System.out.println(path + " " + dummyParser.getCasts(path));
            LOG.info("{} {}", path, dummyParser.getCasts(path));
        }
        System.out.println("==================================");
        LOG.info("==================================");
    }
}

Getting started example

Define the entity object

import lombok.Getter;
import lombok.Setter;
import nl.basjes.parse.core.Field;

import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class MyRecord {
    // Typed properties filled through setters registered with addParseTarget
    @Getter @Setter private String connectionClientUser = null;
    @Getter @Setter private String connectionClientHost = null;
    @Getter @Setter private String requestReceiveTime = null;
    @Getter @Setter private String method = null;
    @Getter @Setter private String referrer = null;
    @Getter @Setter private String screenResolution = null;
    @Getter @Setter private String requestStatus = null;
    @Getter @Setter private String responseBodyBytes = null;
    @Getter @Setter private Long screenWidth = null;
    @Getter @Setter private Long screenHeight = null;
    @Getter @Setter private String googleQuery = null;
    @Getter @Setter private String bui = null;
    @Getter @Setter private String useragent = null;
    @Getter @Setter private String asnNumber = null;
    @Getter @Setter private String asnOrganization = null;
    @Getter @Setter private String ispName = null;
    @Getter @Setter private String ispOrganization = null;
    @Getter @Setter private String continentName = null;
    @Getter @Setter private String continentCode = null;
    @Getter @Setter private String countryName = null;
    @Getter @Setter private String countryIso = null;
    @Getter @Setter private String subdivisionName = null;
    @Getter @Setter private String subdivisionIso = null;
    @Getter @Setter private String cityName = null;
    @Getter @Setter private String postalCode = null;
    @Getter @Setter private Double locationLatitude = null;
    @Getter @Setter private Double locationLongitude = null;

    // Generic store for every value delivered through the @Field setters below
    private final Map<String, String> results = new HashMap<>(32);

    @Field("STRING:request.firstline.uri.query.*")
    public void setQueryDeepMany(final String name, final String value) {
        results.put(name, value);
    }

    @Field("STRING:request.firstline.uri.query.img")
    public void setQueryImg(final String name, final String value) {
        results.put(name, value);
    }

    @Field("IP:connection.client.host")
    public void setIP(final String value) {
        results.put("IP:connection.client.host", value);
    }

    // Note: despite its name, this returns the stored client IP
    public String getUser() {
        return results.get("IP:connection.client.host");
    }

    @Field({
        "STRING:connection.client.user",
        "HTTP.HEADER:request.header.addr",
        "IP:connection.client.host.last",
        "TIME.STAMP:request.receive.time.last",
        "TIME.DAY:request.receive.time.last.day",
        "TIME.MONTHNAME:request.receive.time.last.monthname",
        "TIME.MONTH:request.receive.time.last.month",
        "TIME.WEEK:request.receive.time.last.weekofweekyear",
        "TIME.YEAR:request.receive.time.last.weekyear",
        "TIME.YEAR:request.receive.time.last.year",
        "TIME.HOUR:request.receive.time.last.hour",
        "TIME.MINUTE:request.receive.time.last.minute",
        "TIME.SECOND:request.receive.time.last.second",
        "TIME.MILLISECOND:request.receive.time.last.millisecond",
        "TIME.MICROSECOND:request.receive.time.last.microsecond",
        "TIME.NANOSECOND:request.receive.time.last.nanosecond",
        "TIME.DATE:request.receive.time.last.date",
        "TIME.TIME:request.receive.time.last.time",
        "TIME.ZONE:request.receive.time.last.timezone",
        "TIME.EPOCH:request.receive.time.last.epoch",
        "TIME.DAY:request.receive.time.last.day_utc",
        "TIME.MONTHNAME:request.receive.time.last.monthname_utc",
        "TIME.MONTH:request.receive.time.last.month_utc",
        "TIME.WEEK:request.receive.time.last.weekofweekyear_utc",
        "TIME.YEAR:request.receive.time.last.weekyear_utc",
        "TIME.YEAR:request.receive.time.last.year_utc",
        "TIME.HOUR:request.receive.time.last.hour_utc",
        "TIME.MINUTE:request.receive.time.last.minute_utc",
        "TIME.SECOND:request.receive.time.last.second_utc",
        "TIME.MILLISECOND:request.receive.time.last.millisecond_utc",
        "TIME.MICROSECOND:request.receive.time.last.microsecond_utc",
        "TIME.NANOSECOND:request.receive.time.last.nanosecond_utc",
        "TIME.DATE:request.receive.time.last.date_utc",
        "TIME.TIME:request.receive.time.last.time_utc",
        "HTTP.URI:request.referer",
        "HTTP.PROTOCOL:request.referer.protocol",
        "HTTP.USERINFO:request.referer.userinfo",
        "HTTP.HOST:request.referer.host",
        "HTTP.PORT:request.referer.port",
        "HTTP.PATH:request.referer.path",
        "HTTP.QUERYSTRING:request.referer.query",
        "STRING:request.referer.query.*",
        "HTTP.REF:request.referer.ref",
        "TIME.STAMP:request.receive.time",
        "TIME.DAY:request.receive.time.day",
        "TIME.MONTHNAME:request.receive.time.monthname",
        "TIME.MONTH:request.receive.time.month",
        "TIME.WEEK:request.receive.time.weekofweekyear",
        "TIME.YEAR:request.receive.time.weekyear",
        "TIME.YEAR:request.receive.time.year",
        "TIME.HOUR:request.receive.time.hour",
        "TIME.MINUTE:request.receive.time.minute",
        "TIME.SECOND:request.receive.time.second",
        "TIME.MILLISECOND:request.receive.time.millisecond",
        "TIME.MICROSECOND:request.receive.time.microsecond",
        "TIME.NANOSECOND:request.receive.time.nanosecond",
        "TIME.DATE:request.receive.time.date",
        "TIME.TIME:request.receive.time.time",
        "TIME.ZONE:request.receive.time.timezone",
        "TIME.EPOCH:request.receive.time.epoch",
        "TIME.DAY:request.receive.time.day_utc",
        "TIME.MONTHNAME:request.receive.time.monthname_utc",
        "TIME.MONTH:request.receive.time.month_utc",
        "TIME.WEEK:request.receive.time.weekofweekyear_utc",
        "TIME.YEAR:request.receive.time.weekyear_utc",
        "TIME.YEAR:request.receive.time.year_utc",
        "TIME.HOUR:request.receive.time.hour_utc",
        "TIME.MINUTE:request.receive.time.minute_utc",
        "TIME.SECOND:request.receive.time.second_utc",
        "TIME.MILLISECOND:request.receive.time.millisecond_utc",
        "TIME.MICROSECOND:request.receive.time.microsecond_utc",
        "TIME.NANOSECOND:request.receive.time.nanosecond_utc",
        "TIME.DATE:request.receive.time.date_utc",
        "TIME.TIME:request.receive.time.time_utc",
        "HTTP.URI:request.referer.last",
        "HTTP.PROTOCOL:request.referer.last.protocol",
        "HTTP.USERINFO:request.referer.last.userinfo",
        "HTTP.HOST:request.referer.last.host",
        "HTTP.PORT:request.referer.last.port",
        "HTTP.PATH:request.referer.last.path",
        "HTTP.QUERYSTRING:request.referer.last.query",
        "STRING:request.referer.last.query.*",
        "HTTP.REF:request.referer.last.ref",
        "NUMBER:connection.client.logname",
        "BYTESCLF:response.body.bytes",
        "BYTES:response.body.bytes",
        "HTTP.USERAGENT:request.user-agent.last",
        "HTTP.COOKIES:request.cookies.last",
        "HTTP.COOKIE:request.cookies.last.*",
        "STRING:request.status.last",
        "HTTP.USERAGENT:request.user-agent",
        "STRING:connection.client.user.last",
        "HTTP.FIRSTLINE:request.firstline.original",
        "HTTP.METHOD:request.firstline.original.method",
        "HTTP.URI:request.firstline.original.uri",
        "HTTP.PROTOCOL:request.firstline.original.uri.protocol",
        "HTTP.USERINFO:request.firstline.original.uri.userinfo",
        "HTTP.HOST:request.firstline.original.uri.host",
        "HTTP.PORT:request.firstline.original.uri.port",
        "HTTP.PATH:request.firstline.original.uri.path",
        "HTTP.QUERYSTRING:request.firstline.original.uri.query",
        "STRING:request.firstline.original.uri.query.*",
        "HTTP.REF:request.firstline.original.uri.ref",
        "HTTP.PROTOCOL_VERSION:request.firstline.original.protocol",
        "HTTP.PROTOCOL:request.firstline.original.protocol",
        "HTTP.PROTOCOL.VERSION:request.firstline.original.protocol.version",
        "BYTESCLF:response.body.bytes.last",
        "BYTES:response.body.bytes.last",
        "NUMBER:connection.client.logname.last",
        "HTTP.FIRSTLINE:request.firstline",
        "HTTP.METHOD:request.firstline.method",
        "HTTP.URI:request.firstline.uri",
        "HTTP.PROTOCOL:request.firstline.uri.protocol",
        "HTTP.USERINFO:request.firstline.uri.userinfo",
        "HTTP.HOST:request.firstline.uri.host",
        "HTTP.PORT:request.firstline.uri.port",
        "HTTP.PATH:request.firstline.uri.path",
        "HTTP.QUERYSTRING:request.firstline.uri.query",
        "STRING:request.firstline.uri.query.*",
        "HTTP.REF:request.firstline.uri.ref",
        "HTTP.PROTOCOL_VERSION:request.firstline.protocol",
        "HTTP.PROTOCOL:request.firstline.protocol",
        "HTTP.PROTOCOL.VERSION:request.firstline.protocol.version",
        "HTTP.COOKIES:request.cookies",
        "HTTP.COOKIE:request.cookies.*",
        "BYTES:response.body.bytesclf",
        "BYTESCLF:response.body.bytesclf",
        "IP:connection.client.host",
    })
    public void setValue(final String name, final String value) {
        results.put(name, value);
    }

    // Dump all collected values, sorted by key
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        TreeSet<String> keys = new TreeSet<>(results.keySet());
        for (String key : keys) {
            sb.append(key).append(" = ").append(results.get(key)).append('\n');
        }
        return sb.toString();
    }

    public void clear() {
        results.clear();
    }
}
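
The many TIME.* paths registered above are all views of the single %t timestamp in the log line. As a rough illustration using only java.time (not logparser), the sketch below parses the sample timestamp and derives a few of those views. The class name ReceiveTimeSketch and its parse helper are hypothetical, and the correspondence between the printed values and the named logparser paths is my assumption.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class ReceiveTimeSketch {
    // Pattern for the %t value without its surrounding square brackets.
    private static final DateTimeFormatter CLF =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    static OffsetDateTime parse(String stamp) {
        return OffsetDateTime.parse(stamp, CLF);
    }

    public static void main(String[] args) {
        OffsetDateTime local = parse("05/Sep/2010:11:27:50 +0200");
        OffsetDateTime utc = local.withOffsetSameInstant(ZoneOffset.UTC);

        // Values comparable to the .epoch, .hour, .hour_utc and .timezone paths
        System.out.println("epoch (ms) = " + local.toInstant().toEpochMilli());
        System.out.println("hour       = " + local.getHour());
        System.out.println("hour_utc   = " + utc.getHour());
        System.out.println("timezone   = " + local.getOffset());
    }
}
```

Registering only the time paths that are actually needed keeps the record class much smaller than the exhaustive list shown in MyRecord.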

Core implementation:

import nl.basjes.parse.core.Parser;
import nl.basjes.parse.core.exceptions.DissectionFailure;
import nl.basjes.parse.core.exceptions.InvalidDissectorException;
import nl.basjes.parse.core.exceptions.MissingDissectorsException;
import nl.basjes.parse.httpdlog.HttpdLoglineParser;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;

public class Test {
    // Declared here because run() logs the parsed record
    private static final Logger LOG = LoggerFactory.getLogger(Test.class);

    String logformat = "%u %h %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Addr}i\"";

    String logline = "2001-980:91c0:1:8d31:a232:25e5:85d 222.68.172.190 - [05/Sep/2010:11:27:50 +0200] "
            + "\"GET /images/my.jpg HTTP/1.1\" 404 23617 \"http://www.angularjs.cn/A00n\" "
            + "\"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; nl-nl) AppleWebKit/533.17.8 "
            + "(KHTML, like Gecko) Version/5.0.1 Safari/533.17.8\" "
            + "\"jquery-ui-theme=Eggplant; BuI=SomeThing; Apache=127.0.0.1.1351111543699529\" \"beijingshi\"";

    public static void main(String[] args) throws MissingDissectorsException, NoSuchMethodException, DissectionFailure, InvalidDissectorException {
        new Test().run();
    }

    private void run() throws InvalidDissectorException, MissingDissectorsException, NoSuchMethodException, DissectionFailure {
        // Print the list of fields available for this log format
        printAllPossibles(logformat);

        // Map the log fields onto a MyRecord object
        Parser<MyRecord> parser = new HttpdLoglineParser<>(MyRecord.class, logformat);
        parser.addParseTarget("setConnectionClientUser", "STRING:connection.client.user");
        parser.addParseTarget("setConnectionClientHost", "IP:connection.client.host");
        parser.addParseTarget("setRequestReceiveTime", "TIME.STAMP:request.receive.time");
        parser.addParseTarget("setMethod", "HTTP.METHOD:request.firstline.method");
        parser.addParseTarget("setRequestStatus", "STRING:request.status.last");
        parser.addParseTarget("setScreenResolution", "HTTP.URI:request.firstline.uri");
        parser.addParseTarget("setResponseBodyBytes", "BYTES:response.body.bytes");
        parser.addParseTarget("setReferrer", "HTTP.URI:request.referer");
        parser.addParseTarget("setUseragent", "HTTP.USERAGENT:request.user-agent");

        MyRecord record = new MyRecord();
        System.out.println("==============================================");
        parser.parse(record, logline);
        LOG.info(record.toString());
        System.out.println(record.toString());
        System.out.println("==============================================");

        System.out.println(record.getConnectionClientUser());
        System.out.println(record.getConnectionClientHost());
        System.out.println(record.getRequestReceiveTime());
        System.out.println(record.getMethod());
        System.out.println(record.getScreenResolution());
        System.out.println(record.getRequestStatus());
        System.out.println(record.getResponseBodyBytes());
        System.out.println(record.getReferrer());
        System.out.println(record.getUseragent());
    }

    // Print every field path the format can produce, with its possible casts
    private void printAllPossibles(String logformat) throws NoSuchMethodException, MissingDissectorsException, InvalidDissectorException {
        Parser<Object> dummyParser = new HttpdLoglineParser<>(Object.class, logformat);
        List<String> possiblePaths = dummyParser.getPossiblePaths();
        dummyParser.addParseTarget(String.class.getMethod("indexOf", String.class), possiblePaths);

        System.out.println("==================================");
        System.out.println("Possible output:");
        for (String path : possiblePaths) {
            System.out.println(path + " " + dummyParser.getCasts(path));
        }
        System.out.println("==================================");
    }
}
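
Wildcard paths such as HTTP.COOKIE:request.cookies.* are bound to two-argument setters (name, value), like setValue in MyRecord. The sketch below uses only the JDK to illustrate the kind of name and value pairs such a setter might receive for the Cookie header of the sample line. The class CookieWildcardSketch and its split helper are hypothetical, and lower-casing the cookie names is an assumption suggested by the bui property of MyRecord, not confirmed library behavior.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class CookieWildcardSketch {
    // Splits a Cookie header into "request.cookies.<name>" -> value pairs.
    static Map<String, String> split(String cookieHeader) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String part : cookieHeader.split(";")) {
            int eq = part.indexOf('=');
            if (eq < 0) {
                continue; // skip fragments that carry no value
            }
            // Assumption: names are normalised to lower case
            String name = part.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = part.substring(eq + 1).trim();
            out.put("request.cookies." + name, value);
        }
        return out;
    }

    public static void main(String[] args) {
        String header = "jquery-ui-theme=Eggplant; BuI=SomeThing; Apache=127.0.0.1.1351111543699529";
        split(header).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A setter bound to a single concrete path, such as one specific cookie, can take just the value, whereas a wildcard setter needs the name argument to tell the entries apart.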

Create the clickstream classes

Reference code

import nl.basjes.parse.httpdlog.HttpdLoglineParser

import scala.beans.BeanProperty

class ClickLogBean {
  // User ID
  private[this] var _connectionClientUser: String = _
  def setConnectionClientUser(value: String): Unit = { _connectionClientUser = value }
  def getConnectionClientUser: String = _connectionClientUser

  // IP address
  private[this] var _ip: String = _
  def setIp(value: String): Unit = { _ip = value }
  def getIp: String = _ip

  // Request time
  private[this] var _requestTime: String = _
  def setRequestTime(value: String): Unit = { _requestTime = value }
  def getRequestTime: String = _requestTime

  // Request method
  private[this] var _method: String = _
  def setMethod(value: String): Unit = { _method = value }
  def getMethod: String = _method

  // Requested resource
  private[this] var _resolution: String = _
  def setResolution(value: String): Unit = { _resolution = value }
  def getResolution: String = _resolution

  // Request protocol
  private[this] var _requestProtocol: String = _
  def setRequestProtocol(value: String): Unit = { _requestProtocol = value }
  def getRequestProtocol: String = _requestProtocol

  // Response status code.
  // createClickLogParser registers no parse target for this setter,
  // so the value keeps its default of 0.
  private[this] var _responseStatus: Int = _
  def setRequestStatus(value: Int): Unit = { _responseStatus = value }
  def getRequestStatus: Int = _responseStatus

  // Size of the returned data
  private[this] var _responseBodyBytes: String = _
  def setResponseBodyBytes(value: String): Unit = { _responseBodyBytes = value }
  def getResponseBodyBytes: String = _responseBodyBytes

  // URL the visitor came from
  private[this] var _referer: String = _
  def setReferer(value: String): Unit = { _referer = value }
  def getReferer: String = _referer

  // Client user agent
  private[this] var _userAgent: String = _
  def setUserAgent(value: String): Unit = { _userAgent = value }
  def getUserAgent: String = _userAgent

  // Domain of the referring page: HTTP.HOST:request.referer.host
  private[this] var _referDomain: String = _
  def setReferDomain(value: String): Unit = { _referDomain = value }
  def getReferDomain: String = _referDomain
}

object ClickLogBean {
  // Format string that describes the clickstream log lines
  val getLogFormat: String = "%u %h %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""

  // Parse one log line into a ClickLogBean
  def apply(parser: HttpdLoglineParser[ClickLogBean], clickLog: String): ClickLogBean = {
    val clickLogBean = new ClickLogBean
    parser.parse(clickLogBean, clickLog)
    clickLogBean
  }

  // Build the parser for clickstream log lines
  def createClickLogParser(): HttpdLoglineParser[ClickLogBean] = {
    val parser = new HttpdLoglineParser[ClickLogBean](classOf[ClickLogBean], getLogFormat)

    parser.addTypeRemapping("request.firstline.uri.query.g", "HTTP.URI")
    parser.addTypeRemapping("request.firstline.uri.query.r", "HTTP.URI")

    parser.addParseTarget("setConnectionClientUser", "STRING:connection.client.user")
    parser.addParseTarget("setIp", "IP:connection.client.host")
    parser.addParseTarget("setRequestTime", "TIME.STAMP:request.receive.time")
    parser.addParseTarget("setMethod", "HTTP.METHOD:request.firstline.method")
    parser.addParseTarget("setResolution", "HTTP.URI:request.firstline.uri")
    parser.addParseTarget("setRequestProtocol", "HTTP.PROTOCOL_VERSION:request.firstline.protocol")
    parser.addParseTarget("setResponseBodyBytes", "BYTES:response.body.bytes")
    parser.addParseTarget("setReferer", "HTTP.URI:request.referer")
    parser.addParseTarget("setUserAgent", "HTTP.USERAGENT:request.user-agent")
    parser.addParseTarget("setReferDomain", "HTTP.HOST:request.referer.host")

    // Return the configured parser
    parser
  }

  def main(args: Array[String]): Unit = {
    val logline = "2001:980:91c0:1:8d31:a232:25e5:85d 222.68.172.190 - [05/Sep/2010:11:27:50 +0200] " +
      "\"GET /images/my.jpg HTTP/1.1\" 404 23617 \"http://www.angularjs.cn/A00n\" " +
      "\"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; nl-nl) AppleWebKit/533.17.8 " +
      "(KHTML, like Gecko) Version/5.0.1 Safari/533.17.8\""

    val record = new ClickLogBean()
    val parser = createClickLogParser()
    parser.parse(record, logline)

    println(record.getConnectionClientUser)
    println(record.getIp)
    println(record.getRequestTime)
    println(record.getMethod)
    println(record.getResolution)
    println(record.getRequestProtocol)
    println(record.getResponseBodyBytes)
    println(record.getReferer)
    println(record.getUserAgent)
    println(record.getReferDomain)
  }
}

case class ClickLogWideBean(
  @BeanProperty uid: String,               // User ID
  @BeanProperty ip: String,                // IP address
  @BeanProperty requestTime: String,       // Request time
  @BeanProperty requestMethod: String,     // Request method
  @BeanProperty requestUrl: String,        // Request URL
  @BeanProperty requestProtocol: String,   // Request protocol
  @BeanProperty responseStatus: Int,       // Response status code
  @BeanProperty responseBodyBytes: String, // Size of the returned data
  @BeanProperty referrer: String,          // URL the visitor came from
  @BeanProperty userAgent: String,         // Client user agent
  @BeanProperty referDomain: String,       // Domain of the referring page: HTTP.HOST:request.referer.host
  var province: String,                    // Province the IP maps to
  var city: String,                        // City the IP maps to
  var timestamp: Long                      // Timestamp
)

object ClickLogWideBean {
  // Widen a parsed ClickLogBean; province, city and timestamp are filled in later
  def apply(clickLogBean: ClickLogBean): ClickLogWideBean = {
    ClickLogWideBean(
      clickLogBean.getConnectionClientUser,
      clickLogBean.getIp,
      "", // DateUtil.datetime2date(clickLogBean.getRequestTime),
      clickLogBean.getMethod,
      clickLogBean.getResolution,
      clickLogBean.getRequestProtocol,
      clickLogBean.getRequestStatus,
      clickLogBean.getResponseBodyBytes,
      clickLogBean.getReferer,
      clickLogBean.getUserAgent,
      clickLogBean.getReferDomain,
      "",
      "",
      0L)
  }
}

Reposted from: https://blog.csdn.net/yangshengwei230612/article/details/116403157
