发布日期：2021-05-15 07:11:55 浏览次数：22 分类：精选文章

本文共 3684 字，大约阅读时间需要 12 分钟。

Hive Query SQL Guide

Using Regular Expressions to Specify Columns

In Hive, you can specify columns using regular expressions. For example:

select age, `name.*` from test_stu_3;

If you encounter errors, ensure Hive支持的参数设置如下：

set hive.support.quoted_identifiers=none;

Arithmetic Operators

Hive supports all typical arithmetic operators. Below is a quick reference:

Operator	Type	Description
A + B	Number	Sum of A and B
A - B	Number	Difference of A and B
A * B	Number	Product of A and B
A / B	Number	Division of A by B
A % B	Number	Remainder of A divides B
A & B	Number	Bitwise AND of A and B
A	B	Number	Bitwise OR of A and B
A ^ B	Number	Bitwise XOR of A and B
~A	Number	Bitwise NOT of A

Arithmetic operations accept any numeric type. If the data types differ, the type with the narrower range is cast to the wider range (e.g., INT to BIGINT). Hive does not automatically convert narrower types to wider ones during arithmetic operations.

Mathematical Functions

Hive provides various mathematical functions for data analysis. For instance:

floor(), round(), and ceil() accept DOUBLE values and return BIGINT.

Use these functions for type conversion instead of type casting operations.

Aggregate Functions

Aggregate functions operate on multiple rows to produce a single result. Two common examples are COUNT() and AVG().

Table-Generating Functions

Hive provides functions that expand single columns into multiple columns or rows. For example:

explode(): Converts a string column into multiple rows.

posexplode(): Similar to explode(), but requries a specific format.

When to Avoid MapReduce

In Hive, MapReduce tasks are typically triggered for most queries. However, the following scenarios do not require it:

Selecting all columns without WHERE conditions.

WHERE conditions only involve partition columns, regardless of LIMIT.

You可以通过设置如下参数优化性能：

set hive.exec.mode.local.auto=true;

Predicates Operators

Like Operator

The LIKE operator matches strings using specific patterns. The syntax is A LIKE B, where B can use % and _ for wildcards. The comparison is character-by-character, so mismatched characters result in FALSE.

Example	Result
'abcde' LIKE 'abc'	FALSE
'abcde' LIKE 'abc__'	TRUE
'abcde' LIKE 'abc%'	TRUE
'abcde' LIKE '%abc%'	TRUE
'abcde' LIKE 'bc%'	FALSE
'abcde' LIKE '_bc%'	TRUE
'abcde' LIKE '_b%'	TRUE

RLike Operator

The RLIKE operator checks if a regex pattern exists within a string. The syntax is A RLIKE B, where B is a valid regex.

Example	Result
'foobar' RLIKE 'oo%'	FALSE
'foobar' RLIKE 'foo'	TRUE
'foobar' RLIKE '.oo.*'	TRUE

Comparing Floating-Point Numbers

Comparing floats in Hive can lead to precision issues due to how numbers are stored in memory. For example, 0.2 cannot be represented exactly as a floating-point number. This can cause unexpected results, like:

SELECT * FROM employees
WHERE deductions['FederalTaxes'] >= 0.2;

might return rows where FederalTaxes is 0.2 due to precision differences.

To avoid issues:

Store relevant columns as DOUBLE.

Use explicit type casting when necessary:

SELECT cast(FederalTaxes as FLOAT) >= 0.2 AS boolean;

Performance Optimization

To avoid MapReduce tasks:

Use SELECT * on tables without filters.

Use partition columns in WHERE clauses.

Set hive.exec.mode.local.auto=true; for local execution mode.

Examples of MapReduce Scenarios

Queries without partition filters.

Combinations of filters on non-partition columns with LIMIT.

Conclusion

Hive provides powerful SQL capabilities, including support for regular expressions, arithmetic operations, and various functions. By understanding these features and optimizing query execution, you can maximize efficiency and avoid unnecessary MapReduce tasks.

上一篇：hadoop格式化namenode后导致datanode无法启动

下一篇：星型模型&&雪花模型

发表评论

关于作者

喝酒易醉，品茶养心，人生如梦，品茶悟道，何以解忧？唯有杜康！

-- 愿君每日到此一游！